reference period parsing and extraction
Code details noticed while working on flexible doy description
Reference period defined as 19810101000000-20101231235959.
NOTE: It would be good if a definition like 1981-2010 defaulted to the definition above, i.e. from the start of the first year to the end of the last year.
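A minimal sketch of the default proposed in the note, assuming the period is given as a plain "YYYY-YYYY" string and a standard calendar. The function name parse_reference_period is hypothetical and not part of the existing code:

```python
from datetime import datetime


def parse_reference_period(spec):
    """Expand 'YYYY-YYYY' into (start, end) covering both years in full.

    Hypothetical helper for illustration only. It uses the stdlib
    datetime, so it assumes a calendar in which December 31 exists.
    """
    first, last = (int(part) for part in spec.split("-"))
    if first > last:
        raise ValueError(f"first year {first} is after last year {last}")
    start = datetime(first, 1, 1, 0, 0, 0)      # start of the first year
    end = datetime(last, 12, 31, 23, 59, 59)    # end of the last year
    return start, end


start, end = parse_reference_period("1981-2010")
print(start.strftime("%Y%m%d%H%M%S"), end.strftime("%Y%m%d%H%M%S"))
# prints: 19810101000000 20101231235959
```

For non-standard calendars (e.g. 360-day), the end-of-year date would have to be derived from the calendar rather than hard-coded.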
doy_statistics, _extract_reference_period(self, cube):
idx_0 = cftime.date2index(
    self.base.start, times, calendar=time_units.calendar, select="after"
)
idx_n = cftime.date2index(
    self.base.end, times, calendar=time_units.calendar, select="before"
)
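This leads to an extracted time series that covers 1981-01-02 to 2010-12-30, i.e. the first and the last day of the reference period are dropped.
Suggestion: Widen both bounds by a small time delta: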
from datetime import timedelta

# Widen both bounds slightly so that the first and last time steps
# of the reference period are included in the selection.
dt_small = timedelta(microseconds=1)
idx_0 = cftime.date2index(
    self.base.start - dt_small, times, calendar=time_units.calendar, select="after"
)
idx_n = cftime.date2index(
    self.base.end + dt_small, times, calendar=time_units.calendar, select="before"
)
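To check which indices an inclusive selection should return, the expected result can be computed independently of cftime. The following is a sketch using the standard-library bisect module on a synthetic daily time axis stamped at noon; it is a cross-check under those assumptions, not the method used in doy_statistics, and inclusive_bounds is a hypothetical name:

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def inclusive_bounds(times, start, end):
    """Return (idx_0, idx_n) such that times[idx_0:idx_n + 1] lies in [start, end].

    `times` must be sorted in ascending order.
    """
    idx_0 = bisect_left(times, start)       # first index with times[i] >= start
    idx_n = bisect_right(times, end) - 1    # last index with times[i] <= end
    return idx_0, idx_n


# Synthetic daily axis at 12:00, starting two days before the period
# and ending two days after it (assumed layout, for illustration).
first = datetime(1980, 12, 30, 12)
times = [first + timedelta(days=i) for i in range(10961)]

start = datetime(1981, 1, 1, 0, 0, 0)
end = datetime(2010, 12, 31, 23, 59, 59)
idx_0, idx_n = inclusive_bounds(times, start, end)

print(times[idx_0], times[idx_n])   # 1981-01-01 12:00:00 2010-12-31 12:00:00
print(idx_n - idx_0 + 1)            # 10957 days = 30 * 365 + 7 leap days
```

Comparing these indices with the ones returned by cftime.date2index on the real time coordinate would show directly whether the first and last day are being dropped.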
doy_statistics, _build_indices(self, time):
np_indices = np.zeros(index_shape, int)
Initializing with np.zeros hid the fact that the too-short time selection described above leaves np_indices undefined for the first day of the first year and the last day of the last year. We can get index vectors like this one for doy=364: [ 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 714 715 716 717 718 ... 10219 10220 10221 10222 10223 10224 10225 10226 10227 10228 10229 10230 10231 10232 10233 10234 10569 10570 10571 10572 10573 10574 10575 10576 10577 10578 10579 10580 10581 10582 10583 10584 10585 10586 10587 10588 10589 10590 10591 10592 10593 10594 10595 10596 10597 10598 10599 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Note that zero occurs only as padding at the end. Zero is a valid index, so the time step at index 0 will be selected in the calibration routine. This would not matter if zero were already among the selected indices, since only unique indices are kept and duplicates are removed. In the example above, however, the zero padding contributes a new unique index in the calibration step and can thereby alter the result: for example, a winter day might be selected into a summertime moving window.
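The effect can be reproduced on a shortened index vector. The sketch below contrasts zero initialization with a sentinel value of -1 that is filtered out before use; the sentinel approach is one possible remedy shown for illustration, not what the existing code does, and the array sizes are made up:

```python
import numpy as np

# Shortened stand-in for the indices of one moving window.
window = np.array([349, 350, 351, 714, 715])

# Zero initialization: unused slots look like references to time step 0.
padded_zero = np.zeros(8, dtype=int)
padded_zero[: window.size] = window
unique_zero = np.unique(padded_zero)
print(unique_zero)       # index 0 is now part of the selection

# Sentinel initialization: unused slots are marked with -1. They must be
# filtered out before indexing, because -1 is itself a valid NumPy index
# (it refers to the last element).
padded_sentinel = np.full(8, -1, dtype=int)
padded_sentinel[: window.size] = window
valid = padded_sentinel[padded_sentinel >= 0]
unique_valid = np.unique(valid)
print(unique_valid)      # only the indices that were actually assigned
```

With the sentinel, a too-short selection shows up as leftover -1 entries instead of silently adding time step 0 to the window.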
doy_statistics, DOYStatistics.__init__:
self.last_year = self.base.end.year - 1
Why is it -1? When I defined the reference period as above, I ran into the problem that np_indices is too small. This is probably related to the possible bugs above: if the reference period is defined as 1981-2010, the time series actually extracted is 1981-01-02 to 2009-12-31, which in turn means that the last year has to be reduced by 1.