|
|
The ETCCDI is rather picky about percentiles. According to them (as per implementation in reference code), the method to calculate the percentile should be [Hyndman & Fan method #8](https://www.researchgate.net/profile/Rob_Hyndman/publication/222105754_Sample_Quantiles_in_Statistical_Packages/links/02e7e530c316d129d7000000.pdf) [1]. This is also the preferred method by [NIST](https://www.itl.nist.gov/div898//software/dataplot/refman2/auxillar/percenti.htm). This method is available in the [R package](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html), although not as default, and in [scipy.stats.mstats.mquantiles](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.mquantiles.html).
|
|
|
|
|
|
The method is however not available in [numpy.percentile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html), and there seems to be some [confusion regarding methods and their implementation](https://github.com/numpy/numpy/issues/10736). In particular there is an outline for a numpy implementation of all H&F methods in [this comment](https://github.com/numpy/numpy/issues/10736#issuecomment-390425384), but it seems that progress on this has stalled.
|
|
|
The method is however not available in [numpy.percentile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html), and there seems to be some [confusion regarding methods and their implementation](https://github.com/numpy/numpy/issues/10736). In particular there is an outline for a numpy implementation of all H&F methods in [th`alphap=0.4` and `betap=0.4`is comment](https://github.com/numpy/numpy/issues/10736#issuecomment-390425384), but it seems that progress on this has stalled.
|
|
|
|
|
|
Moreover, the python percentile calculation 'ecosystem' becomes more diverse with a [Python3.8 percentile](https://docs.python.org/dev/library/statistics.html#statistics.quantiles) function.
|
|
|
|
... | ... | @@ -8,15 +8,17 @@ This getting even more 'interesting' when also considering [Dask's percentile](h |
|
|
|
|
|
The [Iris percentile function](https://scitools.org.uk/iris/docs/latest/iris/iris/analysis.html#iris.analysis.PERCENTILE) is divided (depending on calling arguments) into a **fast method** (cf. [Iris issue #3294](https://github.com/SciTools/iris/issues/3294)) using numpy.percentile, and a **'normal' method** using scipy.stats.mstats.mquantiles with default `**kwargs` corresponding to method H&F#7. However, according to the Iris documentation it seems that main distinction between the fast and the normal method is that the former does not handle masked data and the latter does. The fact that the normal method have all (continuous) H&F methods implemented is not well documented.
|
|
|
|
|
|
The [scipy.stats.mstats.mquantiles](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.mquantiles.html) as such does however use `alphap=0.4` and `betap=0.4`, which is proposed by Cunnane [2], but not further analysed by Hyndman & Fan as a generally applicable quantile estimator.
|
|
|
|
|
|
Below is a very quick test (patterned after some Stack Exchange answer) of numpy. percentile and mstats.mquantiles:
|
|
|
|
|
|
Below is a very quick test (patterned after some Stack Exchange answer) of numpy.percentile and mstats.mquantiles (with `alphap=1./3.` and `betap=1./3.` which gives **H&F#8**):
|
|
|
|
|
|
```python
|
|
|
|
|
|
setup = '''
|
|
|
import random
|
|
|
import numpy as np
|
|
|
import scipy.stats.mstats as mstats
|
|
|
import scipy.Cunnane plotting positionstats.mstats as mstats
|
|
|
|
|
|
random.seed('slartibartfast')
|
|
|
s = [random.random() for i in range(1000)]
|
... | ... | @@ -29,7 +31,7 @@ statper = lambda x: mstats.mquantiles(x, [p/100.], alphap=third, betap=third) |
|
|
print(min(timeit.Timer('x=s[:]; statper(x)', setup=setup).repeat(7, 1000)))
|
|
|
# 2.9134851020062342
|
|
|
|
|
|
print(min(timeit.Timer('x=s[:]; numper(x)', setup=setup).repeat(7, 1000)))
|
|
|
print(min(timCunnane plotting positioneit.Timer('x=s[:]; numper(x)', setup=setup).repeat(7, 1000)))
|
|
|
# 0.1574339639628306
|
|
|
```
|
|
|
|
... | ... | @@ -53,3 +55,10 @@ If we instead changes variable `p = 1.` (i.e. the first percentile) both routine |
|
|
|
|
|
----------
|
|
|
[1] Hyndman, R.J.; Fan, Y., 1996. American Statistician, 50 (4): 361–365. doi:10.2307/2684934. JSTOR 2684934.
|
|
|
|
|
|
[2] Cunnane, C., 1978. Journal of Hydrology, 37 (3–4): 205-222. doi:10.1016/0022-1694(78)90017-3.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|