... | ... | @@ -9,4 +9,47 @@ This getting even more 'interesting' when also considering [Dask's percentile](h |
|
|
The [Iris percentile function](https://scitools.org.uk/iris/docs/latest/iris/iris/analysis.html#iris.analysis.PERCENTILE) is divided (depending on calling arguments) into a **fast method** (cf. [Iris issue #3294](https://github.com/SciTools/iris/issues/3294)) using numpy.percentile, and a **'normal' method** using scipy.stats.mstats.mquantiles with default `**kwargs` corresponding to method H&F#7. However, according to the Iris documentation it seems that main distinction between the fast and the normal method is that the former does not handle masked data and the latter does. The fact that the normal method have all (continuous) H&F methods implemented is not well documented.
|
|
|
|
|
|
|
|
|
Below is a very quick test (patterned after some Stack Exchange answer) of numpy. percentile and mstats.mquantiles:
|
|
|
|
|
|
```python
|
|
|
|
|
|
setup = '''
|
|
|
import random
|
|
|
import numpy as np
|
|
|
import scipy.stats.mstats as mstats
|
|
|
|
|
|
random.seed('slartibartfast')
|
|
|
s = [random.random() for i in range(1000)]
|
|
|
p = 35
|
|
|
third = 1./3. # this particular value gives H&F#8
|
|
|
numper = lambda x: np.percentile(x, p)
|
|
|
statper = lambda x: mstats.mquantiles(x, [p/100.], alphap=third, betap=third)
|
|
|
'''
|
|
|
|
|
|
print(min(timeit.Timer('x=s[:]; statper(x)', setup=setup).repeat(7, 1000)))
|
|
|
# 2.9134851020062342
|
|
|
|
|
|
print(min(timeit.Timer('x=s[:]; numper(x)', setup=setup).repeat(7, 1000)))
|
|
|
# 0.1574339639628306
|
|
|
```
|
|
|
|
|
|
The resulting percentile differs slightly:
|
|
|
|
|
|
* mstats.mquantiles yields 0.351 *63862*
|
|
|
* numpy.percentiles yields 0.351 *8381167322538*
|
|
|
|
|
|
If however variable `third = 1.` in the setup (corresponding to Hyndman & Fan # 7) the result is equal to numerical precision
|
|
|
|
|
|
* mstats.mquantiles yields 0.3518381 2
|
|
|
* numpy.percentiles yields 0.3518381 *167322538*
|
|
|
|
|
|
Conclusion: numpy.percentile seems equivalent to mstats.mquantile with H&F#7
|
|
|
|
|
|
If we instead changes variable `p = 1.` (i.e. the first percentile) both routines produce the same result for H&F#7, but the difference between methods becomes distinct if we use `third = 1./3.` (H&F#8):
|
|
|
|
|
|
* mstats.mquantiles yields 0.01 *455989*
|
|
|
* numpy.percentiles yields 0.01 *5479419937626816*
|
|
|
|
|
|
|
|
|
----------
|
|
|
[1] Hyndman, R.J.; Fan, Y., 1996. American Statistician, 50 (4): 361–365. doi:10.2307/2684934. JSTOR 2684934. |