...  @@ 9,4 +9,47 @@ This getting even more 'interesting' when also considering [Dask's percentile](h 
...  @@ 9,4 +9,47 @@ This getting even more 'interesting' when also considering [Dask's percentile](h 

The [Iris percentile function](https://scitools.org.uk/iris/docs/latest/iris/iris/analysis.html#iris.analysis.PERCENTILE) is divided (depending on calling arguments) into a **fast method** (cf. [Iris issue #3294](https://github.com/SciTools/iris/issues/3294)) using numpy.percentile, and a **'normal' method** using scipy.stats.mstats.mquantiles with default `**kwargs` corresponding to method H&F#7. However, according to the Iris documentation it seems that main distinction between the fast and the normal method is that the former does not handle masked data and the latter does. The fact that the normal method have all (continuous) H&F methods implemented is not well documented.


The [Iris percentile function](https://scitools.org.uk/iris/docs/latest/iris/iris/analysis.html#iris.analysis.PERCENTILE) is divided (depending on calling arguments) into a **fast method** (cf. [Iris issue #3294](https://github.com/SciTools/iris/issues/3294)) using numpy.percentile, and a **'normal' method** using scipy.stats.mstats.mquantiles with default `**kwargs` corresponding to method H&F#7. However, according to the Iris documentation it seems that main distinction between the fast and the normal method is that the former does not handle masked data and the latter does. The fact that the normal method have all (continuous) H&F methods implemented is not well documented.












Below is a very quick test (patterned after some Stack Exchange answer) of numpy. percentile and mstats.mquantiles:








```python








setup = '''




import random




import numpy as np




import scipy.stats.mstats as mstats








random.seed('slartibartfast')




s = [random.random() for i in range(1000)]




p = 35




third = 1./3. # this particular value gives H&F#8




numper = lambda x: np.percentile(x, p)




statper = lambda x: mstats.mquantiles(x, [p/100.], alphap=third, betap=third)




'''








print(min(timeit.Timer('x=s[:]; statper(x)', setup=setup).repeat(7, 1000)))




# 2.9134851020062342








print(min(timeit.Timer('x=s[:]; numper(x)', setup=setup).repeat(7, 1000)))




# 0.1574339639628306




```








The resulting percentile differs slightly:








* mstats.mquantiles yields 0.351 *63862*




* numpy.percentiles yields 0.351 *8381167322538*








If however variable `third = 1.` in the setup (corresponding to Hyndman & Fan # 7) the result is equal to numerical precision








* mstats.mquantiles yields 0.3518381 2




* numpy.percentiles yields 0.3518381 *167322538*








Conclusion: numpy.percentile seems equivalent to mstats.mquantile with H&F#7








If we instead changes variable `p = 1.` (i.e. the first percentile) both routines produce the same result for H&F#7, but the difference between methods becomes distinct if we use `third = 1./3.` (H&F#8):








* mstats.mquantiles yields 0.01 *455989*




* numpy.percentiles yields 0.01 *5479419937626816*















[1] Hyndman, R.J.; Fan, Y., 1996. American Statistician, 50 (4): 361–365. doi:10.2307/2684934. JSTOR 2684934. 

[1] Hyndman, R.J.; Fan, Y., 1996. American Statistician, 50 (4): 361–365. doi:10.2307/2684934. JSTOR 2684934. 