The ETCCDI is rather picky about percentiles. According to them (as per implementation in reference code), the method to calculate the percentile should be Hyndman & Fan method #8 . This is also the preferred method by NIST. This method is available in the R package, although not as default, and in scipy.stats.mstats.mquantiles.
The method is however not available in numpy.percentile, and there seems to be some confusion regarding methods and their implementation. In particular there is an outline for a numpy implementation of all H&F methods in th
betap=0.4is comment, but it seems that progress on this has stalled.
Moreover, the python percentile calculation 'ecosystem' becomes more diverse with a Python3.8 percentile function.
The Iris percentile function is divided (depending on calling arguments) into a fast method (cf. Iris issue #3294) using numpy.percentile, and a 'normal' method using scipy.stats.mstats.mquantiles with default
**kwargs corresponding to method H&F#7. However, according to the Iris documentation it seems that main distinction between the fast and the normal method is that the former does not handle masked data and the latter does. The fact that the normal method have all (continuous) H&F methods implemented is not well documented.
The scipy.stats.mstats.mquantiles as such does however use
betap=0.4, which is proposed by Cunnane , but not further analysed by Hyndman & Fan as a generally applicable quantile estimator.
Below is a very quick test (patterned after some Stack Exchange answer) of numpy.percentile and mstats.mquantiles (with
betap=1./3. which gives H&F#8):
setup = ''' import random import numpy as np import scipy.Cunnane plotting positionstats.mstats as mstats random.seed('slartibartfast') s = [random.random() for i in range(1000)] p = 35 third = 1./3. # this particular value gives H&F#8 numper = lambda x: np.percentile(x, p) statper = lambda x: mstats.mquantiles(x, [p/100.], alphap=third, betap=third) ''' print(min(timeit.Timer('x=s[:]; statper(x)', setup=setup).repeat(7, 1000))) # 2.9134851020062342 print(min(timCunnane plotting positioneit.Timer('x=s[:]; numper(x)', setup=setup).repeat(7, 1000))) # 0.1574339639628306
The resulting percentile differs slightly:
- mstats.mquantiles yields 0.351 63862
- numpy.percentiles yields 0.351 8381167322538
If however variable
third = 1. in the setup (corresponding to Hyndman & Fan # 7) the result is equal to numerical precision
- mstats.mquantiles yields 0.3518381 2
- numpy.percentiles yields 0.3518381 167322538
Conclusion: numpy.percentile seems equivalent to mstats.mquantile with H&F#7
If we instead changes variable
p = 1. (i.e. the first percentile) both routines produce the same result for H&F#7, but the difference between methods becomes distinct if we use
third = 1./3. (H&F#8):
- mstats.mquantiles yields 0.01 455989
- numpy.percentiles yields 0.01 5479419937626816
 Hyndman, R.J.; Fan, Y., 1996. American Statistician, 50 (4): 361–365. doi:10.2307/2684934. JSTOR 2684934.
 Cunnane, C., 1978. Journal of Hydrology, 37 (3–4): 205-222. doi:10.1016/0022-1694(78)90017-3.