Memo on percentile methods

Updated Mar 13, 2020 by Lars Bärring.

The ETCCDI (Expert Team on Climate Change Detection and Indices) is rather picky about percentiles. According to them (as per the implementation in their reference code), percentiles should be calculated with [Hyndman & Fan method #8](https://www.researchgate.net/profile/Rob_Hyndman/publication/222105754_Sample_Quantiles_in_Statistical_Packages/links/02e7e530c316d129d7000000.pdf) [1]. This is also the method preferred by [NIST](https://www.itl.nist.gov/div898//software/dataplot/refman2/auxillar/percenti.htm). It is available in [R's `quantile` function](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html) as `type = 8`, although not as the default (which is `type = 7`), and in [scipy.stats.mstats.mquantiles](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.mquantiles.html) with `alphap=1/3, betap=1/3`.
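As an illustration of H&F #8, here is a minimal pure-Python sketch written from the definition in [1]: the quantile interpolates between the order statistics x_j and x_{j+1}, with j = floor(np + m), γ = np + m − j and the offset m = (p + 1)/3. The function name `hf8_quantile` is hypothetical, and positions that fall outside the sample are clamped to the smallest or largest value.

```python
import math


def hf8_quantile(data, p):
    """Sample quantile at probability p (0 <= p <= 1), Hyndman & Fan definition 8.

    Q(p) = (1 - g) * x[j] + g * x[j+1] on the 1-based order statistics,
    with j = floor(n*p + m), g = n*p + m - j and m = (p + 1) / 3.
    """
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one data point")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    m = (p + 1.0) / 3.0
    h = n * p + m          # 1-based fractional position, i.e. (n + 1/3) * p + 1/3
    j = math.floor(h)
    g = h - j
    # Positions below the first or at/after the last order statistic are clamped.
    if j < 1:
        return float(x[0])
    if j >= n:
        return float(x[-1])
    return (1.0 - g) * x[j - 1] + g * x[j]
```

For the values 1 to 10, `hf8_quantile(range(1, 11), 0.9)` gives 9 + 19/30 ≈ 9.633.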
At the time of writing, H&F #8 is not available in [numpy.percentile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html), and there seems to be some [confusion regarding methods and their implementation](https://github.com/numpy/numpy/issues/10736). In particular, [this comment](https://github.com/numpy/numpy/issues/10736#issuecomment-390425384) outlines a numpy implementation of all H&F methods, but progress on it seems to have stalled. (NumPy 1.22 later added a `method` argument to `numpy.percentile` covering all nine H&F definitions; H&F #8 is `method="median_unbiased"`.)
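To make the discrepancy concrete, here is a minimal stdlib sketch of the rule `numpy.percentile` applies by default (`linear` interpolation, which is H&F #7). It is written from NumPy's documented description (a zero-based virtual index q/100 · (n − 1)), not from NumPy's source, and the function name `numpy_linear_percentile` is hypothetical. For the values 1 to 10 it gives 9.1 for the 90th percentile, whereas H&F #8 gives 9 + 19/30 ≈ 9.633 for the same data.

```python
import math


def numpy_linear_percentile(data, q):
    """Mimic numpy.percentile's default 'linear' rule for a 1-D sequence.

    q is a percentage in [0, 100]. The zero-based fractional index is
    q/100 * (n - 1), and the result interpolates linearly between the two
    neighbouring order statistics (Hyndman & Fan definition 7).
    """
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one data point")
    if not 0 <= q <= 100:
        raise ValueError("q must lie in [0, 100]")
    index = q / 100.0 * (n - 1)   # zero-based fractional position
    lo = math.floor(index)
    hi = min(lo + 1, n - 1)       # stay inside the list when q == 100
    frac = index - lo
    return x[lo] + (x[hi] - x[lo]) * frac
```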
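Moreover, the Python percentile 'ecosystem' became more diverse with the [`statistics.quantiles`](https://docs.python.org/dev/library/statistics.html#statistics.quantiles) function added in Python 3.8. Its two methods, `'exclusive'` (the default) and `'inclusive'`, correspond to H&F #6 and #7 respectively, so neither gives H&F #8.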
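A quick check with the standard library (Python 3.8 or later) shows how the two `statistics.quantiles` methods differ on the values 1 to 10. The 90th percentiles below match the H&F #6 formula (position p(n + 1)) and the H&F #7 formula (position p(n − 1) + 1); H&F #8 would give about 9.633 for the same data.

```python
import statistics

data = list(range(1, 11))

# n=10 returns the nine decile cut points (10 %, 20 %, ..., 90 %).
exclusive = statistics.quantiles(data, n=10)                     # default method='exclusive'
inclusive = statistics.quantiles(data, n=10, method="inclusive")

print(exclusive[-1])  # 9.9  (90th percentile, H&F #6)
print(inclusive[-1])  # 9.1  (90th percentile, H&F #7)
```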
This gets even more 'interesting' when also considering [Dask's percentile](https://docs.dask.org/en/latest/array-api.html#dask.array.percentile), which computes an approximate percentile of a 1-D array and is not without problems; see e.g. [dask issue #1225](https://github.com/dask/dask/issues/1225).
The [Iris percentile function](https://scitools.org.uk/iris/docs/latest/iris/iris/analysis.html#iris.analysis.PERCENTILE) is divided, depending on its arguments, into a **fast method** (`fast_percentile_method=True`; cf. [Iris issue #3294](https://github.com/SciTools/iris/issues/3294)) that uses numpy.percentile, and a **normal method** that uses scipy.stats.mstats.mquantiles with default keyword arguments (`alphap=1, betap=1`) corresponding to H&F #7. However, according to the Iris documentation, the main distinction between the two seems to be that the fast method does not handle masked data while the normal method does. The fact that the normal method gives access to all continuous H&F methods (#4 to #9) through its `alphap` and `betap` arguments is not well documented.
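The normal method's access to the continuous H&F definitions comes from the `alphap`/`betap` plotting-position parameters of `scipy.stats.mstats.mquantiles`. The table below follows SciPy's mquantiles documentation; the function `plotting_position_quantile` is a hypothetical pure-Python mirror of that interpolation for a single probability, not SciPy's own code. Note that mquantiles itself defaults to `alphap=0.4, betap=0.4`, which is not one of the H&F definitions, so the H&F #7 behaviour of Iris comes from Iris' own defaults.

```python
import math

# (alphap, betap) pairs for scipy.stats.mstats.mquantiles, keyed by
# Hyndman & Fan definition, as listed in SciPy's documentation.
HF_PLOTTING_POSITIONS = {
    4: (0.0, 1.0),
    5: (0.5, 0.5),
    6: (0.0, 0.0),
    7: (1.0, 1.0),        # default of the Iris normal method, per this memo
    8: (1 / 3, 1 / 3),    # the ETCCDI / NIST choice
    9: (3 / 8, 3 / 8),
}


def plotting_position_quantile(data, p, alphap, betap):
    """Sketch of the mquantiles interpolation for one probability p in [0, 1]."""
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one data point")
    if n == 1:
        return float(x[0])
    # 1-based fractional position: n*p + alphap + p*(1 - alphap - betap).
    aleph = n * p + alphap + p * (1.0 - alphap - betap)
    # Clip so that both neighbours x[k-1] and x[k] exist, then clip the weight.
    k = math.floor(min(max(aleph, 1.0), n - 1.0))
    gamma = min(max(aleph - k, 0.0), 1.0)
    return (1.0 - gamma) * x[k - 1] + gamma * x[k]
```

For the values 1 to 10 and p = 0.9 this gives 9.1 with the H&F #7 pair and about 9.633 with the H&F #8 pair. Assuming the Iris normal method forwards `alphap` and `betap` unchanged to mquantiles (as its H&F #7 defaults suggest), passing `alphap=1/3, betap=1/3` should select H&F #8.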
[1] Hyndman, R. J. and Fan, Y. (1996). Sample quantiles in statistical packages. *The American Statistician*, 50(4), 361–365. doi:10.2307/2684934.