# Memo on percentile methods
This gets even more 'interesting' when also considering Dask's percentile.
The [Iris percentile function](https://scitools.org.uk/iris/docs/latest/iris/iris/analysis.html#iris.analysis.PERCENTILE) is divided (depending on the calling arguments) into a **fast method** (cf. [Iris issue #3294](https://github.com/SciTools/iris/issues/3294)) using numpy.percentile, and a **'normal' method** using scipy.stats.mstats.mquantiles with default `**kwargs` corresponding to method H&F#7. However, according to the Iris documentation the main distinction between the fast and the normal method seems to be that the former does not handle masked data while the latter does. The fact that the normal method has all (continuous) H&F methods implemented is not well documented.
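For orientation, here is a minimal sketch of how the two code paths could be selected from the aggregator side. The keywords `alphap`, `betap` and `fast_percentile_method`, the coordinate name and the file name `tas.nc` are assumptions based on the description above and the Iris documentation, not something verified in this memo:
```python
# Sketch only: keyword names marked below are assumptions, not verified here.
import iris
import iris.analysis

cube = iris.load_cube("tas.nc")  # hypothetical input cube with a 'time' coordinate

# 'Normal' method: scipy.stats.mstats.mquantiles under the hood;
# passing alphap = betap = 1/3 would select H&F#8 (assumed keywords).
p90_hf8 = cube.collapsed("time", iris.analysis.PERCENTILE,
                         percent=90, alphap=1./3., betap=1./3.)

# Fast method: numpy.percentile under the hood (assumed keyword, cf. issue #3294);
# per the text above it does not handle masked data.
p90_fast = cube.collapsed("time", iris.analysis.PERCENTILE,
                          percent=90, fast_percentile_method=True)
```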
Below is a very quick test (patterned after a Stack Exchange answer) of numpy.percentile and mstats.mquantiles:
```python
import timeit

setup = '''
import random
import numpy as np
import scipy.stats.mstats as mstats
random.seed('slartibartfast')
s = [random.random() for i in range(1000)]
p = 35
third = 1./3. # this particular value gives H&F#8
numper = lambda x: np.percentile(x, p)
statper = lambda x: mstats.mquantiles(x, [p/100.], alphap=third, betap=third)
'''
print(min(timeit.Timer('x=s[:]; statper(x)', setup=setup).repeat(7, 1000)))  # mstats.mquantiles
# 2.9134851020062342
print(min(timeit.Timer('x=s[:]; numper(x)', setup=setup).repeat(7, 1000)))  # numpy.percentile
# 0.1574339639628306
```
The resulting percentiles differ slightly:
* mstats.mquantiles yields 0.351*63862*
* numpy.percentile yields 0.351*8381167322538*
If, however, we set `third = 1.` in the setup (corresponding to Hyndman & Fan #7), the results agree to numerical precision:
* mstats.mquantiles yields 0.3518381*2*
* numpy.percentile yields 0.3518381*167322538*
Conclusion: numpy.percentile seems equivalent to mstats.mquantiles with H&F#7.
If we instead change the variable to `p = 1.` (i.e. the first percentile), both routines produce the same result for H&F#7, but the difference between the methods becomes distinct if we use `third = 1./3.` (H&F#8), as the short check after the list below also illustrates:
* mstats.mquantiles yields 0.01*455989*
* numpy.percentile yields 0.01*5479419937626816*
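The two observations above can be reproduced with a small self-contained check (a sketch, not part of the original test): with `alphap = betap = 1.` mquantiles should match numpy.percentile for both percentiles, while `alphap = betap = 1./3.` (H&F#8) should differ, most visibly for `p = 1.`:
```python
import random

import numpy as np
import scipy.stats.mstats as mstats

random.seed('slartibartfast')
s = [random.random() for i in range(1000)]

for p in (35., 1.):
    np_val = np.percentile(s, p)                                          # numpy default (H&F#7)
    hf7 = mstats.mquantiles(s, [p / 100.], alphap=1., betap=1.)[0]        # H&F#7
    hf8 = mstats.mquantiles(s, [p / 100.], alphap=1./3., betap=1./3.)[0]  # H&F#8
    print(p, np.isclose(np_val, hf7), np_val - hf8)
```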
----------
[1] Hyndman, R.J.; Fan, Y., 1996. Sample quantiles in statistical packages. The American Statistician, 50(4): 361–365. doi:10.2307/2684934. JSTOR 2684934.