Discretize into three equal-width bins:
import numpy as np
import pandas as pd
pd.cut(np.array([2, 7, 5, 4, 6, 8]), 3)
pd.cut(np.array([2, 7, 5, 4, 6, 8]), 3, retbins=True)
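To illustrate what retbins=True returns, the following sketch unpacks the result into the binned values and the computed edges. The names values, binned, and edges are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

values = np.array([2, 7, 5, 4, 6, 8])

# With retbins=True, pd.cut returns a (binned values, bin edges) tuple.
binned, edges = pd.cut(values, 3, retbins=True)

# Three bins need four edges. The lowest edge is nudged slightly below
# the minimum so that the minimum value falls inside the first bin.
print(edges)

# Zero-based position of the bin each value landed in.
print(binned.codes)
```

The returned edges can be passed back as bins in a later call, which is one way to apply the same binning to new data.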
Discover the same bins, but assign them specific labels. Notice that the returned Categorical’s
categories are the labels and that the Categorical is ordered.
pd.cut(np.array([2, 7, 5, 4, 6, 8]),
       3, labels=["small", "big", "large"])
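As a small sketch of what "ordered" means in practice, the result can be compared against one of its labels. The variable name sizes is an illustrative choice, not part of the original example.

```python
import numpy as np
import pandas as pd

sizes = pd.cut(np.array([2, 7, 5, 4, 6, 8]),
               3, labels=["small", "big", "large"])

# The categories are the supplied labels, in the order given.
print(list(sizes.categories))
print(sizes.ordered)

# Because the categories are ordered, comparisons against a label work
# and follow the label order, not alphabetical order.
above_small = np.asarray(sizes > "small")
print(above_small)
```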
labels=False returns only the integer bin indicators (the zero-based position of each value's bin) instead of interval labels.
pd.cut([0, 1, 1, 2], bins=4, labels=False)
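One possible use of the integer indicators, sketched here under the assumption that the input contains no missing or out-of-range values, is counting how many values fall in each bin. The names codes and counts are illustrative.

```python
import numpy as np
import pandas as pd

# For list input, labels=False yields a NumPy array of bin positions.
codes = pd.cut([0, 1, 1, 2], bins=4, labels=False)
print(codes)

# Count values per bin; minlength keeps empty bins in the result.
counts = np.bincount(codes, minlength=4)
print(counts)
```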
Passing a Series as an input returns a Series with categorical dtype:
s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])
pd.cut(s, 3)
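A short sketch of what the Series result looks like: the original index is kept and the dtype is categorical. The name binned is an illustrative choice.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])
binned = pd.cut(s, 3)

# The result is a Series with the same index and a categorical dtype.
print(binned.dtype)
print(binned.index.tolist())

# The .cat accessor exposes the interval categories and integer codes.
print(binned.cat.categories)
print(binned.cat.codes.tolist())
```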
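Passing a Series with labels=False returns a Series of integer bin indicators, mapping each value to the position
of its interval in bins. With right=False the intervals are closed on the left, so a value equal to the last edge falls outside every bin and becomes NaN.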
s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])
pd.cut(s, [0, 4, 5, 6, 8, 10], labels=False, retbins=True, right=False)
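To make the effect of right=False concrete, the sketch below runs the same edges with left-closed and with the default right-closed intervals. The names edges, left_closed, right_closed, and used_bins are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])
edges = [0, 4, 5, 6, 8, 10]

# right=False: intervals are [0, 4), [4, 5), ..., [8, 10).
# The value 10 equals the last edge, so it is not covered and becomes NaN.
left_closed, used_bins = pd.cut(s, edges, labels=False,
                                retbins=True, right=False)
print(left_closed)

# Default right=True: intervals are (0, 4], (4, 5], ..., (8, 10].
# Every value is covered, so the result stays integer-valued.
right_closed = pd.cut(s, edges, labels=False)
print(right_closed)
```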
Use duplicates='drop' when the bin edges are not unique:
pd.cut(s, [0, 4, 5, 6, 10, 10], labels=False, retbins=True,
       right=False, duplicates='drop')
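For contrast, this sketch shows what happens with the same repeated edge when duplicates is left at its default of 'raise', and then with 'drop'. The names dup_edges, raised, codes, and kept are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])
dup_edges = [0, 4, 5, 6, 10, 10]

# Default behaviour (duplicates='raise'): repeated edges are rejected.
try:
    pd.cut(s, dup_edges, labels=False, right=False)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# duplicates='drop' removes the repeated edge before binning.
codes, kept = pd.cut(s, dup_edges, labels=False, retbins=True,
                     right=False, duplicates='drop')
print(codes)
print(kept)
```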
Passing an IntervalIndex for bins uses exactly those intervals as the categories. Notice that values not covered
by the IntervalIndex are set to NaN. 0 lies on the open left edge of the first bin (which is closed only on the right),
and 2.5 and 4.5 each fall in a gap between two bins.
bins = pd.IntervalIndex.from_tuples([(0, 2), (3, 4), (5, 6)])
pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
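A brief sketch for checking which inputs were left uncovered by the intervals, using the same bins and values. The name result is an illustrative choice.

```python
import pandas as pd

bins = pd.IntervalIndex.from_tuples([(0, 2), (3, 4), (5, 6)])
result = pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)

# True where a value was not covered by any interval.
print(result.isna())

# Uncovered values get code -1; covered values get the interval's position.
print(result.codes)
```

Here 0.5 and 1.5 both land in the first interval, while 0, 2.5, and 4.5 are not covered by any interval.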