Examples
Start by creating a series with 8 timestamps spaced one minute apart:
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series
Downsample the series into 3 minute bins and sum the values of the timestamps
falling into a bin.
series.resample('3T').sum()
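To make the binning concrete, the sketch below reproduces the 3 minute sums by flooring each timestamp to the left edge of its bin and grouping on the result. It only illustrates the default left-closed, left-labelled behaviour for this data and is not a description of how resample is implemented. It uses the 'min' alias, which is equivalent to 'T'.

```python
import pandas as pd

index = pd.date_range("2019-01-01", periods=8, freq="min")
series = pd.Series(range(8), index=index)

# Default downsampling: bins [00:00, 00:03), [00:03, 00:06), [00:06, 00:09)
resampled = series.resample("3min").sum()

# Manual equivalent for this data: floor each timestamp to its bin's
# left edge, then group on those edges.
left_edges = series.index.floor("3min")
grouped = series.groupby(left_edges).sum()

print(resampled.tolist())  # [3, 12, 13]
print(grouped.tolist())    # [3, 12, 13]
```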
Downsample the series into 3 minute bins as above, but label each bin using the right edge
instead of the left:
series.resample('3T', label='right').sum()
Downsample the series into 3 minute bins as above, but close the right side
of each bin interval, so that a timestamp on a bin's right edge belongs to that bin:
series.resample('3T', label='right', closed='right').sum()
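The effect of label and closed is easiest to see side by side. The following sketch rebuilds the same series (using the 'min' alias in place of 'T') and compares the three calls; the variable names are chosen here for illustration only.

```python
import pandas as pd

index = pd.date_range("2019-01-01", periods=8, freq="min")
series = pd.Series(range(8), index=index)

default = series.resample("3min").sum()
right_label = series.resample("3min", label="right").sum()
right_closed = series.resample("3min", label="right", closed="right").sum()

# label='right' only renames the bins; the sums are unchanged.
print(default.tolist())       # [3, 12, 13]
print(right_label.tolist())   # [3, 12, 13]

# closed='right' moves each edge timestamp into the earlier bin, so
# 00:00 gets a bin of its own and there are four bins instead of three.
print(right_closed.tolist())  # [0, 6, 15, 7]
print(right_closed.index[0])  # 2019-01-01 00:00:00
```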
Upsample the series into 30 second bins.
series.resample('30S').asfreq()[0:5] # Select first 5 rows
Upsample the series into 30 second bins and fill the NaN values using the ffill method
(forward fill; pad is an older name for the same operation).
series.resample('30S').ffill()[0:5]
Upsample the series into 30 second bins and fill the NaN values using the bfill method.
series.resample('30S').bfill()[0:5]
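As a compact comparison of the three upsampling calls, the sketch below prints the first five rows of each. It uses ffill for the forward fill and the lowercase '30s' alias; both choices are made here so the snippet runs on recent pandas releases.

```python
import pandas as pd

index = pd.date_range("2019-01-01", periods=8, freq="min")
series = pd.Series(range(8), index=index)

upsampled = series.resample("30s")

raw = upsampled.asfreq().iloc[:5]      # new slots are left as NaN
forward = upsampled.ffill().iloc[:5]   # NaN takes the previous value
backward = upsampled.bfill().iloc[:5]  # NaN takes the next value

print(raw.isna().tolist())  # [False, True, False, True, False]
print(forward.tolist())     # [0, 0, 1, 1, 2]
print(backward.tolist())    # [0, 1, 1, 2, 2]
```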
Pass a custom function via apply:
def custom_resampler(array_like):
    return np.sum(array_like) + 5
series.resample('3T').apply(custom_resampler)
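The custom function is called once per bin with that bin's values. A minimal self-contained sketch, checking the result against the built-in sum plus 5 (the via_apply and via_sum names are illustrative):

```python
import numpy as np
import pandas as pd

index = pd.date_range("2019-01-01", periods=8, freq="min")
series = pd.Series(range(8), index=index)


def custom_resampler(array_like):
    # Receives the values of one bin; returns a single number for it.
    return np.sum(array_like) + 5


via_apply = series.resample("3min").apply(custom_resampler)
via_sum = series.resample("3min").sum() + 5

print(via_apply.tolist())  # [8, 17, 18]
print(via_sum.tolist())    # [8, 17, 18]
```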
For a Series with a PeriodIndex, the keyword convention controls whether each value
is placed at the start or at the end of its original period when resampling to the
frequency given by rule.
Resample a year by quarter using ‘start’ convention. Values are assigned to the first
quarter of the period:
s = pd.Series([1, 2], index=pd.period_range('2018-01-01',
                                            freq='A',
                                            periods=2))
s
s.resample('Q', convention='start').asfreq()
Resample quarters by month using ‘end’ convention. Values are assigned to the last
month of the period.
q = pd.Series([1, 2, 3, 4], index=pd.period_range('2019-01-01',
                                                  freq='Q',
                                                  periods=4))
q
q.resample('M', convention='end').asfreq()
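The start/end choice can be previewed on a single period with Period.asfreq, whose how argument selects the first or last sub-period. Treating how as an analogue of convention is an assumption made for this illustration; the sketch does not call resample itself, and it uses 'Y' as the yearly period alias.

```python
import pandas as pd

# A year mapped to quarters: 'start' picks Q1, 'end' picks Q4.
year = pd.Period("2018", freq="Y")
print(year.asfreq("Q", how="start"))  # 2018Q1
print(year.asfreq("Q", how="end"))    # 2018Q4

# A quarter mapped to months: 'start' picks January, 'end' picks March.
quarter = pd.Period("2019Q1", freq="Q")
print(quarter.asfreq("M", how="start"))  # 2019-01
print(quarter.asfreq("M", how="end"))    # 2019-03
```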
For DataFrame objects, the keyword on can be used to specify the column instead
of the index for resampling:
d = dict({'price': [20, 21, 19, 23, 24, 28, 27, 29],
          'volume': [60, 70, 50, 100, 60, 100, 40, 60]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2019',
                                    periods=8,
                                    freq='W')
df
df.resample('M', on='week_starting').mean()
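Passing on should give the same result as first moving that column into the index. The sketch below checks this on the same data. It passes a pd.offsets.MonthEnd() object as the rule in place of the 'M' string, a substitution made here so the snippet does not depend on which month-end alias a given pandas version accepts.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [20, 21, 19, 23, 24, 28, 27, 29],
    "volume": [60, 70, 50, 100, 60, 100, 40, 60],
})
df["week_starting"] = pd.date_range("2019-01-01", periods=8, freq="W")

month_end = pd.offsets.MonthEnd()  # offset object standing in for 'M'

# Resample on a column directly...
by_column = df.resample(month_end, on="week_starting").mean()
# ...or move the column into the index first.
by_index = df.set_index("week_starting").resample(month_end).mean()

print(by_column["price"].tolist())   # [20.75, 27.0]
print(by_column["volume"].tolist())  # [70.0, 65.0]
```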
For a DataFrame with MultiIndex, the keyword level can be used to specify on which level
the resampling needs to take place.
days = pd.date_range('1/1/2019', periods=4, freq='D')
d2 = dict({'price': [20, 21, 19, 23, 24, 28, 27, 29],
           'volume': [60, 70, 50, 100, 60, 100, 50, 60]})
df2 = pd.DataFrame(
    d2,
    index=pd.MultiIndex.from_product([days, ['morning', 'afternoon']])
)
df2
df2.resample('D', level=0).sum()
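For this particular frame, resampling on level 0 at daily frequency yields the same totals as grouping on that level, because every day in the range is present. A minimal sketch under that assumption (with gaps in the dates, resample would also emit the missing days, while groupby would not):

```python
import pandas as pd

days = pd.date_range("2019-01-01", periods=4, freq="D")
index = pd.MultiIndex.from_product([days, ["morning", "afternoon"]])
df2 = pd.DataFrame(
    {
        "price": [20, 21, 19, 23, 24, 28, 27, 29],
        "volume": [60, 70, 50, 100, 60, 100, 50, 60],
    },
    index=index,
)

# Collapse the morning/afternoon rows of each day into one daily row.
by_resample = df2.resample("D", level=0).sum()
by_groupby = df2.groupby(level=0).sum()

print(by_resample["price"].tolist())   # [41, 42, 52, 56]
print(by_resample["volume"].tolist())  # [130, 150, 160, 110]
```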