Pandas Series: resample() function
Resample Pandas time-series data
The resample() function is used to resample time-series data.
Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be passed through the on or level keyword.
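The index requirement is easier to see with a small illustration. The sketch below is not one of the original examples. It resamples a Series indexed by a TimedeltaIndex, then shows that a Series with a plain integer RangeIndex is rejected. The names `td_series`, `minute_sums` and `rejected` are hypothetical.

Python-Pandas Code:
```python
import pandas as pd

# A TimedeltaIndex is one of the datetime-like index types resample() accepts.
td_index = pd.timedelta_range(start='0s', periods=6, freq='30s')
td_series = pd.Series(range(6), index=td_index)

# Sum the values falling into each 1-minute bin: (0+1), (2+3), (4+5).
minute_sums = td_series.resample('1min').sum()
print(minute_sums)

# A default RangeIndex is not datetime-like, so resample() raises TypeError.
rejected = False
try:
    pd.Series(range(6)).resample('1min')
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)
```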
Syntax:
Series.resample(self, rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)
Parameters:
Name | Description | Type/Default Value | Required / Optional |
---|---|---|---|
rule | The offset string or object representing the target conversion. | DateOffset, Timedelta or str | Required |
axis | Which axis to use for up- or down-sampling. For a Series this defaults to 0, i.e. along the rows. The axis must be a DatetimeIndex, TimedeltaIndex or PeriodIndex. | {0 or ‘index’, 1 or ‘columns’} Default Value: 0 | Optional |
closed | Which side of the bin interval is closed. The default is ‘left’ for all frequency offsets except ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’ and ‘W’, which all default to ‘right’. | {‘right’, ‘left’} Default Value: None | Optional |
label | Which bin edge to use as the label of each bucket. The default is ‘left’ for all frequency offsets except ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’ and ‘W’, which all default to ‘right’. | {‘right’, ‘left’} Default Value: None | Optional |
convention | For PeriodIndex only. Controls whether to use the start or end of rule. | {‘start’, ‘end’, ‘s’, ‘e’} Default Value: ‘start’ | Optional |
kind | Pass ‘timestamp’ to convert the resulting index to a DatetimeIndex, or ‘period’ to convert it to a PeriodIndex. By default the input representation is retained. | {‘timestamp’, ‘period’} Default Value: None | Optional |
loffset | Adjust the resampled time labels. | timedelta Default Value: None | Optional |
base | For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for a ‘5min’ frequency, base can range from 0 through 4. | int Default Value: 0 | Optional |
on | For a DataFrame, the column to use instead of the index for resampling. The column must be datetime-like. | str Default Value: None | Optional |
level | For a MultiIndex, the level (name or number) to use for resampling. The level must be datetime-like. | str or int Default Value: None | Optional |
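The base and loffset parameters belong to older pandas releases. pandas 1.1 deprecated them in favour of the offset and origin keywords, and pandas 2.0 removed them. The sketch below assumes pandas 1.1 or later. It uses offset to shift the 3-minute bin edges by one minute, which is what base=1 did in earlier versions. The variable `shifted` is a hypothetical name.

Python-Pandas Code:
```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Shift the bin edges to 00:01, 00:04, 00:07, ... instead of
# 00:00, 00:03, 00:06, ... (the replacement for the removed base=1).
shifted = series.resample('3min', offset='1min').sum()
print(shifted)
```

The first bin starts at 23:58 on the previous day. The bin that holds the 00:00 value is aligned to the shifted edges, so it begins before midnight.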
Returns: Resampler object
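The returned Resampler only describes the bins. Nothing is aggregated until a method such as sum(), mean() or agg() is called on it. The following minimal sketch uses the same 8-minute series as the examples below and requests several aggregations at once with agg(). The names `resampler` and `summary` are illustrative.

Python-Pandas Code:
```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# resample() only defines the 3-minute bins; no values are computed yet.
resampler = series.resample('3min')
print(type(resampler).__name__)

# Calling an aggregation on the Resampler produces the result (one column per function).
summary = resampler.agg(['sum', 'mean', 'count'])
print(summary)
```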
Example - Start by creating a series with 8 one-minute timestamps:
Python-Pandas Code:
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series
Output:
2019-01-01 00:00:00    0
2019-01-01 00:01:00    1
2019-01-01 00:02:00    2
2019-01-01 00:03:00    3
2019-01-01 00:04:00    4
2019-01-01 00:05:00    5
2019-01-01 00:06:00    6
2019-01-01 00:07:00    7
Freq: T, dtype: int64
Example - Downsample the series into 3 minute bins and sum the values of the timestamps falling into a bin:
Python-Pandas Code:
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('3T').sum()
Output:
2019-01-01 00:00:00     3
2019-01-01 00:03:00    12
2019-01-01 00:06:00    13
Freq: 3T, dtype: int64
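Downsampling is a grouping operation: each timestamp is assigned to a 3-minute bin, and then each bin is aggregated. As a sketch, the same bins can be expressed with groupby() and pd.Grouper(freq=...). For this data, that should give the same result as resample(). The variable names are illustrative.

Python-Pandas Code:
```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

via_resample = series.resample('3min').sum()
# The same 3-minute bins, expressed as a groupby on the datetime index.
via_grouper = series.groupby(pd.Grouper(freq='3min')).sum()
print(via_grouper)
```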
Example - Downsample the series into 3 minute bins as above, but label each bin using the right edge instead of the left:
Python-Pandas Code:
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('3T', label='right').sum()
Output:
2019-01-01 00:03:00     3
2019-01-01 00:06:00    12
2019-01-01 00:09:00    13
Freq: 3T, dtype: int64
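label changes only how each bin is named. The bins themselves, and therefore the sums, stay the same. This short sketch compares the two labelings side by side. The names `left_labels` and `right_labels` are hypothetical.

Python-Pandas Code:
```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

left_labels = series.resample('3min').sum()
right_labels = series.resample('3min', label='right').sum()

# The aggregated values are identical; only the labels differ.
print(left_labels.to_numpy(), right_labels.to_numpy())
# Each right-edge label is the left-edge label plus one bin width (3 minutes).
print(right_labels.index - left_labels.index)
```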
Example - Downsample the series into 3 minute bins as above, but close the right side of the bin interval:
Python-Pandas Code:
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('3T', label='right', closed='right').sum()
Output:
2019-01-01 00:00:00     0
2019-01-01 00:03:00     6
2019-01-01 00:06:00    15
2019-01-01 00:09:00     7
Freq: 3T, dtype: int64
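The extra bin in this output comes from the 00:00 timestamp. With closed='right' the bins are (23:57, 00:00], (00:00, 00:03] and so on, so 00:00 lands in a bin of its own. The sketch below reproduces the same assignment by rounding each timestamp up with DatetimeIndex.ceil() for right-closed bins, or down with floor() for the default left-closed bins. It assumes the bins are aligned to midnight (the default here) and that no bin is empty; groupby() drops empty bins, whereas resample() keeps them.

Python-Pandas Code:
```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# closed='right', label='right': a timestamp belongs to the bin whose right
# edge is the timestamp rounded UP to a 3-minute boundary.
manual_right = series.groupby(series.index.ceil('3min')).sum()
resampled_right = series.resample('3min', label='right', closed='right').sum()

# Default closed='left', label='left': round DOWN instead.
manual_left = series.groupby(series.index.floor('3min')).sum()
resampled_left = series.resample('3min').sum()

print(manual_right)
print(manual_left)
```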
Example - Upsample the series into 30 second bins:
Python-Pandas Code:
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('30S').asfreq()[0:5] # Select first 5 rows
Output:
2019-01-01 00:00:00    0.0
2019-01-01 00:00:30    NaN
2019-01-01 00:01:00    1.0
2019-01-01 00:01:30    NaN
2019-01-01 00:02:00    2.0
Freq: 30S, dtype: float64
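asfreq() leaves the new 30-second slots as NaN. If a linear estimate between neighbouring points is preferable to a missing value, the Resampler also provides interpolate(), whose default method is linear. Here is a minimal sketch on the same series.

Python-Pandas Code:
```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Upsample to 30-second slots and fill them by linear interpolation.
upsampled = series.resample('30s').interpolate()
print(upsampled.iloc[0:5])
```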
Example - Upsample the series into 30 second bins and fill the NaN values using the ffill (forward fill) method. Older pandas releases also offered this method under the name pad, which newer releases have deprecated:
Python-Pandas Code:
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('30S').ffill()[0:5]
Output:
2019-01-01 00:00:00    0
2019-01-01 00:00:30    0
2019-01-01 00:01:00    1
2019-01-01 00:01:30    1
2019-01-01 00:02:00    2
Freq: 30S, dtype: int64
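Forward filling copies each value into every new slot until the next original timestamp. Resampler.ffill() also accepts a limit argument that caps how many consecutive new slots are filled. The sketch below upsamples to 15-second bins and fills only one slot after each original value, leaving the remaining slots as NaN (so the result becomes float64).

Python-Pandas Code:
```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Fill at most one new 15-second slot after each original value.
filled = series.resample('15s').ffill(limit=1)
print(filled.iloc[0:5])
```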
Example - Upsample the series into 30 second bins and fill the NaN values using the bfill method:
Python-Pandas Code:
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('30S').bfill()[0:5]
Output:
2019-01-01 00:00:00    0
2019-01-01 00:00:30    1
2019-01-01 00:01:00    1
2019-01-01 00:01:30    2
2019-01-01 00:02:00    2
Freq: 30S, dtype: int64
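Between ffill() and bfill() sits nearest(), which takes the value of whichever original timestamp is closest. With 20-second bins there are no ties: 00:00:20 is closer to 00:00:00, and 00:00:40 is closer to 00:01:00. The following is a short sketch; the name `nearest_filled` is illustrative.

Python-Pandas Code:
```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Each new 20-second slot takes the value of the closest original timestamp.
nearest_filled = series.resample('20s').nearest()
print(nearest_filled.iloc[0:4])
```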
Example - Pass a custom function via apply:
Python-Pandas Code:
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
def custom_resampler(array_like):
    # Add 5 to the sum of the values that fall into each bin.
    return np.sum(array_like) + 5
series.resample('3T').apply(custom_resampler)
Output:
2019-01-01 00:00:00     8
2019-01-01 00:03:00    17
2019-01-01 00:06:00    18
Freq: 3T, dtype: int64
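The function given to apply() receives the values of one bin at a time, as a Series, so any reduction that returns a single value works. For example, the spread of values within each 3-minute bin can be computed with a lambda. The name `value_range` is hypothetical.

Python-Pandas Code:
```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Each bin arrives as a Series, so Series methods such as max() and min() work.
value_range = series.resample('3min').apply(lambda s: s.max() - s.min())
print(value_range)
```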
For a Series with a PeriodIndex, the keyword convention can be used to control whether to use the start or end of rule.
Example - Resample a year by quarter using ‘start’ convention. Values are assigned to the first quarter of the period:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series([1, 2], index=pd.period_range('2018-01-01',
freq='A',
periods=2))
s
Output:
2018    1
2019    2
Freq: A-DEC, dtype: int64
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series([1, 2], index=pd.period_range('2018-01-01',
freq='A',
periods=2))
s.resample('Q', convention='start').asfreq()
Output:
2018Q1    1.0
2018Q2    NaN
2018Q3    NaN
2018Q4    NaN
2019Q1    2.0
2019Q2    NaN
2019Q3    NaN
2019Q4    NaN
Freq: Q-DEC, dtype: float64
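When upsampling a PeriodIndex, convention decides which of the new, finer periods each original value is attached to. PeriodIndex.asfreq() makes the same start/end choice through its how argument, so it can preview where values will land. In this sketch, 'Y' is assumed as the annual alias; the example above uses the older 'A' spelling. The variable names are illustrative.

Python-Pandas Code:
```python
import pandas as pd

years = pd.period_range('2018-01-01', freq='Y', periods=2)

# how='start' maps each year to its first quarter; how='end' maps it to its last.
start_quarters = years.asfreq('Q', how='start')
end_quarters = years.asfreq('Q', how='end')
print(start_quarters)
print(end_quarters)
```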
Example - Resample quarters by month using ‘end’ convention. Values are assigned to the last month of the period:
Python-Pandas Code:
import numpy as np
import pandas as pd
q = pd.Series([2, 3, 4, 5], index=pd.period_range('2019-01-01',
freq='Q',
periods=4))
q
Output:
2019Q1    2
2019Q2    3
2019Q3    4
2019Q4    5
Freq: Q-DEC, dtype: int64
Python-Pandas Code:
import numpy as np
import pandas as pd
q = pd.Series([2, 3, 4, 5], index=pd.period_range('2019-01-01',
freq='Q',
periods=4))
q.resample('M', convention='end').asfreq()
Output:
2019-03    2.0
2019-04    NaN
2019-05    NaN
2019-06    3.0
2019-07    NaN
2019-08    NaN
2019-09    4.0
2019-10    NaN
2019-11    NaN
2019-12    5.0
Freq: M, dtype: float64
Example - For DataFrame objects, the keyword on can be used to specify a column to resample on, instead of the index:
Python-Pandas Code:
import numpy as np
import pandas as pd
d = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2019',
periods=8,
freq='W')
df
Output:
   price  volume week_starting
0      8      40    2019-01-06
1      9      50    2019-01-13
2      7      30    2019-01-20
3     11      80    2019-01-27
4     12      40    2019-02-03
5     16      80    2019-02-10
6     15      30    2019-02-17
7     17      40    2019-02-24
Python-Pandas Code:
import numpy as np
import pandas as pd
d = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2019',
periods=8,
freq='W')
df.resample('M', on='week_starting').mean()
Output:
               price  volume
week_starting
2019-01-31      8.75    50.0
2019-02-28     15.00    47.5
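Passing on= works like making that column the index first. The sketch below checks this with 14-day bins, chosen only so the example does not depend on the month-end alias, whose spelling differs between pandas versions. Resampling with on='week_starting' and resampling after set_index('week_starting') should produce the same frame. The names `by_column` and `by_index` are illustrative.

Python-Pandas Code:
```python
import pandas as pd

df = pd.DataFrame({'price': [8, 9, 7, 11, 12, 16, 15, 17],
                   'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df['week_starting'] = pd.date_range('01/01/2019', periods=8, freq='W')

# Resample on the datetime column directly ...
by_column = df.resample('14D', on='week_starting').mean()
# ... or move that column into the index first and resample the index.
by_index = df.set_index('week_starting').resample('14D').mean()

print(by_column)
print(by_column.equals(by_index))
```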
Example - For a DataFrame with MultiIndex, the keyword level can be used to specify on which level the resampling needs to take place:
Python-Pandas Code:
import pandas as pd
days = pd.date_range('1/1/2019', periods=4, freq='D')
d2 = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df2 = pd.DataFrame(d2,
index=pd.MultiIndex.from_product([days,
['morning',
'afternoon']]
))
df2
Output:
                      price  volume
2019-01-01 morning        8      40
           afternoon      9      50
2019-01-02 morning        7      30
           afternoon     11      80
2019-01-03 morning       12      40
           afternoon     16      80
2019-01-04 morning       15      30
           afternoon     17      40
Python-Pandas Code:
import pandas as pd
days = pd.date_range('1/1/2019', periods=4, freq='D')
d2 = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df2 = pd.DataFrame(d2,
index=pd.MultiIndex.from_product([days,
['morning',
'afternoon']]
))
df2.resample('D', level=0).sum()
Output:
            price  volume
2019-01-01     17      90
2019-01-02     18     110
2019-01-03     28     120
2019-01-04     32      70
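As the parameter table notes, level accepts a level name as well as a position. The sketch below gives the MultiIndex levels hypothetical names ('day' and 'session'), resamples by name, and then widens the bins to two days. The names `daily` and `two_day` are illustrative.

Python-Pandas Code:
```python
import pandas as pd

days = pd.date_range('1/1/2019', periods=4, freq='D')
df2 = pd.DataFrame({'price': [8, 9, 7, 11, 12, 16, 15, 17],
                    'volume': [40, 50, 30, 80, 40, 80, 30, 40]},
                   index=pd.MultiIndex.from_product(
                       [days, ['morning', 'afternoon']],
                       names=['day', 'session']))  # hypothetical level names

# Select the datetime level by name instead of by position.
daily = df2.resample('D', level='day').sum()
# Wider bins work the same way: here each bin covers two days.
two_day = df2.resample('2D', level='day').sum()
print(two_day)
```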