Pandas Series: resample() function

Last update on September 15 2022 12:57:42 (UTC/GMT +8 hours)

Resample Pandas time-series data

The resample() function is used to resample time-series data.

Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be supplied through the on or level keyword.
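
The index requirement above can be checked directly. The following is a minimal sketch, not taken from the pandas documentation: the names plain and dated are illustrative, and it assumes that resampling a Series without a datetime-like index raises a TypeError, which is the behaviour in the pandas releases I am aware of.

```python
import pandas as pd

# A Series with the default integer index, which is not datetime-like.
plain = pd.Series([1, 2, 3])

try:
    plain.resample('D').sum()
    error_message = None
except TypeError as exc:
    # pandas reports that only datetime-like indexes are valid here.
    error_message = str(exc)

print(error_message)

# The same values on a DatetimeIndex resample without complaint.
dated = pd.Series([1, 2, 3],
                  index=pd.date_range('2019-01-01', periods=3, freq='D'))
daily_total = dated.resample('D').sum()
print(daily_total)
```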

Syntax:

Series.resample(self, rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)

Parameters:

Name	Description	Type/Default Value	Required / Optional
rule	The offset string or object representing target conversion.	DateOffset, Timedelta or str	Required
axis	Which axis to use for up- or down-sampling. For Series this will default to 0, i.e. along the rows. The index on that axis must be a DatetimeIndex, TimedeltaIndex or PeriodIndex.	{0 or ‘index’, 1 or ‘columns’} Default Value: 0	Optional
closed	Which side of bin interval is closed. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.	{‘right’, ‘left’} Default Value: None	Optional
label	Which bin edge label to label bucket with. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.	{‘right’, ‘left’} Default Value: None	Optional
convention	For PeriodIndex only, controls whether to use the start or end of rule.	{‘start’, ‘end’, ‘s’, ‘e’} Default Value: ‘start’	Optional
kind	Pass ‘timestamp’ to convert the resulting index to a DatetimeIndex or ‘period’ to convert it to a PeriodIndex. By default the input representation is retained.	{‘timestamp’, ‘period’} Default Value: None	Optional
loffset	Adjust the resampled time labels.	timedelta Default Value: None	Optional
base	For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0.	int Default Value: 0	Optional
on	For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.	str	Optional
level	For a MultiIndex, level (name or number) to use for resampling. level must be datetime-like.	str or int	Optional
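
The base parameter in the table shifts where the bins start. As far as I know, pandas 1.1 and later express the same idea through an offset keyword, which is not part of the signature shown on this page. The sketch below therefore rests on that assumption, and treats offset='2min' as the rough counterpart of base=2 for a '5min' rule.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Assumption: pandas 1.1 or later, where `offset` is available.
# Bin edges move from :00, :05, ... to :02, :07, ...
shifted = series.resample('5min', offset='2min').sum()
print(shifted)
```

The first two timestamps fall before the 00:02 edge, so they land in a bin that starts on the previous day.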

Returns: Resampler object
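
The returned Resampler holds no result on its own; an aggregation or fill method has to be called on it. The sketch below illustrates this, and also passes a Timedelta as rule, which the parameter table lists as an accepted type. The variable names are illustrative.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# rule given as a Timedelta instead of an offset string.
resampler = series.resample(pd.Timedelta(minutes=3))
resampler_type = type(resampler).__name__
print(resampler_type)

# The same resampler can feed several aggregations.
totals = resampler.sum()
largest = resampler.max()
print(totals)
print(largest)
```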

Example - Start by creating a series with 8 one minute timestamps:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series

Output:

2019-01-01 00:00:00    0
2019-01-01 00:01:00    1
2019-01-01 00:02:00    2
2019-01-01 00:03:00    3
2019-01-01 00:04:00    4
2019-01-01 00:05:00    5
2019-01-01 00:06:00    6
2019-01-01 00:07:00    7
Freq: T, dtype: int64

Example - Downsample the series into 3 minute bins and sum the values of the timestamps falling into a bin:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('3T').sum()

Output:

2019-01-01 00:00:00     3
2019-01-01 00:03:00    12
2019-01-01 00:06:00    13
Freq: 3T, dtype: int64

Example - Downsample the series into 3 minute bins as above, but label each bin using the right edge instead of the left:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('3T', label='right').sum()

Output:

2019-01-01 00:03:00     3
2019-01-01 00:06:00    12
2019-01-01 00:09:00    13
Freq: 3T, dtype: int64

Example - Downsample the series into 3 minute bins as above, but close the right side of the bin interval:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('3T', label='right', closed='right').sum()

Output:

2019-01-01 00:00:00     0
2019-01-01 00:03:00     6
2019-01-01 00:06:00    15
2019-01-01 00:09:00     7
Freq: 3T, dtype: int64
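
One way to see why closing the right side produces four bins instead of three is to rebuild the bins by hand. This sketch is an illustration, not how pandas implements resampling: it groups by the floored timestamp to mimic left-closed, left-labelled bins, and by the ceiled timestamp to mimic right-closed, right-labelled bins. It uses the 'min' alias spelling that newer pandas releases prefer.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Left-closed, left-labelled: each timestamp belongs to the
# 3-minute mark at or before it.
by_floor = series.groupby(series.index.floor('3min')).sum()

# Right-closed, right-labelled: each timestamp belongs to the
# 3-minute mark at or after it, so 00:00 forms a bin of its own.
by_ceil = series.groupby(series.index.ceil('3min')).sum()

left = series.resample('3min').sum()
right = series.resample('3min', label='right', closed='right').sum()

print(by_floor.tolist(), left.tolist())
print(by_ceil.tolist(), right.tolist())
```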

Example - Upsample the series into 30 second bins:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('30S').asfreq()[0:5]   # Select first 5 rows

Output:

2019-01-01 00:00:00    0.0
2019-01-01 00:00:30    NaN
2019-01-01 00:01:00    1.0
2019-01-01 00:01:30    NaN
2019-01-01 00:02:00    2.0
Freq: 30S, dtype: float64

Example - Upsample the series into 30 second bins and fill the NaN values by forward filling (ffill, called pad in older pandas releases):

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('30S').ffill()[0:5]   # pad() is a deprecated alias of ffill() in later pandas releases

Output:

2019-01-01 00:00:00    0
2019-01-01 00:00:30    0
2019-01-01 00:01:00    1
2019-01-01 00:01:30    1
2019-01-01 00:02:00    2
Freq: 30S, dtype: int64

Example - Upsample the series into 30 second bins and fill the NaN values using the bfill method:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('30S').bfill()[0:5]

Output:

2019-01-01 00:00:00    0
2019-01-01 00:00:30    1
2019-01-01 00:01:00    1
2019-01-01 00:01:30    2
2019-01-01 00:02:00    2
Freq: 30S, dtype: int64
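
The fill strategies are easier to compare side by side. The sketch below puts them in one DataFrame and adds linear interpolation, which this page does not otherwise cover; the column names are illustrative, and the '30s' and 'min' alias spellings assume a reasonably recent pandas.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

upsampled = series.resample('30s')
comparison = pd.DataFrame({
    'asfreq': upsampled.asfreq(),            # gaps stay NaN
    'ffill': upsampled.ffill(),              # carry the last value forward
    'bfill': upsampled.bfill(),              # pull the next value backward
    'interpolate': upsampled.interpolate(),  # linear by default
})
print(comparison.head())
```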

Example - Pass a custom function via apply:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
def custom_resampler(array_like):
    return np.sum(array_like) + 5
series.resample('3T').apply(custom_resampler)

Output:

2019-01-01 00:00:00     8
2019-01-01 00:03:00    17
2019-01-01 00:06:00    18
Freq: 3T, dtype: int64
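
When the custom logic is only a combination of standard aggregations, agg with a list of function names may be simpler than apply. The following sketch shows both routes on the same series; summary and custom are illustrative names.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Several built-in aggregations at once: one column per function.
summary = series.resample('3min').agg(['sum', 'mean', 'max'])
print(summary)

# The custom function from the example above, written as a lambda.
custom = series.resample('3min').apply(lambda values: values.sum() + 5)
print(custom)
```

Here custom matches the 'sum' column plus 5, so in this case the custom function could be replaced by arithmetic on the aggregated result.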

For a Series with a PeriodIndex, the keyword convention can be used to control whether to use the start or end of rule.

Example - Resample a year by quarter using ‘start’ convention. Values are assigned to the first quarter of the period:

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series([1, 2], index=pd.period_range('2018-01-01',
                                            freq='A',
                                            periods=2))
s

Output:

2018    1
2019    2
Freq: A-DEC, dtype: int64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series([1, 2], index=pd.period_range('2018-01-01',
                                            freq='A',
                                            periods=2))
s.resample('Q', convention='start').asfreq()

Output:

2018Q1    1.0
2018Q2    NaN
2018Q3    NaN
2018Q4    NaN
2019Q1    2.0
2019Q2    NaN
2019Q3    NaN
2019Q4    NaN
Freq: Q-DEC, dtype: float64

Example - Resample quarters by month using ‘end’ convention. Values are assigned to the last month of the period:

Python-Pandas Code:

import numpy as np
import pandas as pd
q = pd.Series([2, 3, 4, 5], index=pd.period_range('2019-01-01',
                                                  freq='Q',
                                                  periods=4))
q

Output:

2019Q1    2
2019Q2    3
2019Q3    4
2019Q4    5
Freq: Q-DEC, dtype: int64

Python-Pandas Code:

import numpy as np
import pandas as pd
q = pd.Series([2, 3, 4, 5], index=pd.period_range('2019-01-01',
                                                  freq='Q',
                                                  periods=4))
q.resample('M', convention='end').asfreq()

Output:

2019-03    2.0
2019-04    NaN
2019-05    NaN
2019-06    3.0
2019-07    NaN
2019-08    NaN
2019-09    4.0
2019-10    NaN
2019-11    NaN
2019-12    5.0
Freq: M, dtype: float64
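
The two conventions appear to mirror how a single Period converts to a finer frequency. The sketch below uses Period.asfreq with its how argument to show where one year and one quarter land. It is an analogy for the resampling behaviour above, not a description of pandas internals, and it assumes the 'Y' spelling of the annual frequency.

```python
import pandas as pd

year = pd.Period('2018', freq='Y')
quarter = pd.Period('2019Q1', freq='Q')

# 'start' maps to the first sub-period, 'end' to the last one.
year_start = year.asfreq('Q', how='start')
year_end = year.asfreq('Q', how='end')
quarter_start = quarter.asfreq('M', how='start')
quarter_end = quarter.asfreq('M', how='end')

print(year_start, year_end, quarter_start, quarter_end)
```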

Example - For DataFrame objects, the keyword on can be used to specify the column to use for resampling instead of the index:

Python-Pandas Code:

import numpy as np
import pandas as pd
d = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
          'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2019',
                                    periods=8,
                                    freq='W')
df

Output:

   price  volume week_starting
0      8      40    2019-01-06
1      9      50    2019-01-13
2      7      30    2019-01-20
3     11      80    2019-01-27
4     12      40    2019-02-03
5     16      80    2019-02-10
6     15      30    2019-02-17
7     17      40    2019-02-24

Python-Pandas Code:

import numpy as np
import pandas as pd
d = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
          'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2019',
                                    periods=8,
                                    freq='W')
df.resample('M', on='week_starting').mean()

Output:

               price  volume
week_starting
2019-01-31      8.75    50.0
2019-02-28     15.00    47.5
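
Passing on should give the same result as moving that column into the index first. The sketch below checks this under the assumption that a MonthEnd offset object is an acceptable rule (the parameter table lists DateOffset as a valid type); the names via_on and via_index are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'price': [8, 9, 7, 11, 12, 16, 15, 17],
                   'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df['week_starting'] = pd.date_range('01/01/2019', periods=8, freq='W')

# An offset object avoids depending on a version-specific alias string.
month_end = pd.offsets.MonthEnd()

via_on = df.resample(month_end, on='week_starting').mean()
via_index = df.set_index('week_starting').resample(month_end).mean()

print(via_on)
print(via_index)
```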

Example - For a DataFrame with MultiIndex, the keyword level can be used to specify on which level the resampling needs to take place:

Python-Pandas Code:

import pandas as pd
days = pd.date_range('1/1/2019', periods=4, freq='D')
d2 = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
           'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df2 = pd.DataFrame(d2,
                   index=pd.MultiIndex.from_product([days,
                                                    ['morning',
                                                     'afternoon']]
                                                    ))
df2

Output:

                      price  volume
2019-01-01 morning        8      40
           afternoon      9      50
2019-01-02 morning        7      30
           afternoon     11      80
2019-01-03 morning       12      40
           afternoon     16      80
2019-01-04 morning       15      30
           afternoon     17      40

Python-Pandas Code:

import pandas as pd
days = pd.date_range('1/1/2019', periods=4, freq='D')
d2 = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
           'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df2 = pd.DataFrame(d2,
                   index=pd.MultiIndex.from_product([days,
                                                    ['morning',
                                                     'afternoon']]
                                                    ))
df2.resample('D', level=0).sum()

Output:

            price  volume
2019-01-01     17      90
2019-01-02     18     110
2019-01-03     28     120
2019-01-04     32      70
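
Resampling on a MultiIndex level can also be written as a groupby with pd.Grouper. This is a sketch of that alternative spelling, assuming Grouper accepts level and freq together for a datetime-like level; via_resample and via_grouper are illustrative names.

```python
import pandas as pd

days = pd.date_range('1/1/2019', periods=4, freq='D')
df2 = pd.DataFrame(
    {'price': [8, 9, 7, 11, 12, 16, 15, 17],
     'volume': [40, 50, 30, 80, 40, 80, 30, 40]},
    index=pd.MultiIndex.from_product([days, ['morning', 'afternoon']]),
)

# Both forms collapse the morning/afternoon level into daily totals.
via_resample = df2.resample('D', level=0).sum()
via_grouper = df2.groupby(pd.Grouper(level=0, freq='D')).sum()

print(via_resample)
print(via_grouper)
```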
