w3resource

Pandas Series: resample() function

Resample Pandas time-series data

The resample() function is used to resample time-series data.

Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be supplied through the on (column) or level (MultiIndex level) keyword.
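The index requirement is enforced as soon as resample() is called. The sketch below, which uses made-up values, shows a Series with a plain integer index being rejected with a TypeError. The same values on a TimedeltaIndex resample without complaint.

```python
import pandas as pd

# A Series with the default RangeIndex, which is not datetime-like
plain = pd.Series([1, 2, 3])

raised = False
try:
    plain.resample('1min')
except TypeError as exc:
    # pandas rejects the call because the index cannot be binned by time
    raised = True
    print('TypeError:', exc)

# The same values on a TimedeltaIndex (0 s, 30 s, 60 s) can be resampled
timed = pd.Series([1, 2, 3], index=pd.to_timedelta([0, 30, 60], unit='s'))
per_minute = timed.resample('1min').sum()
print(per_minute)
```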

Syntax:

Series.resample(self, rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)

Parameters:

rule (DateOffset, Timedelta or str; required)
    The offset string or object representing the target conversion.

axis ({0 or ‘index’, 1 or ‘columns’}; optional, default 0)
    Which axis to use for up- or down-sampling. For a Series this defaults to 0, i.e. along the rows. The index on this axis must be a DatetimeIndex, TimedeltaIndex or PeriodIndex.

closed ({‘right’, ‘left’}; optional, default None)
    Which side of the bin interval is closed. When None, the default is ‘left’ for all frequency offsets except ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’ and ‘W’, which all default to ‘right’.

label ({‘right’, ‘left’}; optional, default None)
    Which bin edge to use when labeling each bucket. When None, the default is ‘left’ for all frequency offsets except ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’ and ‘W’, which all default to ‘right’.

convention ({‘start’, ‘end’, ‘s’, ‘e’}; optional, default ‘start’)
    For a PeriodIndex only: controls whether to use the start or the end of rule.

kind ({‘timestamp’, ‘period’}; optional, default None)
    Pass ‘timestamp’ to convert the resulting index to a DatetimeIndex or ‘period’ to convert it to a PeriodIndex. By default the input representation is retained.

loffset (timedelta; optional, default None)
    Adjust the resampled time labels.

base (int; optional, default 0)
    For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base can range from 0 through 4.

on (str; optional)
    For a DataFrame, the column to use instead of the index for resampling. The column must be datetime-like.

level (str or int; optional)
    For a MultiIndex, the level (name or number) to use for resampling. The level must be datetime-like.
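The closed and label defaults above can be checked with a weekly rule. In this sketch (hypothetical data: 14 daily values starting on Tuesday 2019-01-01), 'W' is anchored on Sunday, so by default each bin ends on, and is labeled with, a Sunday. Overriding both keywords shifts which days fall into each bin as well as the labels.

```python
import pandas as pd

# 14 daily values, 2019-01-01 (a Tuesday) through 2019-01-14
daily = pd.Series(range(14), index=pd.date_range('2019-01-01', periods=14, freq='D'))

# 'W' defaults to closed='right' and label='right':
# bins end on Sundays and the Sunday is used as the label
weekly_default = daily.resample('W').sum()

# Explicit left-closed, left-labeled weekly bins
weekly_left = daily.resample('W', closed='left', label='left').sum()

print(weekly_default)
print(weekly_left)
```

Both results add up to the same total (91). Only the bin boundaries and labels differ.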

Returns: Resampler object
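The Resampler returned by resample() describes the bins. You get values by calling an aggregation method on it. The sketch below uses the same 8-minute series as the examples that follow, written with the 'min' alias that newer pandas releases prefer over 'T'. It shows agg() applying several named aggregations at once, which returns one column per function.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# The Resampler groups the data into 3-minute bins
resampler = series.resample('3min')

# agg() with a list of function names returns a DataFrame, one column per function
summary = resampler.agg(['sum', 'mean', 'count'])
print(summary)
```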

Example - Start by creating a series with 8 one-minute timestamps:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series

Output:

2019-01-01 00:00:00    0
2019-01-01 00:01:00    1
2019-01-01 00:02:00    2
2019-01-01 00:03:00    3
2019-01-01 00:04:00    4
2019-01-01 00:05:00    5
2019-01-01 00:06:00    6
2019-01-01 00:07:00    7
Freq: T, dtype: int64

Example - Downsample the series into 3 minute bins and sum the values of the timestamps falling into a bin:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('3T').sum()

Output:

2019-01-01 00:00:00     3
2019-01-01 00:03:00    12
2019-01-01 00:06:00    13
Freq: 3T, dtype: int64
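Under the default settings, this downsampling is a group-by on bin start times. The sketch below assumes a rule like 3 minutes that divides a day evenly, so bins line up with midnight. It maps each timestamp to the start of its bin with DatetimeIndex.floor() and then groups, which reproduces the resample() result.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Default downsampling: left-closed bins labeled by their left edge
via_resample = series.resample('3min').sum()

# Manual equivalent: floor each timestamp to the start of its 3-minute bin, then group
bin_starts = series.index.floor('3min')
via_groupby = series.groupby(bin_starts).sum()

print(via_groupby)
```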

Example - Downsample the series into 3 minute bins as above, but label each bin using the right edge instead of the left:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('3T', label='right').sum()

Output:

2019-01-01 00:03:00     3
2019-01-01 00:06:00    12
2019-01-01 00:09:00    13
Freq: 3T, dtype: int64
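label changes only how each bin is named, not which timestamps it contains. This short check (using the 'min' alias) compares the two previous results: the sums are identical and every label moves forward by exactly one bin width.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

left_labels = series.resample('3min').sum()
right_labels = series.resample('3min', label='right').sum()

# Same bins and sums; the labels differ by one bin width (3 minutes)
same_values = (left_labels.values == right_labels.values).all()
shift = right_labels.index - left_labels.index
print(same_values, shift.unique())
```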

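Example - Downsample the series into 3-minute bins and label each bin with its right edge as above, but also close the right side of each bin interval, so that a timestamp falling exactly on a bin edge (such as 00:03) is counted in the bin that ends there: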

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('3T', label='right', closed='right').sum()

Output:

2019-01-01 00:00:00     0
2019-01-01 00:03:00     6
2019-01-01 00:06:00    15
2019-01-01 00:09:00     7
Freq: 3T, dtype: int64
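To see which timestamps each right-closed bin holds, you can iterate over the Resampler. Each iteration yields a bin label and the values in that bin. This sketch collects them into a dict keyed by the label's time.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Iterating a Resampler yields (label, values-in-bin) pairs
members = {}
for label, group in series.resample('3min', label='right', closed='right'):
    members[label.strftime('%H:%M')] = group.tolist()
    print(label.strftime('%H:%M'), group.tolist())
```

The first bin, labeled 00:00, holds only the value at 00:00. That is why the right-closed output above starts one bin earlier than the left-closed one.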

Example - Upsample the series into 30 second bins:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('30S').asfreq().iloc[0:5]   # Select first 5 rows

Output:

2019-01-01 00:00:00    0.0
2019-01-01 00:00:30    NaN
2019-01-01 00:01:00    1.0
2019-01-01 00:01:30    NaN
2019-01-01 00:02:00    2.0
Freq: 30S, dtype: float64
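asfreq() leaves the new 30-second slots empty. If a linear estimate is more suitable than a missing value, the Resampler also provides interpolate(). The sketch below uses the 's' alias for seconds, which newer pandas releases prefer over 'S'. Each new slot gets the midpoint between its neighbors.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Upsample to 30-second slots and fill them by linear interpolation
upsampled = series.resample('30s').interpolate()
print(upsampled.iloc[0:5])
```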

Example - Upsample the series into 30 second bins and fill the NaN values by propagating the last valid value forward with ffill() (formerly also available under the name pad()):

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('30S').ffill().iloc[0:5]   # Select first 5 rows

Output:

2019-01-01 00:00:00    0
2019-01-01 00:00:30    0
2019-01-01 00:01:00    1
2019-01-01 00:01:30    1
2019-01-01 00:02:00    2
Freq: 30S, dtype: int64
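ffill() also accepts a limit argument that caps how many consecutive new slots are filled. In this sketch the series is upsampled to 15-second slots with limit=1. Only the first slot after each original value is filled, and the rest stay NaN, so the result is float64.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# limit=1: forward-fill at most one consecutive new slot after each original value
limited = series.resample('15s').ffill(limit=1)
head = limited.iloc[0:5]
print(head)
```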

Example - Upsample the series into 30 second bins and fill the NaN values using the bfill method:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
series.resample('30S').bfill().iloc[0:5]

Output:

2019-01-01 00:00:00    0
2019-01-01 00:00:30    1
2019-01-01 00:01:00    1
2019-01-01 00:01:30    2
2019-01-01 00:02:00    2
Freq: 30S, dtype: int64
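Besides filling forward or backward, the Resampler's nearest() method takes each new slot's value from whichever original timestamp is closest. The sketch uses 20-second slots rather than 30-second ones, so that no slot sits exactly halfway between two original points.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Each 20-second slot takes the value of the closest original timestamp
nearest = series.resample('20s').nearest()
print(nearest.iloc[0:4])
```

Here 00:00:20 is closer to 00:00 and takes 0, while 00:00:40 is closer to 00:01 and takes 1.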

Example - Pass a custom function via apply:

Python-Pandas Code:

import numpy as np
import pandas as pd
index = pd.date_range('1/1/2019', periods=8, freq='T')
series = pd.Series(range(8), index=index)
def custom_resampler(array_like):
    return np.sum(array_like) + 5
series.resample('3T').apply(custom_resampler)

Output:

2019-01-01 00:00:00     8
2019-01-01 00:03:00    17
2019-01-01 00:06:00    18
Freq: 3T, dtype: int64
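The function passed to apply() is called once per bin and receives that bin's values. This sketch uses a lambda rather than a named function to compute the spread (maximum minus minimum) within each 3-minute bin.

```python
import pandas as pd

index = pd.date_range('1/1/2019', periods=8, freq='min')
series = pd.Series(range(8), index=index)

# Spread of the values inside each 3-minute bin
spread = series.resample('3min').apply(lambda s: s.max() - s.min())
print(spread)
```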

For a Series with a PeriodIndex, the keyword convention can be used to control whether to use the start or end of rule.

Example - Resample a year by quarter using ‘start’ convention. Values are assigned to the first quarter of the period:

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series([1, 2], index=pd.period_range('2018-01-01',
                                            freq='A',
                                            periods=2))
s

Output:

2018    1
2019    2
Freq: A-DEC, dtype: int64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series([1, 2], index=pd.period_range('2018-01-01',
                                            freq='A',
                                            periods=2))
s.resample('Q', convention='start').asfreq()

Output:

2018Q1    1.0
2018Q2    NaN
2018Q3    NaN
2018Q4    NaN
2019Q1    2.0
2019Q2    NaN
2019Q3    NaN
2019Q4    NaN
Freq: Q-DEC, dtype: float64

Example - Resample quarters by month using ‘end’ convention. Values are assigned to the last month of the period:

Python-Pandas Code:

import numpy as np
import pandas as pd
q = pd.Series([2, 3, 4, 5], index=pd.period_range('2019-01-01',
                                                  freq='Q',
                                                  periods=4))
q

Output:

2019Q1    2
2019Q2    3
2019Q3    4
2019Q4    5
Freq: Q-DEC, dtype: int64

Python-Pandas Code:

import numpy as np
import pandas as pd
q = pd.Series([2, 3, 4, 5], index=pd.period_range('2019-01-01',
                                                  freq='Q',
                                                  periods=4))
q.resample('M', convention='end').asfreq()

Output:

2019-03    2.0
2019-04    NaN
2019-05    NaN
2019-06    3.0
2019-07    NaN
2019-08    NaN
2019-09    4.0
2019-10    NaN
2019-11    NaN
2019-12    5.0
Freq: M, dtype: float64

Example - For DataFrame objects, the keyword on can be used to specify a column to use for resampling instead of the index:

Python-Pandas Code:

import numpy as np
import pandas as pd
d = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
          'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2019',
                                    periods=8,
                                    freq='W')
df

Output:

   price  volume week_starting
0      8      40    2019-01-06
1      9      50    2019-01-13
2      7      30    2019-01-20
3     11      80    2019-01-27
4     12      40    2019-02-03
5     16      80    2019-02-10
6     15      30    2019-02-17
7     17      40    2019-02-24

Python-Pandas Code:

import numpy as np
import pandas as pd
d = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
          'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2019',
                                    periods=8,
                                    freq='W')
df.resample('M', on='week_starting').mean()

Output:

               price  volume
week_starting
2019-01-31      8.75    50.0
2019-02-28     15.00    47.5
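Using on gives the same result as first moving the column into the index with set_index(). The sketch below checks this and also passes a dict to agg() to pick a different aggregation per column. It uses 'MS' (month start) because the month-end alias 'M' was renamed to 'ME' in pandas 2.2, while 'MS' is accepted across versions. For that reason the labels here are the first day of each month.

```python
import pandas as pd

d = {'price': [8, 9, 7, 11, 12, 16, 15, 17],
     'volume': [40, 50, 30, 80, 40, 80, 30, 40]}
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2019', periods=8, freq='W')

# on='week_starting' behaves like resampling after making that column the index
by_column = df.resample('MS', on='week_starting').mean()
by_index = df.set_index('week_starting').resample('MS').mean()
print(by_column.equals(by_index))

# A dict passed to agg() chooses an aggregation per column
monthly = df.resample('MS', on='week_starting').agg({'price': 'mean', 'volume': 'sum'})
print(monthly)
```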

Example - For a DataFrame with MultiIndex, the keyword level can be used to specify on which level the resampling needs to take place:

Python-Pandas Code:

import pandas as pd
days = pd.date_range('1/1/2019', periods=4, freq='D')
d2 = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
           'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df2 = pd.DataFrame(d2,
                   index=pd.MultiIndex.from_product([days,
                                                     ['morning', 'afternoon']]))
df2

Output:

                      price  volume
2019-01-01 morning        8      40
           afternoon      9      50
2019-01-02 morning        7      30
           afternoon     11      80
2019-01-03 morning       12      40
           afternoon     16      80
2019-01-04 morning       15      30
           afternoon     17      40

Python-Pandas Code:

import pandas as pd
days = pd.date_range('1/1/2019', periods=4, freq='D')
d2 = dict({'price': [8, 9, 7, 11, 12, 16, 15, 17],
           'volume': [40, 50, 30, 80, 40, 80, 30, 40]})
df2 = pd.DataFrame(d2,
                   index=pd.MultiIndex.from_product([days,
                                                     ['morning', 'afternoon']]))
df2.resample('D', level=0).sum()

Output:

            price  volume
2019-01-01     17      90
2019-01-02     18     110
2019-01-03     28     120
2019-01-04     32      70
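With level=0, the rule is applied to the dates only, and the 'morning'/'afternoon' level is aggregated away. This sketch uses a hypothetical 2-day rule on the same data to show that pairs of days are merged into one row.

```python
import pandas as pd

days = pd.date_range('1/1/2019', periods=4, freq='D')
d2 = {'price': [8, 9, 7, 11, 12, 16, 15, 17],
      'volume': [40, 50, 30, 80, 40, 80, 30, 40]}
df2 = pd.DataFrame(d2, index=pd.MultiIndex.from_product([days, ['morning', 'afternoon']]))

# Resample the date level into 2-day bins; the second index level is summed over
two_day = df2.resample('2D', level=0).sum()
print(two_day)
```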
