Pandas: Series

Last update on September 15 2022 12:54:33 (UTC/GMT +8 hours)

Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call pd.Series(data, index=index).

Here, data can be any of the following:

a Python dict
an ndarray
a scalar value

Example - From ndarray:

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].

Python-Pandas Code:

import numpy as np
import pandas as pd
# General form: s = pd.Series(data, index=index)
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s

Output:

p   -0.310263
q   -0.703727
r    0.760450
n    0.350622
t    0.195871
v    0.739086
dtype: float64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.index

Output:

Index(['p', 'q', 'r', 'n', 't', 'v'], dtype='object')

Python-Pandas Code:

import numpy as np
import pandas as pd
# No index is passed, so the default integer index is created
pd.Series(np.random.randn(6))

Output:

0   -1.049184
1   -0.524355
2    0.659975
3   -1.122864
4    1.387395
5    0.514023
dtype: float64
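The two index rules for ndarray data can be checked directly. The following is a minimal sketch with fixed values instead of random ones; the names data, default_indexed and mismatch_rejected are illustrative only.

```python
import numpy as np
import pandas as pd

data = np.array([1.5, 2.5, 3.5])

# No index passed: pandas builds the default integer index 0..len(data)-1.
default_indexed = pd.Series(data)
print(list(default_indexed.index))

# An index whose length differs from the data is rejected.
try:
    pd.Series(data, index=["a", "b"])
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)
```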

Example - From dict:

Series can be instantiated from dicts:

Python-Pandas Code:

import pandas as pd
n = {'q': 1, 'p': 2, 'r': 3}
pd.Series(n)

Output:

q    1
p    2
r    3
dtype: int64

In the example above, if you were on a Python version lower than 3.6 or a Pandas version lower than 0.23, the Series would be ordered by the lexical order of the dict keys (i.e. ['p', 'q', 'r'] rather than ['q', 'p', 'r']).
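On current Python and pandas versions, the ordering behaviour can be seen in a short sketch like the one below. It assumes Python 3.6 or later and pandas 0.23 or later; by_insertion and by_label are illustrative names.

```python
import pandas as pd

# Keys deliberately inserted out of alphabetical order.
n = {'q': 1, 'p': 2, 'r': 3}

by_insertion = pd.Series(n)           # keeps the dict's insertion order
by_label = pd.Series(n).sort_index()  # explicit lexical ordering, if wanted

print(list(by_insertion.index))
print(list(by_label.index))
```

If code has to behave the same on older versions, calling sort_index() explicitly removes the dependence on dict ordering.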

If an index is passed, the values in data corresponding to the labels in the index will be pulled out.

Python-Pandas Code:

import pandas as pd
n = {'p': 2., 'q': 1., 'r': 3.}
pd.Series(n)

Output:

p    2.0
q    1.0
r    3.0
dtype: float64

Python-Pandas Code:

import pandas as pd
n = {'p': 2., 'q': 1., 'r': 3.}
# 'n' is not a key of the dict, so its value is NaN
pd.Series(n, index=['q', 'r', 'n', 'p'])

Output:

q    1.0
r    3.0
n    NaN
p    2.0
dtype: float64
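NaN is the standard missing-data marker here, and it has a side effect worth knowing: integer data is upcast to float64 when a label is missing. A small sketch, using an integer-valued dict as an assumption for illustration:

```python
import pandas as pd

n = {'p': 2, 'q': 1, 'r': 3}  # integer values
picked = pd.Series(n, index=['q', 'r', 'n', 'p'])

print(picked.dtype)     # float64: the NaN for 'n' forces an upcast from int64
print(picked.isna())    # True only for the missing label 'n'
print(picked.dropna())  # drops 'n', keeps q, r, p in the requested order
```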

Example - From scalar value:

If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.

Python-Pandas Code:

import pandas as pd
pd.Series(4., index=['p', 'q', 'r', 'n', 't'])

Output:

p    4.0
q    4.0
r    4.0
n    4.0
t    4.0
dtype: float64
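The repetition can be confirmed with a quick check. This is only a sketch; labels and filled are illustrative names.

```python
import pandas as pd

labels = ['p', 'q', 'r', 'n', 't']
filled = pd.Series(4., index=labels)

# One copy of the scalar per label.
print(len(filled) == len(labels))
print(bool((filled == 4.).all()))
```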

Example - Series is ndarray-like:

Series acts very similarly to an ndarray, and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.iloc[0]  # first element by position; plain s[0] on a label index is deprecated in recent pandas

Output:

-1.0264054091334087

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s[:4]

Output:

p   -1.026405
q   -0.549446
r    0.105166
n    1.237134
dtype: float64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s[s > s.median()]

Output:

r    0.105166
n    1.237134
v    1.099714
dtype: float64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.iloc[[5, 4, 3]]  # select by position; plain s[[5, 4, 3]] on a label index is deprecated in recent pandas

Output:

v    1.099714
t   -0.357001
n    1.237134
dtype: float64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
np.exp(s)

Output:

p    0.358293
q    0.577270
r    1.110894
n    3.445726
t    0.699772
v    3.003308
dtype: float64
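Because the examples above use random data, the results change on every run. The sketch below repeats the same ndarray-like operations on fixed, made-up values so that the effect on the index is easy to follow.

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.0, -0.5, 0.1, 1.2, -0.4, 1.1],
              index=['p', 'q', 'r', 'n', 't', 'v'])

head = s.iloc[:3]          # positional slice; labels p, q, r come along
above = s[s > s.median()]  # boolean mask; labels of the kept values are preserved
absolute = np.abs(s)       # NumPy ufunc returns a Series, not a bare ndarray

print(head)
print(above)
print(absolute)
```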

Like a NumPy array, a pandas Series has a dtype.

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.dtype

Output:

dtype('float64')

If you need the actual array backing a Series, use Series.array.

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.array

Output:

<PandasArray>
[-1.0264054091334087,  -0.549445701791565, 0.10516552598880402,
  1.2371344986220967, -0.3570011032982442,  1.0997143297297525]
Length: 6, dtype: float64

Accessing the array can be useful when you need to do some operation without the index.

While Series is ndarray-like, if you need an actual ndarray, then use Series.to_numpy().

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.to_numpy()

Output:

array([-1.02640541, -0.5494457 ,  0.10516553,  1.2371345 , -0.3570011 ,
        1.09971433])

Even if the Series is backed by an ExtensionArray, Series.to_numpy() will return a NumPy ndarray.
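The difference between Series.array and Series.to_numpy() shows up in the types they return. The sketch below uses a nullable Int64 Series as one example of ExtensionArray-backed data; that choice, and the variable names, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['p', 'q', 'r'])
nullable = pd.Series([1, 2, None], dtype="Int64")  # backed by an ExtensionArray

backing = s.array                # pandas array wrapper, carries no index
plain = s.to_numpy()             # NumPy ndarray
converted = nullable.to_numpy()  # also a NumPy ndarray

print(type(backing).__name__)
print(type(plain).__name__)
print(type(converted).__name__, converted.dtype)
```

The exact class name printed for s.array depends on the pandas version, which is why the sketch prints it instead of hard-coding it.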

Example - Series is dict-like:

A Series is like a fixed-size dict in that you can get and set values by index label.

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s['q']

Output:

-0.9278660706287409

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s['n'] = 10.
s

Output:

p    -0.471729
q    -0.927866
r    -0.086945
n    10.000000
t     0.593117
v    -1.245147
dtype: float64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
'n' in s

Output:

True

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
'd' in s

Output:

False

If a label is not contained in the index, an exception is raised:

Python-Pandas Code:

s['d']

Output:

KeyError: 'd'

Example - Using the get method, a missing label will return None or specified default:

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.get('d')          # missing label, no default: returns None
s.get('d', np.nan)  # missing label with an explicit default

Output:

nan
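The two lookup styles can be compared side by side. This is a minimal sketch on a small fixed Series; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['p', 'q', 'r'])

# Bracket access raises for an unknown label.
try:
    s['d']
    raised = False
except KeyError:
    raised = True

missing_default = s.get('d')      # None when no default is given
missing_nan = s.get('d', np.nan)  # the supplied default
present = s.get('q')              # the stored value for an existing label

print(raised, missing_default, missing_nan, present)
```

Using get is convenient when a missing label is an expected case; bracket access is the better fit when a missing label should be treated as an error.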

Example - Vectorized operations and label alignment with Series:

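Arithmetic on a Series is applied element by element, and a Series can also be passed into most NumPy methods expecting an ndarray. The outputs in this section come from the Series modified above (with s['n'] set to 10.); because the data is random, your values will differ.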

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s + s

Output:

p    -0.943459
q    -1.855732
r    -0.173890
n    20.000000
t     1.186234
v    -2.490294
dtype: float64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s * 2

Output:

p    -0.943459
q    -1.855732
r    -0.173890
n    20.000000
t     1.186234
v    -2.490294
dtype: float64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
np.exp(s)

Output:

p        0.623922
q        0.395397
r        0.916728
n    22026.465795
t        1.809620
v        0.287899
dtype: float64

A key difference between Series and ndarray is that operations between Series automatically align the data based on label.

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s[2:] + s[:-2]

Output:

n    20.00000
p         NaN
q         NaN
r    -0.17389
t         NaN
v         NaN
dtype: float64
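The alignment rule is easier to trace with small fixed values. The sketch below also shows two common ways of handling the NaN results, dropna() and Series.add() with fill_value; these are offered as options, not as something the example above requires.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=['p', 'q', 'r', 'n'])

left = s.iloc[1:]    # labels q, r, n
right = s.iloc[:-1]  # labels p, q, r

aligned = left + right                  # union of labels; unmatched ones become NaN
overlap = aligned.dropna()              # keep only labels present on both sides
filled = left.add(right, fill_value=0)  # treat a missing side as 0 instead

print(aligned)
print(overlap)
print(filled)
```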

Example - Name attribute:

Series can also have a name attribute.

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), name='research')
s

Output:

0    0.765820
1   -1.014433
2    1.185444
3   -0.028960
4   -1.748811
5   -1.244340
Name: research, dtype: float64

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), name='research')
s.name

Output:

'research'

Example - You can rename a Series with the pandas.Series.rename() method:

Python-Pandas Code:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), name='research')
s2 = s.rename("search")
s2.name

Output:

'search'

Note that s and s2 refer to different objects.
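A quick identity check makes this concrete. The sketch uses fixed values in place of random data.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name='research')
s2 = s.rename("search")

print(s2 is s)  # False: rename returned a new Series
print(s.name)   # research: the original keeps its name
print(s2.name)  # search
```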
