Pandas: Series
Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call pd.Series(data, index=index).
Here, data can be many different things:
- a Python dict
- an ndarray
- a scalar value
Example - From ndarray:
If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
Python-Pandas Code:
import numpy as np
import pandas as pd
# General form: pd.Series(data, index=index)
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s
Output:
p   -0.310263
q   -0.703727
r    0.760450
n    0.350622
t    0.195871
v    0.739086
dtype: float64
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.index
Output:
Index(['p', 'q', 'r', 'n', 't', 'v'], dtype='object')
Python-Pandas Code:
import numpy as np
import pandas as pd
pd.Series(np.random.randn(6))
Output:
0   -1.049184
1   -0.524355
2    0.659975
3   -1.122864
4    1.387395
5    0.514023
dtype: float64
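The length rule for ndarray data can be checked directly. The following is a minimal sketch; the names values, ok, default and mismatch_error are illustrative and do not come from the tutorial.

```python
import numpy as np
import pandas as pd

values = np.arange(3.0)  # three values: 0.0, 1.0, 2.0

# Matching lengths: three values, three labels.
ok = pd.Series(values, index=['p', 'q', 'r'])

# No index passed: pandas creates the labels 0, 1, 2.
default = pd.Series(values)

# Mismatched lengths: three values but only two labels.
try:
    pd.Series(values, index=['p', 'q'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(ok)
print(list(default.index))
print(type(mismatch_error).__name__)
```

In current pandas releases the mismatch is reported as a ValueError, so the problem surfaces when the Series is built rather than later.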
Example - From dict:
Series can be instantiated from dicts:
Python-Pandas Code:
import numpy as np
import pandas as pd
n = {'q': 1, 'p': 2, 'r': 3}
pd.Series(n)
Output:
q    1
p    2
r    3
dtype: int64
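In the example above, the Series keeps the insertion order of the dict keys. If you were on a Python version lower than 3.6 or a pandas version lower than 0.23, the Series would instead be ordered by the lexical order of the dict keys (i.e. ['p', 'q', 'r'] rather than ['q', 'p', 'r']).
If an index is passed, the values in data corresponding to the labels in the index will be pulled out, and labels with no matching key get NaN. The first snippet below omits the index for comparison; the second passes one.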
Python-Pandas Code:
import numpy as np
import pandas as pd
n = {'p': 2., 'q': 1., 'r': 3.}
pd.Series(n)
Output:
p    2.0
q    1.0
r    3.0
dtype: float64
Python-Pandas Code:
import numpy as np
import pandas as pd
n = {'p': 2., 'q': 1., 'r': 3.}
pd.Series(n, index=['q', 'r', 'n', 'p'])
Output:
q    1.0
r    3.0
n    NaN
p    2.0
dtype: float64
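In the output above, the label 'n' has no matching key in the dict, so pandas fills it with NaN, its standard missing-data marker. A small sketch of how such gaps might be detected or dropped follows; the variable names selected, missing and present are illustrative.

```python
import pandas as pd

n = {'p': 2., 'q': 1., 'r': 3.}
selected = pd.Series(n, index=['q', 'r', 'n', 'p'])

# True only where the dict had no key for the requested label.
missing = selected.isna()

# Keep only the labels that actually received a value.
present = selected.dropna()

print(missing)
print(present)
```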
Example - From scalar value:
If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.
Python-Pandas Code:
import numpy as np
import pandas as pd
pd.Series(4., index=['p', 'q', 'r', 'n', 't'])
Output:
p    4.0
q    4.0
r    4.0
n    4.0
t    4.0
dtype: float64
Example - Series is ndarray-like:
Series acts very similarly to an ndarray, and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
# Positional access; s[0] on a label index is deprecated in recent pandas versions
s.iloc[0]
Output:
-1.0264054091334087
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s[:4]
Output:
p   -1.026405
q   -0.549446
r    0.105166
n    1.237134
dtype: float64
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s[s > s.median()]
Output:
r    0.105166
n    1.237134
v    1.099714
dtype: float64
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
# Select by position; integer lists on a label index need .iloc in recent pandas versions
s.iloc[[5, 4, 3]]
Output:
v    1.099714
t   -0.357001
n    1.237134
dtype: float64
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
np.exp(s)
Output:
p    0.358293
q    0.577270
r    1.110894
n    3.445726
t    0.699772
v    3.003308
dtype: float64
Like a NumPy array, a pandas Series has a dtype.
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.dtype
Output:
dtype('float64')
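As a rough illustration of how the dtype follows from the data, the sketch below builds Series from plain Python lists and converts one with astype. The variable names are illustrative, and the dtypes noted in the comments are what current pandas releases infer.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])        # all integers: inferred as int64
floats = pd.Series([1, 2.5])       # one float upcasts the whole Series to float64
converted = ints.astype('float64') # explicit conversion returns a new Series

print(ints.dtype, floats.dtype, converted.dtype)
```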
If you need the actual array backing a Series, use Series.array.
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.array
Output:
<PandasArray>
[-1.0264054091334087, -0.549445701791565, 0.10516552598880402,
 1.2371344986220967, -0.3570011032982442, 1.0997143297297525]
Length: 6, dtype: float64
Accessing the array can be useful when you need to do some operation without the index.
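One case where dropping the index helps is combining two Series by position when their labels do not match. The sketch below is only an illustration; the Series a and b and their labels are made up for the example.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=['p', 'q', 'r'])
b = pd.Series([10.0, 20.0, 30.0], index=['x', 'y', 'z'])

# Label-based addition: no labels in common, so every result is NaN.
aligned = a + b

# Position-based addition: take the backing arrays and ignore the labels.
positional = np.asarray(a.array) + np.asarray(b.array)

print(aligned)
print(positional)
```

The positional result is a plain NumPy array, so the labels are lost; wrap it in pd.Series again if an index is needed.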
While Series is ndarray-like, if you need an actual ndarray, then use Series.to_numpy().
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.to_numpy()
Output:
array([-1.02640541, -0.5494457 ,  0.10516553,  1.2371345 , -0.3570011 ,  1.09971433])
Even if the Series is backed by an ExtensionArray, Series.to_numpy() will return a NumPy ndarray.
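To make the ExtensionArray case concrete, here is a minimal sketch using the nullable "Int64" dtype as the extension type. That dtype is chosen only for illustration; the tutorial does not name a specific one.

```python
import numpy as np
import pandas as pd

# A Series backed by a nullable-integer ExtensionArray (not a NumPy array).
nullable = pd.Series(pd.array([1, 2, None], dtype="Int64"))
backing = nullable.array

# to_numpy() still returns an ndarray; here the missing value is mapped to NaN.
as_float = nullable.to_numpy(dtype="float64", na_value=np.nan)

print(type(backing).__name__)
print(as_float)
```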
Example - Series is dict-like:
A Series is like a fixed-size dict in that you can get and set values by index label:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s['q']
Output:
-0.9278660706287409
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s['n'] = 10.
s
Output:
p    -0.471729
q    -0.927866
r    -0.086945
n    10.000000
t     0.593117
v    -1.245147
dtype: float64
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
'n' in s
Output:
True
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
'd' in s
Output:
False
If a label is not contained, an exception is raised:
s['d']
KeyError: 'd'
Example - Using the get method, a missing label will return None or specified default:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s.get('d')
s.get('d', np.nan)
Output:
nan
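The three ways of handling a missing label (membership test, catching KeyError, and get) can be compared side by side. This is a small sketch with a hand-written Series; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['p', 'q', 'r'])

# Bracket lookup raises KeyError for an unknown label.
try:
    value = s['d']
except KeyError:
    value = None

no_default = s.get('d')        # missing label, no default given: None
fallback = s.get('d', np.nan)  # missing label, explicit default: nan
found = s.get('q')             # existing label: the stored value

print(value, no_default, fallback, found)
```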
Example - Vectorized operations and label alignment with Series:
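Arithmetic on a Series is vectorized, so looping through it value by value is usually unnecessary. Series can also be passed into most NumPy methods expecting an ndarray. The outputs shown in this section were produced with the s from the previous example, after s['n'] = 10. had been applied; since each snippet re-creates s from random data, your values will differ.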
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s + s
Output:
p    -0.943459
q    -1.855732
r    -0.173890
n    20.000000
t     1.186234
v    -2.490294
dtype: float64
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s * 2
Output:
p    -0.943459
q    -1.855732
r    -0.173890
n    20.000000
t     1.186234
v    -2.490294
dtype: float64
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
np.exp(s)
Output:
p        0.623922
q        0.395397
r        0.916728
n    22026.465795
t        1.809620
v        0.287899
dtype: float64
A key difference between Series and ndarray is that operations between Series automatically align the data based on label.
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), index=['p', 'q', 'r', 'n', 't','v'])
s[2:] + s[:-2]
Output:
n    20.00000
p         NaN
q         NaN
r    -0.17389
t         NaN
v         NaN
dtype: float64
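The result above carries the union of both indexes, and any label present on only one side comes out as NaN. The sketch below repeats this with small hand-written Series so the values are easy to check, and shows Series.add with fill_value as one possible way to treat the missing side as zero. The names left, right, total and filled are illustrative.

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['p', 'q', 'r'])
right = pd.Series([10.0, 20.0, 30.0], index=['q', 'r', 'n'])

# Labels are matched before adding; 'p' and 'n' exist on one side only.
total = left + right

# Same alignment, but a value missing on one side is treated as 0.
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```

Whether filling with zero is appropriate depends on the data; leaving the NaN in place keeps the mismatch visible.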
Example - Name attribute:
Series can also have a name attribute:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), name='research')
s
Output:
0    0.765820
1   -1.014433
2    1.185444
3   -0.028960
4   -1.748811
5   -1.244340
Name: research, dtype: float64
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), name='research')
s.name
Output:
'research'
Example - You can rename a Series with the pandas.Series.rename() method:
Python-Pandas Code:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(6), name='research')
s2=s.rename("search")
s2.name
Output:
'search'
Note that s and s2 refer to different objects.
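That rename() returns a new object rather than modifying the original can be verified with a short check. This is a minimal sketch using fixed values instead of random data.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name='research')
s2 = s.rename('search')

print(s.name)    # the original keeps its name
print(s2.name)   # the renamed copy carries the new one
print(s2 is s)   # two distinct objects
```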