Pandas Series: str.contains() function
Series-str.contains() function
The str.contains() function tests whether a pattern or regular expression is contained within each string of a Series or Index.
It returns a boolean Series or Index with one result per element.
Syntax:
Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)
Parameters:
Name | Description | Type/Default Value | Required / Optional |
---|---|---|---|
pat | Character sequence or regular expression. | str | Required |
case | If True, case sensitive. | bool Default Value: True | Optional |
flags | Flags to pass through to the re module, e.g. re.IGNORECASE. | int Default Value: 0 (no flags) | Optional |
na | Fill value for missing values. | Default Value: NaN | Optional |
regex | If True, assumes the pat is a regular expression. If False, treats the pat as a literal string. | bool Default Value: True | Optional |
Returns: Series or Index of boolean values
A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element of the Series or Index.
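As a rough mental model of how pat, case, flags and regex interact for a single element, the per-string test can be sketched in plain Python. The helper name contains_sketch is hypothetical and is not part of pandas; it only illustrates the semantics described in the parameter table, and it returns None for non-string input where pandas would use the na fill value.

```python
import re

def contains_sketch(value, pat, case=True, flags=0, regex=True):
    """Hypothetical per-element illustration; not pandas' own code."""
    if not isinstance(value, str):
        # Missing value: pandas would place the `na` fill value here.
        return None
    if regex:
        if not case:
            flags |= re.IGNORECASE
        # re.search finds the pattern anywhere in the string.
        return re.search(pat, value, flags) is not None
    if case:
        # Literal, case-sensitive substring test.
        return pat in value
    # Literal, case-insensitive substring test.
    return pat.upper() in value.upper()

print(contains_sketch('fox', 'ox'))                # True
print(contains_sketch('fox', 'oX'))                # False
print(contains_sketch('fox', 'oX', case=False))    # True
print(contains_sketch('60', '.0'))                 # True ('.' matches '6')
print(contains_sketch('60', '.0', regex=False))    # False
```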
Example - Returning a Series of booleans using only a literal pattern:
Python-Pandas Code:
import numpy as np
import pandas as pd
s1 = pd.Series(['Tiger', 'fox', 'house and men', '20', np.nan])
s1.str.contains('ox', regex=False)
Output:
0    False
1     True
2    False
3    False
4      NaN
dtype: object
Example - Returning an Index of booleans using only a literal pattern:
Python-Pandas Code:
import numpy as np
import pandas as pd
ind = pd.Index(['Tiger', 'fox', 'house and men', '20.0', np.nan])
ind.str.contains('20', regex=False)
Output:
Index([False, False, False, True, nan], dtype='object')
Example - Specifying case sensitivity using case:
Python-Pandas Code:
import numpy as np
import pandas as pd
s1 = pd.Series(['Tiger', 'fox', 'house and men', '20', np.nan])
s1.str.contains('oX', case=True, regex=True)
Output:
0    False
1    False
2    False
3    False
4      NaN
dtype: object
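For contrast, here is a minimal sketch of the same pattern with case=False. The Series is a shortened copy of s1 without the missing value, so that the results are plain booleans; the variable names are chosen for illustration only.

```python
import pandas as pd

# Same strings as s1, without the missing value.
s = pd.Series(['Tiger', 'fox', 'house and men', '20'])

# Case-sensitive (the default): 'oX' does not occur in any element.
mask_sensitive = s.str.contains('oX', case=True)

# Case-insensitive: 'oX' now matches the 'ox' in 'fox'.
mask_insensitive = s.str.contains('oX', case=False)

print(mask_sensitive.tolist())    # [False, False, False, False]
print(mask_insensitive.tolist())  # [False, True, False, False]
```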
Specifying na=False instead of the default NaN replaces the results for missing values with False. If the Series or Index contains no NaN values, the resulting dtype is bool; otherwise it is object.
Python-Pandas Code:
import numpy as np
import pandas as pd
s1 = pd.Series(['Tiger', 'fox', 'house and men', '20', np.nan])
s1.str.contains('ox', na=False, regex=True)
Output:
0    False
1     True
2    False
3    False
4    False
dtype: bool
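A common reason to pass na=False is to use the result as a boolean mask for filtering. The sketch below assumes the same data as s1; on pandas versions where the unfilled result has object dtype and contains NaN, using it directly as a mask may raise an error, which na=False avoids.

```python
import numpy as np
import pandas as pd

s = pd.Series(['Tiger', 'fox', 'house and men', '20', np.nan])

# na=False turns the result for the missing value into False,
# so the mask contains only booleans.
mask = s.str.contains('ox', na=False)

# Keep only the elements whose string contains 'ox'.
matches = s[mask]

print(matches.tolist())  # ['fox']
```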
Example - Returning True when either 'house' or 'fox' occurs in a string:
Python-Pandas Code:
import numpy as np
import pandas as pd
s1 = pd.Series(['Tiger', 'fox', 'house and men', '20', np.nan])
s1.str.contains('house|fox', regex=True)
Output:
0    False
1     True
2     True
3    False
4      NaN
dtype: object
Example - Ignoring case sensitivity using flags with regex:
Python-Pandas Code:
import re
import numpy as np
import pandas as pd
s1 = pd.Series(['Tiger', 'fox', 'house and men', '20', np.nan])
s1.str.contains('MEN', flags=re.IGNORECASE, regex=True)
Output:
0    False
1    False
2     True
3    False
4      NaN
dtype: object
Example - Matching any digit using a regular expression:
Python-Pandas Code:
import numpy as np
import pandas as pd
s1 = pd.Series(['Tiger', 'fox', 'house and men', '20', np.nan])
# Raw string so that \d reaches the regex engine unchanged
s1.str.contains(r'\d', regex=True)
Output:
0    False
1    False
2    False
3     True
4      NaN
dtype: object
Ensure pat is not a literal pattern when regex is set to True. In the following example one might expect only s2[1] and s2[3] to return True. However, '.0' as a regex matches any character followed by a 0, so s2[0] ('60') matches as well.
Python-Pandas Code:
import pandas as pd
s2 = pd.Series(['60', '60.0', '61', '61.0', '45'])
s2.str.contains('.0', regex=True)
Output:
0     True
1     True
2    False
3     True
4    False
dtype: bool
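To match '.0' literally, two options are to pass regex=False or to escape the pattern with re.escape from the standard library. The following is a minimal sketch using the same s2 data; the variable names literal and escaped are illustrative.

```python
import re
import pandas as pd

s2 = pd.Series(['60', '60.0', '61', '61.0', '45'])

# Option 1: treat the pattern as a plain substring.
literal = s2.str.contains('.0', regex=False)

# Option 2: keep regex matching, but escape the special character '.'.
escaped = s2.str.contains(re.escape('.0'), regex=True)

print(literal.tolist())  # [False, True, False, True, False]
print(escaped.tolist())  # [False, True, False, True, False]
```

Escaping is the more flexible choice when the literal text is only one part of a larger regular expression.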