Pandas: Extract years between 1800 and 2200 from a specified column of a given DataFrame
Pandas: String and Regular Expression Exercise-29 with Solution
Write a Pandas program to extract years between 1800 and 2200 (inclusive) from a specified column of a given DataFrame.
Sample Solution:
Python Code:
import pandas as pd
import re

pd.set_option('display.max_columns', 10)

df = pd.DataFrame({
    'company_code': ['c0001', 'c0002', 'c0003', 'c0003', 'c0004'],
    'year': ['year 1800', 'year 1700', 'year 2300', 'year 1900', 'year 2200']
})
print("Original DataFrame:")
print(df)

def find_year(text):
    # Match standalone four-digit years from 1800 to 2200 inclusive:
    # 1800-1999 via 1[89]xx, 2000-2199 via 2[01]xx, plus 2200 itself.
    return re.findall(r"\b(1[89][0-9]{2}|2[01][0-9]{2}|2200)\b", text)

# Each cell of 'year_range' holds the list of matching years (empty if none).
df['year_range'] = df['year'].apply(find_year)
print("\nExtracting years between 1800 and 2200:")
print(df)
Sample Output:
Original DataFrame:
  company_code       year
0        c0001  year 1800
1        c0002  year 1700
2        c0003  year 2300
3        c0003  year 1900
4        c0004  year 2200

Extracting years between 1800 and 2200:
  company_code       year year_range
0        c0001  year 1800     [1800]
1        c0002  year 1700         []
2        c0003  year 2300         []
3        c0003  year 1900     [1900]
4        c0004  year 2200     [2200]
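An alternative that avoids encoding the numeric range in the regular expression is sketched below. It is not part of the original solution: it assumes each cell contains at most one year of interest, pulls out the first standalone four-digit number with Series.str.extract, converts it to a number, and then filters with Series.between. The column name year_in_range is a hypothetical name chosen for this illustration. Unlike the solution above, it stores a single integer (or a missing value) per row rather than a list.

```python
import pandas as pd

df = pd.DataFrame({
    'company_code': ['c0001', 'c0002', 'c0003', 'c0003', 'c0004'],
    'year': ['year 1800', 'year 1700', 'year 2300', 'year 1900', 'year 2200'],
})

# Pull the first standalone four-digit number out of each string.
# expand=False returns a Series; rows without a match become missing.
candidates = pd.to_numeric(df['year'].str.extract(r'\b(\d{4})\b', expand=False))

# Keep a value only when it lies in the inclusive range 1800..2200.
# 'year_in_range' is a hypothetical column name used for this sketch;
# the nullable Int64 dtype lets out-of-range rows show as <NA>.
in_range = candidates.between(1800, 2200)
df['year_in_range'] = candidates.where(in_range).astype('Int64')

print(df)
```

The range check is ordinary numeric comparison here, so changing the bounds means editing two integers instead of rewriting the pattern. If a cell can contain several years, Series.str.findall with the original pattern is the closer equivalent.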