Removing columns with too many missing values using dropna() in Pandas
Pandas: Data Cleaning and Preprocessing Exercise-13 with Solution
Write a Pandas program to remove columns with too many missing values.
The following exercise removes columns that contain too many missing values using dropna().
Sample Solution:
Code:
import pandas as pd
# Create a sample DataFrame with missing values
df = pd.DataFrame({
    'Name': ['Selena', 'Annabel', 'Caeso'],
    'Age': [25, None, 22],
    'Salary': [None, None, 70000]
})
# Keep only columns with at least 2 non-missing values; with 3 rows,
# this removes columns with more than 50% missing values
df_cleaned = df.dropna(thresh=2, axis=1)
# Output the result
print(df_cleaned)
Output:
      Name   Age
0   Selena  25.0
1  Annabel   NaN
2    Caeso  22.0
Explanation:
- Created a DataFrame with multiple columns containing missing values.
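- Used dropna(thresh=2, axis=1) to keep only columns with at least 2 non-missing values. In this 3-row DataFrame, that removes columns with more than 50% missing values (here, Salary).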
- Returned the DataFrame with only columns that have sufficient data.
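Note that thresh counts non-missing values, so thresh=2 corresponds to the 50% rule only because this DataFrame has 3 rows. The following is a minimal sketch of one way to express the rule as a fraction instead, so it does not depend on the row count. The helper name drop_sparse_columns and its max_missing parameter are hypothetical and not part of Pandas; only isna(), notna(), mean(), sum() and loc are library calls.

```python
import pandas as pd


def drop_sparse_columns(frame: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: keep columns whose share of missing values
    is at most max_missing."""
    # isna().mean() gives the fraction of missing values in each column
    missing_share = frame.isna().mean()
    return frame.loc[:, missing_share <= max_missing]


# Same sample DataFrame as in the solution above
df = pd.DataFrame({
    'Name': ['Selena', 'Annabel', 'Caeso'],
    'Age': [25, None, 22],
    'Salary': [None, None, 70000]
})

# Non-missing values per column: Name 3, Age 2, Salary 1
print(df.notna().sum())

# Drop columns where more than 50% of the values are missing
df_cleaned = drop_sparse_columns(df, max_missing=0.5)
print(df_cleaned.columns.tolist())  # ['Name', 'Age']
```

For this sample data the result matches df.dropna(thresh=2, axis=1): Salary is removed because 2 of its 3 values are missing, while Age is kept because only 1 of 3 is missing.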