Pandas - Detecting and removing outliers in a DataFrame using Z-score
Pandas: Data Cleaning and Preprocessing Exercise-5 with Solution
Write a Pandas program to handle outliers in a DataFrame with the Z-score method.
This exercise demonstrates how to identify and remove outliers from a DataFrame using the Z-score method.
Sample Solution:
Code:
import pandas as pd

# Create a sample DataFrame with an intended outlier
df = pd.DataFrame({
    'Name': ['David', 'Annabel', 'Charlie', 'David'],
    'Age': [25, 30, 22, 99]  # 99 is the intended outlier
})

# Calculate Z-scores: distance from the mean in units of the
# sample standard deviation (Series.std() uses ddof=1 by default)
mean_age = df['Age'].mean()
std_age = df['Age'].std()
df['Z_Score'] = (df['Age'] - mean_age) / std_age

# Keep only rows whose absolute Z-score is at most 2
df_no_outliers = df[df['Z_Score'].abs() <= 2]

# Drop the helper Z_Score column
df_no_outliers = df_no_outliers.drop(columns='Z_Score')

# Output the result
print(df_no_outliers)
Output:
      Name  Age
0    David   25
1  Annabel   30
2  Charlie   22
3    David   99
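The output above still lists the row with Age 99. The following minimal sketch, which uses only the four sample ages, prints the Z-scores to show why. It also computes a known upper bound on the absolute Z-score of any point in a sample of size n when the sample standard deviation (ddof=1) is used: (n - 1) / sqrt(n). The variable names `ages`, `z_scores`, and `max_possible_z` are illustrative and not part of the original solution.

```python
import math

import pandas as pd

# The four ages from the sample DataFrame
ages = pd.Series([25, 30, 22, 99], name="Age")

# Same formula as the solution: Series.std() defaults to ddof=1
z_scores = (ages - ages.mean()) / ages.std()
print(z_scores.round(3))

# Upper bound on |Z| for any single point in a sample of size n
# when the sample standard deviation is used
n = len(ages)
max_possible_z = (n - 1) / math.sqrt(n)
print(max_possible_z)  # 1.5
```

With four rows no Z-score can exceed 1.5, so a threshold of 2 never removes anything from a sample this small.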
Explanation:
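- Created a DataFrame with an intended outlier (99) in the 'Age' column.
- Calculated a Z-score for each age: its distance from the column mean, measured in sample standard deviations.
- Kept only rows whose absolute Z-score is at most 2. In this four-row sample the Z-score of 99 is about 1.49, so the row is not flagged and the output still contains it; a larger sample or a lower threshold is needed for the filter to remove it.
- Dropped the helper 'Z_Score' column and printed the resulting DataFrame.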
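The steps above can be wrapped in a small reusable function. This is a sketch under stated assumptions rather than a definitive implementation: the helper name `remove_outliers_zscore`, its `threshold` parameter, and the eleven-row `people` sample are hypothetical and do not come from the original exercise. The larger sample is used so that the value 99 can actually exceed a threshold of 2.

```python
import pandas as pd


def remove_outliers_zscore(df, column, threshold=2.0):
    """Return a copy of df without rows whose |Z-score| in `column` exceeds threshold."""
    values = df[column]
    std = values.std()  # sample standard deviation (ddof=1)
    if pd.isna(std) or std == 0:
        # Fewer than two values, or all values identical: Z-scores are undefined
        return df.copy()
    z_scores = (values - values.mean()) / std
    # Filter with a temporary Series, so no helper column has to be dropped
    return df[z_scores.abs() <= threshold].copy()


# Hypothetical sample: ten typical ages plus the value 99
people = pd.DataFrame({
    "Name": [f"Person{i}" for i in range(1, 12)],
    "Age": [25, 30, 22, 27, 24, 29, 26, 31, 23, 28, 99],
})

cleaned = remove_outliers_zscore(people, "Age")
print(cleaned)
```

In this sample the Z-score of 99 is about 2.99, so that row is filtered out and the ten remaining rows are kept.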