Pandas - Detecting and removing outliers in a DataFrame using Z-score
5. Handling Outliers with Z-Score Method
Write a Pandas program to handle outliers in a DataFrame with Z-Score method.
This exercise demonstrates how to identify and remove outliers from a DataFrame using the Z-score method.
Sample Solution:
Code:
import pandas as pd
# Create a sample DataFrame with one extreme value
df = pd.DataFrame({
    'Name': ['David', 'Annabel', 'Charlie', 'David'],
    'Age': [25, 30, 22, 99]  # 99 is the intended outlier
})
# Calculate Z-scores: distance of each value from the mean, in standard deviations
mean_age = df['Age'].mean()
std_age = df['Age'].std()  # sample standard deviation (pandas default, ddof=1)
df['Z_Score'] = (df['Age'] - mean_age) / std_age
# Keep rows whose absolute Z-score is at most 2; rows beyond that are treated as outliers
df_no_outliers = df[df['Z_Score'].abs() <= 2]
# Drop the Z_Score column
df_no_outliers = df_no_outliers.drop(columns='Z_Score')
# Output the result
print(df_no_outliers)
Output:
      Name  Age
0    David   25
1  Annabel   30
2  Charlie   22
3    David   99
Explanation:
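- Created a DataFrame whose 'Age' column contains one extreme value (99).
- Calculated a Z-score for each age: the value minus the column mean, divided by the column's sample standard deviation.
- Kept only rows whose absolute Z-score is at most 2. In this sample, 99 has a Z-score of about 1.49 (mean 44, sample standard deviation about 36.8), so no row is removed and the output still contains all four rows. With the sample standard deviation, n values can never produce an absolute Z-score above (n - 1) / sqrt(n), which is 1.5 for n = 4.
- Dropped the helper Z_Score column and printed the filtered DataFrame.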
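The filter only removes a row when the sample is large enough for an extreme value to exceed the threshold. The following is a minimal sketch of the same steps on a larger sample; the eleven ages are hypothetical values made up for illustration and are not part of the original exercise.

```python
import pandas as pd

# Hypothetical sample (an assumption for illustration):
# ten ordinary ages plus one extreme value, 99
ages = [25, 30, 22, 28, 27, 31, 24, 26, 29, 23, 99]
df = pd.DataFrame({'Age': ages})

# Z-score using the sample standard deviation (pandas default, ddof=1)
z_scores = (df['Age'] - df['Age'].mean()) / df['Age'].std()

# Keep only rows whose absolute Z-score is at most 2
df_no_outliers = df[z_scores.abs() <= 2]

print(df_no_outliers['Age'].tolist())
```

Here the Z-score of 99 is close to 3, so that row is dropped while the other ten ages are kept. Computing the scores in a separate Series, rather than as a column, avoids having to drop a helper column afterwards.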
For more Practice: Solve these Related Problems:
- Write a Pandas program to identify and remove outliers using Z-Score on a specific numeric column.
- Write a Pandas program to calculate the Z-Score for each row and filter out rows exceeding a given threshold.
- Write a Pandas program to visualize the distribution of Z-Scores and highlight potential outliers in a DataFrame.
- Write a Pandas program to replace detected outliers with the median value using the Z-Score method.
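As a starting point for the last problem, here is one possible sketch of replacing detected outliers with the median instead of dropping them. The function name `replace_outliers_with_median`, its parameters, and the sample ages are all hypothetical choices for illustration; other designs are equally valid.

```python
import pandas as pd

def replace_outliers_with_median(df, column, threshold=2.0):
    """Return a copy of df where Z-score outliers in `column` are set to the column median."""
    result = df.copy()
    # Z-score with the sample standard deviation (pandas default, ddof=1)
    z = (result[column] - result[column].mean()) / result[column].std()
    # The median is computed over all values, including any outliers
    median = result[column].median()
    # Series.mask replaces values where the condition is True
    result[column] = result[column].mask(z.abs() > threshold, median)
    return result

# Hypothetical sample data (an assumption, not from the exercise)
sample = pd.DataFrame({'Age': [25, 30, 22, 28, 27, 31, 24, 26, 29, 23, 99]})
cleaned = replace_outliers_with_median(sample, 'Age')
print(cleaned['Age'].tolist())
```

Replacing rather than removing keeps the row count unchanged, which can matter when other columns in the same rows are still needed. The function works on a copy, so the input DataFrame is left untouched.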