w3resource

Pandas - Detecting and removing outliers in a DataFrame using Z-score


Pandas: Data Cleaning and Preprocessing Exercise-5 with Solution


Write a Pandas program to handle outliers in a DataFrame using the Z-score method.

This exercise demonstrates how to identify and remove outliers from a DataFrame using the Z-score method.

Sample Solution:

Code:

import pandas as pd

# Create a sample DataFrame with one extreme value
df = pd.DataFrame({
    'Name': ['David', 'Annabel', 'Charlie', 'David', 'Emma', 'Frank', 'Grace'],
    'Age': [25, 30, 22, 99, 28, 27, 24]  # 99 is the outlier (Z-score is about 2.26)
})

# Calculate Z-scores: distance from the mean in units of the sample standard deviation
mean_age = df['Age'].mean()
std_age = df['Age'].std()  # Series.std() uses ddof=1 by default
df['Z_Score'] = (df['Age'] - mean_age) / std_age

# Keep rows whose absolute Z-score is at most 2; anything beyond is treated as an outlier
df_no_outliers = df[df['Z_Score'].abs() <= 2]

# Drop the helper Z_Score column
df_no_outliers = df_no_outliers.drop(columns='Z_Score')

# Output the result
print(df_no_outliers)

Output:

      Name  Age
0    David   25
1  Annabel   30
2  Charlie   22
4     Emma   28
5    Frank   27
6    Grace   24
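
A caveat worth checking before choosing a threshold: with the sample standard deviation (ddof=1), the absolute Z-score of any value in a set of n values cannot exceed (n - 1) / sqrt(n). The sketch below illustrates this for four ages, where the ceiling is 1.5 and the value 99 reaches only about 1.49, so a threshold of 2 would flag nothing. The helper name max_abs_zscore is hypothetical and used only for illustration.

```python
import math

import pandas as pd


def max_abs_zscore(n: int) -> float:
    """Upper bound on |Z| for n values when the sample std (ddof=1) is used."""
    return (n - 1) / math.sqrt(n)


# Four ages with one extreme value
ages = pd.Series([25, 30, 22, 99])
z_scores = (ages - ages.mean()) / ages.std()  # Series.std() defaults to ddof=1

print(z_scores.abs().max())       # largest |Z| actually observed in the sample
print(max_abs_zscore(len(ages)))  # theoretical ceiling for four values
```

In practice this means the Z-score method needs a reasonably sized sample; with seven values the ceiling rises to roughly 2.27, which is just enough for a single extreme value to cross a threshold of 2.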

Explanation:

  • Created a DataFrame whose 'Age' column contains one extreme value (99).
  • Calculated a Z-score for each age: the value minus the column mean, divided by the sample standard deviation (Series.std() uses ddof=1 by default).
  • Kept only the rows whose absolute Z-score is at most 2; rows beyond that threshold are treated as outliers.
  • Dropped the helper Z_Score column and printed the filtered DataFrame.
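
The steps above can also be wrapped in a small reusable function that never adds a helper column to the original DataFrame. This is a minimal sketch, not the exercise's official solution: the function name remove_outliers_zscore, its threshold parameter, and the sample names and ages are assumptions chosen for illustration.

```python
import pandas as pd


def remove_outliers_zscore(df: pd.DataFrame, column: str, threshold: float = 2.0) -> pd.DataFrame:
    """Return the rows of df whose Z-score in `column` lies within +/- threshold."""
    std = df[column].std()  # sample standard deviation (ddof=1)
    if pd.isna(std) or std == 0:
        # Fewer than two values, or all values identical: nothing can be flagged
        return df.copy()
    z_scores = (df[column] - df[column].mean()) / std
    # Rows with a missing value get a NaN Z-score and are dropped by this comparison
    return df[z_scores.abs() <= threshold].copy()


# Hypothetical sample data with one extreme age
people = pd.DataFrame({
    'Name': ['David', 'Annabel', 'Charlie', 'David', 'Emma', 'Frank', 'Grace'],
    'Age': [25, 30, 22, 99, 28, 27, 24]
})

filtered = remove_outliers_zscore(people, 'Age')
print(filtered)
```

Because the Z-scores live in a temporary Series, the input DataFrame is left unchanged and there is no column to drop afterwards. The threshold is a judgment call; 2 and 3 are both common choices.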
