Removing outliers from the Dataset using Z-Score method in Pandas
Pandas: Machine Learning Integration Exercise-10 with Solution
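Write a Pandas program that removes outliers from a dataset.
This exercise demonstrates how to remove outliers from a dataset using the Z-score method. The method measures how many standard deviations each value lies from the column mean and treats values 3 or more standard deviations away as outliers.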
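The Z-score of a value x is (x - mean) / standard deviation. The minimal sketch below computes it by hand for a small, made-up 'Age' column and checks the result against stats.zscore(). The data is a hypothetical illustration, not the contents of data.csv. SciPy uses the population standard deviation (ddof=0) by default, whereas pandas' Series.std() defaults to ddof=1, so ddof=0 is passed explicitly.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical ages: 19 typical values plus one extreme value (120)
ages = pd.Series([25, 28, 30, 32, 27, 29, 31, 26, 33, 30,
                  28, 29, 31, 27, 30, 32, 29, 28, 30, 120], name="Age")

# Z-score by hand: distance from the mean in units of standard deviation.
# ddof=0 matches scipy's default (population standard deviation).
z_manual = (ages - ages.mean()) / ages.std(ddof=0)

# The same values computed by scipy
z_scipy = stats.zscore(ages)
print(np.allclose(z_manual, z_scipy))  # True

# Values 3 or more standard deviations from the mean are treated as outliers
outliers = ages[np.abs(z_manual) >= 3]
print(outliers)
```

With ddof=0, the largest possible absolute Z-score in a column of n values is sqrt(n - 1). This means that in a column of 10 or fewer values, no Z-score can exceed 3, so the 3-standard-deviation rule only starts flagging values in larger columns.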
Sample Solution:
Code:
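import pandas as pd
import numpy as np
from scipy import stats

# Load the dataset
df = pd.read_csv('data.csv')

# Z-score of each 'Age' value: (Age - mean) / standard deviation.
# Note: stats.zscore() propagates NaN by default, so a single missing
# Age makes every Z-score NaN and the filter below keeps no rows.
z_scores = np.abs(stats.zscore(df['Age']))

# Keep rows whose absolute Z-score is below 3; the rest are treated as outliers
df_cleaned = df[z_scores < 3]

# Output the cleaned dataset
print(df_cleaned)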
Output:
Empty DataFrame
Columns: [ID, Name, Age, Gender, Salary, Target]
Index: []
Explanation:
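- Loaded the dataset from 'data.csv' with pd.read_csv().
- Calculated the Z-scores of the 'Age' column with stats.zscore() and took their absolute values with np.abs().
- Kept only the rows whose absolute Z-score is below 3. Rows 3 or more standard deviations from the mean are treated as outliers and dropped.
- Printed the cleaned dataset.
The empty DataFrame in the output means no row passed the filter. In a non-empty column, it is impossible for every value to lie 3 or more standard deviations from the mean, so an empty result points to NaN Z-scores. This happens when 'Age' contains a missing value (stats.zscore() propagates NaN by default) or when every Age is identical (zero standard deviation). In both cases, NaN < 3 evaluates to False for every row.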
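The sketch below reproduces the filter on a small, hypothetical stand-in for data.csv that has only 'ID' and 'Age' columns, one missing Age, and one extreme Age. It first shows that the original stats.zscore() call keeps no rows when a NaN is present. It then computes the Z-scores with pandas, whose mean() and std() skip missing values by default. Keeping rows with a missing Age is an assumption made here; drop them instead if they should not survive cleaning.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for data.csv: one missing Age and one extreme Age (120)
ages = [25, 28, 30, 32, 27, 29, 31, 26, 33, 30,
        28, 29, 31, 27, 30, 32, 29, 28, 30, 120, np.nan]
df = pd.DataFrame({"ID": range(1, len(ages) + 1), "Age": ages})

# Original approach: the NaN makes every Z-score NaN, and NaN < 3 is False
z_propagate = np.abs(stats.zscore(df["Age"]))
print(len(df[z_propagate < 3]))  # 0

# NaN-aware alternative: pandas' mean() and std() skip missing values
age = df["Age"]
z = (age - age.mean()) / age.std(ddof=0)  # ddof=0 matches scipy's default

# Assumption: rows with a missing Age are kept; only outliers are dropped
df_cleaned = df[z.isna() | (z.abs() < 3)]
print(df_cleaned)
```

In recent SciPy versions, stats.zscore(df['Age'], nan_policy='omit') is another way to ignore missing values. It returns NaN for the missing entries, so those rows fail the `< 3` test and are dropped rather than kept.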
https://w3resource.com/python-exercises/pandas/pandas-remove-outliers-from-the-dataset-using-z-score-method.php