Applying Log transformation to Skewed data using Pandas
Pandas: Machine Learning Integration Exercise-17 with Solution
Write a Pandas program that applies Log Transformation to Skewed Data.
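This exercise shows how to apply a log transformation to right-skewed numerical data. Taking the logarithm compresses large values more than small ones, which pulls in the long right tail and makes the distribution more symmetric.
Sample Solution:
Code: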
import pandas as pd
import numpy as np
# Load the dataset
df = pd.read_csv('data.csv')
# Apply log transformation to the 'Salary' column
# np.log1p(x) computes log(1 + x), so a salary of 0 maps to 0 instead of -inf
df['Log_Salary'] = np.log1p(df['Salary'])
# Output the transformed dataset
print(df[['Salary', 'Log_Salary']])
Output:
    Salary  Log_Salary
0  50000.0   10.819798
1  60000.0   11.002117
2  70000.0   11.156265
3  80000.0   11.289794
4  55000.0   10.915107
5      NaN         NaN
Explanation:
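- Loaded the dataset from 'data.csv' using Pandas.
- Applied a log transformation to the 'Salary' column to reduce skewness, adding 1 to each value first so that a salary of 0 does not produce log(0).
- Displayed the original and log-transformed columns side by side; the missing salary in row 5 remains NaN after the transformation.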
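To check whether the transformation actually reduces skewness, one option is to compare the sample skewness before and after. The sketch below is illustrative only: it uses synthetic log-normal salaries generated with NumPy (an assumption, since the contents of 'data.csv' are not shown) and the Pandas `Series.skew()` method. It also shows that `np.expm1` undoes `np.log1p`, which is useful when results need to be reported on the original scale.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed salaries (hypothetical data, not taken from data.csv)
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"Salary": rng.lognormal(mean=11.0, sigma=0.8, size=1_000)})

# log1p(x) computes log(1 + x), equivalent to np.log(x + 1)
df["Log_Salary"] = np.log1p(df["Salary"])

# Sample skewness: values near 0 indicate a roughly symmetric distribution
skew_before = df["Salary"].skew()
skew_after = df["Log_Salary"].skew()
print(f"Skewness before: {skew_before:.3f}")
print(f"Skewness after:  {skew_after:.3f}")

# expm1(y) computes exp(y) - 1, which inverts log1p and restores the original scale
recovered = np.expm1(df["Log_Salary"])
print("Round trip OK:", np.allclose(recovered, df["Salary"]))
```

On data like this, the skewness of the transformed column should be much closer to 0 than that of the raw column. Note that a log transformation of this kind is only defined for values greater than -1, so columns containing negative numbers need a different approach.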