w3resource

Applying Log transformation to Skewed data using Pandas


Pandas: Machine Learning Integration Exercise-17 with Solution


Write a Pandas program that applies Log Transformation to Skewed Data.

This exercise shows how to apply a log transformation to right-skewed numerical data. The transform compresses large values and makes the distribution more symmetric.
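The short sketch below shows why the transform helps. The natural log turns multiplicative gaps into equal additive steps, which pulls in a long right tail. The values are illustrative only and are not taken from the exercise data.

```python
import numpy as np

# Illustrative values, each 10x the previous one (a long right tail)
values = np.array([10.0, 100.0, 1000.0, 10000.0])

# After the log, the gaps between consecutive values become equal
logs = np.log(values)

print(logs)
print(np.diff(logs))  # every step equals ln(10), about 2.303
```

Raw gaps of 90, 900 and 9000 become identical steps on the log scale. This is the effect that reduces skewness in salary-like data.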

Sample Solution:

Code:

import pandas as pd
import numpy as np

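# Load the dataset (assumes data.csv contains a numeric 'Salary' column)
df = pd.read_csv('data.csv')

# Apply a log transformation to the 'Salary' column.
# np.log1p(x) computes log(1 + x): adding 1 avoids log(0) = -inf for zero salaries,
# and missing values (NaN) remain NaN.
df['Log_Salary'] = np.log1p(df['Salary'])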

# Output the transformed dataset
print(df[['Salary', 'Log_Salary']])

Output:

    Salary  Log_Salary
0  50000.0   10.819798
1  60000.0   11.002117
2  70000.0   11.156265
3  80000.0   11.289794
4  55000.0   10.915107
5      NaN         NaN
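The contents of data.csv are not shown, so here is a self-contained sketch that builds the DataFrame inline. It uses the Salary values from the output above as a hypothetical stand-in for the file. It also uses np.log1p, which computes log(1 + x). This gives the same result as np.log(x + 1) and is more accurate when x is very small.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: Salary values copied from the output above
df = pd.DataFrame({"Salary": [50000.0, 60000.0, 70000.0, 80000.0, 55000.0, np.nan]})

# log1p(x) computes log(1 + x); the NaN in the last row propagates unchanged
df["Log_Salary"] = np.log1p(df["Salary"])

# Same values as the np.log(Salary + 1) formulation (NaN compared as equal)
assert np.allclose(df["Log_Salary"], np.log(df["Salary"] + 1), equal_nan=True)

print(df[["Salary", "Log_Salary"]])
```

Running this should print the same table as the output above. Row 5 shows that a missing salary simply stays missing after the transform.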

Explanation:

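  • Loaded the dataset from data.csv with pd.read_csv().
  • Applied log(1 + x) to the 'Salary' column to compress large values and reduce right skew. Adding 1 keeps zero salaries defined, because log(0) is -inf. Missing values (NaN) stay NaN, as row 5 shows.
  • Printed the original and log-transformed columns side by side.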
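You can check that the transform actually reduces skewness by comparing pandas' Series.skew() before and after. The sketch below uses synthetic log-normally distributed salaries, since the real data.csv is not shown. The mean, sigma, size and seed are arbitrary assumptions. For data like this, the skewness after the log transform is typically close to zero.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed salaries (assumption: log-normal, fixed seed for repeatability)
rng = np.random.default_rng(42)
salary = pd.Series(rng.lognormal(mean=10.8, sigma=0.5, size=1000), name="Salary")

# Same transform as in the exercise: log(1 + x)
log_salary = np.log1p(salary)

raw_skew = salary.skew()      # sample skewness of the original values
log_skew = log_salary.skew()  # sample skewness after the log transform

print(f"Skewness before: {raw_skew:.3f}")
print(f"Skewness after:  {log_skew:.3f}")
```

A positive skewness means a long right tail. A value near zero after the transform suggests the log scale is a better fit for this column. Real salary data will not be exactly log-normal, so the reduction may be smaller.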
