Pandas: Replacing NaNs using Median/Mean of the column
Pandas Handling Missing Values: Exercise-14 with Solution
Write a Pandas program to replace NaNs with median or mean of the specified columns in a given DataFrame.
Test Data:
ord_no purch_amt sale_amt ord_date customer_id salesman_id 0 70001.0 150.50 10.50 2012-10-05 3002 5002.0 1 NaN NaN 20.65 2012-09-10 3001 5003.0 2 70002.0 65.26 NaN NaN 3001 5001.0 3 70004.0 110.50 11.50 2012-08-17 3003 NaN 4 NaN 948.50 98.50 2012-09-10 3002 5002.0 5 70005.0 NaN NaN 2012-07-27 3001 5001.0 6 NaN 5760.00 57.00 2012-09-10 3001 5001.0 7 70010.0 1983.43 19.43 2012-10-10 3004 NaN 8 70003.0 NaN NaN 2012-10-10 3003 5003.0 9 70012.0 250.45 25.45 2012-06-27 3002 5002.0 10 NaN 75.29 75.29 2012-08-17 3001 5003.0 11 70013.0 3045.60 35.60 2012-04-25 3001 NaN
Sample Solution:
Python Code :
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'ord_no':[70001,np.nan,70002,70004,np.nan,70005,np.nan,70010,70003,70012,np.nan,70013],
'purch_amt':[150.5,np.nan,65.26,110.5,948.5,np.nan,5760,1983.43,np.nan,250.45, 75.29,3045.6],
'sale_amt':[10.5,20.65,np.nan,11.5,98.5,np.nan,57,19.43,np.nan,25.45, 75.29,35.6],
'ord_date': ['2012-10-05','2012-09-10',np.nan,'2012-08-17','2012-09-10','2012-07-27','2012-09-10','2012-10-10','2012-10-10','2012-06-27','2012-08-17','2012-04-25'],
'customer_id':[3002,3001,3001,3003,3002,3001,3001,3004,3003,3002,3001,3001],
'salesman_id':[5002,5003,5001,np.nan,5002,5001,5001,np.nan,5003,5002,5003,np.nan]})
print("Original Orders DataFrame:")
print(df)
print("Using median in purch_amt to replace NaN:")
df['purch_amt'].fillna(df['purch_amt'].median(), inplace=True)
print(df)
print("Using mean to replace NaN:")
df['sale_amt'].fillna(int(df['sale_amt'].mean()), inplace=True)
print(df)
Sample Output:
Original Orders DataFrame: ord_no purch_amt ... customer_id salesman_id 0 70001.0 150.50 ... 3002 5002.0 1 NaN NaN ... 3001 5003.0 2 70002.0 65.26 ... 3001 5001.0 3 70004.0 110.50 ... 3003 NaN 4 NaN 948.50 ... 3002 5002.0 5 70005.0 NaN ... 3001 5001.0 6 NaN 5760.00 ... 3001 5001.0 7 70010.0 1983.43 ... 3004 NaN 8 70003.0 NaN ... 3003 5003.0 9 70012.0 250.45 ... 3002 5002.0 10 NaN 75.29 ... 3001 5003.0 11 70013.0 3045.60 ... 3001 NaN [12 rows x 6 columns] Using median in purch_amt to replace NaN: ord_no purch_amt ... customer_id salesman_id 0 70001.0 150.50 ... 3002 5002.0 1 NaN 250.45 ... 3001 5003.0 2 70002.0 65.26 ... 3001 5001.0 3 70004.0 110.50 ... 3003 NaN 4 NaN 948.50 ... 3002 5002.0 5 70005.0 250.45 ... 3001 5001.0 6 NaN 5760.00 ... 3001 5001.0 7 70010.0 1983.43 ... 3004 NaN 8 70003.0 250.45 ... 3003 5003.0 9 70012.0 250.45 ... 3002 5002.0 10 NaN 75.29 ... 3001 5003.0 11 70013.0 3045.60 ... 3001 NaN [12 rows x 6 columns] Using mean to replace NaN: ord_no purch_amt ... customer_id salesman_id 0 70001.0 150.50 ... 3002 5002.0 1 NaN 250.45 ... 3001 5003.0 2 70002.0 65.26 ... 3001 5001.0 3 70004.0 110.50 ... 3003 NaN 4 NaN 948.50 ... 3002 5002.0 5 70005.0 250.45 ... 3001 5001.0 6 NaN 5760.00 ... 3001 5001.0 7 70010.0 1983.43 ... 3004 NaN 8 70003.0 250.45 ... 3003 5003.0 9 70012.0 250.45 ... 3002 5002.0 10 NaN 75.29 ... 3001 5003.0 11 70013.0 3045.60 ... 3001 NaN [12 rows x 6 columns]
Python Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Write a Pandas program to replace NaNs with the value from the previous row or the next row in a given DataFrame.
Next: Write a Pandas program to interpolate the missing values using the Linear Interpolation method in a given DataFrame.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
It will be nice if you may share this link in any developer community or anywhere else, from where other developers may find this content. Thanks.
https://w3resource.com/python-exercises/pandas/missing-values/python-pandas-missing-values-exercise-14.php
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics