Pandas: Remove the duplicates of a specific column in a given dataframe
5. Duplicate Removal in 'WHO region'
Write a Pandas program to remove rows with duplicate values in the 'WHO region' column of the World alcohol consumption dataset, keeping the first row found for each region.
Test Data:
   Year       WHO region                Country  Beverage Types  Display Value
0  1986  Western Pacific               Viet Nam            Wine           0.00
1  1986         Americas                Uruguay           Other           0.50
2  1985           Africa           Cte d'Ivoire            Wine           1.62
3  1986         Americas               Colombia            Beer           4.27
4  1987         Americas  Saint Kitts and Nevis            Beer           1.98
Sample Solution:
Python Code:
import pandas as pd

# Load the World alcohol consumption data
w_a_con = pd.read_csv('world_alcohol.csv')
print("World alcohol consumption sample data:")
print(w_a_con.head())
# Keep only the first row seen for each distinct 'WHO region' value
print("\nAfter removing the duplicates of WHO region column:")
print(w_a_con.drop_duplicates(subset='WHO region'))
Sample Output:
World alcohol consumption sample data:
   Year       WHO region  ...  Beverage Types  Display Value
0  1986  Western Pacific  ...            Wine           0.00
1  1986         Americas  ...           Other           0.50
2  1985           Africa  ...            Wine           1.62
3  1986         Americas  ...            Beer           4.27
4  1987         Americas  ...            Beer           1.98

[5 rows x 5 columns]

After removing the duplicates of WHO region column:
    Year             WHO region  ...  Beverage Types  Display Value
0   1986        Western Pacific  ...            Wine           0.00
1   1986               Americas  ...           Other           0.50
2   1985                 Africa  ...            Wine           1.62
13  1984  Eastern Mediterranean  ...           Other           0.00
18  1984                 Europe  ...         Spirits           1.62
20  1986        South-East Asia  ...            Wine           0.00

[6 rows x 5 columns]
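The behaviour can also be checked without the CSV file. The following is a minimal sketch that rebuilds only the five test rows as an in-memory DataFrame; the variable names `sample`, `first_rows`, `last_rows` and `renumbered` are assumptions made for illustration and are not part of the original solution. It shows that drop_duplicates keeps the original index labels and that the keep parameter decides which row of each region survives.

```python
import pandas as pd

# Hypothetical in-memory copy of the five test rows (not the full CSV)
sample = pd.DataFrame({
    "Year": [1986, 1986, 1985, 1986, 1987],
    "WHO region": ["Western Pacific", "Americas", "Africa", "Americas", "Americas"],
    "Country": ["Viet Nam", "Uruguay", "Cte d'Ivoire", "Colombia", "Saint Kitts and Nevis"],
    "Beverage Types": ["Wine", "Other", "Wine", "Beer", "Beer"],
    "Display Value": [0.00, 0.50, 1.62, 4.27, 1.98],
})

# keep='first' is the default: the first row of each region survives
first_rows = sample.drop_duplicates(subset="WHO region")
print(first_rows)

# keep='last' retains the last row of each region instead
last_rows = sample.drop_duplicates(subset="WHO region", keep="last")
print(last_rows)

# ignore_index=True renumbers the result from 0 instead of keeping old labels
renumbered = sample.drop_duplicates(subset="WHO region", keep="last", ignore_index=True)
print(renumbered.index.tolist())
```

In both cases the result is a new DataFrame; `sample` itself is left unchanged unless inplace=True is passed.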
For more Practice: Solve these Related Problems:
- Write a Pandas program to extract unique values from the 'WHO region' column and then sort them in descending order.
- Write a Pandas program to drop duplicate rows based solely on the 'WHO region' column while keeping the last occurrence.
- Write a Pandas program to identify duplicate entries in 'WHO region' and then create a new DataFrame with only the unique values.
- Write a Pandas program to remove duplicates in 'WHO region' and then count the frequency of each unique region in the dataset.
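As a starting point for these problems, the sketch below shows the building blocks they rely on: unique(), duplicated() and value_counts(). It is one possible approach, not a reference solution, and it uses a small hypothetical Series named `regions` holding only the five test values rather than the full 'WHO region' column from the CSV.

```python
import pandas as pd

# Hypothetical stand-in for the 'WHO region' column (test rows only)
regions = pd.Series(
    ["Western Pacific", "Americas", "Africa", "Americas", "Americas"],
    name="WHO region",
)

# Unique values, sorted in descending (reverse alphabetical) order
unique_desc = sorted(regions.unique(), reverse=True)
print(unique_desc)

# Boolean mask: True for every repeat after the first occurrence
dup_mask = regions.duplicated()
print(regions[dup_mask])

# Frequency of each region, counted on the full column
counts = regions.value_counts()
print(counts)
```

Note that counting frequencies after the duplicates have been dropped would give 1 for every region, so this sketch counts on the full column; adjust it if the exercise is read differently.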