w3resource

Pandas: Remove the duplicates of a specific column in a given dataframe


5. Duplicate Removal in 'WHO region'

Write a Pandas program to remove the duplicates from the 'WHO region' column of the World alcohol consumption dataset.

Test Data:

   Year       WHO region                Country Beverage Types  Display Value
0  1986  Western Pacific               Viet Nam           Wine           0.00
1  1986         Americas                Uruguay          Other           0.50
2  1985           Africa          Côte d'Ivoire           Wine           1.62
3  1986         Americas               Colombia           Beer           4.27
4  1987         Americas  Saint Kitts and Nevis           Beer           1.98   

Sample Solution:

Python Code:

import pandas as pd

# Load the World alcohol consumption data
w_a_con = pd.read_csv('world_alcohol.csv')
print("World alcohol consumption sample data:")
print(w_a_con.head())

# Keep only the first row found for each distinct 'WHO region' value
print("\nAfter removing the duplicates of WHO region column:")
print(w_a_con.drop_duplicates(subset='WHO region'))

Sample Output:

World alcohol consumption sample data:
   Year       WHO region      ...      Beverage Types Display Value
0  1986  Western Pacific      ...                Wine          0.00
1  1986         Americas      ...               Other          0.50
2  1985           Africa      ...                Wine          1.62
3  1986         Americas      ...                Beer          4.27
4  1987         Americas      ...                Beer          1.98

[5 rows x 5 columns]

After removing the duplicates of WHO region column:
    Year             WHO region      ...      Beverage Types Display Value
0   1986        Western Pacific      ...                Wine          0.00
1   1986               Americas      ...               Other          0.50
2   1985                 Africa      ...                Wine          1.62
13  1984  Eastern Mediterranean      ...               Other          0.00
18  1984                 Europe      ...             Spirits          1.62
20  1986        South-East Asia      ...                Wine          0.00

[6 rows x 5 columns]
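
The row labels 13, 18 and 20 in the output above are the original index positions of the first row found for each region. The following is a minimal sketch of that behaviour, assuming only the five Test Data rows built in memory (so it runs without world_alcohol.csv) and a pandas version of 1.0 or later for the ignore_index option. The names sample, first_per_region and renumbered are illustrative only.

```python
import pandas as pd

# The five rows from the Test Data table, built in memory (no CSV needed).
sample = pd.DataFrame({
    "Year": [1986, 1986, 1985, 1986, 1987],
    "WHO region": ["Western Pacific", "Americas", "Africa", "Americas", "Americas"],
    "Country": ["Viet Nam", "Uruguay", "Côte d'Ivoire", "Colombia", "Saint Kitts and Nevis"],
    "Beverage Types": ["Wine", "Other", "Wine", "Beer", "Beer"],
    "Display Value": [0.00, 0.50, 1.62, 4.27, 1.98],
})

# keep='first' is the default: the first row seen for each region is retained,
# and it keeps its original index label.
first_per_region = sample.drop_duplicates(subset="WHO region")
print(first_per_region)

# ignore_index=True renumbers the surviving rows from 0 instead.
renumbered = sample.drop_duplicates(subset="WHO region", ignore_index=True)
print(renumbered)
```

On these five rows both results contain the same three rows (Western Pacific, Americas, Africa). The difference only shows up on the full dataset, where the retained rows are not contiguous.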



For more Practice: Solve these Related Problems:

  • Write a Pandas program to extract unique values from the 'WHO region' column and then sort them in descending order.
  • Write a Pandas program to drop duplicate rows based solely on the 'WHO region' column while keeping the last occurrence.
  • Write a Pandas program to identify duplicate entries in 'WHO region' and then create a new DataFrame with only the unique values.
  • Write a Pandas program to remove duplicates in 'WHO region' and then count the frequency of each unique region in the dataset.
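
The related problems above could be approached along the following lines. This is a sketch on a small in-memory frame rather than the full CSV, and each step is only one possible reading of the corresponding problem. The variable names are hypothetical.

```python
import pandas as pd

# A reduced version of the Test Data rows (assumption: two columns are enough here).
sample = pd.DataFrame({
    "WHO region": ["Western Pacific", "Americas", "Africa", "Americas", "Americas"],
    "Country": ["Viet Nam", "Uruguay", "Côte d'Ivoire", "Colombia", "Saint Kitts and Nevis"],
})

# 1. Unique regions, sorted in descending (reverse alphabetical) order.
regions_desc = (
    sample["WHO region"].drop_duplicates().sort_values(ascending=False).tolist()
)

# 2. Drop duplicates on 'WHO region' but keep the last occurrence of each region.
last_per_region = sample.drop_duplicates(subset="WHO region", keep="last")

# 3. Flag repeated regions, then build a frame from the rows that are not repeats.
is_repeat = sample.duplicated(subset="WHO region")
unique_rows = sample[~is_repeat]

# 4. Count how often each region appears in the original data.
region_counts = sample["WHO region"].value_counts()

print(regions_desc)
print(last_per_region)
print(unique_rows)
print(region_counts)
```

With keep="last", the Americas row that survives is Saint Kitts and Nevis rather than Uruguay, while the regions that occur once are unaffected.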
