Pandas: Remove the duplicates of a specific column in a given dataframe
5. Duplicate Removal in 'WHO region'
Write a Pandas program to remove the duplicates from the 'WHO region' column of the World alcohol consumption dataset, keeping the first row found for each region.
Test Data:
   Year       WHO region                Country Beverage Types  Display Value
0  1986  Western Pacific               Viet Nam           Wine           0.00
1  1986         Americas                Uruguay          Other           0.50
2  1985           Africa          Côte d'Ivoire           Wine           1.62
3  1986         Americas               Colombia           Beer           4.27
4  1987         Americas  Saint Kitts and Nevis           Beer           1.98
Sample Solution:
Python Code:
import pandas as pd
# World alcohol consumption data
w_a_con = pd.read_csv('world_alcohol.csv')
print("World alcohol consumption sample data:")
print(w_a_con.head())
print("\nAfter removing the duplicates of WHO region column:")
# Keep only the first row found for each distinct 'WHO region' value
print(w_a_con.drop_duplicates(subset='WHO region'))
Sample Output:
World alcohol consumption sample data:
   Year       WHO region      ...      Beverage Types Display Value
0  1986  Western Pacific      ...                Wine          0.00
1  1986         Americas      ...               Other          0.50
2  1985           Africa      ...                Wine          1.62
3  1986         Americas      ...                Beer          4.27
4  1987         Americas      ...                Beer          1.98
[5 rows x 5 columns]
After removing the duplicates of WHO region column:
    Year             WHO region      ...      Beverage Types Display Value
0   1986        Western Pacific      ...                Wine          0.00
1   1986               Americas      ...               Other          0.50
2   1985                 Africa      ...                Wine          1.62
13  1984  Eastern Mediterranean      ...               Other          0.00
18  1984                 Europe      ...             Spirits          1.62
20  1986        South-East Asia      ...                Wine          0.00
[6 rows x 5 columns]
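The same call can be tried without the CSV file. The following is a minimal sketch, assuming the five test rows listed under Test Data are rebuilt by hand as an in-memory DataFrame. The names `sample` and `first_per_region` are illustrative and not part of the original solution.

```python
import pandas as pd

# Hand-built copy of the five test rows (an assumption standing in for world_alcohol.csv)
sample = pd.DataFrame({
    "Year": [1986, 1986, 1985, 1986, 1987],
    "WHO region": ["Western Pacific", "Americas", "Africa", "Americas", "Americas"],
    "Country": ["Viet Nam", "Uruguay", "Côte d'Ivoire", "Colombia", "Saint Kitts and Nevis"],
    "Beverage Types": ["Wine", "Other", "Wine", "Beer", "Beer"],
    "Display Value": [0.00, 0.50, 1.62, 4.27, 1.98],
})

# drop_duplicates compares only the columns named in `subset`;
# the default keep="first" retains the earliest row for each region.
first_per_region = sample.drop_duplicates(subset="WHO region")
print(first_per_region)
```

Rows 3 and 4 repeat the 'Americas' region, so only rows 0, 1 and 2 remain. The original index labels are preserved, which is why the full-dataset output above jumps from 2 to 13.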
For more practice, solve these related problems:
- Write a Pandas program to extract unique values from the 'WHO region' column and then sort them in descending order.
- Write a Pandas program to drop duplicate rows based solely on the 'WHO region' column while keeping the last occurrence.
- Write a Pandas program to identify duplicate entries in 'WHO region' and then create a new DataFrame with only the unique values.
- Write a Pandas program to remove duplicates in 'WHO region' and then count the frequency of each unique region in the dataset.
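One possible starting point for the problems above is sketched below. It is not an official solution: it assumes a small hand-built DataFrame in place of world_alcohol.csv, and the variable names (`regions`, `unique_desc`, `last_per_region`, `is_repeat`, `unique_only`, `region_counts`) are hypothetical.

```python
import pandas as pd

# Small stand-in for the dataset (an assumption; the real data comes from world_alcohol.csv)
regions = pd.DataFrame({
    "Year": [1986, 1986, 1985, 1986, 1987],
    "WHO region": ["Western Pacific", "Americas", "Africa", "Americas", "Americas"],
    "Display Value": [0.00, 0.50, 1.62, 4.27, 1.98],
})

# Unique region names in descending (reverse alphabetical) order
unique_desc = sorted(regions["WHO region"].unique(), reverse=True)

# One row per region, keeping the last occurrence instead of the first
last_per_region = regions.drop_duplicates(subset="WHO region", keep="last")

# Boolean mask that is True for every repeat of an already-seen region,
# then a new DataFrame holding only the rows that are not repeats
is_repeat = regions.duplicated(subset="WHO region")
unique_only = regions[~is_repeat]

# How often each region appears in the full data
region_counts = regions["WHO region"].value_counts()

print(unique_desc)
print(last_per_region)
print(unique_only)
print(region_counts)
```

Here `duplicated` and `drop_duplicates` share the same `subset` and `keep` semantics, so `regions[~is_repeat]` selects the same rows as `regions.drop_duplicates(subset="WHO region")`.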
