w3resource

Removing Duplicate Rows from a DataFrame Using Pandas


5. Removing Duplicate Rows from a DataFrame

Write a Pandas program to remove duplicate rows from a DataFrame.

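This exercise demonstrates how to remove duplicate rows from a DataFrame using drop_duplicates(). By default, a row counts as a duplicate only when all of its column values match an earlier row. The first occurrence is kept, and a new DataFrame is returned while the original is left unchanged.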

Sample Solution:

Code:

import pandas as pd

# Create a sample DataFrame with duplicate rows
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# Remove duplicate rows
df_no_duplicates = df.drop_duplicates()

# Output the result
print(df_no_duplicates)

Output:

      Name  Age  Salary
0  Orville   25   50000
1   Arturo   30   60000
2     Ruth   22   70000

Explanation:

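  • Created a DataFrame in which the row at index 3 repeats the row at index 0.
  • Called drop_duplicates() to remove the repeated row, keeping its first occurrence.
  • Printed the returned DataFrame. The surviving rows keep their original index labels (0, 1, 2), and df itself still contains all four rows.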
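To see which rows drop_duplicates() treats as duplicates, the companion method duplicated() can be inspected directly. The following is a minimal sketch reusing the same sample data. It shows the boolean mask, checks that dropping duplicates matches filtering out the rows where that mask is True, and confirms that the original df is not modified.

```python
import pandas as pd

# Same sample data as the solution above
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# duplicated() marks every repeat of an earlier row as True
# (keep='first' is the default, so the first occurrence is False)
mask = df.duplicated()
print(mask.tolist())  # [False, False, False, True]

# drop_duplicates() keeps exactly the rows where the mask is False
df_no_duplicates = df.drop_duplicates()
print(df_no_duplicates.equals(df[~mask]))  # True

# The original DataFrame is untouched, and surviving rows keep their labels
print(len(df), len(df_no_duplicates))       # 4 3
print(df_no_duplicates.index.tolist())      # [0, 1, 2]
```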

For more practice, solve these related problems:

  • Write a Pandas program to remove duplicate rows from a DataFrame while keeping the last occurrence of each duplicate.
  • Write a Pandas program to drop duplicate rows based on specific columns and verify the reduction in DataFrame size.
  • Write a Pandas program to remove duplicate rows and then reset the DataFrame index.
  • Write a Pandas program to remove duplicate rows and compare summary statistics before and after the operation.
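The related problems can all be approached with parameters of drop_duplicates() itself. Below is one possible sketch, not an official solution, that uses the same sample data. It assumes pandas 1.0 or later for the ignore_index parameter; on older versions, .reset_index(drop=True) gives the same renumbered index. With this data, subset=['Name'] happens to remove the same row as a full-row comparison. On other data, it can remove rows that differ in the unlisted columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# 1. Keep the last occurrence of each duplicate instead of the first
last_kept = df.drop_duplicates(keep='last')
print(last_kept.index.tolist())  # [1, 2, 3]

# 2. Compare only selected columns, then check how many rows were removed
by_name = df.drop_duplicates(subset=['Name'])
print(len(df) - len(by_name))  # 1

# 3. Drop duplicates and renumber the index from 0
#    (equivalent to .drop_duplicates(keep='last').reset_index(drop=True))
renumbered = df.drop_duplicates(keep='last', ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# 4. Compare a summary statistic before and after removing duplicates
before = df['Salary'].mean()
after = df.drop_duplicates()['Salary'].mean()
print(before, after)  # 57500.0 60000.0
```

Calling describe() on both DataFrames instead of mean() would compare the full set of summary statistics in the same way.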
