Removing Duplicate Rows from a DataFrame Using Pandas

Last update on April 09 2025 12:55:40 (UTC/GMT +8 hours)

5. Removing Duplicate Rows from a DataFrame

Write a Pandas program to remove duplicate rows from a DataFrame.

This exercise demonstrates how to remove duplicate rows from a DataFrame using drop_duplicates().

Sample Solution:

Code:

import pandas as pd

# Create a sample DataFrame with duplicate rows
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# Remove duplicate rows
df_no_duplicates = df.drop_duplicates()

# Output the result
print(df_no_duplicates)

Output:

      Name  Age  Salary
0  Orville   25   50000
1   Arturo   30   60000
2     Ruth   22   70000

Explanation:

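Created a DataFrame in which the last row (index 3) repeats the first row (index 0).
Called drop_duplicates(), which by default compares all columns and keeps the first occurrence of each duplicated row.
Printed the returned DataFrame; the original df is unchanged, and the remaining rows keep their original index labels (0, 1, 2).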
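
To check which rows count as duplicates before dropping them, duplicated() returns a boolean mask using the same "keep the first occurrence" default. The following is a minimal sketch that reuses the sample data; the names mask, df_last and df_reset are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# duplicated() marks each repeat of an earlier row as True
mask = df.duplicated()
print(mask.tolist())            # [False, False, False, True]

# keep='last' retains the final occurrence, so index 0 is dropped instead
df_last = df.drop_duplicates(keep='last')
print(df_last.index.tolist())   # [1, 2, 3]

# reset_index(drop=True) renumbers the remaining rows from 0
df_reset = df_last.reset_index(drop=True)
print(df_reset.index.tolist())  # [0, 1, 2]
```

Either keep setting removes exactly one row here; they differ only in which copy of the repeated row survives.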

For more Practice: Solve these Related Problems:

Write a Pandas program to remove duplicate rows from a DataFrame while keeping the last occurrence of each duplicate.
Write a Pandas program to drop duplicate rows based on specific columns and verify the reduction in DataFrame size.
Write a Pandas program to remove duplicate rows and then reset the DataFrame index.
Write a Pandas program to remove duplicate rows and compare summary statistics before and after the operation.
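
As a starting point for the column-based problem above, the subset parameter of drop_duplicates() limits the comparison to the listed columns. This is a sketch under assumed data: the salary of 55000 is hypothetical, chosen so that no row is a full duplicate.

```python
import pandas as pd

# Hypothetical data: 'Orville' appears twice with different salaries
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 55000]
})

# No row matches another on every column, so nothing is removed
full = df.drop_duplicates()

# Compare only 'Name' and 'Age'; the second 'Orville' row is dropped
by_cols = df.drop_duplicates(subset=['Name', 'Age'])

# Row counts before and after show the reduction in size
print(len(df), len(full), len(by_cols))  # 4 4 3
```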
