w3resource

Merging DataFrames and removing duplicate rows in Pandas


Pandas: Custom Function Exercise-16 with Solution


Write a Pandas program to merge DataFrames and drop duplicates.

In this exercise, we merge two DataFrames and then remove any duplicate rows that may arise from the merge.

Sample Solution:

Code:

import pandas as pd

# Create two sample DataFrames with potential duplicates
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Annabel', 'Selena', 'Caeso']
})

df2 = pd.DataFrame({
    'ID': [2, 3, 1],
    'Name': ['Selena', 'Caeso', 'Annabel'],
    'Age': [30, 22, 25]
})

# Merge the DataFrames on the 'ID' and 'Name' columns (inner join by default)
merged_df = pd.merge(df1, df2, on=['ID', 'Name'])

# Drop any duplicate rows from the merged DataFrame
merged_df_no_duplicates = merged_df.drop_duplicates()

# Output the result
print(merged_df_no_duplicates)

Output:

   ID     Name  Age
0   1  Annabel   25
1   2   Selena   30
2   3    Caeso   22

Explanation:

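  • Created two DataFrames, df1 and df2, that share the 'ID' and 'Name' columns; df2 also has an 'Age' column.
  • Merged the DataFrames on the 'ID' and 'Name' columns. pd.merge() performs an inner join by default, so only rows whose 'ID' and 'Name' match in both DataFrames are kept.
  • Removed any duplicate rows in the merged DataFrame using drop_duplicates(). With this sample data every row is already unique, so the call leaves the result unchanged; it matters when an input contains repeated rows.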
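
The sample data never produces a duplicate, so the following sketch shows the same steps on input where one does arise. The function name merge_and_dedupe and the extra 'Selena' row in df2 are assumptions made for illustration and are not part of the original exercise.

```python
import pandas as pd

def merge_and_dedupe(left, right, keys):
    """Inner-merge two DataFrames on `keys`, then drop fully identical rows.

    Hypothetical helper name; it only wraps pd.merge() and drop_duplicates().
    """
    merged = pd.merge(left, right, on=keys)
    # reset_index(drop=True) renumbers the rows after duplicates are removed
    return merged.drop_duplicates().reset_index(drop=True)

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Annabel', 'Selena', 'Caeso']
})

# Assumed variation on the exercise data: the row for ID 2 appears twice
df2 = pd.DataFrame({
    'ID': [2, 2, 3, 1],
    'Name': ['Selena', 'Selena', 'Caeso', 'Annabel'],
    'Age': [30, 30, 22, 25]
})

# The plain merge keeps both copies of the repeated row
merged_df = pd.merge(df1, df2, on=['ID', 'Name'])

# The helper returns one row per distinct (ID, Name, Age) combination
deduped_df = merge_and_dedupe(df1, df2, ['ID', 'Name'])

print(len(merged_df), len(deduped_df))
print(deduped_df)
```

drop_duplicates() compares all columns by default, so two rows that share 'ID' and 'Name' but differ in 'Age' would both be kept. Passing subset=['ID', 'Name'] to drop_duplicates() is one way to compare on the key columns only.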
