Merging DataFrames and removing duplicate rows in Pandas
Pandas: Custom Function Exercise-16 with Solution
Write a Pandas program to merge DataFrames and drop duplicates.
In this exercise, we merge two DataFrames and then remove any duplicate rows that the merge may produce.
Sample Solution:
Code:
import pandas as pd
# Create two sample DataFrames that share the same IDs and names
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Annabel', 'Selena', 'Caeso']
})
df2 = pd.DataFrame({
    'ID': [2, 3, 1],
    'Name': ['Selena', 'Caeso', 'Annabel'],
    'Age': [30, 22, 25]
})
# Merge the DataFrames on the 'ID' and 'Name' columns
merged_df = pd.merge(df1, df2, on=['ID', 'Name'])
# Drop any duplicate rows from the merged DataFrame
merged_df_no_duplicates = merged_df.drop_duplicates()
# Output the result
print(merged_df_no_duplicates)
Output:
   ID     Name  Age
0   1  Annabel   25
1   2   Selena   30
2   3    Caeso   22
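To see why the solution passes both 'ID' and 'Name' to on, the sketch below repeats the merge with only 'ID' as the key. Because 'Name' then exists in both inputs as a non-key column, pandas keeps both copies and appends its default suffixes _x and _y. The variable names by_id and by_both are illustrative only.

```python
import pandas as pd

# Same sample data as in the solution
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Annabel", "Selena", "Caeso"]})
df2 = pd.DataFrame({
    "ID": [2, 3, 1],
    "Name": ["Selena", "Caeso", "Annabel"],
    "Age": [30, 22, 25],
})

# Key on 'ID' only: the shared 'Name' column is kept twice, with suffixes
by_id = pd.merge(df1, df2, on="ID")
print(by_id.columns.tolist())  # ['ID', 'Name_x', 'Name_y', 'Age']

# Key on both columns: 'Name' is part of the key, so it appears once
by_both = pd.merge(df1, df2, on=["ID", "Name"])
print(by_both.columns.tolist())  # ['ID', 'Name', 'Age']
```

Merging on both columns also ensures that a row only matches when the ID and the name agree.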
Explanation:
- Created two DataFrames df1 and df2 with overlapping data.
- Merged the DataFrames on the 'ID' and 'Name' columns.
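- Removed any duplicate rows from the merged DataFrame using drop_duplicates(). With this sample data each ID matches exactly one row, so no rows are dropped; the step matters when an input contains repeated rows.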
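As a sketch of when the final step matters, assume (hypothetically) that df2 lists Selena twice with identical values. The inner merge then matches ID 2 against both rows and returns four rows, and drop_duplicates() removes the exact copy. The data and variable names here are illustrative, not part of the original exercise.

```python
import pandas as pd

# Hypothetical data: df2 repeats Selena's row with identical values
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Annabel", "Selena", "Caeso"]})
df2 = pd.DataFrame({
    "ID": [2, 2, 3, 1],
    "Name": ["Selena", "Selena", "Caeso", "Annabel"],
    "Age": [30, 30, 22, 25],
})

# ID 2 / 'Selena' matches two rows in df2, so the merge yields 4 rows
merged = pd.merge(df1, df2, on=["ID", "Name"])
print(len(merged))  # 4

# Drop the exact duplicate (keeps the first occurrence) and renumber the index
deduped = merged.drop_duplicates().reset_index(drop=True)
print(len(deduped))  # 3
print(deduped)
```

By default drop_duplicates() compares all columns; its subset parameter restricts the comparison to chosen columns, and keep ('first', 'last', or False) controls which occurrence, if any, survives.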