Pandas Practice Set-1: Read the diamonds DataFrame and detect duplicate color

Last update on September 13 2025 12:46:46 (UTC/GMT +8 hours)

64. Read Diamonds DataFrame and Detect Duplicate 'color'

Write a Pandas program to read the diamonds DataFrame and detect duplicate values in the 'color' column.

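Note: The duplicated() function returns a boolean Series denoting duplicate rows, optionally considering only certain columns. When called on a single column (a Series), it marks each value that has already appeared earlier in that column.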
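
As a minimal sketch of how duplicated() behaves, assume a small toy Series named colors (hypothetical data, not taken from the diamonds file). The keep parameter controls which occurrences are flagged:

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate duplicated()
colors = pd.Series(["E", "E", "I", "J", "I", "E"])

# keep='first' (the default): every occurrence after the first is flagged
first_mask = colors.duplicated()

# keep='last': every occurrence before the last is flagged
last_mask = colors.duplicated(keep="last")

# keep=False: every member of a repeated group is flagged
all_mask = colors.duplicated(keep=False)

print(first_mask.tolist())  # [False, True, False, False, True, True]
print(last_mask.tolist())   # [True, True, True, False, False, False]
print(all_mask.tolist())    # [True, True, True, False, True, True]
```

Only 'J' appears once, so it is the only value that is never flagged under any of the three settings.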

Sample Solution:

Python Code:

import pandas as pd

# Load the diamonds dataset from the seaborn-data repository
diamonds = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv')
print("Original Dataframe:")
print(diamonds.shape)
print("\nCount the duplicate items:")
# duplicated() marks every repeat of a 'color' value after its first occurrence
print(diamonds['color'].duplicated().sum())

Sample Output:

Original Dataframe:
(53940, 10)

Count the duplicate items:
53933
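
The count is large because duplicated() flags every occurrence after the first, so for a column without missing values the total equals the number of rows minus the number of distinct values. A minimal check of that relationship, assuming a hypothetical toy Series rather than the real dataset:

```python
import pandas as pd

# Hypothetical toy column with 3 distinct values across 7 rows
toy = pd.Series(["D", "E", "D", "F", "E", "D", "D"])

dup_count = int(toy.duplicated().sum())   # repeats after each first occurrence
expected = len(toy) - toy.nunique()       # rows minus distinct values

print(dup_count, expected)  # 4 4
```

The same cross-check can be applied to any column of the diamonds DataFrame to sanity-check a reported duplicate count.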

For more Practice: Solve these Related Problems:

Write a Pandas program to identify duplicate entries in the 'color' column of the diamonds DataFrame and display a boolean mask.
Write a Pandas program to flag rows in the diamonds DataFrame where the 'color' value has already appeared.
Write a Pandas program to detect and print the indices of duplicate 'color' values in the diamonds DataFrame.
Write a Pandas program to create a new column indicating whether the 'color' value is a duplicate in the diamonds dataset.
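
One possible starting point for the related problems above, sketched on a small hypothetical DataFrame named sample so that it runs without downloading the file. The column name color_is_dup is an illustrative choice, not part of the dataset:

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data (values are illustrative)
sample = pd.DataFrame({
    "color": ["E", "E", "I", "J", "I"],
    "price": [326, 326, 334, 335, 336],
})

# Boolean mask: True where the 'color' value has already appeared
mask = sample["color"].duplicated()

# Index labels of the rows flagged as duplicates
dup_indices = sample.index[mask].tolist()

# New column recording the flag for each row
sample["color_is_dup"] = mask

print(mask.tolist())   # [False, True, False, False, True]
print(dup_indices)     # [1, 4]
print(sample)
```

Replacing sample with the diamonds DataFrame loaded in the solution should apply the same steps to the full dataset.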
