w3resource

Pandas Practice Set-1: Calculate various summary statistics of cut series of diamonds DataFrame


36. Calculate Summary Statistics of 'cut' Series

Write a Pandas program to calculate various summary statistics of cut series of diamonds DataFrame.

Sample Solution:

Python Code:

import pandas as pd
diamonds = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv')
print("Original Dataframe:")
print(diamonds.head())
print("\nVarious summary statistics of diamonds DataFrame:")
print(diamonds.carat.describe())

Sample Output:

Original Dataframe:
   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75

Various summary statistics of diamonds DataFrame:
count    53940.000000
mean         0.797940
std          0.474011
min          0.200000
25%          0.400000
50%          0.700000
75%          1.040000
max          5.010000
Name: carat, dtype: float64

For more Practice: Solve these Related Problems:

  • Write a Pandas program to compute summary statistics (mean, median, mode) for the 'cut' series after converting it to categorical codes.
  • Write a Pandas program to calculate descriptive statistics of the 'cut' series and then plot a box plot for its frequency distribution.
  • Write a Pandas program to generate a summary (describe()) of the 'cut' column and then manually compute the range.
  • Write a Pandas program to calculate and display various summary metrics for the 'cut' column using both built-in and custom functions.

Go to:


Previous: Write a Pandas program to compute a cross-tabulation of two Series in diamonds DataFrame.
Next: Write a Pandas program to create a histogram of the 'carat' Series (distribution of a numerical variable) of diamonds DataFrame.

Python Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

What is the difficulty level of this exercise?



Follow us on Facebook and Twitter for latest update.