Mastering Pandas: 100 Exercises with solutions for Python data analysis

Last update on December 21 2024 07:41:47 (UTC/GMT +8 hours)

Welcome to w3resource's collection of 100 Pandas exercises. The set is designed to help you master the fundamentals of Pandas, a powerful data manipulation and analysis library for Python. Whether you are a beginner or an experienced user looking to sharpen your skills, the exercises cover a wide range of topics through practical challenges.

Exercise 1:

Create a DataFrame from a dictionary of lists.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df)

Output:

   X  Y
0  1  5
1  2  6
2  3  7
3  4  8

Exercise 2:

Select the first 3 rows of a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df.head(3))

Output:

   X  Y
0  1  5
1  2  6
2  3  7

Exercise 3:

Select the 'X' column from a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df['X'])

Output:

0    1
1    2
2    3
3    4
Name: X, dtype: int64

Exercise 4:

Filter rows based on a column condition.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'] > 2]
print(filtered_df)

Output:

   X  Y
2  3  7
3  4  8

Exercise 5:

Add a new column to an existing DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df['Z'] = df['X'] + df['Y']
print(df)

Output:

   X  Y   Z
0  1  5   6
1  2  6   8
2  3  7  10
3  4  8  12

Exercise 6:

Remove a column from a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8], 'Z': [9, 10, 11, 12]}
df = pd.DataFrame(data)
df.drop(columns=['Z'], inplace=True)
print(df)

Output:

   X  Y
0  1  5
1  2  6
2  3  7
3  4  8

Exercise 7:

Sort a DataFrame by a column.

Solution:

import pandas as pd
data = {'X': [4, 3, 2, 1], 'Y': [8, 7, 6, 5]}
df = pd.DataFrame(data)
df.sort_values(by='X', inplace=True)
print(df)

Output:

   X  Y
3  1  5
2  2  6
1  3  7
0  4  8

Exercise 8:

Group a DataFrame by a column and calculate the mean of each group.

Solution:

import pandas as pd
data = {'X': [1, 2, 1, 2], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
grouped_df = df.groupby('X').mean()
print(grouped_df)

Output:

     Y
X     
1  6.0
2  7.0

Exercise 9:

Replace missing values in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]}
df = pd.DataFrame(data)
df.fillna(0, inplace=True)
print(df)

Output:

     X    Y
0  1.0  5.0
1  2.0  0.0
2  0.0  7.0
3  4.0  8.0
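Filling with a constant is only one option. As a minimal sketch, the same frame can instead be filled with each column's mean by passing a Series to fillna; the variable name filled is illustrative and not part of the original exercise.

```python
import pandas as pd

data = {'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]}
df = pd.DataFrame(data)

# df.mean() skips NaN by default and returns one mean per column;
# fillna aligns that Series on the column labels.
filled = df.fillna(df.mean())
print(filled)
```

Whether a constant or a per-column statistic is the better replacement depends on the data and on what the filled values will be used for.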

Exercise 10:

Convert a column to datetime.

Solution:

import pandas as pd
data = {'X': ['2020-01-01', '2020-01-02', '2020-01-03']}
df = pd.DataFrame(data)
df['X'] = pd.to_datetime(df['X'])
print(df)

Output:

           X
0 2020-01-01
1 2020-01-02
2 2020-01-03

Exercise 11:

Create a DataFrame with specific column names.

Solution:

import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)
print(df)

Output:

   col1  col2
0     1     4
1     2     5
2     3     6

Exercise 12:

Calculate the sum of values in each column.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.sum())

Output:

X     6
Y    15
dtype: int64

Exercise 13:

Calculate the mean of values in each row.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.mean(axis=1))

Output:

0    2.5
1    3.5
2    4.5
dtype: float64

Exercise 14:

Concatenate two DataFrames.

Solution:

import pandas as pd
data1 = {'X': [1, 2, 3]}
data2 = {'Y': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
concatenated_df = pd.concat([df1, df2], axis=1)
print(concatenated_df)

Output:

   X  Y
0  1  4
1  2  5
2  3  6

Exercise 15:

Merge two DataFrames on a key.

Solution:

import pandas as pd
data1 = {'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]}
data2 = {'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='key')
print(merged_df)

Output:

  key  value1  value2
0   X       1       4
1   Y       2       5

Exercise 16:

Create a pivot table from a DataFrame.

Solution:

import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Z', index='X', columns='Y')
print(pivot_table)

Output:

Y    one  two
X            
bar  3.0  4.0
foo  1.0  2.0

Exercise 17:

Reshape a DataFrame from long to wide format.

Solution:

import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
wide_df = df.pivot(index='X', columns='Y', values='Z')
print(wide_df)

Output:

Y    one  two
X            
bar    3    4
foo    1    2
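The reverse reshape, from wide back to long, can be sketched with melt. This is an illustrative addition rather than part of the original exercise, and long_df is a name chosen here.

```python
import pandas as pd

data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
wide_df = df.pivot(index='X', columns='Y', values='Z')

# reset_index turns the 'X' index back into a column; melt then stacks
# the 'one' and 'two' columns into (Y, Z) pairs.
long_df = wide_df.reset_index().melt(id_vars='X', var_name='Y', value_name='Z')
print(long_df)
```

The rows come back in a different order than in the original frame, but the set of (X, Y, Z) combinations is the same.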

Exercise 18:

Calculate the correlation between columns in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
correlation = df.corr()
print(correlation)

Output:

     X    Y
X  1.0 -1.0
Y -1.0  1.0

Exercise 19:

Iterate over rows in a DataFrame using iterrows().

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
for index, row in df.iterrows():
    print(index, row['X'], row['Y'])

Output:

0 1 4
1 2 5
2 3 6
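iterrows() builds a Series for every row, which is slow on large frames and may change dtypes across a row. As a hedged alternative, itertuples() yields lightweight namedtuples; the list name rows below is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})

# itertuples yields one namedtuple per row; the index is exposed as
# .Index and each column as an attribute.
rows = [(row.Index, row.X, row.Y) for row in df.itertuples()]
print(rows)
```

For arithmetic on whole columns, a vectorized expression such as df['X'] + df['Y'] usually avoids explicit iteration altogether.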

Exercise 20:

Apply a function to each element in a DataFrame.

Solution:

import pandas as pd  # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Apply a function to each element using the map method
df = df.apply(lambda col: col.map(lambda x: x * 2))
print(df)

Output:

   X   Y
0  2   8
1  4  10
2  6  12

Exercise 21:

Create a DataFrame from a list of dictionaries.

Solution:

import pandas as pd
data = [{'X': 1, 'Y': 2}, {'X': 3, 'Y': 4}]
df = pd.DataFrame(data)
print(df)

Output:

   X  Y
0  1  2
1  3  4

Exercise 22:

Rename columns in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.rename(columns={'X': 'A', 'Y': 'B'}, inplace=True)
print(df)

Output:

   A  B
0  1  4
1  2  5
2  3  6

Exercise 23:

Filter rows by multiple conditions.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
filtered_df = df[(df['X'] > 2) & (df['Y'] < 7)]
print(filtered_df)

Output:

   X  Y
2  3  6

Exercise 24:

Calculate the cumulative sum of a column.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df['X'].cumsum()
print(df)

Output:

   X  Cumulative_Sum
0  1               1
1  2               3
2  3               6
3  4              10

Exercise 25:

Drop rows with missing values.

Solution:

import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, 5, 6, None]}
df = pd.DataFrame(data)
df.dropna(inplace=True)
print(df)

Output:

     X    Y
0  1.0  4.0
1  2.0  5.0

Exercise 26:

Replace values in a DataFrame based on a condition.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df.loc[df['X'] > 2, 'Y'] = 0
print(df)

Output:

   X  Y
0  1  5
1  2  6
2  3  0
3  4  0

Exercise 27:

Create a DataFrame with a MultiIndex.

Solution:

import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)

Output:

              Value
Group Number       
X     1          10
      2          20
Y     1          30
      2          40
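Once a MultiIndex is in place, rows can be selected by one or both levels. The sketch below illustrates three common patterns on the same frame; group_x, single, and number_1 are names chosen for this example.

```python
import pandas as pd

arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# Select every row in the outer group 'X'.
group_x = df.loc['X']
# Select a single value by giving both levels as a tuple.
single = df.loc[('Y', 2), 'Value']
# Cross-section on the inner level: all rows where Number == 1.
number_1 = df.xs(1, level='Number')
print(group_x, single, number_1, sep='\n\n')
```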

Exercise 28:

Calculate the rolling mean of a column.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df)

Output:

   X  Rolling_Mean
0  1           NaN
1  2           NaN
2  3           2.0
3  4           3.0
4  5           4.0
5  6           5.0

Exercise 29:

Create a DataFrame from a list of tuples.

Solution:

import pandas as pd
data = [(1, 2), (3, 4), (5, 6)]
df = pd.DataFrame(data, columns=['X', 'Y'])
print(df)

Output:

   X  Y
0  1  2
1  3  4
2  5  6

Exercise 30:

Add a row to a DataFrame.

Solution:

import pandas as pd  # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2], 'Y': [3, 4]}
df = pd.DataFrame(data)

# Create a new row as a DataFrame
new_row = pd.DataFrame({'X': [5], 'Y': [6]})
# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row], ignore_index=True)
print(df)

Output:

   X  Y
0  1  3
1  2  4
2  5  6

Exercise 31:

Create a DataFrame with random values.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df)

Output:

          X         Y         Z
0  0.688292  0.950264  0.665916
1  0.497719  0.840536  0.923938
2  0.285218  0.091178  0.722034
3  0.037824  0.248689  0.584696

Exercise 32:

Calculate the rank of values in a DataFrame.

Solution:

import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank'] = df['X'].rank()
print(df)

Output:

   X  Y  Rank
0  3  2   3.0
1  1  3   1.5
2  4  1   4.0
3  1  4   1.5

Exercise 33:

Change the data type of a column.

Solution:

import pandas as pd
data = {'X': ['1', '2', '3']}
df = pd.DataFrame(data)
df['X'] = df['X'].astype(int)
print(df)

Output:

   X
0  1
1  2
2  3

Exercise 34:

Filter rows based on string matching.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba')]
print(filtered_df)

Output:

     X
1  bar
2  baz

Exercise 35:

Create a DataFrame with specified row and column labels.

Solution:

import pandas as pd
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])
print(df)

Output:

      col1  col2  col3
row1     1     2     3
row2     4     5     6
row3     7     8     9

Exercise 36:

Transpose a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
transposed_df = df.T
print(transposed_df)

Output:

   0  1  2
X  1  2  3
Y  4  5  6

Exercise 37:

Set a column as the index of a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
print(df)

Output:

   Y
X   
1  4
2  5
3  6

Exercise 38:

Reset the index of a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
df.reset_index(inplace=True)
print(df)

Output:

   X  Y
0  1  4
1  2  5
2  3  6

Exercise 39:

Add a prefix or suffix to column names.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.add_prefix('col_')
print(df)

Output:

   col_X  col_Y
0      1      4
1      2      5
2      3      6

Exercise 40:

Filter rows based on datetime index.

Solution:

import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=5, freq='D')
data = {'X': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data, index=date_range)
filtered_df = df['2020-01-03':'2020-01-05']
print(filtered_df)

Output:

            X
2020-01-03  3
2020-01-04  4
2020-01-05  5

Exercise 41:

Create a DataFrame with duplicate rows and remove duplicates.

Solution:

import pandas as pd
data = {'X': [1, 2, 2, 3], 'Y': [4, 5, 5, 6]}
df = pd.DataFrame(data)
df.drop_duplicates(inplace=True)
print(df)

Output:

   X  Y
0  1  4
1  2  5
3  3  6

Exercise 42:

Create a DataFrame with hierarchical index.

Solution:

import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)

Output:

              Value
Group Number       
X     1          10
      2          20
Y     1          30
      2          40

Exercise 43:

Calculate the difference between consecutive rows in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 3, 6, 10]}
df = pd.DataFrame(data)
df['Difference'] = df['X'].diff()
print(df)

Output:

    X  Difference
0   1         NaN
1   3         2.0
2   6         3.0
3  10         4.0

Exercise 44:

Create a DataFrame with hierarchical columns.

Solution:

import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], ['C1', 'C2', 'C1', 'C2']]
columns = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Type'))
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
df = pd.DataFrame(data, columns=columns)
print(df)

Output:

Group  X       Y    
Type  C1  C2  C1  C2
0      1   2   3   4
1      5   6   7   8
2      9  10  11  12

Exercise 45:

Filter rows based on the length of strings in a column.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.len() > 3]
print(filtered_df)

Output:

Empty DataFrame
Columns: [X]
Index: []

Exercise 46:

Calculate the percentage change between rows in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Pct_Change'] = df['X'].pct_change()
print(df)

Output:

   X  Pct_Change
0  1         NaN
1  2    1.000000
2  3    0.500000
3  4    0.333333

Exercise 47:

Create a DataFrame from a dictionary of Series.

Solution:

import pandas as pd
data = {'X': pd.Series([1, 2, 3]), 'Y': pd.Series([4, 5, 6])}
df = pd.DataFrame(data)
print(df)

Output:

   X  Y
0  1  4
1  2  5
2  3  6

Exercise 48:

Filter rows based on whether a column value is in a list.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'].isin([2, 3])]
print(filtered_df)

Output:

   X  Y
1  2  6
2  3  7

Exercise 49:

Calculate the z-score of values in a DataFrame.

Solution:

import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
df['zscore_A'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
print(df)

Output:

   X  Y  zscore_A
0  1  4 -1.341641
1  2  5 -0.447214
2  3  6  0.447214
3  4  7  1.341641
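The solution above divides by np.std, which defaults to the population standard deviation (ddof=0). pandas' own Series.std defaults to the sample standard deviation (ddof=1), so the two give different z-scores. A small sketch of the difference, with illustrative variable names:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4])

# np.std on a NumPy array defaults to ddof=0 (population standard deviation).
z_population = (x - x.mean()) / np.std(x.to_numpy())
# Series.std defaults to ddof=1 (sample standard deviation).
z_sample = (x - x.mean()) / x.std()
print(z_population.round(6).tolist())
print(z_sample.round(6).tolist())
```

Either convention can be appropriate; what matters is using the same one consistently when comparing results.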

Exercise 50:

Create a DataFrame with random integers and calculate descriptive statistics.

Solution:

import pandas as pd
import numpy as np
data = np.random.randint(1, 100, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.describe())

Output:

               X          Y          Z
count   5.000000   5.000000   5.000000
mean   60.600000  71.800000  42.600000
std    38.435661  13.971399  12.218838
min     5.000000  53.000000  28.000000
25%    40.000000  64.000000  34.000000
50%    69.000000  72.000000  41.000000
75%    91.000000  82.000000  55.000000
max    98.000000  88.000000  55.000000

Exercise 51:

Calculate the rank of values in each column of a DataFrame.

Solution:

import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank_A'] = df['X'].rank()
df['Rank_B'] = df['Y'].rank()
print(df)

Output:

   X  Y  Rank_A  Rank_B
0  3  2     3.0     2.0
1  1  3     1.5     3.0
2  4  1     4.0     1.0
3  1  4     1.5     4.0

Exercise 52:

Filter rows based on multiple string conditions.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba|qu')]
print(filtered_df)

Output:

     X
1  bar
2  baz
3  qux

Exercise 53:

Create a DataFrame with random values and calculate the skewness.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.skew())

Output:

(The data is random, so the values differ on each run; the result is a Series with one skewness value for each of the columns X, Y, and Z.)

Exercise 54:

Create a DataFrame and calculate the kurtosis.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.kurt())

Output:

X    2.958407
Y   -2.639654
Z    2.704430
dtype: float64

Exercise 55:

Calculate the cumulative product of a column in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df['X'].cumprod()
print(df)

Output:

   X  Cumulative_Product
0  1                   1
1  2                   2
2  3                   6
3  4                  24

Exercise 56:

Create a DataFrame and calculate the rolling standard deviation.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Std'] = df['X'].rolling(window=3).std()
print(df)

Output:

   X  Rolling_Std
0  1          NaN
1  2          NaN
2  3          1.0
3  4          1.0
4  5          1.0
5  6          1.0

Exercise 57:

Create a DataFrame and calculate the expanding mean.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Mean'] = df['X'].expanding().mean()
print(df)

Output:

   X  Expanding_Mean
0  1             1.0
1  2             1.5
2  3             2.0
3  4             2.5
4  5             3.0
5  6             3.5

Exercise 58:

Create a DataFrame with random values and calculate the covariance matrix.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.cov())

Output:

          X         Y         Z
X  0.054079  0.007398 -0.031403
Y  0.007398  0.053211 -0.020480
Z -0.031403 -0.020480  0.048057

Exercise 59:

Create a DataFrame with random values and calculate the correlation matrix.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.corr())

Output:

          X         Y         Z
X  1.000000 -0.258187  0.541044
Y -0.258187  1.000000 -0.432419
Z  0.541044 -0.432419  1.000000

Exercise 60:

Create a DataFrame and calculate the rolling correlation between two columns.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6], 'Y': [6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Rolling_Corr'] = df['X'].rolling(window=3).corr(df['Y'])
print(df)

Output:

   X  Y  Rolling_Corr
0  1  6           NaN
1  2  5           NaN
2  3  4          -1.0
3  4  3          -1.0
4  5  2          -1.0
5  6  1          -1.0

Exercise 61:

Create a DataFrame and calculate the expanding variance.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df['X'].expanding().var()
print(df)

Output:

   X  Expanding_Var
0  1            NaN
1  2       0.500000
2  3       1.000000
3  4       1.666667
4  5       2.500000
5  6       3.500000

Exercise 62:

Create a DataFrame with datetime index and resample by month.

Solution:

import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=100, freq='D')
data = {'X': range(100)}
df = pd.DataFrame(data, index=date_range)
# 'M' is the month-end alias; pandas 2.2 and later deprecate it in favor of 'ME'
monthly_df = df.resample('M').sum()
print(monthly_df)

Output:

               X
2020-01-31   465
2020-02-29  1305
2020-03-31  2325
2020-04-30   855

Exercise 63:

Create a DataFrame and calculate the exponential moving average.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['EMA'] = df['X'].ewm(span=3, adjust=False).mean()
print(df)

Output:

   X      EMA
0  1  1.00000
1  2  1.50000
2  3  2.25000
3  4  3.12500
4  5  4.06250
5  6  5.03125

Exercise 64:

Create a DataFrame with random integers and calculate the mode.

Solution:

import pandas as pd
import numpy as np
data = np.random.randint(1, 10, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.mode())

Output:

   X    Y    Z
0  2  1.0  2.0
1  3  3.0  7.0
2  5  NaN  NaN
3  6  NaN  NaN
4  9  NaN  NaN

Exercise 65:

Create a DataFrame and calculate the z-score of each column.

Solution:

import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
df['zscore_A'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
df['zscore_B'] = (df['Y'] - np.mean(df['Y'])) / np.std(df['Y'])
print(df)

Output:

   X  Y  zscore_A  zscore_B
0  1  4 -1.341641 -1.341641
1  2  5 -0.447214 -0.447214
2  3  6  0.447214  0.447214
3  4  7  1.341641  1.341641

Exercise 66:

Create a DataFrame with random values and calculate the median.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.median())

Output:

X    0.787042
Y    0.477837
Z    0.696911
dtype: float64

Exercise 67:

Create a DataFrame and apply a custom function to each column.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.apply(lambda x: x + 1)
print(df)

Output:

   X  Y
0  2  5
1  3  6
2  4  7

Exercise 68:

Create a DataFrame with hierarchical index and calculate the mean for each group.

Solution:

import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
grouped_df = df.groupby('Group').mean()
print(grouped_df)

Output:

       Value
Group       
X       15.0
Y       35.0

Exercise 69:

Create a DataFrame and calculate the percentage of missing values in each column.

Solution:

import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, None, 6, 8]}
df = pd.DataFrame(data)
missing_percentage = df.isnull().mean() * 100
print(missing_percentage)

Output:

X    25.0
Y    25.0
dtype: float64

Exercise 70:

Create a DataFrame and apply a custom function to each row.

Solution:

import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df['Sum'] = df.apply(lambda row: row['X'] + row['Y'], axis=1)
print(df)

Output:

   X  Y  Sum
0  1  4    5
1  2  5    7
2  3  6    9

Exercise 71:

Create a DataFrame with random values and calculate the quantiles.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.quantile([0.25, 0.5, 0.75]))

Output:

             X         Y         Z
0.25  0.174265  0.184036  0.520573
0.50  0.468040  0.315593  0.644571
0.75  0.767870  0.436426  0.771297

Exercise 72:

Create a DataFrame and calculate the interquartile range (IQR).

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
print(IQR)

Output:

X    0.354244
Y    0.329573
Z    0.245520
dtype: float64

Exercise 73:

Create a DataFrame with datetime index and calculate the rolling mean.

Solution:

import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df)

Output:

            X  Rolling_Mean
2020-01-01  0           NaN
2020-01-02  1           NaN
2020-01-03  2           1.0
2020-01-04  3           2.0
2020-01-05  4           3.0
2020-01-06  5           4.0
2020-01-07  6           5.0
2020-01-08  7           6.0
2020-01-09  8           7.0
2020-01-10  9           8.0

Exercise 74:

Create a DataFrame and calculate the cumulative maximum.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Max'] = df['X'].cummax()
print(df)

Output:

   X  Cumulative_Max
0  1               1
1  2               2
2  3               3
3  2               3
4  1               3

Exercise 75:

Create a DataFrame and calculate the cumulative minimum.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Min'] = df['X'].cummin()
print(df)

Output:

   X  Cumulative_Min
0  1               1
1  2               1
2  3               1
3  2               1
4  1               1

Exercise 76:

Create a DataFrame with random values and calculate the cumulative variance.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Cumulative_Var'] = df['X'].expanding().var()
print(df)

Output:

          X         Y         Z  Cumulative_Var
0  0.315669  0.900791  0.404858             NaN
1  0.462000  0.463257  0.922495        0.010706
2  0.328968  0.200027  0.967625        0.006548
3  0.630370  0.992849  0.231884        0.021460
4  0.574397  0.968600  0.926893        0.020023
5  0.204077  0.889864  0.589022        0.027130
6  0.386806  0.630882  0.242157        0.022759
7  0.319831  0.935747  0.829739        0.020630
8  0.786435  0.377739  0.879458        0.034407
9  0.523467  0.077937  0.764476        0.031194

Exercise 77:

Create a DataFrame and apply a custom function to each element.

Solution:

import pandas as pd
# Create a DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Define the custom function
def custom_function(x):
    return x * 2
# Apply the function to each element using map on each column
df = df.apply(lambda col: col.map(custom_function))
# Print the DataFrame
print(df)

Output:

   X   Y
0  2   8
1  4  10
2  6  12

Exercise 78:

Create a DataFrame with random values and calculate the z-score for each element.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)
print(df)

Output:

          X         Y         Z
0  1.027393  0.656858  1.032853
1  0.674079 -1.277904 -0.220065
2 -0.996641 -0.298841  0.475217
3 -0.704831  0.919887 -1.288005

Exercise 79:

Create a DataFrame and calculate the cumulative sum for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df.groupby('X')['Y'].cumsum()
print(df)

Output:

     X  Y  Cumulative_Sum
0  foo  1               1
1  bar  2               2
2  foo  3               4
3  bar  4               6

Exercise 80:

Create a DataFrame with random values and calculate the rank for each element.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.rank()
print(df)

Output:

     X    Y    Z
0  4.0  3.0  3.0
1  3.0  2.0  2.0
2  1.0  4.0  1.0
3  2.0  1.0  4.0

Exercise 81:

Create a DataFrame and calculate the cumulative product for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df.groupby('X')['Y'].cumprod()
print(df)

Output:

     X  Y  Cumulative_Product
0  foo  1                   1
1  bar  2                   2
2  foo  3                   3
3  bar  4                   8

Exercise 82:

Create a DataFrame with random values and calculate the expanding sum.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Sum'] = df['X'].expanding().sum()
print(df)

Output:

          X         Y         Z  Expanding_Sum
0  0.815750  0.062819  0.699743       0.815750
1  0.128772  0.843222  0.411903       0.944522
2  0.857516  0.219424  0.234460       1.802038
3  0.011010  0.774375  0.259412       1.813048

Exercise 83:

Create a DataFrame and calculate the expanding minimum for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Min'] = df.groupby('X')['Y'].expanding().min().reset_index(level=0, drop=True)
print(df)

Output:

     X  Y  Expanding_Min
0  foo  1            1.0
1  bar  2            2.0
2  foo  3            1.0
3  bar  4            2.0

Exercise 84:

Create a DataFrame with random values and calculate the expanding maximum for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Max'] = df.groupby('X')['Y'].expanding().max().reset_index(level=0, drop=True)
print(df)

Output:

          X         Y         Z  Expanding_Max
0  0.751392  0.015856  0.313990       0.015856
1  0.812436  0.701808  0.069307       0.701808
2  0.148614  0.838726  0.290646       0.838726
3  0.764419  0.586510  0.470466       0.586510

Exercise 85:

Create a DataFrame and calculate the expanding variance for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df.groupby('X')['Y'].expanding().var().reset_index(level=0, drop=True)
print(df)

Output:

     X  Y  Expanding_Var
0  foo  1            NaN
1  bar  2            NaN
2  foo  3            2.0
3  bar  4            2.0

Exercise 86:

Create a DataFrame with random values and calculate the expanding standard deviation.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Std'] = df['X'].expanding().std()
print(df)

Output:

          X         Y         Z  Expanding_Std
0  0.693184  0.088273  0.109510            NaN
1  0.031186  0.163005  0.803467       0.468103
2  0.294881  0.409395  0.278145       0.333272
3  0.918778  0.854961  0.791329       0.397322

Exercise 87:

Create a DataFrame and calculate the expanding covariance.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Expanding_Cov'] = df['X'].expanding().cov(df['Y'])
print(df)

Output:

   X  Y  Expanding_Cov
0  1  4            NaN
1  2  3      -0.500000
2  3  2      -1.000000
3  4  1      -1.666667

Exercise 88:

Create a DataFrame with random values and calculate the expanding correlation.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Corr'] = df['X'].expanding().corr(df['Y'])
print(df)

Output:

          X         Y         Z  Expanding_Corr
0  0.094026  0.320246  0.044218             NaN
1  0.422531  0.002172  0.995907       -1.000000
2  0.265459  0.391239  0.589878       -0.751147
3  0.118812  0.061489  0.837821       -0.372750

Exercise 89:

Create a DataFrame and calculate the expanding median.

Solution:

import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Median'] = df['X'].expanding().median()
print(df)

Output:

   X  Expanding_Median
0  1               1.0
1  2               1.5
2  3               2.0
3  4               2.5
4  5               3.0
5  6               3.5

Exercise 90:

Create a DataFrame with datetime index and calculate the expanding mean for each group.

Solution:

import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Expanding_Mean'] = df.groupby('X')['Y'].expanding().mean().reset_index(level=0, drop=True)
print(df)

Output:

              X  Y  Expanding_Mean
2020-01-01  foo  0             0.0
2020-01-02  bar  1             1.0
2020-01-03  foo  2             1.0
2020-01-04  bar  3             2.0
2020-01-05  foo  4             2.0
2020-01-06  bar  5             3.0
2020-01-07  foo  6             3.0
2020-01-08  bar  7             4.0
2020-01-09  foo  8             4.0
2020-01-10  bar  9             5.0

Exercise 91:

Create a DataFrame with random values and calculate the rolling sum for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Rolling_Sum'] = df.groupby('X')['Y'].rolling(window=3).sum().reset_index(level=0, drop=True)
print(df)

Output:

          X         Y         Z  Rolling_Sum
0  0.342706  0.579330  0.902681          NaN
1  0.182432  0.163406  0.156607          NaN
2  0.983085  0.052785  0.588865          NaN
3  0.756982  0.123991  0.704262          NaN
4  0.876875  0.710953  0.923588          NaN
5  0.359818  0.135520  0.277327          NaN
6  0.693156  0.590918  0.985834          NaN
7  0.892253  0.633529  0.169000          NaN
8  0.084238  0.007579  0.076730          NaN
9  0.663869  0.780832  0.644874          NaN
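
Every Rolling_Sum value above is NaN because grouping on a column of random floats produces one-row groups. A minimal sketch of a variant that yields actual sums follows; the 'Group' column and the seed are assumptions added for illustration and do not appear in the original exercise.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.random((10, 3)), columns=['X', 'Y', 'Z'])
# Hypothetical categorical key: alternate the labels so each group has 5 rows
df['Group'] = ['A', 'B'] * 5

df['Rolling_Sum'] = (
    df.groupby('Group')['Y']
      .rolling(window=3)
      .sum()
      .reset_index(level=0, drop=True)
)
print(df)
```

With five rows per group, only the first two rows of each group stay NaN; every later row holds the sum of the three most recent Y values in its group.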

Exercise 92:

Create a DataFrame and calculate the rolling mean for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df.groupby('X')['Y'].rolling(window=3).mean().reset_index(level=0, drop=True)
print(df)

Output:

     X  Y  Rolling_Mean
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           2.0
5  bar  5           3.0
6  foo  6           4.0
7  bar  7           5.0
8  foo  8           6.0
9  bar  9           7.0
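
The leading NaN values appear because a window of 3 needs three rows per group before it can produce a result. If partial windows are acceptable, the min_periods parameter of rolling() can relax that requirement. The sketch below assumes that a mean over fewer than three rows is meaningful for the data at hand.

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar'] * 5, 'Y': range(10)})

# min_periods=1 lets the window emit a mean as soon as one value is available
df['Rolling_Mean'] = (
    df.groupby('X')['Y']
      .rolling(window=3, min_periods=1)
      .mean()
      .reset_index(level=0, drop=True)
)
print(df)
```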

Exercise 93:

Create a DataFrame with random values and calculate the rolling standard deviation for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
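# Note: 'X' holds unique random floats, so every group contains a single row
# and a 3-row window never fills; Rolling_Std is therefore NaN throughout.
df['Rolling_Std'] = df.groupby('X')['Y'].rolling(window=3).std().reset_index(level=0, drop=True)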
print(df)

Output:

          X         Y         Z  Rolling_Std
0  0.154838  0.162793  0.808882          NaN
1  0.740167  0.920318  0.650240          NaN
2  0.033449  0.007883  0.249656          NaN
3  0.983601  0.261995  0.399816          NaN
4  0.883155  0.051084  0.125735          NaN
5  0.986930  0.470328  0.612276          NaN
6  0.981338  0.016731  0.627210          NaN
7  0.670522  0.247346  0.530971          NaN
8  0.978909  0.752500  0.903401          NaN
9  0.185614  0.362602  0.541459          NaN
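
Grouping directly on continuous random values leaves one row per group, which is why the column above is entirely NaN. One way to get usable groups from such a column is to bin it first. The following is a sketch only: the 0.5 threshold, the 'X_Bin' column name, and the seed are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.random((10, 3)), columns=['X', 'Y', 'Z'])
# Hypothetical binning rule: split X at 0.5 into two labelled groups
df['X_Bin'] = np.where(df['X'] < 0.5, 'low', 'high')

df['Rolling_Std'] = (
    df.groupby('X_Bin')['Y']
      .rolling(window=3)
      .std()
      .reset_index(level=0, drop=True)
)
print(df)
```

Any bin that ends up with fewer than three rows will still show only NaN, so the bin edges should be chosen with the window size in mind.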

Exercise 94:

Create a DataFrame and calculate the rolling variance for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Var'] = df.groupby('X')['Y'].rolling(window=3).var().reset_index(level=0, drop=True)
print(df)

Output:

     X  Y  Rolling_Var
0  foo  0          NaN
1  bar  1          NaN
2  foo  2          NaN
3  bar  3          NaN
4  foo  4          4.0
5  bar  5          4.0
6  foo  6          4.0
7  bar  7          4.0
8  foo  8          4.0
9  bar  9          4.0

Exercise 95:

Create a DataFrame with random values and calculate the rolling correlation for each group.

Solution:

import pandas as pd
import numpy as np
# Create a DataFrame with random values
np.random.seed(42)  # For reproducibility
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Add a categorical column to group by
df['Group'] = np.random.choice(['A', 'B'], size=10)
# Calculate the rolling correlation of Y and Z within each group
df['Rolling_Corr'] = df.groupby('Group').apply(lambda group: group['Y'].rolling(window=3).corr(group['Z'])).reset_index(level=0, drop=True)
print(df)

Output:

          X         Y         Z Group  Rolling_Corr
0  0.374540  0.950714  0.731994     A           NaN
1  0.598658  0.156019  0.155995     A           NaN
2  0.058084  0.866176  0.601115     A      0.992633
3  0.708073  0.020584  0.969910     A     -0.095420
4  0.832443  0.212339  0.181825     A     -0.180021
5  0.183405  0.304242  0.524756     B           NaN
6  0.431945  0.291229  0.611853     B           NaN
7  0.139494  0.292145  0.366362     A     -0.869948
8  0.456070  0.785176  0.199674     B     -0.984073
9  0.514234  0.592415  0.046450     B     -0.788379

Exercise 96:

Create a DataFrame and calculate the rolling covariance for each group.

Solution:

import pandas as pd

# Create a DataFrame with sample data
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
        'Y': range(10), 'Z': range(10, 20)}
df = pd.DataFrame(data)
# Calculate the rolling covariance for each group
rolling_cov = df.groupby('X').apply(lambda group: group['Y'].rolling(window=3).cov(group['Z'])).reset_index(level=0, drop=True)
# Add the rolling covariance to the original DataFrame
df['Rolling_Cov'] = rolling_cov
print(df)

Output:

     X  Y   Z  Rolling_Cov
0  foo  0  10          NaN
1  bar  1  11          NaN
2  foo  2  12          NaN
3  bar  3  13          NaN
4  foo  4  14          4.0
5  bar  5  15          4.0
6  foo  6  16          4.0
7  bar  7  17          4.0
8  foo  8  18          4.0
9  bar  9  19          4.0

Exercise 97:

Create a DataFrame with random values and calculate the rolling skewness for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
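# Note: 'X' holds unique random floats, so every group contains a single row
# and a 3-row window never fills; Rolling_Skew is therefore NaN throughout.
df['Rolling_Skew'] = df.groupby('X')['Y'].rolling(window=3).skew().reset_index(level=0, drop=True)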
print(df)

Output:

          X         Y         Z  Rolling_Skew
0  0.808397  0.304614  0.097672           NaN
1  0.684233  0.440152  0.122038           NaN
2  0.495177  0.034389  0.909320           NaN
3  0.258780  0.662522  0.311711           NaN
4  0.520068  0.546710  0.184854           NaN
5  0.969585  0.775133  0.939499           NaN
6  0.894827  0.597900  0.921874           NaN
7  0.088493  0.195983  0.045227           NaN
8  0.325330  0.388677  0.271349           NaN
9  0.828738  0.356753  0.280935           NaN

Exercise 98:

Create a DataFrame and calculate the rolling kurtosis for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
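# Note: kurtosis needs at least four observations per window,
# so window=3 yields NaN for every row.
df['Rolling_Kurt'] = df.groupby('X')['Y'].rolling(window=3).kurt().reset_index(level=0, drop=True)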
print(df)

Output:

     X  Y  Rolling_Kurt
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           NaN
5  bar  5           NaN
6  foo  6           NaN
7  bar  7           NaN
8  foo  8           NaN
9  bar  9           NaN
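
The output above is NaN in every row because sample kurtosis is not defined for fewer than four observations, and the window holds only three. A minimal sketch with a window of 4 follows, assuming the same data. For evenly spaced values such as these, each complete window should give a kurtosis of about -1.2.

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar'] * 5, 'Y': range(10)})

# A window of 4 is the smallest size for which kurtosis can be computed
df['Rolling_Kurt'] = (
    df.groupby('X')['Y']
      .rolling(window=4)
      .kurt()
      .reset_index(level=0, drop=True)
)
print(df)
```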

Exercise 99:

Create a DataFrame with random values and calculate the rolling median for each group.

Solution:

import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
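# Note: 'X' holds unique random floats, so every group contains a single row
# and a 3-row window never fills; Rolling_Median is therefore NaN throughout.
df['Rolling_Median'] = df.groupby('X')['Y'].rolling(window=3).median().reset_index(level=0, drop=True)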
print(df)

Output:

          X         Y         Z  Rolling_Median
0  0.542696  0.140924  0.802197             NaN
1  0.074551  0.986887  0.772245             NaN
2  0.198716  0.005522  0.815461             NaN
3  0.706857  0.729007  0.771270             NaN
4  0.074045  0.358466  0.115869             NaN
5  0.863103  0.623298  0.330898             NaN
6  0.063558  0.310982  0.325183             NaN
7  0.729606  0.637557  0.887213             NaN
8  0.472215  0.119594  0.713245             NaN
9  0.760785  0.561277  0.770967             NaN

Exercise 100:

Create a DataFrame and calculate the expanding sum for each group.

Solution:

import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Expanding_Sum'] = df.groupby('X')['Y'].expanding().sum().reset_index(level=0, drop=True)
print(df)

Output:

     X  Y  Expanding_Sum
0  foo  0            0.0
1  bar  1            1.0
2  foo  2            2.0
3  bar  3            4.0
4  foo  4            6.0
5  bar  5            9.0
6  foo  6           12.0
7  bar  7           16.0
8  foo  8           20.0
9  bar  9           25.0
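
An expanding sum within each group is the same as a per-group cumulative sum, so groupby(...).cumsum() offers a shorter route to the same numbers. The sketch below assumes the same data; note that cumsum keeps the integer dtype, whereas expanding().sum() returns floats.

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar'] * 5, 'Y': range(10)})

# cumsum is aligned to the original index, so no reset_index step is needed
df['Expanding_Sum'] = df.groupby('X')['Y'].cumsum()
print(df)
```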
