Mastering Pandas: 100 Exercises with solutions for Python data analysis
Welcome to w3resource's 100 Pandas exercises collection! This comprehensive set of exercises is designed to help you master the fundamentals of Pandas, a powerful data manipulation and analysis library in Python. Whether you're a beginner or an experienced user looking to improve your skills, these exercises cover a wide range of topics. They provide practical challenges to enhance your Pandas understanding.
Exercise 1:
Create a DataFrame from a dictionary of lists.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df)
Output:
   X  Y
0  1  5
1  2  6
2  3  7
3  4  8
Exercise 2:
Select the first 3 rows of a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df.head(3))
Output:
   X  Y
0  1  5
1  2  6
2  3  7
Exercise 3:
Select the 'X' column from a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df['X'])
Output:
0    1
1    2
2    3
3    4
Name: X, dtype: int64
Exercise 4:
Filter rows based on a column condition.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'] > 2]
print(filtered_df)
Output:
   X  Y
2  3  7
3  4  8
Exercise 5:
Add a new column to an existing DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df['Z'] = df['X'] + df['Y']
print(df)
Output:
   X  Y   Z
0  1  5   6
1  2  6   8
2  3  7  10
3  4  8  12
Exercise 6:
Remove a column from a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8], 'Z': [9, 10, 11, 12]}
df = pd.DataFrame(data)
df.drop(columns=['Z'], inplace=True)
print(df)
Output:
   X  Y
0  1  5
1  2  6
2  3  7
3  4  8
Exercise 7:
Sort a DataFrame by a column.
Solution:
import pandas as pd
data = {'X': [4, 3, 2, 1], 'Y': [8, 7, 6, 5]}
df = pd.DataFrame(data)
df.sort_values(by='X', inplace=True)
print(df)
Output:
   X  Y
3  1  5
2  2  6
1  3  7
0  4  8
Exercise 8:
Group a DataFrame by a column and calculate the mean of each group.
Solution:
import pandas as pd
data = {'X': [1, 2, 1, 2], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
grouped_df = df.groupby('X').mean()
print(grouped_df)
Output:
     Y
X     
1  6.0
2  7.0
Exercise 9:
Replace missing values in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]}
df = pd.DataFrame(data)
df.fillna(0, inplace=True)
print(df)
Output:
     X    Y
0  1.0  5.0
1  2.0  0.0
2  0.0  7.0
3  4.0  8.0
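A constant such as 0 is not always a sensible replacement. As a minimal sketch, assuming the column mean is an acceptable stand-in for the missing entries, fillna also accepts one fill value per column:

```python
import pandas as pd

data = {'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]}
df = pd.DataFrame(data)

# df.mean() returns one mean per column; fillna matches them by column label
filled_df = df.fillna(df.mean())
print(filled_df)
```

Here the gap in X is filled with the mean of 1, 2 and 4, and the gap in Y with the mean of 5, 7 and 8.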
Exercise 10:
Convert a column to datetime.
Solution:
import pandas as pd
data = {'X': ['2020-01-01', '2020-01-02', '2020-01-03']}
df = pd.DataFrame(data)
df['X'] = pd.to_datetime(df['X'])
print(df)
Output:
           X
0 2020-01-01
1 2020-01-02
2 2020-01-03
Exercise 11:
Create a DataFrame with specific column names.
Solution:
import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)
print(df)
Output:
   col1  col2
0     1     4
1     2     5
2     3     6
Exercise 12:
Calculate the sum of values in each column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.sum())
Output:
X     6
Y    15
dtype: int64
Exercise 13:
Calculate the mean of values in each row.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.mean(axis=1))
Output:
0    2.5
1    3.5
2    4.5
dtype: float64
Exercise 14:
Concatenate two DataFrames.
Solution:
import pandas as pd
data1 = {'X': [1, 2, 3]}
data2 = {'Y': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
concatenated_df = pd.concat([df1, df2], axis=1)
print(concatenated_df)
Output:
   X  Y
0  1  4
1  2  5
2  3  6
Exercise 15:
Merge two DataFrames on a key.
Solution:
import pandas as pd
data1 = {'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]}
data2 = {'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='key')
print(merged_df)
Output:
  key  value1  value2
0   X       1       4
1   Y       2       5
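The default merge is an inner join, which is why the keys 'Z' and 'D' are missing from the result. A small sketch, using the same two DataFrames, of how an outer join with the indicator option shows where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]})

# how='outer' keeps keys from both sides; indicator=True adds a '_merge' column
# whose values are 'both', 'left_only' or 'right_only'
outer_df = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(outer_df)
```

Rows without a partner on the other side get NaN in the missing value column.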
Exercise 16:
Create a pivot table from a DataFrame.
Solution:
import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Z', index='X', columns='Y')
print(pivot_table)
Output:
Y    one  two
X
bar  3.0  4.0
foo  1.0  2.0
Exercise 17:
Reshape a DataFrame from long to wide format.
Solution:
import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
wide_df = df.pivot(index='X', columns='Y', values='Z')
print(wide_df) 
Output:
Y    one  two
X
bar    3    4
foo    1    2
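pivot only works when each index/column pair occurs once. The sketch below uses hypothetical data in which the ('foo', 'one') pair is repeated, to illustrate that pivot raises a ValueError in that case while pivot_table can aggregate the duplicates:

```python
import pandas as pd

# Hypothetical data: the ('foo', 'one') combination appears twice
df = pd.DataFrame({'X': ['foo', 'foo', 'bar'],
                   'Y': ['one', 'one', 'two'],
                   'Z': [1, 3, 5]})

try:
    df.pivot(index='X', columns='Y', values='Z')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print('pivot failed:', exc)

# pivot_table resolves the duplicates with an aggregation function
wide_df = df.pivot_table(index='X', columns='Y', values='Z', aggfunc='sum')
print(wide_df)
```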
Exercise 18:
Calculate the correlation between columns in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
correlation = df.corr()
print(correlation)
Output:
     X    Y
X  1.0 -1.0
Y -1.0  1.0
Exercise 19:
Iterate over rows in a DataFrame using iterrows().
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
for index, row in df.iterrows():
    print(index, row['X'], row['Y'])
Output:
0 1 4
1 2 5
2 3 6
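iterrows builds a Series for every row, which is convenient but slow on large frames. As an alternative sketch, itertuples yields lightweight namedtuples and is generally faster:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})

# itertuples yields namedtuples; the index is available as .Index
rows = [(row.Index, row.X, row.Y) for row in df.itertuples()]
for index, x, y in rows:
    print(index, x, y)
```

For simple arithmetic on columns, a vectorized expression such as df['X'] + df['Y'] usually avoids the loop altogether.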
Exercise 20:
Apply a function to each element in a DataFrame.
Solution:
import pandas as pd  # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Apply a function to each element using the map method
df = df.apply(lambda col: col.map(lambda x: x * 2))
print(df)
Output:
   X   Y
0  2   8
1  4  10
2  6  12
Exercise 21:
Create a DataFrame from a list of dictionaries.
Solution:
import pandas as pd
data = [{'X': 1, 'Y': 2}, {'X': 3, 'Y': 4}]
df = pd.DataFrame(data)
print(df)
Output:
   X  Y
0  1  2
1  3  4
Exercise 22:
Rename columns in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.rename(columns={'X': 'A', 'Y': 'B'}, inplace=True)
print(df)
Output:
   A  B
0  1  4
1  2  5
2  3  6
Exercise 23:
Filter rows by multiple conditions.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
filtered_df = df[(df['X'] > 2) & (df['Y'] < 7)]
print(filtered_df)
Output:
   X  Y
2  3  6
Exercise 24:
Calculate the cumulative sum of a column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df['X'].cumsum()
print(df)
Output:
   X  Cumulative_Sum
0  1               1
1  2               3
2  3               6
3  4              10
Exercise 25:
Drop rows with missing values.
Solution:
import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, 5, 6, None]}
df = pd.DataFrame(data)
df.dropna(inplace=True)
print(df)
Output:
     X    Y
0  1.0  4.0
1  2.0  5.0
Exercise 26:
Replace values in a DataFrame based on a condition.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df.loc[df['X'] > 2, 'Y'] = 0
print(df)
Output:
   X  Y
0  1  5
1  2  6
2  3  0
3  4  0
Exercise 27:
Create a DataFrame with a MultiIndex.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)
Output:
              Value
Group Number       
X     1          10
      2          20
Y     1          30
      2          40
Exercise 28:
Calculate the rolling mean of a column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df)
Output:
   X  Rolling_Mean
0  1           NaN
1  2           NaN
2  3           2.0
3  4           3.0
4  5           4.0
5  6           5.0
Exercise 29:
Create a DataFrame from a list of tuples.
Solution:
import pandas as pd
data = [(1, 2), (3, 4), (5, 6)]
df = pd.DataFrame(data, columns=['X', 'Y'])
print(df)
Output:
   X  Y
0  1  2
1  3  4
2  5  6
Exercise 30:
Add a row to a DataFrame.
Solution:
import pandas as pd  # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2], 'Y': [3, 4]}
df = pd.DataFrame(data)
# Create a new row as a DataFrame
new_row = pd.DataFrame({'X': [5], 'Y': [6]})
# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row], ignore_index=True)
print(df)
Output:
   X  Y
0  1  3
1  2  4
2  5  6
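For a single row, assigning to a new label with loc is a shorter alternative. This sketch assumes the DataFrame has the default integer index, so that len(df) is a label not yet in use:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2], 'Y': [3, 4]})

# Assigning to a label that does not exist yet appends a row
df.loc[len(df)] = [5, 6]
print(df)
```

When many rows are added, collecting them first and calling pd.concat once is usually the better choice, because each enlargement copies the data.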
Exercise 31:
Create a DataFrame with random values.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df)
Output:
          X         Y         Z
0  0.688292  0.950264  0.665916
1  0.497719  0.840536  0.923938
2  0.285218  0.091178  0.722034
3  0.037824  0.248689  0.584696
Exercise 32:
Calculate the rank of values in a DataFrame.
Solution:
import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank'] = df['X'].rank()
print(df)
Output:
   X  Y  Rank
0  3  2   3.0
1  1  3   1.5
2  4  1   4.0
3  1  4   1.5
Exercise 33:
Change the data type of a column.
Solution:
import pandas as pd
data = {'X': ['1', '2', '3']}
df = pd.DataFrame(data)
df['X'] = df['X'].astype(int)
print(df)
Output:
   X
0  1
1  2
2  3
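astype(int) raises a ValueError as soon as one string cannot be parsed. A minimal sketch, with a hypothetical bad value 'three' added to the data, of how pd.to_numeric can convert such a column anyway:

```python
import pandas as pd

# 'three' is a hypothetical unparseable value added for illustration
df = pd.DataFrame({'X': ['1', '2', 'three']})

# errors='coerce' turns unparseable strings into NaN instead of raising
df['X'] = pd.to_numeric(df['X'], errors='coerce')
print(df)
```

Because NaN is a float, the resulting column has dtype float64 rather than int64.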
Exercise 34:
Filter rows based on string matching.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba')]
print(filtered_df)
Output:
     X
1  bar
2  baz
Exercise 35:
Create a DataFrame with specified row and column labels.
Solution:
import pandas as pd
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])
print(df)
Output:
       col1  col2  col3
row1     1     2     3
row2     4     5     6
row3     7     8     9
Exercise 36:
Transpose a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
transposed_df = df.T
print(transposed_df)
Output:
   0  1  2
X  1  2  3
Y  4  5  6
Exercise 37:
Set a column as the index of a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
print(df)
Output:
   Y
X
1  4
2  5
3  6
Exercise 38:
Reset the index of a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
df.reset_index(inplace=True)
print(df)
Output:
   X  Y
0  1  4
1  2  5
2  3  6
Exercise 39:
Add a prefix or suffix to column names.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.add_prefix('col_')
print(df)
Output:
   col_X  col_Y
0      1      4
1      2      5
2      3      6
Exercise 40:
Filter rows based on datetime index.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=5, freq='D')
data = {'X': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data, index=date_range)
filtered_df = df['2020-01-03':'2020-01-05']
print(filtered_df)
Output:
            X
2020-01-03  3
2020-01-04  4
2020-01-05  5
Exercise 41:
Create a DataFrame with duplicate rows and remove duplicates.
Solution:
import pandas as pd
data = {'X': [1, 2, 2, 3], 'Y': [4, 5, 5, 6]}
df = pd.DataFrame(data)
df.drop_duplicates(inplace=True)
print(df)
Output:
   X  Y
0  1  4
1  2  5
3  3  6
Exercise 42:
Create a DataFrame with hierarchical index.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)
Output:
              Value
Group Number       
X     1          10
      2          20
Y     1          30
      2          40
Exercise 43:
Calculate the difference between consecutive rows in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 3, 6, 10]}
df = pd.DataFrame(data)
df['Difference'] = df['X'].diff()
print(df)
Output:
    X  Difference
0   1         NaN
1   3         2.0
2   6         3.0
3  10         4.0
Exercise 44:
Create a DataFrame with hierarchical columns.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], ['C1', 'C2', 'C1', 'C2']]
columns = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Type'))
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
df = pd.DataFrame(data, columns=columns)
print(df)
Output:
Group   X       Y
Type   C1  C2  C1  C2
0       1   2   3   4
1       5   6   7   8
2       9  10  11  12
Exercise 45:
Filter rows based on the length of strings in a column.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.len() > 3]
print(filtered_df)
Output:
Empty DataFrame
Columns: [X]
Index: []
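The result is empty because every string in the sample has exactly three characters. As an illustration with hypothetical data of mixed lengths, the same filter keeps only the longer strings:

```python
import pandas as pd

# Hypothetical data with strings of different lengths
df = pd.DataFrame({'X': ['foo', 'quux', 'ba', 'corge']})

# Keep rows whose string has more than three characters
long_df = df[df['X'].str.len() > 3]
print(long_df)
```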
Exercise 46:
Calculate the percentage change between rows in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Pct_Change'] = df['X'].pct_change()
print(df)
Output:
   X  Pct_Change
0  1         NaN
1  2    1.000000
2  3    0.500000
3  4    0.333333
Exercise 47:
Create a DataFrame from a dictionary of Series.
Solution:
import pandas as pd
data = {'X': pd.Series([1, 2, 3]), 'Y': pd.Series([4, 5, 6])}
df = pd.DataFrame(data)
print(df)
Output:
   X  Y
0  1  4
1  2  5
2  3  6
Exercise 48:
Filter rows based on whether a column value is in a list.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'].isin([2, 3])]
print(filtered_df)
Output:
   X  Y
1  2  6
2  3  7
Exercise 49:
Calculate the z-score of values in a DataFrame.
Solution:
import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
df['zscore_X'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
print(df)
Output:
   X  Y  zscore_X
0  1  4 -1.341641
1  2  5 -0.447214
2  3  6  0.447214
3  4  7  1.341641
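The numbers above come from the population standard deviation, because np.std divides by n by default, whereas the pandas std method divides by n - 1. A short sketch comparing the two conventions on the same column:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4]})
centered = df['X'] - df['X'].mean()

# Population standard deviation (ddof=0), matching the np.std default
z_population = centered / df['X'].std(ddof=0)
# Sample standard deviation (ddof=1), the pandas default
z_sample = centered / df['X'].std()

print(pd.DataFrame({'z_population': z_population, 'z_sample': z_sample}))
```

Which convention is appropriate depends on whether the data is treated as the whole population or as a sample from it.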
Exercise 50:
Create a DataFrame with random integers and calculate descriptive statistics.
Solution:
import pandas as pd
import numpy as np
data = np.random.randint(1, 100, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.describe())
Output:
               X          Y          Z
count   5.000000   5.000000   5.000000
mean   60.600000  71.800000  42.600000
std    38.435661  13.971399  12.218838
min     5.000000  53.000000  28.000000
25%    40.000000  64.000000  34.000000
50%    69.000000  72.000000  41.000000
75%    91.000000  82.000000  55.000000
max    98.000000  88.000000  55.000000
Exercise 51:
Calculate the rank of values in each column of a DataFrame.
Solution:
import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank_X'] = df['X'].rank()
df['Rank_Y'] = df['Y'].rank()
print(df)
Output:
   X  Y  Rank_X  Rank_Y
0  3  2     3.0     2.0
1  1  3     1.5     3.0
2  4  1     4.0     1.0
3  1  4     1.5     4.0
Exercise 52:
Filter rows based on multiple string conditions.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba|qu')]
print(filtered_df)
Output:
     X
1  bar
2  baz
3  qux
Exercise 53:
Create a DataFrame with random values and calculate the skewness.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.skew())
Output:
A Series with one skewness value each for X, Y and Z (dtype: float64). The numbers differ on every run because the data is random.
Exercise 54:
Create a DataFrame and calculate the kurtosis.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.kurt())
Output:
X    2.958407
Y   -2.639654
Z    2.704430
dtype: float64
Exercise 55:
Calculate the cumulative product of a column in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df['X'].cumprod()
print(df) 
Output:
   X  Cumulative_Product
0  1                   1
1  2                   2
2  3                   6
3  4                  24
Exercise 56:
Create a DataFrame and calculate the rolling standard deviation.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Std'] = df['X'].rolling(window=3).std()
print(df)
Output:
   X  Rolling_Std
0  1          NaN
1  2          NaN
2  3          1.0
3  4          1.0
4  5          1.0
5  6          1.0
Exercise 57:
Create a DataFrame and calculate the expanding mean.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Mean'] = df['X'].expanding().mean()
print(df)
Output:
   X  Expanding_Mean
0  1             1.0
1  2             1.5
2  3             2.0
3  4             2.5
4  5             3.0
5  6             3.5
Exercise 58:
Create a DataFrame with random values and calculate the covariance matrix.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.cov()) 
Output:
          X         Y         Z
X  0.054079  0.007398 -0.031403
Y  0.007398  0.053211 -0.020480
Z -0.031403 -0.020480  0.048057
Exercise 59:
Create a DataFrame with random values and calculate the correlation matrix.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.corr())
Output:
          X         Y         Z
X  1.000000 -0.258187  0.541044
Y -0.258187  1.000000 -0.432419
Z  0.541044 -0.432419  1.000000
Exercise 60:
Create a DataFrame and calculate the rolling correlation between two columns.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6], 'Y': [6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Rolling_Corr'] = df['X'].rolling(window=3).corr(df['Y'])
print(df) 
Output:
   X  Y  Rolling_Corr
0  1  6           NaN
1  2  5           NaN
2  3  4          -1.0
3  4  3          -1.0
4  5  2          -1.0
5  6  1          -1.0
Exercise 61:
Create a DataFrame and calculate the expanding variance.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df['X'].expanding().var()
print(df) 
Output:
   X  Expanding_Var
0  1            NaN
1  2       0.500000
2  3       1.000000
3  4       1.666667
4  5       2.500000
5  6       3.500000
Exercise 62:
Create a DataFrame with datetime index and resample by month.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=100, freq='D')
data = {'X': range(100)}
df = pd.DataFrame(data, index=date_range)
# 'M' means month-end; pandas 2.2 and later deprecate it in favor of the alias 'ME'
monthly_df = df.resample('M').sum()
print(monthly_df)
Output:
               X
2020-01-31   465
2020-02-29  1305
2020-03-31  2325
2020-04-30   855
Exercise 63:
Create a DataFrame and calculate the exponential moving average.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['EMA'] = df['X'].ewm(span=3, adjust=False).mean()
print(df) 
Output:
   X      EMA
0  1  1.00000
1  2  1.50000
2  3  2.25000
3  4  3.12500
4  5  4.06250
5  6  5.03125
Exercise 64:
Create a DataFrame with random integers and calculate the mode.
Solution:
import pandas as pd
import numpy as np
data = np.random.randint(1, 10, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.mode()) 
Output:
   X    Y    Z
0  2  1.0  2.0
1  3  3.0  7.0
2  5  NaN  NaN
3  6  NaN  NaN
4  9  NaN  NaN
Exercise 65:
Create a DataFrame and calculate the z-score of each column.
Solution:
import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
df['zscore_X'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
df['zscore_Y'] = (df['Y'] - np.mean(df['Y'])) / np.std(df['Y'])
print(df)
Output:
   X  Y  zscore_X  zscore_Y
0  1  4 -1.341641 -1.341641
1  2  5 -0.447214 -0.447214
2  3  6  0.447214  0.447214
3  4  7  1.341641  1.341641
Exercise 66:
Create a DataFrame with random values and calculate the median.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.median())
Output:
X    0.787042
Y    0.477837
Z    0.696911
dtype: float64
Exercise 67:
Create a DataFrame and apply a custom function to each column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.apply(lambda x: x + 1)
print(df)
Output:
   X  Y
0  2  5
1  3  6
2  4  7
Exercise 68:
Create a DataFrame with hierarchical index and calculate the mean for each group.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
grouped_df = df.groupby('Group').mean()
print(grouped_df) 
Output:
       Value
Group       
X       15.0
Y       35.0
Exercise 69:
Create a DataFrame and calculate the percentage of missing values in each column.
Solution:
import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, None, 6, 8]}
df = pd.DataFrame(data)
missing_percentage = df.isnull().mean() * 100
print(missing_percentage)
Output:
X    25.0
Y    25.0
dtype: float64
Exercise 70:
Create a DataFrame and apply a custom function to each row.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df['Sum'] = df.apply(lambda row: row['X'] + row['Y'], axis=1)
print(df)
Output:
   X  Y  Sum
0  1  4    5
1  2  5    7
2  3  6    9
Exercise 71:
Create a DataFrame with random values and calculate the quantiles.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.quantile([0.25, 0.5, 0.75])) 
Output:
             X         Y         Z
0.25  0.174265  0.184036  0.520573
0.50  0.468040  0.315593  0.644571
0.75  0.767870  0.436426  0.771297
Exercise 72:
Create a DataFrame and calculate the interquartile range (IQR).
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
print(IQR) 
Output:
X    0.354244
Y    0.329573
Z    0.245520
dtype: float64
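A common use of the IQR is flagging outliers. The sketch below uses a hypothetical sample with one extreme value and the conventional 1.5 * IQR fences; the multiplier is an assumption for illustration, not something the exercise prescribes:

```python
import pandas as pd

# Hypothetical sample with one extreme value
s = pd.Series([1, 2, 3, 4, 100], name='X')

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1

# Values outside [q1 - 1.5 * iqr, q3 + 1.5 * iqr] are treated as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]
print(outliers)
```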
Exercise 73:
Create a DataFrame with datetime index and calculate the rolling mean.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df) 
Output:
            X  Rolling_Mean
2020-01-01  0           NaN
2020-01-02  1           NaN
2020-01-03  2           1.0
2020-01-04  3           2.0
2020-01-05  4           3.0
2020-01-06  5           4.0
2020-01-07  6           5.0
2020-01-08  7           6.0
2020-01-09  8           7.0
2020-01-10  9           8.0
Exercise 74:
Create a DataFrame and calculate the cumulative maximum.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Max'] = df['X'].cummax()
print(df) 
Output:
   X  Cumulative_Max
0  1               1
1  2               2
2  3               3
3  2               3
4  1               3
Exercise 75:
Create a DataFrame and calculate the cumulative minimum.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Min'] = df['X'].cummin()
print(df)
Output:
   X  Cumulative_Min
0  1               1
1  2               1
2  3               1
3  2               1
4  1               1
Exercise 76:
Create a DataFrame with random values and calculate the cumulative variance.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Cumulative_Var'] = df['X'].expanding().var()
print(df)
Output:
          X         Y         Z  Cumulative_Var
0  0.315669  0.900791  0.404858             NaN
1  0.462000  0.463257  0.922495        0.010706
2  0.328968  0.200027  0.967625        0.006548
3  0.630370  0.992849  0.231884        0.021460
4  0.574397  0.968600  0.926893        0.020023
5  0.204077  0.889864  0.589022        0.027130
6  0.386806  0.630882  0.242157        0.022759
7  0.319831  0.935747  0.829739        0.020630
8  0.786435  0.377739  0.879458        0.034407
9  0.523467  0.077937  0.764476        0.031194
Exercise 77:
Create a DataFrame and apply a custom function to each element.
Solution:
import pandas as pd
# Create a DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Define the custom function
def custom_function(x):
    return x * 2
# Apply the function to each element using map on each column
df = df.apply(lambda col: col.map(custom_function))
# Print the DataFrame
print(df) 
Output:
   X   Y
0  2   8
1  4  10
2  6  12
Exercise 78:
Create a DataFrame with random values and calculate the z-score for each element.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)
print(df) 
Output:
          X         Y         Z
0  1.027393  0.656858  1.032853
1  0.674079 -1.277904 -0.220065
2 -0.996641 -0.298841  0.475217
3 -0.704831  0.919887 -1.288005
Exercise 79:
Create a DataFrame and calculate the cumulative sum for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df.groupby('X')['Y'].cumsum()
print(df) 
Output:
     X  Y  Cumulative_Sum
0  foo  1               1
1  bar  2               2
2  foo  3               4
3  bar  4               6
Exercise 80:
Create a DataFrame with random values and calculate the rank for each element.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.rank()
print(df)
Output:
     X    Y    Z
0  4.0  3.0  3.0
1  3.0  2.0  2.0
2  1.0  4.0  1.0
3  2.0  1.0  4.0
Exercise 81:
Create a DataFrame and calculate the cumulative product for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df.groupby('X')['Y'].cumprod()
print(df) 
Output:
     X  Y  Cumulative_Product
0  foo  1                   1
1  bar  2                   2
2  foo  3                   3
3  bar  4                   8
Exercise 82:
Create a DataFrame with random values and calculate the expanding sum.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Sum'] = df['X'].expanding().sum()
print(df) 
Output:
          X         Y         Z  Expanding_Sum
0  0.815750  0.062819  0.699743       0.815750
1  0.128772  0.843222  0.411903       0.944522
2  0.857516  0.219424  0.234460       1.802038
3  0.011010  0.774375  0.259412       1.813048
Exercise 83:
Create a DataFrame and calculate the expanding minimum for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Min'] = df.groupby('X')['Y'].expanding().min().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Expanding_Min
0  foo  1            1.0
1  bar  2            2.0
2  foo  3            1.0
3  bar  4            2.0
Exercise 84:
Create a DataFrame with random values and calculate the expanding maximum for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
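# Note: X holds distinct random floats, so every group has a single row and the
# expanding maximum equals Y itself; group by a repeating key for a meaningful result
df['Expanding_Max'] = df.groupby('X')['Y'].expanding().max().reset_index(level=0, drop=True)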
print(df) 
Output:
          X         Y         Z  Expanding_Max
0  0.751392  0.015856  0.313990       0.015856
1  0.812436  0.701808  0.069307       0.701808
2  0.148614  0.838726  0.290646       0.838726
3  0.764419  0.586510  0.470466       0.586510
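Grouping is more informative when the key repeats. The following sketch uses hypothetical fixed data with a categorical Group column, and swaps in cummax, which gives the per-group expanding maximum while keeping the original row order:

```python
import pandas as pd

# Hypothetical data: 'Group' is a repeating key, 'Y' holds the values
df = pd.DataFrame({'Group': ['a', 'b', 'a', 'b', 'a'],
                   'Y': [0.2, 0.9, 0.5, 0.4, 0.3]})

# cummax within each group is the running (expanding) maximum of that group
df['Expanding_Max'] = df.groupby('Group')['Y'].cummax()
print(df)
```

Because cummax returns a Series aligned with the original index, no reset_index call is needed.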
Exercise 85:
Create a DataFrame and calculate the expanding variance for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df.groupby('X')['Y'].expanding().var().reset_index(level=0, drop=True)
print(df) 
Output:
     X  Y  Expanding_Var
0  foo  1            NaN
1  bar  2            NaN
2  foo  3            2.0
3  bar  4            2.0
Exercise 86:
Create a DataFrame with random values and calculate the expanding standard deviation.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Std'] = df['X'].expanding().std()
print(df) 
Output:
          X         Y         Z  Expanding_Std
0  0.693184  0.088273  0.109510            NaN
1  0.031186  0.163005  0.803467       0.468103
2  0.294881  0.409395  0.278145       0.333272
3  0.918778  0.854961  0.791329       0.397322
Exercise 87:
Create a DataFrame and calculate the expanding covariance.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Expanding_Cov'] = df['X'].expanding().cov(df['Y'])
print(df)
Output:
   X  Y  Expanding_Cov
0  1  4            NaN
1  2  3      -0.500000
2  3  2      -1.000000
3  4  1      -1.666667
Exercise 88:
Create a DataFrame with random values and calculate the expanding correlation.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Corr'] = df['X'].expanding().corr(df['Y'])
print(df)
Output:
          X         Y         Z  Expanding_Corr
0  0.094026  0.320246  0.044218             NaN
1  0.422531  0.002172  0.995907       -1.000000
2  0.265459  0.391239  0.589878       -0.751147
3  0.118812  0.061489  0.837821       -0.372750
Exercise 89:
Create a DataFrame and calculate the expanding median.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Median'] = df['X'].expanding().median()
print(df)
Output:
   X  Expanding_Median
0  1               1.0
1  2               1.5
2  3               2.0
3  4               2.5
4  5               3.0
5  6               3.5
Exercise 90:
Create a DataFrame with datetime index and calculate the expanding mean for each group.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Expanding_Mean'] = df.groupby('X')['Y'].expanding().mean().reset_index(level=0, drop=True)
print(df) 
Output:
              X  Y  Expanding_Mean
2020-01-01  foo  0             0.0
2020-01-02  bar  1             1.0
2020-01-03  foo  2             1.0
2020-01-04  bar  3             2.0
2020-01-05  foo  4             2.0
2020-01-06  bar  5             3.0
2020-01-07  foo  6             3.0
2020-01-08  bar  7             4.0
2020-01-09  foo  8             4.0
2020-01-10  bar  9             5.0
Exercise 91:
Create a DataFrame with random values and calculate the rolling sum for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
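# Note: X holds distinct random floats, so every group has one row, the 3-row
# window is never full, and the result is all NaN; group by a repeating key instead
df['Rolling_Sum'] = df.groupby('X')['Y'].rolling(window=3).sum().reset_index(level=0, drop=True)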
print(df) 
Output:
          X         Y         Z  Rolling_Sum
0  0.342706  0.579330  0.902681          NaN
1  0.182432  0.163406  0.156607          NaN
2  0.983085  0.052785  0.588865          NaN
3  0.756982  0.123991  0.704262          NaN
4  0.876875  0.710953  0.923588          NaN
5  0.359818  0.135520  0.277327          NaN
6  0.693156  0.590918  0.985834          NaN
7  0.892253  0.633529  0.169000          NaN
8  0.084238  0.007579  0.076730          NaN
9  0.663869  0.780832  0.644874          NaN
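The column is all NaN because no group ever contains three rows. As a sketch with hypothetical fixed data, a repeating Group key and a window of 2 (both chosen for illustration) give rolling sums that are actually filled:

```python
import pandas as pd

# Hypothetical data: two groups of three rows each
df = pd.DataFrame({'Group': ['a', 'b', 'a', 'b', 'a', 'b'],
                   'Y': [1, 2, 3, 4, 5, 6]})

# Rolling sum over the last 2 rows within each group
rolling = df.groupby('Group')['Y'].rolling(window=2).sum()
# Drop the group level so the result aligns with the original row index
df['Rolling_Sum'] = rolling.reset_index(level=0, drop=True)
print(df)
```

The first row of each group stays NaN, since a window of 2 needs two observations from the same group.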
Exercise 92:
Create a DataFrame and calculate the rolling mean for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df.groupby('X')['Y'].rolling(window=3).mean().reset_index(level=0, drop=True)
print(df) 
Output:
     X  Y  Rolling_Mean
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           2.0
5  bar  5           3.0
6  foo  6           4.0
7  bar  7           5.0
8  foo  8           6.0
9  bar  9           7.0
Exercise 93:
Create a DataFrame with random values and calculate the rolling standard deviation for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
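# Note: X holds distinct random floats, so every group has one row, the 3-row
# window is never full, and the result is all NaN; group by a repeating key instead
df['Rolling_Std'] = df.groupby('X')['Y'].rolling(window=3).std().reset_index(level=0, drop=True)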
print(df) 
Output:
          X         Y         Z  Rolling_Std
0  0.154838  0.162793  0.808882          NaN
1  0.740167  0.920318  0.650240          NaN
2  0.033449  0.007883  0.249656          NaN
3  0.983601  0.261995  0.399816          NaN
4  0.883155  0.051084  0.125735          NaN
5  0.986930  0.470328  0.612276          NaN
6  0.981338  0.016731  0.627210          NaN
7  0.670522  0.247346  0.530971          NaN
8  0.978909  0.752500  0.903401          NaN
9  0.185614  0.362602  0.541459          NaN
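The all-NaN column above comes from grouping on unique random floats. The sketch below uses a hypothetical Group column (an assumption, not in the original) and computes the grouped rolling standard deviation with `transform`, which returns a result already aligned to the original index, so no `reset_index` step is needed.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so repeated runs give the same frame
df = pd.DataFrame(np.random.rand(10, 3), columns=['X', 'Y', 'Z'])

# Hypothetical grouping column: rows 0-4 are 'A', rows 5-9 are 'B'
df['Group'] = np.repeat(['A', 'B'], 5)

# transform applies the rolling std to each group's Y values and
# returns a Series with the same index as df
df['Rolling_Std'] = df.groupby('Group')['Y'].transform(
    lambda s: s.rolling(window=3).std()
)
print(df)
```

Both spellings give the same numbers; `transform` is simply one way to skip the index bookkeeping.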
Exercise 94:
Create a DataFrame and calculate the rolling variance for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Var'] = df.groupby('X')['Y'].rolling(window=3).var().reset_index(level=0, drop=True)
print(df) 
Output:
     X  Y  Rolling_Var
0  foo  0          NaN
1  bar  1          NaN
2  foo  2          NaN
3  bar  3          NaN
4  foo  4          4.0
5  bar  5          4.0
6  foo  6          4.0
7  bar  7          4.0
8  foo  8          4.0
9  bar  9          4.0
Exercise 95:
Create a DataFrame with random values and calculate the rolling correlation for each group.
Solution:
import pandas as pd
import numpy as np
# Create a DataFrame with random values
np.random.seed(42)  # For reproducibility
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Add a categorical group column to roll within
df['Group'] = np.random.choice(['A', 'B'], size=10)
# Calculate the rolling correlation of Y and Z for each group.
# Selecting only Y and Z keeps the grouping column out of the applied function.
df['Rolling_Corr'] = df.groupby('Group')[['Y', 'Z']].apply(lambda group: group['Y'].rolling(window=3).corr(group['Z'])).reset_index(level=0, drop=True)
print(df)
Output:
          X         Y         Z Group  Rolling_Corr
0  0.374540  0.950714  0.731994     A           NaN
1  0.598658  0.156019  0.155995     A           NaN
2  0.058084  0.866176  0.601115     A      0.992633
3  0.708073  0.020584  0.969910     A     -0.095420
4  0.832443  0.212339  0.181825     A     -0.180021
5  0.183405  0.304242  0.524756     B           NaN
6  0.431945  0.291229  0.611853     B           NaN
7  0.139494  0.292145  0.366362     A     -0.869948
8  0.456070  0.785176  0.199674     B     -0.984073
9  0.514234  0.592415  0.046450     B     -0.788379
Exercise 96:
Create a DataFrame and calculate the rolling covariance for each group.
Solution:
import pandas as pd
# Create a DataFrame with sample data
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
        'Y': range(10), 'Z': range(10, 20)}
df = pd.DataFrame(data)
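# Calculate the rolling covariance of Y and Z for each group.
# Selecting only Y and Z keeps the grouping column out of the applied function.
rolling_cov = df.groupby('X')[['Y', 'Z']].apply(lambda group: group['Y'].rolling(window=3).cov(group['Z'])).reset_index(level=0, drop=True)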
# Add the rolling covariance to the original DataFrame
df['Rolling_Cov'] = rolling_cov
print(df) 
Output:
     X  Y   Z  Rolling_Cov
0  foo  0  10          NaN
1  bar  1  11          NaN
2  foo  2  12          NaN
3  bar  3  13          NaN
4  foo  4  14          4.0
5  bar  5  15          4.0
6  foo  6  16          4.0
7  bar  7  17          4.0
8  foo  8  18          4.0
9  bar  9  19          4.0
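Every full window yields 4.0 because Z is Y shifted by the constant 10, so the covariance of Y and Z equals the variance of Y. For the first full foo window, Y = [0, 2, 4], and its sample variance is 4. A quick check of that arithmetic with NumPy:

```python
import numpy as np

y = np.array([0, 2, 4])      # first full 'foo' window of Y
z = y + 10                   # matching Z values: 10, 12, 14

cov_yz = np.cov(y, z)[0, 1]  # sample covariance (normalised by n - 1)
var_y = np.var(y, ddof=1)    # sample variance of Y (also n - 1)
print(cov_yz, var_y)
```

The same reasoning applies to every other window, since consecutive values within each group are always 2 apart.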
Exercise 97:
Create a DataFrame with random values and calculate the rolling skewness for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
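# Note: 'X' holds unique random floats, so each group has a single row and no
# 3-row window is ever full; Rolling_Skew is therefore NaN for every row.
# Grouping on a categorical column gives multi-row groups and real values.
df['Rolling_Skew'] = df.groupby('X')['Y'].rolling(window=3).skew().reset_index(level=0, drop=True)
print(df)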
Output:
          X         Y         Z  Rolling_Skew
0  0.808397  0.304614  0.097672           NaN
1  0.684233  0.440152  0.122038           NaN
2  0.495177  0.034389  0.909320           NaN
3  0.258780  0.662522  0.311711           NaN
4  0.520068  0.546710  0.184854           NaN
5  0.969585  0.775133  0.939499           NaN
6  0.894827  0.597900  0.921874           NaN
7  0.088493  0.195983  0.045227           NaN
8  0.325330  0.388677  0.271349           NaN
9  0.828738  0.356753  0.280935           NaN
Exercise 98:
Create a DataFrame and calculate the rolling kurtosis for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
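# Note: sample kurtosis needs at least 4 observations, so window=3 yields NaN
# for every row; use a window of 4 or more to get numeric results.
df['Rolling_Kurt'] = df.groupby('X')['Y'].rolling(window=3).kurt().reset_index(level=0, drop=True)
print(df)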
Output:
     X  Y  Rolling_Kurt
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           NaN
5  bar  5           NaN
6  foo  6           NaN
7  bar  7           NaN
8  foo  8           NaN
9  bar  9           NaN
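The column above is entirely NaN because the rolling kurtosis needs at least four observations per window, and the window here holds three. A sketch of the same exercise with `window=4` (an assumed change, not the original parameter) gives numeric values for the last two rows of each group.

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar'] * 5, 'Y': range(10)})

# A window of 4 is the smallest size for which kurtosis is defined
df['Rolling_Kurt'] = (
    df.groupby('X')['Y']
      .rolling(window=4)
      .kurt()
      .reset_index(level=0, drop=True)
)
print(df)
```

Rows 0 to 5 remain NaN, because each group needs four rows before its first window is complete.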
Exercise 99:
Create a DataFrame with random values and calculate the rolling median for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
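# Note: 'X' holds unique random floats, so each group has a single row and no
# 3-row window is ever full; Rolling_Median is therefore NaN for every row.
# Grouping on a categorical column gives multi-row groups and real values.
df['Rolling_Median'] = df.groupby('X')['Y'].rolling(window=3).median().reset_index(level=0, drop=True)
print(df)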
Output:
          X         Y         Z  Rolling_Median
0  0.542696  0.140924  0.802197             NaN
1  0.074551  0.986887  0.772245             NaN
2  0.198716  0.005522  0.815461             NaN
3  0.706857  0.729007  0.771270             NaN
4  0.074045  0.358466  0.115869             NaN
5  0.863103  0.623298  0.330898             NaN
6  0.063558  0.310982  0.325183             NaN
7  0.729606  0.637557  0.887213             NaN
8  0.472215  0.119594  0.713245             NaN
9  0.760785  0.561277  0.770967             NaN
Exercise 100:
Create a DataFrame and calculate the expanding sum for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Expanding_Sum'] = df.groupby('X')['Y'].expanding().sum().reset_index(level=0, drop=True)
print(df) 
Output:
     X  Y  Expanding_Sum
0  foo  0            0.0
1  bar  1            1.0
2  foo  2            2.0
3  bar  3            4.0
4  foo  4            6.0
5  bar  5            9.0
6  foo  6           12.0
7  bar  7           16.0
8  foo  8           20.0
9  bar  9           25.0
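For a plain sum, a grouped expanding window produces the same numbers as a grouped cumulative sum. The sketch below puts the two side by side; the column name Cumsum is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar'] * 5, 'Y': range(10)})

# Expanding sum per group, realigned to the original index
df['Expanding_Sum'] = (
    df.groupby('X')['Y']
      .expanding()
      .sum()
      .reset_index(level=0, drop=True)
)

# groupby cumsum keeps the original index and the integer dtype
df['Cumsum'] = df.groupby('X')['Y'].cumsum()
print(df)
```

`expanding()` is the more general tool, since it also supports statistics such as mean or standard deviation over the growing window.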
