Mastering Pandas: 100 Exercises with solutions for Python data analysis
Welcome to w3resource's collection of 100 Pandas exercises! This set is designed to help you master the fundamentals of Pandas, a powerful data manipulation and analysis library for Python. Whether you're a beginner or an experienced user looking to improve your skills, the exercises cover a wide range of topics, from creating, filtering, and reshaping DataFrames to grouping and rolling or expanding window statistics. Each one pairs a practical challenge with a worked solution and its output.
Exercise 1:
Create a DataFrame from a dictionary of lists.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df)
Output:
   X  Y
0  1  5
1  2  6
2  3  7
3  4  8
Exercise 2:
Select the first 3 rows of a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df.head(3))
Output:
   X  Y
0  1  5
1  2  6
2  3  7
Exercise 3:
Select the 'X' column from a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
print(df['X'])
Output:
0    1
1    2
2    3
3    4
Name: X, dtype: int64
Exercise 4:
Filter rows based on a column condition.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'] > 2]
print(filtered_df)
Output:
   X  Y
2  3  7
3  4  8
Exercise 5:
Add a new column to an existing DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df['Z'] = df['X'] + df['Y']
print(df)
Output:
   X  Y   Z
0  1  5   6
1  2  6   8
2  3  7  10
3  4  8  12
Exercise 6:
Remove a column from a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8], 'Z': [9, 10, 11, 12]}
df = pd.DataFrame(data)
df.drop(columns=['Z'], inplace=True)
print(df)
Output:
   X  Y
0  1  5
1  2  6
2  3  7
3  4  8
Exercise 7:
Sort a DataFrame by a column.
Solution:
import pandas as pd
data = {'X': [4, 3, 2, 1], 'Y': [8, 7, 6, 5]}
df = pd.DataFrame(data)
df.sort_values(by='X', inplace=True)
print(df)
Output:
   X  Y
3  1  5
2  2  6
1  3  7
0  4  8
Exercise 8:
Group a DataFrame by a column and calculate the mean of each group.
Solution:
import pandas as pd
data = {'X': [1, 2, 1, 2], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
grouped_df = df.groupby('X').mean()
print(grouped_df)
Output:
     Y
X
1  6.0
2  7.0
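Exercise 8 computes one statistic per group. A minimal sketch going slightly beyond the exercise: `agg()` can compute several per-group statistics in a single call. The particular statistics chosen here are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 1, 2], 'Y': [5, 6, 7, 8]})

# agg() takes a list of statistic names and returns one column per statistic
stats = df.groupby('X')['Y'].agg(['mean', 'min', 'max', 'count'])
print(stats)
```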
Exercise 9:
Replace missing values in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]}
df = pd.DataFrame(data)
df.fillna(0, inplace=True)
print(df)
Output:
     X    Y
0  1.0  5.0
1  2.0  0.0
2  0.0  7.0
3  4.0  8.0
Exercise 10:
Convert a column to datetime.
Solution:
import pandas as pd
data = {'X': ['2020-01-01', '2020-01-02', '2020-01-03']}
df = pd.DataFrame(data)
df['X'] = pd.to_datetime(df['X'])
print(df)
Output:
           X
0 2020-01-01
1 2020-01-02
2 2020-01-03
Exercise 11:
Create a DataFrame with specific column names.
Solution:
import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)
print(df)
Output:
   col1  col2
0     1     4
1     2     5
2     3     6
Exercise 12:
Calculate the sum of values in each column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.sum())
Output:
X     6
Y    15
dtype: int64
Exercise 13:
Calculate the mean of values in each row.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.mean(axis=1))
Output:
0    2.5
1    3.5
2    4.5
dtype: float64
Exercise 14:
Concatenate two DataFrames.
Solution:
import pandas as pd
data1 = {'X': [1, 2, 3]}
data2 = {'Y': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
concatenated_df = pd.concat([df1, df2], axis=1)
print(concatenated_df)
Output:
   X  Y
0  1  4
1  2  5
2  3  6
Exercise 15:
Merge two DataFrames on a key.
Solution:
import pandas as pd
data1 = {'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]}
data2 = {'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='key')
print(merged_df)
Output:
  key  value1  value2
0   X       1       4
1   Y       2       5
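`pd.merge` defaults to an inner join, which is why keys `Z` and `D` are missing from the output above. The sketch below reuses the same two frames, switches to an outer join, and adds `indicator=True` so each row records which side it came from.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]})

# how='outer' keeps keys from both frames; indicator=True adds a '_merge' column
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(outer)
```

Rows that exist on only one side get `NaN` in the other frame's columns and are labeled `left_only` or `right_only` in `_merge`.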
Exercise 16:
Create a pivot table from a DataFrame.
Solution:
import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Z', index='X', columns='Y')
print(pivot_table)
Output:
Y    one  two
X
bar  3.0  4.0
foo  1.0  2.0
Exercise 17:
Reshape a DataFrame from long to wide format.
Solution:
import pandas as pd
data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
df = pd.DataFrame(data)
wide_df = df.pivot(index='X', columns='Y', values='Z')
print(wide_df)
Output:
Y    one  two
X
bar    3    4
foo    1    2
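`pivot` has an inverse: `melt` turns the wide table back into long format. A sketch, assuming the `wide_df` built exactly as in this exercise:

```python
import pandas as pd

data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}
wide_df = pd.DataFrame(data).pivot(index='X', columns='Y', values='Z')

# reset_index() turns the 'X' index back into a column; melt() then stacks 'one'/'two' into rows
long_df = wide_df.reset_index().melt(id_vars='X', var_name='Y', value_name='Z')
print(long_df)
```

The row order differs from the original frame, but the same four (X, Y, Z) combinations come back.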
Exercise 18:
Calculate the correlation between columns in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
correlation = df.corr()
print(correlation)
Output:
     X    Y
X  1.0 -1.0
Y -1.0  1.0
Exercise 19:
Iterate over rows in a DataFrame using iterrows().
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
for index, row in df.iterrows():
    print(index, row['X'], row['Y'])
Output:
0 1 4
1 2 5
2 3 6
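`iterrows()` builds a Series for every row, which is slow and can upcast mixed dtypes (for example, integers become floats when the row also holds a float). A sketch of the usual alternative, `itertuples()`, on the same data:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})

rows = []
# itertuples() yields lightweight namedtuples; the index is available as row.Index
for row in df.itertuples():
    print(row.Index, row.X, row.Y)
    rows.append((row.Index, row.X, row.Y))
```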
Exercise 20:
Apply a function to each element in a DataFrame.
Solution:
import pandas as pd # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Apply a function to each element using the map method
df = df.apply(lambda col: col.map(lambda x: x * 2))
print(df)
Output:
   X   Y
0  2   8
1  4  10
2  6  12
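Because arithmetic operators already act element-wise on a DataFrame, the doubling in this exercise does not need `map` at all. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})

# Operators broadcast over every element, so this doubles each value without map/apply
doubled = df * 2
print(doubled)
```

An element-wise `map` is only needed for Python functions that cannot be expressed as vectorized operations; in pandas 2.1 and later, `DataFrame.map` applies such a function directly (older versions call it `applymap`).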
Exercise 21:
Create a DataFrame from a list of dictionaries.
Solution:
import pandas as pd
data = [{'X': 1, 'Y': 2}, {'X': 3, 'Y': 4}]
df = pd.DataFrame(data)
print(df)
Output:
   X  Y
0  1  2
1  3  4
Exercise 22:
Rename columns in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.rename(columns={'X': 'A', 'Y': 'B'}, inplace=True)
print(df)
Output:
   A  B
0  1  4
1  2  5
2  3  6
Exercise 23:
Filter rows by multiple conditions.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
filtered_df = df[(df['X'] > 2) & (df['Y'] < 7)]
print(filtered_df)
Output:
   X  Y
2  3  6
Exercise 24:
Calculate the cumulative sum of a column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df['X'].cumsum()
print(df)
Output:
   X  Cumulative_Sum
0  1               1
1  2               3
2  3               6
3  4              10
Exercise 25:
Drop rows with missing values.
Solution:
import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, 5, 6, None]}
df = pd.DataFrame(data)
df.dropna(inplace=True)
print(df)
Output:
     X    Y
0  1.0  4.0
1  2.0  5.0
Exercise 26:
Replace values in a DataFrame based on a condition.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
df.loc[df['X'] > 2, 'Y'] = 0
print(df)
Output:
   X  Y
0  1  5
1  2  6
2  3  0
3  4  0
Exercise 27:
Create a DataFrame with a MultiIndex.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)
Output:
              Value
Group Number
X     1          10
      2          20
Y     1          30
      2          40
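Selecting from a MultiIndex is the natural next step. A sketch with the same frame: `.loc` with an outer label returns that group, and `.xs` selects by a label on an inner level addressed by name.

```python
import pandas as pd

arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# .loc with an outer-level label returns that group, with the 'Group' level dropped
group_x = df.loc['X']
# .xs picks rows by a label on an inner level, addressed here by name
number_1 = df.xs(1, level='Number')
print(group_x)
print(number_1)
```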
Exercise 28:
Calculate the rolling mean of a column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df)
Output:
   X  Rolling_Mean
0  1           NaN
1  2           NaN
2  3           2.0
3  4           3.0
4  5           4.0
5  6           5.0
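The two leading `NaN` values appear because a window of 3 needs three observations. A sketch showing how the `min_periods` parameter relaxes that requirement:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5, 6]})

# min_periods=1 lets the first windows average whatever values are available so far
df['Rolling_Mean'] = df['X'].rolling(window=3, min_periods=1).mean()
print(df)
```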
Exercise 29:
Create a DataFrame from a list of tuples.
Solution:
import pandas as pd
data = [(1, 2), (3, 4), (5, 6)]
df = pd.DataFrame(data, columns=['X', 'Y'])
print(df)
Output:
   X  Y
0  1  2
1  3  4
2  5  6
Exercise 30:
Add a row to a DataFrame.
Solution:
import pandas as pd # Import the Pandas library
# Create a sample DataFrame
data = {'X': [1, 2], 'Y': [3, 4]}
df = pd.DataFrame(data)
# Create a new row as a DataFrame
new_row = pd.DataFrame({'X': [5], 'Y': [6]})
# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row], ignore_index=True)
print(df)
Output:
   X  Y
0  1  3
1  2  4
2  5  6
Exercise 31:
Create a DataFrame with random values.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df)
Output:
          X         Y         Z
0  0.688292  0.950264  0.665916
1  0.497719  0.840536  0.923938
2  0.285218  0.091178  0.722034
3  0.037824  0.248689  0.584696
Exercise 32:
Calculate the rank of values in a DataFrame.
Solution:
import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank'] = df['X'].rank()
print(df)
Output:
   X  Y  Rank
0  3  2   3.0
1  1  3   1.5
2  4  1   4.0
3  1  4   1.5
Exercise 33:
Change the data type of a column.
Solution:
import pandas as pd
data = {'X': ['1', '2', '3']}
df = pd.DataFrame(data)
df['X'] = df['X'].astype(int)
print(df)
Output:
   X
0  1
1  2
2  3
Exercise 34:
Filter rows based on string matching.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba')]
print(filtered_df)
Output:
     X
1  bar
2  baz
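`str.contains` is case-sensitive and returns a missing value for missing input, which breaks boolean indexing. A sketch with hypothetical data (the capitalized `'Bar'` and the `None` are assumptions added for illustration):

```python
import pandas as pd

# Hypothetical data: one capitalized value and one missing value
df = pd.DataFrame({'X': ['foo', 'Bar', None, 'qux']})

# case=False ignores letter case; na=False treats missing values as "no match"
mask = df['X'].str.contains('ba', case=False, na=False)
print(df[mask])
```

For a literal substring that contains regex characters such as `.` or `|`, add `regex=False`.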
Exercise 35:
Create a DataFrame with specified row and column labels.
Solution:
import pandas as pd
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])
print(df)
Output:
      col1  col2  col3
row1     1     2     3
row2     4     5     6
row3     7     8     9
Exercise 36:
Transpose a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
transposed_df = df.T
print(transposed_df)
Output:
   0  1  2
X  1  2  3
Y  4  5  6
Exercise 37:
Set a column as the index of a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
print(df)
Output:
   Y
X
1  4
2  5
3  6
Exercise 38:
Reset the index of a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df.set_index('X', inplace=True)
df.reset_index(inplace=True)
print(df)
Output:
   X  Y
0  1  4
1  2  5
2  3  6
Exercise 39:
Add a prefix or suffix to column names.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.add_prefix('col_')
print(df)
Output:
   col_X  col_Y
0      1      4
1      2      5
2      3      6
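The solution above covers only the prefix half of the exercise. The suffix counterpart is `add_suffix`; a minimal sketch, where `'_col'` is an arbitrary example suffix:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})

# add_suffix() appends the text to every column label, mirroring add_prefix()
df = df.add_suffix('_col')
print(df)
```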
Exercise 40:
Filter rows based on datetime index.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=5, freq='D')
data = {'X': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data, index=date_range)
filtered_df = df.loc['2020-01-03':'2020-01-05']
print(filtered_df)
Output:
            X
2020-01-03  3
2020-01-04  4
2020-01-05  5
Exercise 41:
Create a DataFrame with duplicate rows and remove duplicates.
Solution:
import pandas as pd
data = {'X': [1, 2, 2, 3], 'Y': [4, 5, 5, 6]}
df = pd.DataFrame(data)
df.drop_duplicates(inplace=True)
print(df)
Output:
   X  Y
0  1  4
1  2  5
3  3  6
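`drop_duplicates()` compares whole rows by default. A sketch with a modified `Y` column (an assumption, so that the repeated rows differ only in `Y`) showing `duplicated()`, the `subset` parameter, and `keep`:

```python
import pandas as pd

# Y differs in the repeated row, so only a subset comparison finds the duplicate
df = pd.DataFrame({'X': [1, 2, 2, 3], 'Y': [4, 5, 9, 6]})

print(df.duplicated().tolist())  # whole-row comparison finds no duplicates

# subset restricts the comparison to column X
flags = df.duplicated(subset=['X'])
print(flags.tolist())

# keep='last' keeps the final occurrence of each duplicated X instead of the first
deduped = df.drop_duplicates(subset=['X'], keep='last')
print(deduped)
```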
Exercise 42:
Create a DataFrame with hierarchical index.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
print(df)
Output:
              Value
Group Number
X     1          10
      2          20
Y     1          30
      2          40
Exercise 43:
Calculate the difference between consecutive rows in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 3, 6, 10]}
df = pd.DataFrame(data)
df['Difference'] = df['X'].diff()
print(df)
Output:
    X  Difference
0   1         NaN
1   3         2.0
2   6         3.0
3  10         4.0
Exercise 44:
Create a DataFrame with hierarchical columns.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], ['C1', 'C2', 'C1', 'C2']]
columns = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Type'))
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
df = pd.DataFrame(data, columns=columns)
print(df)
Output:
Group   X       Y
Type   C1  C2  C1  C2
0       1   2   3   4
1       5   6   7   8
2       9  10  11  12
Exercise 45:
Filter rows based on the length of strings in a column.
Solution:
import pandas as pd
data = {'X': ['foo', 'barbaz', 'qux', 'quux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.len() > 3]
print(filtered_df)
Output:
        X
1  barbaz
3    quux
Exercise 46:
Calculate the percentage change between rows in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Pct_Change'] = df['X'].pct_change()
print(df)
Output:
   X  Pct_Change
0  1         NaN
1  2    1.000000
2  3    0.500000
3  4    0.333333
Exercise 47:
Create a DataFrame from a dictionary of Series.
Solution:
import pandas as pd
data = {'X': pd.Series([1, 2, 3]), 'Y': pd.Series([4, 5, 6])}
df = pd.DataFrame(data)
print(df)
Output:
   X  Y
0  1  4
1  2  5
2  3  6
Exercise 48:
Filter rows based on whether a column value is in a list.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['X'].isin([2, 3])]
print(filtered_df)
Output:
   X  Y
1  2  6
2  3  7
Exercise 49:
Calculate the z-score of values in a DataFrame.
Solution:
import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
# np.std computes the population standard deviation (ddof=0)
df['zscore_X'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
print(df)
Output:
   X  Y  zscore_X
0  1  4 -1.341641
1  2  5 -0.447214
2  3  6  0.447214
3  4  7  1.341641
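`np.std` uses the population standard deviation (`ddof=0`), while `Series.std`, used in Exercise 78, defaults to the sample standard deviation (`ddof=1`), so the two give different z-scores. A sketch comparing them on the `X` values from this exercise:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4])

# Population z-score (ddof=0), matching np.std in this exercise
z_population = (x - x.mean()) / x.std(ddof=0)
# Sample z-score (ddof=1), the Series.std default
z_sample = (x - x.mean()) / x.std()

print(z_population.round(6).tolist())  # [-1.341641, -0.447214, 0.447214, 1.341641]
print(z_sample.round(6).tolist())      # [-1.161895, -0.387298, 0.387298, 1.161895]
```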
Exercise 50:
Create a DataFrame with random integers and calculate descriptive statistics.
Solution:
import pandas as pd
import numpy as np
data = np.random.randint(1, 100, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.describe())
Output:
               X          Y          Z
count   5.000000   5.000000   5.000000
mean   60.600000  71.800000  42.600000
std    38.435661  13.971399  12.218838
min     5.000000  53.000000  28.000000
25%    40.000000  64.000000  34.000000
50%    69.000000  72.000000  41.000000
75%    91.000000  82.000000  55.000000
max    98.000000  88.000000  55.000000
Exercise 51:
Calculate the rank of values in each column of a DataFrame.
Solution:
import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}
df = pd.DataFrame(data)
df['Rank_X'] = df['X'].rank()
df['Rank_Y'] = df['Y'].rank()
print(df)
Output:
   X  Y  Rank_X  Rank_Y
0  3  2     3.0     2.0
1  1  3     1.5     3.0
2  4  1     4.0     1.0
3  1  4     1.5     4.0
Exercise 52:
Filter rows based on multiple string conditions.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'baz', 'qux']}
df = pd.DataFrame(data)
filtered_df = df[df['X'].str.contains('ba|qu')]
print(filtered_df)
Output:
     X
1  bar
2  baz
3  qux
Exercise 53:
Create a DataFrame with random values and calculate the skewness.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.skew())
Output:
A Series with one skewness value per column (X, Y, Z), dtype float64. The numbers change on every run because the data are random.
Exercise 54:
Create a DataFrame and calculate the kurtosis.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.kurt())
Output:
X    2.958407
Y   -2.639654
Z    2.704430
dtype: float64
Exercise 55:
Calculate the cumulative product of a column in a DataFrame.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df['X'].cumprod()
print(df)
Output:
   X  Cumulative_Product
0  1                   1
1  2                   2
2  3                   6
3  4                  24
Exercise 56:
Create a DataFrame and calculate the rolling standard deviation.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Rolling_Std'] = df['X'].rolling(window=3).std()
print(df)
Output:
   X  Rolling_Std
0  1          NaN
1  2          NaN
2  3          1.0
3  4          1.0
4  5          1.0
5  6          1.0
Exercise 57:
Create a DataFrame and calculate the expanding mean.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Mean'] = df['X'].expanding().mean()
print(df)
Output:
   X  Expanding_Mean
0  1             1.0
1  2             1.5
2  3             2.0
3  4             2.5
4  5             3.0
5  6             3.5
Exercise 58:
Create a DataFrame with random values and calculate the covariance matrix.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.cov())
Output:
          X         Y         Z
X  0.054079  0.007398 -0.031403
Y  0.007398  0.053211 -0.020480
Z -0.031403 -0.020480  0.048057
Exercise 59:
Create a DataFrame with random values and calculate the correlation matrix.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.corr())
Output:
          X         Y         Z
X  1.000000 -0.258187  0.541044
Y -0.258187  1.000000 -0.432419
Z  0.541044 -0.432419  1.000000
Exercise 60:
Create a DataFrame and calculate the rolling correlation between two columns.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6], 'Y': [6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Rolling_Corr'] = df['X'].rolling(window=3).corr(df['Y'])
print(df)
Output:
   X  Y  Rolling_Corr
0  1  6           NaN
1  2  5           NaN
2  3  4          -1.0
3  4  3          -1.0
4  5  2          -1.0
5  6  1          -1.0
Exercise 61:
Create a DataFrame and calculate the expanding variance.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df['X'].expanding().var()
print(df)
Output:
   X  Expanding_Var
0  1            NaN
1  2       0.500000
2  3       1.000000
3  4       1.666667
4  5       2.500000
5  6       3.500000
Exercise 62:
Create a DataFrame with datetime index and resample by month.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=100, freq='D')
data = {'X': range(100)}
df = pd.DataFrame(data, index=date_range)
# 'ME' (month end) is the alias in pandas 2.2 and later; older versions use 'M'
monthly_df = df.resample('ME').sum()
print(monthly_df)
Output:
               X
2020-01-31   465
2020-02-29  1305
2020-03-31  2325
2020-04-30   855
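Because the resample alias changed from `'M'` to `'ME'` in pandas 2.2, a version-independent alternative is to group by monthly periods, whose alias `'M'` did not change. A sketch on the same data that, as an extra illustration, also computes the monthly mean:

```python
import pandas as pd

date_range = pd.date_range(start='2020-01-01', periods=100, freq='D')
df = pd.DataFrame({'X': range(100)}, index=date_range)

# to_period('M') maps each date to its calendar month; groupby then aggregates per month
monthly = df.groupby(df.index.to_period('M'))['X'].agg(['sum', 'mean'])
print(monthly)
```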
Exercise 63:
Create a DataFrame and calculate the exponential moving average.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['EMA'] = df['X'].ewm(span=3, adjust=False).mean()
print(df)
Output:
   X      EMA
0  1  1.00000
1  2  1.50000
2  3  2.25000
3  4  3.12500
4  5  4.06250
5  6  5.03125
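With `adjust=False`, `ewm(span=3)` uses the smoothing factor alpha = 2 / (span + 1) = 0.5 and a simple recursion. A sketch that reproduces the EMA column by hand and compares it with pandas:

```python
import pandas as pd

x = [1, 2, 3, 4, 5, 6]
span = 3
alpha = 2 / (span + 1)  # 0.5 for span=3

# adjust=False recursion: y[0] = x[0]; y[t] = alpha * x[t] + (1 - alpha) * y[t-1]
manual = [float(x[0])]
for value in x[1:]:
    manual.append(alpha * value + (1 - alpha) * manual[-1])

pandas_ema = pd.Series(x).ewm(span=span, adjust=False).mean().tolist()
print(manual)  # [1.0, 1.5, 2.25, 3.125, 4.0625, 5.03125]
```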
Exercise 64:
Create a DataFrame with random integers and calculate the mode.
Solution:
import pandas as pd
import numpy as np
data = np.random.randint(1, 10, size=(5, 3))
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.mode())
Output:
   X    Y    Z
0  2  1.0  2.0
1  3  3.0  7.0
2  5  NaN  NaN
3  6  NaN  NaN
4  9  NaN  NaN
Exercise 65:
Create a DataFrame and calculate the z-score of each column.
Solution:
import pandas as pd
import numpy as np
data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}
df = pd.DataFrame(data)
df['zscore_X'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])
df['zscore_Y'] = (df['Y'] - np.mean(df['Y'])) / np.std(df['Y'])
print(df)
Output:
   X  Y  zscore_X  zscore_Y
0  1  4 -1.341641 -1.341641
1  2  5 -0.447214 -0.447214
2  3  6  0.447214  0.447214
3  4  7  1.341641  1.341641
Exercise 66:
Create a DataFrame with random values and calculate the median.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.median())
Output:
X    0.787042
Y    0.477837
Z    0.696911
dtype: float64
Exercise 67:
Create a DataFrame and apply a custom function to each column.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df = df.apply(lambda x: x + 1)
print(df)
Output:
   X  Y
0  2  5
1  3  6
2  4  7
Exercise 68:
Create a DataFrame with hierarchical index and calculate the mean for each group.
Solution:
import pandas as pd
arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))
data = {'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data, index=index)
grouped_df = df.groupby('Group').mean()
print(grouped_df)
Output:
       Value
Group
X       15.0
Y       35.0
Exercise 69:
Create a DataFrame and calculate the percentage of missing values in each column.
Solution:
import pandas as pd
data = {'X': [1, 2, None, 4], 'Y': [4, None, 6, 8]}
df = pd.DataFrame(data)
missing_percentage = df.isnull().mean() * 100
print(missing_percentage)
Output:
X    25.0
Y    25.0
dtype: float64
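Counts are often more useful than percentages, and it helps to see which rows contain the gaps. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, None, 4], 'Y': [4, None, 6, 8]})

# Summing booleans counts the True values, i.e. the missing entries per column
missing_count = df.isna().sum()
# any(axis=1) flags rows with at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]
print(missing_count)
print(rows_with_missing)
```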
Exercise 70:
Create a DataFrame and apply a custom function to each row.
Solution:
import pandas as pd
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
df['Sum'] = df.apply(lambda row: row['X'] + row['Y'], axis=1)
print(df)
Output:
   X  Y  Sum
0  1  4    5
1  2  5    7
2  3  6    9
Exercise 71:
Create a DataFrame with random values and calculate the quantiles.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
print(df.quantile([0.25, 0.5, 0.75]))
Output:
             X         Y         Z
0.25  0.174265  0.184036  0.520573
0.50  0.468040  0.315593  0.644571
0.75  0.767870  0.436426  0.771297
Exercise 72:
Create a DataFrame and calculate the interquartile range (IQR).
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
print(IQR)
Output:
X    0.354244
Y    0.329573
Z    0.245520
dtype: float64
Exercise 73:
Create a DataFrame with datetime index and calculate the rolling mean.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Rolling_Mean'] = df['X'].rolling(window=3).mean()
print(df)
Output:
            X  Rolling_Mean
2020-01-01  0           NaN
2020-01-02  1           NaN
2020-01-03  2           1.0
2020-01-04  3           2.0
2020-01-05  4           3.0
2020-01-06  5           4.0
2020-01-07  6           5.0
2020-01-08  7           6.0
2020-01-09  8           7.0
2020-01-10  9           8.0
Exercise 74:
Create a DataFrame and calculate the cumulative maximum.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Max'] = df['X'].cummax()
print(df)
Output:
   X  Cumulative_Max
0  1               1
1  2               2
2  3               3
3  2               3
4  1               3
Exercise 75:
Create a DataFrame and calculate the cumulative minimum.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 2, 1]}
df = pd.DataFrame(data)
df['Cumulative_Min'] = df['X'].cummin()
print(df)
Output:
   X  Cumulative_Min
0  1               1
1  2               1
2  3               1
3  2               1
4  1               1
Exercise 76:
Create a DataFrame with random values and calculate the cumulative variance.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Cumulative_Var'] = df['X'].expanding().var()
print(df)
Output:
          X         Y         Z  Cumulative_Var
0  0.315669  0.900791  0.404858             NaN
1  0.462000  0.463257  0.922495        0.010706
2  0.328968  0.200027  0.967625        0.006548
3  0.630370  0.992849  0.231884        0.021460
4  0.574397  0.968600  0.926893        0.020023
5  0.204077  0.889864  0.589022        0.027130
6  0.386806  0.630882  0.242157        0.022759
7  0.319831  0.935747  0.829739        0.020630
8  0.786435  0.377739  0.879458        0.034407
9  0.523467  0.077937  0.764476        0.031194
Exercise 77:
Create a DataFrame and apply a custom function to each element.
Solution:
import pandas as pd
# Create a DataFrame
data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}
df = pd.DataFrame(data)
# Define the custom function
def custom_function(x):
    return x * 2
# Apply the function to each element using map on each column
df = df.apply(lambda col: col.map(custom_function))
# Print the DataFrame
print(df)
Output:
   X   Y
0  2   8
1  4  10
2  6  12
Exercise 78:
Create a DataFrame with random values and calculate the z-score for each element.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)
print(df)
Output:
          X         Y         Z
0  1.027393  0.656858  1.032853
1  0.674079 -1.277904 -0.220065
2 -0.996641 -0.298841  0.475217
3 -0.704831  0.919887 -1.288005
Exercise 79:
Create a DataFrame and calculate the cumulative sum for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Sum'] = df.groupby('X')['Y'].cumsum()
print(df)
Output:
     X  Y  Cumulative_Sum
0  foo  1               1
1  bar  2               2
2  foo  3               4
3  bar  4               6
Exercise 80:
Create a DataFrame with random values and calculate the rank for each element.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df = df.rank()
print(df)
Output:
     X    Y    Z
0  4.0  3.0  3.0
1  3.0  2.0  2.0
2  1.0  4.0  1.0
3  2.0  1.0  4.0
Exercise 81:
Create a DataFrame and calculate the cumulative product for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Cumulative_Product'] = df.groupby('X')['Y'].cumprod()
print(df)
Output:
     X  Y  Cumulative_Product
0  foo  1                   1
1  bar  2                   2
2  foo  3                   3
3  bar  4                   8
Exercise 82:
Create a DataFrame with random values and calculate the expanding sum.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Sum'] = df['X'].expanding().sum()
print(df)
Output:
          X         Y         Z  Expanding_Sum
0  0.815750  0.062819  0.699743       0.815750
1  0.128772  0.843222  0.411903       0.944522
2  0.857516  0.219424  0.234460       1.802038
3  0.011010  0.774375  0.259412       1.813048
Exercise 83:
Create a DataFrame and calculate the expanding minimum for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Min'] = df.groupby('X')['Y'].expanding().min().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Expanding_Min
0  foo  1            1.0
1  bar  2            2.0
2  foo  3            1.0
3  bar  4            2.0
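For an expanding minimum within groups, `groupby(...).cummin()` gives the same values and keeps the original index, so no `reset_index` step is needed. A sketch on the same data (note that the result stays integer rather than float):

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]})

# cummin() per group is aligned with the original index, so it can be assigned directly
df['Expanding_Min'] = df.groupby('X')['Y'].cummin()
print(df)
```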
Exercise 84:
Create a DataFrame with random values and calculate the expanding maximum for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Grouping by the random floats in 'X' would put every row in its own group,
# so add a label column to group by instead
df['Group'] = ['A', 'B', 'A', 'B']
df['Expanding_Max'] = df.groupby('Group')['Y'].expanding().max().reset_index(level=0, drop=True)
print(df)
Output:
          X         Y         Z Group  Expanding_Max
0  0.751392  0.015856  0.313990     A       0.015856
1  0.812436  0.701808  0.069307     B       0.701808
2  0.148614  0.838726  0.290646     A       0.838726
3  0.764419  0.586510  0.470466     B       0.701808
Exercise 85:
Create a DataFrame and calculate the expanding variance for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}
df = pd.DataFrame(data)
df['Expanding_Var'] = df.groupby('X')['Y'].expanding().var().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Expanding_Var
0  foo  1            NaN
1  bar  2            NaN
2  foo  3            2.0
3  bar  4            2.0
Exercise 86:
Create a DataFrame with random values and calculate the expanding standard deviation.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Std'] = df['X'].expanding().std()
print(df)
Output:
          X         Y         Z  Expanding_Std
0  0.693184  0.088273  0.109510            NaN
1  0.031186  0.163005  0.803467       0.468103
2  0.294881  0.409395  0.278145       0.333272
3  0.918778  0.854961  0.791329       0.397322
Exercise 87:
Create a DataFrame and calculate the expanding covariance.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}
df = pd.DataFrame(data)
df['Expanding_Cov'] = df['X'].expanding().cov(df['Y'])
print(df)
Output:
   X  Y  Expanding_Cov
0  1  4            NaN
1  2  3      -0.500000
2  3  2      -1.000000
3  4  1      -1.666667
Exercise 88:
Create a DataFrame with random values and calculate the expanding correlation.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(4, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
df['Expanding_Corr'] = df['X'].expanding().corr(df['Y'])
print(df)
Output:
          X         Y         Z  Expanding_Corr
0  0.094026  0.320246  0.044218             NaN
1  0.422531  0.002172  0.995907       -1.000000
2  0.265459  0.391239  0.589878       -0.751147
3  0.118812  0.061489  0.837821       -0.372750
Exercise 89:
Create a DataFrame and calculate the expanding median.
Solution:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
df['Expanding_Median'] = df['X'].expanding().median()
print(df)
Output:
   X  Expanding_Median
0  1               1.0
1  2               1.5
2  3               2.0
3  4               2.5
4  5               3.0
5  6               3.5
Exercise 90:
Create a DataFrame with datetime index and calculate the expanding mean for each group.
Solution:
import pandas as pd
date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data, index=date_range)
df['Expanding_Mean'] = df.groupby('X')['Y'].expanding().mean().reset_index(level=0, drop=True)
print(df)
Output:
              X  Y  Expanding_Mean
2020-01-01  foo  0             0.0
2020-01-02  bar  1             1.0
2020-01-03  foo  2             1.0
2020-01-04  bar  3             2.0
2020-01-05  foo  4             2.0
2020-01-06  bar  5             3.0
2020-01-07  foo  6             3.0
2020-01-08  bar  7             4.0
2020-01-09  foo  8             4.0
2020-01-10  bar  9             5.0
Exercise 91:
Create a DataFrame with random values and calculate the rolling sum for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Grouping by the random floats in 'X' would give one-row groups (all NaN),
# so add a label column to group by instead
df['Group'] = ['A', 'B'] * 5
df['Rolling_Sum'] = df.groupby('Group')['Y'].rolling(window=3).sum().reset_index(level=0, drop=True)
print(df)
Output:
The values are random and change on every run. Each group has five rows, so Rolling_Sum is NaN for the first two rows of each group (index 0 to 3); from index 4 onward, each row holds the sum of the last three Y values in its group.
Exercise 92:
Create a DataFrame and calculate the rolling mean for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Mean'] = df.groupby('X')['Y'].rolling(window=3).mean().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Rolling_Mean
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           2.0
5  bar  5           3.0
6  foo  6           4.0
7  bar  7           5.0
8  foo  8           6.0
9  bar  9           7.0
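An alternative that avoids the `reset_index(level=0, drop=True)` step is `transform`, which returns a result aligned with the original rows. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar'] * 5, 'Y': range(10)})

# transform() runs the rolling mean within each group and returns it in the original row order
df['Rolling_Mean'] = df.groupby('X')['Y'].transform(lambda s: s.rolling(window=3).mean())
print(df)
```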
Exercise 93:
Create a DataFrame with random values and calculate the rolling standard deviation for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Grouping by the random floats in 'X' would give one-row groups (all NaN),
# so add a label column to group by instead
df['Group'] = ['A', 'B'] * 5
df['Rolling_Std'] = df.groupby('Group')['Y'].rolling(window=3).std().reset_index(level=0, drop=True)
print(df)
Output:
The values are random and change on every run. Each group has five rows, so Rolling_Std is NaN for the first two rows of each group (index 0 to 3); from index 4 onward, each row holds the sample standard deviation of the last three Y values in its group.
Exercise 94:
Create a DataFrame and calculate the rolling variance for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Rolling_Var'] = df.groupby('X')['Y'].rolling(window=3).var().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Rolling_Var
0  foo  0          NaN
1  bar  1          NaN
2  foo  2          NaN
3  bar  3          NaN
4  foo  4          4.0
5  bar  5          4.0
6  foo  6          4.0
7  bar  7          4.0
8  foo  8          4.0
9  bar  9          4.0
Exercise 95:
Create a DataFrame with random values and calculate the rolling correlation for each group.
Solution:
import pandas as pd
import numpy as np
# Create a DataFrame with random values
np.random.seed(42) # For reproducibility
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Optionally create a group column if necessary
df['Group'] = np.random.choice(['A', 'B'], size=10)
# Calculate the rolling correlation for each group
df['Rolling_Corr'] = df.groupby('Group')[['Y', 'Z']].apply(lambda group: group['Y'].rolling(window=3).corr(group['Z'])).reset_index(level=0, drop=True)
print(df)
Output:
          X         Y         Z Group  Rolling_Corr
0  0.374540  0.950714  0.731994     A           NaN
1  0.598658  0.156019  0.155995     A           NaN
2  0.058084  0.866176  0.601115     A      0.992633
3  0.708073  0.020584  0.969910     A     -0.095420
4  0.832443  0.212339  0.181825     A     -0.180021
5  0.183405  0.304242  0.524756     B           NaN
6  0.431945  0.291229  0.611853     B           NaN
7  0.139494  0.292145  0.366362     A     -0.869948
8  0.456070  0.785176  0.199674     B     -0.984073
9  0.514234  0.592415  0.046450     B     -0.788379
Exercise 96:
Create a DataFrame and calculate the rolling covariance for each group.
Solution:
import pandas as pd
# Create a DataFrame with sample data
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
'Y': range(10), 'Z': range(10, 20)}
df = pd.DataFrame(data)
# Calculate the rolling covariance for each group
rolling_cov = df.groupby('X')[['Y', 'Z']].apply(lambda group: group['Y'].rolling(window=3).cov(group['Z'])).reset_index(level=0, drop=True)
# Add the rolling covariance to the original DataFrame
df['Rolling_Cov'] = rolling_cov
print(df)
Output:
     X  Y   Z  Rolling_Cov
0  foo  0  10          NaN
1  bar  1  11          NaN
2  foo  2  12          NaN
3  bar  3  13          NaN
4  foo  4  14          4.0
5  bar  5  15          4.0
6  foo  6  16          4.0
7  bar  7  17          4.0
8  foo  8  18          4.0
9  bar  9  19          4.0
Exercise 97:
Create a DataFrame with random values and calculate the rolling skewness for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Grouping by the random floats in 'X' would give one-row groups (all NaN),
# so add a label column to group by instead
df['Group'] = ['A', 'B'] * 5
df['Rolling_Skew'] = df.groupby('Group')['Y'].rolling(window=3).skew().reset_index(level=0, drop=True)
print(df)
Output:
The values are random and change on every run. Each group has five rows, so Rolling_Skew is NaN for the first two rows of each group (index 0 to 3); from index 4 onward, each row holds the skewness of the last three Y values in its group.
Exercise 98:
Create a DataFrame and calculate the rolling kurtosis for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
# Kurtosis needs at least four observations, so window=3 would return only NaN
df['Rolling_Kurt'] = df.groupby('X')['Y'].rolling(window=4).kurt().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Rolling_Kurt
0  foo  0           NaN
1  bar  1           NaN
2  foo  2           NaN
3  bar  3           NaN
4  foo  4           NaN
5  bar  5           NaN
6  foo  6          -1.2
7  bar  7          -1.2
8  foo  8          -1.2
9  bar  9          -1.2
Exercise 99:
Create a DataFrame with random values and calculate the rolling median for each group.
Solution:
import pandas as pd
import numpy as np
data = np.random.rand(10, 3)
df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])
# Grouping by the random floats in 'X' would give one-row groups (all NaN),
# so add a label column to group by instead
df['Group'] = ['A', 'B'] * 5
df['Rolling_Median'] = df.groupby('Group')['Y'].rolling(window=3).median().reset_index(level=0, drop=True)
print(df)
Output:
          X         Y         Z Group  Rolling_Median
0  0.542696  0.140924  0.802197     A             NaN
1  0.074551  0.986887  0.772245     B             NaN
2  0.198716  0.005522  0.815461     A             NaN
3  0.706857  0.729007  0.771270     B             NaN
4  0.074045  0.358466  0.115869     A        0.140924
5  0.863103  0.623298  0.330898     B        0.729007
6  0.063558  0.310982  0.325183     A        0.310982
7  0.729606  0.637557  0.887213     B        0.637557
8  0.472215  0.119594  0.713245     A        0.310982
9  0.760785  0.561277  0.770967     B        0.623298
Exercise 100:
Create a DataFrame and calculate the expanding sum for each group.
Solution:
import pandas as pd
data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)
df['Expanding_Sum'] = df.groupby('X')['Y'].expanding().sum().reset_index(level=0, drop=True)
print(df)
Output:
     X  Y  Expanding_Sum
0  foo  0            0.0
1  bar  1            1.0
2  foo  2            2.0
3  bar  3            4.0
4  foo  4            6.0
5  bar  5            9.0
6  foo  6           12.0
7  bar  7           16.0
8  foo  8           20.0
9  bar  9           25.0
https://w3resource.com/python-exercises/pandas/pandas_100_exercises_with_solutions.php