Efficiently apply multiple aggregation functions in Pandas
Pandas: Performance Optimization Exercise-19 with Solution
Write a Python program that uses the agg method to apply multiple aggregation functions to a DataFrame and compares the performance with applying each function individually.
Sample Solution:
Python Code:
# Import necessary libraries
import pandas as pd
import numpy as np
import time
# Create a sample DataFrame
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, 1000),
    'B': np.random.rand(1000),
    'C': np.random.randint(1, 100, 1000)
})
# Define aggregation functions
aggregations = {
    'A': ['sum', 'mean', 'std'],
    'B': ['sum', 'mean', 'std'],
    'C': ['sum', 'mean', 'std']
}
# Timing the agg method
start_time_agg = time.time()
df_agg = df.agg(aggregations)
time_agg = time.time() - start_time_agg
# Timing the individual application of functions
start_time_individual = time.time()
results_individual = {
    'A_sum': df['A'].sum(),
    'A_mean': df['A'].mean(),
    'A_std': df['A'].std(),
    'B_sum': df['B'].sum(),
    'B_mean': df['B'].mean(),
    'B_std': df['B'].std(),
    'C_sum': df['C'].sum(),
    'C_mean': df['C'].mean(),
    'C_std': df['C'].std()
}
time_individual = time.time() - start_time_individual
# Print results
print(f"Time using agg method: {time_agg:.6f} seconds")
print(f"Time applying functions individually: {time_individual:.6f} seconds")
print("Aggregated results using agg method:")
print(df_agg)
print("Results applying functions individually:")
print(results_individual)
Output:
Time using agg method: 0.001994 seconds
Time applying functions individually: 0.000000 seconds
Aggregated results using agg method:
                 A           B             C
sum   49723.000000  509.199400  48276.000000
mean     49.723000    0.509199     48.276000
std      28.857183    0.296208     28.470799
Results applying functions individually:
{'A_sum': 49723, 'A_mean': 49.723, 'A_std': 28.857182953434812, 'B_sum': 509.19940043113445, 'B_mean': 0.5091994004311344, 'B_std': 0.2962083809189193, 'C_sum': 48276, 'C_mean': 48.276, 'C_std': 28.47079925837016}
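The timings above come from a single run measured with time.time(). On some platforms that clock is too coarse for sub-millisecond work, which is the likely reason one reading shows as 0.000000 seconds. As a rough sketch of a steadier comparison, the standard-library timeit module can repeat each approach many times and report the best per-call time. The helper names use_agg and use_individual, and the choices of 5 repeats and 100 calls per repeat, are illustrative assumptions and not part of the original solution.

```python
import timeit

import numpy as np
import pandas as pd

# Same sample data as in the solution above
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, 1000),
    'B': np.random.rand(1000),
    'C': np.random.randint(1, 100, 1000)
})

# Same mapping as the 'aggregations' dictionary in the solution
aggregations = {col: ['sum', 'mean', 'std'] for col in df.columns}


def use_agg():
    """Apply all aggregations in one agg call (hypothetical helper)."""
    return df.agg(aggregations)


def use_individual():
    """Call each aggregation method separately (hypothetical helper)."""
    return {
        f"{col}_{func}": getattr(df[col], func)()
        for col in df.columns
        for func in ('sum', 'mean', 'std')
    }


# Arbitrary illustrative settings: 5 repeats of 100 calls each
runs = 100
best_agg = min(timeit.repeat(use_agg, repeat=5, number=runs)) / runs
best_individual = min(timeit.repeat(use_individual, repeat=5, number=runs)) / runs

print(f"agg method:       {best_agg:.6f} seconds per call")
print(f"individual calls: {best_individual:.6f} seconds per call")
```

The absolute numbers depend on the machine and the pandas version, so no expected output is shown. Taking the minimum over several repeats reduces the influence of background activity on the comparison.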
Explanation:
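- Import necessary libraries:
  - Import pandas, numpy, and time.
- Create a sample DataFrame:
  - Random data is generated for columns 'A', 'B', and 'C', with 1,000 rows each. The random seed is fixed so the data is reproducible.
- Define aggregation functions:
  - A dictionary maps each column name to the list of functions to apply to it (sum, mean, and std).
- Timing the agg method:
  - Measure the time taken to apply all aggregations in a single call to the agg method.
- Timing the individual application of functions:
  - Measure the time taken to call each aggregation method separately on each column.
- Print results:
  - Compare the two timings and show the results from both methods. In the sample output the individual calls finished faster than time.time() could register (0.000000 seconds), while agg took about 0.002 seconds, so at this data size a single run does not show agg to be the faster option. Both approaches produce the same values.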
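Since every column receives the same three functions, the dictionary can be replaced by a plain list, which agg applies to every column. The sketch below also flattens the agg result into the same 'A_sum'-style keys and checks that both approaches agree. The names funcs, flat_agg, and all_match are illustrative and do not appear in the original solution.

```python
import numpy as np
import pandas as pd

# Same sample data as in the solution above
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, 1000),
    'B': np.random.rand(1000),
    'C': np.random.randint(1, 100, 1000)
})

funcs = ['sum', 'mean', 'std']

# A list of function names is applied to every column of the DataFrame
df_agg = df.agg(funcs)

# Flatten the (function x column) table into 'A_sum'-style keys
flat_agg = {
    f"{col}_{func}": df_agg.loc[func, col]
    for col in df_agg.columns
    for func in df_agg.index
}

# The same statistics computed one call at a time
results_individual = {
    f"{col}_{func}": getattr(df[col], func)()
    for col in df.columns
    for func in funcs
}

# Compare with a tolerance, since agg returns the integer sums as floats
all_match = all(
    np.isclose(flat_agg[key], results_individual[key])
    for key in results_individual
)
print("Both approaches agree:", all_match)
```

A check like this is useful before comparing speed, because a timing comparison only means something if the two code paths compute the same result.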