w3resource

Performance comparison of cumulative Sum calculation in Pandas


14. Cumulative Sum: cumsum() vs. For Loop

Write a Pandas program to compare the performance of calculating the cumulative sum of a column using the “cumsum” method vs. using a "for" loop.

Sample Solution :

Python Code :

# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({'value': np.random.randn(num_rows)})

# Measure time for cumsum method
start_time = time.time()
cumsum_result = df['value'].cumsum()
end_time = time.time()
cumsum_time = end_time - start_time

# Measure time for for loop method
start_time = time.time()
cumsum_for_loop = np.zeros(num_rows)
cumsum_for_loop[0] = df['value'].iloc[0]
for i in range(1, num_rows):
    cumsum_for_loop[i] = cumsum_for_loop[i-1] + df['value'].iloc[i]
end_time = time.time()
for_loop_time = end_time - start_time

# Print the time taken for each method
print(f"Time taken using cumsum method: {cumsum_time:.6f} seconds")
print(f"Time taken using for loop: {for_loop_time:.6f} seconds")

Output:

Time taken using cumsum method: 0.006079 seconds
Time taken using for loop: 8.145854 seconds

Explanation:

  • Import Libraries:
    • Import pandas, numpy, and time.
  • Create DataFrame:
    • Generate a sample DataFrame with 1,000,000 rows.
  • Time Measurement for cumsum Method:
    • Measure the time taken to calculate the cumulative sum using the cumsum method.
  • Time Measurement for for Loop:
    • Measure the time taken to calculate the cumulative sum using a for loop.
  • Print Results:
    • Print the time taken for each method.

For more Practice: Solve these Related Problems:

  • Write a Pandas program to calculate the cumulative sum of a DataFrame column using the cumsum() method and time it.
  • Write a Pandas program to implement cumulative sum using a for loop and compare its performance with cumsum().
  • Write a Pandas program to benchmark the efficiency of the built-in cumsum() versus an iterative approach for a large DataFrame.
  • Write a Pandas program to compute and compare cumulative sums of a column using both vectorized and loop methods.

Go to:


Previous: Performance comparison of Resampling time Series data in Pandas.
Next: Optimize string operations in Pandas: str accessor vs. apply.

Python-Pandas Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.