w3resource

Compare DataFrame Element-wise Multiplication Using a for Loop vs. the * Operator

Pandas: Performance Optimization Exercise-9 with Solution

Write a Pandas program that performs element-wise multiplication of two DataFrame columns, first with a for loop and then with the * operator, and compare the performance of the two approaches.

Sample Solution:

Python Code:

import pandas as pd  # Import the Pandas library
import numpy as np  # Import the NumPy library
import time  # Import the time module to measure execution time

# Create a sample DataFrame
np.random.seed(0)  # Set seed for reproducibility
data = {
    'A': np.random.randint(1, 100, size=1000000),
    'B': np.random.randint(1, 100, size=1000000)
}
df = pd.DataFrame(data)

# Perform element-wise multiplication using a for loop
start_time = time.time()  # Record the start time
result_for_loop = []
for index, row in df.iterrows():
    result_for_loop.append(row['A'] * row['B'])
result_for_loop = pd.Series(result_for_loop)
time_for_loop = time.time() - start_time  # Calculate the time taken

# Perform element-wise multiplication using the * operator
start_time = time.time()  # Record the start time
result_vectorized = df['A'] * df['B']
time_vectorized = time.time() - start_time  # Calculate the time taken

# Print the time taken for both methods
print("Time taken using for loop:", time_for_loop, "seconds")
print("Time taken using * operator:", time_vectorized, "seconds")

Sample Output (timings vary with hardware and library versions):

Time taken using for loop: 37.052802324295044 seconds
Time taken using * operator: 0.0019948482513427734 seconds
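
The gap between the two timings is easier to judge as a ratio. The short sketch below only redoes the arithmetic on the figures printed above; the variable speedup is introduced here for illustration, and the ratio on another machine will differ.

```python
# Timings copied from the sample output above. They come from a single run,
# so treat them as illustrative rather than as a benchmark.
time_for_loop = 37.052802324295044
time_vectorized = 0.0019948482513427734

# How many times faster the * operator was than the iterrows() loop.
speedup = time_for_loop / time_vectorized
print(f"The * operator was about {speedup:,.0f}x faster in this run")
```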

Explanation:

  • Import Libraries:
    • Import the Pandas library for data manipulation.
    • Import the NumPy library for generating random data.
    • Import the time module to measure execution time.
  • Create a Sample DataFrame:
    • Set a seed for reproducibility using np.random.seed(0).
    • Create a dictionary data with columns 'A' and 'B' containing random integers.
    • Generate a DataFrame df using the dictionary.
  • Perform Element-wise Multiplication Using a for loop:
    • Record the start time using time.time().
    • Initialize an empty list result_for_loop to store the multiplication results.
    • Iterate over the rows of the DataFrame with df.iterrows(). For each row, multiply the values in columns 'A' and 'B' and append the product to result_for_loop.
    • Convert result_for_loop to a Pandas Series.
    • Calculate the time taken by subtracting the start time from the current time.
  • Perform Element-wise Multiplication Using the * Operator:
    • Record the start time using time.time().
    • Use the * operator to multiply columns 'A' and 'B' element-wise. The operation is vectorized, so whole columns are multiplied at once instead of row by row.
    • Store the result in result_vectorized.
    • Calculate the time taken by subtracting the start time from the current time.
  • Finally, print the time taken by the for loop method and by the * operator method.
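
The steps above time each method once and never check that both produce the same values. The sketch below is one possible way to do both, and it is not part of the original solution. It makes several assumptions: a smaller DataFrame of 10,000 rows so that the loop finishes quickly, time.perf_counter() in place of time.time() because it is a monotonic, high-resolution clock, and an extra df.itertuples() variant as a middle ground between the loop and the * operator. The helper time_call and the three multiply_* functions are hypothetical names used only here.

```python
import time

import numpy as np
import pandas as pd

# Assumption for this sketch: 10,000 rows instead of the 1,000,000 used in
# the solution, so the row-by-row loops finish in well under a minute.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': rng.integers(1, 100, size=10_000),
    'B': rng.integers(1, 100, size=10_000),
})


def time_call(func):
    """Hypothetical helper: run func once, return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def multiply_iterrows():
    # Same approach as the solution: iterrows() builds a Series for every row.
    return pd.Series([row['A'] * row['B'] for _, row in df.iterrows()])


def multiply_itertuples():
    # Still a Python-level loop, but each row is a lightweight namedtuple.
    return pd.Series([row.A * row.B for row in df.itertuples(index=False)])


def multiply_vectorized():
    # The * operator multiplies the underlying arrays in compiled code.
    return df['A'] * df['B']


result_for_loop, time_for_loop = time_call(multiply_iterrows)
result_tuples, time_tuples = time_call(multiply_itertuples)
result_vectorized, time_vectorized = time_call(multiply_vectorized)

# All three approaches should yield exactly the same products.
same_values = (
    np.array_equal(result_for_loop.to_numpy(), result_vectorized.to_numpy())
    and np.array_equal(result_tuples.to_numpy(), result_vectorized.to_numpy())
)

print("Results match:", same_values)
print("Time taken using iterrows():", time_for_loop, "seconds")
print("Time taken using itertuples():", time_tuples, "seconds")
print("Time taken using * operator:", time_vectorized, "seconds")
```

A single timed run is noisy, especially for the vectorized call, which finishes in a fraction of a millisecond. For steadier numbers, the standard library's timeit module can repeat each call several times.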
