w3resource

Performance comparison of DataFrame filtering in Pandas


12. Query Method vs. Boolean Indexing

Write a Pandas program that uses the query method to filter rows of a DataFrame based on a condition. Compare the performance with boolean indexing.

Sample Solution:

Python Code:

# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({
    'A': np.random.randint(0, 100, size=num_rows),
    'B': np.random.randn(num_rows),
    'C': np.random.rand(num_rows)
})

# Define the condition
condition = 'A > 50 and B < 0'

# Measure time for query method
# (time.perf_counter() is a high-resolution clock intended for timing short intervals)
start_time = time.perf_counter()
result_query = df.query(condition)
end_time = time.perf_counter()
query_time = end_time - start_time

# Measure time for boolean indexing
start_time = time.perf_counter()
result_boolean_indexing = df[(df['A'] > 50) & (df['B'] < 0)]
end_time = time.perf_counter()
boolean_indexing_time = end_time - start_time

# Print the time taken for each method
print(f"Time taken using query method: {query_time:.6f} seconds")
print(f"Time taken using boolean indexing: {boolean_indexing_time:.6f} seconds")

Sample Output (timings vary by machine and from run to run):

Time taken using query method: 0.021941 seconds
Time taken using boolean indexing: 0.008976 seconds
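
A single measurement like the one above is noisy, and it can include one-time costs such as parsing the query string. As a minimal sketch, assuming the same DataFrame layout as the solution, the standard-library timeit module can repeat each filter and report the best time. The helper names filter_with_query and filter_with_mask, the fixed seed, and the repeat counts are illustrative choices, not part of the original solution.

```python
import timeit

import numpy as np
import pandas as pd

# Fixed seed so the random data is the same on every run (an added assumption)
rng = np.random.default_rng(0)
num_rows = 1_000_000
df = pd.DataFrame({
    'A': rng.integers(0, 100, size=num_rows),
    'B': rng.standard_normal(num_rows),
    'C': rng.random(num_rows),
})

def filter_with_query():
    # Filter using a string expression
    return df.query('A > 50 and B < 0')

def filter_with_mask():
    # Filter using a boolean mask
    return df[(df['A'] > 50) & (df['B'] < 0)]

calls_per_measurement = 5
measurements = 3

# timeit.repeat returns the total time of each measurement;
# take the fastest one and divide by the number of calls in it
query_best = min(timeit.repeat(filter_with_query,
                               number=calls_per_measurement,
                               repeat=measurements)) / calls_per_measurement
mask_best = min(timeit.repeat(filter_with_mask,
                              number=calls_per_measurement,
                              repeat=measurements)) / calls_per_measurement

print(f"query method (best of {measurements}):     {query_best:.6f} seconds per call")
print(f"boolean indexing (best of {measurements}): {mask_best:.6f} seconds per call")
```

Taking the minimum over several measurements reduces the influence of other processes running on the machine, so it is usually a steadier basis for comparison than a single timed call.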

Explanation:

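  • Import Libraries:
    • Import pandas for the DataFrame, numpy to generate random data, and time to measure elapsed time.
  • Create DataFrame:
    • Generate a sample DataFrame with 1,000,000 rows: column A holds random integers from 0 to 99, B holds samples from a standard normal distribution, and C holds uniform random values in [0, 1).
  • Define Condition:
    • Store the filter as the string 'A > 50 and B < 0', which the query method parses and evaluates against the column names.
  • Time Measurement for query Method:
    • Record the time immediately before and after df.query(condition); the difference is the elapsed time.
  • Time Measurement for Boolean Indexing:
    • Build a boolean mask with (df['A'] > 50) & (df['B'] < 0), use it to index the DataFrame, and time the operation the same way.
  • Print Results:
    • Print the elapsed time for each method. In the sample run above, boolean indexing was faster than query; the gap depends on the data size, the hardware, and whether the optional numexpr package is installed.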
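
A speed comparison only makes sense if both methods select the same rows. The following sketch checks this with DataFrame.equals on a smaller DataFrame, and also shows that query can reference a local Python variable through the @ prefix. The variable name threshold, the seed, and the row count are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Smaller DataFrame than in the solution, so the check runs quickly
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'A': rng.integers(0, 100, size=10_000),
    'B': rng.standard_normal(10_000),
})

threshold = 50  # hypothetical local variable used by both filters

# '@threshold' tells query() to look up the local Python variable
result_query = df.query('A > @threshold and B < 0')
result_mask = df[(df['A'] > threshold) & (df['B'] < 0)]

# equals() compares values, index, and column dtypes
same_rows = result_query.equals(result_mask)
print(f"Both methods return the same rows: {same_rows}")
```

The string form tends to be easier to read when several conditions are combined, while the mask form keeps everything in ordinary Python expressions, so the choice between them is often about readability as much as speed.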

For more Practice: Solve these Related Problems:

  • Write a Pandas program to filter rows using the query() method and compare the speed with boolean indexing.
  • Write a Pandas program to apply a filter condition on a DataFrame using both query() and direct indexing, then benchmark the performance.
  • Write a Pandas program to use query() for filtering a large DataFrame and compare its execution time with standard conditional selection.
  • Write a Pandas program to analyze and report the performance difference between query() and boolean indexing in DataFrame filtering.
