Performance comparison of DataFrame filtering in Pandas
12. Query Method vs. Boolean Indexing
Write a Pandas program that uses the query method to filter rows of a DataFrame based on a condition. Compare the performance with boolean indexing.
Sample Solution:
Python Code:
# Import necessary libraries
import pandas as pd
import numpy as np
import time
# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({
    'A': np.random.randint(0, 100, size=num_rows),
    'B': np.random.randn(num_rows),
    'C': np.random.rand(num_rows)
})
# Define the condition
condition = 'A > 50 and B < 0'
# Measure time for query method
# (time.perf_counter() is a monotonic, high-resolution clock intended for timing)
start_time = time.perf_counter()
result_query = df.query(condition)
end_time = time.perf_counter()
query_time = end_time - start_time
# Measure time for boolean indexing
start_time = time.perf_counter()
result_boolean_indexing = df[(df['A'] > 50) & (df['B'] < 0)]
end_time = time.perf_counter()
boolean_indexing_time = end_time - start_time
# Print the time taken for each method
print(f"Time taken using query method: {query_time:.6f} seconds")
print(f"Time taken using boolean indexing: {boolean_indexing_time:.6f} seconds")
Output:
Time taken using query method: 0.021941 seconds
Time taken using boolean indexing: 0.008976 seconds
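A speed comparison is only meaningful if both approaches select the same rows. The following is a minimal sketch of such a check; the smaller row count and the fixed seed are assumptions made here for reproducibility and are not part of the sample solution.

```python
import numpy as np
import pandas as pd

# Small DataFrame with a fixed seed so the check is reproducible
# (the seed and the row count are illustrative assumptions).
rng = np.random.default_rng(0)
num_rows = 10_000
df = pd.DataFrame({
    'A': rng.integers(0, 100, size=num_rows),
    'B': rng.standard_normal(num_rows),
    'C': rng.random(num_rows),
})

# Filter the same DataFrame with both approaches
result_query = df.query('A > 50 and B < 0')
result_boolean_indexing = df[(df['A'] > 50) & (df['B'] < 0)]

# DataFrame.equals compares values, dtypes, and index labels
same_rows = result_query.equals(result_boolean_indexing)
print(f"Same rows selected: {same_rows}")
```

If the two results ever differed, the timings would be comparing different work, so this check is worth running before any benchmark.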
Explanation:
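- Import Libraries:
  - Import pandas, numpy, and time.
- Create DataFrame:
  - Generate a sample DataFrame with 1,000,000 rows and three columns: A (random integers from 0 to 99), B (standard normal values), and C (uniform values in [0, 1)).
- Define Condition:
  - Store the filter condition as the string 'A > 50 and B < 0', which query() evaluates against the column names.
- Time Measurement for query Method:
  - Record the time immediately before and after df.query(condition) and take the difference.
- Time Measurement for Boolean Indexing:
  - Do the same for the equivalent boolean mask, df[(df['A'] > 50) & (df['B'] < 0)].
- Print Results:
  - Print the time taken by each method in seconds. In the sample output, boolean indexing was faster for this single run; timings vary with the machine, the pandas version, and the run.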
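A single timed run, as in the steps above, can be noisy. One way to get a steadier comparison is to repeat each filter several times with the standard-library timeit module and keep the best result. This is a sketch under assumptions: the helper function names and the repeat and number values are chosen here for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as the sample solution
num_rows = 1_000_000
df = pd.DataFrame({
    'A': np.random.randint(0, 100, size=num_rows),
    'B': np.random.randn(num_rows),
    'C': np.random.rand(num_rows),
})


def filter_with_query():
    # Hypothetical helper name: filter using the query method
    return df.query('A > 50 and B < 0')


def filter_with_boolean_indexing():
    # Hypothetical helper name: filter using a boolean mask
    return df[(df['A'] > 50) & (df['B'] < 0)]


# Each measurement runs the filter `number` times; this is done `repeat` times.
# These values are arbitrary and can be raised for more stable numbers.
repeat, number = 3, 5
query_times = timeit.repeat(filter_with_query, repeat=repeat, number=number)
boolean_times = timeit.repeat(filter_with_boolean_indexing, repeat=repeat, number=number)

# The minimum is the measurement least disturbed by other activity on the machine
best_query = min(query_times) / number
best_boolean = min(boolean_times) / number
print(f"Best time per call using query method: {best_query:.6f} seconds")
print(f"Best time per call using boolean indexing: {best_boolean:.6f} seconds")
```

Which method comes out ahead depends on the data size, the condition, and the environment, so the numbers should be read as measurements on one setup rather than a general ranking.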
For more practice, solve these related problems:
- Write a Pandas program to filter rows using the query() method and compare the speed with boolean indexing.
- Write a Pandas program to apply a filter condition on a DataFrame using both query() and direct indexing, then benchmark the performance.
- Write a Pandas program to use query() for filtering a large DataFrame and compare its execution time with standard conditional selection.
- Write a Pandas program to analyze and report the performance difference between query() and boolean indexing in DataFrame filtering.