w3resource

Performance comparison of DataFrame sorting in Pandas


17. Sorting Performance: sort_values() vs. Custom apply()

Write a Pandas program to measure the time taken to sort a large DataFrame using the sort_values method vs. using a custom sorting function with apply.

Sample Solution :

Python Code :

# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({
    'A': np.random.randint(0, 100, size=num_rows),
    'B': np.random.randn(num_rows)
})

# Measure time for sort_values method
start_time = time.time()
sorted_df = df.sort_values(by='A')
end_time = time.time()
sort_values_time = end_time - start_time

# Define a custom sorting function
def sort_custom(df):
    return df.sort_values(by='A')

# Measure time for apply method
start_time = time.time()
apply_sorted_df = df.apply(lambda x: x).sort_values(by='A')
end_time = time.time()
apply_time = end_time - start_time

# Print the time taken for each method
print(f"Time taken using sort_values method: {sort_values_time:.6f} seconds")
print(f"Time taken using apply method: {apply_time:.6f} seconds")

Output:

Time taken using sort_values method: 0.075608 seconds
Time taken using apply method: 0.075997 seconds

Explanation:

  • Import Libraries:
    • Import pandas, numpy, and time.
  • Create DataFrame:
    • Generate a sample DataFrame with 1,000,000 rows with columns 'A' and 'B'.
  • Time Measurement for sort_values Method:
    • Measure the time taken to sort the DataFrame using the sort_values method.
  • Define Custom Sorting Function:
    • Define a custom function for sorting.
  • Time Measurement for apply Method:
    • Measure the time taken to sort the DataFrame using the custom sorting function with the apply method.
  • Finally print the time taken for each method.

For more Practice: Solve these Related Problems:

  • Write a Pandas program to sort a large DataFrame using sort_values() and measure the execution time.
  • Write a Pandas program to sort a DataFrame by a custom key using apply() and compare the performance with sort_values().
  • Write a Pandas program to benchmark the sorting of a DataFrame using the built-in sort_values() versus a custom sorting function.
  • Write a Pandas program to analyze the time taken for DataFrame sorting using vectorized sort_values() compared to iterative sorting.

Go to:


Previous: Reshaping DataFrame in Pandas: pivot_table vs. manual Loop.
Next: Rolling Window Calculation in Pandas: rolling vs. Manual.

Python-Pandas Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.