Optimize memory usage with Categorical data type in Pandas DataFrame
Pandas: Performance Optimization Exercise-8 with Solution
Write a Pandas program to create a DataFrame with categorical data and use the category data type to optimize memory usage. Measure the performance difference.
Sample Solution:
Python Code:
import pandas as pd # Import the Pandas library
import numpy as np # Import the NumPy library
# Create a sample DataFrame with categorical data
np.random.seed(0) # Set seed for reproducibility
data = {
    'Category': np.random.choice(['A', 'B', 'C', 'D'], size=1000000),
    'Values': np.random.randint(1, 100, size=1000000)
}
df = pd.DataFrame(data)
# Print memory usage before optimization
# (df.info() prints its report itself and returns None, so it is not wrapped in print())
print("Memory usage before optimization:")
df.info(memory_usage='deep')
# Convert the 'Category' column to the category data type
df['Category'] = df['Category'].astype('category')
# Print memory usage after optimization
print("\nMemory usage after optimization:")
df.info(memory_usage='deep')
Output:
Memory usage before optimization:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 2 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   Category  1000000 non-null  object
 1   Values    1000000 non-null  int32
dtypes: int32(1), object(1)
memory usage: 59.1 MB

Memory usage after optimization:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 2 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   Category  1000000 non-null  category
 1   Values    1000000 non-null  int32
dtypes: category(1), int32(1)
memory usage: 4.8 MB
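The totals of 59.1 MB and 4.8 MB cover the whole DataFrame. To see how much of the saving comes from the 'Category' column alone, a per-column breakdown can help. The following is a minimal sketch, not part of the original solution: it uses DataFrame.memory_usage(deep=True), and the name n_rows and the reduced row count are assumptions chosen so the example runs quickly. Exact byte counts depend on the pandas version and platform, so none are shown here.

```python
import numpy as np
import pandas as pd

n_rows = 100_000  # assumption: smaller than the exercise's 1,000,000 rows
np.random.seed(0)
df = pd.DataFrame({
    'Category': np.random.choice(['A', 'B', 'C', 'D'], size=n_rows),
    'Values': np.random.randint(1, 100, size=n_rows),
})

# Per-column memory in bytes; deep=True also counts the string objects
before = df.memory_usage(deep=True)

df['Category'] = df['Category'].astype('category')
after = df.memory_usage(deep=True)

object_bytes = before['Category']
category_bytes = after['Category']
print(f"Category before conversion: {object_bytes:,} bytes")
print(f"Category after conversion:  {category_bytes:,} bytes")
print(f"Reduction factor:           {object_bytes / category_bytes:.1f}x")

# A category column is one small integer code per row plus a lookup table
print(df['Category'].cat.categories.tolist())
print(df['Category'].cat.codes.dtype)
```

The last two lines show where the saving comes from: the four labels are stored once, and each row holds only a compact integer code that points to one of them.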
Explanation:
- Import Libraries:
  - Import the Pandas library for data manipulation.
  - Import the NumPy library for generating random data.
- Create a Sample DataFrame with Categorical Data:
  - Set a seed for reproducibility using np.random.seed(0).
  - Create a dictionary data with a 'Category' column containing 1,000,000 labels drawn at random from 'A', 'B', 'C' and 'D', and a 'Values' column containing random integers from 1 to 99.
  - Generate a DataFrame df from the dictionary.
- Print Memory Usage Before Optimization:
  - Use df.info(memory_usage='deep') to display the memory usage of the DataFrame before optimization. The 'deep' setting counts the Python string objects held by the object column, not just the references to them.
- Convert Column to Category Data Type:
  - Use the astype method to convert the 'Category' column to the category data type. Pandas then stores each distinct label once and keeps a small integer code per row, which is why the reported usage falls from 59.1 MB to 4.8 MB. The saving depends on the column having few distinct values relative to its length.
- Print Memory Usage After Optimization:
  - Use df.info(memory_usage='deep') to display the memory usage of the DataFrame after optimization.
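The exercise also asks to measure the performance difference, which the solution covers only in terms of memory. One possible way to compare speed is to time the same operation on both versions of the column. The sketch below is an illustration under stated assumptions, not the original author's method: the helper time_groupby, the choice of a groupby sum as the benchmark, and the reduced n_rows are all hypothetical. Timings vary by machine and pandas version, so no expected numbers are given.

```python
import time

import numpy as np
import pandas as pd

n_rows = 200_000  # assumption: smaller than the exercise's 1,000,000 rows
np.random.seed(0)
df_obj = pd.DataFrame({
    'Category': np.random.choice(['A', 'B', 'C', 'D'], size=n_rows),
    'Values': np.random.randint(1, 100, size=n_rows),
})
df_cat = df_obj.copy()
df_cat['Category'] = df_cat['Category'].astype('category')


def time_groupby(frame, repeats=5):
    """Return the best wall-clock time in seconds of a groupby sum."""
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        frame.groupby('Category', observed=True)['Values'].sum()
        best = min(best, time.perf_counter() - start)
    return best


t_obj = time_groupby(df_obj)
t_cat = time_groupby(df_cat)
print(f"groupby before conversion: {t_obj:.4f} s")
print(f"groupby after conversion:  {t_cat:.4f} s")

# The conversion must not change the result, only how the labels are stored
sum_obj = df_obj.groupby('Category', observed=True)['Values'].sum()
sum_cat = df_cat.groupby('Category', observed=True)['Values'].sum()
same_totals = bool((sum_obj.to_numpy() == sum_cat.to_numpy()).all())
print("Same results:", same_totals)
```

Taking the best of several repeats reduces noise from other processes. The final check confirms that both versions produce identical group totals, so any timing gap reflects the storage format rather than a different computation.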