Splitting a Dataset into Training and Testing Sets Using Pandas

Last update on May 06 2025 13:19:34 (UTC/GMT +8 hours)

9. Splitting Dataset into Training and Testing Sets

Write a Pandas program that splits a dataset into training and testing sets.

This exercise shows how to split a dataset into training and testing sets using Scikit-learn's train_test_split().

Sample Solution:

Code:

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
df = pd.read_csv('data.csv')

# Split the dataset into features and target
X = df.drop('Target', axis=1)
y = df['Target']

# Split the dataset into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Output the size of the training and testing sets
print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")

Output:

Training set size: 4
Testing set size: 2

Explanation:

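The dataset is loaded from data.csv and split into features (X, every column except 'Target') and the target (y, the 'Target' column).
train_test_split() then divides X and y into training and testing sets with test_size=0.2 (an 80-20 split); random_state=42 makes the shuffle reproducible.
Finally, the sizes of both sets are printed. The output above implies a six-row file: 20% of 6 is 1.2, which train_test_split() rounds up to 2 test rows, leaving 4 for training.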
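
The solution above depends on a data.csv file that is not shown here. As a minimal sketch, the same steps can be run on a small in-memory DataFrame instead. The column names Feature1 and Feature2 and all the values are hypothetical stand-ins for the real file; only the 'Target' column name comes from the solution.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical six-row dataset standing in for data.csv (values are made up)
df = pd.DataFrame({
    "Feature1": [1, 2, 3, 4, 5, 6],
    "Feature2": [10, 20, 30, 40, 50, 60],
    "Target": [0, 1, 0, 1, 0, 1],
})

# Separate the features from the target column
X = df.drop("Target", axis=1)
y = df["Target"]

# 80-20 split; random_state fixes the shuffle so the result is repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")
```

With six rows, this should report 4 training rows and 2 testing rows, because the test count is rounded up from 1.2. On larger datasets the split lands much closer to an exact 80-20 ratio.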

For more practice, solve these related problems:

Write a Pandas program to split a dataset into training and testing sets with stratified sampling based on a categorical column.
Write a Pandas program to split a DataFrame into training and testing sets while maintaining the original index order.
Write a Pandas program to partition a dataset into multiple sets for cross-validation and report the sizes of each set.
Write a Pandas program to split a dataset into training and testing sets and then save each set into separate CSV files.
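
As a starting point for the first related problem, the sketch below shows one possible way to do a stratified split using the stratify parameter of train_test_split(). The ten-row DataFrame and its column names are hypothetical; treat this as an illustration rather than a reference solution.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset with a balanced categorical column (values are made up)
df = pd.DataFrame({
    "Feature1": range(10),
    "Target": ["a"] * 5 + ["b"] * 5,
})

# stratify keeps the class proportions of 'Target' the same in both sets
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["Target"]
)

# Count how many rows of each class ended up in each set
train_counts = train_df["Target"].value_counts().to_dict()
test_counts = test_df["Target"].value_counts().to_dict()

print(f"Training class counts: {train_counts}")
print(f"Testing class counts: {test_counts}")
```

Because the two classes are equally frequent here, each set should contain the same number of "a" and "b" rows. Calling sort_index() on either result would restore the original row order if that matters.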
