w3resource

Splitting a Dataset into Training and Testing Sets using Pandas


9. Splitting Dataset into Training and Testing Sets

Write a Pandas program that splits a dataset into training and testing sets.

This exercise shows how to split a dataset into training and testing sets using Scikit-learn's train_test_split().

Sample Solution:

Code:

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
df = pd.read_csv('data.csv')

# Split the dataset into features and target
X = df.drop('Target', axis=1)
y = df['Target']

# Split the dataset into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Output the size of the training and testing sets
print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")

Output:

Training set size: 4
Testing set size: 2

Explanation:

  • Loaded the dataset and split it into features (X) and target (y).
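  • Used train_test_split() with test_size=0.2 to request an 80-20 split; random_state=42 makes the shuffle reproducible. Because the test set size is rounded up, the six-row sample dataset yields 4 training rows and 2 testing rows.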
  • Displayed the size of the training and testing sets.
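
The output above depends on the contents of data.csv, which is not shown here. As a minimal sketch, assuming a hypothetical six-row DataFrame (the column names Feature1 and Feature2 and all values are made up for illustration), the same steps can be run without an external file:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical six-row dataset standing in for data.csv (values are made up)
df = pd.DataFrame({
    'Feature1': [1, 2, 3, 4, 5, 6],
    'Feature2': [10, 20, 30, 40, 50, 60],
    'Target': [0, 1, 0, 1, 0, 1],
})

# Separate the features from the target column
X = df.drop('Target', axis=1)
y = df['Target']

# Request a 20% test set; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")
```

With six rows, 0.2 × 6 = 1.2 is rounded up to 2 test rows, so the remaining 4 rows form the training set. On larger datasets the split gets much closer to the requested 80-20 ratio.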

For more Practice: Solve these Related Problems:

  • Write a Pandas program to split a dataset into training and testing sets with stratified sampling based on a categorical column.
  • Write a Pandas program to split a DataFrame into training and testing sets while maintaining the original index order.
  • Write a Pandas program to partition a dataset into multiple sets for cross-validation and report the sizes of each set.
  • Write a Pandas program to split a dataset into training and testing sets and then save each set into separate CSV files.
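
As a starting point for the first two problems, here is a rough sketch, not a full solution. It assumes a hypothetical 20-row DataFrame with a made-up Category column, and relies on the stratify and shuffle parameters of train_test_split():

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 15 rows of class 'a' and 5 rows of class 'b'
df = pd.DataFrame({
    'Value': range(20),
    'Category': ['a'] * 15 + ['b'] * 5,
})

# Stratified split: the class proportions of 'Category' are kept in both sets
strat_train, strat_test = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df['Category']
)
print(strat_test['Category'].value_counts())

# Ordered split: no shuffling, so the last 20% of rows become the test set
ordered_train, ordered_test = train_test_split(df, test_size=0.2, shuffle=False)
print(ordered_test.index.tolist())
```

Note that stratify cannot be combined with shuffle=False, so the two variants have to be separate calls.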
