w3resource

Applying Polynomial features for feature expansion in Pandas


14. Applying Polynomial Features for Feature Expansion

Write a Pandas program that applies Polynomial Features for feature expansion.

The following exercise shows how to expand a feature set by generating polynomial features with Scikit-learn's PolynomialFeatures.

Sample Solution:

Code:

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.impute import SimpleImputer

# Load the dataset
df = pd.read_csv('data.csv')

# Select only numeric columns for polynomial feature expansion
numeric_cols = df.select_dtypes(include=[float, int])

# Impute missing values using the mean for numeric columns
imputer = SimpleImputer(strategy='mean')
numeric_cols_imputed = pd.DataFrame(
    imputer.fit_transform(numeric_cols),
    columns=numeric_cols.columns,
    index=numeric_cols.index,  # keep the original row labels
)

# Initialize PolynomialFeatures with degree 2
poly = PolynomialFeatures(degree=2)

# Apply polynomial feature expansion to imputed numeric data
X_poly = poly.fit_transform(numeric_cols_imputed)

# Output the expanded feature set
print(X_poly)

Output:

[[1.0000e+00 1.0000e+00 2.5000e+01 5.0000e+04 0.0000e+00 1.0000e+00
  2.5000e+01 5.0000e+04 0.0000e+00 6.2500e+02 1.2500e+06 0.0000e+00
  2.5000e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 2.0000e+00 3.0000e+01 6.0000e+04 1.0000e+00 4.0000e+00
  6.0000e+01 1.2000e+05 2.0000e+00 9.0000e+02 1.8000e+06 3.0000e+01
  3.6000e+09 6.0000e+04 1.0000e+00]
 [1.0000e+00 3.0000e+00 2.2000e+01 7.0000e+04 0.0000e+00 9.0000e+00
  6.6000e+01 2.1000e+05 0.0000e+00 4.8400e+02 1.5400e+06 0.0000e+00
  4.9000e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 4.0000e+00 3.5000e+01 8.0000e+04 1.0000e+00 1.6000e+01
  1.4000e+02 3.2000e+05 4.0000e+00 1.2250e+03 2.8000e+06 3.5000e+01
  6.4000e+09 8.0000e+04 1.0000e+00]
 [1.0000e+00 5.0000e+00 2.8200e+01 5.5000e+04 0.0000e+00 2.5000e+01
  1.4100e+02 2.7500e+05 0.0000e+00 7.9524e+02 1.5510e+06 0.0000e+00
  3.0250e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 6.0000e+00 2.9000e+01 6.3000e+04 1.0000e+00 3.6000e+01
  1.7400e+02 3.7800e+05 6.0000e+00 8.4100e+02 1.8270e+06 2.9000e+01
  3.9690e+09 6.3000e+04 1.0000e+00]]

Explanation:

  • Import Libraries:
    • pandas is imported for data manipulation.
    • PolynomialFeatures from Scikit-learn is imported for polynomial feature expansion.
    • SimpleImputer from Scikit-learn is imported to handle missing values (imputation).
  • Load the Dataset:
    • The dataset data.csv is loaded using pd.read_csv() into a DataFrame df.
  • Select Numeric Columns:
    • select_dtypes(include=[float, int]) keeps only the numeric columns (such as Age and Salary) and excludes non-numeric ones like Name and Gender.
  • Impute Missing Values:
    • SimpleImputer with strategy='mean' is used to replace any missing values (NaN) in the numeric columns with the mean of the respective columns.
    • The imputed numeric columns are stored in numeric_cols_imputed.
  • Initialize PolynomialFeatures:
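    • PolynomialFeatures(degree=2) is initialized to generate all polynomial and interaction terms up to degree 2. Because include_bias defaults to True, it also adds a constant column of ones, which appears as the first column of the output.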
  • Apply Polynomial Feature Expansion:
    • fit_transform() is applied to the imputed numeric data (numeric_cols_imputed), creating new features such as squares and interactions between the original numeric columns.
  • Output the Expanded Feature Set:
    • The transformed feature set X_poly is printed as a NumPy array. Its 15 columns are the bias column, the 4 original numeric features, and the 10 degree-2 terms (4 squares and 6 pairwise products).

For more practice, solve these related problems:

  • Write a Pandas program to generate polynomial features for numeric columns up to a specified degree and append them to the DataFrame.
  • Write a Pandas program to create interaction terms between features using polynomial expansion and compare model performance.
  • Write a Pandas program to apply polynomial feature expansion selectively on columns based on their correlation with the target variable.
  • Write a Pandas program to generate polynomial features and then perform feature selection to remove redundant variables.
