w3resource

Feature selection using variance threshold in Pandas

Pandas: Machine Learning Integration Exercise-12 with Solution

Write a Pandas program to perform feature selection using a variance threshold.

This exercise demonstrates how to select features based on their variance using Scikit-learn's VarianceThreshold.

Sample Solution:

Code:

import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Load the dataset
df = pd.read_csv('data.csv')

# Select only the numeric columns for feature selection
numeric_cols = df.select_dtypes(include=[float, int])

# Initialize the VarianceThreshold with a threshold of 0.1
selector = VarianceThreshold(threshold=0.1)

# Apply feature selection based on variance
X_selected = selector.fit_transform(numeric_cols)

# Output the selected features
print(X_selected)

Output:

[[1.0e+00 2.5e+01 5.0e+04 0.0e+00]
 [2.0e+00 3.0e+01 6.0e+04 1.0e+00]
 [3.0e+00 2.2e+01 7.0e+04 0.0e+00]
 [4.0e+00 3.5e+01 8.0e+04 1.0e+00]
 [5.0e+00     nan 5.5e+04 0.0e+00]
 [6.0e+00 2.9e+01     nan 1.0e+00]]
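
To see why a column is kept or dropped, it can help to look at the variances the selector computed. The sketch below is a minimal illustration on a small made-up DataFrame; the column names and values are assumptions, not the contents of data.csv. It also shows that VarianceThreshold uses the population variance (ddof=0), whereas pandas' var() defaults to the sample variance (ddof=1), so the two can differ slightly.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical data for illustration only; not the contents of data.csv
df = pd.DataFrame({
    "Age": [25, 30, 22, 35],
    "Salary": [50000, 60000, 70000, 80000],
    "Constant": [1, 1, 1, 1],                 # zero variance
    "NearlyConstant": [0.0, 0.0, 0.0, 0.5],   # variance below 0.1
})

selector = VarianceThreshold(threshold=0.1)
selector.fit(df)

# variances_ holds the population variance (ddof=0) of each column
variances = pd.Series(selector.variances_, index=df.columns)
print(variances)

# pandas defaults to ddof=1, so pass ddof=0 to reproduce the same numbers
pandas_variances = df.var(ddof=0)
print(pandas_variances)

# Columns whose variance exceeds the threshold
kept = variances[variances > 0.1].index.tolist()
print(kept)
```

On this sample data, only Age and Salary have a variance above 0.1, so the other two columns would be dropped.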

Explanation:

  • Import Libraries:
    • pandas is imported for handling data in DataFrame format.
    • VarianceThreshold from Scikit-learn is imported for performing feature selection based on variance.
  • Load the Dataset:
    • The dataset data.csv is loaded using pd.read_csv() and stored in the DataFrame df.
  • Select Numeric Columns:
    • select_dtypes(include=[float, int]) is used to select only the numeric columns from the dataset (e.g., Age, Salary) and exclude non-numeric columns like Name and Gender.
  • Initialize VarianceThreshold:
    • VarianceThreshold is initialized with a threshold of 0.1. Features whose variance does not exceed this threshold are removed.
  • Apply VarianceThreshold:
    • fit_transform() is applied to the numeric columns to perform feature selection, keeping only the features that have a variance greater than 0.1.
  • Output the Selected Features:
    • The selected features, which fit_transform() returns as a NumPy array without column names, are printed.
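
Because fit_transform() returns a plain array, the column names are lost. One way to keep them is to use the selector's get_support() mask to index the original DataFrame. The following is a sketch under stated assumptions: select_by_variance is a hypothetical helper name, and the sample data is made up rather than taken from data.csv.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold


def select_by_variance(df, threshold=0.1):
    """Hypothetical helper: return the numeric columns of df
    whose variance exceeds the given threshold."""
    numeric = df.select_dtypes(include="number")
    selector = VarianceThreshold(threshold=threshold)
    selector.fit(numeric)
    # Boolean mask aligned with numeric.columns; True marks a retained column
    mask = selector.get_support()
    return numeric.loc[:, mask]


# Made-up sample data, not the contents of data.csv
sample = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Dee"],   # non-numeric, ignored
    "Age": [25, 30, 22, 35],               # high variance, kept
    "Flag": [1, 1, 1, 1],                  # zero variance, dropped
})

result = select_by_variance(sample)
print(result.columns.tolist())
```

The result is still a DataFrame, so the retained feature names remain available for later modelling steps.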
