Pandas: Data Manipulation - factorize() function

Last update on August 19 2022 21:50:33 (UTC/GMT +8 hours)

factorize() function

The factorize() function is used to encode the object as an enumerated type or categorical variable. This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values.

Syntax:

pandas.factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None)

Parameters:

Name	Description	Type	Default Value	Required / Optional
values	A 1-D sequence. Sequences that aren’t pandas objects are coerced to ndarrays before factorization.	sequence		Required
sort	Sort uniques and shuffle labels to maintain the relationship.	bool	Default: False	Optional
na_sentinel	Value to mark “not found”.	int	Default: -1	Optional
size_hint	Hint to the hashtable sizer.	int		Optional
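
To make the effect of the sort parameter concrete, here is a minimal sketch that factorizes the same array with and without sorting. The sample array and variable names are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
values = np.array(["b", "b", "a", "c", "b"], dtype=object)

# Default (sort=False): uniques appear in order of first occurrence.
codes, uniques = pd.factorize(values)
print(codes)    # [0 0 1 2 0]
print(uniques)  # ['b' 'a' 'c']

# sort=True: uniques are sorted and the codes are remapped to match,
# so each code still points at the same original value.
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)
print(codes_sorted)    # [1 1 0 2 1]
print(uniques_sorted)  # ['a' 'b' 'c']
```

In both cases the pairing between a code and its value is preserved; only the numbering changes.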

Returns:

labels: ndarray - An integer ndarray that’s an indexer into uniques. uniques.take(labels) will have the same values as values.
uniques: ndarray, Index, or Categorical - The unique valid values.
When values is Categorical, uniques is a Categorical.
When values is some other pandas object, an Index is returned. Otherwise, a 1-D ndarray is returned.
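
The three return types described above can be checked with a short sketch. The inputs are hypothetical sample data, and the variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd

# Categorical input: uniques comes back as a Categorical that keeps
# the original categories, including unused ones such as "b".
cat = pd.Categorical(["a", "a", "c"], categories=["a", "b", "c"])
codes_cat, uniques_cat = pd.factorize(cat)
print(type(uniques_cat).__name__)  # Categorical

# Other pandas object (here a Series): uniques is an Index.
ser = pd.Series(["a", "a", "c"])
codes_ser, uniques_ser = pd.factorize(ser)
print(isinstance(uniques_ser, pd.Index))  # True

# Plain NumPy array: uniques is a 1-D ndarray.
arr = np.array(["a", "a", "c"], dtype=object)
codes_arr, uniques_arr = pd.factorize(arr)
print(type(uniques_arr).__name__)  # ndarray
```

The integer codes are the same in all three cases; only the container used for uniques differs.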

Note: Even if there’s a missing value in values, uniques will not contain an entry for it.
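
A small sketch of this behavior, using made-up sample data: the missing entry receives the default sentinel code -1 and is left out of uniques. The example relies on the default rather than passing na_sentinel explicitly, since (as far as we are aware) that keyword was replaced in later pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data containing one missing value (None).
values = np.array(["b", None, "a", "c", "b"], dtype=object)

codes, uniques = pd.factorize(values)
print(codes)    # [ 0 -1  1  2  0]
print(uniques)  # ['b' 'a' 'c']

# The missing value is marked with -1 and has no entry in uniques.
n_missing = int((codes == -1).sum())
print(n_missing)  # 1
```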

Example:
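
The following is a minimal sketch of typical usage, not the original notebook. It assumes a small, made-up array of color names, encodes it with factorize(), and then uses uniques.take() to recover the original values from the integer codes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
colors = np.array(["red", "green", "red", "blue", "green"], dtype=object)

# Encode each distinct value as an integer code.
codes, uniques = pd.factorize(colors)
print(codes)    # [0 1 0 2 1]
print(uniques)  # ['red' 'green' 'blue']

# The codes index into uniques, so take() rebuilds the original values.
restored = uniques.take(codes)
print(restored)  # ['red' 'green' 'red' 'blue' 'green']
```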

