Pandas: Data Manipulation - factorize() function
factorize() function
The factorize() function is used to encode the object as an enumerated type or categorical variable. This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values.
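A minimal sketch of the basic idea: each distinct value gets an integer code in order of first appearance. The list of letters is illustrative data.

```python
import pandas as pd

# Encode a plain Python list; each distinct value gets an integer code
codes, uniques = pd.factorize(["b", "b", "a", "c", "b"])

print(codes)    # [0 0 1 2 0]
print(uniques)  # ['b' 'a' 'c']
```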
Syntax:
pandas.factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None)
Parameters:
Name | Description | Type | Default Value | Required / Optional |
---|---|---|---|---|
values | A 1-D sequence. Sequences that aren’t pandas objects are coerced to ndarrays before factorization. | sequence | | Required |
sort | Sort uniques and shuffle labels to maintain the relationship. | bool | False | Optional |
na_sentinel | Value to mark “not found”; missing values receive this label. | int | -1 | Optional |
size_hint | Hint to the hashtable sizer. | int | None | Optional |
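The effect of `sort` can be sketched by factorizing the same illustrative list twice. With `sort=True` the uniques come back sorted and the labels are remapped so that each label still points at the same value.

```python
import pandas as pd

values = ["b", "b", "a", "c", "b"]

# Default (sort=False): uniques keep their order of first appearance
codes, uniques = pd.factorize(values)
print(codes, uniques)                # [0 0 1 2 0] ['b' 'a' 'c']

# sort=True: uniques are sorted and the codes are shuffled to match
sorted_codes, sorted_uniques = pd.factorize(values, sort=True)
print(sorted_codes, sorted_uniques)  # [1 1 0 2 1] ['a' 'b' 'c']
```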
Returns:
labels: ndarray - An integer ndarray that’s an indexer into uniques. uniques.take(labels) will have the same values as values.
uniques: ndarray, Index, or Categorical - The unique valid values.
When values is Categorical, uniques is a Categorical.
When values is some other pandas object, an Index is returned. Otherwise, a 1-D ndarray is returned.
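The return types described above can be checked with a short sketch. It assumes only the three input kinds named in the text: a plain list (giving an ndarray), a Series (another pandas object, giving an Index) and a Categorical. It also confirms that `uniques.take(codes)` rebuilds the original values.

```python
import numpy as np
import pandas as pd

values = ["b", "b", "a", "c", "b"]

# Plain list -> uniques is a 1-D NumPy ndarray
codes, uniques = pd.factorize(values)
print(isinstance(uniques, np.ndarray))         # True
# Indexing uniques with the codes reconstructs the input
print(uniques.take(codes).tolist() == values)  # True

# Series (another pandas object) -> uniques is an Index
_, series_uniques = pd.factorize(pd.Series(values))
print(isinstance(series_uniques, pd.Index))    # True

# Categorical -> uniques is a Categorical
_, cat_uniques = pd.factorize(pd.Categorical(values))
print(isinstance(cat_uniques, pd.Categorical)) # True
```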
Note: Even if there’s a missing value in values, uniques will not contain an entry for it.
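A minimal sketch of how a missing value is handled: it receives the sentinel label -1 and does not appear in uniques. The sketch relies on the default behaviour only, because the `na_sentinel` argument was deprecated in pandas 1.5 and removed in pandas 2.0 (replaced by `use_na_sentinel`).

```python
import pandas as pd

# None is treated as missing: it gets label -1 and is left out of uniques
codes, uniques = pd.factorize(["b", None, "a", "c", "b"])

print(codes)    # [ 0 -1  1  2  0]
print(uniques)  # ['b' 'a' 'c']
```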
Example:
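A typical use is turning a text column of a DataFrame into integer codes. The DataFrame and its `city` column below are hypothetical data made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"city": ["Paris", "Tokyo", "Paris", "Lima"]})

# Encode the column; codes line up row by row with df
codes, uniques = pd.factorize(df["city"])
df["city_code"] = codes

print(df)
print(uniques)  # an Index, because the input is a Series
```

The method form `df["city"].factorize()` returns the same pair of codes and uniques.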