Last modified: Dec 08, 2024 By Alexander Williams
Python Pandas unique() Explained
The unique() function in Pandas allows you to extract unique values from a Series. It's a simple yet powerful tool for data analysis and preprocessing.
What is unique() in Pandas?
The unique() function returns an array of the unique values in a Pandas Series. Values are returned in the order in which they first appear; the result is not sorted.
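To make the ordering behavior concrete, here is a minimal sketch. The sample values are made up for illustration and are deliberately out of sorted order.

```python
import pandas as pd

# Illustrative data, deliberately not in sorted order
s = pd.Series([3, 1, 3, 2, 1])

# unique() keeps the order in which each value is first seen
first_seen = s.unique()

# Sort explicitly if sorted output is needed
sorted_unique = sorted(first_seen)

print(first_seen)
print(sorted_unique)
```

If sorted output matters, sort the result yourself rather than relying on unique().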
Basic Syntax of unique()
Series.unique()
Parameters: The unique() function does not take any parameters.
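Since the call takes no arguments, the main thing to keep in mind is what comes back. The sketch below, using made-up data, checks the return type for an integer Series and for a categorical Series, and shows one way to convert the result to a plain Python list.

```python
import numpy as np
import pandas as pd

# Integer Series: unique() returns a NumPy array
int_result = pd.Series([1, 2, 1]).unique()
print(type(int_result))

# Categorical Series: unique() returns a pandas Categorical instead
cat_result = pd.Series(['a', 'b', 'a'], dtype='category').unique()
print(type(cat_result))

# tolist() gives a plain Python list in either case
int_list = int_result.tolist()
cat_list = cat_result.tolist()
print(int_list, cat_list)
```

Because the result is an array and not a Series, it carries no index.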
Extracting Unique Values
Here’s an example of how to use unique() to extract unique values:
import pandas as pd
data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
series = pd.Series(data)
# Extract unique values
unique_values = series.unique()
print(unique_values)
['apple' 'banana' 'orange']
Handling Numerical Data
The unique() function works the same way with numerical data:
numerical_data = pd.Series([1, 2, 3, 1, 2, 4])
# Extract unique values
unique_numbers = numerical_data.unique()
print(unique_numbers)
[1 2 3 4]
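Numerical columns often contain missing values. As a small sketch with invented data: unique() treats NaN as a distinct value and reports it once, so drop missing values first if you do not want them in the result.

```python
import numpy as np
import pandas as pd

# Illustrative data with repeated missing values
s = pd.Series([1.0, np.nan, 2.0, np.nan, 1.0])

# NaN appears once in the result
with_nan = s.unique()

# Drop missing values first to exclude NaN
without_nan = s.dropna().unique()

print(with_nan)
print(without_nan)
```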
Comparison with value_counts()
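While unique() returns an array of unique values, value_counts() returns a Series with the number of occurrences of each value. Use unique() when you only need the distinct values, and value_counts() when you also need to know how often each one occurs.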
For an in-depth guide, read our article on value_counts().
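The difference can be seen side by side in the short sketch below, which reuses the fruit data from the earlier example. It also shows nunique(), a related Series method that returns only the number of distinct values.

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])

# Distinct values only
distinct = s.unique()

# Distinct values with their frequencies, most frequent first
counts = s.value_counts()

# Just the number of distinct values
n_distinct = s.nunique()

print(distinct)
print(counts)
print(n_distinct)
```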
Use Case: Removing Duplicates
Wrap the result of unique() in a new Series to get a de-duplicated copy of the data with a fresh index:
# Removing duplicates using unique
cleaned_data = pd.Series(series.unique())
print(cleaned_data)
0     apple
1    banana
2    orange
dtype: object
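As an alternative, not covered in the original example, Series.drop_duplicates() removes repeats while staying a Series and keeping the original index labels. The sketch below uses the same fruit data.

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])

# Keeps the first occurrence of each value and its original index label
deduped = s.drop_duplicates()

# Renumber from 0 if the original labels are not needed
renumbered = deduped.reset_index(drop=True)

print(deduped)
print(renumbered)
```

Which approach fits better depends on whether the original row positions still matter downstream.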
Practical Applications of unique()
The unique() function is commonly used to:
- Identify distinct categories in a dataset.
- Prepare data for grouping or aggregation.
- Remove duplicate values in preprocessing.
For more data analysis techniques, see our guide on grouping and aggregating data using groupby().
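As a rough illustration of the first two uses listed above, the sketch below inspects the distinct categories in a DataFrame column before aggregating by that column. The DataFrame and its column names (fruit, qty) are hypothetical.

```python
import pandas as pd

# Hypothetical sales data for illustration only
df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'apple', 'orange'],
    'qty': [3, 5, 2, 4],
})

# Check which categories exist before grouping
categories = df['fruit'].unique()

# Aggregate quantities per category
totals = df.groupby('fruit')['qty'].sum()

print(categories)
print(totals)
```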
Conclusion
The unique() function is a fundamental tool in Pandas for identifying distinct values. It takes no arguments and preserves the order of first appearance, which makes it a convenient first step in data exploration and cleaning.
Explore similar functionalities in our article on mapping functions with Pandas map().