Last modified: Dec 08, 2024 By Alexander Williams

Python Pandas value_counts() Simplified

The value_counts() function in Pandas counts how often each unique value appears in a Series. It is one of the quickest ways to summarize a column during data analysis and exploration.

What is value_counts() in Pandas?

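The value_counts() function returns a Series whose index holds the unique values and whose values hold how often each one occurs. By default, the result is sorted in descending order of frequency, so the most common value comes first, and NaN values are left out.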
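To make the returned structure concrete, here is a small plain-Python sketch using collections.Counter. It only illustrates what the result contains (each value paired with its count, most frequent first); it is not how pandas implements value_counts() internally.

```python
from collections import Counter

import pandas as pd

data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

# Plain-Python illustration: count each value, most frequent first
manual_counts = Counter(data).most_common()
print(manual_counts)  # [('apple', 3), ('banana', 2), ('orange', 1)]

# The pandas result holds the same value/count pairs, as a Series
pandas_counts = pd.Series(data).value_counts()
print(pandas_counts.to_dict())
```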

Basic Syntax of value_counts()


Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Parameters:

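  • normalize: If True, returns relative frequencies (proportions) instead of counts.
  • sort: If True (the default), sorts the results by frequency. If False, leaves them unsorted.
  • ascending: If True, sorts in ascending order (least frequent first). The default is False.
  • bins: Instead of counting each value, groups numeric data into the given number of half-open intervals. Works only with numeric data.
  • dropna: If True (the default), excludes NaN values from the counts.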
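The sort parameter is not demonstrated in the sections that follow, so here is a short sketch of sort=False. It reuses the fruit data from this article; the row order of the unsorted result is not by frequency, and it is safest not to rely on any particular order.

```python
import pandas as pd

series = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])

# sort=False skips ordering the result by frequency
unsorted_counts = series.value_counts(sort=False)
print(unsorted_counts)

# The counts themselves are identical; only the row order may differ
sorted_counts = series.value_counts()
print(unsorted_counts.sort_index().equals(sorted_counts.sort_index()))  # True
```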

Counting Unique Values in a Series

Here's how you can count unique values using value_counts():


import pandas as pd

data = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
series = pd.Series(data)

# Count unique values
value_counts = series.value_counts()
print(value_counts)


apple     3
banana    2
orange    1
Name: count, dtype: int64
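Because the result is an ordinary Series, you can index into it like any other. The sketch below looks up a single count by label, takes the most frequent value from the first position, and compares the number of rows with Series.nunique(), which also ignores NaN by default.

```python
import pandas as pd

series = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])
counts = series.value_counts()

# Look up the count for one value by its label
apple_count = counts['apple']

# The result is sorted by frequency, so the first label is the most common value
most_common = counts.index[0]

# Number of distinct values; matches len(counts) when there is no NaN
n_unique = series.nunique()

print(apple_count, most_common, n_unique)  # 3 apple 3
```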

Returning Proportions Instead of Counts

Set normalize=True to get proportions:


proportions = series.value_counts(normalize=True)
print(proportions)


apple     0.500000
banana    0.333333
orange    0.166667
Name: proportion, dtype: float64
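For data without missing values, such as this example, normalize=True gives the same numbers as dividing each count by the total of all counts. The sketch below checks that equivalence with pandas' own testing helper and confirms that the proportions add up to 1.

```python
import pandas as pd

series = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])

counts = series.value_counts()
proportions = series.value_counts(normalize=True)

# Dividing each count by the total reproduces normalize=True
# (check_names=False because the two results carry different Series names)
manual = counts / counts.sum()
pd.testing.assert_series_equal(proportions, manual, check_names=False)

# The proportions of all counted values sum to 1
print(round(proportions.sum(), 10))  # 1.0
```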

Sorting Results in Ascending Order

To sort results in ascending order, use ascending=True:


ascending_counts = series.value_counts(ascending=True)
print(ascending_counts)


orange    1
banana    2
apple     3
Name: count, dtype: int64
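The ascending parameter always orders by count. If you want the rows ordered by the values themselves, sort the result's index instead. The sketch below uses hypothetical data (not from the examples above) chosen so that frequency order and alphabetical order differ.

```python
import pandas as pd

# Hypothetical data where frequency order and alphabetical order differ
fruits = pd.Series(['pear', 'apple', 'pear', 'kiwi', 'pear', 'apple'])

# ascending=True orders by count, least frequent first
by_count = fruits.value_counts(ascending=True)
print(by_count.index.tolist())  # ['kiwi', 'apple', 'pear']

# sort_index() orders by the values themselves (alphabetically here)
by_label = fruits.value_counts().sort_index()
print(by_label.index.tolist())  # ['apple', 'kiwi', 'pear']
```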

Binning Numerical Data

Pass an integer to the bins parameter to group numerical data into that many equal-width intervals:


numerical_data = pd.Series([10, 20, 30, 40, 50, 60, 70])

# Bin data into intervals
binned_counts = numerical_data.value_counts(bins=3)
print(binned_counts)


(9.939, 30.0]    3
(30.0, 50.0]     2
(50.0, 70.0]     2
Name: count, dtype: int64
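The pandas documentation describes bins as a convenience for pd.cut. When you want specific interval edges rather than equal-width bins, one option is to call pd.cut directly and count the result. The edges [0, 25, 50, 75] below are a hypothetical choice for illustration.

```python
import pandas as pd

numerical_data = pd.Series([10, 20, 30, 40, 50, 60, 70])

# Hypothetical edges; pd.cut builds right-closed intervals (0, 25], (25, 50], (50, 75]
binned = pd.cut(numerical_data, bins=[0, 25, 50, 75])

# Count per interval, listed in interval order rather than by frequency
interval_counts = binned.value_counts().sort_index()
print(interval_counts.tolist())  # [2, 3, 2]
```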

Including NaN Values

By default, value_counts() excludes NaN. To include NaN values in the counts, set dropna=False:


import numpy as np

data_with_nan = pd.Series(['apple', 'banana', np.nan, 'apple', np.nan])

# Count values including NaN
counts_with_nan = data_with_nan.value_counts(dropna=False)
print(counts_with_nan)


apple     2
NaN       2
banana    1
Name: count, dtype: int64
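If you only need the number of missing entries, Series.isna() gives it directly, and its mean gives the share of missing rows. The sketch below also checks that, with dropna=False, the counts cover every row of the Series.

```python
import numpy as np
import pandas as pd

data_with_nan = pd.Series(['apple', 'banana', np.nan, 'apple', np.nan])

# Number of missing entries (the NaN row that dropna=False adds)
n_missing = data_with_nan.isna().sum()

# Fraction of rows that are missing
missing_share = data_with_nan.isna().mean()
print(n_missing, missing_share)  # 2 0.4

# With dropna=False every row is counted, so the counts add up to the length
all_counted = data_with_nan.value_counts(dropna=False).sum() == len(data_with_nan)
print(all_counted)  # True
```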

Practical Applications of value_counts()

The value_counts() function is widely used in data exploration, for example to check how categories are distributed in a column or, with dropna=False, to see how many values are missing.
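As an illustration of the category-distribution use case, the sketch below works on a hypothetical DataFrame column named color. It reports each category's share as a percentage and lists categories that appear only once, which are often typos or candidates for grouping.

```python
import pandas as pd

# Hypothetical survey column
df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green', 'red', 'blue']})

# Share of each category as a percentage, rounded to one decimal place
color_pct = df['color'].value_counts(normalize=True).mul(100).round(1)
print(color_pct)

# Categories seen only once
counts = df['color'].value_counts()
rare = counts[counts == 1].index.tolist()
print(rare)  # ['green']
```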

For related tasks, see our guide on grouping and aggregating data with Pandas groupby().

Conclusion

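The value_counts() function is an essential tool in data analysis. A single call counts the unique values in a Series, and the normalize, sort, ascending, bins, and dropna parameters turn those counts into proportions, ordered summaries, interval counts for numeric data, or a quick check for missing values.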

For further data cleaning tips, check out handling missing data with Pandas fillna().