Last modified: Dec 02, 2024 By Alexander Williams

Python Pandas describe(): Data Summary Made Easy

Understanding your data's key metrics is vital for analysis. The describe() method in Pandas provides a quick summary of numerical data in your DataFrame.

This guide explains the syntax, parameters, and use cases of describe() with examples to make it easy for beginners.

What Is the describe() Method?

The describe() method generates descriptive statistics for your data. It summarizes central tendencies, dispersion, and count, making it a great starting point for exploration.

Syntax of describe()

The basic syntax is simple:


DataFrame.describe(percentiles=None, include=None, exclude=None)

By default, it returns statistics for numerical columns only. If the DataFrame has no numerical columns, it summarizes the remaining columns instead.

Installing Pandas

Before using describe(), ensure Pandas is installed. Follow How to Install Pandas in Python for a step-by-step installation guide.


pip install pandas

Using describe(): Examples

Here’s describe() in action:


import pandas as pd

# Sample DataFrame
data = {
    'Age': [23, 45, 31, 22, 35],
    'Height': [160, 175, 168, 155, 180],
    'Weight': [55, 72, 67, 50, 80]
}

df = pd.DataFrame(data)

# Generate summary statistics
print(df.describe())

Output:


             Age      Height     Weight
count   5.000000    5.000000   5.000000
mean   31.200000  167.600000  64.800000
std     9.444575   10.310189  12.275993
min    22.000000  155.000000  50.000000
25%    23.000000  160.000000  55.000000
50%    31.000000  168.000000  67.000000
75%    35.000000  175.000000  72.000000
max    45.000000  180.000000  80.000000

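The output includes the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles. The std row is the sample standard deviation, computed with n - 1 in the denominator.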
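To see where these rows come from, the sketch below recomputes a few of them with the matching DataFrame methods. It reuses the sample data from the example above; the comparison with np.allclose is just one convenient way to check the numbers and is not part of describe() itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [23, 45, 31, 22, 35],
    'Height': [160, 175, 168, 155, 180],
    'Weight': [55, 72, 67, 50, 80],
})

summary = df.describe()

# Each row of the summary corresponds to a direct method call.
mean_ok = np.allclose(summary.loc['mean'], df.mean())
std_ok = np.allclose(summary.loc['std'], df.std(ddof=1))  # sample std (n - 1)
q25_ok = np.allclose(summary.loc['25%'], df.quantile(0.25))

print(mean_ok, std_ok, q25_ok)
```

Because the summary is itself a DataFrame, single values can be read with .loc, for example summary.loc['mean', 'Age'].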

Exploring Parameters

Let’s break down the key parameters:

  • percentiles: Custom percentiles to report, given as numbers between 0 and 1. The default is [0.25, 0.5, 0.75].
  • include: Data types to include in the summary, or 'all' for every column.
  • exclude: Data types to leave out of the summary.

Example: Using percentiles


# Specify custom percentiles
df.describe(percentiles=[0.1, 0.9])

This reports the 10th and 90th percentiles in place of the default 25th and 75th. Many pandas versions also keep the median (50%) row.
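A minimal sketch of the call above on the sample data, showing which rows end up in the result. The row labels are printed rather than listed here, because whether the 50% row appears may depend on the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [23, 45, 31, 22, 35],
    'Height': [160, 175, 168, 155, 180],
    'Weight': [55, 72, 67, 50, 80],
})

custom = df.describe(percentiles=[0.1, 0.9])

# Row labels of the summary: the requested percentiles appear as '10%' and '90%'.
print(custom.index.tolist())

# Pull out just the custom percentile rows.
print(custom.loc[['10%', '90%']])
```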

Working with Non-Numerical Data

By default, describe() skips non-numerical data. To include it, use the include parameter:


# Include all columns
df.describe(include='all')

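This summarizes every column. For categorical or text columns it reports count, unique, top (the most frequent value), and freq; statistics that do not apply to a column appear as NaN.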
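The sample DataFrame used so far has only numerical columns, so the effect of include is easier to see on mixed data. The sketch below assumes a hypothetical people DataFrame with a text column, City, invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data; 'City' is an illustrative text column.
people = pd.DataFrame({
    'Age': [23, 45, 31, 22, 35],
    'City': ['Paris', 'Lima', 'Paris', 'Oslo', 'Paris'],
})

default_summary = people.describe()                  # numerical columns only
full_summary = people.describe(include='all')        # every column
text_summary = people.describe(exclude=[np.number])  # everything except numbers

print(default_summary.columns.tolist())
print(full_summary)
print(text_summary)
```

In full_summary, the City column has values for count, unique, top, and freq, while rows such as mean and std are NaN for it.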

Use Cases of describe()

The describe() method is useful for:

  • Getting a quick data overview.
  • Identifying potential outliers.
  • Checking for missing values (count excludes them) or data consistency.
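As a rough sketch of the last two use cases, the example below reads the count and quartile rows from the summary. The readings data is hypothetical, and the 1.5 * IQR cutoff is a common convention chosen here for illustration, not something describe() applies on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one suspicious reading.
readings = pd.DataFrame({
    'Temp': [21.0, 22.5, np.nan, 21.8, 22.1, 95.0],
})

summary = readings.describe()

# 'count' ignores NaN, so comparing it with the row count reveals gaps.
missing = len(readings) - summary.loc['count']

# Flag values outside 1.5 * IQR of the quartiles (an assumed rule of thumb).
q1 = summary.loc['25%', 'Temp']
q3 = summary.loc['75%', 'Temp']
iqr = q3 - q1
is_outlier = (readings['Temp'] < q1 - 1.5 * iqr) | (readings['Temp'] > q3 + 1.5 * iqr)
outliers = readings[is_outlier]

print(missing)
print(outliers)
```

A large gap between max and the 75% row, as in this data, is often the first hint that such a check is worth running.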

If your analysis includes exporting data, explore Python Pandas to_csv() or Python Pandas to_excel() for seamless workflows.

Complementing describe() with Other Methods

For better insights, combine describe() with methods like head() to preview data or info() for structure analysis.
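A first look at a dataset might chain these three calls, as sketched below on the sample data from the earlier example. The order shown is only one reasonable habit.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [23, 45, 31, 22, 35],
    'Height': [160, 175, 168, 155, 180],
    'Weight': [55, 72, 67, 50, 80],
})

preview = df.head(3)   # first three rows
print(preview)

df.info()              # prints column dtypes, non-null counts, memory usage

stats = df.describe()  # summary statistics
print(stats)
```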

Key Takeaways

The describe() method is a versatile tool for summarizing data. It provides valuable metrics to understand your dataset at a glance.

Conclusion

Mastering describe() will enhance your ability to explore and understand data. Its simplicity and efficiency make it a must-know for any data enthusiast.