Last modified: Dec 04, 2024 By Alexander Williams

Python Pandas agg(): Aggregate Data in DataFrames

The agg() method in Python Pandas applies one or more aggregation operations to a DataFrame or Series. It accepts built-in aggregations such as sum, mean, and count, as well as functions you write yourself.

What is the agg() Function?

agg() is short for “aggregate”; it is an alias for the aggregate() method. It computes one or more aggregations on a DataFrame or Series, and on a DataFrame it can work either down each column or across each row, which makes it a core tool for data analysis in Python.
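Since agg() is also available on a Series, a minimal sketch of that case may help. The Series below and its name are made up for illustration: a single function name should give back one value, while a list of names should give back a Series labeled by function name.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
s = pd.Series([1, 2, 3, 4], name="values")

# A single function name reduces the Series to one value
total = s.agg("sum")

# A list of function names returns a Series indexed by those names
summary = s.agg(["min", "max", "mean"])

print(total)
print(summary)
```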

Syntax of agg()

The general syntax of the agg() function is:


DataFrame.agg(func=None, axis=0, *args, **kwargs)

Where:

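  • func: The aggregation function(s) to apply. This can be a function, a function name given as a string (such as 'sum'), a list of functions or names, or a dict that maps column labels to functions.
  • axis: The axis along which to aggregate. Use 0 (or 'index') to apply the function to each column, and 1 (or 'columns') to apply it to each row.
  • args, kwargs: Additional positional and keyword arguments passed on to func.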
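Besides a single function or a list, func can also be a dict that maps column labels to function names, so each column gets its own aggregation. The sketch below assumes the same two-column layout used in the examples that follow; the variable name per_column is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Map each column label to the aggregation it should receive:
# column A is summed, column B is averaged
per_column = df.agg({'A': 'sum', 'B': 'mean'})

# One value per column: A -> 6, B -> 5.0
print(per_column)
```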

Example 1: Basic Usage of agg() with Single Function

Let’s apply the agg() function to calculate the sum of each column in a DataFrame.


import pandas as pd

# Sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Apply aggregation
result = df.agg('sum')

print(result)


A     6
B    15
dtype: int64

In this example, the sum is computed for each column using agg('sum').

Example 2: Using Multiple Functions in agg()

You can also pass multiple functions to the agg() method. Let’s compute both the mean and sum of each column.


# Apply multiple functions
result = df.agg(['mean', 'sum'])

print(result)


        A     B
mean  2.0   5.0
sum   6.0  15.0

Here, we used the agg() method with a list of functions to calculate both the mean and sum for each column.
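A list applies every function to every column. If different columns need different sets of functions, one option is a dict of lists. The sketch below is an illustration, not part of the original example: combinations that were not requested (such as the sum of B) should come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Each column gets its own list of aggregations
mixed = df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']})

# Rows are labeled by function name; cells with no requested
# aggregation for that column are filled with NaN
print(mixed)
```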

Example 3: Aggregation Across Rows

To aggregate the values within each row instead of each column, set the axis parameter to 1.


# Apply aggregation across rows
result = df.agg('sum', axis=1)

print(result)


0    5
1    7
2    9
dtype: int64

In this case, the values in each row are added together, giving one sum per row.
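Row-wise aggregation can be combined with a list of functions as well. As a rough sketch (the variable name by_row is illustrative), the result should keep the original row labels as its index and use the function names as columns.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Compute both the sum and the mean of each row
by_row = df.agg(['sum', 'mean'], axis=1)

# One row per original row, one column per function
print(by_row)
```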

Using agg() with Custom Functions

In addition to built-in aggregation functions like sum and mean, you can pass custom functions to agg().


# Define custom function
def custom_func(x):
    return x.max() - x.min()

# Apply custom function
result = df.agg(custom_func)

print(result)


A    2
B    2
dtype: int64

The custom function calculates the range (difference between max and min) for each column.
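Custom functions can also take extra parameters, which agg() forwards through its keyword arguments. The sketch below uses a hypothetical helper, quantile_spread, with made-up parameter names lower_q and upper_q; it is one possible variation on the range idea, not the only way to write it.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Hypothetical helper: distance between two quantiles of a column
def quantile_spread(x, lower_q, upper_q):
    return x.quantile(upper_q) - x.quantile(lower_q)

# Keyword arguments after the function are passed on to it
spread = df.agg(quantile_spread, lower_q=0.25, upper_q=0.75)

print(spread)
```

With the default linear interpolation, the quartiles of column A are 1.5 and 2.5, so its spread is 1.0; column B works out the same way.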

Conclusion

The agg() function in Python Pandas is a powerful tool for performing aggregation operations on DataFrames or Series. You can apply a wide range of functions, from built-in to custom, on either rows or columns. This flexibility makes it an essential function for data analysis.

If you want to further manipulate your DataFrame's structure, you may also want to explore other Pandas functions such as reset_index() or set_index().