Last modified: May 10, 2026

GroupBy & Aggregations in Polars

Grouping and aggregating data is a core task in data analysis. In Polars, the group_by() method is powerful and fast. It helps you split your data into groups. Then you apply functions to each group. This article will show you how to use it well.

Polars is built for speed. Its lazy evaluation and expressive API make grouping easy. You will learn the main patterns here. We will use clear examples and short code blocks.

What is GroupBy?

GroupBy is a process in data analysis. You split a DataFrame into groups based on one or more columns. Then you compute a summary statistic for each group.

For example, you can group sales data by region. Then you find the total sales per region. This is a common task in business reports.

In Polars, you call df.group_by("column"). This returns a GroupBy object. Then you chain an aggregation method like .agg().

Basic GroupBy Example

Let's start with a simple DataFrame. We have sales data with product and amount.

import polars as pl

# Create sample data
df = pl.DataFrame({
    "product": ["A", "B", "A", "B", "A"],
    "sales": [100, 200, 150, 250, 300]
})

print(df)

shape: (5, 2)
┌─────────┬───────┐
│ product ┆ sales │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ A       ┆ 100   │
│ B       ┆ 200   │
│ A       ┆ 150   │
│ B       ┆ 250   │
│ A       ┆ 300   │
└─────────┴───────┘

Now we group by product and sum the sales.

# Group by product and aggregate sum
grouped = df.group_by("product").agg(pl.col("sales").sum())

print(grouped)

shape: (2, 2)
┌─────────┬───────┐
│ product ┆ sales │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ A       ┆ 550   │
│ B       ┆ 450   │
└─────────┴───────┘

Note that group_by() does not sort the groups or guarantee their order. Your rows may appear in a different order than shown here.

Multiple Aggregations

You can apply many aggregations at once. This is one of the best features of Polars. Use .agg() with a list of expressions.

# Multiple aggregations per group
grouped = df.group_by("product").agg([
    pl.col("sales").sum().alias("total_sales"),
    pl.col("sales").mean().alias("avg_sales"),
    pl.col("sales").count().alias("count")
])

print(grouped)

shape: (2, 4)
┌─────────┬─────────────┬────────────┬───────┐
│ product ┆ total_sales ┆ avg_sales  ┆ count │
│ ---     ┆ ---         ┆ ---        ┆ ---   │
│ str     ┆ i64         ┆ f64        ┆ u32   │
╞═════════╪═════════════╪════════════╪═══════╡
│ A       ┆ 550         ┆ 183.333333 ┆ 3     │
│ B       ┆ 450         ┆ 225.0      ┆ 2     │
└─────────┴─────────────┴────────────┴───────┘

We used .alias() to rename columns. This makes the output readable.

GroupBy with Multiple Columns

Sometimes you need to group by more than one column. For example, group by product and region.

# Add a region column
df2 = pl.DataFrame({
    "product": ["A", "A", "B", "B", "A"],
    "region": ["North", "South", "North", "South", "North"],
    "sales": [100, 150, 200, 250, 300]
})

# Group by two columns
grouped = df2.group_by(["product", "region"]).agg(pl.col("sales").sum())

print(grouped)

shape: (4, 3)
┌─────────┬────────┬───────┐
│ product ┆ region ┆ sales │
│ ---     ┆ ---    ┆ ---   │
│ str     ┆ str    ┆ i64   │
╞═════════╪════════╪═══════╡
│ A       ┆ North  ┆ 400   │
│ A       ┆ South  ┆ 150   │
│ B       ┆ North  ┆ 200   │
│ B       ┆ South  ┆ 250   │
└─────────┴────────┴───────┘

This is useful for nested categories, such as products within regions. You get one row for each unique combination of the key columns.

Common Aggregation Functions

Polars has many built-in aggregation functions. Here are the most common ones.

sum() – total of values
mean() – average
median() – median value
min() and max() – smallest and largest
std() – standard deviation
var() – variance
count() – number of non-null values (use len() to count all rows, nulls included)
first() and last() – first and last value in group

You can combine them in one .agg() call.

Working with Null Values

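Null values can affect aggregation results. Most Polars aggregation functions ignore nulls. This matters most for mean() and count(), because nulls are left out of the calculation.

For example, pl.col("sales").sum() skips nulls. sum() has no parameter to change this. If you want nulls to count as a value, replace them first with pl.col("sales").fill_null(0). To see how many nulls each group has, use pl.col("sales").null_count().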

Learn more about handling missing data in our guide on Handling Null & Missing Values in Polars.

GroupBy with Conditional Logic

You can filter groups after aggregation. Use the .filter() method on the grouped result.

# Group and then filter groups
grouped = df.group_by("product").agg(pl.col("sales").sum())

# Keep only groups with total sales > 500
filtered = grouped.filter(pl.col("sales") > 500)

print(filtered)

shape: (1, 2)
┌─────────┬───────┐
│ product ┆ sales │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ A       ┆ 550   │
└─────────┴───────┘

This is helpful for removing small groups.

Using Expressions for Advanced Aggregations

Polars expressions are very powerful. You can create custom aggregations. For example, calculate the percentage of total sales per product.

# Calculate percentage of total
total_sales = df["sales"].sum()

grouped = df.group_by("product").agg([
    pl.col("sales").sum().alias("total"),
    (pl.col("sales").sum() / total_sales * 100).alias("percentage")
])

print(grouped)

shape: (2, 3)
┌─────────┬───────┬────────────┐
│ product ┆ total ┆ percentage │
│ ---     ┆ ---   ┆ ---        │
│ str     ┆ i64   ┆ f64        │
╞═════════╪═══════╪════════════╡
│ A       ┆ 550   ┆ 55.0       │
│ B       ┆ 450   ┆ 45.0       │
└─────────┴───────┴────────────┘

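For row-level calculations within each group, use a window expression with .over(). Note that pl.element() is not meant for this. It refers to the items of a list column inside list.eval().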

Sorting After GroupBy

You can sort the grouped result. Use .sort() on the final DataFrame.

# Group, aggregate, and sort
grouped = df.group_by("product").agg(pl.col("sales").sum())
sorted_grouped = grouped.sort("sales", descending=True)

print(sorted_grouped)

shape: (2, 2)
┌─────────┬───────┐
│ product ┆ sales │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ A       ┆ 550   │
│ B       ┆ 450   │
└─────────┴───────┘

For more on sorting, check our Polars DataFrame Sorting & Slicing Guide.

GroupBy with Date Columns

Date grouping is common. You can group by year, month, or day. Use the dt namespace.

from datetime import date

# Create date data (eager=True returns a Series instead of an expression)
df_date = pl.DataFrame({
    "date": pl.date_range(date(2023, 1, 1), date(2023, 1, 10), interval="1d", eager=True),
    "value": list(range(1, 11))
})

# Group by month and name the key column
grouped = df_date.group_by(pl.col("date").dt.month().alias("month")).agg(pl.col("value").sum())

print(grouped)

shape: (1, 2)
┌───────┬───────┐
│ month ┆ value │
│ ---   ┆ ---   │
│ i8    ┆ i64   │
╞═══════╪═══════╡
│ 1     ┆ 55    │
└───────┴───────┘

Learn more about date operations in Date & Time in Polars with dt Namespace.

GroupBy with String Columns

String columns can be grouped too. You can also apply string functions inside groups.

# Group by string length
df_str = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David"],
    "score": [85, 92, 78, 95]
})

# str.len_chars() counts characters; sort so the output order is stable
grouped = (
    df_str.group_by(pl.col("name").str.len_chars().alias("name_len"))
    .agg(pl.col("score").mean())
    .sort("name_len")
)

print(grouped)

shape: (3, 2)
┌──────────┬───────┐
│ name_len ┆ score │
│ ---      ┆ ---   │
│ u32      ┆ f64   │
╞══════════╪═══════╡
│ 3        ┆ 92.0  │
│ 5        ┆ 90.0  │
│ 7        ┆ 78.0  │
└──────────┴───────┘

For string operations, see our guide on Polars String Operations with pl.Expr.str.

Performance Tips

Polars is already fast. But you can make it even faster. Use lazy evaluation with pl.LazyFrame. Then call .collect() at the end.

# Lazy GroupBy
lazy_df = df.lazy()
result = (lazy_df
    .group_by("product")
    .agg(pl.col("sales").sum())
    .collect()
)

print(result)

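Lazy mode lets Polars optimize the whole query plan before running it. For example, it can apply filters early and read only the columns it needs. The benefit is largest on big datasets and multi-step queries.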

Conclusion

GroupBy and aggregations are essential in Polars. You learned how to group by one or many columns. You saw how to apply multiple aggregations. You also learned about filtering, sorting, and working with dates or strings.

Polars makes grouping fast and expressive. Use .agg() with expressions for full control. Practice with your own data to master these patterns.

For more on Polars expressions, read our Polars Expression System Core Concept guide. It will deepen your understanding.