Last modified: May 11, 2026 By Alexander Williams
Nested Data in Polars: Lists & Structs
Real-world data is rarely flat. You often encounter JSON files, logs, or API responses with nested structures. Polars handles this elegantly with two core nested types: List and Struct. This guide shows you how to work with them efficiently.
Understanding nested data is key to mastering Polars. It unlocks the ability to process complex datasets without flattening everything first. This saves memory and keeps your code clean.
We will explore creating, querying, and transforming nested columns. By the end, you will handle lists and structs with confidence.
What Are Lists and Structs in Polars?
A List column contains a sequence of values. Each element in the column is a list itself. All elements in a single list must share the same data type.
A Struct column holds multiple named fields. Think of it as a dictionary or a small table inside a single cell. Each field has its own name and data type.
These types are first-class citizens in Polars. You can use expressions on them just like on simple columns.
Creating Nested Columns
You can create nested columns from scratch or by aggregating existing data. Let's see both methods.
Creating a List Column
Pass nested Python lists to pl.DataFrame or pl.Series, and Polars infers the List type. To build a list column from existing columns, use pl.concat_list(). Here is a simple example.
import polars as pl
# Create a DataFrame with a List column
df = pl.DataFrame({
    "id": [1, 2, 3],
    "scores": [
        [85, 90, 78],
        [92, 88],
        [70, 75, 80, 95]
    ]
})
print(df)
shape: (3, 2)
┌─────┬──────────────────┐
│ id ┆ scores │
│ --- ┆ --- │
│ i64 ┆ list[i64] │
╞═════╪══════════════════╡
│ 1 ┆ [85, 90, 78] │
│ 2 ┆ [92, 88] │
│ 3 ┆ [70, 75, 80, 95] │
└─────┴──────────────────┘
Notice the type list[i64]. This tells Polars the column contains lists of 64-bit integers.
Creating a Struct Column
Use pl.struct() to combine multiple columns into one struct column.
# Create a Struct column from existing columns
df_struct = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "Chicago"]
}).select([
    pl.struct(["name", "age", "city"]).alias("person")
])
print(df_struct)
shape: (3, 1)
┌─────────────────────────────┐
│ person │
│ --- │
│ struct[3] │
╞═════════════════════════════╡
│ {"Alice",25,"NYC"} │
│ {"Bob",30,"LA"} │
│ {"Charlie",35,"Chicago"} │
└─────────────────────────────┘
The output shows a struct with three fields: name, age, and city.
Accessing Elements in Lists
You can access elements by index using the list namespace. The list.get() method retrieves an element at a specific position.
Remember that indexing starts at zero. Negative indices work from the end.
# Get the first score for each student
df = df.with_columns(
    pl.col("scores").list.get(0).alias("first_score")
)
print(df)
shape: (3, 3)
┌─────┬──────────────────┬─────────────┐
│ id ┆ scores ┆ first_score │
│ --- ┆ --- ┆ --- │
│ i64 ┆ list[i64] ┆ i64 │
╞═════╪══════════════════╪═════════════╡
│ 1 ┆ [85, 90, 78] ┆ 85 │
│ 2 ┆ [92, 88] ┆ 92 │
│ 3 ┆ [70, 75, 80, 95] ┆ 70 │
└─────┴──────────────────┴─────────────┘
You can also explode a list column. This creates one row per list element and repeats the values of the other columns. It is useful when you need to filter, join, or aggregate on individual elements.
# Explode the scores column
df_exploded = df.explode("scores")
print(df_exploded)
shape: (9, 3)
┌─────┬───────────┬─────────────┐
│ id ┆ scores ┆ first_score │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═══════════╪═════════════╡
│ 1 ┆ 85 ┆ 85 │
│ 1 ┆ 90 ┆ 85 │
│ 1 ┆ 78 ┆ 85 │
│ 2 ┆ 92 ┆ 92 │
│ 2 ┆ 88 ┆ 92 │
│ 3 ┆ 70 ┆ 70 │
│ 3 ┆ 75 ┆ 70 │
│ 3 ┆ 80 ┆ 70 │
│ 3 ┆ 95 ┆ 70 │
└─────┴───────────┴─────────────┘
Working with Struct Fields
Access struct fields with the struct namespace. Use struct.field() to extract a single field.
# Extract the name field from the struct
df_struct = df_struct.with_columns(
    pl.col("person").struct.field("name").alias("extracted_name")
)
print(df_struct)
shape: (3, 2)
┌─────────────────────────────┬────────────────┐
│ person ┆ extracted_name │
│ --- ┆ --- │
│ struct[3] ┆ str │
╞═════════════════════════════╪════════════════╡
│ {"Alice",25,"NYC"} ┆ Alice │
│ {"Bob",30,"LA"} ┆ Bob │
│ {"Charlie",35,"Chicago"} ┆ Charlie │
└─────────────────────────────┴────────────────┘
You can also rename fields or add new ones using struct.rename_fields() and struct.with_fields().
Combining Lists and Structs
You can nest structs inside lists and vice versa. This mirrors real-world JSON data perfectly.
# DataFrame with a list of structs
df_nested = pl.DataFrame({
    "id": [1, 2],
    "orders": [
        [{"item": "book", "qty": 2}, {"item": "pen", "qty": 5}],
        [{"item": "notebook", "qty": 1}]
    ]
})
print(df_nested)
shape: (2, 2)
┌─────┬─────────────────────────────────────┐
│ id ┆ orders │
│ --- ┆ --- │
│ i64 ┆ list[struct[2]] │
╞═════╪═════════════════════════════════════╡
│ 1 ┆ [{"book",2}, {"pen",5}] │
│ 2 ┆ [{"notebook",1}] │
└─────┴─────────────────────────────────────┘
To access nested data, chain expressions. First explode the list, then extract the struct field.
# Explode orders and extract item names
df_items = df_nested.explode("orders").with_columns(
    pl.col("orders").struct.field("item").alias("item_name")
)
print(df_items)
shape: (3, 3)
┌─────┬──────────────────────┬───────────┐
│ id ┆ orders ┆ item_name │
│ --- ┆ --- ┆ --- │
│ i64 ┆ struct[2] ┆ str │
╞═════╪══════════════════════╪═══════════╡
│ 1 ┆ {"book",2} ┆ book │
│ 1 ┆ {"pen",5} ┆ pen │
│ 2 ┆ {"notebook",1} ┆ notebook │
└─────┴──────────────────────┴───────────┘
Aggregating into Lists and Structs
You can create nested columns from flat data. Inside group_by().agg(), a column expression with no reducing function collects each group's values into a list.
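# Group by id and collect scores into a list
df_flat = pl.DataFrame({
    "id": [1, 1, 2, 2],
    "score": [85, 90, 92, 88]
})
# maintain_order=True keeps groups in first-seen order;
# group_by does not guarantee row order otherwise
df_grouped = df_flat.group_by("id", maintain_order=True).agg(
    pl.col("score").alias("all_scores")
)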
print(df_grouped)
shape: (2, 2)
┌─────┬────────────┐
│ id ┆ all_scores │
│ --- ┆ --- │
│ i64 ┆ list[i64] │
╞═════╪════════════╡
│ 1 ┆ [85, 90] │
│ 2 ┆ [92, 88] │
└─────┴────────────┘
To create a struct from grouped data, use pl.struct() inside the aggregation.
# Create struct from aggregated columns
# (maintain_order=True keeps groups in first-seen order)
df_struct_grouped = df_flat.group_by("id", maintain_order=True).agg(
    pl.struct([
        pl.col("score").mean().alias("avg_score"),
        pl.col("score").count().alias("count")
    ]).alias("stats")
)
print(df_struct_grouped)
shape: (2, 2)
┌─────┬─────────────────────┐
│ id ┆ stats │
│ --- ┆ --- │
│ i64 ┆ struct[2] │
╞═════╪═════════════════════╡
│ 1 ┆ {87.5,2} │
│ 2 ┆ {90.0,2} │
└─────┴─────────────────────┘
Querying Nested Data with Conditions
You can filter rows based on nested values. Use list.eval() for complex conditions inside lists.
# Find rows where any score is above 90
# (select only id and scores, since df now also has first_score)
df_filtered = df.select("id", "scores").filter(
    pl.col("scores").list.eval(
        pl.element() > 90
    ).list.any()
)
print(df_filtered)
shape: (2, 2)
┌─────┬──────────────────┐
│ id ┆ scores │
│ --- ┆ --- │
│ i64 ┆ list[i64] │
╞═════╪══════════════════╡
│ 2 ┆ [92, 88] │
│ 3 ┆ [70, 75, 80, 95] │
└─────┴──────────────────┘
For structs, use struct.field() inside a filter condition.
# Filter struct rows where age is greater than 28
df_filtered_struct = df_struct.filter(
    pl.col("person").struct.field("age") > 28
)
print(df_filtered_struct)
shape: (2, 2)
┌─────────────────────────────┬────────────────┐
│ person ┆ extracted_name │
│ --- ┆ --- │
│ struct[3] ┆ str │
╞═════════════════════════════╪════════════════╡
│ {"Bob",30,"LA"} ┆ Bob │
│ {"Charlie",35,"Chicago"} ┆ Charlie │
└─────────────────────────────┴────────────────┘
Performance Tips for Nested Data
Polars is optimized for nested data. However, there are best practices to follow.
Avoid exploding large lists unnecessarily. Exploding creates many rows. Use list.eval() to work inside lists without exploding.
Use structs to group related fields. This keeps your schema organized, and each field is still stored as its own column internally, so extracting a single field stays cheap.
For large datasets, consider using the lazy API. It optimizes the query plan. You can learn more about optimization in our guide on Polars LazyFrame Query Optimization.
When dealing with JSON files, Polars can infer nested schemas automatically. This saves time and reduces errors.
If you need to reshape nested data, techniques like pivoting can help. Check out our article on Reshape Data in Polars: Pivot, Melt & Transpose for more details.
Real-World Use Case: Parsing JSON Logs
Imagine you have a JSON log file with nested events. Polars can read it directly and maintain the structure.
# Simulate reading a JSON file with nested data
import io
json_data = """
[
{"user": "Alice", "events": [{"type": "click", "time": 1}, {"type": "scroll", "time": 3}]},
{"user": "Bob", "events": [{"type": "click", "time": 2}]}
]
"""
df_logs = pl.read_json(io.StringIO(json_data))
print(df_logs)
shape: (2, 2)
┌───────┬─────────────────────────────────────┐
│ user ┆ events │
│ --- ┆ --- │
│ str ┆ list[struct[2]] │
╞═══════╪═════════════════════════════════════╡
│ Alice ┆ [{"click",1}, {"scroll",3}] │
│ Bob ┆ [{"click",2}] │
└───────┴─────────────────────────────────────┘
You can then analyze the events without flattening the entire dataset. For example, use expressions to count the click events per user.
# Count click events per user
df_clicks = df_logs.with_columns(
    pl.col("events").list.eval(
        pl.element().struct.field("type") == "click"
    ).list.sum().alias("click_count")
)
print(df_clicks)
shape: (2, 3)
┌───────┬─────────────────────────────────────┬─────────────┐
│ user ┆ events ┆ click_count │
│ --- ┆ --- ┆ --- │
│ str ┆ list[struct[2]] ┆ u32 │
╞═══════╪═════════════════════════════════════╪═════════════╡
│ Alice ┆ [{"click",1}, {"scroll",3}] ┆ 1 │
│ Bob ┆ [{"click",2}] ┆ 1 │
└───────┴─────────────────────────────────────┴─────────────┘
Conclusion
Working with nested data in Polars is powerful and intuitive. The List and Struct types let you model complex data without losing structure.
You learned how to create, access, and transform nested columns. You also saw how to aggregate flat data into nested forms and query them efficiently.
Nested data is essential for modern data processing. Polars gives you the tools to handle it with speed and clarity. For more advanced patterns, explore chaining expressions in our Polars Chaining Expressions Guide.
Start using nested data in your next project. Your code will be cleaner and your analysis more insightful.