Last modified: May 10, 2026 By Alexander Williams
Polars String Operations with pl.Expr.str
Working with text data is a common task in data analysis. Polars provides a powerful and efficient way to handle string columns using the pl.Expr.str namespace. This article will guide you through the essential string operations in Polars. You will learn how to extract, replace, and transform text with clean and readable code.
Polars is built for speed, and its string operations are optimized for large datasets. The str namespace offers many methods that resemble Python's built-in string methods but operate on entire columns at once, which keeps your code both fast and concise.
Getting Started with String Expressions
To use string operations, you first need a column with text data. The pl.Expr.str namespace is accessed via the str attribute on a column expression. Let's create a simple DataFrame to demonstrate.
import polars as pl
# Create a DataFrame with text data
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David"],
    "email": ["alice@example.com", "bob@test.org", "charlie@data.io", "david@web.net"]
})
print(df)
shape: (4, 2)
┌─────────┬───────────────────┐
│ name    ┆ email             │
│ ---     ┆ ---               │
│ str     ┆ str               │
╞═════════╪═══════════════════╡
│ Alice   ┆ alice@example.com │
│ Bob     ┆ bob@test.org      │
│ Charlie ┆ charlie@data.io   │
│ David   ┆ david@web.net     │
└─────────┴───────────────────┘
Now you can apply string methods. For example, to convert names to uppercase, use str.to_uppercase(). This method is part of the str namespace.
# Convert names to uppercase
result = df.with_columns(
    pl.col("name").str.to_uppercase().alias("name_upper")
)
print(result)
shape: (4, 3)
┌─────────┬───────────────────┬────────────┐
│ name    ┆ email             ┆ name_upper │
│ ---     ┆ ---               ┆ ---        │
│ str     ┆ str               ┆ str        │
╞═════════╪═══════════════════╪════════════╡
│ Alice   ┆ alice@example.com ┆ ALICE      │
│ Bob     ┆ bob@test.org      ┆ BOB        │
│ Charlie ┆ charlie@data.io   ┆ CHARLIE    │
│ David   ┆ david@web.net     ┆ DAVID      │
└─────────┴───────────────────┴────────────┘
Common String Transformations
Polars offers many common string transformations, including case changes, whitespace stripping, and padding. Use str.to_lowercase() for lowercase and str.to_titlecase() for title case. These methods are useful for standardizing messy text columns before analysis.
Another useful method is str.strip_chars(). It removes the specified characters from the start and end of each string; called with no argument, it strips whitespace. You can also use str.pad_start() or str.pad_end() to pad strings to a fixed length. These operations are chainable, allowing you to build complex transformations in a single expression.
Extracting Substrings
Extracting parts of strings is a frequent need. The str.slice() method lets you extract substrings by position. You specify a zero-based start offset and an optional length; a negative offset counts from the end of the string. For example, you might need the first three characters of a name.
# Extract first 3 characters of each name
result = df.with_columns(
    pl.col("name").str.slice(0, 3).alias("name_short")
)
print(result)
shape: (4, 3)
┌─────────┬───────────────────┬────────────┐
│ name    ┆ email             ┆ name_short │
│ ---     ┆ ---               ┆ ---        │
│ str     ┆ str               ┆ str        │
╞═════════╪═══════════════════╪════════════╡
│ Alice   ┆ alice@example.com ┆ Ali        │
│ Bob     ┆ bob@test.org      ┆ Bob        │
│ Charlie ┆ charlie@data.io   ┆ Cha        │
│ David   ┆ david@web.net     ┆ Dav        │
└─────────┴───────────────────┴────────────┘
For pattern-based extraction, use str.extract(). This method takes a regular expression and returns the requested capture group from the first match, or null if nothing matches. This is powerful for parsing text such as email addresses, where the domain follows the @ sign.
# Extract domain from email using regex
result = df.with_columns(
    pl.col("email").str.extract(r"@(.+)", 1).alias("domain")
)
print(result)
shape: (4, 3)
┌─────────┬───────────────────┬─────────────┐
│ name    ┆ email             ┆ domain      │
│ ---     ┆ ---               ┆ ---         │
│ str     ┆ str               ┆ str         │
╞═════════╪═══════════════════╪═════════════╡
│ Alice   ┆ alice@example.com ┆ example.com │
│ Bob     ┆ bob@test.org      ┆ test.org    │
│ Charlie ┆ charlie@data.io   ┆ data.io     │
│ David   ┆ david@web.net     ┆ web.net     │
└─────────┴───────────────────┴─────────────┘
Replacing and Splitting Strings
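Replacing parts of strings is simple with str.replace() and str.replace_all(). The first replaces only the first occurrence in each string; the second replaces every occurrence. Both treat the pattern as a regular expression by default, so special characters such as the dot must be escaped (or you can pass literal=True). For example, you can replace the dots in each email's domain.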
# Replace dots with dashes in domain
result = df.with_columns(
    pl.col("email").str.replace_all(r"\.", "-").alias("email_clean")
)
print(result)
shape: (4, 3)
┌─────────┬───────────────────┬───────────────────┐
│ name    ┆ email             ┆ email_clean       │
│ ---     ┆ ---               ┆ ---               │
│ str     ┆ str               ┆ str               │
╞═════════╪═══════════════════╪═══════════════════╡
│ Alice   ┆ alice@example.com ┆ alice@example-com │
│ Bob     ┆ bob@test.org      ┆ bob@test-org      │
│ Charlie ┆ charlie@data.io   ┆ charlie@data-io   │
│ David   ┆ david@web.net     ┆ david@web-net     │
└─────────┴───────────────────┴───────────────────┘
Splitting strings into lists is done with str.split(). This returns a list column. You can then explode it or extract specific elements. This is useful for parsing CSV-like data in a single column.
Checking String Conditions
Polars also provides methods to check string properties. Use str.contains() to see if a pattern exists. Use str.starts_with() and str.ends_with() for prefix and suffix checks. These return boolean columns, which are great for filtering rows.
# Filter rows where email ends with .org
result = df.filter(
    pl.col("email").str.ends_with(".org")
)
print(result)
shape: (1, 2)
┌──────┬──────────────┐
│ name ┆ email        │
│ ---  ┆ ---          │
│ str  ┆ str          │
╞══════╪══════════════╡
│ Bob  ┆ bob@test.org │
└──────┴──────────────┘
These condition methods are often combined with other expressions. For example, you might want to filter rows based on a string pattern and then transform the result. This is a common pattern in data cleaning pipelines.
Handling Null Values in String Operations
Real-world data often has missing values. Polars handles nulls gracefully in string operations: most str methods return null for null inputs. To replace nulls with a default string, call fill_null() directly on the expression; it is a general expression method, not part of the str namespace. For more details, see our guide on Handling Null & Missing Values in Polars.
# Create DataFrame with nulls
df_null = pl.DataFrame({
    "text": ["hello", None, "world", None]
})
# Fill nulls with a default string
result = df_null.with_columns(
    pl.col("text").fill_null("missing")
)
print(result)
shape: (4, 1)
┌─────────┐
│ text    │
│ ---     │
│ str     │
╞═════════╡
│ hello   │
│ missing │
│ world   │
│ missing │
└─────────┘
You can also chain null handling with other string methods. This ensures your transformations are robust. For more on schema management, see Polars Data Type Casting & Schema Management.
Performance Tips
Polars string operations are fast because they are implemented in Rust. However, you can optimize further. Avoid using Python lambdas inside map_elements() when possible. Instead, use the built-in str methods. They are vectorized and much faster.
Another tip is to use regular expressions carefully. Complex patterns can slow down processing. For simple operations like prefix checks, use str.starts_with() instead of regex. This improves performance on large datasets.
Conclusion
String operations in Polars with the pl.Expr.str namespace are both powerful and easy to use. You can transform, extract, replace, and check text data efficiently. The methods are chainable and integrate well with the rest of the Polars expression system. This makes your code clean and fast.
We covered key methods like str.to_uppercase(), str.extract(), str.replace_all(), and str.contains(). You also learned how to handle null values and optimize performance. For more on Polars expressions, check out Polars Expression System Core Concept.
Start using these string operations in your next project. They will save you time and make your data pipelines more efficient. Happy coding with Polars!