Last modified: Nov 08, 2024 By Alexander Williams

Python re.sub: Replace and Modify Text with Regular Expressions

The re.sub function in Python is a powerful tool for performing text substitutions using regular expressions. It allows you to search for patterns and replace them with new text in a single operation.

Basic Syntax and Usage

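Before using re.sub, you'll need to import the re module. The call takes the form re.sub(pattern, repl, string, count=0, flags=0) and returns a new string; the original string is left unchanged. For example: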


import re

text = "Hello 123 World 456"
result = re.sub(r'\d+', 'NUM', text)
print(result)


Hello NUM World NUM

Working with Patterns and Groups

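Like re.search and re.findall, re.sub supports capture groups. Each parenthesized group in the pattern can be referenced in the replacement string as \1, \2, and so on, which makes it easy to reorder the matched parts.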


text = "First Last"
result = re.sub(r'(\w+) (\w+)', r'\2, \1', text)
print(result)


Last, First
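Numbered references can become hard to read as patterns grow. As a minimal sketch, the same swap can be written with named groups and the \g<name> syntax; the group names first and last are chosen only for this illustration.

```python
import re

# Named groups make the replacement template easier to read than \1 and \2.
pattern = r'(?P<first>\w+) (?P<last>\w+)'
result = re.sub(pattern, r'\g<last>, \g<first>', "First Last")
print(result)  # Last, First

# \g<1> avoids ambiguity when a digit follows the reference:
# r'\10' would mean group 10, while r'\g<1>0' means group 1 followed by "0".
padded = re.sub(r'(\d)', r'\g<1>0', "a1b2")
print(padded)  # a10b20
```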

Using Functions for Replacement

Instead of a replacement string, re.sub also accepts a function. The function is called once for each match, receives the match object, and must return the string to substitute.


def double_number(match):
    num = int(match.group())
    return str(num * 2)

text = "Numbers: 10, 20, 30"
result = re.sub(r'\d+', double_number, text)
print(result)


Numbers: 20, 40, 60
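A replacement function can also read individual groups from the match object. The sketch below expands unit abbreviations through a lookup table; the UNITS dictionary and the expand_unit name are made up for this example.

```python
import re

# Hypothetical lookup table, invented for this example.
UNITS = {"km": "kilometers", "kg": "kilograms"}

def expand_unit(match):
    # group(1) is the number, group(2) is the unit abbreviation.
    number, unit = match.group(1), match.group(2)
    return f"{number} {UNITS[unit]}"

text = "Ran 5km carrying 2kg"
expanded = re.sub(r'(\d+)(km|kg)', expand_unit, text)
print(expanded)  # Ran 5 kilometers carrying 2 kilograms
```

Because the pattern only matches the abbreviations listed in the alternation, the dictionary lookup cannot fail here; a larger table would need the pattern and the dictionary kept in sync.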

Count Parameter

You can limit the number of replacements using the count parameter. This is useful when you only want to replace a specific number of occurrences.


text = "replace replace replace replace"
result = re.sub('replace', 'new', text, count=2)
print(result)


new new replace replace
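As a short illustration of the boundary cases, the sketch below compares count=1 with the default count=0 (no limit), and uses the related re.subn function to report how many replacements were actually made. The sample string is arbitrary.

```python
import re

text = "a-b-c-d"

first_only = re.sub('-', ':', text, count=1)
all_of_them = re.sub('-', ':', text, count=0)  # 0 is the default: no limit
print(first_only)   # a:b-c-d
print(all_of_them)  # a:b:c:d

# re.subn returns the new string together with the number of replacements made.
new_text, n = re.subn('-', ':', text, count=2)
print(new_text, n)  # a:b:c-d 2
```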

Case-Insensitive Substitution

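Use the re.IGNORECASE flag for case-insensitive replacements, which helps when the same word appears in mixed case. Pass it with the flags= keyword: the fourth positional argument of re.sub is count, so a flag passed positionally would be interpreted as a replacement limit instead.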


text = "Python PYTHON python"
result = re.sub('python', 'Java', text, flags=re.IGNORECASE)
print(result)


Java Java Java
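Flags can be combined with the | operator, or written inline at the start of the pattern. The following sketch shows both forms; the multi-line sample text is invented for the illustration.

```python
import re

text = "python rocks\nPython rules\nPYTHON wins"

# Combine flags with |: ignore case, and let ^ match at the start of every line.
result = re.sub(r'^python', 'Java', text, flags=re.IGNORECASE | re.MULTILINE)
print(result)

# The inline flag (?i) at the start of the pattern is equivalent to re.IGNORECASE.
inline = re.sub(r'(?i)python', 'Java', "Python PYTHON python")
print(inline)  # Java Java Java
```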

Common Applications

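The re.sub function is commonly used alongside re.split and re.finditer in text processing tasks: re.sub normalizes the text, re.split breaks it into fields, and re.finditer locates individual matches.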
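One way these functions might fit together is a small cleanup pipeline. The sketch below is only an illustration; the raw string and the choice of separators are assumptions, not a fixed recipe.

```python
import re

raw = "  apple,   banana ;cherry  "

# re.sub: collapse runs of whitespace to one space, then trim the ends.
cleaned = re.sub(r'\s+', ' ', raw).strip()
print(cleaned)  # apple, banana ;cherry

# re.split: break the cleaned text on commas or semicolons plus surrounding spaces.
items = re.split(r'\s*[,;]\s*', cleaned)
print(items)  # ['apple', 'banana', 'cherry']

# re.finditer: record each word together with its start offset in the cleaned text.
positions = [(m.group(), m.start()) for m in re.finditer(r'\w+', cleaned)]
print(positions)  # [('apple', 0), ('banana', 7), ('cherry', 15)]
```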

Error Handling

Validate your patterns before using them in production, especially when they are built from user input. An invalid pattern raises re.error when it is compiled, so catch that exception wherever a malformed pattern is possible.


text = "replace me"
try:
    # '[invalid)' opens a character set that is never closed
    result = re.sub('[invalid)', 'new', text)
except re.error as e:
    print(f"Invalid pattern: {e}")
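The same idea can be wrapped in a small helper. The sketch below assumes that falling back to the original text is acceptable when a substitution cannot be performed; safe_sub is a hypothetical name, not part of the re module. It also shows that re.error is raised for an invalid replacement template, not only for an invalid pattern.

```python
import re

def safe_sub(pattern, repl, text):
    """Hypothetical helper: substitute, or return text unchanged on re.error."""
    try:
        compiled = re.compile(pattern)   # raises re.error on a malformed pattern
        return compiled.sub(repl, text)  # raises re.error on a bad group reference
    except re.error as e:
        print(f"Substitution failed: {e}")
        return text

print(safe_sub(r'\d+', 'NUM', "Order 42"))       # Order NUM
print(safe_sub('[invalid)', 'new', "Order 42"))  # falls back to: Order 42
print(safe_sub(r'(\d+)', r'\2', "Order 42"))     # falls back to: Order 42
```

Whether to fall back silently, log, or re-raise depends on the application; returning the input unchanged is just one option.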

Conclusion

re.sub is an essential tool for text manipulation in Python. Whether you're cleaning data, formatting strings, or performing complex text transformations, it provides the flexibility needed for various tasks.