Last modified: Nov 08, 2024 By Alexander Williams
Python re.sub: Replace and Modify Text with Regular Expressions
The re.sub function in Python is a powerful tool for performing text substitutions using regular expressions. It allows you to search for patterns and replace them with new text in a single operation.
Basic Syntax and Usage
Before using re.sub, you'll need to import the re module. The function takes a pattern, a replacement, and the input string, and returns a new string with every match replaced:
import re
text = "Hello 123 World 456"
result = re.sub(r'\d+', 'NUM', text)
print(result)
Hello NUM World NUM
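As a further illustration of the same call shape, here is a minimal sketch of two common variations: collapsing whitespace and deleting matches with an empty replacement. The sample strings and variable names are made up for this example.

```python
import re

# Argument order: re.sub(pattern, repl, string, count=0, flags=0)
messy = "too    many   spaces"

# Replace every run of whitespace with a single space.
collapsed = re.sub(r"\s+", " ", messy)
print(collapsed)  # too many spaces

# An empty replacement string deletes each match.
digits_removed = re.sub(r"\d", "", "a1b22c333")
print(digits_removed)  # abc
```

Note that re.sub returns a new string; the original string is left unchanged.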
Working with Patterns and Groups
Like re.search and re.findall, re.sub supports pattern groups that can be referenced in the replacement string.
text = "First Last"
result = re.sub(r'(\w+) (\w+)', r'\2, \1', text)
print(result)
Last, First
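Groups can also be named, which may read more clearly than numbered references in longer patterns. The sketch below assumes an ISO-style date string chosen for illustration; it uses the \g<name> and \g<number> reference forms in the replacement.

```python
import re

date = "2024-11-08"

# Named groups are written (?P<name>...) in the pattern
# and referenced as \g<name> in the replacement.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
reordered = re.sub(pattern, r"\g<day>/\g<month>/\g<year>", date)
print(reordered)  # 08/11/2024

# \g<1> is the unambiguous form of \1. It matters when a literal
# digit follows the reference: r"\10" would mean group 10.
padded = re.sub(r"(\d+)", r"\g<1>0", "7 apples")
print(padded)  # 70 apples
```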
Using Functions for Replacement
One powerful feature of re.sub is its ability to use a function for replacement instead of a simple string. The function receives each match object and returns the replacement text for that match.
def double_number(match):
    # match.group() returns the matched digits as a string
    num = int(match.group())
    return str(num * 2)
text = "Numbers: 10, 20, 30"
result = re.sub(r'\d+', double_number, text)
print(result)
Numbers: 20, 40, 60
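A replacement function is also a convenient way to apply a lookup table. The following is a rough sketch, with a made-up dictionary and a hypothetical helper named swap_word; words not in the table are returned unchanged.

```python
import re

# Hypothetical lookup table for this example.
replacements = {"cat": "dog", "red": "blue"}

def swap_word(match):
    # Look the matched word up; fall back to the word itself.
    word = match.group(0)
    return replacements.get(word, word)

text = "the red cat sat"
result = re.sub(r"\b\w+\b", swap_word, text)
print(result)  # the blue dog sat
```

The function must return a string, so any computed value (such as a number) needs to be converted with str() first.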
Count Parameter
You can limit the number of replacements using the count parameter. This is useful when you only want to replace a specific number of occurrences.
text = "replace replace replace replace"
result = re.sub('replace', 'new', text, count=2)
print(result)
new new replace replace
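If you also need to know how many replacements were actually made, the related function re.subn returns the new string together with a count. A short sketch using the same sample text:

```python
import re

text = "replace replace replace replace"

# re.subn returns a (new_string, number_of_substitutions) tuple.
limited, n_limited = re.subn("replace", "new", text, count=2)
print(limited, n_limited)  # new new replace replace 2

# count=0 (the default) means "replace every match".
everything, n_all = re.subn("replace", "new", text)
print(everything, n_all)  # new new new new 4
```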
Case-Insensitive Substitution
Use the re.IGNORECASE flag for case-insensitive replacements. This is particularly useful when dealing with text in different cases.
text = "Python PYTHON python"
result = re.sub('python', 'Java', text, flags=re.IGNORECASE)
print(result)
Java Java Java
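The same flag can be supplied in other ways. The sketch below shows two alternatives that should behave the same as the call above: a precompiled pattern, and the inline (?i) flag at the start of the pattern.

```python
import re

text = "Python PYTHON python"

# Option 1: compile once with the flag, then call .sub on the pattern.
pattern = re.compile(r"python", re.IGNORECASE)
compiled_result = pattern.sub("Java", text)
print(compiled_result)  # Java Java Java

# Option 2: put the flag inside the pattern itself.
inline_result = re.sub(r"(?i)python", "Java", text)
print(inline_result)  # Java Java Java
```

Compiling is mainly worthwhile when the same pattern is reused many times.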
Common Applications
The re.sub function is commonly used alongside re.split and re.finditer for text processing tasks.
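As one possible illustration of these functions working together, the sketch below cleans a made-up inventory string with re.sub, splits it with re.split, and extracts numbers with re.finditer. The input format is an assumption for this example only.

```python
import re

raw = "  apples:  3;  pears: 12 ;plums:7  "

# Step 1: strip all whitespace with re.sub.
cleaned = re.sub(r"\s+", "", raw)
print(cleaned)  # apples:3;pears:12;plums:7

# Step 2: split the cleaned string into items with re.split.
items = re.split(r";", cleaned)
print(items)  # ['apples:3', 'pears:12', 'plums:7']

# Step 3: pull out every number with re.finditer.
numbers = [int(m.group()) for m in re.finditer(r"\d+", cleaned)]
print(numbers)  # [3, 12, 7]
```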
Error Handling
Always validate your patterns before using them in production. Invalid patterns will raise re.error exceptions that should be handled appropriately.
text = "some old text"
try:
    # '[invalid)' has an unterminated character set, so compilation fails
    result = re.sub('[invalid)', 'new', text)
except re.error as e:
    print(f"Invalid pattern: {e}")
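One way to validate a pattern up front is to compile it before use. The helper below, safe_sub, is a hypothetical name introduced for this sketch, and returning the input unchanged on failure is just one possible policy; logging or re-raising may suit your application better.

```python
import re

def safe_sub(pattern, repl, text):
    """Hypothetical helper: substitute, or return text unchanged
    if the pattern does not compile."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        print(f"Invalid pattern {pattern!r}: {exc}")
        return text
    return compiled.sub(repl, text)

unchanged = safe_sub("[invalid)", "new", "some text")
print(unchanged)  # some text

replaced = safe_sub(r"\d+", "N", "a1b22")
print(replaced)  # aNbN
```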
Conclusion
re.sub is an essential tool for text manipulation in Python. Whether you're cleaning data, formatting strings, or performing complex text transformations, it provides the flexibility needed for various tasks.