Last modified: Nov 08, 2024 By Alexander Williams
Python re.finditer: Iterate Over Pattern Matches in Text
Python's re.finditer finds all non-overlapping matches of a pattern in a string. It returns an iterator of match objects, whereas re.findall returns a list of strings (or tuples, when the pattern contains groups).
Understanding re.finditer Basics
Unlike re.search, which stops at the first match, re.finditer finds every match and yields them one at a time. Because it never builds a full list of results, it is memory-efficient when a large text contains many matches.
Basic Syntax
Here's the basic syntax of re.finditer:
import re
text = "Python is awesome. Python is powerful."
pattern = r"Python"
matches = re.finditer(pattern, text)
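The value returned above is a lazy, single-use iterator rather than a list. The following is a minimal sketch of what that means in practice, reusing the same sample text; the variable names first and remaining are illustrative only.

```python
import re

text = "Python is awesome. Python is powerful."
matches = re.finditer(r"Python", text)

# The iterator yields match objects on demand; next() pulls the first one.
first = next(matches)
print(first.group(), first.span())  # Python (0, 6)

# Only the matches not yet consumed are left; here that is the second one.
remaining = [m.span() for m in matches]
print(remaining)  # [(19, 25)]

# The iterator is now exhausted, so a second pass yields nothing.
print(list(matches))  # []
```

If you need to walk over the matches more than once, call re.finditer again or store the results in a list.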
Working with Match Objects
Each match object provides useful information about the match including position and groups:
import re
text = "Python is awesome. Python is powerful."
pattern = r"Python"
for match in re.finditer(pattern, text):
    print(f"Match found at position {match.start()}-{match.end()}: {match.group()}")
Match found at position 0-6: Python
Match found at position 19-25: Python
Using Groups with re.finditer
You can use capturing groups to extract specific parts of matches:
import re
text = "Email: john@example.com, Contact: alice@email.com"
pattern = r"(\w+)@(\w+)\.com"
for match in re.finditer(pattern, text):
    print(f"Full match: {match.group()}")
    print(f"Username: {match.group(1)}")
    print(f"Domain: {match.group(2)}\n")
Full match: john@example.com
Username: john
Domain: example

Full match: alice@email.com
Username: alice
Domain: email
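Numbered groups can become hard to read as patterns grow. As a hedged variation on the example above, the sketch below uses named groups and match.groupdict(); the group names user and domain are chosen here for illustration and are not part of the original pattern.

```python
import re

text = "Email: john@example.com, Contact: alice@email.com"
# Same pattern as before, but each group is given a name (illustrative names).
pattern = r"(?P<user>\w+)@(?P<domain>\w+)\.com"

# groupdict() returns the named groups of each match as a dictionary.
contacts = [match.groupdict() for match in re.finditer(pattern, text)]

for contact in contacts:
    print(f"{contact['user']} at {contact['domain']}")
```

Named groups remain accessible by number as well, so match.group(1) and match.group("user") return the same text.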
Memory Efficiency
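Memory efficiency is a key advantage of re.finditer over re.findall, especially when working with large texts. re.findall collects every result into a list before returning, while re.finditer produces one match object at a time.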
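A minimal sketch of this difference is shown below. It assumes a made-up log-style text that stands in for a large input; the pattern and the names count and first_three are hypothetical. Note that re.finditer still needs the whole input string in memory; the saving is in not materializing all the results.

```python
import re
from itertools import islice

# Hypothetical sample text standing in for a large input.
text = "error: disk full\n" * 1000

pattern = re.compile(r"error: (\w+)")

# Count matches one at a time; no list of results is ever built.
count = sum(1 for _ in pattern.finditer(text))
print(count)  # 1000

# Take only the first three matches; scanning stops once they are found.
first_three = [m.group(1) for m in islice(pattern.finditer(text), 3)]
print(first_three)  # ['disk', 'disk', 'disk']
```

With re.findall, both operations would first build a list of all 1000 results, even when only a few are needed.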
Position Information
Unlike re.findall, which returns only the matched text, re.finditer provides detailed position information through match objects:
import re
text = "Price: $10.99, Cost: $20.50"
pattern = r"\$\d+\.\d+"
for match in re.finditer(pattern, text):
    span = match.span()
    print(f"Found {match.group()} at positions {span}")
Found $10.99 at positions (7, 13)
Found $20.50 at positions (21, 27)
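One possible use of these positions is to rebuild the text with each match marked in place. The sketch below is only an illustration under that assumption; the bracket markers and the names pieces, last_end, and highlighted are not from the original example.

```python
import re

text = "Price: $10.99, Cost: $20.50"
pattern = r"\$\d+\.\d+"

pieces = []
last_end = 0
for match in re.finditer(pattern, text):
    start, end = match.span()
    # Copy the unmatched text before this match, then the match in brackets.
    pieces.append(text[last_end:start])
    pieces.append(f"[{text[start:end]}]")
    last_end = end
# Copy whatever follows the final match.
pieces.append(text[last_end:])

highlighted = "".join(pieces)
print(highlighted)  # Price: [$10.99], Cost: [$20.50]
```

For plain substitutions re.sub is usually simpler; working from spans is mainly useful when the surrounding text also needs to be inspected.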
Conclusion
re.finditer is ideal for iterative pattern matching in Python, offering memory efficiency and detailed match information through its iterator-based approach.
Use re.finditer when you need to process matches one at a time or when working with large texts where memory usage is a concern.