Last modified: Nov 08, 2024 By Alexander Williams
Python re.findall: Extract All Pattern Matches from Text
Python's re.findall() is a powerful function from the re module that finds all non-overlapping matches of a pattern in a string and returns them as a list.
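To make "non-overlapping" concrete, here is a minimal sketch. The input string is made up for illustration; the point is that scanning resumes after the end of each match rather than one character later.

```python
import re

# Matches are found left to right and never overlap:
# once "aa" is consumed, scanning resumes right after it.
pairs = re.findall(r'aa', 'aaaaa')
print(pairs)  # ['aa', 'aa']
```

The fifth "a" is left over because it cannot form a pair on its own.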
Basic Syntax and Usage
Before using re.findall(), you need to import the re module. The basic syntax is straightforward, taking a pattern and a string as arguments.
import re
text = "The dates are 2023-11-25 and 2024-01-15"
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)
print(dates)
['2023-11-25', '2024-01-15']
Finding All vs Finding First Match
While re.findall() returns all matches, you might want to use re.search when you need only the first match.
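The difference can be sketched side by side, reusing the date text from the earlier example. The variable names are illustrative only.

```python
import re

text = "The dates are 2023-11-25 and 2024-01-15"
pattern = r'\d{4}-\d{2}-\d{2}'

# re.search stops at the first match and returns a Match object (or None).
first = re.search(pattern, text)
first_date = first.group() if first else None

# re.findall scans the whole string and returns a list of strings.
all_dates = re.findall(pattern, text)

print(first_date)  # 2023-11-25
print(all_dates)   # ['2023-11-25', '2024-01-15']
```

Note that re.search gives you a Match object, so you call .group() to get the text, while re.findall hands back plain strings directly.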
Working with Groups
When your pattern contains capturing groups, re.findall() behaves differently. With one group, it returns a list of the strings captured by that group. With two or more groups, it returns a list of tuples, one tuple per match.
text = "Contact us: support@example.com, sales@example.com"
emails = re.findall(r'(\w+)@(\w+)\.com', text)
print(emails)
[('support', 'example'), ('sales', 'example')]
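The shape of the result depends on how many capturing groups the pattern has. The following sketch runs three variations of the email pattern over the same text. The variable names are illustrative, and the pattern is a simplified one for demonstration, not a complete email matcher.

```python
import re

text = "Contact us: support@example.com, sales@example.com"

# No groups: each item is the full match.
full = re.findall(r'\w+@\w+\.com', text)
print(full)     # ['support@example.com', 'sales@example.com']

# One capturing group: each item is that group's text only.
users = re.findall(r'(\w+)@\w+\.com', text)
print(users)    # ['support', 'sales']

# Non-capturing groups (?:...) group without changing the result shape.
grouped = re.findall(r'(?:\w+)@(?:\w+)\.com', text)
print(grouped)  # ['support@example.com', 'sales@example.com']
```

If you need parentheses only for grouping or alternation, the non-capturing form keeps the full match in the result.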
Common Use Cases
Here are some practical examples where re.findall proves particularly useful:
# Finding all words
text = "Python is awesome! Python is powerful."
words = re.findall(r'\w+', text)
print(words)
# Extracting phone numbers
text = "Call us: 123-456-7890 or 098-765-4321"
phones = re.findall(r'\d{3}-\d{3}-\d{4}', text)
print(phones)
['Python', 'is', 'awesome', 'Python', 'is', 'powerful']
['123-456-7890', '098-765-4321']
Case Sensitivity
You can make pattern matching case-insensitive using the re.IGNORECASE flag. This is useful when the same word may appear with different capitalization, as in user input or free-form text.
text = "Python PYTHON python"
matches = re.findall(r'python', text, re.IGNORECASE)
print(matches)
['Python', 'PYTHON', 'python']
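Two related variations, sketched under the assumption that you want the same case-insensitive behavior: the inline (?i) flag inside the pattern itself, and combining several flags with the | operator. The multi-line sample text is made up for illustration.

```python
import re

text = "Python PYTHON python"

# Inline flag: (?i) at the start of the pattern is equivalent to re.IGNORECASE.
inline = re.findall(r'(?i)python', text)
print(inline)    # ['Python', 'PYTHON', 'python']

# Flags can be combined with |. With re.MULTILINE, ^ matches at the
# start of every line rather than only at the start of the string.
lines = "Python\npython rocks\nPYTHON"
combined = re.findall(r'^python', lines, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['Python', 'python', 'PYTHON']
```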
Error Handling
When using re.findall(), it's important to handle potential errors, especially invalid patterns, which raise re.error. A pattern that simply matches nothing is not an error: unlike re.match, which returns None, findall returns an empty list if no matches are found.
try:
    # An unclosed character class is not a valid pattern
    result = re.findall(r'[', "test")
except re.error as e:
    print(f"Invalid pattern: {e}")
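The "no match" case needs no exception handling at all. This short sketch contrasts the empty list from re.findall with the None from re.match. The sample text and variable names are illustrative.

```python
import re

text = "no digits here"

found = re.findall(r'\d+', text)  # empty list, not an error
matched = re.match(r'\d+', text)  # None when there is no match

print(found)    # []
print(matched)  # None

# An empty list is falsy, so a plain truth test is enough.
if not found:
    print("No matches")
```

Because the result is always a list, you can loop over it or take its length without checking for None first.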
Conclusion
re.findall() is an essential tool for pattern matching in Python. It's particularly useful when you need to extract multiple matches from text.
Remember to use appropriate patterns and consider using flags when needed. For complex patterns, always test with sample data first to ensure correct matching.
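One possible way to test a pattern against sample data is a small table of inputs and expected results. This is only a sketch: the sample strings and the name DATE_PATTERN are hypothetical, and the pattern is the date pattern from the first example.

```python
import re

DATE_PATTERN = r'\d{4}-\d{2}-\d{2}'  # same pattern as the first example

# Hypothetical sample inputs paired with the matches we expect.
samples = [
    ("Released 2023-11-25", ['2023-11-25']),
    ("Due 2024-1-5", []),                # single-digit month/day: no match
    ("No dates here", []),
    ("ID 12023-11-25", ['2023-11-25']),  # matches inside a longer digit run
]

for text, expected in samples:
    actual = re.findall(DATE_PATTERN, text)
    assert actual == expected, f"{text!r}: expected {expected}, got {actual}"

print("All samples passed")
```

The last sample shows the kind of surprise this catches: the pattern also matches inside a longer run of digits, which may or may not be what you want.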