Last modified: Nov 08, 2024 By Alexander Williams

Python re.findall: Extract All Pattern Matches from Text

Python's re.findall() function, from the standard-library re module, scans a string from left to right and returns every non-overlapping match of a pattern as a list, in the order the matches are found.

Basic Syntax and Usage

Before using re.findall(), import the re module. The call takes a pattern and a string, plus an optional flags argument: re.findall(pattern, string, flags=0).


import re

text = "The dates are 2023-11-25 and 2024-01-15"
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)
print(dates)


['2023-11-25', '2024-01-15']

Finding All vs Finding First Match

re.findall() returns every match as a list. When you need only the first match, use re.search() instead, which returns a single match object, or None if nothing matches.
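To make the difference concrete, here is a small sketch that runs both functions over the date string from the earlier example. The variable names are illustrative only.

```python
import re

text = "The dates are 2023-11-25 and 2024-01-15"
pattern = r'\d{4}-\d{2}-\d{2}'

# re.search stops at the first match and returns a match object (or None).
first = re.search(pattern, text)
first_date = first.group() if first else None

# re.findall keeps scanning and returns every match as a list of strings.
all_dates = re.findall(pattern, text)

print(first_date)
print(all_dates)
```

Note that re.search needs a None check before calling .group(), while the list from re.findall can be used directly.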

Working with Groups

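When the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. With exactly one group, it returns a list of strings, one per match. With two or more groups, it returns a list of tuples, as in this example: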


text = "Contact us: support@example.com, sales@example.com"
emails = re.findall(r'(\w+)@(\w+)\.com', text)
print(emails)


[('support', 'example'), ('sales', 'example')]
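The tuple result only appears with two or more groups. As a short sketch using the same sample text, a single capturing group yields plain strings, and a non-capturing group (?:...) lets you group part of the pattern while still getting the full match back.

```python
import re

text = "Contact us: support@example.com, sales@example.com"

# One capturing group: a list of strings holding just that group.
users = re.findall(r'(\w+)@\w+\.com', text)

# Non-capturing group (?:...): no groups are captured, so the
# whole match is returned for each email address.
full_emails = re.findall(r'\w+@(?:\w+)\.com', text)

print(users)
print(full_emails)
```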

Common Use Cases

Here are some practical examples where re.findall proves particularly useful:


# Finding all words
text = "Python is awesome! Python is powerful."
words = re.findall(r'\w+', text)
print(words)

# Extracting phone numbers
text = "Call us: 123-456-7890 or 098-765-4321"
phones = re.findall(r'\d{3}-\d{3}-\d{4}', text)
print(phones)


['Python', 'is', 'awesome', 'Python', 'is', 'powerful']
['123-456-7890', '098-765-4321']

Case Sensitivity

By default, matching is case-sensitive. Pass the re.IGNORECASE flag (short form re.I) as the third argument to match letters regardless of case.


text = "Python PYTHON python"
matches = re.findall(r'python', text, re.IGNORECASE)
print(matches)


['Python', 'PYTHON', 'python']
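For comparison, the following sketch runs the same search with and without the flag, so the effect of re.IGNORECASE is visible side by side.

```python
import re

text = "Python PYTHON python"

# Without a flag, only the exact lowercase spelling matches.
case_sensitive = re.findall(r'python', text)

# re.I is the short alias for re.IGNORECASE.
case_insensitive = re.findall(r'python', text, re.I)

print(case_sensitive)
print(case_insensitive)
```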

Error Handling

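re.findall() raises re.error when the pattern is not a valid regular expression, so catch that exception when patterns come from user input or other untrusted sources. A valid pattern that matches nothing is not an error: unlike re.match, which returns None, findall returns an empty list.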


try:
    result = re.findall(r'[', "test")  # Invalid pattern
except re.error as e:
    print(f"Invalid pattern: {e}")
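The two cases can be kept apart in code. The sketch below first shows the "no matches" behavior of findall next to re.match, then wraps findall in a helper. The name safe_findall is hypothetical, not part of the re module, and returning an empty list for an invalid pattern is only one possible design choice.

```python
import re

text = "no digits here"

# Valid pattern, no matches: findall gives [], match gives None.
found = re.findall(r'\d+', text)
matched = re.match(r'\d+', text)
print(found)
print(matched)

# Hypothetical helper: treat an invalid pattern like "no matches"
# instead of letting re.error propagate to the caller.
def safe_findall(pattern, string, flags=0):
    try:
        return re.findall(pattern, string, flags)
    except re.error:
        return []

print(safe_findall(r'[', "test"))
print(safe_findall(r'\w+', "test"))
```

Swallowing the exception hides pattern bugs, so in most code it is better to log or re-raise it. The helper suits cases where a bad user-supplied pattern should simply produce no results.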

Conclusion

re.findall() is an essential tool for pattern matching in Python. It's particularly useful when you need to extract multiple matches from text.

Remember to use appropriate patterns and consider using flags when needed. For complex patterns, always test with sample data first to ensure correct matching.
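One lightweight way to test a pattern against sample data is a table of inputs and expected results. This is a minimal sketch that reuses the date pattern from earlier. The sample strings and the names DATE_PATTERN and samples are made up for illustration.

```python
import re

DATE_PATTERN = r'\d{4}-\d{2}-\d{2}'

# Each sample input is paired with the list findall should return.
samples = {
    "Released 2023-11-25": ['2023-11-25'],
    "No date here": [],
    "2023-11-25 to 2024-01-15": ['2023-11-25', '2024-01-15'],
}

results = {s: re.findall(DATE_PATTERN, s) for s in samples}

for sample, expected in samples.items():
    status = "ok" if results[sample] == expected else "MISMATCH"
    print(f"{status}: {sample!r} -> {results[sample]}")
```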