Last modified: Jan 10, 2023 By Alexander Williams
BeautifulSoup: How to find by text
BeautifulSoup provides many parameters that make a search more precise, and one of them is string.
In this tutorial, we'll learn how to use string to find elements by their text, and we'll also see how to combine it with regex.
Find by text
Syntax:
string="your_text"
In the following example, we'll find the <p> tag whose text is "child 2".
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
<p></p>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find the <p> whose text is exactly "child 2"
el = soup.find("p", string="child 2")
print(el)
Output:
<p>child 2</p>
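Note that a plain string passed to string is compared against the tag's whole text, so a partial value such as "child" finds nothing. The sketch below illustrates this and, as one possible workaround, passes a function instead of a string. The helper name ends_with_two is made up for this illustration.

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
<p></p>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser')

# A plain string must equal the tag's full text, so "child" alone matches nothing
partial = soup.find("p", string="child")
print(partial)

# Hypothetical helper: receives the tag's string (None for an empty tag)
def ends_with_two(text):
    return text is not None and text.endswith("2")

# A function passed to string is called for each candidate's text
match = soup.find("p", string=ends_with_two)
print(match)
```

The None check matters because the empty <p></p> has no string at all.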
You can also pass a list of strings to match any one of them:
# Find all <p> with multiple values
els = soup.find_all("p", string=["child 2", "child 3"])
print(els)
Output:
[<p>child 2</p>, <p>child 3</p>]
You can also use Booleans.
True: non-empty value
False: empty value
Let's see an example:
# Find all <p> tags that have a text value
els = soup.find_all("p", string=True)
print(els)
Output:
[<p>child 1</p>, <p>child 2</p>, <p>child 3</p>]
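One thing to keep in mind: string is checked against a tag's .string, which is None when the tag holds more than one child (for example, text plus a nested tag). The following is a minimal sketch of that case, using a small HTML snippet and a helper named p_mentions_child that are both assumptions made for illustration. It falls back to get_text() through a function filter.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the second <p> mixes text with a nested <b> tag
html = '<div><p>child 1</p><p>child <b>4</b></p></div>'
soup = BeautifulSoup(html, 'html.parser')

# Only the first <p> has a single text child, so only it has a .string
with_string = soup.find_all("p", string=True)
print(with_string)

# Hypothetical helper: filter on the tag itself and read all nested text
def p_mentions_child(tag):
    return tag.name == "p" and "child" in tag.get_text()

by_text = soup.find_all(p_mentions_child)
print(by_text)
```

Filtering on the tag with get_text() trades the convenience of string for coverage of nested markup.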
Using regex with string
BeautifulSoup also accepts a compiled regex as the string parameter. In this example, we'll find all <p> tags whose text contains a digit.
Syntax:
string=re.compile('regex_code')
Example:
import re
from bs4 import BeautifulSoup
# HTML source
html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
<p></p>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
els = soup.find_all("p", string=re.compile(r'\d'))
print(els)
\d matches any single digit (0-9). Because the pattern is applied as a search, one digit anywhere in the text is enough for a match.
Output:
[<p>child 1</p>, <p>child 2</p>, <p>child 3</p>]
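Since a compiled pattern is applied with its search() method, it matches anywhere in the text unless it is anchored. As a short sketch under the same sample HTML, the snippet below compares an unanchored character class with an anchored, case-insensitive pattern. The variable names are chosen only for this illustration.

```python
import re
from bs4 import BeautifulSoup

html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
<p></p>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser')

# Unanchored: any <p> whose text contains a 2 or a 3
two_or_three = soup.find_all("p", string=re.compile(r'[23]'))
print(two_or_three)

# Anchored with ^ and $, and case-insensitive: the whole text must fit the pattern
whole_text = soup.find_all("p", string=re.compile(r'^CHILD\s+\d$', re.IGNORECASE))
print(whole_text)
```

Anchoring is useful when a loose pattern would otherwise pick up unrelated tags that merely contain the same characters.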