Last modified: October 25, 2021

BeautifulSoup: How to find by text

BeautifulSoup provides many parameters to make our search more accurate, and one of them is string.

In this tutorial, we'll learn how to use string to find elements by their text, and we'll also see how to use it with regex.

Find by text

Syntax:


string="your_text"

In the following example, we'll find the <p> tag whose text is child 2.


from bs4 import BeautifulSoup

# Html source
html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
<p></p>
</div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find the <p> whose text is "child 2"
el = soup.find("p", string="child 2")

print(el)

Output:

<p>child 2</p>
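
Note that a plain string is compared against the whole text of the tag, so a partial value does not match. The short sketch below illustrates this on a compact version of the same HTML; the variable names exact and partial are only for illustration.

```python
from bs4 import BeautifulSoup

# Compact version of the HTML used in this tutorial
html_source = "<div><p>child 1</p><p>child 2</p><p>child 3</p><p></p></div>"
soup = BeautifulSoup(html_source, "html.parser")

# The full text matches
exact = soup.find("p", string="child 2")

# Only part of the text: no <p> has exactly "child" as its text
partial = soup.find("p", string="child")

print(exact)    # <p>child 2</p>
print(partial)  # None
```

If you need a partial match, a regex (covered later in this tutorial) is usually the simpler option.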

You can also pass a list of strings to match any one of them:


# Find all <p> with multiple values
els = soup.find_all("p", string=["child 2", "child 3"])

print(els)

Output:

[<p>child 2</p>, <p>child 3</p>]

You can also use Booleans.

True: non-empty value

False: empty value

Let's see an example:


# Find all <p> with value
els = soup.find_all("p", string=True)

print(els)

Output:

[<p>child 1</p>, <p>child 2</p>, <p>child 3</p>]
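
To see why the empty <p> is left out, it can help to look at the .string attribute of each tag, which is what the string parameter is checked against. This is a minimal sketch on a compact version of the same HTML; the values list is just an illustrative name.

```python
from bs4 import BeautifulSoup

# Compact version of the HTML used in this tutorial
html_source = "<div><p>child 1</p><p>child 2</p><p>child 3</p><p></p></div>"
soup = BeautifulSoup(html_source, "html.parser")

# Collect the .string of every <p>; the empty <p> has no string, so it is None
values = [None if p.string is None else str(p.string) for p in soup.find_all("p")]

print(values)  # ['child 1', 'child 2', 'child 3', None]
```

The last <p> has None as its string, so string=True skips it.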

Using regex with string

BeautifulSoup allows us to pass a compiled regex to the string parameter. In this example, we'll find all <p> tags whose text contains a digit.

Syntax:


string=re.compile('regex_code')


Example:


import re
from bs4 import BeautifulSoup

# Html source
html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
<p></p>
</div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

els = soup.find_all("p", string=re.compile(r'\d'))

print(els)

\d matches any decimal digit.

Output:

[<p>child 1</p>, <p>child 2</p>, <p>child 3</p>]
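
The regex is searched for anywhere in the text, so it does not have to match the whole value. As a small sketch, assuming we only want the tags whose text ends in 2 or 3, we can anchor the pattern with $ (the pattern itself is just an example):

```python
import re
from bs4 import BeautifulSoup

# Compact version of the HTML used in this tutorial
html_source = "<div><p>child 1</p><p>child 2</p><p>child 3</p><p></p></div>"
soup = BeautifulSoup(html_source, "html.parser")

# [23]$ : the text must end with the digit 2 or 3
els = soup.find_all("p", string=re.compile(r"[23]$"))

print(els)  # [<p>child 2</p>, <p>child 3</p>]
```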