BeautifulSoup: How to Get Text Inside a Tag or Tags

There are many ways to get the text inside a tag in BeautifulSoup. In this article, we'll explore some of the most common ways to:

  • get the text inside the tag
  • get the text between tags

Get Text inside Tag

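The most common way to get the text inside a tag is the .string property. It returns the tag's text when the tag contains a single string; when the tag has several children, it returns None (covered later in this article). Let's see how to use it with examples: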

Get text inside the <p> tag

from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module

# 👇 HTML Source
my_html = '''
<p>Hello P</p>
'''

soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing

el = soup.find("p") # 👉️ Find <p>

el_text = el.string # 👉️ Get Text Inside <p>

print(el_text)

Output:

Hello P
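One thing to watch: find() returns None when no matching tag exists, so calling .string on that result raises an AttributeError. Below is a minimal sketch of a guard; the helper name text_of_first is hypothetical and only used for illustration.

```python
from bs4 import BeautifulSoup

my_html = '''
<p>Hello P</p>
'''

soup = BeautifulSoup(my_html, 'html.parser')


def text_of_first(soup, name):
    """Hypothetical helper: return .string of the first <name> tag, or None if there is none."""
    el = soup.find(name)  # find() returns None when no <name> tag exists
    if el is None:
        return None
    return el.string


print(text_of_first(soup, "p"))   # Hello P
print(text_of_first(soup, "h1"))  # None
```

Checking for None before touching .string avoids a crash when the page structure changes.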

Get text inside the <div> tag

from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module

# 👇 HTML Source
my_html = '''
<div>Hello Div</div>
'''

soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing

el = soup.find("div") # 👉️ Find <div>

el_text = el.string # 👉️ Get Text Inside <div>

print(el_text)

Output:

Hello Div
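Note that .string does not return a plain Python str: it returns a NavigableString, a str subclass that keeps a link to its position in the parse tree. If you want to store the text or use it after you are done with the soup, a common practice is to convert it with str(), as in this short sketch:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

my_html = '''
<div>Hello Div</div>
'''

soup = BeautifulSoup(my_html, 'html.parser')
el = soup.find("div")

raw = el.string    # NavigableString, still attached to the tree
plain = str(raw)   # ordinary str, independent of the tree

print(type(raw).__name__)    # NavigableString
print(type(plain).__name__)  # str
```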

Get text inside the <span> tag

from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module

# 👇 HTML Source
my_html = '''
<p>Hello P <span>I'm Span</span></p>
'''

soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing

el = soup.find("span") # 👉️ Find <span>

el_text = el.string # 👉️ Get Text Inside <span>

print(el_text)

Output:

I'm Span
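A related detail: .string also works when a tag's only child is another tag. In that case BeautifulSoup uses the child's .string. It returns None only once the tag has more than one child, as with <p>Hello P <span>I'm Span</span></p>, where the <p> holds both the text "Hello P " and the <span>. A quick comparison:

```python
from bs4 import BeautifulSoup

# <p> has exactly one child (<span>), and <span> holds exactly one string
single_child = BeautifulSoup("<p><span>I'm Span</span></p>", 'html.parser')

# <p> has two children: the text "Hello P " and the <span> tag
mixed = BeautifulSoup("<p>Hello P <span>I'm Span</span></p>", 'html.parser')

print(single_child.find("p").string)  # I'm Span
print(mixed.find("p").string)         # None
```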

If you want to get the text inside both the <p> and <span> tags, see the next part of the tutorial.

Get Text Between tags

To get the text from multiple tags, let's look at some examples:

Get text between multiple <p> tags

from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module

# 👇 HTML Source
my_html = '''
<div>
    <p>Hello P1</p>
    <p>Hello P2</p>
    <p>Hello P3</p>
    <p>Hello P4</p>
</div>
'''

soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing

els = soup.find_all("p") # 👉️ Find All <p>

# 👇 Get Text Inside <p> Tags
for el in els:
    print(el.string)

We've used the find_all() method to find all <p> tags. Then we looped over the <p> tags. Finally, we used .string to get the text of each tag.

Output:

Hello P1
Hello P2
Hello P3
Hello P4
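If you need the texts as a list rather than printed one by one, a list comprehension works well. This sketch uses get_text() instead of .string, because get_text() always returns a plain str (an empty string if there is no text), whereas .string can be None for tags with mixed content:

```python
from bs4 import BeautifulSoup

my_html = '''
<div>
    <p>Hello P1</p>
    <p>Hello P2</p>
    <p>Hello P3</p>
    <p>Hello P4</p>
</div>
'''

soup = BeautifulSoup(my_html, 'html.parser')

# Collect the text of every <p> into a list of plain str values
texts = [p.get_text() for p in soup.find_all("p")]

print(texts)  # ['Hello P1', 'Hello P2', 'Hello P3', 'Hello P4']
```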

Get text between <br> tags

When a tag has more than one child (for example, text mixed with other tags), .string returns None. For example:

from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module

# 👇 HTML Source
my_html = '''
<div>
    <p>Hello: <br> P1  <br> P2</p>
</div>
'''

soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing

el = soup.find("p") # 👉️ Find <p>

print(el.string) # 👉️ Get Text

Output:

None
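The None comes from the <p> having several children: text nodes separated by <br> tags. Inspecting .contents, which lists a tag's direct children, makes this visible. A small diagnostic sketch:

```python
from bs4 import BeautifulSoup

my_html = '''
<div>
    <p>Hello: <br> P1  <br> P2</p>
</div>
'''

soup = BeautifulSoup(my_html, 'html.parser')
el = soup.find("p")

# .contents holds the direct children: the text nodes and the <br> tags
print(el.contents)
print(len(el.contents) > 1)  # True: more than one child
print(el.string is None)     # True: so .string gives up and returns None
```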

To solve the problem, use the .strings generator, which yields every string inside the tag, including those in nested tags:

from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module

# 👇 HTML Source
my_html = '''
<div>
    <p>Hello: <br> P1  <br> P2</p>
</div>
'''

soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing

el = soup.find("p") # 👉️ Find <p>

text = " ".join(el.strings) # 👉️ Join every string inside <p>

print(text)

Output:

Hello:   P1    P2

To remove the extra spaces, strip each string with str.strip() before joining them:

text = " ".join(s.strip() for s in el.strings)  # 👉️ Strip each string, then join with single spaces
print(text)

Output:

Hello: P1 P2
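BeautifulSoup also offers a .stripped_strings generator that strips each string and skips strings made only of whitespace. The skipping matters when the tag contains indentation or newlines between child tags: stripping those produces empty strings, which the join turns into stray spaces. The sketch below compares both approaches on the <div>, which contains such whitespace-only strings:

```python
from bs4 import BeautifulSoup

my_html = '''
<div>
    <p>Hello: <br> P1  <br> P2</p>
</div>
'''

soup = BeautifulSoup(my_html, 'html.parser')
div = soup.find("div")

# Manual strip: whitespace-only strings become "" and still get joined
manual = " ".join(s.strip() for s in div.strings)

# stripped_strings: strips each string and drops the whitespace-only ones
clean = " ".join(div.stripped_strings)

print(repr(manual))  # ' Hello: P1 P2 '
print(repr(clean))   # 'Hello: P1 P2'
```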

Get text between <p> and <span> tags

from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module

# 👇 HTML Source
my_html = '''
<p>Hello P <span>I'm Span</span></p>
'''

soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing

el = soup.find("p") # 👉️ Find <p>

text = " ".join(s.strip() for s in el.strings)  # 👉️ Get Text Between <p> and <span>

print(text)

Output:

Hello P I'm Span
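The same result can also be obtained in a single call with get_text(). Its separator argument is inserted between the strings, and strip=True strips each string and skips whitespace-only ones. A minimal sketch on the same HTML:

```python
from bs4 import BeautifulSoup

my_html = '''
<p>Hello P <span>I'm Span</span></p>
'''

soup = BeautifulSoup(my_html, 'html.parser')
el = soup.find("p")

# separator=" " goes between strings; strip=True trims each one and drops blanks
text = el.get_text(separator=" ", strip=True)

print(text)  # Hello P I'm Span
```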