Last modified: Jan 10, 2023 By Alexander Williams
BeautifulSoup: How to Get Text Inside a Tag or Tags
There are many ways to get the text inside a tag in BeautifulSoup. In this article, we'll explore some of the most common ways to:
- get the text inside the tag
- get the text between tags
Get Text Inside a Tag
The most common way to get the text inside a tag is the .string property. It returns the text when the tag contains a single string and nothing else. Let's see how to use it with examples:
Get text inside the <p> tag
from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module
# 👇 HTML Source
my_html = '''
<p>Hello P</p>
'''
soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing
el = soup.find("p") # 👉️ Find <p>
el_text = el.string # 👉️ Get Text Inside <p>
print(el_text)
Output:
Hello P
Get text inside the <div> tag
from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module
# 👇 HTML Source
my_html = '''
<div>Hello Div</div>
'''
soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing
el = soup.find("div") # 👉️ Find <div>
el_text = el.string # 👉️ Get Text Inside <div>
print(el_text)
Output:
Hello Div
Get text inside the <span> tag
from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module
# 👇 HTML Source
my_html = '''
<p>Hello P <span>I'm Span</span></p>
'''
soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing
el = soup.find("span") # 👉️ Find <span>
el_text = el.string # 👉️ Get Text Inside <span>
print(el_text)
Output:
I'm Span
To get the text of the <p> tag together with its nested <span> tag, see the next part of the tutorial.
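One practical detail the examples above skip: find() returns None when no tag matches, and .string itself is None when the tag holds more than one child. The following is a minimal sketch of a guard for both cases. The helper name text_of_first and its default parameter are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup


def text_of_first(soup, tag_name, default=""):
    """Return the .string of the first matching tag, or default if there is none.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    el = soup.find(tag_name)  # find() returns None when nothing matches
    if el is None or el.string is None:
        return default
    return str(el.string)


soup = BeautifulSoup("<p>Hello P <span>I'm Span</span></p>", "html.parser")

span_text = text_of_first(soup, "span")                        # tag with a single string
missing_text = text_of_first(soup, "h1", default="(no text)")  # tag not in the document
mixed_text = text_of_first(soup, "p")                          # <p> has text plus a <span>

print(span_text)     # I'm Span
print(missing_text)  # (no text)
```

Without the guard, calling .string on a missing tag raises an AttributeError, because the lookup happens on None.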
Get Text Between Tags
To get the text between multiple tags, let's look at some examples:
Get text between multiple <p> tags
from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module
# 👇 HTML Source
my_html = '''
<div>
<p>Hello P1</p>
<p>Hello P2</p>
<p>Hello P3</p>
<p>Hello P4</p>
</div>
'''
soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing
els = soup.find_all("p") # 👉️ Find All <p>
# 👇 Get Text Inside <p> Tags
for el in els:
    print(el.string)
We used the find_all() method to find all <p> tags, looped over them, and used .string to get the text from each tag.
Output:
Hello P1
Hello P2
Hello P3
Hello P4
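If you need the texts as data rather than printed lines, the same loop can be written as a list comprehension. This is a small sketch; the variable name texts is just an illustration, and tags whose .string is None are skipped.

```python
from bs4 import BeautifulSoup

my_html = """
<div>
<p>Hello P1</p>
<p>Hello P2</p>
<p>Hello <b>P3</b> mixed</p>
</div>
"""

soup = BeautifulSoup(my_html, "html.parser")

# Keep only the <p> tags that hold a single string; convert each to a plain str
texts = [str(p.string) for p in soup.find_all("p") if p.string is not None]

print(texts)  # ['Hello P1', 'Hello P2']
```

The third <p> is left out because it contains a nested <b> tag next to its text, so its .string is None.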
Get text between <br> tags
When you read .string on a tag that contains another tag alongside its text, you get None. For example:
from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module
# 👇 HTML Source
my_html = '''
<div>
<p>Hello: <br> P1 <br> P2</p>
</div>
'''
soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing
el = soup.find("p") # 👉️ Find <p>
print(el.string) # 👉️ Get Text
Output:
None
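To see why the result is None, it can help to inspect the tag's children with the .contents property. A tag only has a .string when it has exactly one child. The sketch below assumes the same html.parser setup as the example above.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello: <br> P1 <br> P2</p>", "html.parser")
el = soup.find("p")

# Three text pieces and two <br> tags: .string cannot pick a single one
child_count = len(el.contents)
print(child_count)  # 5
print(el.string)    # None

# A tag with one text child has an unambiguous .string
simple = BeautifulSoup("<p>Hello P</p>", "html.parser").find("p")
simple_count = len(simple.contents)
print(simple_count)   # 1
print(simple.string)  # Hello P
```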
To solve the problem, we can use the .strings property, as in the following example.
from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module
# 👇 HTML Source
my_html = '''
<div>
<p>Hello: <br> P1 <br> P2</p>
</div>
'''
soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing
el = soup.find("p") # 👉️ Find <p>
text = " ".join(el.strings) # 👉️ Get Text Inside <p> & <br>
print(text)
Output:
Hello:   P1   P2
Each string keeps its surrounding whitespace, so the joined result contains extra spaces. We can call strip() on each string to remove them.
text = " ".join(s.strip() for s in el.strings) # 👉️ Get Text Between <p> & <br>
print(text)
Output:
Hello: P1 P2
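BeautifulSoup also ships two built-in shortcuts for the strip-and-join pattern: the .stripped_strings generator and get_text() with its separator and strip arguments. The sketch below shows both on the same HTML; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

my_html = """
<div>
<p>Hello: <br> P1 <br> P2</p>
</div>
"""

soup = BeautifulSoup(my_html, "html.parser")
el = soup.find("p")

# .stripped_strings yields each string already stripped and skips empty ones
joined = " ".join(el.stripped_strings)

# get_text() can strip and join in a single call
via_get_text = el.get_text(separator=" ", strip=True)

print(joined)        # Hello: P1 P2
print(via_get_text)  # Hello: P1 P2
```

Unlike a manual strip() over .strings, both shortcuts drop strings that consist only of whitespace, which avoids stray double spaces in the result.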
Get text between <p>, <span> tags
from bs4 import BeautifulSoup # 👉️ Import BeautifulSoup module
# 👇 HTML Source
my_html = '''
<p>Hello P <span>I'm Span</span></p>
'''
soup = BeautifulSoup(my_html, 'html.parser') # 👉️ Parsing
el = soup.find("p") # 👉️ Find <p>
text = " ".join(s.strip() for s in el.strings) # 👉️ Get Text Between <p> and <span>
print(text)
Output:
Hello P I'm Span
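One caveat with joining on a space: when an inline tag sits in the middle of a word, the join inserts a space that is not in the page. Calling get_text() with no arguments concatenates the strings exactly as written. The HTML below is a made-up example to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <b> tag splits a single word
soup = BeautifulSoup("<p>Un<b>break</b>able word</p>", "html.parser")
el = soup.find("p")

# Joining with a space breaks the word apart
spaced = " ".join(s.strip() for s in el.strings)

# get_text() keeps the original spacing between strings
plain = el.get_text()

print(spaced)  # Un break able word
print(plain)   # Unbreakable word
```

Which one to use depends on the markup: the space join suits block-like separators such as <br>, while plain get_text() is usually safer for inline tags like <span> or <b>.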