Last modified: Feb 15, 2023 By Alexander Williams
Understand How to Use get_text() in BeautifulSoup
get_text() is a BeautifulSoup method that returns all the text inside an element, that is, all of its descendant strings concatenated with the given separator. In this tutorial, we will learn how to use get_text() with examples, and we'll also look at the difference between get_text() and the .string property.
Let's get started.
get_text() Syntax
get_text(separator, strip)
Arguments:
- separator : the string inserted between the pieces of text when they are joined.
- strip : when True, removes whitespace at the beginning and the end of each piece of text.
Default values:
- separator=""
- strip=False
Both of these arguments are optional.
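As a quick illustration of the syntax above, here is a minimal sketch that calls get_text() with its defaults, with keyword arguments, and with positional arguments. The small HTML snippet and the variable names are made up for this example.

```python
from bs4 import BeautifulSoup

# Illustrative HTML, not taken from the examples in this article
soup = BeautifulSoup("<div><p>a</p><p>b</p></div>", "html.parser")
div = soup.find("div")

default_text = div.get_text()                           # separator="", strip=False
keyword_text = div.get_text(separator="-", strip=True)  # arguments passed by name
positional_text = div.get_text("-", True)               # same call, arguments passed by position

print(default_text)     # ab
print(keyword_text)     # a-b
print(positional_text)  # a-b
```

Passing the arguments by name is usually easier to read than passing them by position.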
How to use get_text()
Let's see an example to understand how to use the get_text() method. In the following example, we'll get the text of all children of the <div>.
from bs4 import BeautifulSoup # Import BeautifulSoup
# HTML source
html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser') # Parsing
el = soup.find("div") # Find <div> TAG
g_txt = el.get_text() # Get text of the <div>
print(g_txt) # Print output
Output:
child 1
child 2
child 3
As you can see in the code, we've used get_text() with no arguments.
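The printed output hides the whitespace that comes from the line breaks in the HTML source. A small sketch, using repr() to make those newlines visible (the variable name raw_text is just for illustration):

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
</div>
'''

soup = BeautifulSoup(html_source, "html.parser")
raw_text = soup.find("div").get_text()

# repr() shows the newline characters that sit between the <p> tags
print(repr(raw_text))  # '\nchild 1\nchild 2\nchild 3\n'
```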
To remove the newlines (\n) from the output, pass strip=True, as in the example below. This strips the whitespace at the beginning and the end of each string and skips strings that end up empty.
from bs4 import BeautifulSoup # Import BeautifulSoup
# HTML source
html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser') # Parsing
el = soup.find("div") # Find <div> TAG
g_txt = el.get_text(strip=True) # Get text of the <div> and remove newlines from the output
print(g_txt) # Print output
Output:
child 1child 2child 3
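One way to picture what strip=True does is to look at the element's .stripped_strings generator, which yields each string with surrounding whitespace removed and skips the empty ones. The sketch below assumes the same HTML as above and shows that joining those pieces gives the same result:

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
</div>
'''

el = BeautifulSoup(html_source, "html.parser").find("div")

# Each non-empty string inside the <div>, with surrounding whitespace removed
pieces = list(el.stripped_strings)
print(pieces)  # ['child 1', 'child 2', 'child 3']

# Joining the pieces with an empty separator matches get_text(strip=True)
joined = "".join(pieces)
print(joined == el.get_text(strip=True))  # True
```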
To add a space between the strings, set the separator parameter, as in the example below.
from bs4 import BeautifulSoup # Import BeautifulSoup
# HTML source
html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser') # Parsing
el = soup.find("div") # Find <div> TAG
g_txt = el.get_text(strip=True, separator=" ") # Set separator and strip
print(g_txt) # Print output
Output:
child 1 child 2 child 3
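It can help to see what the separator does when strip is left at its default. In this sketch (the "|" separator is chosen only to make the joins visible), the separator is placed between every string, including the newline-only strings between the tags:

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
</div>
'''

el = BeautifulSoup(html_source, "html.parser").find("div")

# Without strip=True, whitespace-only strings are kept and joined as well
unstripped = el.get_text(separator="|")
print(repr(unstripped))  # '\n|child 1|\n|child 2|\n|child 3|\n'

# With strip=True, only the non-empty, stripped strings are joined
stripped = el.get_text(separator="|", strip=True)
print(stripped)  # child 1|child 2|child 3
```

This is why separator is usually combined with strip=True when the HTML source contains line breaks or indentation.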
Now, we'll strip each string and join the strings with \n, so that each one appears on its own line.
from bs4 import BeautifulSoup # Import BeautifulSoup
# HTML source
html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser') # Parsing
el = soup.find("div") # Find <div> TAG
g_txt = el.get_text(strip=True, separator="\n") # Set separator and strip
print(g_txt) # Print output
Output:
child 1
child 2
child 3
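If you need the lines as a Python list rather than as one string, a possible follow-up is to split the result on the same separator. This is a sketch with illustrative variable names, and it assumes that none of the child strings contain a newline themselves:

```python
from bs4 import BeautifulSoup

html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
</div>
'''

el = BeautifulSoup(html_source, "html.parser").find("div")

# Join with "\n", then split on "\n" to get one list item per child string
lines = el.get_text(strip=True, separator="\n").split("\n")
print(lines)  # ['child 1', 'child 2', 'child 3']

# The same list can be built directly from .stripped_strings
print(lines == list(el.stripped_strings))  # True
```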
The difference between get_text() and .string
Let's see some examples to figure out the difference between the get_text() method and the .string property.
Example 1:
from bs4 import BeautifulSoup # Import BeautifulSoup
# HTML source
html_source = '''
<div>
<p>child 1</p>
<p>child 2</p>
<p>child 3</p>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser') # Parsing
el = soup.find("div") # Find <div> TAG
print(el.get_text()) # Get content of <div> using get_text()
print(el.string) # Get content of <div> using .string
Output of get_text() :
child 1
child 2
child 3
Output of .string :
None
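As you can see, get_text() returns the text of all the children of the <div>, while the .string property returns None. That is because .string only returns a value when the element contains a single string (or a single child tag that does). The <div> here has several children, so .string cannot tell which one to return.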
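To see when .string does return a value, here is a short sketch with two made-up snippets: a <div> with exactly one child, and a <div> with two children.

```python
from bs4 import BeautifulSoup

# Case 1: the <div> has exactly one child, which has exactly one string
single = BeautifulSoup("<div><p>only child</p></div>", "html.parser")
p_string = single.find("p").string
div_string = single.find("div").string
print(p_string)    # only child
print(div_string)  # only child (.string follows the single child tag)

# Case 2: the <div> has two children, so .string gives up and returns None
multi = BeautifulSoup("<div><p>a</p><p>b</p></div>", "html.parser")
multi_string = multi.find("div").string
multi_text = multi.find("div").get_text()
print(multi_string)  # None
print(multi_text)    # ab
```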
Example 2:
from bs4 import BeautifulSoup # Import BeautifulSoup
# HTML source
html_source = '''
<div></div>
'''
soup = BeautifulSoup(html_source, 'html.parser') # Parsing
el = soup.find("div") # Find <div> TAG
print(el.get_text()) # Get content of empty <div> using get_text()
print(el.string) # Get content of empty <div> using .string
Output of get_text() (an empty string, so nothing is shown):
Output of .string :
None
When we try to get the text of an empty tag:
- get_text() returns an empty string ("")
- .string returns None
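This difference matters when the result is passed to string methods, because calling a method such as .upper() on None raises an AttributeError. A minimal sketch of one way to handle both cases (the fallback with `or ""` is just one possible choice):

```python
from bs4 import BeautifulSoup

el = BeautifulSoup("<div></div>", "html.parser").find("div")

# get_text() always returns a str, so string methods are safe to call
text_from_get_text = el.get_text()
print(repr(text_from_get_text.upper()))  # ''

# .string may be None, so fall back to an empty string before using it
text_from_string = el.string or ""
print(repr(text_from_string.upper()))  # ''
```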
Conclusion
To summarize, use the get_text() method when you want all the text inside an element, including the text of its children, and use .string only when the element contains a single string.
For more articles about BeautifulSoup, scroll down. Happy learning!