Last modified: Jan 10, 2023 By Alexander Williams
BeautifulSoup: Extract the Contents of an Element
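Beautiful Soup has the .contents property, which you can use to extract the contents of an element. It returns a list of the element's direct children: nested tags come back as Tag objects, and text (including whitespace such as line breaks) comes back as NavigableString objects.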
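A minimal sketch (the `<ul>` markup is made up for illustration) showing that `.contents` holds only direct children, while grandchildren stay inside their own parent's `.contents`:

```python
from bs4 import BeautifulSoup

# A nested structure: <ul> holds two <li> tags; the second <li> holds a <b> tag.
html = "<ul><li>one</li><li><b>two</b></li></ul>"
soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul")

# .contents is a plain list of the direct children of <ul>.
children = ul.contents
print(len(children))         # 2
print(children[1])           # <li><b>two</b></li>

# <b> is a grandchild, so it appears in children[1].contents, not in ul.contents.
print(children[1].contents)  # [<b>two</b>]
```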
Extract contents of an element
Get all the contents of the div:
from bs4 import BeautifulSoup
html = '''
<div>
<div><p>hello</p></div>
<span class="span" aria-label="4 people reacted to this post">click</span>
<a href="url.com">Link</a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
# Find the first <div> (the outer one)
c = soup.find('div')
# Print the div's direct children
print(c.contents)
Output:
['\n', <div><p>hello</p></div>, '\n', <span aria-label="4 people reacted to this post" class="span">click</span>, '\n', <a href="url.com">Link</a>, '\n']
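The '\n' entries in that list are the line breaks between tags, stored as strings. If you only want the tags, one option (a sketch, not the only approach) is to keep the items that are `Tag` instances:

```python
from bs4 import BeautifulSoup, Tag

html = '''
<div>
<div><p>hello</p></div>
<span class="span" aria-label="4 people reacted to this post">click</span>
<a href="url.com">Link</a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
c = soup.find('div')

# The '\n' entries are NavigableString objects; keep only Tag objects to drop them.
tags = [e for e in c.contents if isinstance(e, Tag)]
print([t.name for t in tags])  # ['div', 'span', 'a']
```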
Print the elements one by one:
for e in c.contents:
    print(e)
Output (the '\n' strings print as blank lines, omitted here):
<div><p>hello</p></div>
<span aria-label="4 people reacted to this post" class="span">click</span>
<a href="url.com">Link</a>
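Beautiful Soup also offers `.children`, which yields the same direct children as an iterator instead of a list. A small sketch with simplified markup (an assumption, not the example above) comparing the two:

```python
from bs4 import BeautifulSoup

# Simplified markup for this sketch; the line breaks become '\n' strings.
html = "<div>\n<span>click</span>\n<a href='url.com'>Link</a>\n</div>"
soup = BeautifulSoup(html, 'html.parser')
c = soup.find('div')

# .contents is a list, so it supports len() and indexing.
print(len(c.contents))  # 5: '\n', <span>, '\n', <a>, '\n'

# .children yields the same direct children one at a time.
# Strings have .name set to None, so this skips them and stops at the first tag.
first_tag = next(e for e in c.children if e.name is not None)
print(first_tag)  # <span>click</span>
```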
Print only the children whose tag name is a:
for e in c.contents:
    # Strings such as '\n' have .name set to None, so they are skipped
    if e.name == "a":
        print(e)
Output:
<a href="url.com">Link</a>
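Another way to get the direct-child <a> tags without a manual loop is `find_all()` with `recursive=False`, which limits the search to the element's direct children. A sketch reusing the HTML from above:

```python
from bs4 import BeautifulSoup

html = '''
<div>
<div><p>hello</p></div>
<span class="span" aria-label="4 people reacted to this post">click</span>
<a href="url.com">Link</a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
c = soup.find('div')

# recursive=False searches only the direct children, like looping over c.contents.
links = c.find_all('a', recursive=False)
for link in links:
    print(link)          # <a href="url.com">Link</a>
    print(link['href'])  # url.com
```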