Last modified: Feb 15, 2023 By Alexander Williams
How to Use Attributes in Beautifulsoup (Python)
In this tutorial, we're going to cover how to work with element attributes in Beautifulsoup: finding elements by attribute, reading attribute values, and checking whether an attribute exists.
Beautifulsoup: Find all by attribute
To find by attribute, you need to follow this syntax.
syntax:
soup.find_all(attrs={"attribute" : "value"})
Let's see some examples.
In the following example, we'll find all elements whose href attribute is exactly "setting-up-django-sitemaps".
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div>
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find by href attribute
els = soup.find_all(attrs={"href" : "setting-up-django-sitemaps"})
# Print Output
print(els)
output:
[<a href="setting-up-django-sitemaps" id="link">How to Create Django Sitemaps</a>]
In this example, we'll find all elements whose method attribute is POST.
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div>
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
</div>
<form method="POST">
<input name="username">
</form>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
#Find by method attribute
els = soup.find_all(attrs={"method" : "POST"})
print(els)
output:
[<form method="POST">
<input name="username"/>
</form>]
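As a complement to the two examples above, here is a small sketch that puts the keyword-argument form of find_all next to the attrs form. The HTML snippet and the data-role attribute are made up for illustration. The keyword form only works when the attribute name is a valid Python identifier and does not clash with one of find_all's own parameters (such as name), so attrs is the safer choice for those cases.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, for illustration only
html_source = '''
<form method="POST" data-role="login">
<input name="username">
</form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Keyword form: fine when the attribute name is a valid Python identifier
by_keyword = soup.find_all(method="POST")

# attrs form: needed for names such as "data-role" (not an identifier)
# or "name" (already used by find_all for the tag name)
by_data = soup.find_all(attrs={"data-role": "login"})
by_name = soup.find_all(attrs={"name": "username"})

print(by_keyword[0] is by_data[0])  # same <form> element found both ways
print(by_name[0].name)              # tag name of the element matched by name
```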
Beautifulsoup: get the attribute of an element
Syntax
element.attrs
To get all attributes of an element as a dictionary, use code like this:
from bs4 import BeautifulSoup
# Html source
html = """
<div>
<h2 class="recent" id="attribute">Recent Posts:</h2>
</div>
"""
# Parse
soup = BeautifulSoup(html, 'html.parser')
# Get h2 tag
h2 = soup.h2
# Print all attributes of the h2 tag
print(h2.attrs)
Output:
{'class': ['recent'], 'id': 'attribute'}
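In the output above, class comes back as a list while id is a plain string. This is because Beautifulsoup treats class as a multi-valued attribute. The sketch below uses a made-up h2 with two classes to show what that looks like in practice; the second class name is an assumption added for the demonstration.

```python
from bs4 import BeautifulSoup

# Made-up snippet: same h2 as above, with a second class added
html = '<h2 class="recent featured" id="attribute">Recent Posts:</h2>'
soup = BeautifulSoup(html, 'html.parser')
h2 = soup.h2

# attrs behaves like a regular dict, so it can be iterated like one
for name, value in h2.attrs.items():
    print(name, value)

# class is split on whitespace into a list of class names
classes = h2.attrs["class"]

# Join the list to get the attribute back as a single string
joined = " ".join(classes)
print(joined)
```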
Beautifulsoup: Get the attribute value of an element
syntax:
element['attribute name']
Let's see how to get the value of the class attribute.
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div class="rightSideBarParent">
<ul class="leftBarList">
<li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
</ul>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find all <ul> elements
els = soup.find_all("ul")
for el in els:
    # Print Class attribute
    print(el['class'])
Let me explain:
- Find all by ul tag.
- Iterate over the result.
- Get the class value of each element.
output:
['leftBarList']
In the example below, we'll get the value of the href attribute.
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div class="rightSideBarParent">
<ul class="leftBarList">
<li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
</ul>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find all <a> elements
els = soup.find_all("a")
for el in els:
    # Print href attribute
    print(el['href'])
output:
setting-up-django-sitemaps
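One thing to keep in mind with the square-bracket syntax: it raises a KeyError when the attribute is missing. A minimal sketch of the safer get() alternative follows; the title attribute and the "untitled" fallback are only there for the demonstration.

```python
from bs4 import BeautifulSoup

html_source = '<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>'
soup = BeautifulSoup(html_source, 'html.parser')
el = soup.find("a")

# get() returns the value when the attribute is present
href = el.get("href")

# get() returns None (or the given default) when the attribute is missing
title = el.get("title")
fallback = el.get("title", "untitled")

# Square brackets raise KeyError for a missing attribute
try:
    el["title"]
    raised = False
except KeyError:
    raised = True

print(href, title, fallback, raised)
```

Using get() is convenient inside loops over many elements, where some of them may not carry the attribute.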
Beautifulsoup: Find all by multiple attributes
syntax:
soup.find_all(attrs={"attribute1": "value1", "attribute2": "value2", ...})
Let's say we want to find all elements whose href attribute is "setting-up-django-sitemaps" and whose id is "link".
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div class="leftSideBar">
<ul class="leftBarList">
<li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
</ul>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find by href and id attribute
els = soup.find_all(attrs={"href":"setting-up-django-sitemaps", "id":"link"})
# Print result
print(els)
output:
[<a href="setting-up-django-sitemaps" id="link">How to Create Django Sitemaps</a>]
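To make the "all attributes must match" behavior more visible, here is a sketch with a second, made-up link that shares the href but has a different id. It also shows an equivalent CSS attribute selector through select(), which is offered here as an alternative and is not part of the original example.

```python
from bs4 import BeautifulSoup

# Made-up HTML: two links with the same href but different ids
html_source = '''
<ul class="leftBarList">
<li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
<li><a id="other" href="setting-up-django-sitemaps">Same href, different id</a></li>
</ul>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Both attributes must match, so only the first link is returned
by_attrs = soup.find_all(attrs={"href": "setting-up-django-sitemaps", "id": "link"})

# Equivalent CSS selector: tag name followed by two attribute conditions
by_css = soup.select('a[href="setting-up-django-sitemaps"][id="link"]')

# With only the href condition, both links are returned
only_href = soup.find_all(attrs={"href": "setting-up-django-sitemaps"})

print(len(by_attrs), len(by_css), len(only_href))
```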
Beautifulsoup: Check if an attribute exists
syntax:
element.has_attr('some_attribute')
Returns True if the attribute exists, otherwise False.
example:
from bs4 import BeautifulSoup
html_source = '''
<div class="rightSideBarParent">
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser')
# Find <a> element
el = soup.find("a")
#check href attribute
print(el.has_attr('href'))
#check name attribute
print(el.has_attr('name'))
output:
True
False
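has_attr() checks a single element. To collect every element that has a given attribute, whatever its value, find_all accepts True as the attribute value. The sketch below assumes a second link without an href, added only for the demonstration.

```python
from bs4 import BeautifulSoup

# Made-up HTML: one link with an href, one without
html_source = '''
<div class="rightSideBarParent">
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<a id="anchor-only">No href here</a>
</div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# True matches any element that has the attribute, regardless of its value
with_href = soup.find_all("a", href=True)

# has_attr() used as a filter to collect the links lacking the attribute
without_href = [a for a in soup.find_all("a") if not a.has_attr("href")]

print([a["id"] for a in with_href])
print([a["id"] for a in without_href])
```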
Beautifulsoup: Find elements whose attribute contains a number
In this last part of the tutorial, we'll find elements whose id attribute value contains a number.
To do this, we can pass a regular expression to Beautifulsoup.
example:
from bs4 import BeautifulSoup
import re
# Html source
html_source = '''
<div class="d-flex justify-content-center mt-2" id="ads3">
<div id="ad"> <p>good</p> </div>
</div>
<form method="POST">
<input name="username">
</form>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
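# Find all elements whose id contains a digit (raw string avoids an invalid-escape warning)
els = soup.find_all(id=re.compile(r"\d"))
print(els)
\d: Matches any decimal digit. With the re.ASCII flag it is equivalent to [0-9]; without it, other Unicode digits match too. Beautifulsoup searches the attribute value for the pattern, so the digit can appear anywhere in the id.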
output:
[<div class="d-flex justify-content-center mt-2" id="ads3">
<div id="ad"> <p>good</p> </div>
</div>]
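Because the pattern is searched for anywhere in the value, anchoring the regex changes which elements match. The following sketch compares an unanchored pattern with one anchored to the end of the id; the banner12x id is a made-up value for the comparison.

```python
from bs4 import BeautifulSoup
import re

# Made-up HTML: ids with a digit at the end, in the middle, and not at all
html_source = '''
<div id="ads3"><div id="ad"><p>good</p></div></div>
<div id="banner12x"></div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Unanchored: a digit anywhere in the id is enough
any_digit = soup.find_all(id=re.compile(r"\d"))

# Anchored with $: the id must end with a digit
ends_with_digit = soup.find_all(id=re.compile(r"\d$"))

ids_any = [el["id"] for el in any_digit]
ids_end = [el["id"] for el in ends_with_digit]

print(ids_any)
print(ids_end)
```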