Last modified: Feb 15, 2023 By Alexander Williams
Understand How to Use Attributes in Beautifulsoup Python
In this tutorial, we'll cover how to work with element attributes in Beautifulsoup: finding elements by attribute, reading attribute values, and checking whether an attribute exists.
Beautifulsoup: Find all by attribute
To find elements by attribute, pass a dictionary of attribute names and values to find_all() through the attrs argument.
syntax:
soup.find_all(attrs={"attribute" : "value"})
Let's see some examples.
In the following example, we'll find all elements whose href attribute is exactly "setting-up-django-sitemaps".
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div>
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find by href attribute
els = soup.find_all(attrs={"href" : "setting-up-django-sitemaps"})
# Print Output
print(els)
output:
[<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>]
In this example, we'll find all elements whose method attribute is POST.
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div>
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
</div>
<form method="POST">
<input name="username">
</form>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
#Find by method attribute
els = soup.find_all(attrs={"method" : "POST"})
print(els)
output:
[<form method="POST"> <input name="username"/> </form>]
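The attrs dictionary is not the only way to filter: for many attributes, find_all() also accepts the attribute as a keyword argument. The sketch below compares the two forms and shows two cases where the dictionary is the safer choice. The data-role attribute and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Html source (the data-role attribute is a hypothetical addition)
html_source = '''
<form method="POST" data-role="login">
<input name="username">
</form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Keyword form: gives the same result as attrs={"method": "POST"}
by_keyword = soup.find_all(method="POST")

# "data-role" is not a valid Python identifier, so it has to go through attrs
by_data = soup.find_all(attrs={"data-role": "login"})

# The name keyword is taken as the tag name, so the HTML name attribute
# also has to go through attrs
by_name = soup.find_all(attrs={"name": "username"})

print(by_keyword[0] is by_data[0])
print(by_name[0].name)
```

Both of the first two searches return the same form element, and the last one returns the input element.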
Beautifulsoup: get the attribute of an element
Syntax
element.attrs
To get all attributes of an element as a dictionary, read its attrs property:
from bs4 import BeautifulSoup
# Html source
html = """
<div>
<h2 class="recent" id="attribute">Recent Posts:</h2>
</div>
"""
# Parse
soup = BeautifulSoup(html, 'html.parser')
# Get h2 tag
h2 = soup.h2
# Print all attributes of the h2 tag
print(h2.attrs)
Output:
{'class': ['recent'], 'id': 'attribute'}
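Note in the output above that class comes back as a list while id is a plain string. As a small sketch (the second class name, featured, is added only for illustration), attrs can be treated like a normal dictionary:

```python
from bs4 import BeautifulSoup

# Html source with two class names (the second one is hypothetical)
html = '<h2 class="recent featured" id="attribute">Recent Posts:</h2>'

soup = BeautifulSoup(html, 'html.parser')
h2 = soup.h2

# attrs behaves like a regular dict, so it can be iterated
for name, value in h2.attrs.items():
    print(name, value)

# class can hold several values, so it is returned as a list
classes = h2.attrs['class']
# id holds a single value, so it is returned as a string
element_id = h2.attrs['id']
```

With the default html.parser settings, classes should hold both class names and element_id should be the string "attribute".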
Beautifulsoup: Get the attribute value of an element
syntax:
element['attribute name']
Let's see how to get the attribute class.
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div class="rightSideBarParent">
<ul class="leftBarList">
<li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
</ul>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find all <ul> elements
els = soup.find_all("ul")
for el in els:
    # Print Class attribute
    print(el['class'])
Let me explain:
- Find all <ul> elements.
- Iterate over the result.
- Get the class value of each element.
output:
['leftBarList']
In the example below, we'll get the value of the href attribute.
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div class="rightSideBarParent">
<ul class="leftBarList">
<li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
</ul>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find all <a> elements
els = soup.find_all("a")
for el in els:
    # Print href attribute
    print(el['href'])
output:
setting-up-django-sitemaps
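The element['attribute name'] syntax raises a KeyError when the attribute is missing. If some elements may lack the attribute, element.get() is a gentler option. The following is a minimal sketch; the second link without an href is invented for the example.

```python
from bs4 import BeautifulSoup

# Html source (the second <a> is a hypothetical element with no href)
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<a id="no-href">Placeholder</a>
'''

soup = BeautifulSoup(html_source, 'html.parser')

hrefs = []
for el in soup.find_all("a"):
    # get() returns the given default instead of raising when href is missing
    hrefs.append(el.get('href', ''))

print(hrefs)

# The square-bracket form raises KeyError for a missing attribute
try:
    soup.find(id="no-href")['href']
    raised = False
except KeyError:
    raised = True

print(raised)
```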
Beautifulsoup: Find all by multiple attributes
syntax:
attrs={"attribute":"value", "attribute":"value",...}
Let's say we want to find all elements whose href attribute is "setting-up-django-sitemaps" and whose id is "link".
from bs4 import BeautifulSoup
# Html source
html_source = '''
<div class="leftSideBar">
<ul class="leftBarList">
<li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
</ul>
</div>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find by href and id attribute
els = soup.find_all(attrs={"href":"setting-up-django-sitemaps", "id":"link"})
# Print result
print(els)
output:
[<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>]
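When several attributes are given, an element has to match all of them. The sketch below illustrates this with two extra elements that each match only one condition, and also shows how a tag name can be combined with attrs. The extra elements are made up for the example.

```python
from bs4 import BeautifulSoup

# Html source (the second <a> and the <div> are hypothetical additions)
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<a id="other" href="setting-up-django-sitemaps">Same target, different id</a>
<div id="link">Not an anchor</div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Both conditions must hold, so only the first <a> matches
both = soup.find_all(attrs={"href": "setting-up-django-sitemaps", "id": "link"})

# With a single condition, the <div> matches as well
by_id = soup.find_all(attrs={"id": "link"})

# A tag name can be added to narrow the search to <a> elements
anchors_only = soup.find_all("a", attrs={"id": "link"})

print(len(both), len(by_id), len(anchors_only))
```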
Beautifulsoup: Check if an attribute exists
syntax:
has_attr('some_attribute')
Returns True if the element has the attribute, otherwise False.
example:
from bs4 import BeautifulSoup
html_source = '''
<div class="rightSideBarParent">
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser')
# Find <a> element
el = soup.find("a")
#check href attribute
print(el.has_attr('href'))
#check name attribute
print(el.has_attr('name'))
output:
True
False
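has_attr() is also handy for filtering a result list. As a minimal sketch (the second link without an href is invented for the example), the following keeps only the links that carry an href, and compares that with passing href=True to find_all(), which matches any element that has the attribute regardless of its value.

```python
from bs4 import BeautifulSoup

# Html source (the second <a> is a hypothetical element with no href)
html_source = '''
<div class="rightSideBarParent">
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<a id="anchor-only">No href here</a>
</div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Keep only the <a> elements that have an href attribute
with_href = [el for el in soup.find_all("a") if el.has_attr('href')]

# href=True matches every <a> that has an href, whatever its value
same_result = soup.find_all("a", href=True)

print([el['id'] for el in with_href])
print([el['id'] for el in same_result])
```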
Beautifulsoup: Find elements whose attribute contains a number
In this last part of the tutorial, we'll find elements whose id attribute value contains a number.
To do this, we pass a compiled regular expression (from Python's re module) to find_all().
example:
from bs4 import BeautifulSoup
import re
# Html source
html_source = '''
<div class="d-flex justify-content-center mt-2" id="ads3">
<div id="ad"> <p>good</p> </div>
</div>
<form method="POST">
<input name="username">
</form>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')
# Find all elements whose id contains a digit (raw string avoids an invalid escape warning)
els = soup.find_all(id=re.compile(r"\d"))
print(els)
\d matches any decimal digit (equivalent to [0-9] for ASCII text). Because the pattern is not anchored, an id matches if a digit appears anywhere in it.
output:
[<div class="d-flex justify-content-center mt-2" id="ads3"> <div id="ad"> <p>good</p> </div> </div>]
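The same approach works with stricter patterns. The sketch below, using made-up id values, compares an unanchored pattern with two anchored ones; Beautifulsoup checks each attribute value with the pattern's search() method, so anchors such as ^ and $ are needed to pin the match to the start or end of the value.

```python
import re
from bs4 import BeautifulSoup

# Html source (the id values are hypothetical)
html_source = '''
<div id="ads3"></div>
<div id="ad"></div>
<div id="7up"></div>
<div id="banner12"></div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# A digit anywhere in the id
any_digit = soup.find_all(id=re.compile(r"\d"))

# A digit at the end of the id
ends_with_digit = soup.find_all(id=re.compile(r"\d$"))

# The whole id is "ads" followed by one or more digits
ads_numbered = soup.find_all(id=re.compile(r"^ads\d+$"))

print([el['id'] for el in any_digit])
print([el['id'] for el in ends_with_digit])
print([el['id'] for el in ads_numbered])
```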