Understand How to Use Attributes in Beautifulsoup Python
- Last modified: 08 December 2020
- Category: python libraries
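In this tutorial, we'll cover how to work with element attributes in Beautifulsoup: finding elements by attribute, reading attribute values, and checking whether an attribute exists.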
1. Beautifulsoup: Find all by attribute
To find elements by attribute, pass a dictionary of attribute names and values to find_all() through the attrs argument.
syntax:
soup.find_all(attrs={"attribute" : "value"})
Let's code some examples.
Example #1:
from bs4 import BeautifulSoup

html_source = '''
<div class="rightSideBarParent">
    <div class="leftSideBar">
        <ul class="leftBarList">
            <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
        </ul>
    </div>
</div>
<div class="d-flex justify-content-center mt-2" id="ads3">
    <div id="ad">
        <p>good</p>
    </div>
</div>
<form method="POST">
    <input name="username">
</form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

els = soup.find_all(attrs={"href" : "setting-up-django-sitemaps"})
print(els)
In the above example, we find all elements whose href attribute is exactly "setting-up-django-sitemaps".
output:
[<a href="setting-up-django-sitemaps" id="link">How to Create Django Sitemaps</a>]
Example #2:
In this example, we'll find all elements that have POST in the method attribute.
els = soup.find_all(attrs={"method" : "POST"})
print(els)
output:
[<form method="POST"> <input name="username"/> </form>]
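As a side note to the syntax above, find_all() also accepts attributes as keyword arguments. The sketch below uses a shortened, hypothetical version of the tutorial's HTML and suggests one reason the attrs dictionary is still useful: find_all() already has a parameter called name (the tag name), so an attribute called name has to be passed through attrs.

```python
from bs4 import BeautifulSoup

# Shortened sample HTML (an assumption for this sketch)
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<form method="POST">
    <input name="username">
</form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Keyword form and attrs form select the same elements here
by_keyword = soup.find_all(href="setting-up-django-sitemaps")
by_attrs = soup.find_all(attrs={"href": "setting-up-django-sitemaps"})
print(by_keyword == by_attrs)  # True

# name= means "tag name", so this looks for <username> tags and finds none
by_tag_name = soup.find_all(name="username")
print(by_tag_name)  # []

# The attrs dictionary targets the name attribute instead
inputs = soup.find_all(attrs={"name": "username"})
print(inputs)  # [<input name="username"/>]
```

The same attrs form is also the usual choice for attribute names that are not valid Python identifiers, such as data-* attributes.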
2. Beautifulsoup: Get the attribute value of an element
syntax:
element['attribute name']
Example #1:
from bs4 import BeautifulSoup

html_source = '''
<div class="rightSideBarParent">
    <div class="leftSideBar">
        <ul class="leftBarList">
            <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
        </ul>
    </div>
</div>
<div class="d-flex justify-content-center mt-2" id="ads3">
    <div id="ad">
        <p>good</p>
    </div>
</div>
<form method="POST">
    <input name="username">
</form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

els = soup.find_all("ul")
for el in els:
    print(el['class'])
Here is what the code does:
1. Find all ul elements.
2. Iterate over the result.
3. Print the class value of each element. Beautifulsoup returns it as a list because class can hold several values.
output:
['leftBarList']
Example #2:
In the following example, we'll get the href attribute value.
els = soup.find_all("a")
for el in els:
    print(el['href'])
output:
setting-up-django-sitemaps
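The bracket syntax above raises a KeyError when the attribute is missing. A minimal sketch of the alternative, the get() method, is shown here on a small hypothetical snippet of HTML (the title attribute and the "n/a" default are assumptions chosen for illustration):

```python
from bs4 import BeautifulSoup

# Small sample HTML (an assumption for this sketch)
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<div class="d-flex mt-2" id="ads3"></div>
'''

soup = BeautifulSoup(html_source, 'html.parser')
link = soup.find("a")

# get() returns the value, or None / a default when the attribute is missing
print(link.get("href"))          # setting-up-django-sitemaps
print(link.get("title"))         # None
print(link.get("title", "n/a"))  # n/a

# Bracket access raises KeyError for a missing attribute
try:
    link["title"]
except KeyError:
    print("no title attribute")

# class is multi-valued, so it comes back as a list
print(soup.find("div")["class"])  # ['d-flex', 'mt-2']
```

get() tends to be the safer choice when looping over many elements that may not all carry the attribute.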
3. Beautifulsoup: Find all by multiple attributes
syntax:
soup.find_all(attrs={"attribute1": "value1", "attribute2": "value2", ...})
example:
Let's say we want to find all elements whose href attribute is "setting-up-django-sitemaps" and whose id attribute is "link".
from bs4 import BeautifulSoup

html_source = '''
<div class="rightSideBarParent">
    <div class="leftSideBar">
        <ul class="leftBarList">
            <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
        </ul>
    </div>
</div>
<div class="d-flex justify-content-center mt-2" id="ads3">
    <div id="ad">
        <p>good</p>
    </div>
</div>
<form method="POST">
    <input name="username">
</form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

els = soup.find_all(attrs={"href":"setting-up-django-sitemaps", "id":"link"})
print(els)
output:
[<a href="setting-up-django-sitemaps" id="link">How to Create Django Sitemaps</a>]
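When several attributes are given, an element has to match all of them. The sketch below illustrates this with a hypothetical set of links (the ids "other" and "link2" and the "about" href are made up for the example):

```python
from bs4 import BeautifulSoup

# Three links that each match only part of the filter, except the first
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<a id="other" href="setting-up-django-sitemaps">Same href, different id</a>
<a id="link2" href="about">About</a>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Filtering on href alone matches two elements
href_only = soup.find_all(attrs={"href": "setting-up-django-sitemaps"})
print(len(href_only))  # 2

# Adding id narrows the result to elements that match both attributes
both = soup.find_all(attrs={"href": "setting-up-django-sitemaps", "id": "link"})
print(len(both))            # 1
print(both[0].get_text())   # How to Create Django Sitemaps
```

Note that string values are compared exactly, so id="link2" does not match "link".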
4. Beautifulsoup: Check if an attribute exists
In this part of the tutorial, we'll learn how to check whether an element has a given attribute.
syntax:
element.has_attr('some_attribute')
In the following example, we'll check whether the <a> element has the href and name attributes.
has_attr() returns True if the attribute exists and False if it does not.
example:
from bs4 import BeautifulSoup

html_source = '''
<div class="rightSideBarParent">
    <div class="leftSideBar">
        <ul class="leftBarList">
            <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
        </ul>
    </div>
</div>
<div class="d-flex justify-content-center mt-2" id="ads3">
    <div id="ad">
        <p>good</p>
    </div>
</div>
<form method="POST">
    <input name="username">
</form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

els = soup.find_all("a")
for el in els:
    # check href attribute
    print(el.has_attr('href'))
    # check name attribute
    print(el.has_attr('name'))
output:
True
False
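has_attr() checks one element at a time. To select only the elements that carry an attribute, whatever its value, one option is to pass True as the attribute value to find_all(). A minimal sketch, using a hypothetical second link without an href:

```python
from bs4 import BeautifulSoup

# The second <a> has no href (an assumption for this sketch)
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<a id="anchor">No href here</a>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# True matches any value, so this keeps only <a> elements that have an href
with_href = soup.find_all("a", href=True)
print([el["id"] for el in with_href])  # ['link']

# has_attr() can express the opposite filter in a list comprehension
without_href = [el["id"] for el in soup.find_all("a") if not el.has_attr("href")]
print(without_href)  # ['anchor']
```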
5. Beautifulsoup: Find elements whose attribute contains a number
In this last part of the tutorial, we'll find elements that contain a number in the id attribute value.
To do this, we need to use a regular expression (Python's re module) with Beautifulsoup.
example:
import re

from bs4 import BeautifulSoup

html_source = '''
<div class="rightSideBarParent">
    <div class="leftSideBar">
        <ul class="leftBarList">
            <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
        </ul>
    </div>
</div>
<div class="d-flex justify-content-center mt-2" id="ads3">
    <div id="ad">
        <p>good</p>
    </div>
</div>
<form method="POST">
    <input name="username">
</form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# raw string so the backslash in \d reaches the regex engine unchanged
els = soup.find_all(id=re.compile(r"\d"))
print(els)
\d: Matches any decimal digit. For ASCII text, this is equivalent to [0-9].
output:
[<div class="d-flex justify-content-center mt-2" id="ads3"> <div id="ad"> <p>good</p> </div> </div>]
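Beautifulsoup applies a compiled pattern with its search() method, so the digit may appear anywhere in the value. The sketch below, on a hypothetical set of ids, shows that behavior and how an anchor such as $ narrows the match:

```python
import re

from bs4 import BeautifulSoup

# Sample ids with digits in different positions (assumptions for this sketch)
html_source = '''
<div id="ads3"></div>
<div id="ad"></div>
<div id="2col"></div>
<div id="box-10-left"></div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# A digit anywhere in the id value is enough to match
any_digit = soup.find_all(id=re.compile(r"\d"))
print([el["id"] for el in any_digit])  # ['ads3', '2col', 'box-10-left']

# Anchoring with $ keeps only ids that end with a digit
ends_with_digit = soup.find_all(id=re.compile(r"\d$"))
print([el["id"] for el in ends_with_digit])  # ['ads3']
```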