How to Use Attributes in Beautifulsoup (Python)

In this tutorial, we'll cover how to work with attributes in Beautifulsoup: finding elements by attribute, reading attribute values, and checking whether an attribute exists.

1. Beautifulsoup: Find all by attribute

To find elements by attribute, pass a dictionary of attribute names and values to find_all() through the attrs argument.

syntax:

soup.find_all(attrs={"attribute" : "value"})

Let's code some examples.

example #1:

from bs4 import BeautifulSoup

html_source = '''
    <div class="rightSideBarParent">
        <div class="leftSideBar">
            <ul class="leftBarList">
                <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
            </ul>
        </div>
    </div>
    
    <div class="d-flex justify-content-center mt-2" id="ads3">
           <div id="ad"> <p>good</p> </div>
    </div>

    <form method="POST">
      <input name="username">
    </form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

els = soup.find_all(attrs={"href" : "setting-up-django-sitemaps"})

print(els)

In the above example, we find all elements whose href attribute equals "setting-up-django-sitemaps".

output:

[<a href="setting-up-django-sitemaps" id="link">How to Create Django Sitemaps</a>]

Example #2:

In this example, we'll find all elements that have POST in the method attribute.

els = soup.find_all(attrs={"method" : "POST"})

print(els)

output:

[<form method="POST">
<input name="username"/>
</form>]
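
As a complement to the two examples above, here is a minimal sketch of two related forms: passing the attribute as a keyword argument instead of through attrs, and using attrs for an attribute such as name, which find_all() already uses for the tag name. The trimmed html_source and the variable names below are assumptions made for brevity, not part of the original page.

```python
from bs4 import BeautifulSoup

# Assumption: a trimmed version of the tutorial's HTML, kept short on purpose.
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<form method="POST">
  <input name="username">
</form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Keyword form: the attribute name is passed directly as an argument.
forms_kw = soup.find_all(method="POST")

# Dictionary form: the same search written with attrs.
forms_attrs = soup.find_all(attrs={"method": "POST"})

# Both searches find the same single <form> element.
print(len(forms_kw), len(forms_attrs))

# The first positional argument of find_all() is called "name" and means
# the tag name, so the HTML attribute "name" has to go through attrs.
inputs = soup.find_all(attrs={"name": "username"})
print(inputs)
```

The dictionary form is the safer default when an attribute name is not a valid Python identifier or clashes with one of find_all()'s own arguments.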

2. Beautifulsoup: Get the attribute value of an element

syntax:

element['attribute name']

example #1:

html_source = '''
    <div class="rightSideBarParent">
        <div class="leftSideBar">
            <ul class="leftBarList">
                <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
            </ul>
        </div>
    </div>
    
    <div class="d-flex justify-content-center mt-2" id="ads3">
           <div id="ad"> <p>good</p> </div>
    </div>

    <form method="POST">
      <input name="username">
    </form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

els = soup.find_all("ul")
for el in els:
  print(el['class'])

Let me explain:

1. Find all ul elements.

2. Iterate over the result.

3. Get the class value of each element. Because class can hold several values, Beautifulsoup returns it as a list.

output:

['leftBarList']

Example #2:

In the following example, we'll get the href attribute value.

els = soup.find_all("a")

for el in els:
  print(el['href'])

output:

setting-up-django-sitemaps
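
The bracket syntax raises a KeyError when the attribute is missing. The sketch below shows the dictionary-style alternatives that a Tag also supports: get() for a safe lookup and attrs for all attributes at once. The shortened html_source and the names link and div are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Assumption: two elements borrowed from the tutorial's HTML.
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<div class="d-flex justify-content-center mt-2" id="ads3"></div>
'''

soup = BeautifulSoup(html_source, 'html.parser')
link = soup.find("a")
div = soup.find("div")

# get() returns None (or a supplied default) when the attribute is missing.
print(link.get("href"))
print(link.get("title"))
print(link.get("title", "no title"))

# Bracket access on a missing attribute raises KeyError.
try:
    link["title"]
except KeyError:
    print("title attribute is missing")

# attrs exposes every attribute of the element as a dictionary.
print(dict(link.attrs))

# class is multi-valued, so its value comes back as a list of strings.
print(div["class"])
```

Using get() is usually the more convenient choice when looping over many elements that may or may not carry the attribute.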

3. Beautifulsoup: Find all by multiple attributes

syntax:

soup.find_all(attrs={"attribute1": "value1", "attribute2": "value2", ...})

example:

Let's say we want to find all elements whose href attribute is "setting-up-django-sitemaps" and whose id attribute is "link".

html_source = '''
    <div class="rightSideBarParent">
        <div class="leftSideBar">
            <ul class="leftBarList">
                <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
            </ul>
        </div>
    </div>
    
    <div class="d-flex justify-content-center mt-2" id="ads3">
           <div id="ad"> <p>good</p> </div>
    </div>

    <form method="POST">
      <input name="username">
    </form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

els = soup.find_all(attrs={"href":"setting-up-django-sitemaps", "id":"link"})

print(els)

output:

[<a href="setting-up-django-sitemaps" id="link">How to Create Django Sitemaps</a>]
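
The same multi-attribute search can also be written in other ways. The sketch below compares keyword arguments with a CSS attribute selector passed to select(). The second <a> element is a hypothetical addition, included only to show that both conditions have to match.

```python
from bs4 import BeautifulSoup

# Assumption: the second link is invented for this sketch; it shares the
# href of the first link but has a different id.
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<a id="other" href="setting-up-django-sitemaps">Same href, different id</a>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Keyword arguments: every condition must match the same element.
by_kwargs = soup.find_all(href="setting-up-django-sitemaps", id="link")

# CSS attribute selectors express the same conditions, here limited to <a>.
by_css = soup.select('a[href="setting-up-django-sitemaps"][id="link"]')

print(len(by_kwargs), len(by_css))
print(by_kwargs[0]["id"])
```

Only the first link is returned in both cases, because the second one fails the id condition.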

4. Beautifulsoup: Check if an attribute exists

In this part of the tutorial, we'll learn how to check whether an element has a given attribute.

syntax:

element.has_attr('some_attribute')

In the following example, we'll check whether the <a> element has the href and name attributes.
has_attr() returns True if the attribute exists and False if it does not.

example:

html_source = '''
    <div class="rightSideBarParent">
        <div class="leftSideBar">
            <ul class="leftBarList">
                <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
            </ul>
        </div>
    </div>
    
    <div class="d-flex justify-content-center mt-2" id="ads3">
           <div id="ad"> <p>good</p> </div>
    </div>

    <form method="POST">
      <input name="username">
    </form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

els = soup.find_all("a")

for el in els:
    # check the href attribute
    print(el.has_attr('href'))
    # check the name attribute
    print(el.has_attr('name'))

output:

True
False
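
has_attr() checks one element at a time. To collect every element that carries an attribute, whatever its value, True can be passed as the attribute value to find_all(). This is a minimal sketch; the shortened html_source and the variable names are assumptions.

```python
from bs4 import BeautifulSoup

# Assumption: a flattened excerpt of the tutorial's HTML.
html_source = '''
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
<div id="ad"><p>good</p></div>
<form method="POST"><input name="username"></form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# True matches any element that has the attribute, regardless of its value.
with_href = soup.find_all(href=True)
with_id = soup.find_all(attrs={"id": True})

print([el.name for el in with_href])  # ['a']
print([el.name for el in with_id])    # ['a', 'div']
```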

5. Beautifulsoup: Find elements whose attribute contains a number

In this last part of the tutorial, we'll find elements whose id attribute value contains a number.
To do this, we pass a compiled regular expression as the attribute value.

example:

html_source = '''
    <div class="rightSideBarParent">
        <div class="leftSideBar">
            <ul class="leftBarList">
                <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
            </ul>
        </div>
    </div>
    
    <div class="d-flex justify-content-center mt-2" id="ads3">
           <div id="ad"> <p>good</p> </div>
    </div>

    <form method="POST">
      <input name="username">
    </form>
'''

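import re

soup = BeautifulSoup(html_source, 'html.parser')

# a raw string keeps the backslash in \d from being treated as a string escape
els = soup.find_all(id=re.compile(r"\d"))
print(els)

r"\d": Matches any decimal digit (equivalent to [0-9] for ASCII text). Beautifulsoup searches the attribute value with the pattern, so the digit can appear anywhere in the id.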

output:

[<div class="d-flex justify-content-center mt-2" id="ads3">
<div id="ad"> <p>good</p> </div>
</div>]
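
The regular expression can be tightened, or replaced by a plain function when a pattern feels like overkill. The sketch below shows both variants; the helper has_digit and the shortened html_source are hypothetical names introduced for this illustration.

```python
import re

from bs4 import BeautifulSoup

# Assumption: a flattened excerpt of the tutorial's HTML.
html_source = '''
<div class="d-flex" id="ads3"><div id="ad"><p>good</p></div></div>
<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Anchored pattern: only ids that end with a digit.
ends_with_digit = soup.find_all(id=re.compile(r"\d$"))
print([el["id"] for el in ends_with_digit])  # ['ads3']


def has_digit(value):
    """Return True when the attribute value exists and contains a digit."""
    # Beautifulsoup passes None for elements that lack the attribute.
    return value is not None and any(ch.isdigit() for ch in value)


# A function can be used as the attribute filter instead of a regex.
by_function = soup.find_all(id=has_digit)
print([el["id"] for el in by_function])  # ['ads3']
```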
