Last modified: Feb 15, 2023 By Alexander Williams

How to Use Attributes in Beautifulsoup (Python)

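In this tutorial, we're going to cover how to work with attributes in Beautifulsoup: finding elements by attribute, reading attribute values, and checking whether an attribute exists.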

Beautifulsoup: Find all by attribute

To find elements by attribute, pass a dictionary of attribute names and values to find_all() through the attrs argument.

syntax:


soup.find_all(attrs={"attribute" : "value"})

Let's see some examples.

In the following example, we'll find all elements whose href attribute is "setting-up-django-sitemaps".


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div> 
        <a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a> 
    </div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find by href attribute
els = soup.find_all(attrs={"href" : "setting-up-django-sitemaps"})

# Print Output
print(els)

output:

[<a href="setting-up-django-sitemaps" id="link">How to Create Django Sitemaps</a>]

In this example, we'll find all elements whose method attribute is "POST".


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div>
       <a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
    </div>

    <form method="POST">
      <input name="username">
    </form>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

#Find by method attribute
els = soup.find_all(attrs={"method" : "POST"})

print(els)

output:

[<form method="POST">
<input name="username"/>
</form>]
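
For most attribute names, the same filter can also be written as a keyword argument, which is a bit shorter. The attrs dictionary is still needed for names that are not valid Python keyword arguments, such as data-* attributes. Here is a minimal sketch; the data-role attribute is made up for illustration and is not part of the examples above.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; data-role is a hypothetical attribute
html_source = '''
    <form method="POST" data-role="login">
      <input name="username">
    </form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Keyword-argument form: same filter as attrs={"method": "POST"}
by_keyword = soup.find_all(method="POST")

# data-role is not a valid Python identifier, so it has to go through attrs
by_attrs = soup.find_all(attrs={"data-role": "login"})

# Both searches find the same <form> element
print(by_keyword[0] is by_attrs[0])
```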

Beautifulsoup: Get the attributes of an element

Syntax

element.attrs

To get all attributes of an element, use its attrs property, which returns a dictionary:


from bs4 import BeautifulSoup

# Html source
html = """
<div>
<h2 class="recent" id="attribute">Recent Posts:</h2>
</div>
"""

# Parse
soup = BeautifulSoup(html, 'html.parser')

# Get h2 tag
h2 = soup.h2

# Print h2 attribute
print(h2.attrs)

Output:

{'class': ['recent'], 'id': 'attribute'}
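
Since attrs is a regular dictionary, you can loop over it like any other dict. Here is a small sketch reusing the same h2 markup. Note that class comes back as a list because it can hold several values, while id is a plain string.

```python
from bs4 import BeautifulSoup

html = '<div><h2 class="recent" id="attribute">Recent Posts:</h2></div>'
soup = BeautifulSoup(html, 'html.parser')
h2 = soup.h2

# Walk over every attribute name and value on the tag
for name, value in h2.attrs.items():
    print(name, '->', value)

# class is multi-valued, so join the list to get a single string back
class_string = ' '.join(h2.attrs['class'])
print(class_string)
```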

Beautifulsoup: Get the attribute value of an element

syntax:


element['attribute name']

Let's see how to get the class attribute.


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div class="rightSideBarParent">
            <ul class="leftBarList">
                <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
            </ul>
    </div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find all <ul> elements
els = soup.find_all("ul")

for el in els:
  # Print class attribute
  print(el['class'])

Let me explain:

  1. Find all ul elements.
  2. Iterate over the result.
  3. Get the class value of each element. Since class can hold several values, Beautifulsoup returns it as a list.

output:

['leftBarList']

 

In the example below, we'll get the value of the href attribute.


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div class="rightSideBarParent">
            <ul class="leftBarList">
                <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
            </ul>
    </div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find all <a> elements
els = soup.find_all("a")

for el in els:
  # Print href attribute
  print(el['href'])

output:

setting-up-django-sitemaps
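
Square-bracket access raises a KeyError when the attribute is missing. If you're not sure the attribute is there, get() is a safer option, since it returns None instead. Here is a minimal sketch; the second link is made up for illustration.

```python
from bs4 import BeautifulSoup

# The second <a> is a made-up element with no href attribute
html_source = '''
    <ul class="leftBarList">
        <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
        <li><a id="no-href">A link without href</a></li>
    </ul>
'''
soup = BeautifulSoup(html_source, 'html.parser')

hrefs = []
for el in soup.find_all("a"):
    # get() returns None instead of raising KeyError when href is missing
    hrefs.append(el.get('href'))

print(hrefs)
```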

Beautifulsoup: Find all by multiple attributes

syntax:


soup.find_all(attrs={"attribute1": "value1", "attribute2": "value2", ...})

 

Let's say we want to find all elements whose href attribute is "setting-up-django-sitemaps" and whose id is "link".


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div class="leftSideBar">
        <ul class="leftBarList">
            <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
        </ul>
    </div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find by href and id attribute
els = soup.find_all(attrs={"href":"setting-up-django-sitemaps", "id":"link"})

# Print result
print(els)

output:

[<a href="setting-up-django-sitemaps" id="link">How to Create Django Sitemaps</a>]
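
Every condition in attrs has to match for an element to be returned. Here is a quick sketch to show that; the second link (same href, different id) is made up for illustration.

```python
from bs4 import BeautifulSoup

# The second <a> is a made-up element that matches href but not id
html_source = '''
    <ul class="leftBarList">
        <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
        <li><a id="other" href="setting-up-django-sitemaps">Same href, different id</a></li>
    </ul>
'''
soup = BeautifulSoup(html_source, 'html.parser')

# Only the element matching BOTH href and id is returned
els = soup.find_all(attrs={"href": "setting-up-django-sitemaps", "id": "link"})

print(len(els))
print(els[0].get_text())
```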

Beautifulsoup: Check if an attribute exists

syntax:


element.has_attr('some_attribute')

Returns True if the element has the attribute, otherwise False.

example:

from bs4 import BeautifulSoup

html_source = '''
    <div class="rightSideBarParent">
    <a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
    </div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Find <a> element
el = soup.find("a")


#check href attribute
print(el.has_attr('href'))

#check name attribute
print(el.has_attr('name'))

output:

True
False
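
has_attr() checks one element at a time. If you want every element that carries a given attribute, whatever its value, you can pass True as the value to find_all(). Here is a minimal sketch on similar markup; the second link is made up for illustration.

```python
from bs4 import BeautifulSoup

# The second <a> is a made-up element with no href attribute
html_source = '''
    <div class="rightSideBarParent">
    <a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
    <a id="plain">No href here</a>
    </div>
'''
soup = BeautifulSoup(html_source, 'html.parser')

# True matches any element that has an href attribute, regardless of its value
with_href = soup.find_all(href=True)

# Same result using has_attr() on each <a> element
checked = [el for el in soup.find_all("a") if el.has_attr('href')]

print([el['id'] for el in with_href])
```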

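Beautifulsoup: Find elements whose attribute contains a number

In this last part of the tutorial, we'll find elements whose id attribute value contains a number.
To do this, we need to use Regex with Beautifulsoup. The pattern is searched for anywhere in the attribute value, so a single digit is enough to match.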

example:


from bs4 import BeautifulSoup
import re

# Html source
html_source = '''
    <div class="d-flex justify-content-center mt-2" id="ads3">
           <div id="ad"> <p>good</p> </div>
    </div>

    <form method="POST">
      <input name="username">
    </form>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

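# Find all elements that contain a number in the id
els = soup.find_all(id=re.compile(r"\d"))

print(els)

r"\d": Matches any decimal digit. For ASCII text, this is equivalent to [0-9]. The r prefix makes it a raw string, so Python doesn't treat the backslash as an escape.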

output:

[<div class="d-flex justify-content-center mt-2" id="ads3">
<div id="ad"> <p>good</p> </div>
</div>]
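
If you'd rather not use Regex, a plain function can do the same filtering. Beautifulsoup calls the function with the attribute value, or None when the element has no id. Here is a sketch along those lines; has_digit is a helper name made up for this example.

```python
from bs4 import BeautifulSoup

html_source = '''
    <div class="d-flex justify-content-center mt-2" id="ads3">
           <div id="ad"> <p>good</p> </div>
    </div>
'''
soup = BeautifulSoup(html_source, 'html.parser')

def has_digit(value):
    # value is None for elements without an id attribute
    return value is not None and any(ch.isdigit() for ch in value)

# Keep only the elements whose id contains at least one digit
els = soup.find_all(id=has_digit)

print([el['id'] for el in els])
```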