Understand How to Use the attribute in Beautifulsoup Python

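In this tutorial, we'll cover how to work with element attributes in Beautifulsoup: finding elements by attribute, reading attribute values, and checking whether an attribute exists.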

Beautifulsoup: Find all by attribute

To find elements by attribute, pass a dictionary of attribute names and values to the attrs argument of find_all().

syntax:


soup.find_all(attrs={"attribute" : "value"})

Let's see some examples.

In the following example, we'll find all elements that have "setting-up-django-sitemaps" in the href attribute.


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div> 
        <a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a> 
    </div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find by href attribute
els = soup.find_all(attrs={"href" : "setting-up-django-sitemaps"})

# Print Output
print(els)

output:

[<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>]
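The same filter can usually be written in more than one way. The sketch below compares the attrs dictionary with the keyword-argument form of find_all() and with a CSS attribute selector passed to select(); the variable names are only for illustration, and select() assumes a Beautifulsoup version that ships CSS selector support.

```python
from bs4 import BeautifulSoup

html_source = '''
    <div>
        <a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
    </div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# 1. attrs dictionary, as in the example above
by_attrs = soup.find_all(attrs={"href": "setting-up-django-sitemaps"})

# 2. Keyword argument: the attribute name becomes the argument name
by_keyword = soup.find_all(href="setting-up-django-sitemaps")

# 3. CSS attribute selector
by_selector = soup.select('[href="setting-up-django-sitemaps"]')

# Each call finds the same single <a> element
print(by_attrs[0] is by_keyword[0] is by_selector[0])
```

The dictionary form is the most general of the three, because it also works for attribute names that are not valid Python argument names.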


In this example, we'll find all elements that have POST in the method attribute.


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div>
       <a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
    </div>

    <form method="POST">
      <input name="username">
    </form>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

#Find by method attribute
els = soup.find_all(attrs={"method" : "POST"})

print(els)

output:

[<form method="POST">
<input name="username"/>
</form>]
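One case where the attrs dictionary is hard to avoid is the HTML attribute called name, like the one on the input above. In find_all(), the name argument refers to the tag name, so a small sketch like this one shows the difference (variable names are illustrative):

```python
from bs4 import BeautifulSoup

html_source = '''
    <form method="POST">
      <input name="username">
    </form>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# name= means "tag name" here, so this searches for <username> tags
by_keyword = soup.find_all(name="username")

# The attrs dictionary filters on the HTML attribute called "name"
by_attrs = soup.find_all(attrs={"name": "username"})

print(len(by_keyword))  # 0
print(len(by_attrs))    # 1
```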

Beautifulsoup: get the attribute of an element

Syntax

element.attrs

To get all attributes of an element as a dictionary, read its attrs property:


from bs4 import BeautifulSoup

# Html source
html = """
<div>
<h2 class="recent" id="attribute">Recent Posts:</h2>
</div>
"""

# Parse
soup = BeautifulSoup(html, 'html.parser')

# Get h2 tag
h2 = soup.h2

# Print h2 attribute
print(h2.attrs)

Output:

{'class': ['recent'], 'id': 'attribute'}
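Since attrs is a regular dictionary, we can loop over it like any other. In this sketch the second class, featured, is added only for illustration; it shows why class comes back as a list while id comes back as a plain string.

```python
from bs4 import BeautifulSoup

# "featured" is a made-up second class for this example
html = '<h2 class="recent featured" id="attribute">Recent Posts:</h2>'

soup = BeautifulSoup(html, 'html.parser')
h2 = soup.h2

# Loop over every attribute name and value
for name, value in h2.attrs.items():
    print(name, value)

# class can hold several values, so it is a list
class_names = h2.attrs["class"]

# id holds a single value, so it is a string
id_value = h2.attrs["id"]
```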

Beautifulsoup: Get the attribute value of an element

syntax:


element['attribute name']

Let's see how to get the value of the class attribute.


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div class="rightSideBarParent">
            <ul class="leftBarList">
                <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
            </ul>
    </div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find all <ul> elements
els = soup.find_all("ul")

for el in els:
  # Print class attribute
  print(el['class'])

Let me explain:

  1. Find all by ul tag.
  2. Iterate over the result.
  3. Get the class value of each element. It is returned as a list, because class can hold more than one value.

output:

['leftBarList']

In the example below, we'll get the value of the href attribute.


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div class="rightSideBarParent">
            <ul class="leftBarList">
                <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
            </ul>
    </div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find all <a> elements
els = soup.find_all("a")

for el in els:
  # Print href attribute
  print(el['href'])

output:

setting-up-django-sitemaps
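Keep in mind that element['attribute name'] raises a KeyError when the attribute is missing. If that can happen in your HTML, the get() method is a safer option. Here is a minimal sketch, using a title attribute that the link does not have:

```python
from bs4 import BeautifulSoup

html_source = '<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>'

soup = BeautifulSoup(html_source, 'html.parser')
el = soup.find("a")

# get() returns the value when the attribute exists
href = el.get("href")

# get() returns None when the attribute is missing
title = el.get("title")

# A default can be supplied as the second argument
fallback = el.get("title", "no title")

# Square brackets raise KeyError for a missing attribute
try:
    el["title"]
    raised = False
except KeyError:
    raised = True

print(href, title, fallback, raised)
```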

Beautifulsoup: Find all by multiple attributes

syntax:


attrs={"attribute":"value", "attribute":"value",...}


Let's say we want to find all elements that have "setting-up-django-sitemaps" in the href attribute and "link" in the id attribute.


from bs4 import BeautifulSoup

# Html source
html_source = '''
    <div class="leftSideBar">
        <ul class="leftBarList">
            <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
        </ul>
    </div>
'''

# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find by href and id attribute
els = soup.find_all(attrs={"href":"setting-up-django-sitemaps", "id":"link"})

# Print result
print(els)

output:

[<a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>]
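When several attributes are given, an element has to match all of them. To make that visible, the sketch below adds a second, made-up link with the same href but a different id, and also shows an equivalent CSS selector (assuming select() is available in your Beautifulsoup version).

```python
from bs4 import BeautifulSoup

# The second <a> is hypothetical: same href, different id
html_source = '''
<ul class="leftBarList">
    <li><a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a></li>
    <li><a id="other" href="setting-up-django-sitemaps">Same href, different id</a></li>
</ul>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Only href: both links match
href_only = soup.find_all(attrs={"href": "setting-up-django-sitemaps"})

# href and id: only the first link matches
both = soup.find_all(attrs={"href": "setting-up-django-sitemaps", "id": "link"})

# The same condition as a CSS selector
via_select = soup.select('a[href="setting-up-django-sitemaps"][id="link"]')

print(len(href_only), len(both), len(via_select))
```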

Beautifulsoup: Check if an attribute exists

syntax:


has_attr('some_attribute')

Returns True if the element has the attribute, otherwise False.

example:

from bs4 import BeautifulSoup

html_source = '''
    <div class="rightSideBarParent">
    <a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
    </div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# Find <a> element
el = soup.find("a")


#check href attribute
print(el.has_attr('href'))

#check name attribute
print(el.has_attr('name'))

output:

True
False
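has_attr() checks one element at a time. If the goal is to collect every element that has a given attribute, passing True as the value to find_all() does the filtering in one call. A small sketch, with a second, made-up link that has no href:

```python
from bs4 import BeautifulSoup

# The second <a> is hypothetical and has no href attribute
html_source = '''
    <div class="rightSideBarParent">
    <a id="link" href="setting-up-django-sitemaps">How to Create Django Sitemaps</a>
    <a id="anchor-only">No href here</a>
    </div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# True matches any element that has the attribute, whatever its value
with_href = soup.find_all(href=True)

# The same result, written with has_attr()
manual = [el for el in soup.find_all("a") if el.has_attr("href")]

print([el["id"] for el in with_href])
```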

Beautifulsoup: Find elements whose attribute contains a number

In this last part of the tutorial, we'll find elements that contain a number in the id attribute value.
To do this, we need to use a regular expression with Beautifulsoup.

example:


from bs4 import BeautifulSoup
import re

# Html source
html_source = '''
    <div class="d-flex justify-content-center mt-2" id="ads3">
           <div id="ad"> <p>good</p> </div>
    </div>

    <form method="POST">
      <input name="username">
    </form>
'''
# Parsing
soup = BeautifulSoup(html_source, 'html.parser')

# Find all elements that contain a number in the id
# (raw string, so Python does not treat \d as a string escape)
els = soup.find_all(id=re.compile(r"\d"))

print(els)

"\d": Matches any decimal digit. For plain ASCII text, this is equivalent to [0-9].

output:

[<div class="d-flex justify-content-center mt-2" id="ads3">
<div id="ad"> <p>good</p> </div>
</div>]
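The pattern is searched for anywhere in the attribute value, so a digit in any position counts. If you need something stricter, you can anchor the regular expression, or pass a function instead of a pattern. The sketch below is one way to do it; the element with id 2nd-banner is made up for the example.

```python
import re
from bs4 import BeautifulSoup

# "2nd-banner" is a hypothetical id added for this example
html_source = '''
    <div id="ads3">
        <div id="ad"><p>good</p></div>
        <div id="2nd-banner"></div>
    </div>
'''

soup = BeautifulSoup(html_source, 'html.parser')

# A digit anywhere in the id
any_digit = soup.find_all(id=re.compile(r"\d"))

# A digit at the end of the id only
ends_with_digit = soup.find_all(id=re.compile(r"\d$"))

# A function filter: it receives the attribute value, or None if the
# element has no id, and returns True for a match
def has_digit(value):
    return value is not None and any(ch.isdigit() for ch in value)

by_function = soup.find_all(id=has_digit)

print([el["id"] for el in any_digit])
print([el["id"] for el in ends_with_digit])
print([el["id"] for el in by_function])
```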