How to Get href of Element using BeautifulSoup [Easily]
- Last modified: 04 December 2020
- Category: python libraries
In this article, we're going to learn how to get the href attribute of an element using Python's BeautifulSoup library.
1. Getting all href attributes
In the first example, we'll get all elements that have an href attribute.
syntax:
soup.find_all(href=True)
Example
from bs4 import BeautifulSoup
html_source = '''
<link rel="stylesheet" type="text/css" href="/theme/css/bootstrap.min.css">
<link rel="stylesheet" type="text/css" href="/theme/css/style.css">
<div>
<a class="mode" href="https://pytutorial.com"><p>Converting File Size in Python</p></a>
<a href="https://ex.com/home"><p>Converting File Size in Python</p></a>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser')
find_all_a = soup.find_all(href=True)
print(find_all_a)
output
[<link href="/theme/css/bootstrap.min.css" rel="stylesheet" type="text/css"/>, <link href="/theme/css/style.css" rel="stylesheet" type="text/css"/>, <a class="mode" href="https://pytutorial.com"><p>Converting File Size in Python</p></a>, <a href="https://ex.com/home"><p>Converting File Size in Python</p></a>]
As you can see, the result contains both <link> and <a> elements, because href=True matches any tag that has an href attribute.
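To check which kinds of tags matched, you can read each result's .name attribute, which holds the tag name. This is a minimal sketch that uses a trimmed copy of the sample HTML above:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample HTML: two <link> tags and two <a> tags
html_source = '''
<link rel="stylesheet" href="/theme/css/bootstrap.min.css">
<link rel="stylesheet" href="/theme/css/style.css">
<div>
<a class="mode" href="https://pytutorial.com">Home</a>
<a href="https://ex.com/home">Ex</a>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser')

# href=True matches any tag that has an href attribute, whatever its name
tag_names = [el.name for el in soup.find_all(href=True)]
print(tag_names)  # ['link', 'link', 'a', 'a']
```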
Now, to get the href values, we first need to iterate over the result list and then use the following syntax:
syntax:
el['href']
example:
for el in find_all_a:
    print(el['href'])
output
/theme/css/bootstrap.min.css
/theme/css/style.css
https://pytutorial.com
https://ex.com/home
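Note that el['href'] raises a KeyError if a tag has no href attribute. That cannot happen above, because href=True already filtered such tags out. However, when you iterate over tags found some other way, Tag.get() is the safer choice because it returns None (or a default you pass) instead of raising. Here is a small sketch using a made-up snippet in which the second <a> has no href:

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration: the second <a> has no href attribute
html_source = '<a href="https://pytutorial.com">Home</a><a name="top">Top</a>'
soup = BeautifulSoup(html_source, 'html.parser')

links = soup.find_all('a')  # no href filter, so both tags are returned

# .get() returns None for the tag without href instead of raising KeyError
hrefs = [el.get('href') for el in links]
print(hrefs)  # ['https://pytutorial.com', None]

# Subscripting the second tag, by contrast, raises KeyError
try:
    links[1]['href']
except KeyError:
    print('no href on', links[1])
```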
2. Getting the href of <a> tags
Let's say we only want the href of <a> elements.
syntax:
soup.find_all("a", href=True)
Example:
from bs4 import BeautifulSoup
html_source = '''
<link rel="stylesheet" type="text/css" href="/theme/css/bootstrap.min.css">
<link rel="stylesheet" type="text/css" href="/theme/css/style.css">
<div>
<a class="mode" href="https://pytutorial.com"><p>Converting File Size in Python</p></a>
<a href="https://ex.com/home"><p>Converting File Size in Python</p></a>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser')
find_all_a = soup.find_all("a", href=True)
for el in find_all_a:
    print(el['href'])
output
https://pytutorial.com
https://ex.com/home
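You can get the same <a> tags with a CSS attribute selector through soup.select(). This is an alternative to the find_all() call used in this article. It relies on the soupsieve package, which recent versions of beautifulsoup4 install as a dependency. In this sketch, the extra <a name="no-href"> tag was added to show that it is skipped:

```python
from bs4 import BeautifulSoup

html_source = '''
<link rel="stylesheet" href="/theme/css/style.css">
<div>
<a class="mode" href="https://pytutorial.com"><p>Converting File Size in Python</p></a>
<a href="https://ex.com/home"><p>Converting File Size in Python</p></a>
<a name="no-href">No link</a>
</div>
'''
soup = BeautifulSoup(html_source, 'html.parser')

# "a[href]" matches only <a> elements that carry an href attribute,
# so the <link> tag and the <a> without href are both skipped
hrefs = [el['href'] for el in soup.select('a[href]')]
print(hrefs)  # ['https://pytutorial.com', 'https://ex.com/home']
```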
Let me explain.
1. Find all <a> elements that have an href attribute.
2. Iterate over the result.
3. Print each href using el['href'].
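The three steps above can be wrapped in a small reusable helper. Note that get_hrefs is a hypothetical name used only for this sketch. It returns the values in a list instead of printing them, and when no tag name is passed it falls back to the first example's behavior of matching any tag:

```python
from bs4 import BeautifulSoup

def get_hrefs(html_source, tag_name=None):
    """Return href values from html_source (hypothetical helper).

    If tag_name is given (e.g. "a"), only that tag is searched;
    otherwise every tag with an href attribute is included.
    """
    soup = BeautifulSoup(html_source, 'html.parser')
    # Step 1: find matching elements that have an href attribute
    if tag_name is None:
        elements = soup.find_all(href=True)
    else:
        elements = soup.find_all(tag_name, href=True)
    # Steps 2 and 3: iterate and collect each href value
    return [el['href'] for el in elements]

html_source = '''
<link rel="stylesheet" href="/theme/css/style.css">
<a href="https://pytutorial.com">Home</a>
'''
print(get_hrefs(html_source, 'a'))  # ['https://pytutorial.com']
print(get_hrefs(html_source))       # ['/theme/css/style.css', 'https://pytutorial.com']
```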