Last modified: Feb 22, 2023 By Alexander Williams

BeautifulSoup Get All Links

In this article, we'll learn how to use BeautifulSoup to get all links from HTML code and web pages.

Get All Links From HTML Code

We can use the find_all() or select() method to get all links from HTML code.

Using find_all()

Here's an example of how to use find_all() in BeautifulSoup to get all links from HTML code:

from bs4 import BeautifulSoup

html = '''
<a href="example1.com">example1</a>
<div>
    <a href="example2.com">example2</a>
    <a href="example3.com">example3</a>
    <a href="example4.com">example4</a>
</div>
'''

soup = BeautifulSoup(html, "html.parser") # Parse HTML

links = soup.find_all("a") # Get All <a> Tags

for link in links:
    print(link['href']) # Print Link

Output:

example1.com
example2.com
example3.com
example4.com

As you can see, we've:

  1. Used the find_all() method to find all <a> tags
  2. Looped through the list of <a> tags
  3. Printed the value of each tag's href attribute
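One caveat with the steps above: link['href'] raises a KeyError when an <a> tag has no href attribute. The following is a minimal sketch of two ways to handle that case. The HTML snippet is hypothetical and was made up for illustration; it is not part of the example above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the second <a> tag has no href attribute
html = '''
<a href="example1.com">example1</a>
<a name="top">anchor without href</a>
<a href="example2.com">example2</a>
'''

soup = BeautifulSoup(html, "html.parser")  # Parse HTML

# Option 1: only match <a> tags that actually have an href attribute
hrefs = [link["href"] for link in soup.find_all("a", href=True)]
print(hrefs)  # ['example1.com', 'example2.com']

# Option 2: match every <a> tag and use .get(), which returns None
# instead of raising KeyError when the attribute is missing
all_hrefs = [link.get("href") for link in soup.find_all("a")]
print(all_hrefs)  # ['example1.com', None, 'example2.com']
```

Option 1 is usually the simpler choice when only real links matter; option 2 keeps one entry per tag, which can help when you also need to inspect the tags without an href.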

Let's see how to get only the links inside the div tag.

div = soup.find("div") # Get Div

links = div.find_all("a") # Get All <a> Tags Inside the Div

for link in links:
    print(link['href']) # Print Link
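The snippet above reuses the soup object from the first example. Here is a self-contained sketch of the same idea that also guards against find() returning None, which happens when the document contains no div tag. The variable name div_links is only illustrative.

```python
from bs4 import BeautifulSoup

html = '''
<a href="example1.com">example1</a>
<div>
    <a href="example2.com">example2</a>
    <a href="example3.com">example3</a>
    <a href="example4.com">example4</a>
</div>
'''

soup = BeautifulSoup(html, "html.parser")  # Parse HTML

div = soup.find("div")  # Returns None if there is no <div>

# Only search inside the <div> when it exists; otherwise use an empty list
div_links = [a["href"] for a in div.find_all("a")] if div is not None else []

print(div_links)  # ['example2.com', 'example3.com', 'example4.com']
```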

Using select()

We can also use select(), which takes a CSS selector, to get all links from HTML, as the following example shows.

from bs4 import BeautifulSoup

html = '''
<a href="example1.com">example1</a>
<div>
    <a href="example2.com">example2</a>
    <a href="example3.com">example3</a>
    <a href="example4.com">example4</a>
</div>
'''

soup = BeautifulSoup(html, "html.parser") # Parse HTML

links = soup.select("a") # Get <a> Tags

for link in links:
    print(link['href']) # Print Link

Output:

example1.com
example2.com
example3.com
example4.com

Here is an example of using select() to extract links inside the div tag.

from bs4 import BeautifulSoup

html = '''
<a href="example1.com">example1</a>
<div>
    <a href="example2.com">example2</a>
    <a href="example3.com">example3</a>
    <a href="example4.com">example4</a>
</div>
'''

soup = BeautifulSoup(html, "html.parser") # Parse HTML

links = soup.select("div a") # Get All Links Inside Div

for link in links:
    print(link['href']) # Print Link

Output:

example2.com
example3.com
example4.com
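Because select() accepts CSS selectors, the match can be narrowed further. The sketch below shows two common variations: an attribute selector that skips <a> tags without an href, and a child combinator that matches only direct children of the div. The HTML here is a hypothetical variant of the earlier snippet, changed to make the difference visible.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one link is nested in a <p>, and one <a> has no href
html = '''
<a href="example1.com">example1</a>
<div>
    <a href="example2.com">example2</a>
    <p><a href="example3.com">example3</a></p>
    <a>no href here</a>
</div>
'''

soup = BeautifulSoup(html, "html.parser")  # Parse HTML

# "div a[href]": any <a> with an href attribute, at any depth inside a <div>
descendants = [a["href"] for a in soup.select("div a[href]")]
print(descendants)  # ['example2.com', 'example3.com']

# "div > a[href]": only <a> tags that are direct children of a <div>
children = [a["href"] for a in soup.select("div > a[href]")]
print(children)  # ['example2.com']
```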

Get All Links From Web Page

To get all links from a web page, we first need the page's HTML source. We can fetch it with the requests library by making an HTTP request to the page's URL.

In the following example, we will get all the links from the homepage of "pytutorial.com".

from bs4 import BeautifulSoup
import requests # pip install requests

w = requests.get("https://pytutorial.com", timeout=10) # Get Page Source (give up after 10 seconds)
w.raise_for_status() # Raise an error if the request failed

soup = BeautifulSoup(w.text, "html.parser") # Parse

links = soup.find_all("a") # Get All <a> Tags

for link in links:
    print(link['href'])

Output:

/category/python-tutorial
/category/django-tutorial
/about-us
/contact-us
/find-a-word-in-a-list-python
/find-a-word-in-a-list-python
/remove-comma-in-number-python
/remove-comma-in-number-python
/convert-your-django-project-to-a-static-site-and-host-it-for-free
/convert-your-django-project-to-a-static-site-and-host-it-for-free
/how-to-use-beautifulsoup-to-extract-title-tag
/how-to-use-beautifulsoup-to-extract-title-tag
/remove-first-character-of-string-in-python
/remove-first-character-of-string-in-python
/python-variable-in-string
/python-variable-in-string
/Python-check-internet-connection
/Python-check-internet-connection
/python-capture-screenshot-mouse-clicked
/python-capture-screenshot-mouse-clicked
/how-to-use-pyautogui-python-library
/how-to-use-pyautogui-python-library
/how-to-use-glob-module-in-python
/how-to-use-glob-module-in-python
https://www.facebook.com/Pytutorial-108500610683725/?modal=admin_todo_tour
https://twitter.com/pytutorial
https://www.youtube.com/@pytutorial9501

As you can see, all the links were extracted successfully. Some links appear twice because the page contains more than one <a> tag with the same href value. Note that you can also use select() to achieve the same result.
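The output above mixes relative paths such as /about-us with full URLs, and several entries repeat. If you need a clean list of absolute URLs, one possible approach is sketched below. The helper name collect_links and the sample HTML are assumptions made for illustration; urljoin comes from Python's standard urllib.parse module.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_links(html, base_url):
    """Return unique absolute URLs found in html, in first-seen order.

    collect_links is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    result = []
    for a in soup.find_all("a", href=True):  # Skip <a> tags without href
        absolute = urljoin(base_url, a["href"])  # Resolve relative paths
        if absolute not in seen:  # Drop duplicates, keep original order
            seen.add(absolute)
            result.append(absolute)
    return result


# Made-up sample that mimics the shape of the output above
sample = '''
<a href="/about-us">About</a>
<a href="/about-us">About (again)</a>
<a href="https://twitter.com/pytutorial">Twitter</a>
'''

clean_links = collect_links(sample, "https://pytutorial.com")
print(clean_links)
# ['https://pytutorial.com/about-us', 'https://twitter.com/pytutorial']
```

To use it with a real page, pass the response text and the page URL, for example collect_links(w.text, "https://pytutorial.com").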

Conclusion

In conclusion, the BeautifulSoup library makes it easy to get all the links from an HTML document or web page with the find_all() or select() methods. To extract links from a web page, we first need to retrieve the page's HTML source, typically with the requests library.