Beautifulsoup - How to Get all images

In this article, we'll learn how to get all image elements from HTML with Beautifulsoup using find_all() or select(). We'll then apply the same approach to a web page downloaded with the requests library.

Get all images using find_all() method

find_all() is a method that finds every element matching a given filter and returns the results as a list-like ResultSet. We will use this method to get all images from HTML code.

First of all, let's see the syntax and then an example.

Syntax

soup.find_all("img")

Example

from bs4 import BeautifulSoup

# HTML source
html_source = '''
<div class="category-bar">

<img src="image1.jpg" alt="image1">
<img src="image2.jpg" alt="image2">

<div class="section">
    <img src="image3.jpg" alt="image3">
    <img src="image4.jpg" alt="image4">
    <img src="image5.jpg" alt="image5">
    <img src="image6.jpg" alt="image6">
</div>

</div>
'''

# Parse
soup = BeautifulSoup(html_source, "html.parser")

# Find all image elements using find_all()
images = soup.find_all("img")

# Print images
print(images)

Output:

[<img alt="image1" src="image1.jpg"/>, <img alt="image2" src="image2.jpg"/>, <img alt="image3" src="image3.jpg"/>, <img alt="image4" src="image4.jpg"/>, <img alt="image5" src="image5.jpg"/>, <img alt="image6" src="image6.jpg"/>]

As you can see, we got all the image elements. Now, let's print the src attribute of each image.

# Print the src attribute
for image in images:
    print(image['src']) # Print src attribute value

Output:

image1.jpg
image2.jpg
image3.jpg
image4.jpg
image5.jpg
image6.jpg

For more information about element attributes in Beautifulsoup, please visit "Understand attribute in Beautifulsoup".
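Indexing with image['src'] raises a KeyError when an img tag has no src attribute, which is common in real pages. The following is a minimal sketch of two ways to guard against that; the HTML snippet is a hypothetical example, not taken from the article's sample.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the second image has no src attribute
html_source = '''
<img src="image1.jpg" alt="image1">
<img alt="no source">
<img src="image3.jpg" alt="image3">
'''

soup = BeautifulSoup(html_source, "html.parser")

# .get() returns None instead of raising KeyError when the attribute is missing
sources = [image.get("src") for image in soup.find_all("img")]
print(sources)  # ['image1.jpg', None, 'image3.jpg']

# Passing src=True keeps only the images that actually have a src attribute
sources_present = [image["src"] for image in soup.find_all("img", src=True)]
print(sources_present)  # ['image1.jpg', 'image3.jpg']
```

Use .get() when you want to know which images lack a source, and src=True when you only care about the ones that have one.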

Get all images using select() method

select() is a method that finds specific data from HTML by CSS selector and returns the result as a list.

Here is the syntax:

Syntax

soup.select("img")

Example

from bs4 import BeautifulSoup

# HTML source
html_source = '''
<div class="category-bar">

<img src="image1.jpg" alt="image1">
<img src="image2.jpg" alt="image2">

<div class="section">
    <img src="image3.jpg" alt="image3">
    <img src="image4.jpg" alt="image4">
    <img src="image5.jpg" alt="image5">
    <img src="image6.jpg" alt="image6">
</div>

</div>
'''

# Parse
soup = BeautifulSoup(html_source, "html.parser")

# Find all images using select()
images = soup.select("img")

# Print images
print(images)

Output:

[<img alt="image1" src="image1.jpg"/>, <img alt="image2" src="image2.jpg"/>, <img alt="image3" src="image3.jpg"/>, <img alt="image4" src="image4.jpg"/>, <img alt="image5" src="image5.jpg"/>, <img alt="image6" src="image6.jpg"/>]

The output is the same as in the find_all() example.
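Because select() accepts any CSS selector, it can also narrow the search in one step. The sketch below uses a shortened version of the sample HTML; the selectors "div.section img" and 'img[src$=".jpg"]' are illustrative choices, not part of the original example.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample HTML used above
html_source = '''
<div class="category-bar">
<img src="image1.jpg" alt="image1">
<div class="section">
    <img src="image3.jpg" alt="image3">
    <img src="image4.jpg" alt="image4">
</div>
</div>
'''

soup = BeautifulSoup(html_source, "html.parser")

# Only the images nested inside the div with class "section"
section_images = soup.select("div.section img")
section_sources = [image["src"] for image in section_images]
print(section_sources)  # ['image3.jpg', 'image4.jpg']

# Only the images whose src attribute ends with ".jpg"
jpg_images = soup.select('img[src$=".jpg"]')
print(len(jpg_images))  # 3
```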

Get all images from a web page

To get all images from a web page, we first need the requests library, which downloads the page's HTML.

Installation

Installing requests via pip:

# PIP 2 (Python2)
pip install requests

# PIP 3 (Python3)
pip3 install requests

Example

Let's say we want to get all images from our pytutorial home page.

import requests
from bs4 import BeautifulSoup

# Page Url
page_url = "https://pytutorial.com"

# Send a GET request
req = requests.get(page_url)

# Page source
page_source = req.text

# Parse
soup = BeautifulSoup(page_source, "html.parser")

# Find all images
images = soup.find_all("img")

# Print images
print(images)

Output:

[<img alt="logo" height="80" src="/theme/img/logo_w.png" width="150"/>, <img alt="How to Get inner Div Using Beautifulsoup" class="img-fluid" src="/theme/img/articles_image/python_cover/how-to-get-inner-div-using-beautifulsoup.png"/>, <img alt="How to Update Variable [string, list, dictionary] in for Loop Python" class="img-fluid" src="/theme/img/articles_image/python_cover/how-to-update-variable-string-list-dictionary-in-for-loop-python.png"/>, <img alt="2 Methods to Set two Variables to the Same Value in python" class="img-fluid" src="/theme/img/articles_image/python_cover/2-methods-to-set-two-variables-to-same-value-python.png"/>, <img alt="How to solve ModuleNotFoundError: No module named 'django_heroku'" class="img-fluid" src="/theme/img/articles_image/py.jpg"/>, <img alt="How to solve ModuleNotFoundError: No module named 'yaml'" class="img-fluid" src="/theme/img/articles_image/py.jpg"/>, <img alt="How to solve modulenotfounderror: no module named pycocotools" class="img-fluid" src="/theme/img/articles_image/py.jpg"/>, <img alt="How to Solve modulenotfounderror no module named six" class="img-fluid" src="/theme/img/articles_image/py.jpg"/>, <img alt="Python: Capture screenshot when the mouse is clicked" class="img-fluid" src="/theme/img/articles_image/py.jpg"/>, <img alt="BeautifulSoup: Get Text value of Element using .string &amp; .strings properties" class="img-fluid" src="/theme/img/articles_image/bs.png"/>, <img alt="Python convert Sass to Css" class="img-fluid" src="/theme/img/articles_image/py.jpg"/>, <img alt="Quantcast" border="0" height="1" src="//pixel.quantserve.com/pixel/p-31iz6hfFutd16.gif?labels=Domain.pytutorial_com,DomainId.228000" width="1"/>]

Great! We've got all the image elements from the page.
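Note that most src values in the output are relative paths such as /theme/img/logo_w.png, and one starts with //. To download the images, you would usually need absolute URLs. Here is a minimal sketch using urljoin from the standard library; the helper name extract_image_urls and the sample HTML (including cdn.example.com) are hypothetical and stand in for req.text so the example runs without a network connection.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_urls(page_source, base_url):
    """Return an absolute URL for every <img> that has a src attribute."""
    soup = BeautifulSoup(page_source, "html.parser")
    # urljoin resolves relative and protocol-relative paths against base_url
    return [urljoin(base_url, image["src"]) for image in soup.find_all("img", src=True)]


# Hypothetical stand-in for req.text, shaped like the output above
sample_source = '''
<img alt="logo" src="/theme/img/logo_w.png">
<img alt="cover" src="/theme/img/articles_image/bs.png">
<img alt="pixel" src="//cdn.example.com/pixel.gif">
'''

image_urls = extract_image_urls(sample_source, "https://pytutorial.com")
for url in image_urls:
    print(url)
# https://pytutorial.com/theme/img/logo_w.png
# https://pytutorial.com/theme/img/articles_image/bs.png
# https://cdn.example.com/pixel.gif
```

With a live page, you could pass req.text as page_source and the page URL as base_url. Calling requests.get() with a timeout and checking req.raise_for_status() before parsing is a reasonable precaution.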

Conclusion

We are done with this article. I hope you now understand how to get all images using Beautifulsoup.
For more articles about Beautifulsoup, scroll down. Happy learning ♥