Last modified: Jan 19, 2026 By Alexander Williams

BeautifulSoup Tutorial: Scrape & Download Images

Web scraping is a powerful skill. It lets you gather data from websites. Images are a common target. This tutorial will show you how.

We will use Python, BeautifulSoup, and Requests. You will learn to find and save images. This is perfect for beginners.

Why Scrape Images?

You might need images for a project. Maybe for a dataset or a gallery. Manual downloading is slow and tedious.

Automation saves time. You can collect hundreds of images quickly. Always check a website's robots.txt file first.

Respect the website's terms of service. Do not overload their servers. This is ethical scraping.
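
Python's built-in urllib.robotparser can read robots.txt for you. Here is a minimal sketch, using this tutorial's placeholder URL:

from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt
robots = RobotFileParser('https://example.com/robots.txt')
robots.read()

# Ask whether a generic crawler may fetch the gallery page
if robots.can_fetch('*', 'https://example.com/gallery'):
    print("Allowed by robots.txt.")
else:
    print("Disallowed by robots.txt. Pick another target.")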

Setup and Installation

First, ensure you have Python installed. Then, install the necessary libraries. Use pip, the Python package manager.

Open your terminal or command prompt. Run the following commands. This will install BeautifulSoup and Requests.


pip install beautifulsoup4 requests
    

These are the core tools. BeautifulSoup parses HTML. Requests fetches web pages.
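
To confirm the installation, you can print each library's version from Python:

import bs4
import requests

# Both packages expose a __version__ string
print("BeautifulSoup:", bs4.__version__)
print("Requests:", requests.__version__)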

You are now ready to start coding. Create a new Python file. Let's name it image_scraper.py.

Fetching the Web Page HTML

The first step is to get the page content. We use the requests.get() function. It sends a GET request to a URL.

We then check if the request was successful. The status code should be 200. Only then do we pass the HTML to BeautifulSoup.


import requests
from bs4 import BeautifulSoup

# URL of the page you want to scrape
url = 'https://example.com/gallery'

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')
    print("Page fetched successfully!")
else:
    print(f"Failed to retrieve page. Status code: {response.status_code}")
    

This code fetches the page and parses it into a BeautifulSoup object named soup.

The soup object lets us search the HTML. We can find tags, classes, and IDs. Next, we find image tags.
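
For instance, you can search by tag name, CSS class, or id. The class and id names below are hypothetical; inspect your own page's HTML to find the real ones.

# Hypothetical selectors; adjust them to your target page
first_heading = soup.find('h1')               # first <h1> tag
gallery = soup.find('div', class_='gallery')  # a tag by CSS class
content = soup.find(id='content')             # a tag by id

if first_heading:
    print(first_heading.get_text(strip=True))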

Finding All Image Tags

Images in HTML are defined with the <img> tag. The source URL is in the src attribute.

We use BeautifulSoup's find_all() method. It returns a list of all matching tags. We look for 'img' tags.


# Find all image tags on the page
image_tags = soup.find_all('img')

# Print the number of images found
print(f"Found {len(image_tags)} image(s).")

# Let's look at the first few image sources
for img in image_tags[:3]:
    print(img.get('src'))
    

Example output:

Found 15 image(s).
/images/photo1.jpg
https://cdn.example.com/pic2.png
/data/image3.svg
    

The output shows that image sources can be relative or absolute URLs. We must handle both types correctly.

Relative URLs need to be joined with the site's base URL. The urljoin function from urllib.parse does exactly that.

Building Complete Image URLs

Not all src attributes are full links. Some are relative like /images/photo.jpg.

We must convert them to absolute URLs. This ensures our download requests work. We import urljoin.


from urllib.parse import urljoin

base_url = 'https://example.com/gallery'

for img in image_tags:
    src = img.get('src')
    if src: # Check if src exists
        # Construct the full URL
        full_url = urljoin(base_url, src)
        print(full_url)
    

Now every image has a complete address. This is crucial for downloading. The next step is to save them to disk.

Downloading and Saving the Images

We loop through our list of full image URLs. For each, we send another GET request. We get the raw image data.

We then write this data to a file. We use a binary write mode ('wb'). We also extract a filename from the URL.


import os

# Create a directory to save images
save_dir = 'downloaded_images'
os.makedirs(save_dir, exist_ok=True)

for i, img in enumerate(image_tags):
    src = img.get('src')
    if not src:
        continue # Skip if no src

    full_url = urljoin(base_url, src)

    try:
        # Request the image data
        img_response = requests.get(full_url, stream=True)
        img_response.raise_for_status() # Check for errors

        # Use the last part of the URL as the filename,
        # falling back to a generated name (drop any query string first)
        filename = os.path.basename(full_url.split('?')[0])
        if not filename or '.' not in filename:
            filename = f'image_{i}.jpg'
        filepath = os.path.join(save_dir, filename)

        # Save the image to disk in chunks
        with open(filepath, 'wb') as f:
            for chunk in img_response.iter_content(chunk_size=8192):
                f.write(chunk)
        print(f"Downloaded: {filepath}")

    except Exception as e:
        print(f"Failed to download {full_url}: {e}")
    

This script downloads every image it finds and saves it in a folder. Each file gets a name taken from its URL, with a numbered fallback.

The stream=True argument is important. It handles large files efficiently. It downloads in chunks.
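
For comparison, here is a non-streaming sketch of the same save step, reusing the full_url and filepath variables from the loop above. It loads the whole image into memory at once, which is fine for small files.

# Non-streaming variant: simpler, but holds the entire image in memory
img_response = requests.get(full_url)
img_response.raise_for_status()

with open(filepath, 'wb') as f:
    f.write(img_response.content)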

Handling Common Challenges

Real websites can be tricky. Images are often lazy-loaded, so the real URL may sit in a data-src attribute instead of src.

You need to inspect the page HTML. Use your browser's developer tools. Look for the true image source.

Update the code to check multiple attributes. For example, check for data-src if src is empty.


# Check for common lazy-loading attributes
src = img.get('src') or img.get('data-src') or img.get('data-lazy-src')
    

Another challenge is filtering images. You may only want certain sizes or types. You can filter by file extension.

Check if the URL ends with .jpg, .png, or .gif. This ensures you only download actual image files.
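
A minimal sketch of that filter, using the full URLs built earlier (the extension list is just a common default):

ALLOWED_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif')

# Build absolute URLs, then keep only those with a known image extension
image_urls = [urljoin(base_url, img.get('src')) for img in image_tags if img.get('src')]
image_urls = [u for u in image_urls if u.lower().split('?')[0].endswith(ALLOWED_EXTENSIONS)]

print(f"{len(image_urls)} image URL(s) after filtering.")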

For more complex scraping, like handling pagination, see our guide on Advanced BeautifulSoup Pagination & Infinite Scroll.

Complete Example Script

Here is the full script. It combines all the steps. You can adapt it for your needs.


import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import os

def scrape_images(url, save_folder='scraped_images'):
    """Scrape and download all images from a given URL."""
    response = requests.get(url)
    if response.status_code != 200:
        print("Failed to fetch page.")
        return

    soup = BeautifulSoup(response.content, 'html.parser')
    image_tags = soup.find_all('img')

    os.makedirs(save_folder, exist_ok=True)

    for i, img in enumerate(image_tags):
        # Try multiple possible attributes for the image source
        src = img.get('src') or img.get('data-src')
        if not src:
            continue

        full_url = urljoin(url, src)

        # Optional: Filter by image extension
        if not full_url.lower().split('?')[0].endswith(('.jpg', '.jpeg', '.png', '.gif', '.webp')):
            continue

        try:
            img_data = requests.get(full_url, stream=True)
            img_data.raise_for_status()

            # Create a sensible filename (drop any query string first)
            filename = os.path.basename(full_url.split('?')[0])
            if not filename or '.' not in filename:
                filename = f'image_{i}.jpg'
            filepath = os.path.join(save_folder, filename)

            with open(filepath, 'wb') as f:
                for chunk in img_data.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f"Saved: {filepath}")

        except Exception as e:
            print(f"Error downloading {full_url}: {e}")

if __name__ == '__main__':
    target_url = 'https://example.com/photos' # Replace with your target URL
    scrape_images(target_url)
    

This script includes error handling, a lazy-loading fallback, and extension filtering. It's a solid starting point you can adapt.

For more foundational knowledge, our Web Scraping Guide with BeautifulSoup for Beginners is perfect.

Conclusion

You have learned to scrape images with BeautifulSoup. The process is simple. Fetch HTML, find tags, build URLs, and download.

Remember to scrape responsibly. Check robots.txt. Do not harm the website. Use delays between requests.
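
A simple time.sleep inside the download loop is enough; the one-second delay below is an arbitrary choice.

import time

for full_url in image_urls:  # URLs gathered as shown earlier
    # ... fetch and save the image as in the download loop above ...
    time.sleep(1)  # pause between requests to be polite to the server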

This skill is useful for many projects. You can build image datasets. You can monitor website changes.

To take this further, see our guide on how to Schedule & Automate Web Scraping with BeautifulSoup. Automation makes your scripts more powerful.

Happy scraping! Start with a simple website. Experiment with the code. You will master it quickly.