Last modified: Jan 20, 2026, by Alexander Williams
Build a Web Crawler with BeautifulSoup and SQLite
Web scraping is a powerful skill: it lets you gather data from websites for analysis, research, or personal projects. Python makes scraping easy, and two key libraries are Requests and BeautifulSoup.
But collecting data is only half the battle; you also need to store it properly. That's where SQLite comes in. It is a lightweight database that keeps your data in tables for easy access.
This guide walks you through the entire process. We will build a simple crawler, extract data from a sample page, and then save it to an SQLite database.
Prerequisites and Setup
You need Python installed on your computer. Basic knowledge of Python is helpful. We will install the required libraries using pip.
Open your terminal or command prompt and run the following command. It installs Requests and BeautifulSoup.
pip install requests beautifulsoup4
SQLite is built into Python. You do not need to install it separately. Now, create a new Python file. You can name it crawler.py.
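To confirm that everything is in place, you can run a quick sanity check like the sketch below; the version attributes shown are the standard ones exposed by these libraries.
import sqlite3

import bs4
import requests

print("requests version:", requests.__version__)
print("beautifulsoup4 version:", bs4.__version__)
print("SQLite library version:", sqlite3.sqlite_version)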
Step 1: Fetching the Web Page
The first step is to get the HTML content. We use the requests.get() function. It sends a request to a URL. The server sends back the page content.
Always check the response status. A status code of 200 means success. We will also handle potential errors.
import requests
from bs4 import BeautifulSoup

# URL of the page to scrape
url = 'https://quotes.toscrape.com/'

try:
    # Send a GET request to the URL
    response = requests.get(url)
    # Raise an error for bad status codes
    response.raise_for_status()
    print("Page fetched successfully!")
except requests.exceptions.RequestException as e:
    print(f"Error fetching the page: {e}")
    exit()
This code fetches the HTML. The content is in response.text. For more on using Requests with BeautifulSoup, see our guide on Build a Web Scraper with BeautifulSoup Requests.
Step 2: Parsing HTML with BeautifulSoup
Now we have raw HTML. It is messy and hard to read. BeautifulSoup parses it. It creates a tree structure. This makes it easy to find elements.
We create a BeautifulSoup object. We pass the HTML text and a parser. We will use Python's built-in 'html.parser'.
# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
# Let's see the page title
print(f"Page Title: {soup.title.string}")
Output:
Page fetched successfully!
Page Title: Quotes to Scrape
BeautifulSoup provides many methods. Use find() and find_all() to locate tags. You can search by tag name, class, or id.
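As a quick illustration (assuming the soup object created above; the id value here is just a hypothetical example), the snippet below shows lookups by tag name, by class, and by id.
# Find the first <h1> tag on the page
first_heading = soup.find('h1')
print(first_heading.get_text(strip=True) if first_heading else "No <h1> found")

# Find every link that carries the CSS class 'tag'
tag_links = soup.find_all('a', class_='tag')
print(f"Links with class 'tag': {len(tag_links)}")

# Find an element by its id attribute (returns None if no such id exists)
element = soup.find(id='main-content')
print("Found element by id" if element else "No element with that id")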
Step 3: Extracting Data from the Page
Let's define what data we want. Our target site lists quotes. Each quote has text, an author, and tags. We need to find the HTML pattern.
Inspect the page with browser tools. Quotes are in <div class="quote">. Inside, we find the text, author, and tags.
# Find all quote containers
quote_divs = soup.find_all('div', class_='quote')

quotes_data = []  # List to store extracted data

for div in quote_divs:
    # Extract the quote text
    text = div.find('span', class_='text').get_text(strip=True)
    # Extract the author name
    author = div.find('small', class_='author').get_text(strip=True)
    # Extract all tags for this quote
    tags = [tag.get_text(strip=True) for tag in div.find_all('a', class_='tag')]

    # Store as a dictionary
    quote_info = {
        'text': text,
        'author': author,
        'tags': ', '.join(tags)  # Convert list to a comma-separated string
    }
    quotes_data.append(quote_info)
    print(f"Extracted: {text[:50]}... by {author}")

print(f"\nTotal quotes extracted: {len(quotes_data)}")
This loop goes through each quote div. It extracts the three pieces of data. It stores them in a list of dictionaries. This is a common pattern in scraping.
Sometimes you may encounter errors while parsing. Our BeautifulSoup Common Errors Troubleshooting Guide can help you solve them.
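One frequent cause is that find() returns None when an element is missing, and calling get_text() on None raises an AttributeError. A defensive variation of the extraction loop, sketched below under the same page structure, simply skips incomplete quote blocks.
for div in quote_divs:
    text_tag = div.find('span', class_='text')
    author_tag = div.find('small', class_='author')

    # Skip this quote block if either element is missing
    if text_tag is None or author_tag is None:
        print("Skipping a quote block with missing elements.")
        continue

    text = text_tag.get_text(strip=True)
    author = author_tag.get_text(strip=True)
    tags = [tag.get_text(strip=True) for tag in div.find_all('a', class_='tag')]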
Step 4: Storing Data in SQLite Database
We have data in a Python list. Now we save it to a database. SQLite stores data in a single file. It's perfect for small to medium projects.
First, we connect to a database file. If it doesn't exist, SQLite creates it. Then we create a table with the right columns.
import sqlite3

# Connect to SQLite database (or create it)
conn = sqlite3.connect('quotes.db')
cursor = conn.cursor()

# Create a table to store quotes
create_table_query = '''
CREATE TABLE IF NOT EXISTS quotes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    quote_text TEXT NOT NULL,
    author TEXT NOT NULL,
    tags TEXT
)
'''
cursor.execute(create_table_query)
print("Table 'quotes' created successfully.")
The table has four columns: an id, the quote text, the author, and the tags. The id is the primary key and auto-increments with each new row.
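If you plan to rerun the crawler, duplicate rows will accumulate on each run. One optional refinement, purely an assumption on my part rather than part of this tutorial, is to declare the quote text UNIQUE (in a fresh database file) and pair it with INSERT OR IGNORE so repeat runs skip rows they have already stored.
# Variant schema (use a fresh .db file, since CREATE TABLE IF NOT EXISTS
# will not alter an existing 'quotes' table): quote_text must be unique.
cursor.execute('''
CREATE TABLE IF NOT EXISTS quotes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    quote_text TEXT NOT NULL UNIQUE,
    author TEXT NOT NULL,
    tags TEXT
)
''')

# INSERT OR IGNORE silently skips rows whose quote_text already exists
dedupe_insert = "INSERT OR IGNORE INTO quotes (quote_text, author, tags) VALUES (?, ?, ?)"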
Step 5: Inserting the Scraped Data
Our data is ready in quotes_data. We loop through it. We insert each dictionary into the database table.
We use parameterized queries. This is important for security. It prevents SQL injection attacks.
# Insert each quote into the database
insert_query = "INSERT INTO quotes (quote_text, author, tags) VALUES (?, ?, ?)"

for quote in quotes_data:
    cursor.execute(insert_query, (quote['text'], quote['author'], quote['tags']))

# Commit the transaction to save changes
conn.commit()
print(f"Inserted {len(quotes_data)} records into the database.")

# Don't forget to close the connection
conn.close()
The ? placeholders are replaced by our values. After inserting, we call conn.commit(). This saves all changes. Finally, we close the connection.
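If you prefer a single call over the loop, cursor.executemany() runs the same parameterized INSERT once per tuple in a sequence; a brief equivalent sketch (placed before conn.close()):
# Build a list of parameter tuples and insert them all at once
rows = [(q['text'], q['author'], q['tags']) for q in quotes_data]
cursor.executemany(
    "INSERT INTO quotes (quote_text, author, tags) VALUES (?, ?, ?)",
    rows
)
conn.commit()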
Step 6: Verifying and Querying the Data
Let's verify our work. Reconnect to the database. Run a simple SELECT query to fetch the data.
# Reconnect to verify the data
conn = sqlite3.connect('quotes.db')
cursor = conn.cursor()

# Query all records
cursor.execute("SELECT * FROM quotes")
all_rows = cursor.fetchall()

print("\nFirst 3 records in the database:")
for row in all_rows[:3]:
    print(row)

conn.close()
Output:
Table 'quotes' created successfully.
Inserted 10 records into the database.
First 3 records in the database:
(1, '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'Albert Einstein', 'change, deep-thoughts, thinking, world')
(2, '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'J.K. Rowling', 'abilities, choices')
(3, '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'Albert Einstein', 'inspirational, life, live, miracle, miracles')
Success! The data is now persistently stored. You can query it anytime. You can analyze it with Python or other tools.
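For instance, a parameterized SELECT (again using ? placeholders; 'Albert Einstein' is just a sample value) pulls back a single author's quotes:
conn = sqlite3.connect('quotes.db')
cursor = conn.cursor()

# Fetch every quote by one author, using a placeholder for safety
cursor.execute("SELECT quote_text, tags FROM quotes WHERE author = ?", ('Albert Einstein',))
for quote_text, tags in cursor.fetchall():
    print(f"{quote_text[:60]}... [{tags}]")

conn.close()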
For more complex data cleaning tasks after scraping, check out Clean HTML Data with BeautifulSoup.
Conclusion
You have built a complete web crawler. It fetches, parses, extracts, and stores data. The process uses Requests, BeautifulSoup, and SQLite.
This is a foundational skill. You can adapt it for many websites. Always remember to check a site's robots.txt file. Respect the website's terms of service.
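Python's standard library can perform that robots.txt check for you; here is a minimal sketch using urllib.robotparser, assuming the same sample site:
from urllib.robotparser import RobotFileParser

robots = RobotFileParser('https://quotes.toscrape.com/robots.txt')
robots.read()

# can_fetch() tells you whether a given user agent may request a URL
if robots.can_fetch('*', 'https://quotes.toscrape.com/'):
    print("Crawling this page is allowed by robots.txt.")
else:
    print("robots.txt disallows this page; do not crawl it.")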
You can extend this project. Add error handling for network issues. Scrape multiple pages by handling pagination. Schedule the script to run automatically. The possibilities are vast.
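Pagination on the sample site works through a 'Next' link, so one way to extend the crawler, sketched below (the li.next selector reflects the site's current markup and may need adjusting), is to keep following that link until it disappears:
import requests
from bs4 import BeautifulSoup

url = 'https://quotes.toscrape.com/'
all_quotes = []

while url:
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Collect the quote text from the current page
    for div in soup.find_all('div', class_='quote'):
        all_quotes.append(div.find('span', class_='text').get_text(strip=True))

    # Follow the 'Next' button if it exists, otherwise stop
    next_link = soup.select_one('li.next a')
    url = 'https://quotes.toscrape.com' + next_link['href'] if next_link else None

print(f"Collected {len(all_quotes)} quotes across all pages.")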
Now you have a pipeline. It turns web data into structured information. This is valuable for data science, research, and automation.