Last modified: Jan 19, 2026 by Alexander Williams
Scrape Job Listings with BeautifulSoup to Excel
Job hunting can be time-consuming, but web scraping can automate the data collection. This guide shows you how to use Python's BeautifulSoup to extract job listings and save the cleaned data to an Excel file, ready for analysis.
Prerequisites and Setup
You need Python installed on your computer, and basic Python knowledge helps. Open your terminal or command prompt and install the required libraries with pip:
pip install beautifulsoup4 requests pandas openpyxl
BeautifulSoup parses HTML, Requests fetches web pages, Pandas handles the data, and openpyxl writes the Excel file.
If you are new to BeautifulSoup, read our Web Scraping Guide with BeautifulSoup for Beginners.
Understanding the Target Website Structure
First, inspect the job board website. Right-click on a job listing and select "Inspect" (or "Inspect Element") to open the browser's developer tools. Look for the HTML tags that contain job titles, companies, and locations; common candidates are <div>, <h2>, and <span>. Note their class names or IDs, since we will use them to extract the data.
For this tutorial, we'll assume a simple example structure, sketched below.
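The snippet below is a hypothetical sketch of the markup this guide assumes; the job-listing, job-title, company, and location class names are placeholders, so substitute whatever you find on your target site.
<div class="job-listing">
    <h2 class="job-title"><a href="/jobs/1234">Python Developer</a></h2>
    <span class="company">Example Corp</span>
    <span class="location">Remote</span>
</div>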
Fetching the Web Page with Requests
Use the requests.get() function with the target URL and check the response status: a 200 code means success. Then pass the page content to BeautifulSoup for parsing.
import requests
from bs4 import BeautifulSoup

# URL of the job listings page
url = 'https://example.com/jobs'

# Send a GET request to the website
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    print("Page fetched and parsed successfully.")
else:
    print(f"Failed to retrieve page. Status code: {response.status_code}")
Parsing and Extracting Job Data
Now use BeautifulSoup's find methods to locate the job containers. find_all() returns a list of all job posting elements. Loop through each container and extract the specific details: store each job's data in a dictionary and append it to a master list. Our Clean HTML Data with BeautifulSoup guide can help with more complex parsing.
# Find all job listing containers (adjust the selector to match the actual site)
job_listings = soup.find_all('div', class_='job-listing')

jobs_data = []

for job in job_listings:
    # Extract job title
    title_elem = job.find('h2', class_='job-title')
    title = title_elem.text.strip() if title_elem else 'N/A'

    # Extract company name
    company_elem = job.find('span', class_='company')
    company = company_elem.text.strip() if company_elem else 'N/A'

    # Extract job location
    location_elem = job.find('span', class_='location')
    location = location_elem.text.strip() if location_elem else 'N/A'

    # Extract job link
    link_elem = job.find('a', href=True)
    link = link_elem['href'] if link_elem else 'N/A'

    # Make sure the link is absolute
    if link and link.startswith('/'):
        link = 'https://example.com' + link

    # Create a dictionary for this job
    job_info = {
        'Title': title,
        'Company': company,
        'Location': location,
        'Link': link,
    }
    jobs_data.append(job_info)

print(f"Extracted {len(jobs_data)} job listings.")
Output:
Page fetched and parsed successfully.
Extracted 10 job listings.
Saving Data to an Excel File with Pandas
Pandas makes saving the data easy. Convert the list of dictionaries to a DataFrame, then call its to_excel() method with a filename. The openpyxl engine creates the .xlsx file in your project folder.
import pandas as pd
# Create a DataFrame from the list of job dictionaries
df = pd.DataFrame(jobs_data)
# Save the DataFrame to an Excel file
excel_filename = 'job_listings.xlsx'
df.to_excel(excel_filename, index=False, engine='openpyxl')
print(f"Job listings successfully saved to {excel_filename}")
Output:
Job listings successfully saved to job_listings.xlsx
Handling Pagination and Complex Sites
Real job sites spread listings across multiple pages, and you need to scrape them all. Find the "Next" button link and loop through pages until none are left, as in the sketch below. Some sites use AJAX or infinite scroll, which requires more advanced techniques; for handling multiple pages, see our tutorial on Advanced BeautifulSoup Pagination & Infinite Scroll.
Always add delays between requests with time.sleep(). This respects the website's server and helps prevent your IP from being blocked.
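Here is a minimal sketch of that loop. It assumes the site marks its "Next" link with an a tag of class next, and the extract_jobs() helper stands in for the extraction loop shown earlier; both are assumptions to adapt to your target site.
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def extract_jobs(soup):
    # Stand-in for the extraction loop shown earlier (one page's worth of jobs)
    return soup.find_all('div', class_='job-listing')

url = 'https://example.com/jobs'  # starting page
all_jobs = []

while url:
    response = requests.get(url)
    if response.status_code != 200:
        break
    soup = BeautifulSoup(response.content, 'html.parser')
    all_jobs.extend(extract_jobs(soup))

    # Follow the "Next" link if one exists (the class name is an assumption)
    next_link = soup.find('a', class_='next')
    url = urljoin(url, next_link['href']) if next_link else None

    # Pause between requests to respect the server
    time.sleep(2)

print(f"Collected {len(all_jobs)} job elements across all pages.")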
Ethical Scraping and Best Practices
Always check the website's robots.txt file and respect its rules. Do not overload servers with rapid requests; space them out. Use scraped data for personal analysis only, and do not republish it. Some sites offer APIs, and using an official API is often better than scraping. Identify yourself in requests with a descriptive User-Agent header, as shown below.
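As a sketch of both points, Python's built-in urllib.robotparser can check robots.txt before you scrape, and a headers dictionary sets the User-Agent; the bot name and contact address are placeholders.
from urllib.robotparser import RobotFileParser

import requests

# Check robots.txt before scraping (the bot name is a placeholder)
robots = RobotFileParser()
robots.set_url('https://example.com/robots.txt')
robots.read()

url = 'https://example.com/jobs'
if robots.can_fetch('JobScraperBot', url):
    # Identify yourself with a descriptive User-Agent header
    headers = {'User-Agent': 'JobScraperBot/1.0 (personal research; contact@example.com)'}
    response = requests.get(url, headers=headers)
else:
    print("robots.txt disallows fetching this URL.")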
Conclusion
You have learned to scrape job listings: Requests fetches the pages, BeautifulSoup extracts the data, and Pandas saves it to a clean Excel spreadsheet. Start with a simple site and master the basics of HTML structure, then move on to more complex targets with pagination, and always scrape ethically. This skill is valuable for job market research and data analysis projects.