Today we are going to talk about web scraping and how to use BeautifulSoup and requests to get data from websites.

What are BeautifulSoup and requests?

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. Requests is an elegant and simple HTTP library for Python, built for human beings.

How to use BeautifulSoup and requests for pulling data from websites

First, we need to install these two libraries:

Code:


# Installing the beautifulsoup4 library
pip install beautifulsoup4

# Installing the requests library
pip install requests
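To confirm that both installs worked, you can run a quick check like the one below. It imports each library and prints its version string. Both packages expose a __version__ attribute.

```python
# Quick sanity check that both libraries can be imported after pip install
import bs4
import requests

print("beautifulsoup4:", bs4.__version__)
print("requests:", requests.__version__)
```

If either import raises ModuleNotFoundError, the matching pip install did not reach the Python interpreter you are running.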

Now, to get a page's HTML source, we will use the function below:
code:


import requests
from bs4 import BeautifulSoup

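def extract_source(url):
    # Send a browser-like User-Agent, since some sites block the default one
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
    response = requests.get(url, headers=headers, timeout=10)  # don't hang forever
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text  # the page's HTML as a string


url_source = extract_source(url="https://pytutorial.com")  # HTML source of https://pytutorial.com


# Parse the HTML with Python's built-in 'html.parser'
soup = BeautifulSoup(url_source, 'html.parser')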
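To experiment without sending requests to a live site, you can pass an HTML string to BeautifulSoup directly. The sketch below builds the same kind of soup object from a small, made-up page. The HTML is an assumption and is not pytutorial.com's real markup.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document used instead of a live page
sample_html = """
<html>
  <head><title>pytutorial | the simplest website tutorial</title></head>
  <body>
    <h1>pytutorial | the simplest website tutorial</h1>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""

# 'html.parser' ships with Python, so nothing extra needs to be installed
soup = BeautifulSoup(sample_html, 'html.parser')
print(soup.title)
```

Every soup method shown in this tutorial works the same way on this offline soup.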

1. Getting the title tag:
code:


print(soup.title)

output:

<title>pytutorial | the simplest website tutorial</title>

If you only want the title's text, use .string like this:
code:


print(soup.title.string)


output:
pytutorial | the simplest website tutorial
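Note that .string only works when a tag contains a single piece of text. If the tag has nested tags, it returns None. In that case, get_text() joins all the text inside the tag. Calling .strip() removes any surrounding whitespace. The HTML in this sketch is made up:

```python
from bs4 import BeautifulSoup

# Made-up HTML: the h2 contains a nested <b> tag, the title does not
html = "<title>  pytutorial  </title><h2>Hello <b>world</b></h2>"
soup = BeautifulSoup(html, 'html.parser')

print(soup.title.string.strip())  # single text child -> pytutorial
print(soup.h2.string)             # text plus a nested tag -> None
print(soup.h2.get_text())         # all nested text joined -> Hello world
```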

2. Getting the h1 tag
Code:


print(soup.h1)

Output:

<h1>pytutorial | the simplest website tutorial</h1>
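Accessing a tag as an attribute, as in soup.h1, returns only the first matching tag. It behaves the same as soup.find('h1'), and both return None when nothing matches. The page in this sketch is made up:

```python
from bs4 import BeautifulSoup

# Made-up page with two h1 tags
html = "<h1>First heading</h1><h1>Second heading</h1>"
soup = BeautifulSoup(html, 'html.parser')

print(soup.h1)                     # only the first one: <h1>First heading</h1>
print(soup.find('h1') == soup.h1)  # same tag as find() -> True
print(soup.find('h2'))             # no match -> None
```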

Now, you are probably thinking: "What if the page has many h1 or p tags?" In that situation, we use the find_all() method, which returns a list of every matching tag:
code:


# list of p tags
p_scraping = soup.find_all('p') #list

Printing them out one by one:


code:
for i in p_scraping:
    print(i)
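Printing each result shows the whole tag, markup included. If you only need the text, call get_text() on each result. find_all() can also filter by attribute. For example, the class_ keyword has a trailing underscore because class is a reserved word in Python. The HTML and the "intro" class name below are hypothetical:

```python
from bs4 import BeautifulSoup

# Made-up HTML; the "intro" class name is hypothetical
html = """
<p class="intro">Welcome to the tutorial.</p>
<p>Body text.</p>
<p class="intro">Another intro line.</p>
"""
soup = BeautifulSoup(html, 'html.parser')

# Text of every <p> tag, without the surrounding markup
all_texts = [p.get_text(strip=True) for p in soup.find_all('p')]
for text in all_texts:
    print(text)

# Only the <p> tags whose class attribute includes "intro"
intro_texts = [p.get_text(strip=True) for p in soup.find_all('p', class_='intro')]
print(intro_texts)  # ['Welcome to the tutorial.', 'Another intro line.']
```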
    

3. Getting href links from a tags

code:


a_scraping = soup.find_all('a')  # every <a> tag on the page
for i in a_scraping:
    print(i.get('href'))  # .get('href') returns the link, or None if the tag has no href

output:

/how-to-use-glob-module-in-python
/category/python-tutorial
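The links in this output are relative because they start with /. Also, any a tag without an href attribute makes .get('href') return None. The sketch below skips missing hrefs and turns relative paths into absolute URLs with urllib.parse.urljoin from the standard library. It assumes https://pytutorial.com is the base URL and uses a made-up HTML snippet:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://pytutorial.com"  # assumed base URL of the scraped page

# Made-up snippet: two relative links, one absolute link, one <a> without href
html = """
<a href="/how-to-use-glob-module-in-python">glob</a>
<a href="/category/python-tutorial">tutorials</a>
<a href="https://example.com/page">external</a>
<a name="top">no link</a>
"""
soup = BeautifulSoup(html, 'html.parser')

links = []
for a in soup.find_all('a'):
    href = a.get('href')
    if href is None:  # skip <a> tags that have no href attribute
        continue
    links.append(urljoin(base_url, href))  # absolute URLs pass through unchanged

for link in links:
    print(link)
```

Another option is soup.find_all('a', href=True), which only returns a tags that have an href attribute.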
