Working with Django, BeautifulSoup and requests

Yesterday, I searched online for ways to use BeautifulSoup with Django. I found only one tutorial, and it relied on many technologies that are hard for beginners to understand.

So I decided to write a simple tutorial in which we'll build an app that gets the <h1> content of any website.

Let's see what our final app looks like:

(Screenshot: django beautifulsoup result)

Let's get started:

Technologies

  • Django==3.1.4
  • beautifulsoup4==4.9.3
  • requests==2.25.1

Setting up Django Project

If you have already installed Django and the libraries, you can skip to the next part.

If not, follow these steps. First, create a folder for the project and move into it:


mkdir DjangoBs4



cd DjangoBs4

Create a virtual environment.


virtualenv -p /usr/bin/python3 env

Activate it.


source env/bin/activate

Install Django.


pip install django

Install beautifulsoup4.


pip install beautifulsoup4

Install requests.


pip install requests

Create the project.


django-admin startproject DjangoBs

Create the app inside the project folder.


cd DjangoBs



django-admin startapp core

In DjangoBs/settings.py, add the app to INSTALLED_APPS:


INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    
    #Apps
    'core',
]

Allow all hosts (fine for local development; list your real domain names in production):


ALLOWED_HOSTS = ['*']
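ALLOWED_HOSTS = ['*'] is convenient on your own machine, but Django's deployment checklist asks you to set ALLOWED_HOSTS to the real host names before going live. A minimal sketch, assuming two hypothetical environment variables, DJANGO_DEBUG and DJANGO_ALLOWED_HOSTS (names made up for this example), could compute the list in settings.py:

```python
import os


def allowed_hosts_from_env(environ=os.environ):
    """Build ALLOWED_HOSTS from (hypothetical) environment variables.

    DJANGO_DEBUG=1 keeps the permissive development setting;
    otherwise DJANGO_ALLOWED_HOSTS is read as a comma-separated list.
    """
    if environ.get("DJANGO_DEBUG") == "1":
        return ["*"]
    raw = environ.get("DJANGO_ALLOWED_HOSTS", "localhost,127.0.0.1")
    # drop blanks and surrounding spaces, e.g. "example.com, www.example.com"
    return [host.strip() for host in raw.split(",") if host.strip()]


# in settings.py
ALLOWED_HOSTS = allowed_hosts_from_env()
```

With DJANGO_DEBUG=1 exported you get the same behaviour as the line above; without it, only the listed hosts are accepted.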

In the TEMPLATES setting, point DIRS to the templates folder:


# Django 3.1 defines BASE_DIR as a pathlib.Path, so "/" joins paths (no "import os" needed)
'DIRS': [BASE_DIR / 'TEMPLATES'],
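The DIRS key lives inside the TEMPLATES setting that startproject generates. In a Django 3.1 settings.py, where BASE_DIR is already a pathlib.Path, the finished setting should look roughly like this (only DIRS differs from the generated default; this is a settings fragment, not a standalone script):

```python
# DjangoBs/settings.py (Django 3.1 generates BASE_DIR as a pathlib.Path)
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        # project-level templates folder, next to manage.py
        'DIRS': [BASE_DIR / 'TEMPLATES'],
        # still look for templates inside each app's "templates" folder
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
```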

Create the TEMPLATES folder in the project root, next to manage.py:


mkdir TEMPLATES

Apply the migrations.


python3 manage.py migrate

Run the development server.


python3 manage.py runserver


System check identified no issues (0 silenced).
December 28, 2020 - 22:47:26
Django version 3.1.4, using settings 'DjangoBs.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Great!

Project structure:


├── core
│   ├── admin.py
│   ├── apps.py
│   ├── __init__.py
│   ├── migrations
│   ├── models.py
│   ├── __pycache__
│   ├── tests.py
│   └── views.py
├── db.sqlite3
├── DjangoBs
│   ├── asgi.py
│   ├── __init__.py
│   ├── __pycache__
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── manage.py
└── TEMPLATES
    └── django-bs.html


Using Django with BeautifulSoup

As I said, we'll build a simple app that gets the <h1> content of any website, and we'll write it with two different types of Django views:

The first example uses Function-Based Views (FBV).
The second example uses Class-Based Views (CBV).

Example #1: Django BeautifulSoup (FBV)

In core/views.py, we need to import BeautifulSoup, requests, and Django's render shortcut:


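# Django's render shortcut (already present in the views.py that startapp generates)
from django.shortcuts import render
# BeautifulSoup parses the HTML
from bs4 import BeautifulSoup
# requests downloads the page
import requests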

Then create the function-based view with the code below:


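def dj_bs(request):
    if request.method == "POST":
        # URL typed into the form's "web_link" input
        url = request.POST.get('web_link', '').strip()

        # requests: send a browser-like User-Agent, since some sites reject the default one
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
        try:
            # timeout stops the view from hanging on a slow site
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
        except requests.RequestException as error:
            # empty or invalid URL, network failure, timeout, or HTTP error status
            result = f"Could not fetch the page: {error}"
        else:
            # beautifulsoup: parse the website's source
            soup = BeautifulSoup(response.text, 'html.parser')

            # check if an <h1> element is found
            if soup.h1:
                # get_text() also works when <h1> contains nested tags (.string would be None)
                result = soup.h1.get_text().strip()
            else:
                result = "H1 Element is not found"

        return render(request, 'django-bs.html', {'result': result})

    return render(request, 'django-bs.html')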

Let me explain:

1. If the request method is POST, we get the URL from the input named "web_link".
2. We fetch the page's source with the requests library.
3. We parse the page source with BeautifulSoup.
4. We check whether an <h1> element exists.
5. If it does, the user gets the <h1> content.
6. If not, the user gets a "not found" message.
7. We render the template, passing it the result variable.
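Reading the heading with soup.h1.string only works when the <h1> holds a single piece of text; if it contains nested tags, .string is None and the page shows an empty result. The sketch below pulls the parsing step out into a hypothetical helper, get_h1_text() (not part of the original project), so you can try it without Django or a network connection, and uses get_text() to collect all the nested text:

```python
from bs4 import BeautifulSoup


def get_h1_text(html):
    """Return the text of the first <h1> in html, or None if there is none.

    get_h1_text is a hypothetical helper for illustration; the views in
    this tutorial do the same work inline.
    """
    soup = BeautifulSoup(html, "html.parser")
    if soup.h1 is None:
        return None
    # get_text() joins every string inside the tag, including nested ones
    return soup.h1.get_text().strip()


page = "<html><body><h1>Hello <em>world</em></h1></body></html>"
soup = BeautifulSoup(page, "html.parser")
print(soup.h1.string)      # None: the <h1> has two children
print(get_h1_text(page))   # Hello world
```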


In DjangoBs/urls.py, add a path for our view:


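from django.contrib import admin
from django.urls import path

# the function-based view from core/views.py
from core.views import dj_bs

urlpatterns = [
    path('admin/', admin.site.urls),

    #FBV
    path('django-bs/', dj_bs, name="django_bs"),
]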

In the TEMPLATES directory, create django-bs.html.

Add the form and some CSS to django-bs.html:


<!DOCTYPE html>
<html>
    <head>
        <title>django and Bs</title>
        <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">

        <style>
* {
  font-size: 14px;
}

body {
  background-color: #eee;
  margin-top: 50px;
  font-family: Verdana;
}

div.card {
  margin: auto;
  width: 400px;
  height: 190px;
  background-color: white;
  border-radius: 2px;
  box-shadow: 0px 3px 3px silver;
  padding: 25px;
}

div.card h1 {
  margin: 0 0 20px 0;
  font-weight: normal;
  color: #03a9f4;
  font-size: 30px;
}

div.card label {
  float: left;
  padding: 10px 10px 14px 0;
  width: 175px;
  margin-top: 10px;
  clear: left;
}

div.card input {
  float: right;
  border: 2px solid silver;
  padding: 8px 0;
  border-width: 0 0 2px 0;
  width: 200px;
  margin-top: 15px;
  transition: border-color .3s;
}

div.card input:focus,
div.card input:hover {
  border-color: #03a9f4;
  outline: 0;
}

div.card input.warning {
  border-color: #ff9800;
}

div.card input.error {
  border-color: #f44336;
}

div.card input.valid {
  border-color: #4caf50;
}

div.card input[type=submit] {
  border: 0;
  background-color: white;
  color: #03a9f4;
  text-transform: uppercase;
  width: auto;
  cursor: pointer;
}

.output {
    text-align: center;
    margin-bottom: 50px;
}
        </style>
    </head>
    <body>

    
    <!-- output -->
    <div class="output">
        <h2>{{result}}</h2>
    </div>


    <div class="card">
      <h3 class="mb-5">Get H1 Content From any Website</h3>

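      <!-- no action attribute: the form posts back to the URL that rendered it, so the same template works for the FBV and CBV routes -->
      <form method="POST">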
        {% csrf_token %}
        <label>LINK:</label>
        <input type="text" name="web_link"/>
        <input type="submit" name="submit" value="submit" />
      </form>
      
    </div>



    </body></html>

Let's go to http://127.0.0.1:8000/django-bs/ and see the result:

(Screenshot: django beautifulsoup result)

Example #2: Django BeautifulSoup (CBV)

If you'd rather use BeautifulSoup with Django Class-Based Views, you can do it like the following example.

Note: this view does the same thing as Example #1.

In core/views.py, it looks like this:


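# BeautifulSoup and requests are already imported at the top of core/views.py from Example #1
from bs4 import BeautifulSoup
import requests
from django.views.generic import TemplateView

class DjBs(TemplateView):
    # GET requests are handled by TemplateView and just render the empty form
    template_name = "django-bs.html"

    def post(self, request, *args, **kwargs):
        # URL typed into the form's "web_link" input
        url = request.POST.get('web_link', '').strip()

        # requests: send a browser-like User-Agent, since some sites reject the default one
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
        except requests.RequestException as error:
            # empty or invalid URL, network failure, timeout, or HTTP error status
            result = f"Could not fetch the page: {error}"
        else:
            # beautifulsoup: parse the website's source
            soup = BeautifulSoup(response.text, 'html.parser')

            # check if an <h1> element is found
            if soup.h1:
                result = soup.h1.get_text().strip()
            else:
                result = "H1 Element is not found"

        # render template_name with TemplateView's usual context plus the result
        return self.render_to_response(self.get_context_data(result=result))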
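Both views hand whatever the user typed straight to requests.get(). A minimal sketch of checking the input first, using only the standard library's urllib.parse, is below; is_http_url() is a hypothetical helper, and accepting only absolute http/https URLs is an assumption about what the app should allow. It does not stop someone from pointing the server at internal addresses, which matters if the app is ever made public.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if value looks like an absolute http:// or https:// URL.

    Hypothetical helper: a view could call it before requests.get() and
    render an error message instead of fetching when it returns False.
    """
    if not value:
        return False
    parts = urlparse(value.strip())
    # require an explicit scheme and a host, e.g. "https://example.com/page"
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_http_url("https://example.com"))   # True
print(is_http_url("example.com"))           # False: no scheme
print(is_http_url("file:///etc/passwd"))    # False: not http(s)
```

In dj_bs, for example, you could render the template with {'result': 'Please enter a full http(s) URL'} whenever is_http_url(url) is False.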


In DjangoBs/urls.py, add a path for the class-based view:


# at the top of the file
from core.views import DjBs

# inside urlpatterns
#CBV
path('django-bs-cbv/', DjBs.as_view(), name="django_bs_cbv"),
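If you keep both examples in the same project, the whole DjangoBs/urls.py could look like the sketch below (a settings fragment that Django loads, not a standalone script). Giving each route its own name (django_bs_cbv is a name chosen here, not taken from the original project) means {% url %} and reverse() never have to choose between two patterns that share a name.

```python
# DjangoBs/urls.py with both examples enabled
from django.contrib import admin
from django.urls import path

from core.views import dj_bs, DjBs

urlpatterns = [
    path('admin/', admin.site.urls),

    # Example #1: function-based view
    path('django-bs/', dj_bs, name="django_bs"),

    # Example #2: class-based view, with its own URL name so the two don't collide
    path('django-bs-cbv/', DjBs.as_view(), name="django_bs_cbv"),
]
```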

Project on GitHub

The project is available on GitHub, so you can download it from the link below:
Django and BeautifulSoup GitHub

Finally, I hope this tutorial helps.

See you later!
