How to extract text with formatting using Beautiful Soup

If you want to extract text that has a particular formatting applied, such as bold, here’s how to do it with Beautiful Soup.

Important: we will use a real-life example in this tutorial, so you will need the requests and Beautiful Soup libraries installed (pip install requests beautifulsoup4).

Step 1. Let’s start by importing the Beautiful Soup library, which is provided by the bs4 package.

from bs4 import BeautifulSoup

Step 2. Then, import the requests library.

import requests

Step 3. Get the source code of your target page. In this example, we will use our guide ‘How to Use Proxy SwitchyOmega on Chrome’.

r = requests.get("https://proxyway.com/guides/proxy-switchyomega-chrome")

A generic version looks like this:

r = requests.get("Your URL")

Step 4. Parse the HTML into a Beautiful Soup object named soup.

soup = BeautifulSoup(r.content, "html.parser")

In this tutorial, we will extract text with bold formatting, but the same code works for other HTML formatting elements, such as <strong>, <i>, <em>, <u>, or <mark>: just change the tag name.
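
To illustrate, here is a minimal sketch that uses a small inline HTML string (a made-up sample, so it runs without a network request). It relies on the fact that find_all() also accepts a list of tag names, which lets you collect text marked up with either <b> or <strong> in a single call.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup so the example runs offline
html = "<p><b>Bold</b>, <strong>strong</strong> and <i>italic</i> text</p>"
soup = BeautifulSoup(html, "html.parser")

# A list of tag names matches any of them, returned in document order
bold_like = soup.find_all(["b", "strong"])
text = [tag.get_text() for tag in bold_like]
print(text)  # ['Bold', 'strong']

# Swap in another tag name to extract a different formatting, e.g. italics
italic = [tag.get_text() for tag in soup.find_all("i")]
print(italic)  # ['italic']
```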

Step 5. Extract a list of all bold (<b>) elements on the page.

bold = soup.find_all("b")
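
On a real page, bold text can also appear in menus, sidebars, or footers. One way to narrow the search, sketched below under the assumption that the main content sits inside an <article> element (the sample markup is hypothetical; check the actual structure of your target page), is to call find_all() on a sub-element or to use a CSS selector with select().

```python
from bs4 import BeautifulSoup

# Hypothetical page: bold text in both the navigation and the article body
html = """
<nav><b>Menu</b></nav>
<article><p>Open the <b>Options</b> page and click <b>Apply</b>.</p></article>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() on the whole soup returns every <b> element on the page
all_bold = soup.find_all("b")

# Calling find_all() on a sub-element limits the search to that element
article = soup.find("article")
article_bold = article.find_all("b")

# A CSS selector achieves the same in one step
selected = soup.select("article b")

print([b.get_text() for b in all_bold])      # ['Menu', 'Options', 'Apply']
print([b.get_text() for b in article_bold])  # ['Options', 'Apply']
```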

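Step 6. Build a list that contains only the text of each element, without the <b> tags.

text = []
for i in bold:
    # get_text() returns the element's text with the markup removed
    text.append(i.get_text())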
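
The same loop can be written as a list comprehension. The sketch below uses hypothetical sample markup to show two details worth knowing: get_text() also includes the text of nested tags, and get_text(strip=True) trims surrounding whitespace but joins the pieces with no separator unless you pass one.

```python
from bs4 import BeautifulSoup

# Sample markup with extra whitespace and a nested <i> inside a <b>
html = "<p><b>  Step 1 </b> then <b>Click <i>Save</i></b></p>"
soup = BeautifulSoup(html, "html.parser")
bold = soup.find_all("b")

# Equivalent to the for loop above
text = [i.get_text() for i in bold]
print(text)  # ['  Step 1 ', 'Click Save']

# strip=True alone glues the nested pieces together
glued = [i.get_text(strip=True) for i in bold]
print(glued)  # ['Step 1', 'ClickSave']

# Passing " " as the separator keeps words apart while trimming whitespace
clean = [i.get_text(" ", strip=True) for i in bold]
print(clean)  # ['Step 1', 'Click Save']
```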

Step 7. Let’s check that the code works by printing the result.

print(text)


Congratulations, you’ve extracted text with formatting using Beautiful Soup. Here’s the full script:

from bs4 import BeautifulSoup
import requests

# Download the page and parse its HTML
r = requests.get("https://proxyway.com/guides/proxy-switchyomega-chrome")
soup = BeautifulSoup(r.content, "html.parser")

# Collect the text of every bold element
bold = soup.find_all("b")
text = []
for i in bold:
    text.append(i.get_text())
print(text)
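
If you plan to reuse the script for other pages or tags, you could wrap it in functions. The sketch below is one possible layout, not the tutorial’s own method: the function names extract_formatted_text and extract_from_url are hypothetical, and the 10-second timeout is an arbitrary assumption. raise_for_status() makes the script stop with an error on 4xx/5xx responses instead of silently parsing an error page.

```python
import requests
from bs4 import BeautifulSoup


def extract_formatted_text(html, tag="b"):
    """Return the text of every <tag> element in an HTML string or bytes."""
    soup = BeautifulSoup(html, "html.parser")
    # " " separator with strip=True keeps words in nested tags apart
    return [el.get_text(" ", strip=True) for el in soup.find_all(tag)]


def extract_from_url(url, tag="b", timeout=10):
    """Download a page and extract the text of every <tag> element."""
    r = requests.get(url, timeout=timeout)  # timeout value is an assumption
    r.raise_for_status()  # raise an exception on 4xx/5xx responses
    return extract_formatted_text(r.content, tag)
```

For example, extract_from_url("https://proxyway.com/guides/proxy-switchyomega-chrome") would return the bold text from the guide, and extract_from_url("Your URL", "i") the italic text from another page.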