How to extract text with formatting using Beautiful Soup
If you would like to extract text and keep track of its formatting, here’s how to do it with Beautiful Soup.
Important: we will use a real-life example in this tutorial, so you will need the requests and Beautiful Soup (bs4) libraries installed.
Step 1. Let’s start by importing the BeautifulSoup class from the bs4 library.
from bs4 import BeautifulSoup
Step 2. Then, import the requests library.
import requests
Step 3. Get the source code of your target page. We will be using our page ‘How to Use Proxy SwitchyOmega on Chrome’ in this example.
r=requests.get("https://proxyway.com/guides/proxy-switchyomega-chrome")
For any other page, the code would look like this:
r=requests.get("Your URL")
Step 4. Parse the HTML code into a BeautifulSoup object named soup.
soup=BeautifulSoup(r.content,"html.parser")
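Steps 3 and 4 can also be wrapped in small helpers that fail loudly when the page cannot be downloaded. This is only a sketch: the names make_soup and fetch_soup and the 10-second timeout are assumptions made for illustration, not part of the original script.

```python
import requests
from bs4 import BeautifulSoup


def make_soup(html):
    """Parse an HTML string or bytes with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Download a page and parse it (hypothetical helper)."""
    # timeout stops the request from hanging forever on a slow server.
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return make_soup(response.content)
```

With these helpers, soup=fetch_soup("Your URL") would replace the two separate lines above.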
In this tutorial, we will extract text with bold formatting, but the same code works for any other HTML formatting tag. A list of all HTML text formatting elements can be found here.
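As a rough illustration of using the same approach for other formatting tags, the sketch below passes a list of tag names to find_all() and keeps each tag name next to its text. The sample HTML string is made up for this example, so it runs without downloading anything. Note that many pages mark bold text with the strong tag rather than b, so it may be worth searching for both.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of a live page.
html = (
    "<p>Plain <b>bold</b>, <i>italic</i>, "
    "<strong>strong</strong> and <em>emphasis</em>.</p>"
)
soup = BeautifulSoup(html, "html.parser")

# find_all() accepts a list of tag names and returns matches in document order.
formatted = soup.find_all(["b", "strong", "i", "em"])

# Keep the tag name next to the text so the formatting information is not lost.
pairs = [(tag.name, tag.get_text()) for tag in formatted]
print(pairs)
```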
Step 5. Extract a list of all bold elements.
bold=soup.find_all("b")
Step 6. Build a list containing the text of each element, without the bold tags.
text=[]
for i in bold:
    text.append(i.get_text())
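The same loop can be written more compactly as a list comprehension. The sketch below also passes a separator and strip=True to get_text(), which trims surrounding whitespace and joins the pieces of nested tags with a single space. The sample HTML is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with extra whitespace and a nested tag.
html = "<p><b> First </b> and <b>second <i>nested</i></b></p>"
soup = BeautifulSoup(html, "html.parser")

# One entry per <b> element; " " is the separator placed between nested strings.
text = [tag.get_text(" ", strip=True) for tag in soup.find_all("b")]
print(text)
```

Without the separator, strip=True would glue the nested pieces together with no space between them, so it is usually safer to pass both.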
Step 7. Let’s check that our code works by printing the result.
print(text)
Results:
Congratulations, you’ve extracted text with formatting using Beautiful Soup. Here’s the full script:
from bs4 import BeautifulSoup
import requests
r=requests.get("https://proxyway.com/guides/proxy-switchyomega-chrome")
soup=BeautifulSoup(r.content,"html.parser")
bold=soup.find_all("b")
text=[]
for i in bold:
    text.append(i.get_text())
print(text)