
How to extract text with formatting using Beautiful Soup

If you want to extract text that has specific formatting applied to it, such as bold, here’s how to do it with Beautiful Soup.

Important: this tutorial uses a real-life example, so you will need the requests and Beautiful Soup (beautifulsoup4) libraries installed.

Step 1. Start by importing the BeautifulSoup class from the bs4 package.

				
from bs4 import BeautifulSoup
				
			

Step 2. Then, import the requests library.

				
import requests
				
			

Step 3. Get the source code of your target page. This example uses our guide ‘How to Use Proxy SwitchyOmega on Chrome’.

				
r = requests.get("https://proxyway.com/guides/proxy-switchyomega-chrome")
				
			

For any other page, the code looks like this:

				
r = requests.get("Your URL")
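
In practice, a request can fail or hang, so it may be worth checking the response before parsing it. The sketch below is one way to do that; the helper name fetch_html and the 10-second timeout are assumptions made for illustration and are not part of the original tutorial.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw HTML bytes of a page, or raise if the request fails.

    fetch_html and the 10-second default timeout are illustrative choices,
    not part of the original tutorial.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.exceptions.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    return response.content
```

The returned bytes can be passed to BeautifulSoup in the same way as r.content in the next step.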
				
			

Step 4. Parse the HTML into a BeautifulSoup object named soup.

				
soup = BeautifulSoup(r.content, "html.parser")
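
To get a feel for the soup object without making a network request, here is a minimal sketch that parses a short HTML string. The sample markup and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration
sample_html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p>Some <b>bold</b> text.</p></body></html>"
)

soup = BeautifulSoup(sample_html, "html.parser")

# The parsed tree can be navigated by tag name
page_title = soup.title.string
first_paragraph = soup.p.get_text()

print(page_title)
print(first_paragraph)
```

Here soup.p returns the first p element, and get_text() returns its text with the inner tags removed.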
				
			

In this tutorial, we will extract bold text, but the same code works for any other HTML formatting tag, such as i, em, or strong: replace "b" with the tag name you need.

Step 5. Extract a list of all bold elements.

				
bold = soup.find_all("b")
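
Many pages mark bold text with strong rather than b, so searching for "b" alone can miss it. As a sketch, find_all() also accepts a list of tag names and returns every element that matches any of them, in document order. The sample HTML below is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration
sample_html = (
    "<p><b>Bold</b>, <strong>strong</strong>, "
    "<i>italic</i>, and <em>emphasized</em> text.</p>"
)
soup = BeautifulSoup(sample_html, "html.parser")

# Passing a list to find_all() matches any of the listed tag names
formatted = soup.find_all(["b", "strong", "i", "em"])
tag_names = [element.name for element in formatted]

print(tag_names)
```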
				
			

Step 6. Build a list that contains the text of each element, without the b tags.

				
text = []
for i in bold:
    text.append(i.get_text())
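
Note that get_text() drops the tags, so the resulting list no longer records how each piece of text was formatted. If you need to keep that information, one option is to store the tag name next to the text, or to keep the element's markup with str(). A minimal sketch on made-up HTML, with illustrative variable names:

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration
sample_html = "<p>Use <b>port 8080</b> and <i>restart</i> the browser.</p>"
soup = BeautifulSoup(sample_html, "html.parser")

elements = soup.find_all(["b", "i"])

# Pair each piece of text with the tag that formatted it
labeled = [(el.name, el.get_text(strip=True)) for el in elements]

# str() on a Tag returns its HTML, including the surrounding tags
markup = [str(el) for el in elements]

print(labeled)
print(markup)
```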
				
			

Step 7. Check that the code works by printing the list.

				
print(text)
				
			

Results:

Congratulations, you’ve extracted formatted text using Beautiful Soup. Here’s the full script:

				
from bs4 import BeautifulSoup
import requests

# Download the page and parse its HTML
r = requests.get("https://proxyway.com/guides/proxy-switchyomega-chrome")
soup = BeautifulSoup(r.content, "html.parser")

# Collect every <b> element and keep only its text
bold = soup.find_all("b")
text = []
for i in bold:
    text.append(i.get_text())

print(text)
				
			
