How to extract text from a table using Beautiful Soup
Important: this tutorial uses a real-life example, so you will need the requests and Beautiful Soup (beautifulsoup4) libraries installed.
Step 1. Let’s start by importing the Beautiful Soup library.
from bs4 import BeautifulSoup
Step 2. Then, import the requests library.
import requests
Step 3. Get the source code of your target page. In this example, we will use our guide 5 Methods for Scraping Google Search Results.
r=requests.get("https://proxyway.com/guides/how-to-scrape-google-search-results/")
For any other page, the code looks like this:
r=requests.get("Your URL")
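A request can fail or return an error page, in which case there is no table to parse. The sketch below wraps the download in a hypothetical helper named fetch_html (not part of requests or of this tutorial's script); the 10-second timeout is an assumed value, not a recommendation from the original guide.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper)."""
    # The timeout stops the script from waiting forever on a slow server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

The returned bytes can be passed to BeautifulSoup in the same way as r.content in the next step.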
Step 4. Convert the HTML code into a BeautifulSoup object named soup.
soup=BeautifulSoup(r.content,"html.parser")
Step 5. Extract text from a table. There are several ways to do this. If your HTML document has only one table (or you only need the text from the first one), this code should work, because find returns the first matching element.
text_table=soup.find("table").get_text()
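If the page contains no table, find returns None and calling get_text on it raises an AttributeError. Here is a minimal sketch of a guarded version, run against a small made-up HTML snippet rather than the tutorial's page. It also passes separator and strip to get_text so that the cell values do not run together.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, used only for illustration.
html = """
<table>
  <tr><th>Method</th><th>Difficulty</th></tr>
  <tr><td>API</td><td>Easy</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

if table is None:
    # No <table> in the document, so fall back to an empty string.
    text_table = ""
else:
    # Join the cell texts with a space and drop surrounding whitespace.
    text_table = table.get_text(separator=" ", strip=True)

print(text_table)  # Method Difficulty API Easy
```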
However, if you need text from a specific table, there’s another way. Suppose we want to extract the text of the third table.
This code does the job:
text_table=soup.select_one("table:nth-of-type(3)").get_text()
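The general form for extracting table text by its sequence number looks like this:
text_table=soup.select_one("table:nth-of-type(n)").get_text()
In this code, n is the sequence number of the table you want. Note that nth-of-type counts tables among elements that share the same parent, so it matches the order in the document only when all the tables are siblings. If the tables sit in different containers, soup.find_all("table")[n-1] selects the nth table in the document instead.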
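The way nth-of-type counts is easy to check on a small document. In the sketch below (made-up markup, not the tutorial's page), each table is wrapped in its own div, so the selector finds no third table among siblings, while indexing the find_all result still returns the third table in document order.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: three tables, each inside its own <div>.
html = """
<div><table><tr><td>first</td></tr></table></div>
<div><table><tr><td>second</td></tr></table></div>
<div><table><tr><td>third</td></tr></table></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every table is the only table in its parent, so none is a third sibling.
by_selector = soup.select_one("table:nth-of-type(3)")
print(by_selector)  # None

# find_all returns the tables in document order; list indexes start at 0.
n = 3
tables = soup.find_all("table")
by_index = tables[n - 1] if len(tables) >= n else None

if by_index is not None:
    print(by_index.get_text(strip=True))  # third
```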
Note:
If you know the id or class of a table whose text you want to extract, you can use the following code to extract text from a specific table:
For classes:
text_table=soup.find("table",{"class":"your-table-class"}).get_text()
For IDs:
text_table=soup.find("table",{"id":"your-table-id"}).get_text()
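Beautiful Soup also accepts class_ and id as keyword arguments, and select_one understands the equivalent CSS selectors. The sketch below runs both forms against made-up markup; the class name pricing and the id results are invented for this example.

```python
from bs4 import BeautifulSoup

# Made-up sample markup with one class-tagged and one id-tagged table.
html = """
<table class="pricing wide"><tr><td>Plan A</td></tr></table>
<table id="results"><tr><td>Row 1</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword form: class_ has a trailing underscore because class is reserved in Python.
by_class = soup.find("table", class_="pricing")
by_id = soup.find("table", id="results")

# CSS selector form of the same two lookups.
css_class = soup.select_one("table.pricing")
css_id = soup.select_one("table#results")

print(by_class.get_text(strip=True))  # Plan A
print(by_id.get_text(strip=True))     # Row 1
```

Matching on class_="pricing" still works when the table carries several classes, as the first table does here.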
Step 6. Let’s check that our code works by printing the result.
print(text_table)
Congratulations, you’ve extracted text from a table using Beautiful Soup. Here’s the full script:
from bs4 import BeautifulSoup
import requests
r=requests.get("https://proxyway.com/guides/how-to-scrape-google-search-results/")
soup=BeautifulSoup(r.content,"html.parser")
text_table=soup.select_one("table:nth-of-type(3)").get_text()
print(text_table)
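Calling get_text on the whole table returns one long string, which loses the row and column boundaries. If you need the cells kept apart, one possible approach is to walk the rows and collect each row's cell texts into a list. This is a sketch on made-up markup, not output from the page above.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, used only for illustration.
html = """
<table>
  <tr><th>Method</th><th>Difficulty</th></tr>
  <tr><td>API</td><td>Easy</td></tr>
  <tr><td>Scraper</td><td>Medium</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

rows = []
for tr in table.find_all("tr"):
    # Header cells (<th>) and data cells (<td>) are both collected.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

for row in rows:
    print(" | ".join(row))
```

Each entry in rows is a list of strings, so the data can be written to a CSV file or processed further without re-parsing the text.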