Important: this tutorial uses a real-life example, so you will need the requests and Beautiful Soup libraries installed.
Step 1. Start by importing the BeautifulSoup class from the bs4 package.
from bs4 import BeautifulSoup
Step 2. Then import the requests library.
import requests
Step 3. Get the source code of your target landing page.
r = requests.get("https://books.toscrape.com/")
Step 4. Parse the HTML into a BeautifulSoup object named soup.
soup = BeautifulSoup(r.content, "html.parser")
Step 5. Inspect the page to find the image element you would like to extract.
The following call collects every matching image element:
thumbnail_elements = soup.find_all("img", class_="thumbnail")
NOTE: For this website, you can find images by looking for img tags that have a thumbnail class.
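To see what this lookup matches without making a network request, here is a minimal sketch that runs the same find_all call on a small inline HTML string. The markup, file names, and the SAMPLE_HTML variable are hypothetical stand-ins, not copied from the site. The CSS selector form with select is shown as an equivalent alternative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely shaped like a product listing (not taken from the site).
SAMPLE_HTML = """
<ol>
  <li><img src="media/a.jpg" alt="Book A" class="thumbnail"></li>
  <li><img src="media/b.jpg" alt="Book B" class="thumbnail"></li>
  <li><img src="static/logo.png" alt="Logo" class="logo"></li>
</ol>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Keyword form used in this tutorial; class_ avoids the reserved word `class`.
by_keyword = soup.find_all("img", class_="thumbnail")

# Equivalent CSS selector form.
by_selector = soup.select("img.thumbnail")

# Only the two img tags with the thumbnail class are kept; the logo is skipped.
thumbnail_srcs = [img["src"] for img in by_keyword]
print(thumbnail_srcs)  # ['media/a.jpg', 'media/b.jpg']
```

Both forms return the same elements here, so which one to use is mostly a matter of taste.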
Step 6. Check that the code works by printing the result.
print(thumbnail_elements)
Step 7. Now you need to get the src attribute from each element.
for element in thumbnail_elements:
    print(element['src'])
Results: the loop prints one src value per thumbnail, each a relative path rather than a full URL.
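One caveat about the loop in Step 7: element['src'] raises a KeyError if an img tag has no src attribute, which can happen on pages that lazy-load images. The sketch below shows a more defensive variant using element.get. The inline markup, the data-src attribute, and the found_srcs name are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second thumbnail has no src attribute.
SAMPLE_HTML = """
<img class="thumbnail" src="media/a.jpg">
<img class="thumbnail" data-src="media/lazy.jpg">
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

found_srcs = []
for element in soup.find_all("img", class_="thumbnail"):
    src = element.get("src")  # returns None instead of raising KeyError
    if src is None:
        continue  # skip images without a usable src
    found_srcs.append(src)

print(found_srcs)  # ['media/a.jpg']
```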
Congratulations, you’ve found and extracted the image sources using Beautiful Soup. Here’s the full script:
from bs4 import BeautifulSoup
import requests

r = requests.get("https://books.toscrape.com/")
soup = BeautifulSoup(r.content, "html.parser")

thumbnail_elements = soup.find_all("img", class_="thumbnail")
print(thumbnail_elements)

for element in thumbnail_elements:
    print(element['src'])

# Uncomment to print full URLs instead of relative paths:
#for element in thumbnail_elements:
#    print("https://books.toscrape.com/" + element['src'])
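The src values are relative paths. If you rebuild the full URL by prepending the site's base URL, as the commented-out lines at the end of the script do, you can access the image.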
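As an alternative to plain string concatenation, the standard library's urljoin can rebuild the full URL. It also leaves a src that is already absolute unchanged. This is a minimal sketch: the to_absolute helper and the example path are hypothetical names introduced for illustration, not part of the original script.

```python
from urllib.parse import urljoin

BASE_URL = "https://books.toscrape.com/"


def to_absolute(src, base_url=BASE_URL):
    """Resolve a possibly relative src against the page's base URL."""
    return urljoin(base_url, src)


# Hypothetical relative path, standing in for a value printed by the script.
example_src = "media/cache/example.jpg"
full_url = to_absolute(example_src)
print(full_url)  # https://books.toscrape.com/media/cache/example.jpg
```

In the script, this would amount to calling to_absolute(element['src']) inside the loop instead of concatenating strings.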