How to get the ‘src’ attribute from an ‘img’ tag using BeautifulSoup
The majority of data were collected in February and March of 2023.
Step 1. Let’s start by importing the BeautifulSoup class from the bs4 library.
from bs4 import BeautifulSoup
Step 2. Then, import the requests library.
import requests
Step 3. Get the source code of your target page.
r = requests.get("https://books.toscrape.com/")
Step 4. Convert the HTML code into a BeautifulSoup object named soup.
soup = BeautifulSoup(r.content, "html.parser")
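Steps 3 and 4 assume the request succeeds. As a minimal sketch, the two steps could be wrapped in helpers that fail loudly on an HTTP error. The names make_soup and fetch_soup, the 10-second timeout, and the sample markup are illustrative assumptions, not part of the original tutorial.

```python
import requests
from bs4 import BeautifulSoup


def make_soup(html):
    """Parse raw HTML (bytes or str) with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Download a page and parse it; the timeout value is an assumption."""
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return make_soup(r.content)


# Offline check of the parsing half, using a tiny hypothetical snippet.
demo = make_soup(b'<img class="thumbnail" src="media/cover.jpg">')
print(demo.img["src"])
```

With network access, soup = fetch_soup("https://books.toscrape.com/") would stand in for the two lines from Steps 3 and 4.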
Step 5. Inspect the page to find the image objects you would like to extract.
The code that selects these image objects looks like this:
thumbnail_elements = soup.find_all("img", class_ = "thumbnail")
NOTE: For this website, you can find images by looking for img tags that have a thumbnail class.
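The same lookup can also be written as a CSS selector, which some readers find easier to adapt after inspecting a page. The sketch below runs on a small hypothetical HTML string (modelled on a product listing, not copied from the site) so that it works offline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two thumbnails and one unrelated logo image.
html = """
<article><img class="thumbnail" src="media/a.jpg" alt="Book A"></article>
<article><img class="thumbnail" src="media/b.jpg" alt="Book B"></article>
<img class="logo" src="static/logo.png" alt="Logo">
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword-argument form, as used in this tutorial.
by_class = [img["src"] for img in soup.find_all("img", class_="thumbnail")]

# Equivalent CSS selector form.
by_selector = [img["src"] for img in soup.select("img.thumbnail")]

print(by_class)
print(by_class == by_selector)
```

Both forms skip the logo image because it does not carry the thumbnail class.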
Step 6. Let’s check that our code works by printing the matched elements.
print(thumbnail_elements)
Step 7. Now you need to get the src attribute from each element.
for element in thumbnail_elements:
    print(element['src'])
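Indexing with element['src'] raises a KeyError when an img tag has no src attribute. A more defensive variant, sketched here on a hypothetical two-image snippet, uses the tag's get method, which returns None instead.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second image deliberately lacks a src attribute.
html = (
    '<img class="thumbnail" src="media/a.jpg">'
    '<img class="thumbnail" alt="no source">'
)
soup = BeautifulSoup(html, "html.parser")

sources = []
for element in soup.find_all("img", class_="thumbnail"):
    src = element.get("src")  # None when the attribute is absent
    if src:
        sources.append(src)

print(sources)
```

On a page where every thumbnail has a src, this behaves the same as the loop in Step 7.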
Results: the loop in Step 7 prints the relative src path of each thumbnail, one per line.
Congratulations, you’ve found and extracted the src attribute of each image using BeautifulSoup. Here’s the full script:
from bs4 import BeautifulSoup
import requests
r = requests.get("https://books.toscrape.com/")
soup = BeautifulSoup(r.content, "html.parser")
thumbnail_elements = soup.find_all("img", class_ = "thumbnail")
print(thumbnail_elements)
for element in thumbnail_elements:
    print(element['src'])
# To print absolute URLs instead, prepend the site's base URL:
# for element in thumbnail_elements:
#     print("https://books.toscrape.com/" + element['src'])
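The commented-out lines at the end of the script build absolute URLs by string concatenation, which works only when every src is relative to the site root. A more general sketch uses urljoin from the standard library. The image paths and the second page URL below are hypothetical examples, chosen only to show the behaviour.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://books.toscrape.com/"

# Hypothetical thumbnail markup with a relative src.
html = '<img class="thumbnail" src="media/cache/cover.jpg">'
soup = BeautifulSoup(html, "html.parser")

# Resolve each relative src against the URL of the page it came from.
absolute_urls = [
    urljoin(BASE_URL, img["src"])
    for img in soup.find_all("img", class_="thumbnail")
]
print(absolute_urls)

# urljoin also resolves "../" segments, which plain concatenation would not.
nested = urljoin(
    "https://books.toscrape.com/catalogue/page-2.html",  # hypothetical page
    "../media/cache/cover.jpg",
)
print(nested)
```

Passing the URL of the page that was actually requested as the first argument keeps the result correct for pages below the site root.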