Important: this tutorial works through a real-life example, so you’ll need the Selenium library and a browser driver installed.
Step 1. Write your first Selenium script.
NOTE: We’ll be using Python and the Chrome WebDriver. You can add the Chrome WebDriver to your PATH so that Selenium can find it.
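A first script can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name open_and_get_title and its make_driver parameter are hypothetical, introduced only so the function can be tried without launching a browser. In real use you would pass webdriver.Chrome as make_driver, which assumes Selenium and a matching Chrome driver are installed.

```python
def open_and_get_title(url, make_driver):
    """Start a browser via make_driver(), load url, and return the page title.

    make_driver is any zero-argument callable that returns a WebDriver-like
    object (hypothetical parameter; pass webdriver.Chrome for a real browser).
    """
    driver = make_driver()
    try:
        driver.get(url)      # navigate to the page
        return driver.title  # text of the page's <title> element
    finally:
        driver.quit()        # close the browser even if something fails


# Real usage (assumes Selenium and a Chrome driver are installed):
# from selenium import webdriver
# print(open_and_get_title("http://books.toscrape.com/", webdriver.Chrome))
```

Wrapping the work in try/finally means the browser window is closed even when the page fails to load.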
Step 2. Next, import the By class, which defines the locator strategies (such as By.XPATH) used to find elements.
from selenium.webdriver.common.by import By
TIPS: Locating elements.
Step 3. Let’s try to find the stock availability of a book. We’ll be using the books.toscrape.com website in this example. Inspect the page source: the word “stock” and the number of available books appear in the same element.
NOTE: We’ll use an XPath selector to locate the element because XPath provides a built-in text() node test for matching an element’s text.
TIP: If you need a refresher, look at the XPath cheatsheet.
Step 4. Then use this selector:
//*[contains(text(),'stock')]
It matches any element on the page whose text contains the string 'stock'.
element_by_text = driver.find_element(By.XPATH, "//*[contains(text(),'stock')]").text
print(element_by_text)
NOTE: We’re using the driver.find_element() method to get only the first element matched by the selector. It’s also possible to use the driver.find_elements() method to get a list of all matching elements.
This is the output of the script. It shows the text of the element you’ve just scraped.
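The difference between the two methods can be illustrated with a short sketch. The helper names all_texts and first_text_or_none are hypothetical and not part of Selenium; they only rely on driver.find_elements(by, value) returning a list of elements that each expose a .text attribute. The driver is passed in as an argument, so the sketch assumes you have already created one as in the earlier steps.

```python
def all_texts(driver, by, value):
    """Return the visible text of every element matched by the locator."""
    return [element.text for element in driver.find_elements(by, value)]


def first_text_or_none(driver, by, value):
    """Return the first match's text, or None if the locator matches nothing."""
    # find_elements() returns an empty list when nothing matches, whereas
    # find_element() raises NoSuchElementException in that case.
    texts = all_texts(driver, by, value)
    return texts[0] if texts else None


# Real usage (assumes an existing Selenium driver and the By import):
# print(all_texts(driver, By.XPATH, "//*[contains(text(),'stock')]"))
```

Going through find_elements() is one way to avoid a try/except block when a page might not contain the element at all.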
Step 5. Now we can clean up the result by extracting the number from the text so you can use it as an integer in other operations. A simple regular expression is enough.
Here we import the re module, find the first run of digits in the element_by_text string, convert it to an integer, assign it to a new variable, and print it out separately:
import re
in_stock = int(re.findall(r'\d+', element_by_text)[0])
print(f'In stock: {in_stock}')
This is the output of the script. It shows the stock availability of a book you’ve just scraped.
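The extraction step can also be wrapped in a small function that copes with text containing no digits at all. This is a sketch under stated assumptions: parse_stock_count is a hypothetical helper name, and the sample strings are made-up inputs for illustration, not output captured from the site.

```python
import re
from typing import Optional


def parse_stock_count(text: str) -> Optional[int]:
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# Assumed sample inputs, for illustration only
print(parse_stock_count("In stock (22 available)"))  # 22
print(parse_stock_count("Out of stock"))             # None
```

Indexing re.findall(...)[0] raises an IndexError when the text has no digits; returning None instead lets the caller decide how to handle a missing number.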
Results:
Congratulations, you’ve just extracted the stock availability of a book using Selenium. Here is the complete script:
import re
from selenium import webdriver
from selenium.webdriver.common.by import By
# Start Chrome and open the book's product page
driver = webdriver.Chrome()
driver.get("http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html")
# Grab the text of the first element whose text contains 'stock'
element_by_text = driver.find_element(By.XPATH, "//*[contains(text(),'stock')]").text
driver.quit()
print(element_by_text)
# Pull the first run of digits out of the text and convert it to an integer
in_stock = int(re.findall(r'\d+', element_by_text)[0])
print(f'In stock: {in_stock}')