How Is Web Scraping Used to Extract Liquor Prices and Delivery Status from Total Wine and More Stores?

The scraper uses two Python packages:

  • Python requests, to make requests and download the HTML content of the pages.
  • Selectorlib, to extract data from the downloaded pages using YAML files that we create.

Install both with pip:

pip3 install requests selectorlib
Here is the scraper code:

from selectorlib import Extractor
import requests
import csv

# Load the CSS selectors that describe a Total Wine listing page
e = Extractor.from_yaml_file('selectors.yml')


def scrape(url):
    # Browser-like headers, including a Chrome user agent
    headers = {
        'authority': 'www.totalwine.com',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'referer': 'https://www.totalwine.com/beer/united-states/c/001304',
        'accept-language': 'en-US,en;q=0.9',
    }
    r = requests.get(url, headers=headers)
    # Returns a dict shaped like selectors.yml, e.g. {'Products': [...]}
    return e.extract(r.text, base_url=url)


# newline='' keeps the csv module from writing blank rows on Windows
with open("urls.txt", 'r') as urllist, open('data.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=["Name", "Price", "Size", "InStock", "DeliveryAvailable", "URL"], quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for url in urllist.read().splitlines():
        data = scrape(url)
        # Skip pages where no product cards matched
        if data and data.get('Products'):
            for r in data['Products']:
                writer.writerow(r)

The code does the following:

  • It reads a list of Total Wine and More URLs from a file called urls.txt.
  • It uses a selectorlib YAML file, saved as selectors.yml, that identifies the data on a Total Wine page.
  • It extracts the data.
  • It saves the data in CSV format in a file called data.csv.
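
The last two steps can be tried without touching the network. The sketch below feeds a hand-written result, shaped the way the script expects e.extract to return it (a dict with a Products list), through the same csv.DictWriter setup. The product values are made up for illustration, and rows_to_csv is a hypothetical helper name, not part of the script.

```python
import csv
import io

FIELDNAMES = ["Name", "Price", "Size", "InStock", "DeliveryAvailable", "URL"]


def rows_to_csv(data):
    """Write the 'Products' rows of an extraction result to CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    # 'Products' can be missing or None when nothing matched on the page.
    for row in (data or {}).get("Products") or []:
        writer.writerow(row)
    return buffer.getvalue()


# Made-up sample result, for illustration only.
sample = {
    "Products": [
        {
            "Name": "Example Single Malt",
            "Price": "$49.99",
            "Size": "750ml",
            "InStock": "In stock",
            "DeliveryAvailable": "Delivery available",
            "URL": "https://www.totalwine.com/example",
        }
    ]
}

csv_text = rows_to_csv(sample)
print(csv_text)
```

Writing to an in-memory buffer instead of data.csv makes it easy to check the column order and quoting before running the scraper against real pages.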

Developing the YAML File: selectors.yml

Products:
    css: article.productCard__2nWxIKmi
    multiple: true
    type: Text
    children:
        Price:
            css: span.price__1JvDDp_x
            type: Text
        Name:
            css: 'h2.title__2RoYeYuO a'
            type: Text
        Size:
            css: 'h2.title__2RoYeYuO span'
            type: Text
        InStock:
            css: 'p:nth-of-type(1) span.message__IRMIwVd1'
            type: Text
        URL:
            css: 'h2.title__2RoYeYuO a'
            type: Link
        DeliveryAvailable:
            css: 'p:nth-of-type(2) span.message__IRMIwVd1'
            type: Text
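
The Price selector above has type: Text, so the price arrives as a string. If numeric prices are needed for sorting or comparison, a small conversion step can follow extraction. The sketch below assumes prices look like $1,299.99; that format and the helper name parse_price are assumptions, not something the selectors guarantee.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the first number in a price string as a Decimal, or None."""
    if not text:
        return None
    # One digit, then digits or thousands separators, then optional decimals.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group(0).replace(",", ""))


print(parse_price("$1,299.99"))
print(parse_price("Price unavailable"))
```

Decimal is used rather than float so that prices keep their exact cents when compared or summed.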

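Running the Total Wine and More Scraper

Add the URLs you want to scrape to urls.txt, one per line, and run the script. For example, urls.txt could contain:

https://www.totalwine.com/spirits/scotch/single-malt/c/000887?viewall=true&pageSize=120&aty=0,0,0,0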
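
The script reads urls.txt with splitlines(), so a stray blank line would be passed to requests.get as an empty URL. A slightly more defensive reader, sketched below with the hypothetical helper name load_urls, skips blank lines and duplicates and keeps only http or https URLs.

```python
from urllib.parse import urlsplit


def load_urls(text):
    """Return unique http(s) URLs from text, one URL per line, in order."""
    urls = []
    seen = set()
    for line in text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        parts = urlsplit(url)
        # Keep only absolute web URLs; anything else is ignored.
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(url)
            seen.add(url)
    return urls


example = (
    "https://www.totalwine.com/spirits/scotch/single-malt/c/000887"
    "?viewall=true&pageSize=120&aty=0,0,0,0\n"
    "\n"
    "not a url\n"
)
print(load_urls(example))
```

In the scraper, the loop could then iterate over load_urls(urllist.read()) instead of urllist.read().splitlines().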
This code may stop working for several reasons:

  • The website may change its design. The CSS selectors we use, such as price__1JvDDp_x for Price in selectors.yml, are likely to change over time, possibly even from day to day.
  • The selection of your “local” store may depend on more variables than your geolocated IP address, and the website may ask you to choose a location. This simple code does not handle that.
  • The site may add new data points or change the existing ones.
  • The website may block the User-Agent used here.
  • The site may block the access pattern this script uses.
  • The website may block your IP address, or all the IPs from your proxy.
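
Several of these failures show up the same way: the page downloads, but the selectors no longer match and fields come back empty (selectorlib typically returns None for a selector that matches nothing). A cheap guard is to check each result before writing it. The sketch below, using the hypothetical helper empty_fields, reports which expected fields are empty in every product row. The sample row is made up.

```python
EXPECTED_FIELDS = ["Name", "Price", "Size", "InStock", "DeliveryAvailable", "URL"]


def empty_fields(data, fields=EXPECTED_FIELDS):
    """Return the fields that are empty in every product row.

    If there are no product rows at all, every field is reported.
    """
    rows = (data or {}).get("Products") or []
    return [f for f in fields if not any(row.get(f) for row in rows)]


# Made-up result where two selectors no longer match.
broken = {
    "Products": [
        {
            "Name": "Example Single Malt",
            "Price": None,
            "Size": "750ml",
            "InStock": "In stock",
            "DeliveryAvailable": None,
            "URL": "https://www.totalwine.com/example",
        }
    ]
}

print(empty_fields(broken))  # prints ['Price', 'DeliveryAvailable']
```

A non-empty result is a hint to re-check selectors.yml against the current page, or to inspect the response for a block page, before trusting data.csv.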

3i Data Scraping
