How to Scrape Rental Websites Using BeautifulSoup and Python

  • How do suburban rents compare with rents in the city of Toronto?
  • How much could you save by renting a basement unit?
  • Which suburbs have the lowest rents?

Scraping Rental Website Data Using BeautifulSoup and Python

# Import Python libraries

# For HTML parsing
from bs4 import BeautifulSoup

# For website connections
import requests

# To avoid overwhelming the server between connections
from time import sleep

# To display a progress bar
from tqdm import tqdm

# For data wrangling
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# For creating plots
import matplotlib.pyplot as plt
import plotly.graph_objects as go

def get_page(city, listing_type, beds, page):
    # Example: https://www.torontorentals.com/toronto/condos?beds=1%20&p=2
    url = f'https://www.torontorentals.com/{city}/{listing_type}?beds={beds}%20&p={page}'
    result = requests.get(url)

    # Stays None unless the request succeeds, so the function always
    # has something to return
    soup = None

    # Check the HTTP status code to see whether the request succeeded
    if 100 <= result.status_code <= 199:
        print('Informational response')
    elif 200 <= result.status_code <= 299:
        print('Successful response')
        soup = BeautifulSoup(result.content, "lxml")
    elif 300 <= result.status_code <= 399:
        print('Redirect')
    elif 400 <= result.status_code <= 499:
        print('Client error')
    elif 500 <= result.status_code <= 599:
        print('Server error')

    return soup

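
The chain of range checks in get_page can be condensed, because the class of an HTTP status code is determined by its first digit. The following is a minimal sketch of that idea; the helper name describe_status and the 'Unknown status' fallback are assumptions for illustration and are not part of the original script.

```python
def describe_status(status_code):
    """Map an HTTP status code to the same labels that get_page prints."""
    classes = {
        1: 'Informational response',
        2: 'Successful response',
        3: 'Redirect',
        4: 'Client error',
        5: 'Server error',
    }
    # Integer division by 100 keeps only the leading digit (404 -> 4)
    return classes.get(status_code // 100, 'Unknown status')


for code in (200, 301, 404, 503):
    print(code, describe_status(code))
```

Note that requests follows redirects by default for GET requests, so a 3xx code is rarely what ends up in result.status_code.
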
# list that collects the street of every listing
listingStreet = []

for page_num in tqdm(range(1, 250)):
    # pause between requests so the server is not overwhelmed
    sleep(2)

    # get soup object of the page
    soup_page = get_page('toronto', 'condos', '1', page_num)

    # skip pages that could not be fetched
    if soup_page is None:
        continue

    # grab listing street
    for tag in soup_page.find_all('div', class_='listing-brief'):
        for tag2 in tag.find_all('span', class_='replace street'):
            # record "empty" when the data point is missing
            street = tag2.get_text(strip=True)
            listingStreet.append(street if street else "empty")

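
The extraction logic in the loop can be tried offline before any requests are sent. The sketch below runs the same find_all calls against a small hand-written HTML snippet. The markup is hypothetical and only imitates the structure the loop expects; the real site's HTML may differ. The helper name extract_streets is also an assumption, and the built-in html.parser is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not copied from the real site
SAMPLE_HTML = """
<div class="listing-brief">
  <span class="replace street">123 King St W</span>
</div>
<div class="listing-brief">
  <span class="replace street">   </span>
</div>
<div class="other"><span class="replace street">ignored</span></div>
"""


def extract_streets(soup):
    """Collect street text from every listing card, using 'empty' for blanks."""
    streets = []
    for card in soup.find_all('div', class_='listing-brief'):
        for span in card.find_all('span', class_='replace street'):
            text = span.get_text(strip=True)
            streets.append(text if text else 'empty')
    return streets


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
streets = extract_streets(soup)
print(streets)
```

Passing 'replace street' as class_ matches the class attribute as an exact string, so a span whose classes appear in a different order would not be found.
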
# create the dataframe
df_Toronto_Condo = pd.DataFrame({
    'city_main': 'Toronto',
    'listing_type': 'Condo',
    'street': listingStreet,
    'city': listingCity,
    'zip': listingZip,
    'rent': listingRent,
    'bed': listingBed,
    'bath': listingBath,
    'dimensions': listingDim,
})

# save the dataframe to a CSV file
df_Toronto_Condo.to_csv('df_Toronto_Condo.csv')
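
pd.DataFrame raises a ValueError when the lists passed in the dictionary differ in length, which can happen if a listing lacks one of the scraped tags entirely. Here is a rough sketch of a length check before the DataFrame is built. The helper name build_listing_frame and the sample values are assumptions made up for illustration.

```python
import io

import pandas as pd


def build_listing_frame(columns, **constants):
    """Build a DataFrame after confirming every scraped list has the same length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f'Column lengths differ: {lengths}')
    # Scalar constants such as city_main are broadcast to every row
    return pd.DataFrame({**constants, **columns})


# Made-up sample values standing in for the scraped lists
sample = {'street': ['123 King St W', 'empty'], 'rent': ['$2,100', '$1,850']}
df = build_listing_frame(sample, city_main='Toronto', listing_type='Condo')

# index=False leaves the row index out of the CSV output
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Without index=False, to_csv also writes the row index as an extra unnamed first column, which then reappears when the file is read back.
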

Data Preparation & Cleaning with Pandas
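
A typical first cleaning step is converting the scraped rent text into numbers. The sketch below assumes that rents were scraped as strings such as '$2,100' and that missing values were stored as "empty", as in the scraping loop; the exact text format on the site is an assumption, and parse_rent is a hypothetical helper name.

```python
import pandas as pd


def parse_rent(series):
    """Strip everything except digits and dots, then convert to numbers."""
    digits = series.astype(str).str.replace(r'[^0-9.]', '', regex=True)
    # Strings left empty (for example the "empty" placeholder) become NaN
    return pd.to_numeric(digits, errors='coerce')


# Made-up sample rows for illustration
raw = pd.DataFrame({'rent': ['$2,100', '$1,850', 'empty']})

clean = raw.copy()
clean['rent'] = parse_rent(clean['rent'])

# Drop listings whose rent could not be parsed
clean = clean.dropna(subset=['rent'])
print(clean)
```
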

Insights Produced from Data
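
Questions such as which suburbs have the lowest rents can be approached with a group-by on the cleaned data. The following sketch uses made-up numbers purely to show the mechanics; they are not scraped results. It assumes a numeric rent column and the city_main column from the scraping step.

```python
import pandas as pd

# Made-up numbers for illustration only; these are not scraped results
listings = pd.DataFrame({
    'city_main': ['Toronto', 'Toronto', 'Mississauga',
                  'Mississauga', 'Brampton', 'Brampton'],
    'rent': [2200, 2400, 1900, 2000, 1700, 1800],
})

# Median rent per city, sorted from cheapest to most expensive
median_rent = listings.groupby('city_main')['rent'].median().sort_values()

# Difference between Toronto's median rent and each city's median rent
savings_vs_toronto = median_rent['Toronto'] - median_rent

print(median_rent)
print(savings_vs_toronto)
```

The median is used here instead of the mean so that a few very expensive listings do not dominate the comparison.
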

Investigation of Relationship Between Dimensions and Rent
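
One simple way to examine this relationship is the Pearson correlation between floor area and rent. The sketch below assumes the dimensions column has already been parsed into a numeric sqft column, and the values are made up for illustration rather than taken from the scraped data.

```python
import pandas as pd

# Made-up values for illustration only; these are not scraped results
sample = pd.DataFrame({
    'sqft': [450, 520, 600, 700, 850],
    'rent': [1700, 1850, 2000, 2300, 2600],
})

# Pearson correlation between floor area and rent (between -1 and 1)
r = sample['sqft'].corr(sample['rent'])

# Rent per square foot makes units of different sizes comparable
sample['rent_per_sqft'] = sample['rent'] / sample['sqft']

print(f'correlation: {r:.2f}')
print(sample)
```
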

3i Data Scraping


3i Data Scraping is an experienced web scraping service provider in the USA. We offer a complete range of website data extraction and online outsourcing services.