A web crawler that extracts doctor information from the Doctoralia site

fabinhojorge/doctoralia

Doctoralia Web Crawler

A web crawler and scraper that extracts data from the Doctoralia site.

The following information is collected:

  • Name
  • Image_link
  • Specializations
  • Experiences
  • City
  • State
  • Address
  • Address_telephone
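As a rough illustration of how one scraped record with the fields above could be held in memory, here is a minimal sketch. The class name `DoctorRecord`, the lower-cased attribute names, and the choice of lists for multi-valued fields are assumptions made for this example; they are not taken from the project's code.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class DoctorRecord:
    """Hypothetical container for one doctor; fields mirror the list above."""
    name: str
    image_link: str = ""
    # Assumption: a doctor may have several specializations and experiences.
    specializations: List[str] = field(default_factory=list)
    experiences: List[str] = field(default_factory=list)
    city: str = ""
    state: str = ""
    address: str = ""
    # Assumption: one address may list more than one telephone number.
    address_telephone: List[str] = field(default_factory=list)


# Placeholder values only, not real scraped data.
record = DoctorRecord(name="Example Doctor", city="Example City",
                      address_telephone=["0000-0000"])
row = asdict(record)  # plain dict, convenient for writing CSV or JSON rows
```

Converting to a plain dict with `asdict` keeps the column order identical to the field order, which is handy when writing the results to a file.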

How to install and Run

After activating your Python virtual environment (venv), run the command below to install the dependencies:

pip install -r requirements.txt

Libraries and files

  • chromedriver.exe - Web driver used by Selenium to control Chrome. This executable is for Windows x64. If you are not comfortable using this .exe file, or you use another operating system, you can download the correct version from the Selenium Chrome WebDriver page
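If you download a driver for another operating system, the script needs to point Selenium at the right file. The sketch below shows one way this might be done. It assumes Selenium 4 (where `webdriver.Chrome` accepts a `Service` object) and that the driver sits in the project folder; the helper names `chromedriver_path` and `make_driver` are hypothetical and do not come from the project.

```python
import platform
from pathlib import Path


def chromedriver_path(base_dir: str = ".", system: str = "") -> Path:
    """Return the expected driver file for the given (or current) OS."""
    system = system or platform.system()
    # Windows builds ship as chromedriver.exe; other platforms have no suffix.
    filename = "chromedriver.exe" if system == "Windows" else "chromedriver"
    return Path(base_dir) / filename


def make_driver(driver_path: Path):
    """Start Chrome through Selenium using an explicit driver binary."""
    # Imported lazily so the path helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=str(driver_path)))
```

A call such as `make_driver(chromedriver_path())` would then open a Chrome window, provided the driver version matches the installed browser.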

How to use

python DoctoraliaWebCrawler.py

Observations about the target site

  • Pagination is limited to 100 pages, with a fixed 20 doctors per page
  • A doctor can have multiple addresses. This project extracts only the first address and the telephone numbers for that address
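The two observations above imply an upper bound of 100 × 20 = 2000 doctors per listing, and a simple rule for picking the address. The following is a minimal sketch of both, not the project's actual logic. The function names, the 1-based page numbering, and the dictionary shape used for addresses are assumptions for illustration.

```python
from typing import Dict, Iterator, List, Optional

MAX_PAGES = 100          # site-imposed pagination cap
DOCTORS_PER_PAGE = 20    # fixed page size
MAX_DOCTORS_PER_LISTING = MAX_PAGES * DOCTORS_PER_PAGE  # 2000


def page_numbers(last_page_reported: int) -> Iterator[int]:
    """Yield page numbers (assumed 1-based), never past the 100-page cap."""
    yield from range(1, min(last_page_reported, MAX_PAGES) + 1)


def first_address(addresses: List[Dict]) -> Optional[Dict]:
    """Return only the first address entry, or None if there are none."""
    return addresses[0] if addresses else None


# Placeholder data: only the first entry would be kept.
chosen = first_address([
    {"address": "First St, 1", "telephones": ["0000-0000"]},
    {"address": "Second St, 2", "telephones": ["1111-1111"]},
])
```

Because of the cap, a listing that reports more than 100 pages cannot be read in full through pagination alone.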
