This series covers the basics of web scraping and more. It is intended for Python beginners, intermediate users, or anyone interested in building data mining bots using Python.

krizten/Beginners-Guide-To-Web-Scraping


Web Scraping Fundamentals

Python Version Beautiful Soup Selenium Scrapy

This tutorial covers the basics of web scraping and is intended for Python beginners, intermediate users, or anyone interested in building data mining bots. The content of this repository is free to use. Please see the note on licensing to learn more about copyright. Contributions are most welcome: kindly open an issue or make a pull request with your contributions and feedback.

The tutorial is structured into five (5) sections, namely:

  • Preamble
  • Getting Started with Web Scraping
  • Scraping JavaScript-rendered pages
  • Scrapy Framework
  • Optimization and Extensions (Moving Forward)

TABLE OF CONTENTS

  • Preamble

    • Installation and setup of Python Interpreter, IDE/Text Editor
    • Web scraping theory (introduction to web scraping, robots.txt, sitemaps, legal policy, and more)
    • Crawling vs. Scraping
  • Getting Started with Web Scraping

    • DevTools inspection of DOM elements and network requests and responses
    • Introduction to Requests and BeautifulSoup
    • Project 2-1: Extracting smartphone data from the Jumia Nigeria e-commerce website (https://www.jumia.com.ng/smartphones/)
  • Scraping JavaScript-rendered pages

    • Browser automation with Selenium
    • Project 3-1: Scraping frequently bought products using Selenium (https://mall.industry.siemens.com/mall/en/us/Catalog/Product/3RV20111KA15)
    • API as an alternative (mimicking API calls)
    • Project 3-2: Scraping frequently bought products using API requests (https://mall.industry.siemens.com/mall/en/us/Catalog/Product/3RV20111KA15)
  • Scrapy Framework

    • Installation and CLI commands
    • Framework components explained
    • Learning XPath
    • Project 4-1: Scraping free Computer Science courses & MOOCs from Class Central with infinite scroll (https://www.class-central.com/subject/cs)
    • Project 4-2: Simple web crawler (http://books.toscrape.com/)
  • Optimization and Extensions (Moving Forward)

    • Rotating Proxies & User Agents
    • Scheduling scraping tasks (cron jobs)
    • Using Multithreading/Multiprocessing in web scraping
    • Storing data to SQL/NoSQL databases
    • Porting scripts to desktop/web apps and CLI tools
    • Analysis, Visualization and modelling of mined data
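
As a small taste of the robots.txt topic listed under the Preamble, here is a minimal sketch of checking whether a bot may fetch a URL before scraping it. It uses only Python's standard-library `urllib.robotparser`. The robots.txt content, the bot name `tutorial-bot`, and the `example.com` URLs are made up for illustration and are not taken from the tutorial projects.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# For a real site you would call parser.set_url(...) and parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

BOT_NAME = "tutorial-bot"  # hypothetical user agent name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A path with no matching Disallow rule is allowed.
allowed_public = parser.can_fetch(BOT_NAME, "https://example.com/catalog/")
# A path under /private/ is disallowed for every user agent.
allowed_private = parser.can_fetch(BOT_NAME, "https://example.com/private/data")
# Seconds the site asks bots to wait between requests (None if unspecified).
delay = parser.crawl_delay(BOT_NAME)

print(allowed_public, allowed_private, delay)
```

A scraper would typically run a check like this once per site and skip any URL for which `can_fetch` returns `False`, sleeping for `delay` seconds between requests when a crawl delay is given.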

RESOURCES

LICENSE

MIT
