A blog scraper designed specifically to scrape the top 50 blogs in various categories from Detailed

pranav2305/blog-scraper

Task ID: Web Scraper


Setup

  1. Open a terminal

  2. Navigate to the directory where you want to install the project
    Example: cd wec-recs

  3. Clone the repository
    git clone https://github.com/pranav2305/blog-scraper.git

  4. Navigate to the project directory
    cd blog-scraper

  5. Install the node packages
    npm i

  6. Run the server
    node index.js

  7. Open the website in your browser
    http://localhost:3000/


How to Use

  1. Open the deployed website, or http://localhost:3000/ if you cloned the repo and started the server.

  2. Select any one URL from the list of compatible URLs.

  3. Click on the Scrape button to scrape data from that URL.

  4. The scraped blogs will be displayed.


Tech Used

  • An Express server was built with Node.js.
  • Axios, a Node package, was used to request data from a URL.
  • Cheerio was used to scrape the data.
  • EJS was used to create templates that render the website with dynamic data.
  • Bootstrap's grid system was used to make the website responsive.
  • Heroku was used to deploy the website.

About

A simple web scraper made to scrape blogs. The website takes a URL as input and displays the data extracted from that URL. Currently the scraper is limited to a few URLs, as listed below. As of now, it only scrapes blogs from Detailed, which updates its list of the top 50 blogs in various categories every 24 hours. The main aim of the scraper is to extract the meaningful data from a page and ignore the unnecessary data, so that the result is easier to understand.
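
The three outcomes shown under Samples (scraped blogs, no data for a valid but incompatible URL, and an error for an invalid URL) suggest that the input URL is classified before scraping. The following is a minimal sketch of such a check, not the code from this repository. The function name `classifyUrl` and the `example.com` entries in `COMPATIBLE_URLS` are hypothetical placeholders; the real compatible URLs are the ones listed by the project.

```javascript
// Hypothetical allow-list. These example.com entries are placeholders,
// not the project's real compatible URLs.
const COMPATIBLE_URLS = [
  'https://example.com/top-blogs/tech',
  'https://example.com/top-blogs/art',
];

// Reduce a URL to origin + path, without trailing slashes, query, or fragment,
// so that small differences in how a URL is typed do not affect the match.
function normalize(url) {
  const parsed = new URL(url);
  return `${parsed.origin}${parsed.pathname.replace(/\/+$/, '')}`;
}

// Returns 'compatible', 'incompatible' (valid URL, not on the list),
// or 'invalid' (not an http/https URL at all).
function classifyUrl(input, compatible = COMPATIBLE_URLS) {
  let parsed;
  try {
    // The built-in URL constructor throws a TypeError on malformed input.
    parsed = new URL(String(input).trim());
  } catch {
    return 'invalid';
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return 'invalid';
  }
  const target = normalize(parsed.href);
  return compatible.some((u) => normalize(u) === target)
    ? 'compatible'
    : 'incompatible';
}

console.log(classifyUrl('https://example.com/top-blogs/tech/')); // compatible
console.log(classifyUrl('https://example.org/'));                // incompatible
console.log(classifyUrl('not a url'));                           // invalid
```

Only a URL classified as compatible would be passed on to the scraping step; the other two results map to the "no data" and "invalid URL" pages.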


Samples

  1. Home Page
    home-page

  2. Tech Blogs (Desktop view)
    tech-blogs

  3. Art Blogs (Mobile view)
    art-blogs

  4. For other valid URLs (incompatible URLs)
    no-data

  5. For invalid URLs
    invalid-url


Demo Video

Link: https://youtu.be/SAH5qdraBnA


References

  1. Cheerio docs

  2. Bootstrap grid system


Compatible URLs
