kaglowka/glomerulus

glomerulus

install requirements

pip install -r requirements.txt

application entry points

in src/main.py:

from glomerulus.search.search import get_links_for_keywords

get_links_for_keywords('dzieci papierosy')

or

python main.py

then clean data/links.txt: replace every match of the regular expression '&sa=.*' with an empty string, and delete all lines that do not start with 'http'
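The manual cleanup step above could also be scripted. The following is a minimal sketch, not part of the repository: the function names `clean_links` and `clean_links_file` are hypothetical, and it assumes data/links.txt holds one link per line in UTF-8.

```python
import re
from pathlib import Path

# Regex taken from the README: everything from '&sa=' to the end of the line.
TRACKING_SUFFIX = re.compile(r"&sa=.*")


def clean_links(lines):
    """Strip the '&sa=...' suffix and keep only lines starting with 'http'.

    Hypothetical helper, not part of the glomerulus package.
    """
    cleaned = []
    for line in lines:
        link = TRACKING_SUFFIX.sub("", line.strip())
        if link.startswith("http"):
            cleaned.append(link)
    return cleaned


def clean_links_file(path="data/links.txt"):
    """Rewrite the links file in place (assumed: one link per line, UTF-8)."""
    file = Path(path)
    links = clean_links(file.read_text(encoding="utf-8").splitlines())
    file.write_text("".join(link + "\n" for link in links), encoding="utf-8")
    return links
```

Calling `clean_links_file()` from the directory that contains data/ would overwrite the file with the cleaned list, so keeping a backup copy first may be wise.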

run spider:

scrapy runspider links_spider.py

run spider and save to JSON:

scrapy runspider links_spider.py -o <json file>

or run spider and save in "data/articles.json":

python scrapy_run.py
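For illustration, a wrapper like scrapy_run.py could simply invoke the `scrapy runspider` command shown above with a fixed output path. This is a sketch under that assumption and not necessarily how the repository's script works; `build_runspider_command` and `run_spider` are hypothetical names.

```python
import subprocess


def build_runspider_command(spider="links_spider.py", output="data/articles.json"):
    """Build the same command line the README shows, with a fixed output file."""
    return ["scrapy", "runspider", spider, "-o", output]


def run_spider(spider="links_spider.py", output="data/articles.json"):
    """Run the spider in a subprocess; raises CalledProcessError on failure."""
    subprocess.run(build_runspider_command(spider, output), check=True)
```

Note that `-o` appends to an existing output file, so deleting a stale data/articles.json before a fresh run avoids ending up with invalid JSON.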
