Geolocation tool to analyze the categories of newspaper articles about cities over the years.

jfecunha/CitySonar

CitySonar

Goals:

  • Categorize newspaper articles into different categories using a machine learning model trained from scratch.
  • Analyze trends in the different categories over the years.
  • Provide a geolocation tool to view the collected metrics for each city.
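
The trend analysis in the second goal can be pictured as counting classified articles per city, year, and category. The sketch below is a minimal illustration under assumed inputs: the record fields (`city`, `year`, `category`), the helper names, and the sample values are hypothetical and are not taken from the repository.

```python
from collections import Counter, defaultdict

# Hypothetical classified articles; field names and values are assumptions.
articles = [
    {"city": "Porto", "year": 2015, "category": "Crime"},
    {"city": "Porto", "year": 2015, "category": "Health"},
    {"city": "Porto", "year": 2016, "category": "Crime"},
    {"city": "Porto", "year": 2016, "category": "Crime"},
    {"city": "Faro", "year": 2016, "category": "Environment"},
]


def category_trends(records):
    """Count articles per category for each (city, year) pair."""
    trends = defaultdict(Counter)
    for rec in records:
        trends[(rec["city"], rec["year"])][rec["category"]] += 1
    return trends


def category_share(trends, city, year, category):
    """Return the fraction of a city's articles in a year that fall in a category."""
    counts = trends.get((city, year), Counter())
    total = sum(counts.values())
    return counts[category] / total if total else 0.0


trends = category_trends(articles)
print(category_share(trends, "Porto", 2015, "Crime"))
print(category_share(trends, "Porto", 2016, "Crime"))
```

Using shares rather than raw counts keeps years with different numbers of archived articles comparable, which may matter when archive coverage varies over time.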

Vision: Using historical data from Arquivo.pt, this tool could provide insights into how subjects like Crime, Environment, and Health evolve over the years in Portugal's district capitals. These trends could serve as a proxy for quality of life in each city.

Application

The application has two views: a general overview and a city-level view.

General overview

[Screenshots of the general overview]

City overview

[Screenshots of the city overview]

Language models

| Model | Usage              | Obs                   |
|-------|--------------------|-----------------------|
| spaCy | POS tagging        | pt_core_news_lg model |
| YAKE  | Keyword extraction |                       |
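
As a rough illustration of how the two models above might be called, the sketch below wraps spaCy POS tagging and YAKE keyword extraction in small helpers. The function names, the choice of tags to keep, and the YAKE settings (`n=2`, `top=5`) are assumptions made for this example, not the repository's actual configuration. The spaCy and YAKE imports are lazy, so the pure-Python filter runs even when neither package is installed.

```python
def keep_pos(tagged_tokens, wanted=("NOUN", "PROPN")):
    """Keep only the tokens whose part-of-speech tag is in `wanted`."""
    return [text for text, pos in tagged_tokens if pos in wanted]


def tag_with_spacy(text, model_name="pt_core_news_lg"):
    """Return (token, POS tag) pairs; needs spaCy and the model installed."""
    import spacy  # imported lazily so keep_pos works without spaCy

    nlp = spacy.load(model_name)
    return [(token.text, token.pos_) for token in nlp(text)]


def keywords_with_yake(text, top=5):
    """Return (keyword, score) pairs from YAKE; lower scores are more relevant."""
    import yake  # imported lazily; needs the yake package

    # lan, n and top are assumed settings chosen for this illustration.
    extractor = yake.KeywordExtractor(lan="pt", n=2, top=top)
    return extractor.extract_keywords(text)


# Hand-written tags standing in for spaCy output (illustrative only).
sample = [("Incêndio", "NOUN"), ("em", "ADP"), ("Coimbra", "PROPN"), ("alastra", "VERB")]
print(keep_pos(sample))
```

With the model downloaded, `keep_pos(tag_with_spacy(text))` would apply the same filter to real spaCy output.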

Tasks

| Execute                         | Description                                         |
|---------------------------------|-----------------------------------------------------|
| make arquivo-scraper            | Pipeline for Arquivo data collection                |
| make publico-scraper            | Pipeline for Público data collection                |
| make data-cleaning              | Pipeline for Arquivo data cleaning                  |
| make train-category-classifier  | Train the fastText category classifier              |
| make extract-arquivo-categories | Pipeline to extract categories from Arquivo data    |
| make extract-keywords-publico   | Pipeline to extract keywords from text using YAKE   |
| make run-app                    | Run the Streamlit geolocation app                   |
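
For readers unfamiliar with fastText, supervised training expects one example per line, with the label prefixed by `__label__`. The sketch below shows one way to prepare such a file and train a classifier. The helper names, the label normalization, and the hyperparameters (`epoch=25`, `wordNgrams=2`) are assumptions for illustration and may differ from what `make train-category-classifier` actually does.

```python
import re


def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' on a single line."""
    label_token = "__label__" + label.strip().lower().replace(" ", "_")
    clean_text = re.sub(r"\s+", " ", text).strip()  # collapse newlines and tabs
    return f"{label_token} {clean_text}"


def write_training_file(rows, path):
    """Write (label, text) pairs to `path` in fastText's supervised format."""
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in rows:
            handle.write(to_fasttext_line(label, text) + "\n")


def train_classifier(path):
    """Train a supervised fastText model; needs the fasttext package."""
    import fasttext  # imported lazily so the helpers above run without it

    # Hyperparameters are assumed values, not taken from the repository.
    return fasttext.train_supervised(input=path, epoch=25, wordNgrams=2)


print(to_fasttext_line("Crime", "Assalto a banco\nem Lisboa"))
```

A model trained this way can be queried with `model.predict(text)`, which returns the predicted labels together with their probabilities.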

TO DO

  • Make the Streamlit app available at a public endpoint (it currently runs only locally).
  • Display Arquivo articles with keywords in the Streamlit app. Currently, only the data scraped directly from Público is shown.
