US county level COVID-19 case data.
Daily snapshots of US cases by county.
State | Scraper | Validator | Aggregator | Time Series |
---|---|---|---|---|
AK | Y | N | N | N |
AL | Y | N | N | N |
CA | Y | N | N | N |
CO | Y | N | N | N |
DE | Y | N | N | N |
FL | Y | N | N | N |
GA | Y | N | N | N |
IA | Y | N | N | N |
KS | Y | N | N | N |
KY | Y | N | N | N |
LA | Y | N | N | N |
MD | Y | N | N | N |
ME | Y | N | N | N |
MI | Y | N | N | N |
MO | Y | N | N | N |
MN | Y | N | N | N |
MT | Y | N | N | N |
NJ | Y | N | N | N |
NY | Y | N | N | N |
OH | Y | N | N | N |
PA | Y | N | N | N |
TN | Y | N | N | N |
TX | Y | N | N | N |
VA | Y | N | N | N |
WA | Y | N | N | N |
WY | Y | N | N | N |
/data # county level snapshots by scrape timestamp.
|
- {state}_by_county_{scraper_timestamp_in_EDT}.txt # snapshot of scraped results as of timestamp.
/source_page_backup # backup of source pages by scrape timestamp.
|
- {state}_county_{scrape_timestamp}.html # backup of source page. Extension depends on data source.
- main.ipynb # triggers crawler
- config.yaml # shared scraper configurations
- {state}_by_county.ipynb # state-specific scrapers
Scrapers are simple Python scripts or Jupyter notebooks that implement `fetch`, `save`, and `run` methods.
Fetch returns:
- DataFrame containing positive cases by county.
- Source data (HTML page, etc.).

Fetch is responsible for getting a page and processing it into a Pandas DataFrame. It must return a DataFrame that contains `county` and `positive_cases` columns (additional columns are fine) and a string containing the data source being scraped.
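
As a rough illustration of the fetch contract described above, the sketch below parses a small HTML table into a DataFrame with `county` and `positive_cases` columns and returns it together with the source string. The sample page, the `_TableRows` helper, and the optional `page` argument are all hypothetical and exist only to keep the example runnable offline; a real scraper would download the page from the state's site and handle that site's layout.

```python
from html.parser import HTMLParser

import pandas as pd

# Hypothetical sample page; a real scraper would download this from a state site.
SAMPLE_PAGE = """
<table>
  <tr><th>County</th><th>Positive Cases</th></tr>
  <tr><td>Adams</td><td>12</td></tr>
  <tr><td>Boone</td><td>3</td></tr>
</table>
"""


class _TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def fetch(page=SAMPLE_PAGE):
    """Return (df, source): the parsed county table and the raw page text."""
    parser = _TableRows()
    parser.feed(page)
    _header, *body = parser.rows  # first row is the header, the rest are data
    df = pd.DataFrame(body, columns=["county", "positive_cases"])
    df["positive_cases"] = df["positive_cases"].astype(int)
    return df, page
```

Calling `df, source = fetch()` yields a two-row DataFrame and the page text, which is the pair that save expects.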
Save params:
- df (DataFrame): DataFrame containing `county` and `positive_cases` columns (additional columns are fine).
- source (str): string containing the data source page that was scraped.
Save handles persisting the DataFrame and the source data. The DataFrame is saved as a pipe-delimited text file in the data directory, named with the scrape timestamp in EDT; the source page is saved in the source_page_backup directory.
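
A minimal sketch of what save could look like, assuming the file naming shown in the directory layout. The `state`, `root`, and `extension` arguments, the timestamp format, and the returned paths are assumptions added for illustration; the actual scrapers may take these values from config.yaml or elsewhere.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

import pandas as pd

# Eastern Daylight Time is UTC-4; a fixed offset avoids needing a tz database.
EDT = timezone(timedelta(hours=-4), "EDT")


def save(df, source, state="XX", root=".", extension="html"):
    """Write df as a pipe-delimited snapshot and back up the source page.

    state, root, and extension are hypothetical parameters for this sketch.
    """
    stamp = datetime.now(EDT).strftime("%Y%m%d_%H%M%S")  # assumed format

    data_dir = Path(root) / "data"
    backup_dir = Path(root) / "source_page_backup"
    data_dir.mkdir(parents=True, exist_ok=True)
    backup_dir.mkdir(parents=True, exist_ok=True)

    data_path = data_dir / f"{state}_by_county_{stamp}.txt"
    backup_path = backup_dir / f"{state}_county_{stamp}.{extension}"

    df.to_csv(data_path, sep="|", index=False)  # pipe-delimited, no index column
    backup_path.write_text(source, encoding="utf-8")
    return data_path, backup_path
```

Because both files share one timestamp, a snapshot in the data directory can be matched to the page it was scraped from.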
Run handles fetch and save in one action. It is used by the main crawling job.
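
The relationship between the three methods can be sketched as follows. The `fetch` and `save` bodies here are stand-ins with made-up data (the `saved` list replaces real file writes), so only the shape of `run` is meant to carry over.

```python
import pandas as pd

saved = []  # stand-in for files on disk, so the sketch has no side effects


def fetch():
    # Stand-in for a real state scraper (hypothetical data).
    df = pd.DataFrame({"county": ["Adams", "Boone"], "positive_cases": [12, 3]})
    return df, "<html>source page</html>"


def save(df, source):
    # Stand-in: record the pair instead of writing snapshot and backup files.
    saved.append((df, source))


def run():
    """Fetch the current data and persist it in one step."""
    df, source = fetch()
    save(df, source)


run()
```

Keeping `run` this thin means a crawl job only needs to call one function per state scraper.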