pain-pills

Import Data

Data Source: https://www.washingtonpost.com/graphics/2019/investigations/dea-pain-pill-database/#download-resources

Use PowerShell for a quick check on the number of rows before reading pain-pills into R, without keeping it all in memory:

$count = 0
$filepath = (Resolve-Path ".\arcos_all_washpost.tsv.gz").Path # full path within the current working directory
$reader = New-Object IO.StreamReader $filepath
while($null -ne $reader.ReadLine()){$count++} # read one line at a time, counting as we go
$reader.Close()
$count # print the row count

52,567,689 rows


Read the file into R a chunk at a time, limiting myself to 27 chunks of 100k rows (2.7M rows in total); a sketch of the approach follows the links below.

Props to @amunategui: https://amunategui.github.io/dealing-with-large-files/index.html

See read-tsv_all.R
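
For illustration, a minimal sketch of the chunked read (assuming the readr and dplyr packages and a decompressed arcos_all_washpost.tsv; this loop is illustrative, the actual code is in read-tsv_all.R):

library(readr)
library(dplyr)

chunk_size <- 100000 # 100k rows per chunk
n_chunks   <- 27     # 27 chunks of 100k rows = 2.7M rows in total
col_names  <- names(read_tsv("arcos_all_washpost.tsv", n_max = 0)) # read the header only

# read each chunk, skipping the header and the chunks already read
chunks <- vector("list", n_chunks)
for (i in seq_len(n_chunks)) {
  chunks[[i]] <- read_tsv(
    "arcos_all_washpost.tsv",
    skip      = 1 + (i - 1) * chunk_size,
    n_max     = chunk_size,
    col_names = col_names
  )
}
data <- bind_rows(chunks)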

Some benchmarking in R, measuring memory with pryr::mem_change(rm(data)) after loading the first N rows:

rows        size
1,000,000   530 MB
2,000,000   1.01 GB
3,000,000   1.58 GB
4,000,000   2.02 GB
5,000,000   2.48 GB

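For reference, a rough sketch of how such a benchmark could be run (assuming the readr and pryr packages; the benchmark_rows helper and the file name are illustrative, not the exact code used):

library(readr)
library(pryr)

# load the first n rows, then measure how much memory is released when they are dropped
benchmark_rows <- function(n) {
  data <- read_tsv("arcos_all_washpost.tsv", n_max = n)
  mem_change(rm(data))
}

benchmark_rows(1e6) # on the order of -530 MB, i.e. roughly 530 MB held by 1M rows
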
In general, every 1M rows equates to roughly 0.5 GB of memory. Lacking that much system memory for the full file, I abandon this approach and use the state-level data extracts for DC, MD, and VA (DMV) instead.
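
As a quick sanity check on that decision, using the row count measured above (pure arithmetic in R):

rows_total <- 52567689
gb_per_million_rows <- 0.5
rows_total / 1e6 * gb_per_million_rows # roughly 26 GB needed to hold the full file in memory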
