A repository for code that generates various simulated data.

fomightez/simulated_data

simulated_data

Subfolders found here concern:

Related efforts by others

"What is a good and easy way to select 5-10 Mb of neutral DNA sequences in the human genome? Would selecting random intergenic regions (say >10Kb away from genes) be enough? Has someone done something similar recently in a paper I could cite and use the same loci?"
https://twitter.com/vsbuffalo/status/1646212322833334272
"I had to do this recently — I took all exonic + phastcons + UTRS, merged them, and then add 200bp of buffer on both ends (all using bedtools). You could do this and even select out random regions. I did some sensitivity analysis and comparison to the CADD tracks and seemed good."
"Also (and perhaps this is being too paranoid) but I merged the refseq and ensembl tracks. They differ slightly in their percent of basepairs that annotated as coding, so I took the union."
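The masking approach described in the quoted replies (merge annotated intervals, pad each side by 200 bp, then draw random regions from what remains) can be sketched in plain Python. This is only an illustration of the interval logic on a single chromosome, not the commands the quoted author ran: the function names, the half-open `(start, end)` tuple representation, and the sampling scheme are all assumptions made here. In practice the same steps would more likely be done with bedtools on BED files.

```python
import random


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def pad_intervals(intervals, buffer_bp, chrom_len):
    """Extend every interval by buffer_bp on both sides, clipped to the chromosome."""
    return [(max(0, s - buffer_bp), min(chrom_len, e + buffer_bp))
            for s, e in intervals]


def complement(merged, chrom_len):
    """Return the gaps on [0, chrom_len) not covered by sorted, merged intervals."""
    gaps = []
    pos = 0
    for start, end in merged:
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < chrom_len:
        gaps.append((pos, chrom_len))
    return gaps


def putative_neutral_regions(annotated, chrom_len, buffer_bp=200):
    """Mask annotated intervals plus a buffer; return the unmasked remainder."""
    masked = merge_intervals(pad_intervals(annotated, buffer_bp, chrom_len))
    return complement(masked, chrom_len)


def sample_windows(regions, window, n, seed=0):
    """Draw n random windows of fixed size from regions (windows may overlap)."""
    rng = random.Random(seed)
    eligible = [(s, e) for s, e in regions if e - s >= window]
    if not eligible:
        return []
    # Weight each region by its number of valid window start positions.
    weights = [e - s - window + 1 for s, e in eligible]
    picks = []
    for s, e in rng.choices(eligible, weights=weights, k=n):
        start = rng.randint(s, e - window)
        picks.append((start, start + window))
    return picks


# Toy example: three annotated features on a 10 kb chromosome.
annotated = [(1000, 2000), (1900, 2500), (5000, 5100)]
regions = putative_neutral_regions(annotated, chrom_len=10_000)
windows = sample_windows(regions, window=500, n=5)
```

To take the union of two annotation sources, as the second quoted reply suggests, concatenate their interval lists before calling `putative_neutral_regions`. The sampler does not prevent overlapping windows, so non-overlapping loci would need an extra filtering step.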

"Go and grab 130G of long-read mock microbial community data from PromethION and 36G from MinION over here, if you fancy: https://github.com/LomanLab/mockcommunity … #UKGS18 - could be useful for bioinformatics pipeline validation and method development!"

https://twitter.com/Hasindu2008/status/1628569325895585793

"Squigulator r10 branch https://github.com/hasindu2008/squigulator/tree/r10 can simulate r10.4.1 signals. Also f5c r10 branch https://github.com/hasindu2008/f5c/tree/r10 can do resquiggle and eventalign for R10.4.1. Note: still work in progress and improvements are on the way. Thanks, @nanopore for providing the pore-model."

"To test different approaches for assembling genomes, I needed data with known microbial content. Only long reads were available, but I needed to test the algorithm on short paired-end reads. This script was written to create short reads from long reads."
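One way to derive short paired-end reads from a long read is to cut fixed-size fragments along it and take the two ends of each fragment, with the second mate reverse-complemented. The sketch below illustrates that idea only; it is not the script referred to in the quote, and the function names and the `read_len`, `fragment_len`, and `step` parameters are hypothetical.

```python
# Translation table for complementing DNA bases, including N and lowercase.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_reads_from_long_read(long_read, read_len=150, fragment_len=400, step=200):
    """Cut fragments along a long read and return (read1, read2) pairs.

    read1 is the first read_len bases of each fragment; read2 is the reverse
    complement of the last read_len bases, mimicking inward-facing mates.
    """
    if read_len > fragment_len:
        raise ValueError("read_len must not exceed fragment_len")
    pairs = []
    # Slide a fragment-sized window along the long read in fixed steps.
    for start in range(0, len(long_read) - fragment_len + 1, step):
        fragment = long_read[start:start + fragment_len]
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        pairs.append((read1, read2))
    return pairs


# Toy example with a 20-base "long read".
pairs = paired_reads_from_long_read("ACGT" * 5, read_len=4, fragment_len=10, step=5)
```

Reads produced this way inherit whatever sequencing errors the long read contains and carry no base qualities, so writing them as FASTQ would require placeholder quality values.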

"Our single-cell and spatial omics simulator scDesign3 is now online: https://nature.com/articles/s41587-023-01772-1 scDesign3 has two functionalities: (1) synthetic data simulation and (2) real data interpretation and modification 1/"