dgavr/Zebrafish_mappability

Mappability tracks for danRer7 and danRer10

Genomes contain many repeated elements and regions of high sequence similarity. When mapping sequencing reads, it is important to know whether reads fall into such regions, so that these regions can be ignored or the number of multimapping reads to allow can be chosen accordingly.

This resource provides "mappability" tracks generated with two methods:

(1) a Python script and the bowtie mapper; (2) the GEM package.

No such resource is currently available for zebrafish genomes, so mappability tracks for danRer10 and danRer7 are provided here. The finished tracks are available as hubs for the UCSC genome browser:

UCSC genome browser hubs for danRer7/Zv9 Mappability:

http://userweb.molbiol.ox.ac.uk/public/dariag/mappability/hub_qd_zv9.txt
http://userweb.molbiol.ox.ac.uk/public/dariag/mappability/hub_gem_zv9.txt

UCSC genome browser hubs for GRCz10/danRer10 Mappability:

http://userweb.molbiol.ox.ac.uk/public/dariag/mappability/hub_qd_zv10.txt
http://userweb.molbiol.ox.ac.uk/public/dariag/mappability/hub_gem_zv10.txt

This resource was generated as part of the FoxD3 project carried out in the laboratory of T. Sauka-Spengler at the Weatherall Institute of Molecular Medicine at the University of Oxford, in the UK.

A preliminary manuscript for this project is available on bioRxiv: https://www.biorxiv.org/content/biorxiv/early/2017/11/22/213611.full.pdf

For additional information on the project, please see the tsslab/foxd3 GitHub repository.

Approach I: Mapping-based alignability

  1. Get the fasta sequence of the genome.
  • For Zv9/danRer7 from the UCSC genome browser here
  • For GRCz10/danRer10 from the UCSC genome browser here
  1. Make a read set covering the whole genome using the genomeFasta2reads.py script

      Usage:

genomeFasta2reads.py <genome.fa> <desired_read_length>

      This generates fastq-formatted "reads" of the desired length, with Phred quality character H at every base, using a step of 1 along the genome.

      Example:

@synthetic_read-danRer10:chr1:0:40
GATCTTAAACATTTATTCCCCCTGCAAACATTTTCAATCA
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
@synthetic_read-danRer10:chr1:1:41
ATCTTAAACATTTATTCCCCCTGCAAACATTTTCAATCAT
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
@synthetic_read-danRer10:chr1:2:42
TCTTAAACATTTATTCCCCCTGCAAACATTTTCAATCATT
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
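
      The read generation shown above can be sketched in a few lines of Python. This is an illustrative reimplementation under assumptions, not the repository's genomeFasta2reads.py: the function names `read_fasta` and `synthetic_reads` and the `genome` label argument are hypothetical, and the real script may treat N bases, lowercase sequence or chromosome ends differently.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def synthetic_reads(handle, read_length, genome="danRer10"):
    """Yield one FASTQ record per window of read_length, with a step of 1.

    The header layout mirrors the example above; 'genome' is a
    hypothetical label argument introduced for this sketch.
    """
    quality = "H" * read_length
    for chrom, seq in read_fasta(handle):
        for start in range(len(seq) - read_length + 1):
            end = start + read_length
            header = f"@synthetic_read-{genome}:{chrom}:{start}:{end}"
            yield f"{header}\n{seq[start:end]}\n+\n{quality}\n"

if __name__ == "__main__":
    # Tiny in-memory genome; a real run would open the genome FASTA instead.
    demo = io.StringIO(">chr1\nGATCTTAAAC\nATTT\n")
    for record in synthetic_reads(demo, read_length=12):
        print(record, end="")
```

      A chromosome of length L yields L - k + 1 reads of length k, so the fastq files for a whole genome are large and are best written compressed.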

      Links to the fastq.gz files:

  1. Map fastqs back to genome:

      Download bowtie v.1.1.0 index for danRer7

      Download bowtie v.1.1.0 index for danRer10

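      Map the "synthetic" fastqs back to the genome using bowtie (v.1.1.0), varying the -m parameter (-m 1, -m 2, -m 3). With -m N, bowtie suppresses all alignments for any read that has more than N reportable alignments.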


R1=danRer7_40bp_reads.fastq

bowtie -S -p 24 -m 1 danRer7 $R1 --chunkmb 500  | samtools view -bS - > zv7_on_zv7_m1_40bp_b1.bam 
bowtie -S -p 24 -m 2 danRer7 $R1 --chunkmb 500  | samtools view -bS - > zv7_on_zv7_m2_40bp_b1.bam 
bowtie -S -p 24 -m 3 danRer7 $R1 --chunkmb 500  | samtools view -bS - > zv7_on_zv7_m3_40bp_b1.bam 

      If in a hurry, split the fastqs into multiple smaller fastqs, map each individually, and then merge the resulting BAM files using bamtools merge.
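
      One way to do the splitting is sketched below in Python. It assumes every record occupies exactly four lines, which holds for the synthetic reads described above; the function name `split_fastq` and the chunk size are hypothetical choices, not part of this repository.

```python
import itertools

def split_fastq(lines, records_per_chunk):
    """Yield lists of FASTQ lines holding at most records_per_chunk records.

    Assumes strict four-line FASTQ records (header, sequence, '+', quality).
    """
    iterator = iter(lines)
    while True:
        chunk = list(itertools.islice(iterator, 4 * records_per_chunk))
        if not chunk:
            return
        yield chunk

if __name__ == "__main__":
    # Five toy records; a real run would iterate over an open fastq file
    # and write each chunk to its own numbered output file.
    toy = []
    for i in range(5):
        toy += [f"@read{i}\n", "ACGT\n", "+\n", "HHHH\n"]
    for index, chunk in enumerate(split_fastq(toy, records_per_chunk=2)):
        print(index, len(chunk) // 4, "records")
```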

  1. Use bedtools genomeCoverageBed to generate bedgraph files. To visualise, make bigwigs using the bedGraphToBigWig utility from the UCSC genome browser.
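
      To see why the coverage signal is informative: with reads of length k tiled at a step of 1, a base is covered by k reads when every read overlapping it was retained by the mapper, and by fewer when some were discarded as multimapping. The toy Python sketch below computes a bedgraph-style depth profile from read start positions; it illustrates the idea only and is not how genomeCoverageBed is implemented. The function name `coverage_bedgraph` is hypothetical.

```python
def coverage_bedgraph(chrom, chrom_length, read_starts, read_length):
    """Return (chrom, start, end, depth) runs of constant non-zero depth."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (chrom_length + 1)
    for start in read_starts:
        diff[start] += 1
        diff[min(start + read_length, chrom_length)] -= 1

    runs = []
    depth = 0
    run_start, run_depth = 0, 0
    for pos in range(chrom_length + 1):
        if pos < chrom_length:
            depth += diff[pos]
        # None at the final position forces the last open run to be closed.
        current = depth if pos < chrom_length else None
        if current != run_depth:
            if run_depth:  # zero-depth runs are not reported
                runs.append((chrom, run_start, pos, run_depth))
            run_start, run_depth = pos, current
    return runs

if __name__ == "__main__":
    # 10 bp chromosome, 4 bp reads; the read starting at position 2 is
    # left out to mimic a read discarded as multimapping.
    kept_starts = [0, 1, 3, 4, 5, 6]
    for row in coverage_bedgraph("chr1", 10, kept_starts, 4):
        print(*row, sep="\t")
```

      Note that the first and last k - 1 bases of each chromosome can never reach depth k, because fewer than k windows overlap them.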

      Links to the bigwig files:

  1. Parse the bedgraph file to select regions whose signal equals the read length. Then use bedtools complement to get the complement (all regions that do not have that signal).

awk '$4 == 40' bmerged_zv7_on_zv7_m1_40bp_b1.bg > bmerged_zv7_on_zv7_m1_40bp_b1_eq.bed
sort -k1,1 -k2,2n bmerged_zv7_on_zv7_m1_40bp_b1_eq.bed > bmerged_zv7_on_zv7_m1_40bp_b1_eq.sort.bed
bedtools complement -i bmerged_zv7_on_zv7_m1_40bp_b1_eq.sort.bed -g danRer7.chrom.sizes > com_bmerged_zv7_on_zv7_m1_40bp_b1_eq.sort.bed

awk '$4 == 40' bmerged_zv10_on_zv10_m1_40bp_b1.bg > bmerged_zv10_on_zv10_m1_40bp_b1_eq.bed
sort -k1,1 -k2,2n bmerged_zv10_on_zv10_m1_40bp_b1_eq.bed > bmerged_zv10_on_zv10_m1_40bp_b1_eq.sort.bed
bedtools complement -i bmerged_zv10_on_zv10_m1_40bp_b1_eq.sort.bed -g danRer10.chrom.sizes > com_bmerged_zv10_on_zv10_m1_40bp_b1_eq.sort.bed
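
      The logic of the awk, sort and bedtools complement commands above can be mirrored in plain Python, which may help when checking results on a small test region. This is a sketch for illustration, not a replacement for bedtools; the function names `full_depth_regions` and `complement` are hypothetical, and chromosome sizes are passed as a dictionary rather than read from a chrom.sizes file.

```python
def full_depth_regions(bedgraph_rows, read_length):
    """Keep bedgraph intervals whose depth equals the read length."""
    return [(chrom, start, end)
            for chrom, start, end, depth in bedgraph_rows
            if depth == read_length]

def complement(regions, chrom_sizes):
    """Return every part of each chromosome not covered by 'regions'."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    result = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet accounted for
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                result.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < size:
            result.append((chrom, cursor, size))
    return result

if __name__ == "__main__":
    rows = [("chr1", 0, 3, 2), ("chr1", 3, 7, 4), ("chr1", 7, 10, 3)]
    sizes = {"chr1": 10, "chr2": 5}
    keep = full_depth_regions(rows, read_length=4)
    for chrom, start, end in complement(keep, sizes):
        print(chrom, start, end, sep="\t")
```

      Chromosomes with no full-depth interval at all are returned whole, which matches the intent of the exclusion list: anything not confirmed as fully mappable is excluded.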

      Links to the bed.gz files of regions to exclude:

Approach II: GEM

  1. Create Genome index file
gem-indexer -i danRer7.fa -o danRer7_index

danRer7 .gem file
danRer10 .gem file

  1. Generate .mappability file
gem-mappability -I danRer7_index.gem -l 50 -o danRer7_40 -T 24

      Links to .mappability files:

  1. Generate .wig file and bigWig files

      Links to GEM bigwig files:

GC content

GC content bigwig file for danRer7
GC content bigwig file for danRer10