Using mash rather than kmds

John Lees edited this page Apr 5, 2017 · 9 revisions

Update: I now prefer this method over kmds. The result is the same, but it is much more efficient in both CPU and memory use.

NB: please use v1.1.3 or later when using this method.

In some cases kmds may be difficult to run/install. If you are in this situation I'd recommend using mash to generate the distance matrix -- this is fast and efficient, and can be fed directly into the R_mds.pl script.

To make sure the samples in the mash reference are in the correct order, generate the <assemblies.fa> part of the command with:

tr '\n' ' ' < sample_order.txt
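As a minimal sketch of how the two pieces fit together, assuming each line of the sample order file is the path to one assembly, the tr output can be placed into the sketch command with command substitution. The file demo_order.txt and the .fa names below are hypothetical stand-ins, and the command is only echoed, so mash is not needed to try it.

```shell
# Hypothetical sample order file: one assembly path per line (names invented)
printf 'sample1.fa\nsample2.fa\nsample3.fa\n' > demo_order.txt

# Turn the newline-separated list into a space-separated one.
# The final newline also becomes a space, so the list ends with a trailing space.
assemblies=$(tr '\n' ' ' < demo_order.txt)

# Show the command that would be run. Paths must not contain spaces,
# because the shell would split them into separate arguments.
echo "mash sketch -o reference $assemblies"
```

With a real sample_order.txt, the same substitution can be written inline as `mash sketch -o reference $(tr '\n' ' ' < sample_order.txt)`.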

The following commands generate the distance matrix and pass it to R_mds.pl:

mash sketch -o reference <assemblies.fa>
mash dist reference.msh reference.msh > mash_distances.txt
perl scripts/mash2matrix.pl mash_distances.txt > all_distances.csv
perl scripts/R_mds.pl -d all_distances.csv -p sample_order.txt -o all_structure

You'll also get a scree plot output (scree_plot.pdf) which can help you choose the number of dimensions to retain in your dataset.
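One way to catch a truncated run before building the matrix is to check that the mash distance file has one line per pair of samples, since mash dist reports every reference/query pair. The sketch below uses small hypothetical demo files in place of sample_order.txt and mash_distances.txt. The five tab-separated columns mimic mash dist output, with invented values.

```shell
# Demo inputs standing in for the real files (hypothetical names and values)
printf 'a.fa\nb.fa\n' > demo_order.txt
printf 'a.fa\ta.fa\t0\t0\t1000/1000\n'    >  demo_distances.txt
printf 'a.fa\tb.fa\t0.01\t0\t700/1000\n'  >> demo_distances.txt
printf 'b.fa\ta.fa\t0.01\t0\t700/1000\n'  >> demo_distances.txt
printf 'b.fa\tb.fa\t0\t0\t1000/1000\n'    >> demo_distances.txt

# Count samples and pairwise distance lines.
# wc -l counts newline characters, so both files need a trailing newline.
n_samples=$(wc -l < demo_order.txt | tr -d ' ')
n_pairs=$(wc -l < demo_distances.txt | tr -d ' ')

# An all-against-all comparison of N samples should give N * N lines
expected=$((n_samples * n_samples))
if [ "$n_pairs" -eq "$expected" ]; then
    status="ok"
else
    status="mismatch"
fi
echo "samples: $n_samples, distance lines: $n_pairs, expected: $expected ($status)"
```

A mismatch would suggest that a sample is missing from the sketch or that the distance file is incomplete.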