Troubleshooting

John Lees edited this page May 16, 2016 · 8 revisions

General

When I run the programs I get an error about a missing GLIBCXX

seer: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.17' not found (required by seer)
seer: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by seer)

Make sure that libstdc++ from GCC 4.9 or later is available. The default on your OS may be a much older version than this.

If this isn't possible, use the static version under 'releases'.
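To see which GLIBCXX versions your system library actually provides, something like the following sketch may help. It assumes GNU grep and GNU sort, and the library path is simply copied from the error message above, so it may differ on your system. The function name glibcxx_versions is made up for this example. GLIBCXX_3.4.20 is the version that ships with GCC 4.9.

```shell
# Print the GLIBCXX symbol versions embedded in a libstdc++ shared library.
# Assumes GNU grep (-a reads the binary as text) and GNU sort (-V).
glibcxx_versions() {
    grep -aoE 'GLIBCXX_[0-9]+(\.[0-9]+)+' "$1" | sort -uV
}

# Path copied from the error message; it may differ on your system.
LIB=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
if [ -e "$LIB" ]; then
    glibcxx_versions "$LIB"
fi
```

If GLIBCXX_3.4.20 is not in the list, the library is older than the one seer was built against.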

When running with --struct I get the error message: 'Number of rows in MDS matrix does not match number of samples'

Make sure that ARMA_USE_HDF5 is uncommented in include/armadillo_bits/config.hpp and run kmds again.

Analysis is slow

Most steps can be effectively parallelised in two ways. Increase the threading of the step with --threads, and/or split the input files and run each command independently on each input file.

split -n l/15 all_31mers 31mers

will split the input file all_31mers into 15 pieces which can be analysed independently. Use 'cat' to combine the results.
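The split, run and combine pattern could be sketched as below. This is only an illustration: analyse_piece is a hypothetical stand-in for whichever command you actually run on each piece, and split_and_combine is a name invented here. It assumes GNU split and that no other files in the working directory match the piece prefix.

```shell
# Hypothetical stand-in for the real per-piece command; replace the body
# with the analysis you want to run on each piece.
analyse_piece() {
    cat "$1"
}

# Usage: split_and_combine INPUT NUM_PIECES PREFIX > combined_output
split_and_combine() {
    input=$1
    pieces=$2
    prefix=$3
    # Split by lines into equal-sized pieces (never breaks a line in two).
    split -n "l/$pieces" "$input" "$prefix"
    # Run one background job per piece, each writing its own output file.
    for piece in "$prefix"??; do
        analyse_piece "$piece" > "$piece.out" &
    done
    wait
    # Combine the per-piece results in the original order.
    cat "$prefix"??.out
}
```

For example, `split_and_combine all_31mers 15 31mers > combined` would reproduce the split shown above and write the concatenated results to a file named combined.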

kmds

This takes ages to run

First, split up your input files into, say, 16 pieces, then subsample each one in its own process using --no_mds and --size.

Then list these output files in a text file, and input using --mds_concat. Using 16 threads this took about 30 hours and 53 GB of memory on our test system of 3,000 samples, using 1% of the kmers.

Choose the number of kmers to subsample carefully: a lower number will run faster and require less memory. A size of 1% of the number of kmers is appropriate, but as low as 0.1% should work.

If this isn't possible, you can use mash to create the distance matrix from your assemblies, then run the final stage of kmds on this output.

I get error messages related to 'Intel MKL ERROR'

For example:

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGEMM
Intel MKL ERROR: Parameter 2 was incorrect on entry to DSYEVD.

The .pheno file may not contain all of the assemblies counted in the first step, or may be malformed. Add any missing ones with a phenotype of 0 for this step.
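One way to find the missing samples is sketched below. It assumes you have a text file with one sample name per line (called samples.txt here, a hypothetical name) and that the .pheno file has three whitespace-separated columns: the sample name twice, then the phenotype. Check the example in the test/ directory to confirm the layout before relying on this. The function name is made up for this example.

```shell
# Print a .pheno line with phenotype 0 for every sample that is listed in
# the sample file ($1) but absent from the first column of the .pheno file ($2).
# Assumed layout of .pheno: name <tab> name <tab> phenotype.
missing_pheno_lines() {
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
         NF && !($1 in seen) { printf "%s\t%s\t0\n", $1, $1 }' "$2" "$1"
}
```

Running `missing_pheno_lines samples.txt metadata.pheno > extra.pheno` writes the lines to a separate file, which you can review and then append to your .pheno file.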

You may also get this error message due to compile problems on static versions of the software, depending on the platform (see issue #30). In this case use the workaround script R_mds.pl

scripts/R_mds.pl -d all_structure.distances.csv -p metadata.pheno -o all_structure --pc 3 -R /software/R-3.2.2/bin/R

This requires R and the rhdf5 package to be installed. To install the latter, run the following commands in R:

source("https://bioconductor.org/biocLite.R")
biocLite("rhdf5")

You will only need to do this once.

seer

There is nothing in the output

First, turn off the p-value filtering with

--pval 1 --chisq 1

to check whether the program is running correctly, but no k-mers are passing the significance thresholds.

If this doesn't yield answers, make sure the --struct option is correct. Check distance_matrix.csv to ensure the distance entries are non-zero, and do the same for the created structure file using h5dump.
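A quick way to check the distance matrix is to count its non-zero entries, as in this sketch. It assumes the file is a plain comma-separated table of numbers. Any non-numeric field, such as a header, is counted as zero. The function name is invented for this example.

```shell
# Count the non-zero numeric entries in a comma-separated file.
# Non-numeric fields evaluate to 0 in awk and so are not counted.
count_nonzero() {
    awk -F, '{ for (i = 1; i <= NF; i++) if ($i + 0 != 0) n++ }
             END { print n + 0 }' "$1"
}
```

If `count_nonzero distance_matrix.csv` prints 0, every distance is zero and the structure file built from it will not be usable.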

The p-values are slightly different from what I expect

For very small p-values (when W > 5) an upper bound is calculated rather than the exact p-value for robustness and computational speed. This bound is accurate to +/- W^(-2) %

If you need the exact value, check out the erfc branch, which uses arbitrary precision floats and doesn't rely on this bound.

I get p-values of 0

The p-value is too small to represent as a double-precision float (smaller than about 1E-308). At this point it is a bit meaningless to assign a value, but essentially the data exactly fits the regression.

Biologically, either the effect size is enormous, or more likely the phenotype is monophyletic.

Check whether this is the case using a tool such as phylocanvas. If your phenotype is monophyletic then you won't have good resolution on which sequence elements may be related to it.
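The underflow behind a reported p-value of 0 can be reproduced with ordinary double-precision arithmetic. The sketch below is only an illustration and is not how seer computes p-values. It uses awk, which stores numbers as doubles, to multiply two small values whose true product is far below the representable range.

```shell
# Multiply two small doubles; the exact result (1e-400) cannot be
# represented, so the stored value becomes exactly 0.
underflow_demo() {
    awk 'BEGIN {
        p = 1e-200 * 1e-200
        if (p == 0) print "underflows to 0"; else print "representable"
    }'
}
underflow_demo
```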

I get the error message 'A matrix inversion failed!'

The mds structure is probably poorly scaled compared to the kmers. kmer presence and absence are coded as 1 and 0 respectively.

kmds produces a rectangular matrix of mds values, where each row is a sample, and each column is a decreasing dimension. Each dimension is scaled to values in the range [-1,1].

These matrices are stored in hdf5 format, so you can use hdf5 tools to inspect them, and if necessary rescale them. Most convenient is the R package rhdf5.

I get error messages related to DLASCLS (or other LAPACK functions)

For example: Parameter 4 to routine DLASCLS was incorrect

The .pheno file is probably malformed. Ensure the first and second columns are identical. See the test/ directory for an example.
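The column check can be automated with a short awk sketch like the one below. It assumes the .pheno file is whitespace-separated, and the function name check_pheno is made up for this example. It reports each offending line and returns a non-zero exit status if any are found.

```shell
# Report every line of a .pheno file whose first and second columns differ.
# Exit status is 0 when all lines match and 1 otherwise.
check_pheno() {
    awk '$1 != $2 { printf "line %d: %s != %s\n", NR, $1, $2; bad = 1 }
         END      { exit bad + 0 }' "$1"
}
```

For example, `check_pheno metadata.pheno && echo "columns match"` prints the confirmation only when no mismatches are found.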
