Releases: TimD1/vcfdist

v2.5.3

10 Jun 22:31
Pre-release

Fix variant clustering near contig edge

  • Fixed #28, an out-of-range error when the clustering alignment algorithm goes off the end of a contig

v2.5.2

08 May 20:31
Pre-release

Bugfix in alignment distance

  • fixed indexing error (#27) in --distance option causing large RAM usage

v2.5.1

25 Mar 16:22
Pre-release

Minor bugfixes

  • realignment options -rq and -rt no longer cause timer crashes
  • fixed off-by-one error in --distance causing additional edits

v2.5.0

11 Mar 20:03
Pre-release

Major Changes

  • New definition of "sync groups" (complex variants) when attributing credit to variants. The new definition breaks dependencies if the selected backtracking path(s), rather than all possible paths, pass through the reference diagonal. As a result, there should be more, smaller sync groups and fewer partial credit calls.
  • Precision-recall backtracking algorithm now maximizes TP calls
  • Removed the -s, --smallest-variant option. It offers no runtime benefits and will negatively impact performance (since small variants are prematurely filtered, they cannot be found equivalent to remaining variants). Instead, stratify variants after benchmarking or adjust the --sv-threshold and -l, --largest-variant parameters to evaluate the desired variants.

Minor bugfixes

  • Fixed an erroneous return statement (it should have been a break) that caused segfaults in v2.4.0 when using --cluster gap or --cluster size.
  • Fixed a logical error that caused left_reach and right_reach to not be calculated for the first and last clusters on a contig, resulting in incorrect superclustering.

v2.4.0

17 Feb 15:04
Pre-release

Major changes

  • changed handling of BED regions (see wiki) to exclude variants on region borders; this is necessary for consistency with Truvari and with how the ground truth BEDs were generated
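
One way to picture the border rule, as a minimal sketch rather than vcfdist's actual implementation: a variant is kept only when its whole reference span lies inside a single BED region, so a variant that straddles a region boundary is dropped. The helper name, the 0-based half-open coordinates, and the lack of special handling for zero-length insertions sitting exactly on a boundary are all assumptions made for illustration; the wiki remains the authority on the exact rules.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end), 0-based half-open, as in BED files


def variant_in_regions(var_start: int, var_end: int, regions: List[Region]) -> bool:
    """Return True only if [var_start, var_end) lies fully inside one region.

    Hypothetical helper: a variant that overlaps a region border is excluded.
    """
    return any(start <= var_start and var_end <= end for start, end in regions)


regions = [(100, 200), (300, 400)]
print(variant_in_regions(120, 125, regions))  # fully inside the first region
print(variant_in_regions(195, 205, regions))  # straddles a border
print(variant_in_regions(250, 251, regions))  # outside every region
```

A linear scan over regions keeps the sketch short; a real benchmark with many regions would more likely walk sorted regions and variants together.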

Minor updates

  • added -lm and -lstdc++ during linking, which should allow clang++ compilation (working towards bioconda release)
  • removed libstdc++fs dependency (further increasing compatibility)

v2.3.4

15 Feb 16:35
Pre-release

Minor Improvements

  • added bcftools to Docker image, fixes #21
  • removed color printing when output redirected to file, fixes #20
  • increased Makefile compatibility with bioconda, progress towards #17

v2.3.3

09 Feb 16:17
Pre-release

Minor updates

  • started the vcfdist wiki, which is currently a work-in-progress
  • added THRESHOLD column to precision-recall-summary.tsv, containing either NONE or BEST
  • added make install command
  • added new size-based clustering heuristic, explained in wiki
  • added evaluation of ALL variants to stdout and precision-recall.tsv
  • added RD and QD tags to summary.vcf, listing reference and query distances from truth sequence
  • added REF_DIST and QUERY_DIST columns to query.tsv and truth.tsv containing the same info

Minor bugfixes

  • fixed precision-recall-summary.tsv extra tab
  • added Makefile comment that libstdc++fs inclusion depends on GCC version
  • fixed off-by-one error that miscounted TRUTH_TP and TRUTH_FN (at g.max_qual only)
  • fixed segfault when no variants are present
  • FORMAT/BC tag in summary.vcf is now Float (not String)
  • fixed credit being set to 0.0 for all FP query variants below --credit-threshold

v2.3.2

18 Jan 21:45
Pre-release

Major updates to analysis-v2 scripts

  • these scripts accompany the upcoming vcfdist-v2 paper

Minor updates

  • added --sv-threshold, which adds a third precision/recall stratification
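
As a rough illustration of a three-way stratification by size, the sketch below assigns each variant to SNP, INDEL, or SV. The classification rule, the stratify function, and the placeholder default of 50 are assumptions for illustration only; they are not taken from these notes, and equal-length multi-base substitutions are simply lumped in with INDELs to keep the example short.

```python
def stratify(ref: str, alt: str, sv_threshold: int = 50) -> str:
    """Assign a variant to SNP, INDEL, or SV by size (assumed rule).

    Size is taken as the length difference between the ALT and REF alleles;
    variants at or above sv_threshold are treated as structural variants.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    size = abs(len(alt) - len(ref))
    return "SV" if size >= sv_threshold else "INDEL"


for ref, alt in [("A", "G"), ("A", "ATT"), ("A" + "C" * 60, "A")]:
    print(len(ref), len(alt), stratify(ref, alt))
```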

Minor bugfixes

  • fixed divide-by-zero if several variants are equivalent to no variants
  • fixed off-by-one error in phasing analysis logs

v2.3.1

15 Nov 00:20
Pre-release

Bugfix: fixed phasing analysis error when a switch error and a flip error occur in the same supercluster.

v2.3.0

09 Nov 18:51
Pre-release

Phasing analysis updates

  • added phasing threshold: superclusters are only considered phased if one phasing is an X% improvement over the other in terms of edit distance (this reduces false positive supercluster phasing flip errors that are actually variant calling errors), default 0.6
  • added phasing summary TSV (phasing-summary.tsv) that reports total flip errors, switch errors, phaseblock NG50, switch NGC50, and switchflip NGC50
  • added switchflip TSV (switchflips.tsv) that reports flip range, type, supercluster, and phase block
  • phase blocks are now computed from input phase sets, not backtracking, and per-phaseblock switch/flip errors were added to phase-blocks.tsv
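
The phasing threshold and the NG50 statistic above can be sketched as follows. This is a minimal sketch under stated assumptions, not vcfdist's code: the function names are hypothetical, and the improvement formula (one minus the ratio of the smaller to the larger edit distance) is an assumed reading of "X% improvement". The NG50 function follows the usual definition, the largest block length L such that blocks of length at least L cover half of the genome size.

```python
from typing import Iterable


def supercluster_phase(ed_keep: int, ed_swap: int, threshold: float = 0.6) -> str:
    """Return 'keep', 'swap', or 'unphased' for one supercluster.

    ed_keep / ed_swap: edit distances to the truth with the original and the
    swapped phasing. The improvement formula below is an assumption.
    """
    if ed_keep == ed_swap:
        return "unphased"  # neither phasing is better
    best, worst = sorted((ed_keep, ed_swap))
    improvement = 1.0 - best / worst  # worst > best >= 0, so worst is nonzero
    if improvement < threshold:
        return "unphased"  # difference too small to call a phasing
    return "keep" if ed_keep < ed_swap else "swap"


def ng50(block_lengths: Iterable[int], genome_size: int) -> int:
    """Largest length L such that blocks >= L cover at least half the genome."""
    covered = 0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered * 2 >= genome_size:
            return length
    return 0  # the blocks cover less than half of the genome


print(supercluster_phase(10, 2), supercluster_phase(10, 6))
print(ng50([40, 30, 20, 10], 100))
```

With a threshold like this, a supercluster whose two phasings differ only slightly in edit distance stays unphased instead of being reported as a flip error.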

Partial credit replaced with credit threshold

  • partial credit calculation is less intuitive and complicates matters more than necessary; I replaced this with a partial credit threshold where passing variants are counted as TP, default 0.7
  • I think that counting mostly-correct calls with a user-defined credit threshold is better
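
In code, the credit threshold amounts to a single comparison. The sketch below assumes that a query variant's credit is a value between 0 and 1 and that a credit exactly equal to the threshold passes; both details, and the function name, are assumptions rather than something stated in these notes.

```python
def classify_query_variant(credit: float, credit_threshold: float = 0.7) -> str:
    """Count a query variant as TP if its credit meets the threshold, else FP."""
    return "TP" if credit >= credit_threshold else "FP"


credits = [1.0, 0.85, 0.7, 0.5, 0.0]  # made-up example values
print([classify_query_variant(c) for c in credits])
```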

Runtime improvements: skip alignment distance and writing

  • alignment distance calculation is now skipped by default (I now think stratifying precision-recall curves by INDEL size may be more useful), can be turned on with -d, --distance
  • original and realigned truth/query VCFs are only written if --realign is selected

Added new analyses

  • added analysis-v2 directory for upcoming paper figures

Minor fixes

  • GA4GH output VCF no longer always outputs gm: now it uses gm for TP, lm for PP, and . for FP/FN