Releases: sestaton/tephra

Tephra version 0.14.0

23 Apr 22:48

Summary of changes in this version:

New features:

  • Add 'info' subcommand to print a table of configuration info (Perl and Tephra versions, and versions of all installed
    external programs).

Bug fixes:

  • Adjust regex for getting divergence from PAML. This is something that unfortunately needs to be adjusted with
    updates to PAML.
  • Fixed the console messages when configuring and installing required packages so the output is much cleaner.

Misc:

  • Update Perl deps when building on v5.30+ (Pod::Find).
  • Add required modules for fetching muscle (and other packages) over https (Net::SSLeay and IO::Socket::SSL)
    during the initial configuration.
  • Change how the tephra command is being called when testing the full pipeline. Now it can be tested in place rather
    than expecting to be installed, which is much more desirable (avoids version conflicts and having to install the package
    to test the command).
  • Pin genometools version to v1.6 to ensure we are working with a stable version.
  • Update PAML from v4.8 to v4.10.6.
  • Capture the noisy output from EMBOSS compile process. Same with HTSlib and Tephra translate program.
  • Pin muscle to v3.8.31 and compile from source because the available binaries have issues in my tests.
  • Remove use of Travis-CI and switch to GitHub Actions for CI workflow.
  • Add coverage report from Codecov.
  • Add test file for new 'info' subcommand to evaluate results that are returned.
  • Update Docker and GitHub Actions to use cpm, which is much faster to build with than cpanm.

See the Changes file for more information.

Tephra version 0.13.1

04 Jul 03:46

Summary of changes in this version:

Bug fixes:

  • Fix issue with PATHs to Tephra configured programs (vmatch and mkvtree) not being set in *Annotation::MakeExemplars class.

Misc:

  • Updated TODO for this issue because it is hard to test (more details there).

See the Changes file for more information.

Tephra version 0.13.0

02 Jul 18:44

Summary of changes in this version:

New features:

  • Refactor build system to not use shared libraries and to include only required dependency files.

Misc:

  • Correct alignment of reporting in log for LTR elements.

See the "Changes" file for more information.

Tephra version 0.12.6

07 Apr 23:56

Summary of changes in this version:

New features:

  • Add parallel search of both strands for non-LTR retrotransposons, which reduces
    user runtime by about 50% if multiple CPUs are available.
  • Use multiple CPUs (if requested) for HMM model searches for non-LTR elements.
  • Greatly simplify call to class methods for finding non-LTRs based on the use of
    new class attributes (req. fewer method arguments).
  • Add reusable method for writing/annotating elements (both for families and singleton
    elements) in Tephra::Classify::Any.
  • Warn now when filtering Helitron/non-LTR elements based on N%.
  • Add --debug option for monitoring config/install process.
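
As a rough illustration of the parallel strand search listed above (this is not Tephra's Perl implementation), the following Python sketch searches the forward and reverse-complement strands concurrently. The function names and the simple motif search standing in for the real element search are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def search_strand(seq, motif, strand):
    """Hypothetical per-strand search: report every start position of motif."""
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append((strand, pos))
        pos = seq.find(motif, pos + 1)
    return hits

def search_both_strands(seq, motif, workers=2):
    """Submit one search job per strand and collect the hits in strand order."""
    jobs = [(seq, "+"), (reverse_complement(seq), "-")]
    # Threads keep the sketch simple; CPU-bound work would need separate
    # processes to see a real speedup in Python.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(search_strand, s, motif, strand) for s, strand in jobs]
        return [hit for future in futures for hit in future.result()]
```

Positions on the "-" strand are reported relative to the reverse-complemented sequence in this sketch.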

Bug fixes:

  • Adjust non-LTR element N% filtering logic to keep FASTA and GFF3 IDs properly in sync.
  • Improve logic for classifying Helitron and non-LTR retrotransposon families by storing
    full sequence IDs so no regex or transformations are needed.
  • Adjust method for writing elements in Tephra::Classify::Any to include all elements in
    a cluster (query and hits) above the threshold.
  • As with other family-level classification methods in Tephra, search for families only within
    a superfamily for Helitrons and non-LTR elements.
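
One way to keep FASTA and GFF3 IDs in sync during N% filtering is to compute the set of retained IDs once and apply it to both outputs. The Python sketch below shows that idea only; the data layout, function names, and the 20% threshold are assumptions, not Tephra's actual code or defaults.

```python
def n_percent(seq):
    """Percentage of N (or n) characters in a sequence."""
    if not seq:
        return 0.0
    return 100.0 * seq.upper().count("N") / len(seq)

def filter_by_n_percent(fasta_records, gff_features, max_n_percent=20.0):
    """Drop elements above the N% threshold from both the FASTA and GFF3 data.

    fasta_records: dict mapping element ID -> sequence
    gff_features:  list of dicts, each with an "id" key
    The threshold value is illustrative only.
    """
    # Decide once which IDs survive, then apply the same set to both outputs.
    kept_ids = {eid for eid, seq in fasta_records.items()
                if n_percent(seq) <= max_n_percent}
    kept_fasta = {eid: seq for eid, seq in fasta_records.items() if eid in kept_ids}
    kept_gff = [feat for feat in gff_features if feat["id"] in kept_ids]
    return kept_fasta, kept_gff
```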

See the "Changes" file for more information.

Tephra version 0.12.5

26 Aug 21:30

Summary of changes in this version:

New features:

  • Add method for filtering LTR/TIR/TRIM elements against a user-provided gene set to
    remove spurious predictions that are likely tandemly repeated genes.
  • Handle compressed input for 'findltrs', 'findtirs', 'findtrims', and 'findfragments'
    commands.
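
Handling compressed input usually comes down to choosing the right file opener. The Python sketch below picks one from the file extension; both the extension-based detection and the function name are assumptions for illustration, and Tephra's own detection may work differently.

```python
import bz2
import gzip

def open_sequence_file(path):
    """Open a plain, gzip-, or bzip2-compressed file for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")
```

Callers can then read a genome the same way whether or not it is compressed, for example `with open_sequence_file("genome.fasta.gz") as fh: ...` (the file name is hypothetical).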

Bug fixes:

  • Configure all external programs during installation instead of relying on those in PATH.
    This fixes the issue of not being able to find programs when running in a container.
    Fixes #41 on GitHub.
  • Correct the number of elements discovered under different relaxed and strict conditions.
    What was being reported was the filtered numbers (correct for the total, but not the correct
    number initially identified).
  • Remove use of FTP protocol for configuring dependencies (and drop Net::FTP requirement). This
    protocol is not supported in all build environments, such as Travis, so it is better to use
    wget as a permanent replacement than to keep debugging FTP connection issues.
  • Configure install of EMBOSS during testing instead of using a package manager. Also, remove package
    installs of BLAST+ and EMBOSS to speed things up since we configure them locally.

For more information see the "Changes" file.

Tephra version 0.12.4

11 Jun 00:15

Summary of changes in this version:

New features:

  • Add protein domain coordinates for non-LTR elements to GFF output.
  • Parallelize search for EN and RVT domains in 'findnonltrs' command.

Bug fixes:

  • Fix method for how the number of non-LTR and Helitron families is calculated. Previously,
    there may have erroneously been single-copy families and singletons that should have been
    in families due to how the superfamilies were being indexed and combined.
  • In the *Annotation::Util class, there was a bug in getting codes for non-LTR superfamilies
    with some codes being redefined.

See the "Changes" file for more information.

Tephra version 0.12.3

04 Jan 20:56

Summary of changes in this version:

New features:

  • Add support for compressed input to 'all' command (and add tests with compressed data).
  • Add new suite of dev tests for the 'findtirs' and 'classifytirs' commands to
    ensure the element/family numbers affected by the bugs reported below are
    correct.
  • New methods for naming MITEs and LARDs so they are numbered sequentially instead
    of taking the number of the original TIR or LTR element.
  • Add role for removing repeat regions from feature store that is used in LTR/TIR
    classification routines.
  • Add simplified method for computing superfamily element counts from all routines,
    which involves only logging totals at the family stage after singletons are
    processed and classified.
  • Add new 'find_mites' method to *Classify::TIRSfams class and search for these elements
    prior to the family-level classification steps.
  • Store all repeat region features by chromosome source, and sort by chromosome then coordinate
    when writing features. This logic is used for all classification and reporting steps and
    greatly simplifies the code for ensuring the report is correct. Previously, the features
    were stored by region name/number only and sorted on that basis.
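
The storage and sorting scheme in the last item above can be sketched in a few lines of Python. This is a minimal illustration and not Tephra's Perl code; the feature layout (dicts with "seqid", "start", and "end" keys) and the function names are assumptions.

```python
from collections import defaultdict

def group_by_chromosome(features):
    """Store features keyed by their chromosome (sequence) ID."""
    store = defaultdict(list)
    for feat in features:
        store[feat["seqid"]].append(feat)
    return store

def sorted_features(store):
    """Yield features ordered by chromosome, then by start and end coordinate."""
    # Chromosome names are sorted as plain strings here; a natural sort
    # would be needed to place "Chr2" before "Chr10".
    for seqid in sorted(store):
        for feat in sorted(store[seqid], key=lambda f: (f["start"], f["end"])):
            yield feat
```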

Bug fixes:

  • Bug fix for 'age' command with LTR/TIR coordinates being redefined and thus
    not being processed correctly.
  • Bug fix for target_site_duplication being assigned a parent of the TE instead
    of repeat_region.
  • Bug fix for defining the path to search for index files for 'findltrs' and
    'findtirs' commands.
  • Bug fix for 'findtirs' command reporting TIRs the same length as the full
    element span (this is a bug in 'gt tirvish', but it is handled now).
  • Bug fix for 'classifyltrs' and 'classifytirs' commands with the number of elements
    per family being incorrectly reported.
  • Bug fix with MITE IDs not being updated in the FASTA even though the GFF3 includes
    these elements.

See the "Changes" file for more information.

Tephra version 0.12.2

25 Sep 01:52

Summary of changes in this version:

  • Modify the algorithm for how chromosomes/contigs are processed with the 'maskref' command to pre-process contigs shorter than the split size. This ensures that the max number of requested threads is always being used, and greatly reduces the time required to mask large genomes.
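
One possible reading of this change is that contigs no longer than the split size become a single work unit, while longer sequences are cut into windows, so that every thread has work to do. The Python sketch below shows that planning step under those assumptions; the function name, the coordinate convention, and the exact handling of short contigs are hypothetical and may differ from what 'maskref' does.

```python
def plan_mask_jobs(contig_lengths, split_size):
    """Return (contig, start, end) work units, using 0-based half-open coordinates.

    contig_lengths: dict mapping contig name -> length in bases
    """
    jobs = []
    for name, length in contig_lengths.items():
        if length <= split_size:
            # Short contigs are scheduled whole rather than being split.
            jobs.append((name, 0, length))
        else:
            for start in range(0, length, split_size):
                jobs.append((name, start, min(start + split_size, length)))
    return jobs
```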

See the "Changes" file for more information.

Tephra version 0.12.1

12 Sep 16:25

Summary of changes in this version:

  • Update Dockerfile to use git for creating an image from the latest builds.
  • Add method to get muscle so all dependencies are handled during configuration.
  • Bug fix with making combined repeat library with 'all' command and not setting
    variable for logging results.
  • Add Bio::SearchIO::blastxml dependency to cpanfile. This was split out of BioPerl in
    the 1.7x release.
  • Add Docker support to this version, and move core install to separate file.

See the "Changes" file for more information.

Tephra version 0.12.0

02 Aug 15:49

Summary of changes in this version:

New features:

  • Refactor LTRStats/TIRStats classes to put common methods in GFF Role or Tephra::Stats::Age class.
  • Remove 'ltrage' and 'tirage' commands and create single 'age' command to share refactored/common
    methods (updated tests for these changes).
  • Add 6 new HMM models to local Pfam database for tyrosine recombinases, endonucleases, and
    Helitron_like_N for classifying DIRs, non-LTRs, and Helitron elements, respectively.
  • Add methods to now handle unformatted repeat database with 'maskref' command. Previous versions
    would print a warning that no classifications could be determined based on the absence of the
    3-letter code in the header. Now, the Class, Order, and Repeat will be listed as 'Unknown' and
    the number of masked bases will be reported.
  • Remove redundant methods for finding LTR exemplars and place in common role under the LTR
    namespace (called *LTR::Role::Utils).
  • Add method to properly check for and annotate LARD elements. The statistics for these elements
    are now logged along with other LTR-RTs.
  • Add methods for finding TIR exemplars and place in common role under the TIR
    namespace (called *TIR::Role::Utils).
  • Refactor the control-flow for the 'age' command option and use new methods for getting exemplars,
    or if the --all option is given, use new logic to set output directory for both LTR/TIR types.
  • Refactor *SoloLTRSearch class to use more descriptive method for finding exemplars, which is now
    in the *LTR::Role::Utils class.
  • Add unclassified LTR elements to solo-LTR search instead of only Gypsy and Copia.
  • Add location of protein domains to family-level domain classification report.
  • Complete re-write of solo-LTR search method. Now the 'sololtr' command uses nhmmer from HMMERv3
    and works straight from the LTR alignment so no model is constructed for the search. In addition, we
    now write the files as they are processed instead of writing all LTR FASTA files to disk, then writing
    all alignments, etc. for each step in the process. This saves an enormous amount of time and space, and
    the method is more stable now as we're not writing thousands of files to disk.

Bug fixes:

  • Bug fix for 'maskref' command not cleaning up intermediate sub-directories. Fixes #24 on GitHub.
  • Fix minor bug with header being printed multiple times in family-level domain architecture
    log file.
  • Fix major bug caused by a delete hash key statement in the write_families method in the
    *Classify::Fams class that was causing the writing of elements to get out of sync,
    and even some elements getting dropped from the combined file.
  • Fix bug with element IDs being updated in the domain architecture log and GFF3 after
    family-level classification but not FASTA. This caused the ID numbers in the FASTA to differ,
    but now they are consistent. This change is in the *Classify::Fams class, but required changes
    to both the 'classifyltrs' and 'classifytirs' commands to pass new object references containing
    a mapping of updated IDs.
  • Modify output in domain architecture log file so the results are printed in descending order
    by occurrence for easier interpretation of the results.
  • Remove delete key statement from loop when creating domain FASTA files for clustering (this fixes
    a bug with the abundance and family numbers getting out of sync).
  • Properly sort output of families with respect to abundance, and make sure singletons go into singletons
    files and not families.
  • Add fragments to final FASTA repeat database so IDs in FASTA and GFF3 are consistent.
  • Do not remove singleton LTR sequences in family-level classification step. This allows all elements
    to now be considered in solo-LTR search.
  • Modify common LTR/TIR finding method to not delete FASTA file of feature parts from singleton elements.
    This fix allows us to use all LTR/TIR elements in age calculation and all LTR elements in
    solo-LTR search.
  • Minor bug fix for family-level classification when concatenating duplicate/split domains for clustering.
    Previously, the entire span of the domains was written as the location, which would be inclusive
    of other domains. This is corrected in the header now.
  • Bug fix for reporting order of protein domains in the family-level domain classification reports. The
    logic was correct previously, but the use of "keys" instead of a for-loop over an array
    caused the domain order to be randomized in the report.
  • We now only report one protein domain in the family-level domain classification report when two or more
    adjacent domains of the same type have been merged into one span. This is how the classification
    works, and hopefully the reporting is more logical now.
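
To illustrate the last item, the Python sketch below collapses runs of adjacent domains of the same type into a single span before reporting. It is only a sketch of the idea, not Tephra's Perl code; the tuple layout, the function name, and the example domain names are assumptions.

```python
def merge_adjacent_domains(domains):
    """Merge runs of adjacent domains that share a name into one span.

    domains: list of (name, start, end) tuples already ordered by position.
    """
    merged = []
    for name, start, end in domains:
        if merged and merged[-1][0] == name:
            # Same type as the previous domain: extend that span.
            prev_name, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_name, prev_start, max(prev_end, end))
        else:
            merged.append((name, start, end))
    return merged
```

With this approach, two neighboring hits to the same domain model are reported once with a combined span, while domains of other types keep their own coordinates.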