Releases: hallamlab/TreeSAPP
v0.9.7
Added
- New option in treesapp package called 'rename' to change the name of a reference package
- MMSeqs2 and MAFFT versions included in treesapp info output
- GitHub Actions for CI
- Unit tests
Fixed
- (#63) An annoying warning that a fasta with classified nucleotide sequences couldn't be created for protein sequence input.
- (#62) Corrected FPKM values.
- (#58) List available reference packages (link to RefPkgs in README).
Changed
- Updated required version of samsum to 0.1.4
Full city+ - Gedeo
This release is for version 0.9.0 and covers many new features. It is strongly recommended that everyone updates to this version.
There is no backwards compatibility between this and older versions. All reference packages built using older versions will need to be remade from scratch.
New features:
- Uses EPA-NG and RAxML-NG for phylogenetic placement and inference, respectively. Using EPA-NG drops the runtime drastically.
- Reference packages are stored as a single, pickled file (#48). The subcommand treesapp package has been specifically designed to allow users to still interact with this binary file.
- Reduced RAM usage by not loading the whole query FASTA. Only the headers are loaded first, followed by just the query sequences that matched a reference package's profile HMM (#49).
- Using the RefPkgs repository for version control of all non-core reference packages. Available for everyone to contribute to!
- Profile HMMs are dereplicated at the genus rank for more sensitive profiles.
- Users are able to update reference packages with GTDB-tk lineages by using the --seqs2lineage argument.
- The subcommand treesapp train fully supports checkpointing. Checkpointing in other subcommands is still to come (but well on its way).
- We finally have some sort of a test suite.
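The reduced-RAM approach described above (#49) can be illustrated with a minimal two-pass sketch. This is not TreeSAPP's actual code: the function names read_headers and load_matched are hypothetical, and the profile HMM hits are assumed to be available simply as a set of sequence names.

```python
import io
from typing import Dict, List, Set, TextIO


def _seq_name(header_line: str) -> str:
    """Return the first whitespace-delimited token after '>' (hypothetical helper)."""
    fields = header_line[1:].split()
    return fields[0] if fields else ""


def read_headers(handle: TextIO) -> List[str]:
    """First pass: collect sequence names only; sequence lines are never stored."""
    return [_seq_name(line) for line in handle if line.startswith(">")]


def load_matched(handle: TextIO, wanted: Set[str]) -> Dict[str, str]:
    """Second pass: store sequences only for names that had a profile HMM hit."""
    parts: Dict[str, List[str]] = {}
    name = ""
    keep = False
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = _seq_name(line)
            keep = name in wanted
            if keep:
                parts[name] = []
        elif keep:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


# Toy input standing in for a large query FASTA file.
fasta_text = ">q1 some description\nMKV\nLLA\n>q2\nMMM\n>q3\nGGG\n"
headers = read_headers(io.StringIO(fasta_text))
# Assume q1 and q3 matched a reference package's profile HMM.
matched = load_matched(io.StringIO(fasta_text), {"q1", "q3"})
```

Memory then scales with the number of matched sequences rather than with the size of the whole input, at the cost of reading the file twice.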
Change notes:
- Format of the classification table (final_outputs/marker_contig_map.tsv) has been changed. Only a single column for the recommended taxonomy and the hmmsearch-derived E-value is reported.
- Reference trees are automatically rooted using ETE3's 'set_outgroup' function with the farthest node. Polytomies are also automatically resolved when the tree is built using FastTree.
- In treesapp create (and therefore treesapp update) reference sequence outlier detection and removal using OD-Seq has been made optional and can be requested using the '--outdet_align' flag.
Issues addressed:
#51
And many that went unreported :)
Full city - Agaro
This release is the version used in the forthcoming TreeSAPP manuscript.
Change notes
- Installation has been made significantly easier with TreeSAPP available as a Python package on PyPI, bundled with all dependencies using conda, and as a container through Singularity.
- The Python package samsum is now used to calculate FPKM of classified sequences from read alignments.
- treesapp update has a '--resolve' flag available for swapping original reference sequences with new ones that are more taxonomically resolved.
- pyfastx is now used for reading FASTQ files. Extracting classified sequences from FASTQ is still not scalable, however, and this is not recommended. Making this more efficient is a goal for the next release.
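For readers unfamiliar with the FPKM values mentioned above, here is a minimal sketch of the standard definition (fragments per kilobase of sequence per million mapped fragments). It does not show samsum's API or its exact calculation; the function name fpkm and its parameters are assumptions made for illustration.

```python
def fpkm(fragments: int, length_bp: int, total_fragments: int) -> float:
    """Standard FPKM: fragments per kilobase per million mapped fragments.

    fragments       -- fragments aligned to this sequence
    length_bp       -- length of the sequence in base pairs
    total_fragments -- all mapped fragments in the sample
    """
    if length_bp <= 0 or total_fragments <= 0:
        raise ValueError("length_bp and total_fragments must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_fragments)


# 50 fragments on a 1,000 bp sequence in a sample of one million mapped fragments.
example = fpkm(50, 1000, 1_000_000)
```

Normalising by both sequence length and sample depth is what makes the values comparable across sequences and samples.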
Issues fixed
City+ - Guji
This new pre-release of TreeSAPP features a new subcommand, treesapp purity, and many improvements to outputs and sub-command integration, particularly through the update-create-train workflow.
Reference packages have also been updated or newly added, like DsrAB and HydA. Some of these make heavy use of layered metabolic (or activity, homolog or other) annotations to sort out phylogenetically similar homologs that might otherwise be misclassified.
As usual, many bug-fixes accompany this version :)
City-Nyeri
TreeSAPP has been improved in nearly all aspects, except possibly run time. New linear modelling techniques allow for improved taxonomic assignment, and these have been included in the current set of reference packages. New hmmsearch and BMGE parameters have increased the number of true positive hits, while placement filters have significantly reduced type I errors. This version performs best when combined with the latest version of RAxML, 8.2.12.
Cinnamon-Tarrazu
This version of TreeSAPP still employs a BLAST-genewise ORF prediction strategy. HMMER 2.4i was used for profile alignment to generate the MSAs. I have noticed that the HMMs were not as specific as those built using HMMER 3.1b; I've observed many false-positive alignments with them and hmmsearch. Other than these dependency differences, this version is compatible with the versions post fastsearch merge.
It will mostly be used for benchmarking from here on out.
FYI, the version naming convention will be 'coffee roast level'-'coffee growing region'. This version is still young and sprightly, so I thought a Cinnamon roast from Costa Rica's Tarrazu region would be fitting.
Green-Chanchamayo
This release is a vanilla reimplementation in Python of the original MLTreeMap codebase (written in Perl). There are a few minor changes, such as new functions to help with the user experience and multiprocessing of binaries. There are no algorithmic changes that would cause results to differ between the original Perl version and this Python version, but newer versions of binaries are being used (blastn, RAxML, etc.).
As the initial release, we are beginning with green coffee, from the Chanchamayo region on the Eastern slopes of the Peruvian Andes.