
Installing TreeSAPP dependencies

Connor Morgan-Lang edited this page Jun 16, 2022 · 6 revisions

If you do not already have the dependencies for TreeSAPP installed on your computer, we've listed how to download and install each one below.

TreeSAPP can find these if they are already installed in a directory on your environment's PATH (e.g. /usr/local/bin/). On Linux and macOS, you can print your PATH variable by typing echo $PATH.
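
To check by hand which dependencies are visible on your PATH, a small Python sketch like the one below can help. It uses only the standard library's shutil.which, which searches the directories on PATH for an executable file. The names in REQUIRED_EXECUTABLES are assumptions about what each tool is usually called once installed (HMMER, for instance, provides several programs such as hmmsearch and hmmalign). BMGE is left out because TreeSAPP ships its own copy. This is not TreeSAPP's own check.

```python
import shutil

# Assumed executable names for the required tools; adjust them to match your installation.
REQUIRED_EXECUTABLES = ["hmmsearch", "hmmalign", "prodigal", "epa-ng", "raxml-ng"]


def find_executables(names):
    """Map each executable name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_executables(REQUIRED_EXECUTABLES).items():
        print(f"{name:10s} {path if path else 'NOT FOUND on PATH'}")
```

Any tool reported as NOT FOUND is either not installed or lives in a directory that is not on your PATH.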

Required

HMMER

TreeSAPP uses HMMER to identify marker gene sequences in proteins and genomes, and to perform a profile alignment that adds query sequences to a multiple sequence alignment prior to phylogenetic placement. The latest version is available at http://hmmer.org/. Download it from there and follow the installation guide under DOCUMENTATION.

Recommended version: 3.3

Citation: Eddy, S. R. (1998). Profile hidden Markov models. Bioinformatics (Oxford, England), 14(9), 755–763. https://doi.org/10.1093/bioinformatics/14.9.755

Prodigal

Prodigal is used for open reading frame (ORF) prediction and can be downloaded from its GitHub page. Follow the installation guide on the Prodigal GitHub wiki to install it.

Recommended version: 2.6.3

Citation: Hyatt, D., Chen, G.-L., Locascio, P. F., Land, M. L., Larimer, F. W., & Hauser, L. J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11, 119. https://doi.org/10.1186/1471-2105-11-119

BMGE

BMGE is used to select phylogenetically informative regions from multiple sequence alignments. It is optionally applied before building reference phylogenies and before phylogenetic placement, because trimming the alignment this way significantly reduces compute time. It is distributed with TreeSAPP (in treesapp/sub_binaries/) but can also be installed using conda. Old download links are no longer functional.

Recommended version: 1.12

Citation: Criscuolo, A., & Gribaldo, S. (2010). BMGE (Block Mapping and Gathering with Entropy): A new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evolutionary Biology, 10(1). https://doi.org/10.1186/1471-2148-10-210
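
To give a feel for what selecting phylogenetically informative regions means, the Python sketch below scores each alignment column with plain Shannon entropy and drops columns that are highly variable or mostly gaps. This is a simplified stand-in, not BMGE's algorithm: BMGE uses a more elaborate, substitution-matrix-based score. The thresholds max_entropy and max_gap_fraction are hypothetical illustration values, not BMGE defaults.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of the non-gap residues in one alignment column."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())


def informative_columns(alignment, max_entropy=1.0, max_gap_fraction=0.5, gap="-"):
    """Indices of columns that are neither too variable nor too gappy.

    Thresholds are hypothetical illustration values, not BMGE defaults.
    """
    n_seqs = len(alignment)
    keep = []
    for i, column in enumerate(zip(*alignment)):
        gap_fraction = column.count(gap) / n_seqs
        if gap_fraction <= max_gap_fraction and column_entropy(column, gap) <= max_entropy:
            keep.append(i)
    return keep


def trim(alignment, keep):
    """Rebuild each aligned sequence from the kept column indices only."""
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

For the toy alignment ["MKV-LA", "MKI-LA", "MRV-LG", "MKVQLS"], informative_columns keeps columns 0, 1, 2 and 4: column 3 is three-quarters gaps, and column 5 (A, A, G, S) has an entropy of 1.5 bits.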

EPA-NG

TreeSAPP uses EPA-NG for phylogenetic placement, the process of mapping query sequences onto the branches of a reference tree. It can be installed using conda, or from source by following the instructions here.

Recommended version: >=0.3.7

Citation: Barbera, P., Kozlov, A. M., Czech, L., Morel, B., & Stamatakis, A. (2018). EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences. bioRxiv, 291658. https://doi.org/10.1101/291658
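
Several tools on this page list a minimum version (e.g. >=0.3.7 for EPA-NG). Comparing version strings as plain text gets this wrong, because "0.3.10" sorts before "0.3.7", so the Python sketch below compares them numerically. How you obtain the installed version string (usually from the tool's own version option) is left out, since output formats vary between tools. For MMSeqs2-style identifiers such as 12-113e3, only the leading release number is compared.

```python
import re


def parse_version(text):
    """Extract the first dotted version number (e.g. '0.3.7') from text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum, compared numerically."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that "1.0" compares equal to "1.0.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, meets_minimum("0.3.10", "0.3.7") is True, whereas a plain string comparison would report the opposite.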

RAxML-NG

RAxML-NG is the next generation of RAxML, which was previously the workhorse of TreeSAPP. It is used for inferring phylogenies and bootstrapping. The easiest way to install RAxML-NG is to download a pre-compiled binary to a directory on your environment's PATH. It can also be compiled and installed from source by following their instructions.

Recommended version: >=1.0.0

Citation: Kozlov, A. M., Darriba, D., Flouri, T., Morel, B., & Stamatakis, A. (2019). RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, 35(21), 4453–4455. https://doi.org/10.1093/bioinformatics/btz305

System Dependencies

Some of the Python packages depend on Linux libraries to compile. If you're experiencing issues installing some Python packages, try installing the following (for Ubuntu):

  • libsqlite3-dev
  • python-tk
  • libffi-dev
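
If you are unsure whether these libraries were present when your Python interpreter was built, one quick diagnostic is to import the standard-library modules that depend on them. The module-to-package pairing below is a common rule of thumb for Ubuntu, not an exhaustive check: exact package names differ between releases (e.g. python3-tk), and a successful import does not guarantee that every third-party package will compile.

```python
import importlib

# Standard-library modules that, on Ubuntu, rely on the system packages listed above.
# This pairing is a rule of thumb, not an exhaustive dependency check.
MODULE_TO_PACKAGE = {
    "sqlite3": "libsqlite3-dev",
    "tkinter": "python-tk",
    "ctypes": "libffi-dev",
}


def check_system_modules(modules=MODULE_TO_PACKAGE):
    """Return {module: True/False} depending on whether each module imports cleanly."""
    status = {}
    for module in modules:
        try:
            importlib.import_module(module)
            status[module] = True
        except ImportError:
            status[module] = False
    return status


if __name__ == "__main__":
    for module, ok in check_system_modules().items():
        if not ok:
            print(f"'import {module}' failed; try installing {MODULE_TO_PACKAGE[module]}")
```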

Optional

If you would like to build or update reference packages, you will also need to install FastTree, MAFFT, MMSeqs2 and OD-Seq.

You will need to install BWA if you would like to derive relative abundance values of classified sequences from FASTQ files.
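
Because the optional tools are only needed for particular tasks, a hypothetical Python sketch can group them by task, following the descriptions on this page, and report which ones are missing from PATH. The task keys (reference_packages, abundance) and the executable names are illustrative assumptions, not TreeSAPP identifiers; in particular, check what the OD-Seq build actually names its binary.

```python
import shutil

# Hypothetical grouping of optional executables by task; names are assumptions.
OPTIONAL_BY_TASK = {
    # treesapp create / treesapp update (building or updating reference packages)
    "reference_packages": ["FastTree", "mafft", "mmseqs", "OD-seq"],
    # treesapp abundance, or treesapp assign with --rpkm
    "abundance": ["bwa"],
}


def missing_for(tasks, table=OPTIONAL_BY_TASK):
    """List the executables needed by the given tasks that are not found on PATH."""
    needed = sorted({exe for task in tasks for exe in table[task]})
    return [exe for exe in needed if shutil.which(exe) is None]


if __name__ == "__main__":
    print("Missing for reference packages:", missing_for(["reference_packages"]))
```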

FastTree

FastTree can be used to build reference trees instead of RAxML by invoking "fast-mode" with the '--fast' flag in treesapp create. In practice, we haven't observed a drastic decrease in classification performance when using FastTree instead of RAxML, so in our opinion it's completely fine to use it. FastTree can be installed using conda or by following the installation instructions at http://microbesonline.org/fasttree/#Install.

Recommended version: 2.1.10 (double precision)

Citation: Price, M. N., Dehal, P. S., & Arkin, A. P. (2010). FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE, 5(3), e9490. https://doi.org/10.1371/journal.pone.0009490

MAFFT

MAFFT multiple alignment software is only required for creating and updating reference packages (treesapp create and treesapp update, respectively); it is not part of the classification workflow. Feel free to skip installing MAFFT unless you plan on doing either of those tasks. If you do, here is the MAFFT webpage; download and installation instructions are available there.

Recommended version: 7.471

Citation: Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution, 30(4), 772–780. https://doi.org/10.1093/molbev/mst010

MMSeqs2

TreeSAPP uses MMSeqs2, specifically its linclust algorithm, for clustering sequences while building and training reference packages. It can be installed from a statically compiled binary, with Homebrew, or with conda. Details are here.

Recommended version: >=12-113e3

Citation: Steinegger, M., & Söding, J. (2018). Clustering huge protein sequence sets in linear time. Nature Communications, 9, 2542. https://doi.org/10.1038/s41467-018-04964-5

OD-Seq

OD-Seq is used to detect misannotated sequences, or "outliers", in multiple sequence alignments when building new reference packages. Source files can be downloaded from University College Dublin's website using this link. It can be compiled either with make all or, to build OD-Seq alone, with make odseq.

Recommended version: 1.0

Citation: Jehl, P., Sievers, F., & Higgins, D. G. (2015). OD-seq: Outlier detection in multiple sequence alignments. BMC Bioinformatics, 16(1), 1–11. https://doi.org/10.1186/s12859-015-0702-1

BWA

BWA MEM is used for mapping short reads to classified DNA open reading frames (ORFs) when ORF prediction was performed and either the '--rpkm' flag was used with treesapp assign or treesapp abundance was called. BWA can be installed using conda or by following the instructions on its GitHub page.

Recommended version: 0.7.17

Citation: Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997. https://arxiv.org/abs/1303.3997
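
The '--rpkm' flag refers to RPKM (reads per kilobase per million mapped reads), a common way to turn read counts into relative abundances that are comparable across ORFs of different lengths and samples of different sequencing depths. The Python sketch below implements the standard definition. This page does not describe TreeSAPP's exact normalization (for example, which reads count toward the total), so treat it as an illustration rather than TreeSAPP's own calculation.

```python
def rpkm(reads_mapped, orf_length_bp, total_mapped_reads):
    """Reads Per Kilobase of sequence per Million mapped reads (standard definition)."""
    if orf_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("ORF length and total mapped reads must be positive")
    kilobases = orf_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return reads_mapped / (kilobases * millions)


# Example: 150 reads on a 1,200 bp ORF in a sample with 3 million mapped reads.
print(round(rpkm(150, 1_200, 3_000_000), 2))  # prints 41.67
```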

Finishing up

I hope that wasn't too painful. If you think you have installed everything, try running treesapp info! It checks for the required executables up front, so you will quickly be notified if any are missing or if TreeSAPP is unable to find them.