wf-paired-end-illumina-assembly: Changelog

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

v3.0.0 - November 15, 2024

Added

  • Consistent metrics reported for each read cleaning step (@chrisgulvik)
  • Added SeqFu for FastQ format validation (@chrisgulvik)
  • Checksum (SHA-512) reporting of intermediate and output files (@chrisgulvik)
  • Report full input paths for each sample (@chrisgulvik)
  • For assembly depth reporting, added stdev depth metrics; added total paired+single mapped stats (@chrisgulvik)
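
As an illustration of the SHA-512 checksum reporting listed above, here is a minimal Python sketch. The function name `sha512_of_file` and the chunk size are hypothetical and not taken from the workflow, which may compute its checksums differently (for example with a shell utility).

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for large FastQ/FastA files.
    The name and default chunk size are illustrative assumptions.
    """
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the digest of an output file against a previously reported value is one way to confirm the file was not altered or truncated after the run.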

Changed

  • Default uses SeqKit rather than SeqTk for downsampling (@chrisgulvik)
  • Output structure and filenames revised (@chrisgulvik)
  • For MLST, exclude all MLST databases with a *_ by default (> 1) so that the original MLST database version is used for each taxon (e.g., excludes leptospira_2 and leptospira_3). This avoids inconsistent versions within a run, which would occasionally assign one sample leptospira and a different sample leptospira_3, making it impossible to immediately compare between samples. (@chrisgulvik)
  • For MLST, store novel FastA when that situation occurs (@chrisgulvik)
  • Sample name in outputs and file content no longer contains assembler name (@chrisgulvik)
  • Changed RDP output to exclude unnecessary data columns such as "Phylum\nphylum", "Genus\ngenus" (@chrisgulvik)
  • Use both R1 and R2, and only Phred30 bases, as input when estimating bp for more accurate estimation of genome size (@chrisgulvik)
  • Storing stats and FastA of contigs discarded during biopython filtering is now always on by default (@chrisgulvik)
  • Output filenames within pipeline_info/ changed to show month by name and include day of the week (@chrisgulvik)
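
The MLST exclusion rule above can be sketched in Python. This assumes the excluded schemes are those whose names end in an underscore followed by an integer greater than 1, as the leptospira_2 and leptospira_3 examples suggest. The function name `schemes_to_exclude` and the pattern are hypothetical, not the workflow's actual implementation.

```python
import re

# Assumed naming pattern: "<base>_<integer>", e.g. "leptospira_3"
NUMBERED_SCHEME = re.compile(r"^(?P<base>.+)_(?P<num>\d+)$")


def schemes_to_exclude(scheme_names):
    """Return scheme names carrying a numeric suffix greater than 1."""
    excluded = []
    for name in scheme_names:
        match = NUMBERED_SCHEME.match(name)
        if match and int(match.group("num")) > 1:
            excluded.append(name)
    return excluded


schemes = ["leptospira", "leptospira_2", "leptospira_3", "ecoli"]
print(schemes_to_exclude(schemes))
```

With only the unnumbered scheme left available per taxon, every sample in a run is typed against the same database version.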

Fixed

  • Order of operations in Trimmomatic process now ensures final output reads have minimum sequence length (default: 50 bp) (@chrisgulvik)
  • Fixed issue with missing column header names in the .kraken_summary.tsv output files (@chrisgulvik)
  • Fixed trailing tab character in Kraken1 and Kraken2 output TSV summaries, which made pandas XLSX conversion fail due to different column numbers in header and data (@chrisgulvik)
  • Fixed VERSION reporting RDP bug by removing spaces (@chrisgulvik)
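
To show why a trailing tab breaks downstream conversion, the sketch below counts fields per line before and after stripping trailing tabs. The helper names and the sample rows are made up for illustration. Note that this naive stripping would also drop legitimately empty trailing fields, so it is only an assumption-laden sketch and not the fix applied in the workflow.

```python
def field_counts(lines):
    """Count tab-separated fields on each line."""
    return [len(line.rstrip("\n").split("\t")) for line in lines]


def strip_trailing_tabs(lines):
    """Remove newline and any trailing tab characters from each line."""
    return [line.rstrip("\n").rstrip("\t") for line in lines]


# Hypothetical summary: the data row ends with a stray tab
rows = ["Sample\tReads\n", "s1\t100\t\n"]
print(field_counts(rows))                       # header and data disagree
print(field_counts(strip_trailing_tabs(rows)))  # counts now match
```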

Updated

  • Coloring of workflow process now corresponds to tab color in XLSX output summary sheet (@chrisgulvik)
  • Docker container version updates (@chrisgulvik)
  • Updated description on output files based on new files created as well as some renamed output files (@chrisgulvik)

Deprecated

  • Removed gene calling from QUAST output summary (@chrisgulvik)

v2.4.0 - August 28, 2024

Added

  • Statistics output files during the downsampling routine are all stored as output files under their CleanedReads// (@chrisgulvik)
  • FastQ outputs for individual steps are now saved (possibly temporary; may later move to a non-default option) (@chrisgulvik)
  • SeqKit downsampling option (already had Seqtk) (@chrisgulvik)
  • Use of a default seed value for both SeqKit and Seqtk subsampling (@chrisgulvik)
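
A fixed seed makes subsampling reproducible: the same input and seed always select the same reads. The Python sketch below illustrates the idea only. The function name, the default seed of 42, and the per-read selection strategy are assumptions and do not describe how SeqKit or Seqtk sample internally.

```python
import random


def subsample(read_ids, fraction, seed=42):
    """Keep each read with probability `fraction`, reproducibly.

    A dedicated Random instance seeded with `seed` guarantees the
    same selection on every call with the same inputs.
    """
    rng = random.Random(seed)
    return [read_id for read_id in read_ids if rng.random() < fraction]


ids = [f"read{i}" for i in range(1000)]
first = subsample(ids, 0.1, seed=7)
second = subsample(ids, 0.1, seed=7)
print(first == second)  # identical selections for identical seeds
```

For paired-end data, applying one selection to read pairs (rather than sampling R1 and R2 independently) keeps mates together.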

Fixed

  • RDP Classifier always failed once before succeeding on retry, so a label with a higher RAM request was assigned to make it succeed on the first attempt, for speed (@chrisgulvik)
  • RDP Classifier data output is now tab-delimited instead of space-delimited (@chrisgulvik)

Updated

  • TSV output data files now have header names with no spaces; underscores replace all of them (@chrisgulvik)
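
The header change above amounts to a simple transformation, sketched here in Python. The function name `normalize_header` and the example column names are hypothetical.

```python
def normalize_header(line):
    """Replace spaces with underscores in each tab-separated header name."""
    names = line.rstrip("\n").split("\t")
    return "\t".join(name.strip().replace(" ", "_") for name in names)


print(normalize_header("Sample name\tTotal bp\n"))
```

Space-free header names are easier to reference from command-line tools and from column-based parsers.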

Deprecated

v2.3.1 - August 23, 2024

Added

Fixed

  • #92 Refactored profiles to align with Seqera Tower requirements and added a Tower yaml for reporting (@slsevilla).
  • #88 Resolved issue where SPAdes fails when a non-integer is used with the memory parameter (@gregorysprenger).
  • #90 Add SKESA parameters to main params.config file to avoid null values (@gregorysprenger).

Updated

Deprecated

v2.3.0 - March 20, 2024

Added

  • #82 Added CAT and CheckM2 to assess the resulting assembly file (@chrisgulvik and @gregorysprenger).
  • #83 Added fastp as an alternative to Trimmomatic (@chrisgulvik and @gregorysprenger).

Fixed

Updated

Deprecated

v2.2.1 - February 16, 2024

Added

  • #73 Check if input FastQ files are corrupted before proceeding to downstream analyses (@gregorysprenger).

Fixed

  • #74 Fix emit statements to catch output files from modules (@gregorysprenger).

Updated

  • #69 Replace baseDir with projectDir due to deprecation in latest version of nextflow (@gregorysprenger).
  • #71 Updated path to Kraken1 database in bash wrapper script (@gregorysprenger).
  • #77 Set max resources for release.yml GitHub action (@gregorysprenger).

Deprecated

v2.2.0 - February 14, 2024

Added

  • #37 Allow customization of the MLST module, such as specifying schemes to include/exclude and minimum thresholds (@gregorysprenger).
  • #52 Concatenate kraken (1 and 2) output summaries and place into the output Summaries directory (@gregorysprenger).
  • #47 Added RDP Classifier as another tool to classify 16S ribosomal RNA (@taylorpaisie).
  • #51 Added header to BLAST output file before it is compressed (@gregorysprenger).
  • #35 Added parameters to customize Trimmomatic parameters (@gregorysprenger).
  • #61 Summarize all QA outputs and place them into the Summaries output directory; TSV files can also be converted to XLSX and combined into a final Excel report workbook (@gregorysprenger).

Fixed

  • #50 Host removal now runs all samples, QC file checks are implemented for host removal, fixed minimum file size param for host removal, and fixed kraken2 database issue (@gregorysprenger).
  • #67 Added configuration file for gitleaks so that FastA files are ignored when checking for security tokens with MegaLinter (@gregorysprenger).

Updated

  • #55 Replace linting GitHub workflows with MegaLinter to allow for more linters to be used, autofix issues and commit, etc. (@gregorysprenger).
  • #45 Updated output documentation to allow for better readability and interpretation (@gregorysprenger).
  • #56 Change how images are displayed based on light and dark themes due to GitHub changes (@gregorysprenger).
  • #62 Ignore warnings for RDP parameters (@taylorpaisie).

Deprecated

v2.1.0 - January 25, 2024

Added

  • #23 Added a miscellaneous issue template (@taylorpaisie).
  • #25 Fix broken repository links (@gregorysprenger).
  • #24 Update .nfcore.yml file to ignore specific nf-core file checks that do not apply to this workflow (@gregorysprenger)
  • #41 Added ability to run hostile in remove_host_hostile channel (@chrisgulvik)
  • #41 Added ability to run NCBI's SRA Human Scrubber (scrub.sh) in remove_host_sra_human_scrubber channel (@chrisgulvik)
    • Added ability to remove the broken sister reads from scrub.sh in the remove_broken_pairs_bbtools_repair channel
    • Added new prepare_db_sra_human_scrubber channel for future-proofing in case a retrieved DB file eventually gets gzip-compressed
    • Added new update_db_sra_human_scrubber channel to fetch the human_filter.db file and avoid requiring it inside the container environment
  • #41 Added 2 new test config files to locally run the miniburk test data with and without host removal (@chrisgulvik)
  • #41 Added a new subworkflow for human host removal subworkflows/host_removal.nf (@chrisgulvik)

Fixed

  • #26 Updated database preparation modules to use meta.id to correctly rename .command.{out,err} files in the log directory (@gregorysprenger).
  • #30 Fixed the parsing of PhiX removal information from BBDuk to be added to the summary files (@gregorysprenger).
  • #31 Removed the possibility of scientific notation in Summary.CleanedReads-Bases.tab file (@gregorysprenger).

Updated

  • #29 Updated nf-core GTDB-Tk and BUSCO modules to latest release and added mash_db parameter for GTDB-Tk (@gregorysprenger).
  • #36 Updated readme and markdown files in docs/, and separated the SKESA and SPAdes options in the nextflow schema file (@gregorysprenger).
  • #41 Updated usage to include the new --host_remove {both,hostile,skip,sra_human_scrubber}. The default, "both", runs SRA Human Scrubber first and then hostile; options also exist to "skip" host removal entirely, or to invoke just one tool as "hostile" or "sra_human_scrubber" (@chrisgulvik)
  • #41 Updated workflows/assembly.nf to run the host removal subworkflow after infile handling and prior to downsampling (@chrisgulvik)
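
The four --host_remove choices described above map to an ordered list of tools. The Python sketch below shows that mapping only; the function name `host_removal_steps` is hypothetical and the workflow itself implements this logic in Nextflow, not Python.

```python
def host_removal_steps(mode="both"):
    """Return the host-removal tools to run, in order, for a given mode."""
    steps = {
        # Default: SRA Human Scrubber first, then hostile
        "both": ["sra_human_scrubber", "hostile"],
        "hostile": ["hostile"],
        "sra_human_scrubber": ["sra_human_scrubber"],
        # Skip host removal entirely
        "skip": [],
    }
    if mode not in steps:
        raise ValueError(f"unknown host removal mode: {mode!r}")
    return steps[mode]


print(host_removal_steps())        # default mode
print(host_removal_steps("skip"))  # no tools run
```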

Deprecated

v2.0.4 - December 8, 2023

Added

  • Added ability to use nf-core lint on pipeline.
  • Validation parameters added to params.config for nf-core lint.
  • Removal of duplicate entries in errors.tsv when using the bash runner scripts.
  • Process name is now displayed when QC failures occur.
  • Assembler name is appended to sample name in QC filechecks.

Fixed

  • Barrnap module now receives all biopython outputs.

Updated

  • Renamed container cache parameter and added it to conda, docker, and singularity profiles.
  • Added missing parameters to nextflow_schema.json.
  • Removed system.exit(1) from extract.record.from.genbank.py script.
  • Updated error catching of biopython, barrnap, and QC failure that are added to errors.tsv when using bash runner scripts.
  • Use exit 1 instead of exit 2 in barrnap module.
  • Minimum nextflow version in github actions now matches the minimum version in nextflow.config.
  • Markdown files now use :::tip or :::note for tips and notes.
  • Removed genbank file input for barrnap module as it is not used.
  • Updated .nf-core.yml file to ignore certain lints when using nf-core lint.
  • Replaced gregorysprenger workflow prefixes with bacterial-genomics.

Deprecated

  • "Bigdata" parameter is no longer used as resources are different for everyone - create a custom config instead.

v2.0.3 - November 27, 2023

Added

  • Catching of errors in bash runner scripts for BBDuk and SPAdes.
  • Add sample name and assembler name to outputs.
  • Added and sorted error codes in bash runner script to ignore if process is resubmitted and completes.
  • Allow output of "fastq" or "fq" files from subsampled reads module.
  • Within rosalind_hpc.config, resubmit SPAdes if the exit code is anything other than zero.

Fixed

  • Fixed parsing of 16S BLAST database input so that results have species names.
  • Quast and Kraken1 software versions information is now parsed correctly.
  • Bash runner scripts are updated to correctly parse run information for errors.
  • Fixed publishDir for SPAdes and Kraken1 so that output files are placed into outdir.
  • Concatenation of QC filechecks to create one large summary.
  • Removal of input reads if reads are subsampled, to avoid passing both original and subsampled reads to downstream processes.

Updated

  • Ignore header when counting number of lines in genome coverage summary file within bash runner script.
  • Changed number of retries for SPAdes in rosalind_hpc.config to 5.
  • Removed non-working loop in SPAdes module as the process should be resubmitted to abide by resource constraints.
  • Changed rosalind_hpc.config high memory parameter and allow memory to be increased for resubmissions.
  • Consolidated input and outputs for each module.
  • Updated QC filecheck function to remove failed inputs from downstream processes without terminating entire workflow.
  • Parsed out meta information when collecting outputfiles to create a concatenated summary file.
  • Added headers to QC filechecks within each module.
  • Dropped forced reformatting of FastQ files within BBDuk module.

Deprecated

v2.0.2 - November 16, 2023

Added

Fixed

  • Added collect operators to database/reference channels to allow for all inputs to be analyzed for each process
  • Removed extra slashes '/' from run_assembly.uge-nextflow scripts
  • Updated paths for summary email when using run_assembly.uge-nextflow scripts

Updated

Deprecated

v2.0.1 - November 16, 2023

Added

Fixed

  • Made scripts in the bin directory executable when added to github so Nextflow can use them
  • Changed log directory to pipeline_info directory in run_assembly.uge-nextflow scripts
  • Removed --gtdb_db from run_assembly.uge-nextflow scripts until GTDB-Tk module version is updated

Updated

Deprecated

v2.0.0 - November 15, 2023

Added

  • nf-core Styling
  • Allow samplesheet (XLSX, CSV, TSV) OR directory as input
  • Use SKESA instead of SPAdes for assembling contigs
  • Downsampling of input FastQ files
  • Contig taxonomic classification using GTDB-Tk
  • Intra-contig gene information using BUSCO
  • Ability to use a BUSCO config file
  • Always reformat FastQ files before removing PhiX using BBDuk
  • Database prep modules to decompress and extract databases
  • Function that checks whether QC File Checks fail and places them in the QC log directory, removing the hassle of passing these files to each subsequent module
  • Comments in input/output channels in subworkflows to make it easier to reference for future changes/additions
  • Added "-<assembler>" to outputs after contigs are assembled
  • Added headers to each output file
  • Option to change the "mode" of SPAdes (e.g., isolate, meta)
  • Option to use different k-mer sizes for SPAdes
  • Added evalue similarity cut-off to Prokka
  • Ability to use a curated protein file along with Prokka's database

Fixed

  • Errors in run_assembly.uge-nextflow scripts
  • Handle "lock" issues when using the same work directory with run_assembly.uge-nextflow scripts
  • Remove duplicate rows and junk information in errors.tsv if workflow fails due to a locked session
  • Updated "INFO" messages to be more readable

Updated

  • Docker containers to ensure they all have built in tests and removed built-in databases
  • More comments and whitespace to separate information in conf/params.config
  • Separated kraken1 and kraken2 modules
  • Use channels for databases instead of passing params
  • Use downloadable database .tar.gz files instead of using built-in databases in docker images
  • Converted spades and skesa workflows into one workflow and use subworkflows "downsampling.nf" and "assemble_contigs.nf"
  • Renamed assemble*{spades,skesa} to assemble_contigs*{spades,skesa}
  • CHANGED OUTPUT FILE STRUCTURE: Output files are generally under the name of the tool that produced them
  • Updated docs/output.md to reflect new file structure
  • Removed QC file checking from modules and use checkQCFileChecks function in workflows/assembly.nf and subworkflows/assemble_contigs.nf
  • Renamed emit names to hopefully clarify what each output means
  • Moved all publishDir information to conf/modules.config
  • Set filter contig length to 1000
  • Place .command.{out,err} on one line for each module
  • Added content section to main README.md to allow for easier navigation
  • Added BUSCO and GTDB-Tk references to CITATIONS.md
  • Parsing of BBDuk output to add more information to a PhiX Summary file
  • Fixed github actions to properly run

Deprecated

v1.0.0 - January 20, 2023

Initial release of wf-paired-end-illumina-assembly.