sequencing data (fastq)

These are tools for manipulating FASTA and FASTQ files. For tools that can programmatically download FASTA and FASTQ files from repositories, see the section on data download.

general purpose (swiss army knife) tools

seqkit provides subcommands for most common operations on FASTA and FASTQ files, such as summary statistics, format conversion, filtering, and sampling. It is fast and well documented.
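As a rough illustration of the everyday operations seqkit covers, the sketch below builds a tiny demo FASTQ and runs a few seqkit subcommands on it. The file names and reads are made up for the example, and the seqkit calls are skipped if seqkit is not on the PATH.

```shell
# Hypothetical two-read demo file; names and sequences are invented for illustration.
printf '@read1\nACGTACGTAC\n+\nIIIIIIIIII\n@read2\nGGCC\n+\nIIII\n' > demo_seqkit.fastq

# Only run the seqkit calls if seqkit is installed.
if command -v seqkit >/dev/null 2>&1; then
  # Summary statistics (number of reads, total and average length, ...).
  seqkit stats demo_seqkit.fastq
  # Convert FASTQ to FASTA.
  seqkit fq2fa -o demo_seqkit.fasta demo_seqkit.fastq
  # Keep only reads that are at least 5 bases long.
  seqkit seq -m 5 -o demo_seqkit.min5.fastq demo_seqkit.fastq
else
  echo "seqkit not found; skipping the seqkit examples" >&2
fi
```

Running seqkit with no arguments lists the subcommands available in the installed version.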

Splitting FASTQ data

fastqsplitter

fastqsplitter splits a FASTQ file evenly across a chosen number of output files.

Installation of fastqsplitter from Bioconda into its own environment (mamba is shown here; conda accepts the same arguments):

mamba create -n fastqsplitter -c bioconda fastqsplitter

Usage of fastqsplitter:

fastqsplitter -i input.fastq.gz -o output_1.fastq.gz -o output_2.fastq.gz -o output_3.fastq.gz

In the example above, fastqsplitter divides the input FASTQ evenly across three output files. The number of parts is not given explicitly; it is inferred from the number of -o outputs specified by the user.
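To picture what splitting evenly across the listed outputs means, here is a conceptual sketch using only awk. It is not fastqsplitter's implementation, which may distribute reads differently (for example in larger blocks). It assumes uncompressed, strictly four-line FASTQ records, and the demo file names are hypothetical.

```shell
# Build a small demo FASTQ with 6 four-line records (invented data).
for i in 1 2 3 4 5 6; do
  printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > demo_split.fastq

# Deal records out round-robin to n=3 outputs:
# record k (zero-based) goes to file number (k mod n) + 1.
awk -v n=3 '{
  rec = int((NR - 1) / 4)   # zero-based record index, 4 lines per record
  print > ("demo_part_" (rec % n + 1) ".fastq")
}' demo_split.fastq

# Each part should now hold 2 records, i.e. 8 lines.
wc -l demo_part_1.fastq demo_part_2.fastq demo_part_3.fastq
```

The same line-count check (lines divided by four) is a quick way to confirm that real split outputs hold roughly equal numbers of reads.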

Filtering FASTQ data

The grepq tool enables filtering of FASTQ files via regular expressions.
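The idea of regex-based FASTQ filtering can be sketched with plain awk: keep a whole four-line record only when its sequence line matches a pattern. This illustrates the concept only and does not show grepq's command-line interface. The pattern GGATCC and the demo reads are arbitrary choices for the example, and strictly four-line records are assumed.

```shell
# Three invented reads; r1 and r3 contain GGATCC, r2 does not.
printf '@r1\nACGTGGATCCAA\n+\nIIIIIIIIIIII\n' >  demo_filter.fastq
printf '@r2\nTTTTTTTTTTTT\n+\nIIIIIIIIIIII\n' >> demo_filter.fastq
printf '@r3\nCCGGATCCGGTT\n+\nIIIIIIIIIIII\n' >> demo_filter.fastq

# Buffer each record and print it only if the sequence line matches the regex.
awk -v pat='GGATCC' '
  NR % 4 == 1 { header = $0 }
  NR % 4 == 2 { seq = $0; keep = (seq ~ pat) }
  NR % 4 == 3 { plus = $0 }
  NR % 4 == 0 && keep { print header; print seq; print plus; print $0 }
' demo_filter.fastq > demo_filtered.fastq

# List the headers of the records that were kept.
grep '^@r' demo_filtered.fastq
```

Matching only the sequence line, rather than grepping the whole file, avoids accidental hits in headers or quality strings and keeps each record's four lines together.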