Sequencing data (FASTQ)
These are tools for manipulating FASTA and FASTQ files.
For tools that can programmatically download FASTA and FASTQ files from repositories, see the section on data download.
General-purpose (Swiss Army knife) tools
seqkit has subcommands covering most of the operations you are likely to need on FASTQ or FASTA files, such as format conversion, filtering, sampling, splitting and summary statistics.
It is fast and well documented.
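To make "operations" concrete, the snippet below sketches one of them, FASTQ-to-FASTA conversion, in plain awk. It is a minimal illustration assuming uncompressed, well-formed 4-line records (no wrapped sequence lines); it is not how seqkit implements this, and the file names and reads are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: FASTQ -> FASTA conversion with awk.
# Assumes plain-text, well-formed 4-line FASTQ records.
set -eu

# Hypothetical example input with two reads.
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCA\n+\nIIIII\n' > example.fastq

# Line 1 of each record is the header: swap the leading '@' for '>'.
# Line 2 is the sequence: print it unchanged.
# Lines 3 ('+') and 4 (qualities) are dropped.
awk 'NR % 4 == 1 { print ">" substr($0, 2) }
     NR % 4 == 2 { print }' example.fastq > example.fasta

cat example.fasta
```

With seqkit installed, the same conversion is a single subcommand, `seqkit fq2fa example.fastq -o example.fasta`, which also accepts gzip-compressed input directly.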
Splitting FASTQ data
fastqsplitter
fastqsplitter splits a FASTQ file evenly into a desired number of output files.
Installation of fastqsplitter via conda (the command below uses mamba, a faster drop-in replacement for conda):
mamba create -n fastqsplitter -c conda-forge -c bioconda fastqsplitter
conda activate fastqsplitter
Usage of fastqsplitter:
fastqsplitter -i input.fastq.gz -o output_1.fastq.gz -o output_2.fastq.gz -o output_3.fastq.gz
In the example above, fastqsplitter splits the input FASTQ file evenly into three output files; the number of parts is inferred from how many output files (-o) the user specifies.
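One simple way to split "evenly" is to deal whole 4-line records out round-robin over N files. The awk sketch below illustrates that idea only; it is not fastqsplitter's implementation (a real tool may batch records for speed), and it assumes uncompressed, well-formed records. The variable N and all file names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: split a FASTQ file evenly into N parts by distributing
# whole 4-line records round-robin. Illustration only, not fastqsplitter.
set -eu

N=3  # number of output files (fastqsplitter infers this from its -o outputs)

# Hypothetical input: 7 reads.
: > input.fastq
for i in 1 2 3 4 5 6 7; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i" >> input.fastq
done

# r is the 0-based record index; record r is written to
# output_(r % N + 1).fastq, so the four lines of a record stay together.
awk -v n="$N" '{
  r = int((NR - 1) / 4)
  print > ("output_" (r % n + 1) ".fastq")
}' input.fastq

# Report how many reads landed in each part.
for f in output_*.fastq; do
  echo "$f: $(( $(wc -l < "$f") / 4 )) reads"
done
```

Round-robin keeps the part sizes within one read of each other regardless of input size. For gzip-compressed input, the sketch could read from `gzip -cd input.fastq.gz` instead of the plain file.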
Filtering FASTQ data
The grepq tool enables filtering of FASTQ files via regular expressions.
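The idea behind regex-based FASTQ filtering can be sketched in awk: load one extended regular expression per line from a patterns file, then keep every 4-line record whose sequence line matches at least one pattern. This is an illustration only, assuming uncompressed, well-formed records; it is not grepq's implementation, grepq's pattern syntax may differ from awk's POSIX EREs, and the file names and data are hypothetical. Check grepq's own documentation for its actual command-line interface.

```shell
#!/bin/sh
# Minimal sketch: keep FASTQ records whose sequence matches any regex
# in a patterns file. Illustration of the idea only, not grepq.
set -eu

# Hypothetical patterns file: one extended regular expression per line.
printf 'GAATTC\nAAA[CG]TT\n' > patterns.txt

# Hypothetical reads: r1 and r3 match a pattern, r2 does not.
printf '@r1\nTTGAATTCAA\n+\nIIIIIIIIII\n@r2\nCCCCCCCC\n+\nIIIIIIII\n@r3\nGAAAGTTC\n+\nIIIIIIII\n' > reads.fastq

awk 'NR == FNR { pats[++np] = $0; next }     # first file: load patterns
     {
       rec[FNR % 4] = $0                      # buffer the current record
       if (FNR % 4 == 0) {                    # record complete (4th line)
         for (i = 1; i <= np; i++)
           if (rec[2] ~ pats[i]) {            # test the sequence line
             print rec[1]; print rec[2]; print rec[3]; print rec[0]
             break
           }
       }
     }' patterns.txt reads.fastq > matched.fastq

cat matched.fastq
```

Buffering the whole record matters: matching is done on the sequence line alone, but all four lines of a matching record must be written so that the output is still valid FASTQ.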