Tools for downloading from SRA, ENA, NCBI and Ensembl
fastq-dl
fastq-dl lets you download FASTQ data from the European Nucleotide Archive or the Sequence Read Archive repositories.
It accepts any ENA/SRA accession number for a Study, a Sample, an Experiment, or a Run.
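The four accession levels can usually be told apart by their prefix. The sketch below is a hypothetical helper (`classify_accession` is not part of fastq-dl) that maps an accession to its level and then prints, without running, a fastq-dl command for it. The accessions are illustrative, and the `--accession` and `--provider` flags are assumed from the tool's help output, so check `fastq-dl --help` for your installed version.

```shell
#!/bin/sh
# Hypothetical helper: infer the record level from the accession prefix.
# Prefix conventions: PRJ*/ERP/SRP/DRP = study, SAM*/ERS/SRS/DRS = sample,
# ERX/SRX/DRX = experiment, ERR/SRR/DRR = run.
classify_accession() {
  case "$1" in
    PRJ*|[EDS]RP*) echo study ;;
    SAM*|[EDS]RS*) echo sample ;;
    [EDS]RX*)      echo experiment ;;
    [EDS]RR*)      echo run ;;
    *)             echo unknown ;;
  esac
}

# Dry run: print the command for each (illustrative) accession.
# The flags are an assumption; verify them with `fastq-dl --help`.
for acc in PRJNA248678 SRX477044 SRR1178105; do
  echo "# $acc is a $(classify_accession "$acc") accession"
  echo "fastq-dl --accession $acc --provider ena"
done
```

Printing the commands first makes it easy to review them before piping the output to `sh`.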
ncbi-datasets: gathering data across NCBI databases
ncbi-datasets is a well-documented open-source tool from NCBI (National Center for Biotechnology Information) for gathering various kinds of data, including genes, genomes, and annotations, for a given species. As a side note, it has specific options for retrieving SARS-CoV-2 data.
Installation
It is offered as a conda package and can be installed into a new environment via:
conda create -n ncbi-download -c conda-forge ncbi-datasets-cli
Example usage
Retrieval of human reference genome:
datasets download genome taxon human --reference --filename human-reference.zip
Retrieval of complete genomic assemblies for a specific SARS-CoV-2 lineage, restricted to Homo sapiens as the host:
datasets download virus genome taxon SARS-CoV-2 --complete-only --filename B.1.525.zip --lineage B.1.525 --host "homo sapiens"
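To fetch several lineages, the same command can be generated in a loop. The following is a minimal sketch, assuming the flags shown above. `build_lineage_cmd` is a hypothetical helper, and B.1.1.7 is only an illustrative second lineage. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: build (but do not run) the download command
# for one SARS-CoV-2 lineage, naming the archive after the lineage.
build_lineage_cmd() {
  lineage="$1"
  printf 'datasets download virus genome taxon SARS-CoV-2 --complete-only --lineage %s --host "homo sapiens" --filename %s.zip\n' \
    "$lineage" "$lineage"
}

# Dry run over an illustrative list of lineages.
for lineage in B.1.525 B.1.1.7; do
  build_lineage_cmd "$lineage"
done
```

Once the printed commands look right, the output can be piped to `sh` inside the activated conda environment.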
The ncbi-datasets repository documentation includes a helpful illustration of the overall structure of a data download.