linux tools
Any Linux distribution comes with a set of command-line tools.
In bioinformatics (and data science in general), the tools that process plain text files line by line are especially useful.
We therefore introduce a number of these tools here and link to useful tutorials and documentation for the more complex ones.
In addition, you can usually get detailed documentation for any of these tools directly on the command line.
Simply write man followed by the tool's name to get its full manual, for example man head.
You can scroll up and down and leave the manual with q to drop back to the command line (the controls are the same as in the command-line text file viewer less).
common command-line arguments
Most command-line tools have certain standard command-line arguments that usually come in a short version (one dash - and just one letter) and a long version (two dashes -- and a full word):
- -h/--help: display a help message
- --version: display the version of the tool that you have installed
- --verbose: provide more verbose output, often useful for debugging problems
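As a quick illustration, the long options above can be tried on tools that are already installed. This is only a sketch: it assumes GNU (or compatible) versions of grep and head, and not every tool supports every one of these options.

```shell
# Ask grep which version is installed and keep only the first line.
grep_version=$(grep --version | head -n 1)
echo "$grep_version"

# Show the start of the help message of head.
# Assumption: GNU head; other implementations may not know --help.
head --help 2>&1 | head -n 5
```

If a tool does not recognize one of these options, its manual page is the place to check which options it does support.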
head and tail
head -n 5 myfile.txt will display the first five lines of the plain text file myfile.txt.
tail -n 5 myfile.txt will display the last five lines of the plain text file myfile.txt.
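The two commands can be tried on a small throwaway file. The sketch below creates a hypothetical ten-line myfile.txt in a temporary directory (the file content is made up for illustration) and also shows how head and tail can be combined to print a range of lines.

```shell
# Create a small example file with the numbers 1 to 10, one per line.
tmpdir=$(mktemp -d)
seq 1 10 > "$tmpdir/myfile.txt"

# First three lines of the file.
first_lines=$(head -n 3 "$tmpdir/myfile.txt")
echo "$first_lines"

# Last three lines of the file.
last_lines=$(tail -n 3 "$tmpdir/myfile.txt")
echo "$last_lines"

# Combine both tools to print lines 4 to 6:
# take the first six lines, then the last three of those.
middle_lines=$(head -n 6 "$tmpdir/myfile.txt" | tail -n 3)
echo "$middle_lines"

# Remove the example file again.
rm -rf "$tmpdir"
```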
less: quick look at text files
less is a tool for quickly looking at text files on the command line.
Open a file with less myfile.txt, scroll up and down with the arrow and page up/down keys, search for a word such as thisword by typing /thisword (followed by enter), and quit by hitting q.
grep: line by line plain text file searching
grep allows you to search for strings and regular expressions in text files.
It prints every line that matches the search pattern.
For example, grep "gene" myfile.txt would print every line containing the string gene in myfile.txt.
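A few common variations are sketched below on a made-up four-line file (the feature names such as abc1 and xyz2 are hypothetical). The options -c (count matching lines), -i (ignore case) and -v (invert the match) are standard grep options.

```shell
# Create a small example file; the content is invented for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'chr1 gene abc1' \
  'chr1 exon abc1' \
  'chr2 Gene xyz2' \
  'chr2 gene xyz2' > "$tmpdir/myfile.txt"

# Print every line containing the string "gene" (case-sensitive).
matches=$(grep "gene" "$tmpdir/myfile.txt")
echo "$matches"

# Count matching lines instead of printing them.
count=$(grep -c "gene" "$tmpdir/myfile.txt")
echo "$count"

# Ignore upper/lower case, so "Gene" matches as well.
count_ignore_case=$(grep -i -c "gene" "$tmpdir/myfile.txt")
echo "$count_ignore_case"

# Print only the lines that do NOT contain "gene".
non_matches=$(grep -v "gene" "$tmpdir/myfile.txt")
echo "$non_matches"

# Use a regular expression: lines that start with "chr2".
chr2_lines=$(grep '^chr2' "$tmpdir/myfile.txt")
echo "$chr2_lines"

rm -rf "$tmpdir"
```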
sed: line by line plain text file editing
sed is a great tool for working on plain text files line by line.
For example, you can easily search and replace patterns using regular expressions.
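A minimal sketch of such a search and replace, using an invented two-line file: the s/pattern/replacement/ command substitutes the first match on each line, and -E switches to extended regular expressions so that a captured group can be reused as \1. The output is printed to the terminal; the input file itself is left unchanged.

```shell
# Create a small example file; the content is invented for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' 'chr1 100 200' 'chr2 300 400' > "$tmpdir/myfile.txt"

# Replace the first occurrence of "chr" on every line with "chromosome".
renamed=$(sed 's/chr/chromosome/' "$tmpdir/myfile.txt")
echo "$renamed"

# Use a regular expression with a capture group:
# strip the "chr" prefix at the line start and keep only the number.
stripped=$(sed -E 's/^chr([0-9]+)/\1/' "$tmpdir/myfile.txt")
echo "$stripped"

rm -rf "$tmpdir"
```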
awk: line by line editing of tabular plain text files
awk is a great tool for working on tabular plain text files (for example tab-separated or comma-separated files, .tsv and .csv files, respectively).
It gives you direct access to individual columns (for example $1 for the first column) and offers a lot of functionality for working with them.
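The column accessors can be sketched on a small tab-separated file. The file content and the meaning of its columns (a name, a chromosome, and a length) are assumptions made up for this example. Note that awk splits on any whitespace by default, so -F is used here to set the tab character as the field separator.

```shell
# Create a small tab-separated example file; the content is invented.
tmpdir=$(mktemp -d)
printf 'geneA\tchr1\t150\ngeneB\tchr2\t80\ngeneC\tchr1\t220\n' > "$tmpdir/myfile.tsv"

# Print only the first column of every line.
names=$(awk -F'\t' '{ print $1 }' "$tmpdir/myfile.tsv")
echo "$names"

# Print the first column only for lines whose third column is above 100.
long_names=$(awk -F'\t' '$3 > 100 { print $1 }' "$tmpdir/myfile.tsv")
echo "$long_names"

# Sum up the third column over all lines and print the total at the end.
total=$(awk -F'\t' '{ sum += $3 } END { print sum }' "$tmpdir/myfile.tsv")
echo "$total"

rm -rf "$tmpdir"
```

For comma-separated files, the same commands should work with -F',' as long as no field contains a quoted comma.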
further resources
- useful bioinformatics one-liners: these mostly use only basic Linux tools to achieve common bioinformatics tasks