Package management with micromamba
A data analysis does not solely comprise of your configuration (for e.g. datavzrd) or programming code, but also of all the software packages needed to execute it. Outside of data analysis or programming, people will know the usual ways of installing software via software stores/centers, system wide package managers, or by installing manually downloaded packages. Deploying a software stack for a data analysis this way is not recommended.
It is not reproducible, as the software versions are not fixed.
It is not portable, as the software stack might not be compatible with other systems.
It is, as a manual process, not scalable.
It is not isolated, as the software would be installed system wide, being affected by future installations or updates.
To solve these issues, we use package managers. A package manager is a tool that automates the process of installing, updating, configuring, and removing software packages. In science, one of the most popular package management systems is the Conda ecosystem. There are various frontends for Conda. Here, we use the micromamba package manager.
Step 1: Installing micromamba
To install micromamba, you can use the following command inside your VSCode terminal (on Windows, issue this from a WSL session, cf. The development environment):
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
Afterwards, open a new terminal session.
Step 2: Installing a software stack
The central feature of micromamba is the ability to create isolated software environments. This even means that multiple versions of the same software can be present on a system, in different environments.
We create a new environment that contains Datavzrd, a software for visualizing tables that we need in the next part of the course:
micromamba create -n test -c conda-forge datavzrd
This command creates a new environment named test
and installs the package datavzrd
from the conda-forge
channel.
Step 3: Activating the environment
To activate the environment, use the following command:
micromamba activate test
Now you are inside the environment and can use any software installed within the environment via the terminal, e.g.:
datavzrd --help
Step 4: Deactivating the environment
To deactivate the environment again, use the following command:
micromamba deactivate
Step 5: Reproducible environments
The problem with a manual command like the one above is that it is not reproducible. First, only the terminal history preserves what you did to create the environment. Second, only the environment name is preserved, not the exact software versions. The latter are just the latest available ones at the time of environment creation. While this can be fine in certain cases, it is certainly not when you want to ensure reproducibility and document what you actually did to generate a certain result.
To solve this, we can create a file that contains the exact software versions, called an environment file.
We do that for the example from before.
Create a folder envs
that holds micromamba environment files related to this course, and in there create the file datavzrd.yaml
with the following content:
channels:
- conda-forge
dependencies:
- datavzrd =2.41.0
Explanation
The first section specifies the channel, the second section specifies the software tools or libraries to install including their versions.
Now, you can create the environment from this file:
micromamba env create -f datavzrd.yaml -n datavzrd
Exercise
The environment test
from step 2 is no longer needed.
Find out how using micromamba --help
and remove it.