The development environment

There are many ways to perform data analysis, and a central question is which environment to use. This choice has two parts:

  1. the operating system (OS),

  2. the development environment.

The operating system matters for the reproducibility of results. Proprietary operating systems, such as Microsoft Windows or macOS, carry the risk that a given version eventually stops being supported on newer hardware. One then has to update the OS, which can cause the analysis to break. It is therefore recommended to use an open-source operating system, such as a Linux distribution, which is free to use and distribute. Fortunately, Linux is directly available on Windows via the Windows Subsystem for Linux (WSL).
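To see which operating system an analysis is actually running on, a short Python check can help. The following is a minimal sketch; the function name `describe_platform` is hypothetical, and the WSL detection relies on a common heuristic (the kernel release string usually contains "microsoft") rather than on an official API.

```python
import platform


def describe_platform(system: str, release: str) -> str:
    """Return a short label for the OS, flagging Linux running under WSL."""
    # Heuristic (an assumption, not an official API): WSL kernels usually
    # report a release string that contains "microsoft".
    if system == "Linux" and "microsoft" in release.lower():
        return "Linux (WSL)"
    # Python reports macOS under the name of its kernel, "Darwin".
    if system == "Darwin":
        return "macOS"
    return system or "unknown"


if __name__ == "__main__":
    # Describe the machine that is running this script.
    print(describe_platform(platform.system(), platform.release()))
```

Recording this label alongside the results of an analysis makes it easier to trace later on which system they were produced.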

Choosing a suitable development environment can dramatically increase productivity during data analysis. There are numerous options available. In this course, we will use Visual Studio Code (VSCode), which is one of the most popular options and offers many useful extensions for our tasks.

  • If you are on Windows, follow these instructions to install VSCode in combination with WSL.

  • Otherwise, just install VSCode.

  • Finally, you can also run this course in the browser using Gitpod. To do so, first create an account on GitHub (a code hosting platform). Then open the Gitpod workspace for this course.

The VSCode window contains three main areas:

[Figure: VSCode window]
  1. The sidebar on the left contains the file explorer, which allows you to navigate through your files.

  2. The main area, to the right of the sidebar, is where you edit files or notebooks.

  3. The area at the bottom right shows a so-called terminal, which allows you to run commands directly from within VSCode. It might be hidden initially, but can be opened by clicking on the “Terminal” menu and selecting “New Terminal”.
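A quick way to try out the terminal is to run a small script from it. The sketch below assumes Python is already installed and that the script is saved under the hypothetical name `check_python.py`; it could then be started with `python check_python.py` (on some systems the command is `python3`).

```python
import sys


def python_summary() -> str:
    """Describe the interpreter that is running this script."""
    # sys.version_info starts with (major, minor, micro).
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"Python {version} at {sys.executable}"


if __name__ == "__main__":
    print(python_summary())
```

If the terminal prints a version and a path, both the terminal and the Python installation are working.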

Install the following extensions in VSCode:

  • Python

  • Jupyter

  • Rainbow CSV

  • indent-rainbow

  • Black Formatter
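Besides the Extensions view, extensions can also be installed from the terminal with the `code` command-line tool and its `--install-extension` option. The sketch below only prints the commands instead of running them. The Marketplace identifiers are assumptions that should be checked in the Extensions view, and the names `EXTENSION_IDS` and `install_commands` are hypothetical.

```python
# Marketplace identifiers for the extensions listed above (assumed; verify
# each one in the Extensions view before relying on it).
EXTENSION_IDS = [
    "ms-python.python",           # Python
    "ms-toolsai.jupyter",         # Jupyter
    "mechatroner.rainbow-csv",    # Rainbow CSV
    "oderwat.indent-rainbow",     # indent-rainbow
    "ms-python.black-formatter",  # Black Formatter
]


def install_commands(extension_ids, cli="code"):
    """Build one terminal command per extension, without running anything."""
    return [" ".join([cli, "--install-extension", ext]) for ext in extension_ids]


if __name__ == "__main__":
    # Print the commands so they can be reviewed and pasted into the terminal.
    for command in install_commands(EXTENSION_IDS):
        print(command)
```

Printing the commands keeps the script free of side effects; each line can be copied into the VSCode terminal, provided the `code` command is available there.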