10 Python Environments: conda
conda is an open source package
and environment
management system for any programming language; though it is most popular in the Python community. conda
was originally developed by Anaconda Inc. and bundled with their Anaconda distribution of Python. However, conda
’s widespread popularity and utility led to its decoupling into its own package.
It is now available for installation via: - Anaconda Python distribution - Miniconda Python distribution (this is what we recommended most of you install for this course) - Miniforge (this is what we recommended folks with Mac ARM machines install for this course)
Conda builds of R and Python packages, are in fact R and Python packages and built from R and Python source code, but they are packaged up and built differently, and with a different tool chain. How to create conda
packages from R and Python source code is beyond the scope of this course. However, we direct keen learners of this topic to the documentation on how to do this: - Conda-build documentation
What we will focus on learning is how to use conda
to create virtual environments, record the components of the virtual environment and share the virtual environment with collaborators in a way that they can recreate it on their computer.
10.0.1 Managing Conda
Let’s first start by checking if conda is installed (it should be if we followed the recommended course computer setup instructions) by running:
conda --version
To see which conda commands are available, type conda --help
. To see the full documentation for any command of these commands, type the command followed by --help
. For example, to learn about the conda update command:
conda update --help
Let’s update our conda to the latest version. Note that you might already have the latest version since we downloaded it recently.
conda update conda
You will see some information about what there is to update and be asked if you want to confirm. The default choice is indicated with []
, and you can press Enter to accept it. It would look similar to this:
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ....
.Solving package specifications: .........
Package plan for installation in environment //anaconda:
The following packages will be downloaded:
package | build
---------------------------|-----------------
conda-env-2.6.0 | 0 601 B
ruamel_yaml-0.11.14 | py27_0 184 KB
conda-4.2.12 | py27_0 376 KB
------------------------------------------------------------
Total: 560 KB
The following NEW packages will be INSTALLED:
ruamel_yaml: 0.11.14-py27_0
The following packages will be UPDATED:
conda: 4.0.7-py27_0 --> 4.2.12-py27_0
conda-env: 2.4.5-py27_0 --> 2.6.0-0
python: 2.7.11-0 --> 2.7.12-1
sqlite: 3.9.2-0 --> 3.13.0-0
Proceed ([y]/n)? y
Fetching packages ...
conda-env-2.6. 100% |################################| Time: 0:00:00 360.78 kB/s
ruamel_yaml-0. 100% |################################| Time: 0:00:00 5.53 MB/s
conda-4.2.12-p 100% |################################| Time: 0:00:00 5.84 MB/s
Extracting packages ...
[ COMPLETE ]|###################################################| 100%
Unlinking packages ...
[ COMPLETE ]|###################################################| 100%
Linking packages ...
[ COMPLETE ]|###################################################| 100%
In this case, conda itself needed to be updated, and along with this update some dependencies also needed to be updated. There is also a NEW package that was INSTALLED in order to update conda. You don’t need to worry about remembering to update conda, it will let you know if it is out of date when you are installing new packages.
10.0.2 Managing conda
environments
What is a conda environment and why is it so useful?
Using conda
, you can create an isolated R or Python virtual environment for your project. The default environment is the base
environment, which contains only the essential packages from Miniconda (and anything else you have installed in it since installing Miniconda). You can see that your shell’s prompt string is prefaced with (base)
when you are inside this environment:
(base) Helps-MacBook-Pro:~ tiffany$
In the computer setup guide, we asked you to follow instructions so that this environment will be activated by default every time you open your terminal.
To create another environment on your computer, that is isolated from the (base)
environment you can either do this through:
- Manual specifications of packages.
- An environment file in YAML format (
environment.yml
).
We will now discuss both, as they are both relevant workflows for data science. When do you use one versus the other? I typically use the manual specifications of packages when I am creating a new data science project. From that I generate an environment file in YAML format that I can share with collaborators (or anyone else who wants to reproduce my work). Thus, I use an environment file in YAML format when I join a project as a collaborator and I need to use the same environment that has been previously used for that project, or when I want to reproduce someone else’s work.
10.0.3 Creating environment by manually specifying packages
We can create test_env
conda environment by typing conda -n <name-of-env>
. However, it is often useful to specify more than just the name of the environment, e.g. the channel from which to install packages, the Python version, and a list of packages to install into the new env. In the example below, I am creating the test_env
environment that uses python 3.12 and a list of libraries: jupyterlab
and pandas
.
conda create -n test_env -c conda-forge python=3.12 jupyterlab pandas=2.2.3
conda will solve any dependencies between the packages like before and create a new environment with those packages. Usually, we don’t need to specify the channel, but in this case I want to get the very latest version of these packages, and they are made available in conda-forge
before they reach the default conda channel.
To activate this new environment, you can type conda activate test_env
(and conda deactivate
for deactivating). Since you will do this often, we created an alias shortcut ca
that you can use to activate environments. To know the current environment that you’re in you can look at the prefix of the prompt string in your shell which now changed to (test_env
). And to see all your environments, you can type conda env list
.
10.0.4 Seeing what packages are available in an environment
We will now check packages that are available to us. The command below will list all the packages in an environment, in this case test_env
. The list will include versions of each package, the specific build, and the channel that the package was downloaded from. conda list
is also useful to ensure that you have installed the packages that you desire.
conda list
# packages in environment at //miniconda/envs/test_env:
#
Using Anaconda Cloud api site https://api.anaconda.org
blas 1.1 openblas conda-forge
ca-certificates 2016.9.26 0 conda-forge
certifi 2016.9.26 py27_0 conda-forge
cycler 0.10.0 py27_0 conda-forge
freetype 2.6.3 1 conda-forge
functools32 3.2.3.2 py27_1 conda-forge
libgfortran 3.0.0 0 conda-forge
10.0.5 Installing conda package
Under the name column of the result in the terminal or the package column in the Anaconda Cloud listing, shows the necessary information to install the package. e.g. conda-forge/rasterio. The first word list the channel that this package is from and the second part shows the name of the package.
To install the latest version available within the channel, do not specify in the install command. We will install version 0.35 of rasterio
from conda-forge into test_env
in this example. Conda will also automatically install the dependencies for this package.
conda install -c conda-forge rasterio=0.35
If you have a few trusted channels that you prefer to use, you can pre-configure these so that everytime you are creating an environment, you won’t need to explicitly declare the channel.
conda config --add channels conda-forge
10.0.6 Removing a conda package
We decided that rasterio is not needed in this tutorial, so we will remove it from test_env
. Note that this will remove the main package rasterio and its dependencies (unless a dependency was installed explicitly at an earlier point in time or is required be another package).
conda remove -n test_env rasterio
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata .........
Solving package specifications: ..........
Package plan for package removal in environment //anaconda/envs/test_env:
The following packages will be REMOVED:
rasterio: 0.35.1-np111py27_1 conda-forge
Proceed ([y]/n)? y
Unlinking packages ...
[ COMPLETE ]|#######################################################################################################| 100%
10.0.8 Creating environment from an environment file
Now, let’s install environment.yml
environment file above so that we can create a conda environment called test_env
.
$ conda env create --file environment.yml
Create an environment on your laptop with an older version of Python!
Clone this GitHub repository.
Try to run some antiquated (Python 3.0.0 and higher compatible) Python code, such as
python -c "print 'Back from the Future'"
. This should fail.In the terminal, navigate to the root of the repository and run:
conda env create --file environment.yml
(note: If you have a M1/M2 Mac, please run this command instead: CONDA_SUBDIR=osx-64 conda env create –file environment.yml).Activate the environment by typing
conda activate oldie_but_a_goodie
Try again to run some antiquated (Python 3.0.0 and higher compatible) Python code, such as
python -c "print 'Back from the Future'"
. If you activated your environment correctly, this should succeed!
10.0.9 Copying an environment
We can make an exact copy of an environment to an environment with a different name. This maybe useful for any testing versus live environments or different Python 3.7 versions for the same packages. In this example, test_env
is cloned to create live_env
.
conda create --name live_env --clone test_env
10.0.10 Listing all environments on your laptop
You may have created an environment and forgotten what you named it, or you want to do some cleanup and delete old environments (next topic), and so you want to see which exist on your computer and remove the ones you are no longer using. To do this we will use the info
command along with the --envs
flag/option.
conda info --envs
Listing all the conda
environments on your laptop with this command also shows you where conda
stores these environments. Typically conda
environments are stored in /Users/<USERNAME>/opt/miniconda3/envs
. This means that conda
environments are typically in your terminal’s path, resulting in the environments being accessible from any directory on your computer, regardless of where they were created. However, despite this flexibility, commonly one environment is created per project, and the environment.yml
file that is used for sharing the conda
environment is stored in the project root.
10.0.11 Deleting an environment
Since we are only testing out our environment, we will delete live_env
to remove some clutter. Make sure that you are not currently using live_env
.
conda env remove -n live_env
10.0.12 Making environments work well with JupyterLab
In brief, you need to install a kernel in the new conda environment in any new environment your create (ipykernel
for Python and the r-irkernel
package for R), and the nb_conda_kernels
package needs to be installed in the environment where JupyterLab is installed.
By default, JupyterLab only sees the conda environment where it is installed. Since it is quite annoying to install JupyterLab and its extensions separately in each environment, there is a package called nb_conda_kernels
that makes it possible to have a single installation of JupyterLab access kernels in other conda environments. This package needs to be installed in the conda environment where JupyterLab is installed. For the computer setup for this course, we did that in the base
environment, so that is where you would need to install nb_conda_kernels
to make this work.
More details are available in the nb_conda_kernels README).
Remember that when you forget a specific command you can type in the help command we have created mds-help
in you terminal to see a list of all commands we use in MDS.