- Use a virtual environment to manage dependencies (libraries).
- Use requirements.txt to store the list of dependencies.
- Run "pip freeze > requirements.txt" to create a requirements.txt file listing the libraries installed in the environment.
- Run "pip install -r requirements.txt" to install those libraries in your project environment.
- Always use version control to manage source code, including trained models, scripts, training data and project configurations. Aalto uses the Git service at version.aalto.fi.
- Jupyter Lab / Notebook is used for model development during the experimentation phase. Final deployable model files should be converted to Python files (.py).
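The requirements.txt workflow from the list above can be sketched as follows. This is a minimal sketch, assuming a Unix-like shell (for example macOS or Linux) where Python is available as python3; on Windows the py launcher would take its place. Calling pip as "python3 -m pip" is a choice made here so that pip belongs to the same interpreter, not something this page prescribes.

```shell
# Write the versions of all packages installed in the current environment
# to requirements.txt (typically one "name==version" line per package).
python3 -m pip freeze > requirements.txt

# Show what was recorded.
cat requirements.txt

# In a fresh environment (for example on another machine), the same set of
# libraries would then be installed with:
#   python3 -m pip install -r requirements.txt
```

Run the freeze command inside the project's activated virtual environment; otherwise the file lists every package installed for the system-wide interpreter.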
Install Python environment
Install the Anaconda distribution, which includes most of the libraries needed for data science work. Anaconda describes it as follows: "The open-source Anaconda Distribution is the easiest way to perform Python/R data science and machine learning on Linux, Windows, and Mac OS X. With over 15 million users worldwide, it is the industry standard for developing, testing, and training on a single machine."
Use pip to install additional libraries: "pip install package-name"
Installing Git version control
Aalto ITS - Machine Learning projects are stored on version.aalto.fi and can be accessed once the project owner has granted the required permission. A Git desktop client can be used to clone the remote repositories to the local machine and then push and pull changes. Follow the instructions below to get started with Git and connect to version.aalto.fi.
- Log in to version.aalto.fi with your Aalto email ID and password.
- Install Git on your local machine using the following links.
For Windows - https://gitforwindows.org/
For Mac - https://git-scm.com/download/mac
- Use the following instructions to generate a new SSH key pair and then link it to your GitLab (version.aalto.fi) account.
- Generate a personal access token.
- Clone the remote repository after completing the steps above. The following tutorial uses GitHub Desktop, but the steps should work for any Git desktop client.
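For those who prefer the command line to a desktop client, the clone, push and pull cycle can be sketched as below. This is only an illustration under stated assumptions: a local bare repository stands in for a project on version.aalto.fi so that the sketch runs without network access or credentials, and the directory names, user name and email address are hypothetical. With a real project, the clone source would be the SSH or HTTPS URL of the repository instead of the local path.

```shell
set -e

# Scratch directory for the demonstration.
WORK_DIR="$(mktemp -d)"

# Stand-in for the remote project (hypothetical; normally hosted on version.aalto.fi).
git init --bare --quiet "$WORK_DIR/remote.git"

# Clone the remote repository to the local machine.
git clone --quiet "$WORK_DIR/remote.git" "$WORK_DIR/my-project"
cd "$WORK_DIR/my-project"

# Make a change and commit it locally.
echo "# my-project" > README.md
git add README.md
git -c user.name="Example User" -c user.email="user@example.com" \
    commit --quiet -m "Add README"

# Push the current branch to the remote, then pull to confirm both are in sync.
git push --quiet origin HEAD
git pull --quiet origin "$(git rev-parse --abbrev-ref HEAD)"

# Show the resulting history.
git log --oneline
```

Passing the identity with -c keeps the sketch from changing your Git configuration; in day-to-day use you would normally set user.name and user.email once with git config.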
Setting up virtual environment
Virtual environments keep your system clean, because libraries that are needed only for a small project are not installed system-wide. They also let you use one version of a library in one project and a different version in another. Without a virtual environment, a library installed system-wide is available in only one version.
To get started with virtual environments:
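Install virtualenv with pip (only needed if you use the virtualenv tool; the venv module used in the next step ships with Python 3)
py -m pip install --user virtualenv
Execute "python -m venv myvirtualdirectory" in your project directory to create a virtual environment in the myvirtualdirectory folder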
Activate virtual environment
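Go to your virtual environment directory: cd myvirtualdirectory
(Optional, Windows PowerShell only) If PowerShell refuses to run the activation script, execute "Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass"
Execute "Scripts\activate" to activate the virtual environment on Windows (on macOS or Linux, use "source bin/activate")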
Install packages within your virtual environment
pip install -r requirements.txt (installs the libraries defined in your project's requirements.txt file)
Work and develop code in your virtual environment
If you want to switch projects or otherwise leave your virtual environment, simply run: deactivate
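The full cycle described in this section (create, activate, install, leave) can be sketched in one pass. This is a minimal sketch, assuming macOS or Linux, where the activation script lives under bin/ rather than Scripts\ and Python is available as python3. PROJECT_DIR is a hypothetical variable introduced for the illustration; when it is not set, a scratch directory is used so the sketch does not touch a real project.

```shell
# Hypothetical project location; defaults to a scratch directory.
PROJECT_DIR="${PROJECT_DIR:-$(mktemp -d)}"
cd "$PROJECT_DIR"

# Create the environment in the myvirtualdirectory folder.
python3 -m venv myvirtualdirectory

# Activate it (on Windows: myvirtualdirectory\Scripts\activate).
. myvirtualdirectory/bin/activate

# While active, python resolves to the interpreter inside the environment.
VENV_PREFIX="$(python -c 'import sys; print(sys.prefix)')"
echo "Active environment: $VENV_PREFIX"

# Install the project's dependencies if a requirements.txt is present.
if [ -f requirements.txt ]; then
    python -m pip install -r requirements.txt
fi

# Leave the environment again.
deactivate
```

Because activation only changes the current shell session, each new terminal needs the activation step repeated before you work on the project.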
For more information see links below:
Using virtual environment with Python notebook
To use Jupyter Notebook/Lab with a virtual environment (and the libraries / dependencies installed in it), follow these steps:
Go to your virtual environment directory and activate the environment
Run "jupyter lab" in your virtual environment to start Jupyter Lab
See https://anbasile.github.io/programming/2017/06/25/jupyter-venv/ for more information
- Check jupyter version "jupyter --version"
- Check python version "py --version"
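One common way to make sure notebooks actually use the virtual environment's libraries is to register the environment as a named Jupyter kernel through the ipykernel package. This goes beyond the steps on this page, which only start Jupyter Lab from the activated environment, so treat it as an optional sketch. It assumes macOS or Linux with the environment already activated, and the kernel name my-project is hypothetical.

```shell
# Hypothetical kernel name; pick something that identifies the project.
KERNEL_NAME="my-project"

# Run this with the virtual environment activated, so that python3 is the
# environment's interpreter.
if python3 -c "import ipykernel" 2>/dev/null; then
    # Register the environment as a Jupyter kernel for the current user.
    python3 -m ipykernel install --user --name "$KERNEL_NAME" \
        --display-name "Python ($KERNEL_NAME)"
    KERNEL_STATUS="registered"
else
    echo "ipykernel is not installed; run 'python3 -m pip install ipykernel' first"
    KERNEL_STATUS="skipped"
fi

# Report the Python version, and the Jupyter versions if Jupyter is on the PATH.
python3 --version
if command -v jupyter >/dev/null 2>&1; then
    jupyter --version
fi
```

After registration, the kernel can be selected by its display name when creating or opening a notebook in Jupyter Lab.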