Development Environment Set-up

Install poetry

First, download poetry using the official Poetry installer:

$ curl -sSL https://install.python-poetry.org | python -
$ poetry --version

Note

If poetry --version fails, check your PATH and make sure poetry’s install location is on it. If your default python version is not 3.8, then follow these instructions.
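
The PATH check mentioned in the note can be sketched as follows. This is a minimal sketch, assuming the installer used its default Linux/macOS location (~/.local/bin); path_contains and poetry_bin are hypothetical names used only for illustration.

```shell
# Return 0 if directory $1 appears in the colon-separated list $2 (default: $PATH).
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: poetry was installed to the installer's default location.
poetry_bin="$HOME/.local/bin"

if path_contains "$poetry_bin"; then
    echo "$poetry_bin is already on PATH"
else
    echo "$poetry_bin is missing from PATH; consider adding this line to ~/.bashrc:"
    echo "export PATH=\"$poetry_bin:\$PATH\""
fi
```

If poetry was installed somewhere else, set poetry_bin to that directory before running the check.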

Create a virtual environment (conda)

We are going to create a conda virtual environment to hold the development environment. If you need to install conda, see the Miniconda website. We recommend installing conda locally:

$ wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh
$ bash Miniconda3-py38_4.12.0-Linux-x86_64.sh -p miniconda3 -b
$ echo "export PATH=$(pwd)/miniconda3/bin:\$PATH" >> ~/.bashrc
$ source ~/.bashrc
$ conda init

Restart your terminal.

Next, set up three channels in your conda config by running the following:

$ export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
$ conda update -n base -c defaults conda
$ conda install -n base -c conda-forge mamba
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
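
The order of the three --add commands matters, because conda config --add places each new channel at the top of the list. The sketch below is a pure-shell model of that behaviour, not a call into conda; add_channel and channels are hypothetical names used only for illustration.

```shell
# Pure-shell model of `conda config --add channels NAME`: each call prepends NAME.
channels=""
add_channel() {
    channels="$1${channels:+ $channels}"
}

# Same order as the commands above.
add_channel defaults
add_channel bioconda
add_channel conda-forge

echo "priority (highest first): $channels"
# prints: priority (highest first): conda-forge bioconda defaults
```

To inspect the real configuration, conda config --show channels lists the channels in priority order.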

Next, to create our conda environment, run:

$ conda create -n cgr-dev python=3.8 -y
$ conda activate cgr-dev  # This activates the virtual environment

Note

psutil and pysam are also runtime requirements. If you have gcc installed, they should be built automatically during the installation step below. However, if you have problems, you may want to install them with conda activate cgr-dev && conda install psutil pysam.

Download GwasQcPipeline

Download the development version of GwasQcPipeline:

$ git clone --recursive https://github.com/NCI-CGR/GwasQcPipeline.git
$ cd GwasQcPipeline

Note

Our test data (tests/data) is stored in a separate git repository, which is embedded as a git submodule. The --recursive flag tells git to download tests/data as part of the clone.

Install dependencies and GwasQcPipeline (poetry)

This project uses poetry as a package manager and build tool. Poetry is a modern python build tool that uses the pyproject.toml format to track dependencies and build settings.

To install all runtime/development dependencies and GwasQcPipeline itself, run:

$ conda activate cgr-dev      # Make sure we are in our conda environment
$ poetry env use /PATH/TO/miniconda3/envs/cgr-dev/bin/python # Enable poetry to manage your conda environment
$ poetry config virtualenvs.path /PATH/TO/miniconda3/envs/cgr-dev # This needs to be the full path
$ poetry env info # This should show that both system and virtual env python is 3.8 and that the venv is conda
$ poetry install              # Install development and runtime dependencies
$ cgr version
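
Since the steps above expect python 3.8 inside the environment, a quick check of the active interpreter can catch a mismatch early. This is a minimal sketch; is_python38 and found are hypothetical names, and the check only looks at whichever python is first on PATH.

```shell
# Succeed only if version string $1 (for example "3.8.13") is in the 3.8 series.
is_python38() {
    case "$1" in
        3.8 | 3.8.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask whichever python is first on PATH; report "unknown" if none is found.
found=$(python -c 'import platform; print(platform.python_version())' 2>/dev/null || echo unknown)

if is_python38 "$found"; then
    echo "python $found matches the 3.8 requirement"
else
    echo "python is $found, expected 3.8.x; is cgr-dev activated?"
fi
```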

Now let's make sure everything is working:

$ cgr --help                  # This is our main entry point to running the workflow
$ make -C docs html           # This will build documentation into docs/_build/html
$ poetry run pytest -vvv      # This will run the test suite

The main reason we use poetry is that it makes building python packages easy. So that the config.yaml version line matches the new GwasQcPipeline version, update the version = "1.5.1" line in the pyproject.toml file, then build the package:

$ poetry build                # Build artifacts are in ./dist

Once the changes are pushed to GitHub, tag the new version for release. While GitHub is building the new release, a drag-and-drop box will appear for additional assets. Add the new cgr_gwas_qc-X.X.X.tar.gz and cgr_gwas_qc-X.X.X-py3-none-any.whl files to the box.
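
Before uploading, it can help to confirm that the version in pyproject.toml and the files in ./dist agree. The following is a minimal sketch, assuming it is run from the repository root after poetry build; project_version and expected_artifacts are hypothetical helpers, and the artifact names follow the pattern given above.

```shell
# Print the value of the first `version = "..."` line in the file given as $1.
project_version() {
    sed -n 's/^version *= *"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# List the two artifact paths expected in ./dist for version $1.
expected_artifacts() {
    echo "dist/cgr_gwas_qc-$1.tar.gz"
    echo "dist/cgr_gwas_qc-$1-py3-none-any.whl"
}

# Only check when a pyproject.toml is present in the current directory.
if [ -f pyproject.toml ]; then
    version=$(project_version pyproject.toml)
    for artifact in $(expected_artifacts "$version"); do
        if [ -f "$artifact" ]; then
            echo "found   $artifact"
        else
            echo "missing $artifact"
        fi
    done
fi
```

A "missing" line usually means the version was changed after the last build, so rerunning poetry build should regenerate the files.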

Install pre-commit hooks for consistent development

There are a number of tools out there to make coding cleaner and more consistent. For example, there are code formatters (e.g., black, isort, snakefmt), code linters (e.g., flake8, rstcheck), and type checkers (e.g., mypy). These tools also help catch small mistakes. This repository has a set of git pre-commit hooks (.pre-commit-config.yaml) that will run a suite of tools at each commit. This helps keep issues from making it into the code base.

There is a one-time install that you need to set up in your local copy of GwasQcPipeline:

$ pre-commit install             # Installs the hooks
$ pre-commit run                 # Make sure everything is working

Note

The first time you run pre-commit it needs to download and set up virtual environments for each tool. This may take a few minutes.

Note

Tools are only run on files with changes. If this is a fresh clone of the repository, all tools will be skipped.
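
To exercise the hooks on a fresh clone anyway, pre-commit can be asked to check every file instead of only changed ones. This is a minimal sketch, assuming it is run from the repository root; run_all_hooks is a hypothetical wrapper name.

```shell
# Run every configured hook against all files, not just those with changes.
run_all_hooks() {
    if command -v pre-commit >/dev/null 2>&1; then
        pre-commit run --all-files
    else
        echo "pre-commit is not installed in this environment" >&2
        return 127
    fi
}

run_all_hooks || echo "hooks did not all pass (or pre-commit is unavailable)"
```

Expect this to take longer than a normal commit, since every tool runs over the whole repository.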

Note

Now, every time you commit files, it will run the required set of tools for the staged files. If an auto formatter detects a problem, it will make the changes, but you will have to re-stage that file. This will slow down making commits, but I find the benefits outweigh the inconvenience.

Warning

Sometimes pre-commit will keep flagging something that you want to ignore. For example, codespell tends to interpret this "\nNumber " as a spelling error even though it is really a formatting thing. You can skip running all pre-commit hooks using git commit --no-verify. However, make sure it is absolutely necessary!
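
A narrower alternative is pre-commit's SKIP environment variable, which disables only the named hooks while the rest still run. The sketch below assumes the hook id in .pre-commit-config.yaml is codespell, which should be checked against the file; commit_skipping is a hypothetical helper.

```shell
# Commit while skipping only the hook ids listed in $1 (comma-separated).
# Remaining arguments are passed through to `git commit` unchanged.
commit_skipping() {
    hooks=$1
    shift
    SKIP="$hooks" git commit "$@"
}

# Example (assumed hook id "codespell"):
#   commit_skipping codespell -m "Update report template"
```

Because every other hook still runs, this keeps most of the safety net that --no-verify removes.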