Installation • PCGR

Quick Installation

If you know what conda is, you only need to run the following commands in order to install the PCGR software requirements:

PCGR_VERSION="1.0.1"
PCGR_REPO="https://raw.githubusercontent.com/sigven/pcgr/v${PCGR_VERSION}/conda/env/lock/"
PLATFORM="linux" # or "osx"

# mamba is a much faster alternative to conda
conda install mamba -c conda-forge

mamba create --file ${PCGR_REPO}/pcgr-${PLATFORM}-64.lock --prefix ./pcgr
mamba create --file ${PCGR_REPO}/pcgrr-${PLATFORM}-64.lock --prefix ./pcgrr

# you need to specify the directory of the conda env when using --prefix
conda activate ./pcgr

# test that it works
pcgr --version

For downloading the data bundle, see STEP 1 further below.

Detailed Installation

PCGR requires a data bundle that contains the reference data, sample inputs (e.g. somatic variants in a VCF), and an output directory to output the results to.

Here’s an example scenario that will be used in the following sections:

data bundle downloaded in /Users/you/dir1/data;
sample inputs at /Users/you/dir2/pcgr_inputs;
output goes to /Users/you/dir3/pcgr_outputs (make sure this directory exists);
your PCGR codebase is installed in /Users/you/dir4/PCGR;

STEP 1: Download data bundle

Download and unpack the human assembly-specific data bundle:

grch37 data bundle - 20220203 (approx 20Gb)
grch38 data bundle - 20220203 (approx 21Gb)
Example:

GENOME="grch38" # or "grch37"
BUNDLE_VERSION="20220203"
BUNDLE="pcgr.databundle.${GENOME}.${BUNDLE_VERSION}.tgz"

wget http://insilico.hpc.uio.no/pcgr/${BUNDLE}
gzip -dc ${BUNDLE} | tar xvf -

STEP 2: Download PCGR GitHub repository

Download and unpack the latest software release from https://github.com/sigven/pcgr/releases.

Alternatively if you have git installed, you can do:

PCGR_VERSION="1.0.1"
OUTPUT_DIRECTORY="PCGR"

git clone \
    -b "v${PCGR_VERSION}" \
    --depth 1 \
    https://github.com/sigven/pcgr.git \
    "${OUTPUT_DIRECTORY}"

STEP 3: Set up Conda or Docker

Step 3 depends on if you want to use Conda or Docker:

For Conda, continue reading the PCGR Conda setup.
For Docker, skip to the PCGR Docker setup.

Option 1: Conda

a) Miniconda and Mamba

Download and install the Miniconda installer from https://docs.conda.io/en/latest/miniconda.html:

Make sure to download the Linux or MacOSX script according to which platform you’re currently on.
Run bash miniconda.sh and follow the prompts (it should be okay to accept the defaults, unless you want to choose a different installation location than the default ~/miniconda3).
Exit your current terminal session and open a new one. You should now notice something like a (base) string as a prefix in your terminal prompt. This means that you’re in the base conda environment, and you’re ready to start installing the conda environments for PCGR.

Install Mamba in this base environment, which is a very fast conda package installer.

PLATFORM="MacOSX" # or "Linux"
MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-${PLATFORM}-x86_64.sh"
wget ${MINICONDA_URL} -O miniconda.sh && chmod +x miniconda.sh
bash miniconda.sh

# exit terminal and open new one - you should now see:
(base) $
(base) $ conda install -c conda-forge mamba

(base) $ mamba --version
mamba 0.19.1
conda 4.11.0

b) Create PCGR conda environments

The conda/env/lock directory in the PCGR codebase contains two .lock files which can be used to create the required conda environments for the Python component (pcgr) and the R components (pcgrr (and cpsr)). We install the conda dependencies for these two environments in the local conda/env directory in the following example:

cd /Users/you/dir4/PCGR
PLATFORM="osx-64" # or "linux-64"
PCGR_CONDA_ENV_DIR="./conda/env"

mamba create --prefix ${PCGR_CONDA_ENV_DIR}/pcgr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgr-${PLATFORM}.lock
mamba create --prefix ${PCGR_CONDA_ENV_DIR}/pcgrr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgrr-${PLATFORM}.lock

## Alternatively, for installing in your central conda directory, use the following:
# mamba create --name pcgr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgr-${PLATFORM}.lock
# mamba create --name pcgrr --file ${PCGR_CONDA_ENV_DIR}/lock/pcgrr-${PLATFORM}.lock

The above process takes 10-15min when installing from scratch. In the end, you can confirm your conda environments have been installed correctly (notice how the paths are different to the base env installation after using the --prefix option above):

$ (base) conda env list
# conda environments:
#
base    *  /Users/you/miniconda3
pcgr       /Users/you/dir4/PCGR/conda/env/pcgr
pcgrr      /Users/you/dir4/PCGR/conda/env/pcgrr

c) Activate pcgr conda environment

You need to activate the PCGR/conda/env/pcgr conda environment, and test that it works correctly with e.g. pcgr --version:

$ cd /Users/you/dir4/PCGR
(base) $ conda activate ./conda/env/pcgr
# note how the full path to the locally installed conda environment is now displayed

(/Users/you/dir4/PCGR) $ which pcgr
/Users/you/dir4/PCGR/conda/env/pcgr/bin/pcgr

(/Users/you/dir4/PCGR) $ pcgr --version
pcgr X.X.X

(/Users/you/dir4/PCGR) $ which pcgrr.R
/Users/you/dir4/PCGR/conda/env/pcgr/bin/pcgrr.R

You should now be all set up to run PCGR! Continue on to an example run.

Option 2: Docker

a) Install Docker

For installing Docker, follow the instructions at https://docs.docker.com/engine/install/ for your Linux or MacOSX machine. NOTE: We have not been able to perform enough testing on the Windows platform, and we have received feedback that particular versions of Docker/Windows do not work with PCGR (an example being mounting of data volumes).

Test that Docker is running, e.g. by typing docker ps or docker images in the terminal window.
Adjust the computing resources dedicated to the Docker, i.e.: Memory of minimum 5GB, CPUs minimum 4 (see e.g. how to do that on MacOSX).

b) Download PCGR Docker Image

Pull the PCGR Docker image from DockerHub (approx 5.7Gb) with: docker pull sigven/pcgr:vX.X.X

c) Run PCGR Docker Container directly (recommended) or indirectly

This next step depends on how familiar you are with working with Docker volumes (https://docs.docker.com/storage/volumes/).

If you know how to use the -v <host>:<container> Docker option, you can use the PCGR Docker image directly, which would not involve having to set up a Python environment. Jump to the PCGR Docker direct setup for more details.
Alternatively, you can allow PCGR itself to handle the Docker volume setup intricacies, but this requires a Python environment setup (which can be a bit cumbersome if you’re not too familiar with conda or virtualenv). Jump to the PCGR Docker indirect setup for more details.

Directly

CLICK ME!

You’ll need to map your PCGR inputs to Docker container paths. For example, say you have the input VCF sampleX.vcf.gz stored in the directory /Users/you/project1. You would need to supply Docker with a --volume (or -v) option mapping the directory of that VCF with a directory inside the Docker container, e.g. /home/input_vcf_dir. That would become: -v /Users/you/project1:/home/input_vcf_dir (note the : separating your directory from the container’s directory).

Then your command would look something like this:

docker container run -it --rm \
    -v /Users/you/dir1/data:/root/pcgr_data \
    -v /Users/you/dir2/pcgr_inputs:/root/pcgr_inputs \
    -v /Users/you/dir3/pcgr_outputs:/root/pcgr_outputs \
    sigven/pcgr:1.0.1 \
    pcgr \
      --input_vcf "/root/pcgr_inputs/tumor_sample.BRCA.vcf.gz" \
      --pcgr_dir "/root/pcgr_data" \
      --output_dir "/root/pcgr_outputs" \
      --genome_assembly "grch38" \
      --sample_id "SampleB" \
      --assay "WGS" \
      --vcf2maf \
      --no_docker

Note the --no_docker option in the above command. Since you’re running that command directly inside the container, you need to use that option to bypass the indirect Docker PCGR run.
Also note the path mappings. You’re using the container paths in the command, not the host (your machine’s) paths.

Indirectly (not recommended)

CLICK ME!

Install the PCGR Python component on your local machine. Only requirement is Python > 3.6. We would strongly advise to install it in a virtual Python environment with conda (or virtualenv) (read the previous sections for how to install conda). Or else it will (probably) use your system’s default Python and you’ll end up in a situation like https://xkcd.com/1987/. Here’s an example using conda/mamba, with only Python 3.7 as a dependency:

(base) $ mamba create -n pcgr_docker_env -c conda-forge python=3.7
(base) $ conda activate pcgr_docker_env

(pcgr_docker_env) $ which python
/Users/you/miniconda3/envs/pcgr_docker_env/bin/python
(pcgr_docker_env) $ cd /Users/you/dir4/PCGR

(pcgr_docker_env) $ pip install -e .
Obtaining file:///Users/you/dir4/PCGR
  Preparing metadata (setup.py) ... done
Installing collected packages: pcgr
  Running setup.py develop for pcgr
Successfully installed pcgr-X.X.X

(pcgr_docker_env) $ which pcgr
/Users/you/miniconda3/envs/pcgr_docker/bin/pcgr
(pcgr_docker_env) $ pcgr --version
pcgr X.X.X

You should now be all set up to run PCGR from within that pcgr_docker_env conda environment! Here’s an example command:

(pcgr_docker_env) $ pcgr \
  --input_vcf "/Users/you/dir1/tumor_sample.BRCA.vcf.gz" \
  --pcgr_dir "/Users/you/dir2/data" \
  --output_dir "/Users/you/dir3/pcgr_outputs" \
  --genome_assembly "grch38" \
  --sample_id "SampleB" \
  --assay "WGS"

Note that we do not specify the --no_docker option. PCGR will automatically look for the more recent Docker container on your machine, and then run the above command inside it indirectly.
Also note the path mappings. You’re using the actual host (your machine’s) paths, not the container paths.