Aim
Given a directory with results from a DRAGEN/UMCCR workflow, {dracarys} will grab files of interest and transform them into “tidier” structures for output into TSV/Parquet/RDS format for downstream ingestion into a database/data lake. See supported workflows, running examples, and CLI options in the sections below.
Installation
R
remotes::install_github("umccr/dracarys@vX.X.X") # for vX.X.X Release/Tag
Supported Workflows
{dracarys} supports most outputs from the following DRAGEN/UMCCR workflows:
Workflow | Description |
---|---|
bcl_convert | BCLConvert workflow |
tso_ctdna_tumor_only | ctDNA TSO500 workflow |
wgs_alignment_qc | DRAGEN DNA (alignment) workflow |
wts_alignment_qc | DRAGEN RNA (alignment) workflow |
wts_tumor_only | DRAGEN RNA workflow |
wgs_tumor_normal | DRAGEN Tumor/Normal workflow |
umccrise | umccrise workflow |
rnasum | RNAsum workflow |
sash | sash workflow |
oncoanalyser | oncoanalyser workflow |
See which output files from these workflows are supported in Supported Files.
CLI
A `dracarys.R` command line interface is available for convenience.
- If you're using the conda package, the `dracarys.R` command will already be available inside the activated conda environment.
- If you're not using the conda package, you need to add the `dracarys/inst/cli/` directory to your `PATH` in order to use `dracarys.R`.
dracarys_cli=$(Rscript -e 'x = system.file("cli", package = "dracarys"); cat(x, "\n")' | xargs)
export PATH="${dracarys_cli}:${PATH}"
dracarys.R --version
dracarys.R 0.16.0
#-----------------------------------#
dracarys.R --help
usage: dracarys.R [-h] [-v] {tidy} ...
DRAGEN Output Post-Processing
positional arguments:
{tidy} sub-command help
tidy Tidy UMCCR Workflow Outputs
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
#-----------------------------------#
#------- Tidy ----------------------#
dracarys.R tidy --help
usage: dracarys.R tidy [-h] -i IN_DIR -o OUT_DIR -p PREFIX [-t TOKEN]
[-l LOCAL_DIR] [-f FORMAT] [-n] [-q]
options:
-h, --help show this help message and exit
-i IN_DIR, --in_dir IN_DIR
Directory with untidy UMCCR workflow results. Can
be GDS, S3 or local.
-o OUT_DIR, --out_dir OUT_DIR
Directory to output tidy results.
-p PREFIX, --prefix PREFIX
Prefix string used for all results.
-t TOKEN, --token TOKEN
ICA access token. Default: ICA_ACCESS_TOKEN env var.
-l LOCAL_DIR, --local_dir LOCAL_DIR
If input is a GDS/S3 directory, download the
recognisable files to this directory. Default:
'<out_dir>/dracarys_<gds|s3>_sync'.
-f FORMAT, --format FORMAT
Format of output. Default: tsv.
-n, --dryrun Dry run - just show files to be tidied.
-q, --quiet Shush all the logs.
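The help text above gives the default for `--local_dir` as `'<out_dir>/dracarys_<gds|s3>_sync'`. As a rough illustration of that naming pattern only (this is not how {dracarys} itself is implemented), the shell sketch below derives the directory from the input path. The function name `default_local_dir` is hypothetical, and it assumes remote inputs are recognised by a `gds://` or `s3://` prefix and that local inputs are read in place.

```shell
# Hypothetical helper: mirrors the documented default
# '<out_dir>/dracarys_<gds|s3>_sync' for remote inputs.
default_local_dir() {
  local in_dir="$1" out_dir="$2"
  case "${in_dir}" in
    # ${out_dir%/} drops one trailing slash so the path has no '//'
    gds://*) printf '%s\n' "${out_dir%/}/dracarys_gds_sync" ;;
    s3://*)  printf '%s\n' "${out_dir%/}/dracarys_s3_sync" ;;
    # Assumption: a local input needs no download, so it is returned as-is
    *)       printf '%s\n' "${in_dir}" ;;
  esac
}

default_local_dir "gds://path/to/subjectX_multiqc_data/" "/tmp/tidy"
```

Passing `-l` explicitly overrides this default, which can be useful when several runs should share one download directory.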
Running
{dracarys} takes as input (`--in_dir`) a directory with results from one of the UMCCR workflows. It recursively scans that directory for supported files and, if the input is on GDS or S3, downloads them into a local directory (`--local_dir`). It then parses, transforms and writes the tidied versions into the specified output directory (`--out_dir`). A prefix (`--prefix`) is prepended to each of the tidied files. The output file format (`--format`) can be tsv, parquet, or both. To list the supported files within the input directory without tidying them, use the `-n` (`--dryrun`) option.
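On the command line, one way to use this is to do a dry run first and tidy only once the file list looks right. The sketch below uses only the flags listed in the `dracarys.R tidy --help` output. The `run_tidy` wrapper and the `RUNNER` variable are hypothetical conveniences, not part of {dracarys}, and the paths are placeholders.

```shell
# RUNNER defaults to 'echo', so this sketch only prints the commands.
# Set RUNNER= (empty) to actually execute them.
RUNNER="${RUNNER-echo}"

# Hypothetical wrapper around 'dracarys.R tidy'.
# Usage: run_tidy IN_DIR OUT_DIR PREFIX [dryrun]
run_tidy() {
  local in_dir="$1" out_dir="$2" prefix="$3" mode="${4:-run}"
  set -- dracarys.R tidy -i "${in_dir}" -o "${out_dir}" -p "${prefix}" -f tsv
  if [ "${mode}" = "dryrun" ]; then
    set -- "$@" -n   # -n/--dryrun: just show the files to be tidied
  fi
  ${RUNNER} "$@"
}

# 1. List the supported files found under the input directory.
run_tidy "gds://path/to/subjectX_multiqc_data/" "/tmp/tidy" "subjectX" dryrun
# 2. Tidy them for real.
run_tidy "gds://path/to/subjectX_multiqc_data/" "/tmp/tidy" "subjectX"
```

Keeping the dry run and the real run behind the same function means both calls share identical inputs, so the file list that was reviewed is the one that gets tidied.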
R
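# help(umccr_tidy)
library(dracarys)

in_dir <- "gds://path/to/subjectX_multiqc_data/" # GDS, S3 or local directory
out_dir <- tempdir() # tidy results are written here
prefix <- "subjectX" # prepended to each tidied file
umccr_tidy(in_dir = in_dir, out_dir = out_dir, prefix = prefix)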
Mac/Linux
From within an activated conda environment or a shell with the `dracarys.R` CLI available: