
Aim
Given a directory with results from a DRAGEN/UMCCR workflow, {dracarys} will grab files of interest and transform them into "tidier" structures for output into TSV/Parquet/RDS format for downstream ingestion into a database/data lake. See supported workflows, running examples, and CLI options in the sections below.
Installation
R
remotes::install_github("umccr/dracarys@vX.X.X") # for vX.X.X Release/Tag
Supported Workflows
{dracarys} supports most outputs from the following DRAGEN/UMCCR workflows:
| Workflow | Description | 
|---|---|
| bcl_convert | BCLConvert workflow | 
| tso_ctdna_tumor_only | ctDNA TSO500 workflow | 
| wgs_alignment_qc | DRAGEN DNA (alignment) workflow | 
| wts_alignment_qc | DRAGEN RNA (alignment) workflow | 
| wts_tumor_only | DRAGEN RNA workflow | 
| wgs_tumor_normal | DRAGEN Tumor/Normal workflow | 
| umccrise | umccrise workflow | 
| rnasum | RNAsum workflow | 
| sash | sash workflow | 
| oncoanalyser | oncoanalyser workflow | 
See which output files from these workflows are supported in Supported Files.
CLI
A dracarys.R command line interface is available for convenience.
- If you're using the conda package, the dracarys.R command will already be available inside the activated conda environment.
- If you're not using the conda package, you need to export the dracarys/inst/cli/ directory to your PATH in order to use dracarys.R.
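To tell which of the two cases applies, one option is to check whether the shell can already resolve dracarys.R. The sketch below relies only on the POSIX `command -v` builtin; the helper name `cli_available` and the messages are illustrative assumptions, not part of {dracarys}.

```shell
# Hypothetical helper: succeeds if the named command can be resolved on PATH.
cli_available() {
  command -v "$1" >/dev/null 2>&1
}

if cli_available dracarys.R; then
  echo "dracarys.R found at: $(command -v dracarys.R)"
else
  echo "dracarys.R is not on PATH; export the package's cli directory first"
fi
```

If the second message is printed, the PATH export described in this section is still needed.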
dracarys_cli=$(Rscript -e 'x = system.file("cli", package = "dracarys"); cat(x, "\n")' | xargs)
export PATH="${dracarys_cli}:${PATH}"
dracarys.R --version
dracarys.R 0.16.0
#-----------------------------------#
dracarys.R --help
usage: dracarys.R [-h] [-v] {tidy} ...
DRAGEN Output Post-Processing
positional arguments:
  {tidy}         sub-command help
    tidy         Tidy UMCCR Workflow Outputs
options:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit
#-----------------------------------#
#------- Tidy ----------------------#
dracarys.R tidy --help
usage: dracarys.R tidy [-h] -i IN_DIR -o OUT_DIR -p PREFIX [-t TOKEN]
                       [-l LOCAL_DIR] [-f FORMAT] [-n] [-q]
options:
  -h, --help            show this help message and exit
  -i IN_DIR, --in_dir IN_DIR
                        Directory with untidy UMCCR workflow results. Can
                        be GDS, S3 or local.
  -o OUT_DIR, --out_dir OUT_DIR
                        Directory to output tidy results.
  -p PREFIX, --prefix PREFIX
                        Prefix string used for all results.
  -t TOKEN, --token TOKEN
                        ICA access token. Default: ICA_ACCESS_TOKEN env var.
  -l LOCAL_DIR, --local_dir LOCAL_DIR
                        If input is a GDS/S3 directory, download the
                        recognisable files to this directory. Default:
                        '<out_dir>/dracarys_<gds|s3>_sync'.
  -f FORMAT, --format FORMAT
                        Format of output. Default: tsv.
  -n, --dryrun          Dry run - just show files to be tidied.
  -q, --quiet           Shush all the logs.
Running
{dracarys} takes as input (--in_dir) a directory with results from one of the UMCCR workflows. It recursively scans that directory for supported files, downloads them into a local directory (--local_dir) when the input is on GDS or S3, and then parses, transforms and writes the tidied versions into the specified output directory (--out_dir). A prefix (--prefix) is prepended to each tidied file. The output file format (--format) can be tsv, parquet, or both. To list the supported files within the input directory without tidying them, use the -n (--dryrun) option.
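The options described above can be combined into a two-step routine: a dry run to see which files are recognised, followed by the real run. This is a minimal sketch that uses only the flags listed in the CLI help; the `run_tidy` wrapper, the output directory and the prefix value are illustrative assumptions, and a GDS input additionally assumes a valid ICA access token is available.

```shell
# Hypothetical example values; replace with real locations.
in_dir="gds://path/to/subjectX_multiqc_data/"
out_dir="${TMPDIR:-/tmp}/dracarys_out"
prefix="subjectX"

# Hypothetical wrapper: extra flags (e.g. --dryrun) are passed through via "$@".
# If dracarys.R is not on PATH, the command is printed instead of executed.
run_tidy() {
  if command -v dracarys.R >/dev/null 2>&1; then
    dracarys.R tidy --in_dir "$in_dir" --out_dir "$out_dir" \
      --prefix "$prefix" --format tsv "$@"
  else
    echo "dracarys.R not on PATH; would run: dracarys.R tidy --in_dir $in_dir --out_dir $out_dir --prefix $prefix --format tsv $*"
  fi
}

# Step 1: dry run, to list the supported files found under in_dir.
run_tidy --dryrun

# Step 2: the real run (uncomment once the dry-run listing looks right).
# run_tidy
```

Keeping the arguments in one wrapper means the dry run and the real run cannot drift apart.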
R
library(dracarys)
# help(umccr_tidy)
in_dir <- "gds://path/to/subjectX_multiqc_data/"
out_dir <- tempdir()
prefix <- "subjectX"
umccr_tidy(in_dir = in_dir, out_dir = out_dir, prefix = prefix)
Mac/Linux
From within an activated conda environment or a shell with the dracarys.R CLI available: