FAQ • PCGR

Frequently asked questions regarding PCGR usage and functionality:

1. I do not see any data related to allelic depth/support in my report. I thought that PCGR can grab this information automatically from my VCF?

Answer: VCF variant genotype data (i.e. AD/DP) is something that you as a user need to specify explicitly when running PCGR. In our experience, there is currently no uniform way that variant callers format these types of data (allelic fraction/depth, tumor/normal) in the VCF, and this makes it very challenging for PCGR to automatically grab this information from any VCF. Please take a careful look at the example VCF files (examples folder) that comes with PCGR for how PCGR expects this information to be formatted, and make sure your VCF is formatted accordingly. There is also an in-depth explanation on the matter described here

2. Is it possible to utilize PCGR for analysis of multiple samples?

Answer: As the name of the tool implies, PCGR was developed for the detailed analysis of individual tumor samples. However, if you take advantage of the different outputs from PCGR, it can also be utilized for analysis of multiple samples. First, make sure your input files are organized per sample (i.e. one VCF file per sample, one CNA file per sample), so that they can be fed directly to PCGR. Now, once all samples have been processed with PCGR, note that all the tab-separated output files (i.e. tiers, mutational signatures, cna segments) contain the sample identifier, which enable them to be aggregated and suitable for a downstream multi-sample analysis.

Also note that the compressed JSON output pr. sample run contains ALL information presented in the report. Explore the JSON contents e.g. with the jsonlite package in R:

report_data <- jsonlite::fromJSON('<sample_id>.pcgr_acmg.grch37.json.gz')

E.g. tiered SNV/InDel output:

head(report_data$content$snv_indel$variant_set$tsv)

Or TMB estimate:

report_data$content$tmb$variant_statistic$tmb_estimate

Expanding further upon the example commands shown above, we are planning to offer a cohort analysis module in the pcgrr R package in the near future, which can assist users more easily with the aggregation of findings across individual samples.

3. I do not see the expected transcript-specific consequence for a particular variant. In what way is the primary variant consequence established?

Answer: PCGR relies upon VEP for consequence prioritization, in which a specific transcript-specific consequence is chosen as the primary variant consequence. In the PCGR configuration file, you may customise how this is chosen by changing the order of criteria applied when choosing a primary consequence block - parameter vep_pick_order

4. Is it possible to use RefSeq as the underlying gene transcript model in PCGR?

Answer: PCGR uses GENCODE as the primary gene transcript model, but we provide cross-references to corresponding RefSeq transcripts when this is available.

5. I have a VCF with structural variants detected in my tumor sample, can PCGR process those as well?

Answer: This is currently not supported as input for PCGR, but is something we want to incorporate in the future.

6. I am surprised to see a particular gene being located in TIER 3 for my sample, since I know that this gene is of potential clinical significance in the tumor type I am investigating?

Answer: PCGR classifies variants into tiers of significance through an implementation of published guidelines by ACMG/AMP. No manual efforts for individual tumor types are conducted beyond this rule-based scheme. The users need to keep this in mind when interpreting the tier contents of the report.

7. Is it possible to see all the invididual cancer subtypes that belong to each of the 30 different tumor sites?

Answer: Yes, see an overview of phenotypes associated with primary tumor sites. See also the related GitHub repository oncoPhenoMap

8. Is there any plans to incorporate data from OncoKB in PCGR?

Answer: No. PCGR relies upon publicly available open-source resources, and further that the PCGR data bundle can be distributed freely to the user community. It is our understanding that OncoKB’s terms of use do not fit well with this strategy.

9. Is it possible for the users to update the data bundle to get the most recent versions of all underlying data sources?

Answer: As of now, the data bundle is updated only with each release of PCGR. The data harmonization pipeline of knowledge databases in PCGR contain numerous and complex procedures, with several quality control and re-formatting steps, and and cannot be fully automated in its present form. The version of all databases and key software elements are outlined in each PCGR report.