v1.0.1
- Date: 2022-03-09
Fixed
- Writing to JSON crashed when the input VCF was very large (variants in the order of millions). If the raw input set (VCF) contains > 500,000 variants, this set will, prior to reporting, be reduced by
- exclusion of intergenic and intronic variants, and
- exclusion of upstream_gene/downstream_gene variants (if the variant set is still above 500,000 after the first step)
- Bug in signature analysis for cases where the input variant set fits to > 18 different aetiologies.
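The two-step reduction described in the v1.0.1 fix can be sketched as follows. This is an illustrative sketch, not PCGR's code: it assumes each variant is a dict carrying a single VEP/Sequence Ontology consequence term, and the names `reduce_for_reporting`, `STEP1_EXCLUDE` and `STEP2_EXCLUDE` are hypothetical.

```python
from typing import Dict, List

# Threshold stated in the release note above.
MAX_VARIANTS = 500_000

# Assumed VEP/Sequence Ontology consequence terms for each reduction step.
STEP1_EXCLUDE = {"intergenic_variant", "intron_variant"}
STEP2_EXCLUDE = {"upstream_gene_variant", "downstream_gene_variant"}


def reduce_for_reporting(
    variants: List[Dict[str, str]], limit: int = MAX_VARIANTS
) -> List[Dict[str, str]]:
    """Two-step reduction of an oversized variant set (illustrative sketch).

    Each variant is assumed to be a dict with a single "consequence" term.
    """
    if len(variants) <= limit:
        return variants
    # Step 1: drop intergenic and intronic variants.
    reduced = [v for v in variants if v["consequence"] not in STEP1_EXCLUDE]
    # Step 2: only if the set is still too large, drop upstream/downstream variants.
    if len(reduced) > limit:
        reduced = [v for v in reduced if v["consequence"] not in STEP2_EXCLUDE]
    return reduced


demo = [{"consequence": c} for c in (
    "intergenic_variant", "intron_variant", "missense_variant",
    "upstream_gene_variant", "downstream_gene_variant", "synonymous_variant",
)]
print(len(reduce_for_reporting(demo, limit=3)))  # 2: both steps applied
print(len(reduce_for_reporting(demo, limit=4)))  # 4: step 1 is enough
```

The small `limit` values in the usage lines only demonstrate the ordering of the steps; the release note applies the 500,000 threshold.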
v1.0.0
Date: 2022-02-25
Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine, KEGG, ChEMBL, Open Targets Platform, Disease Ontology, Experimental Factor Ontology
Added
- Command-line options
- VEP options
--vep_gencode_all - use all GENCODE transcripts during VEP annotation (not only the basic GENCODE set)
--prevalence_reference_signatures - set minimum prevalence (percent) for selection of reference signatures included in the refitting procedure for a given tumor type
Changed
- Complete restructure of Python and R components. Installation now relies on two separate Conda packages, pcgr (Python component) and pcgrr (R component). Direct Docker support remains, with the Dockerfile simplified to rely exclusively on the installation of these Conda packages.
v0.9.2
Date: 2021-06-30
Data updates: ClinVar, GWAS catalog, CIViC, CancerMine, dbNSFP, KEGG, ChEMBL, Disease Ontology/EFO, Open Targets Platform, UniProt KB, GENCODE
Software upgrades: R v4.1, Bioconductor v3.13, VEP (104) ++
Changed
- TOML-based configuration for PCGR is abandoned; all options to PCGR are now configured through command-line parameters
- NOTE: We recommend turning on --show_noncoding and --vcf2maf (previously turned on by default in TOML). For tumor-only runs, we recommend including --exclude_dbsnp_nonsomatic and --exclude_nonexonic
Added
- Command-line options (previously set in TOML file)
- Allelic support
--tumor_dp_tag
--tumor_af_tag
--control_dp_tag
--control_af_tag
--call_conf_tag
- Tumor-only options
--maf_onekg_eur
--maf_onekg_amr
--maf_onekg_afr
--maf_onekg_eas
--maf_onekg_sas
--maf_onekg_global
--maf_gnomad_nfe
--maf_gnomad_asj
--maf_gnomad_fin
--maf_gnomad_oth
--maf_gnomad_amr
--maf_gnomad_afr
--maf_gnomad_eas
--maf_gnomad_sas
--maf_gnomad_global
--exclude_pon
--exclude_likely_het_germline
--exclude_likely_hom_germline
--exclude_dbsnp_nonsomatic
--exclude_nonexonic
--report_theme
--preserved_info_tags (previously custom_tags in TOML)
--show_noncoding (previously list_noncoding in TOML)
--vcfanno_n_proc (previously n_vcfanno_proc in TOML)
--vep_n_forks (previously n_vep_forks in TOML)
--vep_pick_order
--vep_no_intergenic (previously vep_skip_intergenic in TOML)
--vcf2maf
- New options
--report_nonfloating_toc (NEW) - add the table of contents (TOC) at the top of the HTML report, instead of floating at the left of the document
--cpsr_report (NEW) - add a dedicated section in PCGR with the main germline findings from CPSR analysis (use the gzipped JSON output from CPSR as input)
--vep_regulatory (NEW) - append regulatory annotations to variants (TF binding sites etc.)
--include_artefact_signatures (NEW) - include sequencing artefacts in the reference collection of mutational signatures (COSMIC v3.2)
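For users migrating an old TOML setup, the renamed settings listed above can be translated mechanically. A minimal sketch, assuming the TOML settings have already been parsed into a flat dict (the original file grouped keys into sections), that boolean settings map to on/off switches, and that other settings map to options taking a value; the function name `toml_settings_to_args` is hypothetical, and the `vcf2maf` entry follows the note that it was previously a TOML setting.

```python
from typing import Dict, List, Union

# Old TOML setting -> new command-line option, as listed in this release.
TOML_TO_CLI = {
    "custom_tags": "--preserved_info_tags",
    "list_noncoding": "--show_noncoding",
    "n_vcfanno_proc": "--vcfanno_n_proc",
    "n_vep_forks": "--vep_n_forks",
    "vep_skip_intergenic": "--vep_no_intergenic",
    "vcf2maf": "--vcf2maf",
}


def toml_settings_to_args(settings: Dict[str, Union[bool, int, str]]) -> List[str]:
    """Translate a flat dict of old TOML settings into CLI arguments (sketch)."""
    args: List[str] = []
    for key, value in settings.items():
        flag = TOML_TO_CLI.get(key)
        if flag is None:
            continue  # setting not covered by the mapping above
        if isinstance(value, bool):
            if value:
                args.append(flag)  # assumed on/off switch: emit only when true
        else:
            args.extend([flag, str(value)])  # assumed option taking a value
    return args


print(toml_settings_to_args({"list_noncoding": True, "n_vep_forks": 4,
                             "vep_skip_intergenic": False}))
# ['--show_noncoding', '--vep_n_forks', '4']
```

Whether each option is a switch or takes a value should be checked against `pcgr.py --help` before relying on the generated arguments.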
Fixed
- Bug in writing (large) report contents to JSON (issue #118)
- Bug (typo) in merge of clinical evidence items from different sources (CIVIC + CGI) (issue #126)
- Bug in value box for number of (high-confidence) kataegis events - rmarkdown (issue #122)
- Bug in value box for tumor purity/ploidy - rmarkdown (issue #129)
v0.9.1
Date: 2020-11-30
Data updates: ClinVar, GWAS catalog, CIViC, CancerMine, dbNSFP, KEGG, ChEMBL/DGIdb, Disease Ontology, Experimental Factor Ontology
Added
- Added possibility to configure the algorithm for TMB calculation, optional argument tmb_algorithm: all coding variants (all_coding) or non-synonymous variants only (nonsyn)
- R code subject to static analysis with lintr
- Improved Conda recipe (i.e. meta.yaml) with version pinning of all package dependencies
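The difference between the two TMB algorithms (all_coding vs. nonsyn) comes down to which variants are counted before dividing by the target size. A minimal sketch, assuming one VEP consequence term per variant and using the 34 Mb default target size noted for v0.8.0; the consequence sets and the name `estimate_tmb` are hypothetical and may not match PCGR's own definitions.

```python
from typing import Iterable

# Hypothetical consequence sets; PCGR's actual definitions may differ.
NONSYN = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
    "splice_donor_variant", "splice_acceptor_variant",
}
ALL_CODING = NONSYN | {"synonymous_variant"}


def estimate_tmb(consequences: Iterable[str], algorithm: str = "all_coding",
                 target_size_mb: float = 34.0) -> float:
    """Mutations per megabase for the chosen counting algorithm (sketch)."""
    if algorithm == "all_coding":
        allowed = ALL_CODING
    elif algorithm == "nonsyn":
        allowed = NONSYN
    else:
        raise ValueError(f"unknown TMB algorithm: {algorithm}")
    n_counted = sum(1 for c in consequences if c in allowed)
    return n_counted / target_size_mb


calls = ["missense_variant"] * 10 + ["synonymous_variant"] * 7 + ["intron_variant"] * 5
print(estimate_tmb(calls, "all_coding"))  # 17 / 34 = 0.5
print(estimate_tmb(calls, "nonsyn"))      # 10 / 34
```

Intronic calls are ignored by both algorithms; only the synonymous calls separate the two estimates.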
Changed
- Removed DisGeNET annotations from output (associations from Open Targets Platform serve same purpose)
- Version pinning of software dependencies in Dockerfile:
- All R packages necessary for PCGR are installed using the renv framework, ensuring improved versioning and reproducibility
- Other tools/utilities and Python libraries that have been version pinned:
- bedtools, samtools, numpy, cython, scipy, cyvcf2, toml, pandas
v0.9.0rc
Date: 2020-09-24
Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine, UniProt KB, dbNSFP, Pfam, KEGG, Open Targets Platform
Software updates: VEP 101
Changed
- All arguments to pcgr.py are now non-positional
- Arguments to pcgr.py are divided into two groups: required and optional
- Options allelic_support:tumor_dp_min, allelic_support:tumor_af_min, allelic_support:control_dp_min, allelic_support:control_af_max in PCGR configuration file are now optional arguments --tumor_dp_min, --tumor_af_min, --control_dp_min, --control_af_max in pcgr.py
- Option mutational_burden:mutational_burden in PCGR configuration file is now optional argument --estimate_tmb in pcgr.py
- Option msi:msi in PCGR configuration file is now optional argument --estimate_msi_status in pcgr.py
- Option mutational_signatures:mutational_signatures in PCGR configuration file is now optional argument --estimate_signatures in pcgr.py
- Options mutational_signatures:mutsignatures_signature_limit, mutational_signatures:mutsignatures_normalization, mutational_signatures:mutsignatures_mutation_limit, mutational_signatures:mutsignatures_cutoff are removed (used for deconstructSigs analysis, which is no longer in use)
- Optional argument --cna_overlap_pct in pcgr.py replaces cna:cna_overlap_pct in PCGR configuration file
- Optional argument --logr_gain in pcgr.py replaces cna:logr_gain in PCGR configuration file
- Optional argument --logr_homdel in pcgr.py replaces cna:logr_homdel in PCGR configuration file
- Removed mutational_burden:tmb_low_limit and mutational_burden:tmb_intermediate_limit - TMB is no longer interpreted in the context of thresholds
- Classifications of genes as tumor suppressors/oncogenes are now based on a combination of CancerMine citation count and presence in Network of Cancer Genes
- Settings section of the report is now divided into three:
- Metadata - sample and sequencing assay
- Report configuration
Added
- Optional argument --include_trials in pcgr.py - includes a section with annotated clinical trials for the tumor type in question
- Optional argument --assay in pcgr.py - designates type of sequencing assay
- Optional argument --cell_line in pcgr.py - designates runs of tumor cell lines (only for display, not used to configure any analysis)
- Optional argument --min_mutations_signatures in pcgr.py - minimum number of required mutations for mutational signature analysis with MutationalPatterns
- Optional argument --all_reference_signatures in pcgr.py - considers all reference signatures during fitting of mutational profile to known signatures
- Optional argument --estimate_signatures now also includes detection of potential kataegis events (WGS/WES assays only), and a rainfall plot in the flexdashboard output
- The user can now distinguish (through color codes) whether a biomarker has been mapped exactly (nucleotide change) or at a regional level (codon/exon)
- All variant-associated biomarkers (regardless of assignment to TIER 1/2) are now found in a new section (SNVs/InDels)
- For copy number amplifications, other putative drug targets in cancer are listed in a new section
- Detailed documentation of report contents has been added to the Documentation section
- References are updated and all provided with DOIs
v0.8.4
Date: 2019-11-18
Data updates: ClinVar, CIViC, CancerMine, UniProt KB
Software updates: VEP 98.3
v0.8.3
Date: 2019-10-14
Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine
Software updates: VEP 98.2, vcf2tsv
v0.8.2
Date: 2019-09-29
Data updates: ClinVar, GWAS catalog, GENCODE, DiseaseOntology, CIViC, CancerMine, UniProt KB
Software updates: VEP 97.3, vcfanno 0.3.2, LOFTEE (VEP plugin) 1.0.3
Fixed
- Bug in concatenation of clinical evidence items from different sources (CIVIC + CBMDB) (issues #83,#87)
- Silent variants that coincide with biomarkers reported at codon level are ignored
- Distinction between clinical evidence items of different origins (somatic + germline)
- Improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)
- Bug in UpSetPlot for cases where filtering produces fewer than two intersecting sets
Added
- New field ‘mane’ as criterion for pick order in configuration file (VEP)
- Sample identifier to copy number annotation output (convenient for concatenation of output from multiple samples)
- Capturing allelic depth (t_depth, t_ref_count etc.) in vcf2maf output (enhancement #52)
- Option tumor_only in pcgr.py, replaces vcf_tumor_only in configuration file, more convenient in terms of configuration
v0.8.0
- Date: 2019-05-20
Fixed
- Bug in value box for Tier 2 variants (new line carriage) Issue #73
Added
- Upgraded VEP to v96
- Skipping the --regulatory VEP option to avoid forking issues and to improve speed
- Added option to configure pick-order for choice of primary transcript in configuration file
- Pre-made configuration files for each tumor type in conf folder
- Possibility to append a CNA plot file (.png format) to the section of the report with somatic CNAs (previous feature request)
- Added possibility to input estimates of tumor purity and ploidy, shown as value boxes in Main results
- Tumor mutational burden is now compared with the distribution of TMB
observed for TCGA’s cohorts (organized by primary site)
- Default target size is now 34Mb (approx. estimate from exome-wide calculation of protein-coding parts of GENCODE)
- Added flexibility for variant filtering in tumor-only input callsets
- Added additional options to exclude likely germline variants (both require the tumor VAF tag to be correctly specified in the input VCF)
- exclude_likely_hom_germline - removes any variant with an allelic fraction of 1 (100%) - very unlikely somatic event
- exclude_likely_het_germline - removes any variant with
- an allelic fraction between 0.4 and 0.6, and
- presence in dbSNP + gnomAD, and
- no presence as somatic event in COSMIC/TCGA
- Added possibility to input a PANEL-OF-NORMALS VCF, to support the many labs that have sequenced a database/pool of healthy controls. This set of variants is utilized in PCGR to improve variant filtering when running in tumor-only mode. The PANEL-OF-NORMALS annotation works as follows:
- all variants in the tumor that coincide with any variant listed in the PANEL-OF-NORMALS VCF are appended with a PANEL_OF_NORMALS flag in the query VCF with tumor variants
- If configuration parameter exclude_pon is set to True in tumor_only runs, all variants with a PANEL_OF_NORMALS flag are filtered/excluded
- For tumor-only runs, added an UpSet plot showing how different filtering sources (gnomAD, 1KG Project, panel-of-normals etc) contribute in the germline filtering procedure
- Variants in Tier 3 / Tier 4 / Noncoding are now sorted (and color-coded) according to the target (gene) association score to the cancer phenotype, as provided by the OpenTargets Platform
- Added annotation of TCGA’s ten oncogenic signaling pathways
- Added EXONIC_STATUS annotation tag (VCF and TSV)
- exonic denotes all protein-altering AND canonical splice site-altering AND synonymous variants; nonexonic denotes the complement
- Added CODING_STATUS annotation tag (VCF and TSV)
- coding denotes all protein-altering AND canonical splice site-altering variants; noncoding denotes the complement
- Added SYMBOL_ENTREZ annotation tag (VCF)
- Official gene symbol from NCBI Entrez (SYMBOL provided by VEP can sometimes be a non-official alias, e.g. for GENCODE v19/grch37)
- Added SIMPLEREPEATS_HIT annotation tag (VCF and TSV)
- Variant overlaps UCSC simpleRepeat sequence repeat track - used for MSI prediction
- Added WINMASKER_HIT annotation tag (VCF and TSV)
- Variant overlaps UCSC windowmaskerSdust sequence repeat track - used for MSI prediction
- Added PUTATIVE_DRIVER_MUTATION annotation tag (VCF and TSV)
- Putative cancer driver mutation discovered by multiple approaches from 9,423 tumor exomes in TCGA. Format: symbol:hgvsp:ensembl_transcript_id:discovery_approaches
- Added OPENTARGETS_DISEASE_ASSOCS annotation tag (VCF and
TSV)
- Associations between protein targets and disease based on multiple lines of evidence (mutations, affected pathways, GWAS, literature etc.). Format: CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE
- Added OPENTARGETS_TRACTABILITY_COMPOUND annotation tag (VCF
and TSV)
- Confidence for the existence of a modulator (small molecule) that interacts with the target (protein) to elicit a desired biological effect
- Added OPENTARGETS_TRACTABILITY_ANTIBODY annotation tag (VCF
and TSV)
- Confidence for the existence of a modulator (antibody) that interacts with the target (protein) to elicit a desired biological effect
- Added CLINVAR_REVIEW_STATUS_STARS annotation tag
- Rating of the ClinVar variant (0-4 stars) with respect to level of review
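The tumor-only germline filters added in v0.8.0 (exclude_likely_hom_germline, exclude_likely_het_germline and exclude_pon) combine as independent exclusion rules. A minimal sketch, assuming each variant has already been annotated with its tumor allelic fraction, dbSNP/gnomAD presence, somatic evidence in COSMIC/TCGA and the PANEL_OF_NORMALS flag; the `TumorVariant` record, its field names and `keep_variant` are hypothetical, and treating the 0.4 to 0.6 range as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TumorVariant:
    # Hypothetical record; field names are illustrative, not PCGR's.
    tumor_af: float
    in_dbsnp: bool
    in_gnomad: bool
    somatic_in_cosmic_or_tcga: bool
    panel_of_normals: bool = False  # PANEL_OF_NORMALS flag


def likely_hom_germline(v: TumorVariant) -> bool:
    # Allelic fraction of 1 (100%): very unlikely to be a somatic event.
    return v.tumor_af >= 1.0


def likely_het_germline(v: TumorVariant) -> bool:
    # Bounds treated as inclusive here (an assumption).
    return (0.4 <= v.tumor_af <= 0.6
            and v.in_dbsnp and v.in_gnomad
            and not v.somatic_in_cosmic_or_tcga)


def keep_variant(v: TumorVariant, exclude_likely_hom_germline: bool = False,
                 exclude_likely_het_germline: bool = False,
                 exclude_pon: bool = False) -> bool:
    """Return True if a tumor-only variant survives the enabled filters."""
    if exclude_likely_hom_germline and likely_hom_germline(v):
        return False
    if exclude_likely_het_germline and likely_het_germline(v):
        return False
    if exclude_pon and v.panel_of_normals:
        return False
    return True


het = TumorVariant(tumor_af=0.5, in_dbsnp=True, in_gnomad=True,
                   somatic_in_cosmic_or_tcga=False)
print(keep_variant(het))                                    # True: no filters on
print(keep_variant(het, exclude_likely_het_germline=True))  # False
```

A variant with somatic evidence in COSMIC/TCGA is kept by the heterozygous rule even at a 0.5 allelic fraction, which is the point of that third condition.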
Changed
- Moved from IntoGen’s driver mutation resource to TCGA’s putative driver mutation list in display of driver mutation status
- Moved option for vcf_validation from configuration file to run script (--no_vcf_validate)
v0.7.0
- Date: 2018-11-27
Fixed
- Bug in assignment of variants to tier1/tier2 Issue #61
- Missing config option for maf_gnomad_asj in TOML file (also setting operator to <=), Issue #60
- Bug in new CancerMine oncogene/tumor suppressor annotation, Issue #53
- vcfanno fix for empty Description (upgrade to vcfanno v0.3.1 Issue #49)
- Bug in message showing too few variants for MSI prediction, Issue #55
- Bug in appending of custom VCF tags
- Still unsolved: how to disambiguate identical FORMAT and INFO tags in vcf2tsv
- Bug in SCNA value box display for multiple copy number hits (Issue #47)
- Bug in vcf2tsv (handling INFO tags encoded with ‘Type = String’, Issue #39)
- Bug in search of UniProt functional features (BED feature regions spanning exons are now handled)
- Stripped off HTML elements (TCGA_FREQUENCY, DBSNP) in TSV output
- Some effect predictions from dbNSFP were not properly parsed (e.g. multiple prediction entries from multiple transcript isoforms), these should now be retrieved correctly
- Removed ‘COSM’ prefix in COSMIC mutation links
- Bug in retrieval of splice site predictions from dbscSNV
Added
- Possibility to run PCGR in a non-Docker environment (e.g. using the --no-docker option). Thanks to an excellent contribution by Vlad Saveliev, Issue #35
- Added possibility to add docker user-id
- Possibility for MAF file output (converted with vcf2maf), must be configured by the user in the TOML file (i.e. vcf2maf = true, Issue #17)
- Possibility for adding custom VCF INFO tags to PCGR output files (JSON/TSV), must be configured by the user in the TOML file (i.e. custom_tags)
- Added MUTATION_HOTSPOT_CANCERTYPE in data tables (i.e. listing tumor types in which hotspot mutations have been found)
- Included the ‘rs’ prefix for dbSNP identifiers (HTML and TSV output)
- Individual entries/columns for variant effect predictions:
- Individual algorithms: SIFT_DBNSFP, M_CAP_DBNSFP, MUTPRED_DBNSFP, MUTATIONTASTER_DBNSFP, MUTATIONASSESSOR_DBNSFP, FATHMM_DBNSFP, FATHMM_MKL_DBNSFP, PROVEAN_DBNSFP
- Ensemble predictions (META_LR_DBNSFP), dbscSNV splice site predictions (SPLICE_SITE_RF_DBNSFP, SPLICE_SITE_ADA_DBNSFP)
- Upgraded samtools to v1.9 (makes vcf2maf work properly)
- Added Ensembl gene/transcript id and corresponding RefSeq mRNA id to TSV/JSON
- Added for future implementation:
- SeqKat + karyoploteR for exploration of kataegis/hypermutation
- CELLector - genomics-guided selection of cancer cell lines
- Upgraded VEP to v94
Changed
- Changed CANCER_MUTATION_HOTSPOT to MUTATION_HOTSPOT
- Moved from TSGene 2.0
to CancerMine for
annotation of tumor suppressor genes and proto-oncogenes
- A minimum of n=3 citations was required to include literature-mined tumor suppressor genes and proto-oncogenes from CancerMine
v0.6.2
- Date: 2018-05-09
Fixed
- Bug in copy number segment display (missing variable initialization, Issue #34)
- Typo in gnomAD filter statistic (fraction, Issue #31)
- Bug in mutational signature analysis for grch38 (forgot to pass BSgenome object, Issue #27)
- Missing proper ASCII-encoding in vcf2tsv conversion, Issue #
- Removed ‘Noncoding mutations’ section when no input VCF is present
- Bug in annotation of copy number event type (focal/broad)
- Bug in copy number annotation (missing protein-coding transcripts)
- Updated MSI prediction (variable importance, performance measures)
v0.6.1
- Date: 2018-05-02
Fixed
- Bug in tier assignment ‘pcgr_acmg’ (case for no variants in tier1,2,3)
- Bug in tier assignment ‘pcgr_acmg’ (no tumor type specified, evidence items with weak support detected)
- Bug: duplicated variants in ‘Tier 3’ resulting from genes encoded with dual roles as tumor suppressor genes/oncogenes
- Bug: duplicated variants in ‘Tier 1/Noncoding variants’ resulting from rare cases of noncoding variants occurring in Tier 1 (synonymous variants with biomarker role)
v0.6.0
- Date: 2018-04-25
Added
- New argument in pcgr.py
- assembly (grch37/grch38)
- New option in pcgr.py
- --basic - run comprehensive VCF annotation only, skip report generation and additional analyses
- New sections in HTML report
- Settings and annotation sources - now also listing key PCGR configuration settings
- Main findings - Six value boxes indicating the main findings of clinical relevance
- New configuration options
- tier_model (string) - choice between pcgr_acmg and pcgr
- mutational_burden - set TMB tertile limits
- tmb_low_limit (float)
- tmb_intermediate_limit (float)
- tumor_type - choose between 34 tumor types/classes:
- Adrenal_Gland_Cancer_NOS (logical)
- Ampullary_Carcinoma_NOS (logical)
- Biliary_Tract_Cancer_NOS (logical)
- Bladder_Urinary_Tract_Cancer_NOS (logical)
- Blood_Cancer_NOS (logical)
- Bone_Cancer_NOS (logical)
- Breast_Cancer_NOS (logical)
- CNS_Brain_Cancer_NOS (logical)
- Colorectal_Cancer_NOS (logical)
- Cervical_Cancer_NOS (logical)
- Esophageal_Stomach_Cancer_NOS (logical)
- Head_And_Neck_Cancer_NOS (logical)
- Hereditary_Cancer_NOS (logical)
- Kidney_Cancer_NOS (logical)
- Leukemia_NOS (logical)
- Liver_Cancer_NOS (logical)
- Lung_Cancer_NOS (logical)
- Lymphoma_Hodgkin_NOS (logical)
- Lymphoma_Non_Hodgkin_NOS (logical)
- Ovarian_Fallopian_Tube_Cancer_NOS (logical)
- Pancreatic_Cancer_NOS (logical)
- Penile_Cancer_NOS (logical)
- Peripheral_Nervous_System_Cancer_NOS (logical)
- Peritoneal_Cancer_NOS (logical)
- Pleural_Cancer_NOS (logical)
- Prostate_Cancer_NOS (logical)
- Skin_Cancer_NOS (logical)
- Soft_Tissue_Cancer_NOS (logical)
- Stomach_Cancer_NOS (logical)
- Testicular_Cancer_NOS (logical)
- Thymic_Cancer_NOS (logical)
- Thyroid_Cancer_NOS (logical)
- Uterine_Cancer_NOS (logical)
- Vulvar_Vaginal_Cancer_NOS (logical)
- mutational_signatures
- mutsignatures_cutoff (float) - discard any signature contributions with a weight less than the cutoff
- cna
- transcript_cna_overlap (float) - minimum percent overlap between copy number segment and transcripts (average) for tumor suppressor gene/proto-oncogene to be reported
- allelic_support
- If input VCF has correctly formatted depth/allelic fraction as INFO
tags, users can add thresholds on depth/support that are applied prior
to report generation
- tumor_dp_min (integer) - minimum sequencing depth for variant in tumor sample
- tumor_af_min (float) - minimum allelic fraction for variant in tumor sample
- normal_dp_min (integer) - minimum sequencing depth for variant in normal sample
- normal_af_max (float) - maximum allelic fraction for variant in normal sample
- visual
- report_theme (string) - visual theme of report (Bootstrap)
- other
- vcf_validation (logical) - keep/skip VCF validation by vcf-validator
- New output file - JSON output of HTML report content
- New INFO tags of PCGR-annotated VCF
- CANCER_PREDISPOSITION
- PFAM_DOMAIN
- TCGA_FREQUENCY
- TCGA_PANCANCER_COUNT
- ICGC_PCAWG_OCCURRENCE
- ICGC_PCAWG_AFFECTED_DONORS
- CLINVAR_MEDGEN_CUI
- New column entries in annotated SNV/InDel TSV file:
- CANCER_PREDISPOSITION
- ICGC_PCAWG_OCCURRENCE
- TCGA_FREQUENCY
- New column in CNA output
- TRANSCRIPTS - aberration-overlapping transcripts (Ensembl transcript IDs)
- MEAN_TRANSCRIPT_CNA_OVERLAP - Mean overlap (%) between gene transcripts and aberration segment
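The allelic_support thresholds introduced in v0.6.0 act as a simple pass/fail check per variant before report generation. A minimal sketch, assuming depth and allelic fraction have already been parsed from the input VCF INFO tags and that the comparisons are inclusive; the `AllelicSupport` class and its `passes` method are hypothetical (later releases name the normal-sample options control_dp_min/control_af_max).

```python
from dataclasses import dataclass


@dataclass
class AllelicSupport:
    """Thresholds from the allelic_support options (defaults = no filtering)."""
    tumor_dp_min: int = 0
    tumor_af_min: float = 0.0
    normal_dp_min: int = 0
    normal_af_max: float = 1.0

    def passes(self, tumor_dp: int, tumor_af: float,
               normal_dp: int, normal_af: float) -> bool:
        # Inclusive comparisons are an assumption of this sketch.
        return (tumor_dp >= self.tumor_dp_min
                and tumor_af >= self.tumor_af_min
                and normal_dp >= self.normal_dp_min
                and normal_af <= self.normal_af_max)


support = AllelicSupport(tumor_dp_min=10, tumor_af_min=0.05,
                         normal_dp_min=10, normal_af_max=0.01)
print(support.passes(tumor_dp=30, tumor_af=0.20, normal_dp=25, normal_af=0.0))  # True
print(support.passes(tumor_dp=5, tumor_af=0.20, normal_dp=25, normal_af=0.0))   # False
```

Note that normal_af_max is an upper bound, unlike the other three thresholds: evidence of the variant in the normal sample argues against a somatic origin.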
Removed
- Elements of databundle (now annotated directly through VEP):
- dbsnp
- gnomad/exac
- 1000G project
- INFO tags of PCGR-annotated VCF
- DBSNPBUILDID
- DBSNP_VALIDATION
- DBSNP_SUBMISSIONS
- DBSNP_MAPPINGSTATUS
- GWAS_CATALOG_PMID
- GWAS_CATALOG_TRAIT_URI
- DOCM_DISEASE
- Output files
- TSV files with mutational signature results and biomarkers
(i.e. sample_id.pcgr.snvs_indels.biomarkers.tsv and
sample_id.pcgr.mutational_signatures.tsv)
- Data can still be retrieved - now from the JSON dump
- MAF file
- The previous MAF output was generated in a custom fashion, a more accurate MAF output based on https://github.com/mskcc/vcf2maf will be incorporated in the next release
Changed
- HTML report sections
- Tier statistics and Variant statistics are now grouped into the section Tier and variant statistics
- Tier 5 is now Noncoding mutations (i.e. not considered a tier per se)
- Sliders for allelic fraction in the Global variant browser are now fixed from 0 to 1 (0.05 intervals)