
Introduction

PURPLE: Purity Ploidy Estimator (https://github.com/hartwigmedical/hmftools/tree/master/purple).

PURPLE combines B-allele frequency, read depth ratios, small variants and structural variants to estimate the purity and copy number profile of a tumor sample.

PURPLE outputs several files. The sections below show how gpgr reads, processes and displays some of them.

Data Munging

Somatic CNVs (per chromosome)

cnv_som <- system.file("extdata/purple/purple.cnv.somatic.tsv", package = "gpgr") |>
  gpgr::purple_cnv_som_process()
Description
cnv_som$descr |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per chromosome) Columns.")
PURPLE Somatic CNVs (per chromosome) Columns.
| Column | Description |
|---|---|
| Chr/Start/End | Coordinates of copy number segment |
| CN | Fitted absolute copy number of segment, adjusted for purity and ploidy |
| CN Min+Maj | Copy number of the minor + major allele, adjusted for purity |
| Start/End SegSupport | Type of SV support for the CN breakpoint at the start/end of the region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint) |
| Method | Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (average of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification) |
| BAF (count) | Tumor BAF after adjustment for purity and ploidy (count of AMBER BAF points covered by this segment) |
| GC (windowCount) | Proportion of segment that is G or C (count of COBALT windows covered by this segment) |
cnv_som$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per chromosome) Summary Table.")
PURPLE Somatic CNVs (per chromosome) Summary Table.
| Chr | Start | End | CN | CN Min+Maj | Start/End SegSupport | Method | BAF (count) | GC (windowCount) |
|---|---|---|---|---|---|---|---|---|
| chr1 | 1 | 123605522 | 1.0 | 0+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.98 (20830) | 0.42 (107822) |
| chr1 | 123605523 | 200044314 | 2.0 | 1+1 | CENTROMERE-DUP | BAF_WEIGHTED | 0.5 (10239) | 0.4 (47332) |
| chr1 | 200044315 | 200044570 | 2.8 | 1+1.8 | DUP-DUP | STRUCTURAL_VARIANT | 0.65 (0) | 0 (0) |
| chr1 | 200044571 | 248956422 | 2.0 | 1+1 | DUP-TELOMERE | BAF_WEIGHTED | 0.5 (10341) | 0.42 (43373) |
| chr2 | 1 | 93139350 | 2.0 | 1+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.51 (19339) | 0.41 (81949) |
| chr2 | 93139351 | 219955359 | 2.0 | 1+1 | CENTROMERE-BND | BAF_WEIGHTED | 0.5 (19047) | 0.39 (112493) |
| chr2 | 219955360 | 225225069 | 1.0 | 0+1 | BND-BND | BAF_WEIGHTED | 0.98 (1284) | 0.4 (5069) |
| chr2 | 225225070 | 242193529 | 2.0 | 1+1 | BND-TELOMERE | BAF_WEIGHTED | 0.51 (4236) | 0.44 (15099) |
| chr3 | 1 | 92214015 | 1.0 | 0+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.98 (18629) | 0.4 (83984) |
| chr3 | 92214016 | 198295559 | 2.0 | 1+1 | CENTROMERE-TELOMERE | BAF_WEIGHTED | 0.5 (21114) | 0.39 (95873) |
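Several display columns above pack two raw PURPLE values into one string (minor+major allele CN, start-end support, value plus count). The base R sketch below shows one way to build them for the first two segments. It is not gpgr's implementation, and the raw column names (`minorAlleleCopyNumber`, `segmentStartSupport`, `bafCount`, `depthWindowCount`, and so on) are assumptions about the `purple.cnv.somatic.tsv` header.

```r
# Hypothetical raw input: two segments from the table above, using ASSUMED
# PURPLE column names (check them against your purple.cnv.somatic.tsv header).
raw <- data.frame(
  chromosome = c("chr1", "chr1"),
  start = c(1, 123605523),
  end = c(123605522, 200044314),
  copyNumber = c(1.0, 2.0),
  minorAlleleCopyNumber = c(0.0, 1.0),
  majorAlleleCopyNumber = c(1.0, 1.0),
  segmentStartSupport = c("TELOMERE", "CENTROMERE"),
  segmentEndSupport = c("CENTROMERE", "DUP"),
  method = c("BAF_WEIGHTED", "BAF_WEIGHTED"),
  baf = c(0.98, 0.5),
  bafCount = c(20830, 10239),
  gcContent = c(0.42, 0.4),
  depthWindowCount = c(107822, 47332) # assumed to be the COBALT window count
)

# Combine paired raw columns into the compact display columns.
tab <- data.frame(
  Chr = raw$chromosome,
  Start = raw$start,
  End = raw$end,
  CN = round(raw$copyNumber, 1),
  `CN Min+Maj` = paste0(round(raw$minorAlleleCopyNumber, 1), "+",
                        round(raw$majorAlleleCopyNumber, 1)),
  `Start/End SegSupport` = paste0(raw$segmentStartSupport, "-", raw$segmentEndSupport),
  Method = raw$method,
  `BAF (count)` = paste0(round(raw$baf, 2), " (", raw$bafCount, ")"),
  `GC (windowCount)` = paste0(round(raw$gcContent, 2), " (", raw$depthWindowCount, ")"),
  check.names = FALSE # keep the spaces and symbols in the column names
)
tab
```

Packing each pair into one string keeps the table narrow enough for a report. The raw numeric columns stay the better choice for any downstream filtering.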

Somatic CNVs (per gene)

umccr_key_genes <- system.file("extdata/ref/somatic_panel-v24.03.0.tsv", package = "gpgr")
cnv_som_gene <- system.file("extdata/purple/purple.cnv.gene.tsv", package = "gpgr") |>
  gpgr::purple_cnv_som_gene_process(g = umccr_key_genes)
Description
cnv_som_gene$descr |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per gene) Columns.")
PURPLE Somatic CNVs (per gene) Columns.
| Column | Description |
|---|---|
| gene | Name of gene |
| minCN/maxCN | Min/max copy number found in gene exons |
| chrom/start/end | Chromosome/start/end location of gene transcript |
| chrBand | Chromosome band of the gene |
| onco_or_ts | Oncogene ('oncogene'), tumor suppressor ('tsgene'), or both ('onco+ts'), as reported by CancerMine |
| transcriptID | Ensembl transcript ID (dot version) |
| minMinorAlleleCN | Minimum minor allele copy number found over the gene exons; useful for identifying loss of heterozygosity (LOH) events |
| somReg (somaticRegions) | Count of somatic copy number regions this gene spans |
| minReg (minRegions) | Number of somatic regions inside the gene that share the minimum copy number |
| minRegStartEnd | Start/end base of the copy number region overlapping the gene with the minimum copy number |
| minRegSupportStartEndMethod | Start/end support of the CN region overlapping the gene with the minimum CN (plus determination method) |
cnv_som_gene$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per gene) Summary Table.")
PURPLE Somatic CNVs (per gene) Summary Table.
| gene | minCN | maxCN | chrom | start | end | chrBand | onco_or_ts | transcriptID | minMinorAlleleCN | somReg | minReg | minRegStartEnd | minRegSupportStartEndMethod |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SDHA | 5.6640 | 5.6640 | chr5 | 218303 | 257082 | p15.33 | tsgene | ENST00000264932 | 1.8901 | 1 | 1 | 89179-297781 | DEL-DEL (BAF_WEIGHTED) |
| DUSP22 | 4.6638 | 4.6638 | chr6 | 291630 | 351355 | p25.3 | tsgene | ENST00000419235 | 1.6754 | 1 | 1 | 1-834611 | TELOMERE-DEL (BAF_WEIGHTED) |
| IRF4 | 4.6638 | 4.6638 | chr6 | 391752 | 411443 | p25.3 | oncogene | ENST00000380956 | 1.6754 | 1 | 1 | 1-834611 | TELOMERE-DEL (BAF_WEIGHTED) |
| KDM5A | 4.2953 | 4.2953 | chr12 | 280057 | 389320 | p13.33 | oncogene | ENST00000399788 | 0.0000 | 1 | 1 | 1-1210023 | TELOMERE-DEL (BAF_WEIGHTED) |
| CRLF2 | 3.7271 | 3.7271 | chrX | 1187549 | 1212723 | p22.33 | oncogene | ENST00000400841 | 0.0000 | 1 | 1 | 924013-2677577 | DUP-DUP (BAF_WEIGHTED) |
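Because `minMinorAlleleCN` is described as useful for spotting LOH, a simple filter on it is a natural first pass. The base R sketch below applies that filter to the five genes shown. The 0.5 cutoff is an assumption for illustration, not a PURPLE or gpgr default.

```r
# Values copied from the per-gene summary table above.
genes <- data.frame(
  gene = c("SDHA", "DUSP22", "IRF4", "KDM5A", "CRLF2"),
  minCN = c(5.6640, 4.6638, 4.6638, 4.2953, 3.7271),
  minMinorAlleleCN = c(1.8901, 1.6754, 1.6754, 0.0000, 0.0000)
)

# ASSUMED rule: call LOH when the minimum minor allele CN over the exons
# falls below a cutoff (0.5 here, chosen only for illustration).
is_loh <- function(min_minor_cn, threshold = 0.5) min_minor_cn < threshold

loh_genes <- genes$gene[is_loh(genes$minMinorAlleleCN)]
loh_genes
```

Both flagged genes still have a total minimum CN above 3. LOH therefore does not imply copy loss: the remaining allele can be retained at several copies (copy-neutral or amplified LOH). For a gene on chrX in a male sample, such as CRLF2, a minor allele CN of 0 is also what one copy of X would produce anyway.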

Purity

purity <- system.file("extdata/purple/purple.purity.tsv", package = "gpgr") |>
  gpgr::purple_purity_read()

purity$summary |>
  knitr::kable(caption = "PURPLE Purity Summary Table.")
PURPLE Purity Summary Table.
| n | variable | value | details |
|---|---|---|---|
| 2 | Purity | 0.99 (0.98-1) | Purity of tumor in the sample (and min-max with score within 10% of best). |
| 3 | Ploidy | 2.86 (2.78-2.94) | Average ploidy of tumor sample after adjusting for purity (and min-max with score within 10% of best). |
| 4 | Gender | MALE | Gender as inferred by AMBER/COBALT. |
| 7 | WGD | TRUE | Whole genome duplication (more than 10 autosomes have average major allele ploidy > 1.5). |
| 8 | MSI (indels/Mb) | MSS (0.19) | MSI status (MSI, MSS, or UNKNOWN if somatic variants were not supplied) and MS indels per Mb. |
| 9 | PolyclonalProp | 0.13 | Proportion of CN regions that are more than 0.25 from a whole CN. |
| 10 | DiploidyProp | 0.03 (0.02-0.04) | Proportion of CN regions that have 1 (+- 0.2) minor and major allele. |
| 11 | TMB | 15.13 (HIGH) | Tumor mutational burden (# PASS variants per megabase) (Status: 'HIGH' (>10 PASS per Mb), 'LOW' or 'UNKNOWN'). |
| 12 | TML | 349 (HIGH) | Tumor mutational load (# of missense variants) (Status: 'HIGH', 'LOW' or 'UNKNOWN'). |
| 13 | TMB-SV | 1267 | # of passing SVs, excluding inferred and single-breakend SVs. |
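Several of the rules in the details column can be expressed directly as code. The base R sketch below applies the WGD, DiploidyProp, PolyclonalProp and TMB-status definitions to the ten segments from the per-chromosome table. It is a reading of those one-line definitions, not PURPLE's code. In particular, it assumes that the per-chromosome average is length-weighted and that the two proportions count segments unweighted. With only ten segments from three chromosomes, the results are not expected to reproduce the summary values above.

```r
# The ten segments from the per-chromosome table (chr1 to chr3 only),
# using ASSUMED raw PURPLE column names.
seg <- data.frame(
  chromosome = rep(c("chr1", "chr2", "chr3"), times = c(4, 4, 2)),
  start = c(1, 123605523, 200044315, 200044571,
            1, 93139351, 219955360, 225225070,
            1, 92214016),
  end = c(123605522, 200044314, 200044570, 248956422,
          93139350, 219955359, 225225069, 242193529,
          92214015, 198295559),
  copyNumber = c(1, 2, 2.8, 2, 2, 2, 1, 2, 1, 2),
  minorAlleleCopyNumber = c(0, 1, 1, 1, 1, 1, 0, 1, 0, 1),
  majorAlleleCopyNumber = c(1, 1, 1.8, 1, 1, 1, 1, 1, 1, 1)
)

# WGD: more than 10 autosomes with average major allele CN > 1.5.
# ASSUMPTION: the per-chromosome average is weighted by segment length.
call_wgd <- function(seg) {
  auto <- seg[!seg$chromosome %in% c("chrX", "chrY"), ]
  len <- auto$end - auto$start + 1
  avg_major <- tapply(auto$majorAlleleCopyNumber * len, auto$chromosome, sum) /
    tapply(len, auto$chromosome, sum)
  sum(avg_major > 1.5) > 10
}

# DiploidyProp: share of segments whose minor and major allele CN are both 1 +- 0.2.
diploid_prop <- function(seg, tol = 0.2) {
  mean(abs(seg$minorAlleleCopyNumber - 1) <= tol &
         abs(seg$majorAlleleCopyNumber - 1) <= tol)
}

# PolyclonalProp: share of segments more than 0.25 away from the nearest whole CN.
polyclonal_prop <- function(seg, tol = 0.25) {
  mean(abs(seg$copyNumber - round(seg$copyNumber)) > tol)
}

# TMB status: HIGH above 10 PASS variants per Mb, UNKNOWN when missing.
tmb_status <- function(tmb_per_mb, cutoff = 10) {
  ifelse(is.na(tmb_per_mb), "UNKNOWN", ifelse(tmb_per_mb > cutoff, "HIGH", "LOW"))
}

call_wgd(seg)        # FALSE: only three autosomes present
diploid_prop(seg)    # 6 of 10 segments are 1+1
polyclonal_prop(seg) # the 2.8 segment is only 0.2 from 3
tmb_status(15.13)    # matches the TMB row above
```

The reported DiploidyProp also carries a range (0.02-0.04). That range presumably comes from the alternative fits within 10% of the best score, whereas this sketch scores a single fit.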

Kataegis

kat <- system.file("extdata/purple/purple.somatic.vcf.gz", package = "gpgr") |>
  gpgr::purple_kataegis()
kat$data |>
  dplyr::slice(1:10) |>
  knitr::kable(caption = "PURPLE Kataegis Table.")
PURPLE Kataegis Table.
| CHROM | POS | KT | PURPLE_AF | PURPLE_CN | PURPLE_MACN | PURPLE_VCN | SUBCL | MH | TNC |
|---|---|---|---|---|---|---|---|---|---|
| chr2 | 45006212 | REV_1 | 0.2601 | 1.52 | 0.000 | 0.396 | 1 | NA | GGA |
| chr2 | 45007103 | REV_1 | 0.2399 | 1.52 | 0.000 | 0.365 | 1 | NA | AGA |
| chr2 | 45007598 | REV_1 | 0.2913 | 1.52 | 0.000 | 0.444 | 1 | NA | TGA |
| chr2 | 45007709 | REV_1 | 0.2759 | 1.52 | 0.000 | 0.420 | 1 | NA | TGA |
| chr3 | 71010245 | REV_2 | 0.4054 | 1.16 | 0.461 | 0.469 | 1 | NA | AGA |
| chr3 | 71010553 | REV_2 | 0.2387 | 1.16 | 0.461 | 0.276 | 1 | NA | AGA |
| chr3 | 71011404 | REV_2 | 0.3640 | 1.16 | 0.461 | 0.421 | 1 | NA | GGA |
| chr3 | 71011410 | REV_2 | 0.3494 | 1.16 | 0.461 | 0.405 | 1 | NA | AGA |
| chr3 | 71011474 | REV_2 | 0.3302 | 1.16 | 0.461 | 0.382 | 1 | NA | AGA |
| chr3 | 71327944 | FWD_1 | 0.2494 | 1.26 | 0.438 | 0.314 | 1 | NA | TCC |
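Each value of `KT` labels one kataegis event, so grouping by it gives a per-event view. The base R sketch below counts variants and measures the genomic span for the ten rows shown. FWD_1 appears with a single variant only because the table was cut to its first ten rows.

```r
# CHROM, POS and KT copied from the kataegis table above.
kat_rows <- data.frame(
  CHROM = c(rep("chr2", 4), rep("chr3", 6)),
  POS = c(45006212, 45007103, 45007598, 45007709,
          71010245, 71010553, 71011404, 71011410, 71011474, 71327944),
  KT = c(rep("REV_1", 4), rep("REV_2", 5), "FWD_1")
)

# One summary row per kataegis event: variant count and span in bp.
kat_summary <- do.call(rbind, lapply(split(kat_rows, kat_rows$KT), function(d) {
  data.frame(
    KT = d$KT[1],
    CHROM = d$CHROM[1],
    n_variants = nrow(d),
    start = min(d$POS),
    end = max(d$POS),
    span_bp = max(d$POS) - min(d$POS)
  )
}))
kat_summary
```

On the full `kat$data`, the same summary could be written with `dplyr::group_by(KT)` and `dplyr::summarise()`, matching the dplyr style used elsewhere in this vignette.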
Description
knitr::kable(kat$description, caption = "Kataegis column description.")
Kataegis column description.
| ID | Description |
|---|---|
| KT | Forward/reverse kataegis id |
| MH | Microhomology |
| PURPLE_AF | Purity adjusted allelic frequency of variant |
| PURPLE_CN | Purity adjusted copy number surrounding variant location |
| PURPLE_MACN | Purity adjusted minor allele ploidy surrounding variant location |
| PURPLE_VCN | Purity adjusted ploidy of variant |
| SUBCL | Non-zero subclonal likelihood |
| TNC | Tri-nucleotide context |
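The variant ploidy column relates to two of the others. For every row shown, `PURPLE_VCN` matches `PURPLE_AF` × `PURPLE_CN` to within rounding. The base R check below confirms this on the printed values. Treating the product as the definition of `PURPLE_VCN` is our reading of the numbers, not something the output states.

```r
# PURPLE_AF, PURPLE_CN and PURPLE_VCN copied from the kataegis table above.
af  <- c(0.2601, 0.2399, 0.2913, 0.2759, 0.4054, 0.2387, 0.3640, 0.3494, 0.3302, 0.2494)
cn  <- c(1.52, 1.52, 1.52, 1.52, 1.16, 1.16, 1.16, 1.16, 1.16, 1.26)
vcn <- c(0.396, 0.365, 0.444, 0.420, 0.469, 0.276, 0.421, 0.405, 0.382, 0.314)

# Variant copy number estimated as allele fraction times local copy number.
vcn_est <- af * cn

# Largest disagreement; small because CN is printed with only two decimals.
max(abs(vcn_est - vcn))
```

Under this reading, the VCN values of roughly 0.3 to 0.45 in a region of CN 1.2 to 1.5 say that each variant is carried on less than one copy. That pattern fits the non-zero subclonal likelihood in the SUBCL column.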

QC

qc <- system.file("extdata/purple/purple.qc", package = "gpgr") |>
  gpgr::purple_qc_read()

qc$summary |>
  knitr::kable(caption = "PURPLE QC Summary Table.")
PURPLE QC Summary Table.
| n | variable | value | details |
|---|---|---|---|
| 1 | QC_Status | FAIL_CONTAMINATION | See 'Description'. |
| 13 | Method | NORMAL | Fit method (NORMAL, HIGHLY_DIPLOID, SOMATIC or NO_TUMOR). |
| 14 | CopyNumberSegments | 1428 (Unsupported: 2) | # of CN segments. |
| 2 | Purity | 0.8600 | |
| 17 | Gender | Amber: MALE; Cobalt: MALE | |
| 14 | DeletedGenes | 150 | # of homozygously deleted genes. |
| 15 | Contamination | 0.807 | Rate of contamination in tumor sample as determined by AMBER. |
| 16 | GermlineAberrations | NONE | Can be one or more of: KLINEFELTER, TRISOMY_X/21/13/18/15, XYY, MOSAIC_X. |
| 18 | AmberMeanDepth | 128 | Mean depth as determined by AMBER. |
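The QC_Status value summarises several of the rows below it. The base R sketch below shows how such a status could be derived from those values. The flag names follow PURPLE's naming, but the thresholds (contamination > 0.1, > 280 deleted genes, > 220 unsupported segments) are assumptions based on our reading of the PURPLE documentation. Check them against the PURPLE version you run.

```r
# Hypothetical re-derivation of a PURPLE-style QC status.
# All thresholds are ASSUMPTIONS exposed as arguments, not PURPLE's code.
qc_flags <- function(contamination, deleted_genes, unsupported_segments,
                     amber_gender, cobalt_gender, method,
                     max_contamination = 0.1,
                     max_deleted_genes = 280,
                     max_unsupported_segments = 220) {
  flags <- character(0)
  if (method == "NO_TUMOR") flags <- c(flags, "FAIL_NO_TUMOR")
  if (contamination > max_contamination) flags <- c(flags, "FAIL_CONTAMINATION")
  if (deleted_genes > max_deleted_genes) flags <- c(flags, "WARN_DELETED_GENES")
  if (unsupported_segments > max_unsupported_segments) {
    flags <- c(flags, "WARN_HIGH_COPY_NUMBER_NOISE")
  }
  if (amber_gender != cobalt_gender) flags <- c(flags, "WARN_GENDER_MISMATCH")
  if (length(flags) == 0) "PASS" else paste(flags, collapse = ",")
}

# Values from the QC table above.
status <- qc_flags(contamination = 0.807, deleted_genes = 150,
                   unsupported_segments = 2, amber_gender = "MALE",
                   cobalt_gender = "MALE", method = "NORMAL")
status
```

With these inputs, only the contamination check fires, which agrees with the FAIL_CONTAMINATION status reported above.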

Session Info

Main packages used in this vignette.
| package | version | datestamp | source |
|---|---|---|---|
| base | 4.2.3 | 2024-06-16 | local |
| gpgr | 2.1.1 | 2024-08-20 | local |

Platform information.
| name | value |
|---|---|
| version | R version 4.2.3 (2023-03-15) |
| os | Ubuntu 22.04.4 LTS |
| system | x86_64, linux-gnu |
| ui | X11 |
| language | en |
| collate | C.UTF-8 |
| ctype | C.UTF-8 |
| tz | Etc/UTC |
| date | 2024-08-20 |
| pandoc | 3.3 @ /home/runner/micromamba/envs/pkgdownenv/bin/ (via rmarkdown) |