
Introduction

PURPLE: Purity Ploidy Estimator (https://github.com/hartwigmedical/hmftools/tree/master/purple).

PURPLE combines B-allele frequency, read depth ratios, small variants and structural variants to estimate the purity and copy number profile of a tumor sample.

It outputs several files, some of which are displayed below.

Data Munging

Somatic CNVs (per chromosome)

cnv_som <- system.file("extdata/purple/purple.cnv.somatic.tsv", package = "gpgr") |>
  gpgr::purple_cnv_som_process()
Description
cnv_som$descr |>
  knitr::kable(format = "html", caption = "PURPLE Somatic CNVs (per chromosome) Columns.")
PURPLE Somatic CNVs (per chromosome) Columns.
Column Description
Chr/Start/End Coordinates of copy number segment
CN Fitted absolute copy number of segment adjusted for purity and ploidy
CN Min+Maj CopyNumber of minor + major allele adjusted for purity
Start/End SegSupport Type of SV support for the CN breakpoint at start/end of region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint)
Method Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (avg of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification)
BAF (count) Tumor BAF after adjusting for purity and ploidy (count of AMBER BAF points covered by this segment)
GC (windowCount) Proportion of segment that is G or C (Count of COBALT windows covered by this segment)
cnv_som$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(format = "html", caption = "PURPLE Somatic CNVs (per chromosome) Summary Table.")
PURPLE Somatic CNVs (per chromosome) Summary Table.
Chr Start End CN CN Min+Maj Start/End SegSupport Method BAF (count) GC (windowCount)
chr1 1 123605522 1.0 0+1 TELOMERE-CENTROMERE BAF_WEIGHTED 0.98 (20830) 0.42 (107822)
chr1 123605523 200044314 2.0 1+1 CENTROMERE-DUP BAF_WEIGHTED 0.5 (10239) 0.4 (47332)
chr1 200044315 200044570 2.8 1+1.8 DUP-DUP STRUCTURAL_VARIANT 0.65 (0) 0 (0)
chr1 200044571 248956422 2.0 1+1 DUP-TELOMERE BAF_WEIGHTED 0.5 (10341) 0.42 (43373)
chr2 1 93139350 2.0 1+1 TELOMERE-CENTROMERE BAF_WEIGHTED 0.51 (19339) 0.41 (81949)
chr2 93139351 219955359 2.0 1+1 CENTROMERE-BND BAF_WEIGHTED 0.5 (19047) 0.39 (112493)
chr2 219955360 225225069 1.0 0+1 BND-BND BAF_WEIGHTED 0.98 (1284) 0.4 (5069)
chr2 225225070 242193529 2.0 1+1 BND-TELOMERE BAF_WEIGHTED 0.51 (4236) 0.44 (15099)
chr3 1 92214015 1.0 0+1 TELOMERE-CENTROMERE BAF_WEIGHTED 0.98 (18629) 0.4 (83984)
chr3 92214016 198295559 2.0 1+1 CENTROMERE-TELOMERE BAF_WEIGHTED 0.5 (21114) 0.39 (95873)
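
The "CN Min+Maj" column packs the minor and major allele copy numbers into one string. As a minimal sketch of how it could be unpacked, the base R snippet below splits that string and flags segments that look like loss of heterozygosity (LOH). The rows are copied by hand from the summary table above rather than read from a PURPLE file, and the column name `CN_MinMaj` and the 0.5 cut-off are assumptions made for illustration, not part of gpgr or PURPLE.

```r
# Toy rows copied from the summary table above (not read from a PURPLE file).
seg <- data.frame(
  Chr = c("chr1", "chr1", "chr1", "chr2"),
  Start = c(1, 123605523, 200044315, 219955360),
  End = c(123605522, 200044314, 200044570, 225225069),
  CN = c(1.0, 2.0, 2.8, 1.0),
  CN_MinMaj = c("0+1", "1+1", "1+1.8", "0+1"), # hypothetical column name
  stringsAsFactors = FALSE
)

# Split "minor+major" on the literal plus sign and convert both parts to numeric.
parts <- strsplit(seg$CN_MinMaj, "+", fixed = TRUE)
seg$minor <- as.numeric(vapply(parts, `[`, character(1), 1))
seg$major <- as.numeric(vapply(parts, `[`, character(1), 2))

# Assumed rule of thumb: minor allele close to 0 while the major allele is retained.
seg$loh <- seg$minor < 0.5 & seg$major >= 0.5

# Segment width in bases (coordinates are inclusive).
seg$width <- seg$End - seg$Start + 1

seg[, c("Chr", "Start", "End", "CN", "minor", "major", "loh", "width")]
```

In these toy rows the two `0+1` segments are flagged, and minor plus major adds up to the fitted CN of each segment.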

Somatic CNVs (per gene)

umccr_key_genes <- system.file("extdata/ref/umccr_cancer_genes_2019-03-20.tsv", package = "gpgr")
cnv_som_gene <- system.file("extdata/purple/purple.cnv.gene.tsv", package = "gpgr") |>
  gpgr::purple_cnv_som_gene_process(g = umccr_key_genes)
Description
cnv_som_gene$descr |>
  knitr::kable(format = "html", caption = "PURPLE Somatic CNVs (per gene) Columns.")
PURPLE Somatic CNVs (per gene) Columns.
Column Description
gene Name of gene
minCN/maxCN Min/Max copy number found in gene exons
chrom/start/end Chromosome/start/end location of gene transcript
chrBand Chromosome band of the gene
onco_or_ts oncogene (‘oncogene’), tumor suppressor (‘tsgene’), or both (‘onco+ts’), as reported by Cancermine
transcriptID Ensembl transcript ID (dot version)
minMinorAlleleCN Minimum allele ploidy found over the gene exons - useful for identifying LOH events
somReg (somaticRegions) Count of somatic copy number regions this gene spans
germDelReg (germlineHomDeletionRegions / germlineHetToHomDeletionRegions) Number of regions spanned by this gene that are (homozygously deleted in the germline / both heterozygously deleted in the germline and homozygously deleted in the tumor)
minReg (minRegions) Number of somatic regions inside the gene that share the min copy number
minRegStartEnd Start/End base of the copy number region overlapping the gene with the minimum copy number
minRegSupportStartEndMethod Start/end support of the CN region overlapping the gene with the min CN (plus determination method)
cnv_som_gene$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(format = "html", caption = "PURPLE Somatic CNVs (per gene) Summary Table.")
PURPLE Somatic CNVs (per gene) Summary Table.
gene minCN maxCN chrom start end chrBand onco_or_ts transcriptID minMinorAlleleCN somReg germDelReg minReg minRegStartEnd minRegSupportStartEndMethod
CRBN 1.0268 1.0268 chr3 3150011 3179710 p26.2 ENST00000231948.8 0.0176 1 0/0 1 1-92214015 TELOMERE-CENTROMERE (BAF_WEIGHTED)
SDHA 2.0123 2.0123 chr5 218241 256700 p15.33 tsgene ENST00000264932.10 0.9939 1 0/0 1 1-48272853 TELOMERE-CENTROMERE (BAF_WEIGHTED)
DUSP22 2.0037 2.0037 chr6 292462 351353 p25.3 ENST00000419235.6 0.9868 1 0/0 1 1-59191910 TELOMERE-CENTROMERE (BAF_WEIGHTED)
IRF4 2.0037 2.0037 chr6 391739 411447 p25.3 oncogene ENST00000380956.8 0.9868 1 0/0 1 1-59191910 TELOMERE-CENTROMERE (BAF_WEIGHTED)
FOXQ1 2.0037 2.0037 chr6 1312440 1314748 p25.3 ENST00000296839.4 0.9868 1 0/0 1 1-59191910 TELOMERE-CENTROMERE (BAF_WEIGHTED)
DOCK8 1.9900 1.9900 chr9 214865 465259 p24.3 ENST00000432829.6 0.9886 1 0/0 1 1-44377362 TELOMERE-CENTROMERE (BAF_WEIGHTED)
LARP4B 1.9899 1.9899 chr10 806914 931705 p15.3 tsgene ENST00000612396.4 0.9825 1 0/0 1 1-40640101 TELOMERE-CENTROMERE (BAF_WEIGHTED)
SIRT3 2.0012 2.0012 chr11 215458 236431 p15.5 tsgene ENST00000382743.8 0.9888 1 0/0 1 1-52751710 TELOMERE-CENTROMERE (BAF_WEIGHTED)
KDM5A 1.9962 1.9962 chr12 280129 389454 p13.33 oncogene ENST00000399788.6 0.9879 1 0/0 1 1-35977329 TELOMERE-CENTROMERE (BAF_WEIGHTED)
ZMYM2 2.0017 2.0017 chr13 19958670 20091829 q12.11 ENST00000610343.4 0.9944 1 0/0 1 17025624-31577000 CENTROMERE-NONE (BAF_WEIGHTED)
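
The column description notes that minMinorAlleleCN is useful for identifying LOH events. A rough sketch of such a filter is shown below on three rows copied by hand from the table above. The 0.5 threshold is an assumption chosen for illustration, not a value taken from gpgr or PURPLE.

```r
# Toy rows copied from the per-gene table above.
genes <- data.frame(
  gene = c("CRBN", "SDHA", "IRF4"),
  minCN = c(1.0268, 2.0123, 2.0037),
  minMinorAlleleCN = c(0.0176, 0.9939, 0.9868),
  stringsAsFactors = FALSE
)

# Assumed cut-off: a minor allele copy number below 0.5 suggests LOH.
loh_threshold <- 0.5
loh_genes <- genes[genes$minMinorAlleleCN < loh_threshold, ]

loh_genes$gene
```

With these rows only CRBN passes the filter, which agrees with the `0+1` state of the chr3 segment it sits on.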

Germline CNVs (per chromosome)

cnv_germ <- system.file("extdata/purple/purple.cnv.germline.tsv", package = "gpgr") |>
  gpgr::purple_cnv_germ_process()
Description
cnv_germ$descr |>
  knitr::kable(format = "html", caption = "PURPLE Germline CNVs (per chromosome) Columns.")
PURPLE Germline CNVs (per chromosome) Columns.
Column Description
Chr/Start/End Coordinates of copy number segment
CN Fitted absolute copy number of segment adjusted for purity and ploidy
CN Min+Maj CopyNumber of minor + major allele adjusted for purity
Start/End SegSupport Type of SV support for the CN breakpoint at start/end of region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint)
Method Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (avg of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification)
BAF (count) Tumor BAF after adjusting for purity and ploidy (count of AMBER BAF points covered by this segment)
GC (windowCount) Proportion of segment that is G or C (Count of COBALT windows covered by this segment)
cnv_germ$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(format = "html", caption = "PURPLE Germline CNVs (per chromosome) Summary Table.")
PURPLE Germline CNVs (per chromosome) Summary Table.
Chr Start End CN CN Min+Maj Start/End SegSupport Method BAF (count) GC (windowCount)
chr1 7510001 7511000 0.0 0+0 NONE-UNKNOWN GERMLINE_HET2HOM_DELETION 0.98 (0) 0.52 (1)
chr1 14110001 14113000 0.4 0+0.4 NONE-UNKNOWN GERMLINE_HET2HOM_DELETION 0.98 (0) 0.47 (3)
chr1 15825001 15829000 0.4 0+0.3 NONE-UNKNOWN GERMLINE_HET2HOM_DELETION 0.98 (0) 0.49 (2)
chr1 58278001 58279000 0.3 0+0.3 NONE-UNKNOWN GERMLINE_HET2HOM_DELETION 0.98 (0) 0.44 (1)
chr1 61617001 61618000 0.1 0+0.1 NONE-UNKNOWN GERMLINE_HET2HOM_DELETION 0.98 (0) 0.39 (1)
chr1 79756001 79757000 0.0 0+0 NONE-UNKNOWN GERMLINE_HOM_DELETION 0.98 (0) 0.35 (1)
chr1 85935001 85939000 0.2 0+0.2 NONE-UNKNOWN GERMLINE_HET2HOM_DELETION 0.98 (0) 0.41 (4)
chr1 89010001 89013000 0.1 0+0.1 NONE-UNKNOWN GERMLINE_HET2HOM_DELETION 0.98 (0) 0.42 (3)
chr1 105473001 105481000 0.1 0+0.1 NONE-UNKNOWN GERMLINE_HET2HOM_DELETION 0.98 (0) 0.37 (8)
chr1 110834001 110845000 0.0 0+0 NONE-UNKNOWN GERMLINE_HET2HOM_DELETION 0.98 (0) 0.33 (9)

Purity

purity <- system.file("extdata/purple/purple.purity.tsv", package = "gpgr") |>
  gpgr::purple_purity_read()

purity$summary |>
  knitr::kable(format = "html", caption = "PURPLE Purity Summary Table.")
PURPLE Purity Summary Table.
n variable value details
2 Purity 0.75 (0.71-0.78) Purity of tumor in the sample (and min-max with score within 10% of best).
3 Ploidy 1.87 (1.86-1.88) Average ploidy of tumor sample after adjusting for purity (and min-max with score within 10% of best).
4 Gender MALE Gender as inferred by AMBER/COBALT.
7 WGD FALSE Whole genome duplication (more than 10 autosomes have average major allele ploidy > 1.5).
8 MSI (indels/Mb) MSS (0) MSI status (MSI, MSS or UNKNOWN if somatic variants not supplied) & MS Indels per Mb.
9 PolyclonalProp 0 Proportion of CN regions that are more than 0.25 from a whole CN
10 DiploidyProp 0.86 (0.86-0.86) Proportion of CN regions that have 1 (+- 0.2) minor and major allele.
11 TMB 0 (LOW) Tumor mutational burden (# PASS variants per Megabase) (Status: ‘HIGH’ (>10 PASS per Mb), ‘LOW’ or ‘UNKNOWN’).
12 TML 0 (LOW) Tumor mutational load (# of missense variants) (Status: ‘HIGH’, ‘LOW’ or ‘UNKNOWN’).
13 TMB-SV 713 Number of passing SVs, excluding inferred and single-breakend SVs.
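
Several rows in the 'details' column state simple threshold rules. The sketch below restates three of them in base R: the WGD rule (more than 10 autosomes with average major allele ploidy above 1.5), the polyclonal proportion (regions more than 0.25 away from a whole copy number) and the TMB status (HIGH above 10 PASS variants per Mb). The function names are hypothetical, and the polyclonal proportion is assumed here to be a plain count-based fraction of regions. PURPLE's own calculation may differ, for example by weighting regions by length.

```r
# Hypothetical helpers restating the rules in the 'details' column.

# WGD: more than 10 autosomes with average major allele ploidy > 1.5.
is_wgd <- function(avg_major_by_autosome) {
  sum(avg_major_by_autosome > 1.5) > 10
}

# Fraction of CN regions more than 0.25 away from the nearest whole CN
# (count-based, which is an assumption).
polyclonal_prop <- function(cn) {
  mean(abs(cn - round(cn)) > 0.25)
}

# TMB status: HIGH if more than 10 PASS variants per Mb, UNKNOWN if missing.
tmb_status <- function(tmb_per_mb) {
  if (is.na(tmb_per_mb)) {
    "UNKNOWN"
  } else if (tmb_per_mb > 10) {
    "HIGH"
  } else {
    "LOW"
  }
}

# Made-up inputs for illustration only.
is_wgd(c(rep(2, 11), rep(1, 11)))    # 11 of 22 autosomes exceed 1.5
polyclonal_prop(c(1.0, 2.0, 2.8, 2.4))
tmb_status(0)
```

Note that a region with CN 2.8 counts as close to a whole copy number (0.2 from 3), whereas 2.4 does not.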

Kataegis

kat <- system.file("extdata/purple/purple.somatic.vcf.gz", package = "gpgr") |>
  gpgr::purple_kataegis()
kat$data |>
  knitr::kable(format = "html", caption = "PURPLE Kataegis Table.")
PURPLE Kataegis Table.
CHROM POS KT AF PURPLE_AF PURPLE_CN PURPLE_MACN PURPLE_VCN SUBCL MH TNC
chr2 45006212 REV_1 0.2143 0.2601 1.52 0.000 0.396 1 NA GGA
chr2 45007103 REV_1 0.1977 0.2399 1.52 0.000 0.365 1 NA AGA
chr2 45007598 REV_1 0.24 0.2913 1.52 0.000 0.444 1 NA TGA
chr2 45007709 REV_1 0.2273 0.2759 1.52 0.000 0.420 1 NA TGA
chr3 71010245 REV_2 0.3165 0.4054 1.16 0.461 0.469 1 NA AGA
chr3 71010553 REV_2 0.1863 0.2387 1.16 0.461 0.276 1 NA AGA
chr3 71011404 REV_2 0.2841 0.3640 1.16 0.461 0.421 1 NA GGA
chr3 71011410 REV_2 0.2727 0.3494 1.16 0.461 0.405 1 NA AGA
chr3 71011474 REV_2 0.2577 0.3302 1.16 0.461 0.382 1 NA AGA
chr3 71327944 FWD_1 0.1982 0.2494 1.26 0.438 0.314 1 NA TCC
chr3 71327987 FWD_1 0.1947 0.2450 1.26 0.438 0.309 1 NA TCA
chr3 71327997 FWD_1 0.2 0.2517 1.26 0.438 0.317 1 NA TCA
chr3 71328028 FWD_1 0.15 0.1888 1.26 0.438 0.238 1 NA TCA
chr3 71328321 FWD_1 0.1951 0.2455 1.26 0.438 0.309 1 NA TCA
chr3 71328545 FWD_1 0.2188 0.2862 1.06 0.000 0.302 1 NA TCC
chr3 71329130 FWD_1 0.3295 0.4311 1.06 0.000 0.455 1 NA TCA
chr3 71329696 FWD_1 0.3671 0.4803 1.06 0.000 0.507 1 NA TCA
chr3 78940888 REV_3 0.2525 0.3301 1.06 0.392 0.350 1 NA AGA
chr3 78941023 REV_3 0.2589 0.3385 1.06 0.392 0.359 1 NA AGA
chr3 78941779 REV_3 0.3368 0.4403 1.06 0.392 0.467 1 NA TGA
chr3 78941859 REV_3 0.3488 0.4560 1.06 0.392 0.483 1 NA GGA
chr3 78942061 REV_3 0.2989 0.3907 1.06 0.392 0.414 1 NA GGA
chr3 78942101 REV_3 0.3077 0.4022 1.06 0.392 0.426 1 NA AGA
chr3 78942282 REV_3 0.3222 0.4212 1.06 0.392 0.446 1 NA AGA
chr3 78942348 REV_3 0.2892 0.3780 1.06 0.392 0.401 1 NA AGA
chr3 78942562 REV_3 0.2115 0.2765 1.06 0.392 0.293 1 NA AGA
Description
knitr::kable(kat$description, format = "html", caption = "Kataegis column description.")
Kataegis column description.
ID Description
AF Allele Frequency, for each ALT allele, in the same order as listed
KT Forward/reverse kataegis id
MH Microhomology
PURPLE_AF Purity adjusted allelic frequency of variant
PURPLE_CN Purity adjusted copy number surrounding variant location
PURPLE_MACN Purity adjusted minor allele ploidy surrounding variant location
PURPLE_VCN Purity adjusted ploidy of variant
SUBCL Non-zero subclonal likelihood
TNC Tri-nucleotide context
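
Because every variant carries a kataegis id in the KT column, the table can be collapsed to one row per cluster. The sketch below does this in base R for the REV_1 and REV_2 rows, copied by hand from the table above. The object names `kat_toy` and `kat_summary` are made up for this example. On real output, `kat$data` could presumably be summarised the same way, assuming it has the CHROM, POS and KT columns shown above.

```r
# Toy rows copied from the kataegis table above (REV_1 and REV_2 only).
kat_toy <- data.frame(
  CHROM = c(rep("chr2", 4), rep("chr3", 5)),
  POS = c(45006212, 45007103, 45007598, 45007709,
          71010245, 71010553, 71011404, 71011410, 71011474),
  KT = c(rep("REV_1", 4), rep("REV_2", 5)),
  stringsAsFactors = FALSE
)

# One row per cluster: variant count, first/last position and genomic span.
kat_summary <- do.call(rbind, lapply(split(kat_toy, kat_toy$KT), function(d) {
  data.frame(
    KT = d$KT[1],
    CHROM = d$CHROM[1],
    n = nrow(d),
    first = min(d$POS),
    last = max(d$POS),
    span = max(d$POS) - min(d$POS)
  )
}))

kat_summary
```

Both toy clusters span under 2 kb, which fits the usual picture of kataegis as a dense, localised run of mutations.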

QC

qc <- system.file("extdata/purple/purple.qc", package = "gpgr") |>
  gpgr::purple_qc_read()

qc$summary |>
  knitr::kable(format = "html", caption = "PURPLE QC Summary Table.")
PURPLE QC Summary Table.
n variable value details
1 QC_Status WARN_DELETED_GENES See ‘Description’.
13 Method NORMAL Fit method (NORMAL, HIGHLY_DIPLOID, SOMATIC or NO_TUMOR).
14 CopyNumberSegments 1428 (Unsupported: 0) # of CN segments.
2 Purity 0.8600
17 Gender Amber: MALE; Cobalt: MALE
14 DeletedGenes 7782 # of homozygously deleted genes.
15 Contamination 0.0 Rate of contamination in tumor sample as determined by AMBER.
16 GermlineAberrations NONE Can be one or more of: KLINEFELTER, TRISOMY_X/21/13/18/15, XYY, MOSAIC_X.

Session Info

Main packages used in this vignette.
package version datestamp source
base 4.2.3 2023-07-13 local
gpgr 1.5.0 2023-08-22 local
Platform information.
name value
version R version 4.2.3 (2023-03-15)
os Ubuntu 22.04.3 LTS
system x86_64, linux-gnu
ui X11
language en
collate C.UTF-8
ctype C.UTF-8
tz Etc/UTC
date 2023-08-22
pandoc 3.1.3 @ /home/runner/micromamba/envs/pkgdownenv/bin/ (via rmarkdown)