Introduction
PURPLE: Purity Ploidy Estimator (https://github.com/hartwigmedical/hmftools/tree/master/purple).
PURPLE combines B-allele frequency, read depth ratios, small variants
and structural variants to estimate the purity and copy number profile
of a tumor sample.
It outputs several files, some of which are displayed below.
Data Munging
Somatic CNVs (per chromosome)
Description
cnv_som$descr |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per chromosome) Columns.")
PURPLE Somatic CNVs (per chromosome) Columns.
| Column | Description |
|---|---|
| Chr/Start/End | Coordinates of copy number segment |
| CN | Fitted absolute copy number of segment adjusted for purity and ploidy |
| CN Min+Maj | Copy number of minor + major allele adjusted for purity |
| Start/End SegSupport | Type of SV support for the CN breakpoint at start/end of region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint) |
| Method | Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (avg of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification) |
| BAF (count) | Tumor BAF after adjusting for purity and ploidy (count of AMBER BAF points covered by this segment) |
| GC (windowCount) | Proportion of segment that is G or C (count of COBALT windows covered by this segment) |
cnv_som$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per chromosome) Summary Table.")
PURPLE Somatic CNVs (per chromosome) Summary Table.
| Chr | Start | End | CN | CN Min+Maj | Start/End SegSupport | Method | BAF (count) | GC (windowCount) |
|---|---|---|---|---|---|---|---|---|
| chr1 | 1 | 123605522 | 1.0 | 0+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.98 (20830) | 0.42 (107822) |
| chr1 | 123605523 | 200044314 | 2.0 | 1+1 | CENTROMERE-DUP | BAF_WEIGHTED | 0.5 (10239) | 0.4 (47332) |
| chr1 | 200044315 | 200044570 | 2.8 | 1+1.8 | DUP-DUP | STRUCTURAL_VARIANT | 0.65 (0) | 0 (0) |
| chr1 | 200044571 | 248956422 | 2.0 | 1+1 | DUP-TELOMERE | BAF_WEIGHTED | 0.5 (10341) | 0.42 (43373) |
| chr2 | 1 | 93139350 | 2.0 | 1+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.51 (19339) | 0.41 (81949) |
| chr2 | 93139351 | 219955359 | 2.0 | 1+1 | CENTROMERE-BND | BAF_WEIGHTED | 0.5 (19047) | 0.39 (112493) |
| chr2 | 219955360 | 225225069 | 1.0 | 0+1 | BND-BND | BAF_WEIGHTED | 0.98 (1284) | 0.4 (5069) |
| chr2 | 225225070 | 242193529 | 2.0 | 1+1 | BND-TELOMERE | BAF_WEIGHTED | 0.51 (4236) | 0.44 (15099) |
| chr3 | 1 | 92214015 | 1.0 | 0+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.98 (18629) | 0.4 (83984) |
| chr3 | 92214016 | 198295559 | 2.0 | 1+1 | CENTROMERE-TELOMERE | BAF_WEIGHTED | 0.5 (21114) | 0.39 (95873) |
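Two of the columns above pack a pair of values into one string: `CN Min+Maj` (minor and major allele copy number) and `BAF (count)` (BAF value and number of AMBER points). The following is a minimal base R sketch of how such strings could be split back into numeric columns for downstream filtering. The data frame `seg` and its column names are made up for illustration; the values are typed in by hand from the first three chr1 rows of the table and are not read from a PURPLE file.

```r
# Toy rows copied by hand from the first three chr1 segments shown above.
# `seg` and its column names are illustrative, not gpgr output.
seg <- data.frame(
  chr = c("chr1", "chr1", "chr1"),
  cn = c(1.0, 2.0, 2.8),
  cn_min_maj = c("0+1", "1+1", "1+1.8"),
  baf_count = c("0.98 (20830)", "0.5 (10239)", "0.65 (0)"),
  stringsAsFactors = FALSE
)

# Split "minor+major" on the literal plus sign.
parts <- strsplit(seg$cn_min_maj, "+", fixed = TRUE)
seg$minor_cn <- as.numeric(vapply(parts, `[`, character(1), 1))
seg$major_cn <- as.numeric(vapply(parts, `[`, character(1), 2))

# Split "BAF (count)" into the BAF value and the number of AMBER points.
seg$baf <- as.numeric(sub(" .*$", "", seg$baf_count))
seg$baf_n <- as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", seg$baf_count))

# In these rows, minor + major adds up to the fitted CN.
seg$cn_sum <- seg$minor_cn + seg$major_cn

# Segments covered by zero AMBER points have no direct BAF evidence.
seg$no_baf_support <- seg$baf_n == 0

seg[, c("chr", "cn", "minor_cn", "major_cn", "cn_sum", "baf", "baf_n", "no_baf_support")]
```

In this toy input the only segment without AMBER points is the short `DUP-DUP` segment, which is also the one whose method is `STRUCTURAL_VARIANT` rather than `BAF_WEIGHTED`.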
Somatic CNVs (per gene)
Description
cnv_som_gene$descr |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per gene) Columns.")
PURPLE Somatic CNVs (per gene) Columns.
| Column | Description |
|---|---|
| gene | Name of gene |
| minCN/maxCN | Min/Max copy number found in gene exons |
| chrom/start/end | Chromosome/start/end location of gene transcript |
| chrBand | Chromosome band of the gene |
| onco_or_ts | oncogene (‘oncogene’), tumor suppressor (‘tsgene’), or both (‘onco+ts’), as reported by CancerMine |
| transcriptID | Ensembl transcript ID (dot version) |
| minMinorAlleleCN | Minimum allele ploidy found over the gene exons - useful for identifying LOH events |
| somReg (somaticRegions) | Count of somatic copy number regions this gene spans |
| minReg (minRegions) | Number of somatic regions inside the gene that share the min copy number |
| minRegStartEnd | Start/End base of the copy number region overlapping the gene with the minimum copy number |
| minRegSupportStartEndMethod | Start/end support of the CN region overlapping the gene with the min CN (plus determination method) |
cnv_som_gene$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per gene) Summary Table.")
PURPLE Somatic CNVs (per gene) Summary Table.
| gene | minCN | maxCN | chrom | start | end | chrBand | onco_or_ts | transcriptID | minMinorAlleleCN | somReg | minReg | minRegStartEnd | minRegSupportStartEndMethod |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SDHA | 5.6640 | 5.6640 | chr5 | 218303 | 257082 | p15.33 | tsgene | ENST00000264932 | 1.8901 | 1 | 1 | 89179-297781 | DEL-DEL (BAF_WEIGHTED) |
| DUSP22 | 4.6638 | 4.6638 | chr6 | 291630 | 351355 | p25.3 | tsgene | ENST00000419235 | 1.6754 | 1 | 1 | 1-834611 | TELOMERE-DEL (BAF_WEIGHTED) |
| IRF4 | 4.6638 | 4.6638 | chr6 | 391752 | 411443 | p25.3 | oncogene | ENST00000380956 | 1.6754 | 1 | 1 | 1-834611 | TELOMERE-DEL (BAF_WEIGHTED) |
| KDM5A | 4.2953 | 4.2953 | chr12 | 280057 | 389320 | p13.33 | oncogene | ENST00000399788 | 0.0000 | 1 | 1 | 1-1210023 | TELOMERE-DEL (BAF_WEIGHTED) |
| CRLF2 | 3.7271 | 3.7271 | chrX | 1187549 | 1212723 | p22.33 | oncogene | ENST00000400841 | 0.0000 | 1 | 1 | 924013-2677577 | DUP-DUP (BAF_WEIGHTED) |
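The column description notes that `minMinorAlleleCN` is useful for identifying LOH events. As a rough illustration, the sketch below filters the five genes shown above for a minor allele copy number near zero and splits `minRegStartEnd` into numeric coordinates. The data frame `genes` is typed in by hand from the table, the `0.5` cutoff is an arbitrary assumption chosen for the example (it is not a PURPLE or gpgr default), and the width calculation assumes 1-based inclusive coordinates.

```r
# Rows copied by hand from the per-gene summary table above (subset of columns).
genes <- data.frame(
  gene = c("SDHA", "DUSP22", "IRF4", "KDM5A", "CRLF2"),
  minCN = c(5.6640, 4.6638, 4.6638, 4.2953, 3.7271),
  minMinorAlleleCN = c(1.8901, 1.6754, 1.6754, 0.0000, 0.0000),
  minRegStartEnd = c("89179-297781", "1-834611", "1-834611",
                     "1-1210023", "924013-2677577"),
  stringsAsFactors = FALSE
)

# Assumption: treat a minor allele CN below 0.5 as "lost", while requiring
# that the gene itself is still present (minCN above the same cutoff).
loh_cutoff <- 0.5
loh_candidates <- genes[genes$minMinorAlleleCN < loh_cutoff &
                          genes$minCN > loh_cutoff, ]

# Split "start-end" into integer coordinates and compute the region width,
# assuming 1-based inclusive coordinates.
se <- strsplit(genes$minRegStartEnd, "-", fixed = TRUE)
genes$minRegStart <- as.integer(vapply(se, `[`, character(1), 1))
genes$minRegEnd <- as.integer(vapply(se, `[`, character(1), 2))
genes$minRegWidth <- genes$minRegEnd - genes$minRegStart + 1L

loh_candidates$gene
```

With this toy input the filter keeps KDM5A and CRLF2, the two genes whose `minMinorAlleleCN` is 0. Whether such a hit is a genuine LOH event still needs to be judged in context (for example chromosome, sex and purity of the sample).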
Purity
purity <- system.file("extdata/purple/purple.purity.tsv", package = "gpgr") |>
  gpgr::purple_purity_read()
purity$summary |>
  knitr::kable(caption = "PURPLE Purity Summary Table.")
PURPLE Purity Summary Table.
| n | variable | value | details |
|---|---|---|---|
| 2 | Purity | 0.99 (0.98-1) | Purity of tumor in the sample (and min-max with score within 10% of best). |
| 3 | Ploidy | 2.86 (2.78-2.94) | Average ploidy of tumor sample after adjusting for purity (and min-max with score within 10% of best). |
| 4 | Gender | MALE | Gender as inferred by AMBER/COBALT. |
| 7 | WGD | TRUE | Whole genome duplication (more than 10 autosomes have average major allele ploidy > 1.5). |
| 8 | MSI (indels/Mb) | MSS (0.19) | MSI status (MSI, MSS or UNKNOWN if somatic variants not supplied) & MS Indels per Mb. |
| 9 | PolyclonalProp | 0.13 | Proportion of CN regions that are more than 0.25 from a whole CN |
| 10 | DiploidyProp | 0.03 (0.02-0.04) | Proportion of CN regions that have 1 (+- 0.2) minor and major allele. |
| 11 | TMB | 15.13 (HIGH) | Tumor mutational burden (# PASS variants per Megabase) (Status: ‘HIGH’ (>10 PASS per Mb), ‘LOW’ or ‘UNKNOWN’). |
| 12 | TML | 349 (HIGH) | Tumor mutational load (# of missense variants) (Status: ‘HIGH’, ‘LOW’ or ‘UNKNOWN’). |
| 13 | TMB-SV | 1267 | # of non inferred, non single passing SVs. |
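Several rows in the `details` column above are stated as simple threshold rules (WGD, TMB status, PolyclonalProp). The sketch below restates those three rules as small R functions so the arithmetic is explicit. It paraphrases the table text only and should not be read as PURPLE's actual implementation; the function names, the treatment of `NA` as `UNKNOWN`, the unweighted proportion, and the example vectors are all assumptions made for illustration.

```r
# TMB status as described in the table: HIGH if > 10 PASS variants per Mb.
# Assumption: a missing value maps to "UNKNOWN".
tmb_status <- function(tmb_per_mb, high_cutoff = 10) {
  if (is.na(tmb_per_mb)) {
    return("UNKNOWN")
  }
  if (tmb_per_mb > high_cutoff) "HIGH" else "LOW"
}

# WGD as described in the table: more than 10 autosomes with an average
# major allele ploidy > 1.5. Input is one average per autosome.
is_wgd <- function(avg_major_cn, cutoff = 1.5, min_autosomes = 10) {
  sum(avg_major_cn > cutoff) > min_autosomes
}

# PolyclonalProp as described in the table: share of CN regions whose copy
# number is more than 0.25 away from the nearest whole number.
# Assumption: every region counts equally (no weighting by region length).
polyclonal_prop <- function(cn, tol = 0.25) {
  mean(abs(cn - round(cn)) > tol)
}

# Hypothetical inputs, not taken from the sample above.
major_12_high <- c(rep(1.8, 12), rep(1.0, 10)) # 12 of 22 autosomes above 1.5
major_10_high <- c(rep(1.8, 10), rep(1.0, 12)) # exactly 10, so not "more than 10"

tmb_status(15.13)
is_wgd(major_12_high)
is_wgd(major_10_high)
polyclonal_prop(c(1.0, 2.0, 2.8, 2.4))
```

The TMB value of 15.13 from the table lands in the `HIGH` branch, matching the status shown there. The two WGD inputs differ only in whether the count of qualifying autosomes exceeds 10 or merely equals it.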
Kataegis
kat <- system.file("extdata/purple/purple.somatic.vcf.gz", package = "gpgr") |>
  gpgr::purple_kataegis()
kat$data |>
  dplyr::slice(1:10) |>
  knitr::kable(caption = "PURPLE Kataegis Table.")
PURPLE Kataegis Table.
| CHROM | POS | KT | PURPLE_AF | PURPLE_CN | PURPLE_MACN | PURPLE_VCN | SUBCL | MH | TNC |
|---|---|---|---|---|---|---|---|---|---|
| chr2 | 45006212 | REV_1 | 0.2601 | 1.52 | 0.000 | 0.396 | 1 | NA | GGA |
| chr2 | 45007103 | REV_1 | 0.2399 | 1.52 | 0.000 | 0.365 | 1 | NA | AGA |
| chr2 | 45007598 | REV_1 | 0.2913 | 1.52 | 0.000 | 0.444 | 1 | NA | TGA |
| chr2 | 45007709 | REV_1 | 0.2759 | 1.52 | 0.000 | 0.420 | 1 | NA | TGA |
| chr3 | 71010245 | REV_2 | 0.4054 | 1.16 | 0.461 | 0.469 | 1 | NA | AGA |
| chr3 | 71010553 | REV_2 | 0.2387 | 1.16 | 0.461 | 0.276 | 1 | NA | AGA |
| chr3 | 71011404 | REV_2 | 0.3640 | 1.16 | 0.461 | 0.421 | 1 | NA | GGA |
| chr3 | 71011410 | REV_2 | 0.3494 | 1.16 | 0.461 | 0.405 | 1 | NA | AGA |
| chr3 | 71011474 | REV_2 | 0.3302 | 1.16 | 0.461 | 0.382 | 1 | NA | AGA |
| chr3 | 71327944 | FWD_1 | 0.2494 | 1.26 | 0.438 | 0.314 | 1 | NA | TCC |
Description
knitr::kable(kat$description, caption = "Kataegis column description.")
Kataegis column description.
| ID | Description |
|---|---|
| KT | Forward/reverse kataegis id |
| MH | Microhomology |
| PURPLE_AF | Purity adjusted allelic frequency of variant |
| PURPLE_CN | Purity adjusted copy number surrounding variant location |
| PURPLE_MACN | Purity adjusted minor allele ploidy surrounding variant location |
| PURPLE_VCN | Purity adjusted ploidy of variant |
| SUBCL | Non-zero subclonal likelihood |
| TNC | Tri-nucleotide context |
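Because `KT` assigns each variant to a kataegis cluster, a per-cluster summary is often easier to read than the per-variant table. The base R sketch below counts variants per `KT` id and measures the genomic span of each cluster. The data frame `kat_toy` is typed in by hand from the ten rows shown in the Kataegis table and is not read from the VCF, so counts for clusters that continue past those ten rows (such as `FWD_1`) are truncated. Grouping by `KT` alone assumes that ids are not reused across chromosomes.

```r
# Ten rows copied by hand from the Kataegis table (CHROM, POS and KT only).
kat_toy <- data.frame(
  CHROM = c(rep("chr2", 4), rep("chr3", 6)),
  POS = c(45006212, 45007103, 45007598, 45007709,
          71010245, 71010553, 71011404, 71011410, 71011474, 71327944),
  KT = c(rep("REV_1", 4), rep("REV_2", 5), "FWD_1"),
  stringsAsFactors = FALSE
)

# Number of variants and genomic span (last POS minus first POS) per cluster.
# tapply() returns the groups in alphabetical order of the KT id.
n_var <- tapply(kat_toy$POS, kat_toy$KT, length)
span <- tapply(kat_toy$POS, kat_toy$KT, function(p) max(p) - min(p))

kat_summary <- data.frame(
  KT = names(n_var),
  n_variants = as.integer(n_var),
  span_bp = as.numeric(span),
  row.names = NULL,
  stringsAsFactors = FALSE
)
kat_summary
```

For the real data, the same idea could be applied to `kat$data` instead of the hand-typed rows, assuming it carries the `POS` and `KT` columns shown in the table.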
QC
qc <- system.file("extdata/purple/purple.qc", package = "gpgr") |>
  gpgr::purple_qc_read()
qc$summary |>
  knitr::kable(caption = "PURPLE QC Summary Table.")
PURPLE QC Summary Table.
| n | variable | value | details |
|---|---|---|---|
| 1 | QC_Status | FAIL_CONTAMINATION | See ‘Description’. |
| 13 | Method | NORMAL | Fit method (NORMAL, HIGHLY_DIPLOID, SOMATIC or NO_TUMOR). |
| 14 | CopyNumberSegments | 1428 (Unsupported: 2) | # of CN segments. |
| 2 | Purity | 0.8600 | |
| 17 | Gender | Amber: MALE; Cobalt: MALE | |
| 14 | DeletedGenes | 150 | # of homozygously deleted genes. |
| 15 | Contamination | 0.807 | Rate of contamination in tumor sample as determined by AMBER. |
| 16 | GermlineAberrations | NONE | Can be one or more of: KLINEFELTER, TRISOMY_X/21/13/18/15, XYY, MOSAIC_X. |
| 18 | AmberMeanDepth | 128 | Mean depth as determined by AMBER. |
Session Info
Main packages used in this vignette.
| package | version | datestamp | source |
|---|---|---|---|
| base | 4.2.3 | 2024-06-16 | local |
| gpgr | 2.1.1 | 2024-08-20 | local |
Platform information.
| name | value |
|---|---|
| version | R version 4.2.3 (2023-03-15) |
| os | Ubuntu 22.04.4 LTS |
| system | x86_64, linux-gnu |
| ui | X11 |
| language | en |
| collate | C.UTF-8 |
| ctype | C.UTF-8 |
| tz | Etc/UTC |
| date | 2024-08-20 |
| pandoc | 3.3 @ /home/runner/micromamba/envs/pkgdownenv/bin/ (via rmarkdown) |