Introduction
PURPLE: Purity Ploidy Estimator (https://github.com/hartwigmedical/hmftools/tree/master/purple).
PURPLE combines B-allele frequency, read depth ratios, small variants
and structural variants to estimate the purity and copy number profile
of a tumor sample.
It outputs several files, some of which are displayed below.
Data Munging
Somatic CNVs (per chromosome)
Description
cnv_som$descr |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per chromosome) Columns.")
PURPLE Somatic CNVs (per chromosome) Columns.
| Column | Description |
|---|---|
| Chr/Start/End | Coordinates of copy number segment |
| CN | Fitted absolute copy number of segment adjusted for purity and ploidy |
| CN Min+Maj | Copy number of minor + major allele adjusted for purity |
| Start/End SegSupport | Type of SV support for the CN breakpoint at start/end of region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint) |
| Method | Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (avg of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification) |
| BAF (count) | Tumor BAF after adjusting for purity and ploidy (count of AMBER BAF points covered by this segment) |
| GC (windowCount) | Proportion of segment that is G or C (count of COBALT windows covered by this segment) |
cnv_som$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per chromosome) Summary Table.")
PURPLE Somatic CNVs (per chromosome) Summary Table.
| Chr | Start | End | CN | CN Min+Maj | Start/End SegSupport | Method | BAF (count) | GC (windowCount) |
|---|---|---|---|---|---|---|---|---|
| chr1 | 1 | 123605522 | 1.0 | 0+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.98 (20830) | 0.42 (107822) |
| chr1 | 123605523 | 200044314 | 2.0 | 1+1 | CENTROMERE-DUP | BAF_WEIGHTED | 0.5 (10239) | 0.4 (47332) |
| chr1 | 200044315 | 200044570 | 2.8 | 1+1.8 | DUP-DUP | STRUCTURAL_VARIANT | 0.65 (0) | 0 (0) |
| chr1 | 200044571 | 248956422 | 2.0 | 1+1 | DUP-TELOMERE | BAF_WEIGHTED | 0.5 (10341) | 0.42 (43373) |
| chr2 | 1 | 93139350 | 2.0 | 1+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.51 (19339) | 0.41 (81949) |
| chr2 | 93139351 | 219955359 | 2.0 | 1+1 | CENTROMERE-BND | BAF_WEIGHTED | 0.5 (19047) | 0.39 (112493) |
| chr2 | 219955360 | 225225069 | 1.0 | 0+1 | BND-BND | BAF_WEIGHTED | 0.98 (1284) | 0.4 (5069) |
| chr2 | 225225070 | 242193529 | 2.0 | 1+1 | BND-TELOMERE | BAF_WEIGHTED | 0.51 (4236) | 0.44 (15099) |
| chr3 | 1 | 92214015 | 1.0 | 0+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.98 (18629) | 0.4 (83984) |
| chr3 | 92214016 | 198295559 | 2.0 | 1+1 | CENTROMERE-TELOMERE | BAF_WEIGHTED | 0.5 (21114) | 0.39 (95873) |
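The `CN Min+Maj` and `Start/End SegSupport` columns each pack two values into one string, which is convenient for display but awkward for filtering. The following is a minimal base R sketch of splitting them back apart, assuming the column layout shown in the summary table above. The data frame `seg` and its column names are hypothetical and are filled with values copied from the first four rows of that table; none of this is part of the gpgr API.

```r
# Hypothetical data frame; values copied from the first four rows of the
# per-chromosome summary table.
seg <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr1"),
  start = c(1, 123605523, 200044315, 200044571),
  end = c(123605522, 200044314, 200044570, 248956422),
  cn_min_maj = c("0+1", "1+1", "1+1.8", "1+1"),
  seg_support = c("TELOMERE-CENTROMERE", "CENTROMERE-DUP", "DUP-DUP", "DUP-TELOMERE"),
  stringsAsFactors = FALSE
)

# Split "minor+major" into two numeric columns.
alleles <- do.call(rbind, strsplit(seg$cn_min_maj, "+", fixed = TRUE))
seg$minor_cn <- as.numeric(alleles[, 1])
seg$major_cn <- as.numeric(alleles[, 2])

# Split "start-end" SV support into one column per breakpoint.
support <- do.call(rbind, strsplit(seg$seg_support, "-", fixed = TRUE))
seg$support_start <- support[, 1]
seg$support_end <- support[, 2]

# Total copy number is the sum of the two allele copy numbers.
seg$total_cn <- seg$minor_cn + seg$major_cn

# Segments bounded by a duplication on both sides.
dup_bounded <- seg[seg$support_start == "DUP" & seg$support_end == "DUP", ]
```

With the columns separated, `seg$total_cn` can be compared against the `CN` column, and breakpoint types can be filtered directly rather than by pattern matching on the combined string.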
Somatic CNVs (per gene)
Description
cnv_som_gene$descr |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per gene) Columns.")
PURPLE Somatic CNVs (per gene) Columns.
| Column | Description |
|---|---|
| gene | Name of gene |
| minCN/maxCN | Min/Max copy number found in gene exons |
| chrom/start/end | Chromosome/start/end location of gene transcript |
| chrBand | Chromosome band of the gene |
| onco_or_ts | oncogene (‘oncogene’), tumor suppressor (‘tsgene’), or both (‘onco+ts’), as reported by CancerMine |
| transcriptID | Ensembl transcript ID (dot version) |
| minMinorAlleleCN | Minimum allele ploidy found over the gene exons - useful for identifying LOH events |
| somReg (somaticRegions) | Count of somatic copy number regions this gene spans |
| minReg (minRegions) | Number of somatic regions inside the gene that share the min copy number |
| minRegStartEnd | Start/End base of the copy number region overlapping the gene with the minimum copy number |
| minRegSupportStartEndMethod | Start/end support of the CN region overlapping the gene with the min CN (plus determination method) |
cnv_som_gene$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(caption = "PURPLE Somatic CNVs (per gene) Summary Table.")
PURPLE Somatic CNVs (per gene) Summary Table.
| gene | minCN | maxCN | chrom | start | end | chrBand | onco_or_ts | transcriptID | minMinorAlleleCN | somReg | minReg | minRegStartEnd | minRegSupportStartEndMethod |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SDHA | 5.6640 | 5.6640 | chr5 | 218303 | 257082 | p15.33 | tsgene | ENST00000264932 | 1.8901 | 1 | 1 | 89179-297781 | DEL-DEL (BAF_WEIGHTED) |
| DUSP22 | 4.6638 | 4.6638 | chr6 | 291630 | 351355 | p25.3 | tsgene | ENST00000419235 | 1.6754 | 1 | 1 | 1-834611 | TELOMERE-DEL (BAF_WEIGHTED) |
| IRF4 | 4.6638 | 4.6638 | chr6 | 391752 | 411443 | p25.3 | oncogene | ENST00000380956 | 1.6754 | 1 | 1 | 1-834611 | TELOMERE-DEL (BAF_WEIGHTED) |
| KDM5A | 4.2953 | 4.2953 | chr12 | 280057 | 389320 | p13.33 | oncogene | ENST00000399788 | 0.0000 | 1 | 1 | 1-1210023 | TELOMERE-DEL (BAF_WEIGHTED) |
| CRLF2 | 3.7271 | 3.7271 | chrX | 1187549 | 1212723 | p22.33 | oncogene | ENST00000400841 | 0.0000 | 1 | 1 | 924013-2677577 | DUP-DUP (BAF_WEIGHTED) |
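The column description notes that `minMinorAlleleCN` is useful for identifying LOH events. As a rough illustration, the base R sketch below flags genes whose minor allele copy number is close to zero while the gene itself is still present. The data frame `genes` is hypothetical and holds values copied from the summary table above; the cutoff of 0.5 is an assumption chosen for the example, not a threshold defined by PURPLE or gpgr.

```r
# Hypothetical data frame; values copied from the per-gene summary table.
genes <- data.frame(
  gene = c("SDHA", "DUSP22", "IRF4", "KDM5A", "CRLF2"),
  chrom = c("chr5", "chr6", "chr6", "chr12", "chrX"),
  minCN = c(5.6640, 4.6638, 4.6638, 4.2953, 3.7271),
  minMinorAlleleCN = c(1.8901, 1.6754, 1.6754, 0.0000, 0.0000),
  stringsAsFactors = FALSE
)

# Assumed cutoff for illustration only.
cn_cutoff <- 0.5

# LOH candidate: the minor allele is (nearly) lost while total CN stays above
# the cutoff, so the gene is not homozygously deleted.
genes$loh_candidate <- genes$minMinorAlleleCN < cn_cutoff & genes$minCN >= cn_cutoff

loh_genes <- genes$gene[genes$loh_candidate]
```

Flagged genes on sex chromosomes usually need to be interpreted separately, because the expected germline copy number there depends on the sex of the sample.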
Purity
purity <- system.file("extdata/purple/purple.purity.tsv", package = "gpgr") |>
  gpgr::purple_purity_read()
purity$summary |>
  knitr::kable(caption = "PURPLE Purity Summary Table.")
PURPLE Purity Summary Table.
| n | variable | value | details |
|---|---|---|---|
| 2 | Purity | 0.99 (0.98-1) | Purity of tumor in the sample (and min-max with score within 10% of best). |
| 3 | Ploidy | 2.86 (2.78-2.94) | Average ploidy of tumor sample after adjusting for purity (and min-max with score within 10% of best). |
| 4 | Gender | MALE | Gender as inferred by AMBER/COBALT. |
| 7 | WGD | TRUE | Whole genome duplication (more than 10 autosomes have average major allele ploidy > 1.5). |
| 8 | PolyclonalProp | 0.13 | Proportion of CN regions that are more than 0.25 from a whole CN |
| 9 | DiploidyProp | 0.03 (0.02-0.04) | Proportion of CN regions that have 1 (+- 0.2) minor and major allele. |
| 10 | TMB | 15.13 (HIGH) | Tumor mutational burden (# PASS variants per Megabase) (Status: ‘HIGH’ (>10 PASS per Mb), ‘LOW’ or ‘UNKNOWN’). |
| 11 | TML | 349 (HIGH) | Tumor mutational load (# of missense variants) (Status: ‘HIGH’, ‘LOW’ or ‘UNKNOWN’). |
| 12 | TMB-SV | 1267 | # of non inferred, non single passing SVs. |
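Several rows in the details column state simple threshold rules. The base R sketch below restates three of them as functions, purely to make the definitions concrete. The function names and input vectors are hypothetical, the proportion is computed as an unweighted count of regions, and PURPLE's own implementation may differ in detail (for example in how regions are weighted).

```r
# WGD rule from the table: more than 10 autosomes have an average
# major allele ploidy above 1.5.
is_wgd <- function(avg_major_allele_ploidy) {
  sum(avg_major_allele_ploidy > 1.5) > 10
}

# PolyclonalProp rule from the table: proportion of CN regions that are more
# than 0.25 away from a whole copy number (unweighted in this sketch).
polyclonal_prop <- function(cn) {
  mean(abs(cn - round(cn)) > 0.25)
}

# TMB status rule from the table: HIGH if more than 10 PASS variants per Mb.
tmb_status <- function(tmb_per_mb) {
  if (is.na(tmb_per_mb)) {
    return("UNKNOWN")
  }
  if (tmb_per_mb > 10) "HIGH" else "LOW"
}

# Made-up inputs for illustration: 22 autosomes, 11 of them above 1.5.
example_major <- c(rep(2, 11), rep(1, 11))
example_cn <- c(1.0, 2.0, 2.8, 2.4)

wgd <- is_wgd(example_major)
poly <- polyclonal_prop(example_cn)
status <- tmb_status(15.13)
```

In `example_cn` only the value 2.4 lies more than 0.25 from its nearest whole copy number, so one of the four regions counts as polyclonal.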
QC
qc <- system.file("extdata/purple/purple.qc", package = "gpgr") |>
  gpgr::purple_qc_read()
qc$summary |>
  knitr::kable(caption = "PURPLE QC Summary Table.")
PURPLE QC Summary Table.
| n | variable | value | details |
|---|---|---|---|
| 1 | QC_Status | FAIL_CONTAMINATION | See ‘Description’. |
| 13 | Method | NORMAL | Fit method (NORMAL, HIGHLY_DIPLOID, SOMATIC or NO_TUMOR). |
| 14 | CopyNumberSegments | 1428 (Unsupported: 2) | # of CN segments. |
| 2 | Purity | 0.8600 | |
| 17 | Gender | Amber: MALE; Cobalt: MALE | |
| 14 | DeletedGenes | 2 | # of homozygously deleted genes. |
| 15 | Contamination | 0.05 | Rate of contamination in tumor sample as determined by AMBER. |
| 16 | GermlineAberrations | NONE | Can be one or more of: KLINEFELTER, TRISOMY_X/21/13/18/15, XYY, MOSAIC_X. |
| 18 | AmberMeanDepth | 30 | Mean depth as determined by AMBER. |
Session Info
Main packages used in this vignette.
| package | version | datestamp | source |
|---|---|---|---|
| base | 4.2.3 | 2024-06-16 | local |
| gpgr | 2.2.11 | 2025-10-01 | local |
Platform information.
| name | value |
|---|---|
| version | R version 4.2.3 (2023-03-15) |
| os | Ubuntu 24.04.3 LTS |
| system | x86_64, linux-gnu |
| ui | X11 |
| language | en |
| collate | C.UTF-8 |
| ctype | C.UTF-8 |
| tz | Etc/UTC |
| date | 2025-10-01 |
| pandoc | 3.8.1 @ /home/runner/micromamba/envs/pkgdownenv/bin/ (via rmarkdown) |