Introduction
PURPLE (Purity Ploidy Estimator, https://github.com/hartwigmedical/hmftools/tree/master/purple)
combines B-allele frequency (BAF), read depth ratios, small variants
and structural variants to estimate the purity and copy number profile
of a tumor sample.
It outputs several files; the sections below show summaries of some of them.
Data Munging
Somatic CNVs (per chromosome)
Description
cnv_som$descr |>
  knitr::kable(format = "html", caption = "PURPLE Somatic CNVs (per chromosome) Columns.")
PURPLE Somatic CNVs (per chromosome) Columns.
Column | Description
Chr/Start/End | Coordinates of copy number segment
CN | Fitted absolute copy number of segment adjusted for purity and ploidy
CN Min+Maj | Copy number of minor + major allele adjusted for purity
Start/End SegSupport | Type of SV support for the CN breakpoint at start/end of region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint)
Method | Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (avg of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification)
BAF (count) | Tumor BAF after adjusting for purity and ploidy (count of AMBER BAF points covered by this segment)
GC (windowCount) | Proportion of segment that is G or C (count of COBALT windows covered by this segment)
cnv_som$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(format = "html", caption = "PURPLE Somatic CNVs (per chromosome) Summary Table.")
PURPLE Somatic CNVs (per chromosome) Summary Table.
Chr | Start | End | CN | CN Min+Maj | Start/End SegSupport | Method | BAF (count) | GC (windowCount)
chr1 | 1 | 123605522 | 1.0 | 0+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.98 (20830) | 0.42 (107822)
chr1 | 123605523 | 200044314 | 2.0 | 1+1 | CENTROMERE-DUP | BAF_WEIGHTED | 0.5 (10239) | 0.4 (47332)
chr1 | 200044315 | 200044570 | 2.8 | 1+1.8 | DUP-DUP | STRUCTURAL_VARIANT | 0.65 (0) | 0 (0)
chr1 | 200044571 | 248956422 | 2.0 | 1+1 | DUP-TELOMERE | BAF_WEIGHTED | 0.5 (10341) | 0.42 (43373)
chr2 | 1 | 93139350 | 2.0 | 1+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.51 (19339) | 0.41 (81949)
chr2 | 93139351 | 219955359 | 2.0 | 1+1 | CENTROMERE-BND | BAF_WEIGHTED | 0.5 (19047) | 0.39 (112493)
chr2 | 219955360 | 225225069 | 1.0 | 0+1 | BND-BND | BAF_WEIGHTED | 0.98 (1284) | 0.4 (5069)
chr2 | 225225070 | 242193529 | 2.0 | 1+1 | BND-TELOMERE | BAF_WEIGHTED | 0.51 (4236) | 0.44 (15099)
chr3 | 1 | 92214015 | 1.0 | 0+1 | TELOMERE-CENTROMERE | BAF_WEIGHTED | 0.98 (18629) | 0.4 (83984)
chr3 | 92214016 | 198295559 | 2.0 | 1+1 | CENTROMERE-TELOMERE | BAF_WEIGHTED | 0.5 (21114) | 0.39 (95873)
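Several columns of the summary table pack two values into one string ("CN Min+Maj", "BAF (count)", "Start/End SegSupport"). The following is a minimal base-R sketch of how they could be split back into separate columns, assuming the string formats are exactly as displayed above. The data frame `seg` and its column names are hypothetical and typed in by hand from the table; they are not produced by gpgr.

```r
# Toy rows copied from the summary table above (hypothetical object,
# not read from a PURPLE file).
seg <- data.frame(
  Chr = c("chr1", "chr1", "chr1"),
  cn_min_maj = c("0+1", "1+1", "1+1.8"),
  baf_count = c("0.98 (20830)", "0.5 (10239)", "0.65 (0)"),
  seg_support = c("TELOMERE-CENTROMERE", "CENTROMERE-DUP", "DUP-DUP"),
  stringsAsFactors = FALSE
)

# "minor+major" -> two numeric columns.
min_maj <- do.call(rbind, strsplit(seg$cn_min_maj, "+", fixed = TRUE))
seg$cn_minor <- as.numeric(min_maj[, 1])
seg$cn_major <- as.numeric(min_maj[, 2])

# "value (count)" -> numeric value and integer count.
seg$baf <- as.numeric(sub(" \\(.*$", "", seg$baf_count))
seg$baf_n <- as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", seg$baf_count))

# "START-END" support -> one column per breakpoint
# (none of the allowed support values contains a hyphen).
support <- do.call(rbind, strsplit(seg$seg_support, "-", fixed = TRUE))
seg$support_start <- support[, 1]
seg$support_end <- support[, 2]

print(seg[, c("Chr", "cn_minor", "cn_major", "baf", "baf_n", "support_start", "support_end")])
```

Splitting the strings this way makes it possible to filter on, for example, segments with zero AMBER BAF points, which in the rows shown coincide with the STRUCTURAL_VARIANT method.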
Somatic CNVs (per gene)
Description
cnv_som_gene$descr |>
  knitr::kable(format = "html", caption = "PURPLE Somatic CNVs (per gene) Columns.")
PURPLE Somatic CNVs (per gene) Columns.
Column | Description
gene | Name of gene
minCN/maxCN | Min/Max copy number found in gene exons
chrom/start/end | Chromosome/start/end location of gene transcript
chrBand | Chromosome band of the gene
onco_or_ts | Oncogene (‘oncogene’), tumor suppressor (‘tsgene’), or both (‘onco+ts’), as reported by CancerMine
transcriptID | Ensembl transcript ID (dot version)
minMinorAlleleCN | Minimum allele ploidy found over the gene exons; useful for identifying LOH events
somReg (somaticRegions) | Count of somatic copy number regions this gene spans
germDelReg (germlineHomDeletionRegions / germlineHetToHomDeletionRegions) | Number of regions spanned by this gene that are (homozygously deleted in the germline / both heterozygously deleted in the germline and homozygously deleted in the tumor)
minReg (minRegions) | Number of somatic regions inside the gene that share the min copy number
minRegStartEnd | Start/End base of the copy number region overlapping the gene with the minimum copy number
minRegSupportStartEndMethod | Start/end support of the CN region overlapping the gene with the min CN (plus determination method)
cnv_som_gene$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(format = "html", caption = "PURPLE Somatic CNVs (per gene) Summary Table.")
PURPLE Somatic CNVs (per gene) Summary Table.
gene | minCN | maxCN | chrom | start | end | chrBand | onco_or_ts | transcriptID | minMinorAlleleCN | somReg | germDelReg | minReg | minRegStartEnd | minRegSupportStartEndMethod
CRBN | 1.0268 | 1.0268 | chr3 | 3150011 | 3179710 | p26.2 |  | ENST00000231948.8 | 0.0176 | 1 | 0/0 | 1 | 1-92214015 | TELOMERE-CENTROMERE (BAF_WEIGHTED)
SDHA | 2.0123 | 2.0123 | chr5 | 218241 | 256700 | p15.33 | tsgene | ENST00000264932.10 | 0.9939 | 1 | 0/0 | 1 | 1-48272853 | TELOMERE-CENTROMERE (BAF_WEIGHTED)
DUSP22 | 2.0037 | 2.0037 | chr6 | 292462 | 351353 | p25.3 |  | ENST00000419235.6 | 0.9868 | 1 | 0/0 | 1 | 1-59191910 | TELOMERE-CENTROMERE (BAF_WEIGHTED)
IRF4 | 2.0037 | 2.0037 | chr6 | 391739 | 411447 | p25.3 | oncogene | ENST00000380956.8 | 0.9868 | 1 | 0/0 | 1 | 1-59191910 | TELOMERE-CENTROMERE (BAF_WEIGHTED)
FOXQ1 | 2.0037 | 2.0037 | chr6 | 1312440 | 1314748 | p25.3 |  | ENST00000296839.4 | 0.9868 | 1 | 0/0 | 1 | 1-59191910 | TELOMERE-CENTROMERE (BAF_WEIGHTED)
DOCK8 | 1.9900 | 1.9900 | chr9 | 214865 | 465259 | p24.3 |  | ENST00000432829.6 | 0.9886 | 1 | 0/0 | 1 | 1-44377362 | TELOMERE-CENTROMERE (BAF_WEIGHTED)
LARP4B | 1.9899 | 1.9899 | chr10 | 806914 | 931705 | p15.3 | tsgene | ENST00000612396.4 | 0.9825 | 1 | 0/0 | 1 | 1-40640101 | TELOMERE-CENTROMERE (BAF_WEIGHTED)
SIRT3 | 2.0012 | 2.0012 | chr11 | 215458 | 236431 | p15.5 | tsgene | ENST00000382743.8 | 0.9888 | 1 | 0/0 | 1 | 1-52751710 | TELOMERE-CENTROMERE (BAF_WEIGHTED)
KDM5A | 1.9962 | 1.9962 | chr12 | 280129 | 389454 | p13.33 | oncogene | ENST00000399788.6 | 0.9879 | 1 | 0/0 | 1 | 1-35977329 | TELOMERE-CENTROMERE (BAF_WEIGHTED)
ZMYM2 | 2.0017 | 2.0017 | chr13 | 19958670 | 20091829 | q12.11 |  | ENST00000610343.4 | 0.9944 | 1 | 0/0 | 1 | 17025624-31577000 | CENTROMERE-NONE (BAF_WEIGHTED)
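The column description notes that minMinorAlleleCN is useful for identifying LOH events. As a minimal sketch of that idea, the snippet below flags genes whose minimum minor allele copy number falls below a cutoff. The data frame `genes` is typed in by hand from the table above, and the cutoff of 0.5 is an illustrative assumption, not a PURPLE or gpgr default.

```r
# Toy rows copied from the per-gene table above (hypothetical object).
genes <- data.frame(
  gene = c("CRBN", "SDHA", "IRF4", "KDM5A"),
  minCN = c(1.0268, 2.0123, 2.0037, 1.9962),
  minMinorAlleleCN = c(0.0176, 0.9939, 0.9868, 0.9879),
  stringsAsFactors = FALSE
)

# Assumed cutoff: a minor allele copy number below 0.5 is treated as
# "minor allele lost" for the purpose of this example.
loh_cutoff <- 0.5

genes$loh_candidate <- genes$minMinorAlleleCN < loh_cutoff
loh_genes <- genes$gene[genes$loh_candidate]
print(loh_genes)
```

With these rows only CRBN is flagged, which is consistent with its total copy number of about 1 on the chr3 segment shown as 0+1 in the per-chromosome table.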
Germline CNVs (per chromosome)
Description
cnv_germ$descr |>
  knitr::kable(format = "html", caption = "PURPLE Germline CNVs (per chromosome) Columns.")
PURPLE Germline CNVs (per chromosome) Columns.
Column | Description
Chr/Start/End | Coordinates of copy number segment
CN | Fitted absolute copy number of segment adjusted for purity and ploidy
CN Min+Maj | Copy number of minor + major allele adjusted for purity
Start/End SegSupport | Type of SV support for the CN breakpoint at start/end of region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint)
Method | Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (avg of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification)
BAF (count) | Tumor BAF after adjusting for purity and ploidy (count of AMBER BAF points covered by this segment)
GC (windowCount) | Proportion of segment that is G or C (count of COBALT windows covered by this segment)
cnv_germ$tab |>
  dplyr::slice(1:10) |>
  knitr::kable(format = "html", caption = "PURPLE Germline CNVs (per chromosome) Summary Table.")
PURPLE Germline CNVs (per chromosome) Summary Table.
Chr | Start | End | CN | CN Min+Maj | Start/End SegSupport | Method | BAF (count) | GC (windowCount)
chr1 | 7510001 | 7511000 | 0.0 | 0+0 | NONE-UNKNOWN | GERMLINE_HET2HOM_DELETION | 0.98 (0) | 0.52 (1)
chr1 | 14110001 | 14113000 | 0.4 | 0+0.4 | NONE-UNKNOWN | GERMLINE_HET2HOM_DELETION | 0.98 (0) | 0.47 (3)
chr1 | 15825001 | 15829000 | 0.4 | 0+0.3 | NONE-UNKNOWN | GERMLINE_HET2HOM_DELETION | 0.98 (0) | 0.49 (2)
chr1 | 58278001 | 58279000 | 0.3 | 0+0.3 | NONE-UNKNOWN | GERMLINE_HET2HOM_DELETION | 0.98 (0) | 0.44 (1)
chr1 | 61617001 | 61618000 | 0.1 | 0+0.1 | NONE-UNKNOWN | GERMLINE_HET2HOM_DELETION | 0.98 (0) | 0.39 (1)
chr1 | 79756001 | 79757000 | 0.0 | 0+0 | NONE-UNKNOWN | GERMLINE_HOM_DELETION | 0.98 (0) | 0.35 (1)
chr1 | 85935001 | 85939000 | 0.2 | 0+0.2 | NONE-UNKNOWN | GERMLINE_HET2HOM_DELETION | 0.98 (0) | 0.41 (4)
chr1 | 89010001 | 89013000 | 0.1 | 0+0.1 | NONE-UNKNOWN | GERMLINE_HET2HOM_DELETION | 0.98 (0) | 0.42 (3)
chr1 | 105473001 | 105481000 | 0.1 | 0+0.1 | NONE-UNKNOWN | GERMLINE_HET2HOM_DELETION | 0.98 (0) | 0.37 (8)
chr1 | 110834001 | 110845000 | 0.0 | 0+0 | NONE-UNKNOWN | GERMLINE_HET2HOM_DELETION | 0.98 (0) | 0.33 (9)
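A quick way to get a feel for the germline segments is to compute their widths and tally the Method values. The sketch below does this in base R on three rows typed in by hand from the table above. It assumes Start and End are 1-based inclusive coordinates, which is inferred from the contiguous somatic segments shown earlier (one segment ends at 123605522 and the next starts at 123605523) rather than from PURPLE documentation.

```r
# Toy rows copied from the germline table above (hypothetical object).
germ <- data.frame(
  Chr = "chr1",
  Start = c(7510001, 14110001, 79756001),
  End = c(7511000, 14113000, 79757000),
  Method = c(
    "GERMLINE_HET2HOM_DELETION",
    "GERMLINE_HET2HOM_DELETION",
    "GERMLINE_HOM_DELETION"
  ),
  stringsAsFactors = FALSE
)

# Assumption: coordinates are 1-based and inclusive, so width = End - Start + 1.
germ$width_bp <- germ$End - germ$Start + 1

# Number of segments per determination method.
method_counts <- table(germ$Method)

print(germ[, c("Start", "End", "width_bp", "Method")])
print(method_counts)
```

Under that assumption the displayed segments are a few kilobases wide, in line with the small GC window counts in the last column.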
Purity
purity <- system.file("extdata/purple/purple.purity.tsv", package = "gpgr") |>
  gpgr::purple_purity_read()
purity$summary |>
  knitr::kable(format = "html", caption = "PURPLE Purity Summary Table.")
PURPLE Purity Summary Table.
n | variable | value | details
2 | Purity | 0.75 (0.71-0.78) | Purity of tumor in the sample (and min-max with score within 10% of best).
3 | Ploidy | 1.87 (1.86-1.88) | Average ploidy of tumor sample after adjusting for purity (and min-max with score within 10% of best).
4 | Gender | MALE | Gender as inferred by AMBER/COBALT.
7 | WGD | FALSE | Whole genome duplication (more than 10 autosomes have average major allele ploidy > 1.5).
8 | MSI (indels/Mb) | MSS (0) | MSI status (MSI, MSS or UNKNOWN if somatic variants not supplied) & MS Indels per Mb.
9 | PolyclonalProp | 0 | Proportion of CN regions that are more than 0.25 from a whole CN
10 | DiploidyProp | 0.86 (0.86-0.86) | Proportion of CN regions that have 1 (+- 0.2) minor and major allele.
11 | TMB | 0 (LOW) | Tumor mutational burden (# PASS variants per Megabase) (Status: ‘HIGH’ (>10 PASS per Mb), ‘LOW’ or ‘UNKNOWN’).
12 | TML | 0 (LOW) | Tumor mutational load (# of missense variants) (Status: ‘HIGH’, ‘LOW’ or ‘UNKNOWN’).
13 | TMB-SV | 713 | # of non inferred, non single passing SVs.
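Two of the rules in the details column are simple enough to write down as code: the WGD flag (more than 10 autosomes with average major allele ploidy > 1.5) and the TMB status (HIGH above 10 PASS variants per Mb). The sketch below restates them in base R and also parses a "value (min-max)" string such as the Purity entry. The function names `parse_range`, `is_wgd` and `tmb_status` are hypothetical helpers written for this example; they follow the wording of the table and are not PURPLE's or gpgr's implementation.

```r
# "value (min-max)" -> named numeric vector (hypothetical helper).
parse_range <- function(x) {
  nums <- as.numeric(regmatches(x, gregexpr("[0-9.]+", x))[[1]])
  stats::setNames(nums, c("value", "min", "max"))
}

# WGD rule as worded in the table: more than 10 autosomes have an
# average major allele ploidy above 1.5.
is_wgd <- function(avg_major_ploidy) {
  sum(avg_major_ploidy > 1.5) > 10
}

# TMB status as worded in the table: HIGH above 10 PASS variants per Mb,
# UNKNOWN when the value is missing, LOW otherwise.
tmb_status <- function(tmb_per_mb) {
  if (is.na(tmb_per_mb)) {
    "UNKNOWN"
  } else if (tmb_per_mb > 10) {
    "HIGH"
  } else {
    "LOW"
  }
}

purity_est <- parse_range("0.75 (0.71-0.78)")

# Made-up per-autosome averages: 22 values, 3 of them above 1.5.
avg_major <- c(rep(1.0, 19), 1.6, 1.8, 2.0)

print(purity_est)
print(is_wgd(avg_major))
print(tmb_status(0))
```

With the made-up autosome averages only 3 chromosomes exceed 1.5, so the WGD flag is FALSE, and a TMB of 0 maps to LOW, matching the values in the summary table.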
Kataegis
kat <- system.file("extdata/purple/purple.somatic.vcf.gz", package = "gpgr") |>
  gpgr::purple_kataegis()
kat$data |>
  knitr::kable(format = "html", caption = "PURPLE Kataegis Table.")
PURPLE Kataegis Table.
CHROM | POS | KT | AF | PURPLE_AF | PURPLE_CN | PURPLE_MACN | PURPLE_VCN | SUBCL | MH | TNC
chr2 | 45006212 | REV_1 | 0.2143 | 0.2601 | 1.52 | 0.000 | 0.396 | 1 | NA | GGA
chr2 | 45007103 | REV_1 | 0.1977 | 0.2399 | 1.52 | 0.000 | 0.365 | 1 | NA | AGA
chr2 | 45007598 | REV_1 | 0.24 | 0.2913 | 1.52 | 0.000 | 0.444 | 1 | NA | TGA
chr2 | 45007709 | REV_1 | 0.2273 | 0.2759 | 1.52 | 0.000 | 0.420 | 1 | NA | TGA
chr3 | 71010245 | REV_2 | 0.3165 | 0.4054 | 1.16 | 0.461 | 0.469 | 1 | NA | AGA
chr3 | 71010553 | REV_2 | 0.1863 | 0.2387 | 1.16 | 0.461 | 0.276 | 1 | NA | AGA
chr3 | 71011404 | REV_2 | 0.2841 | 0.3640 | 1.16 | 0.461 | 0.421 | 1 | NA | GGA
chr3 | 71011410 | REV_2 | 0.2727 | 0.3494 | 1.16 | 0.461 | 0.405 | 1 | NA | AGA
chr3 | 71011474 | REV_2 | 0.2577 | 0.3302 | 1.16 | 0.461 | 0.382 | 1 | NA | AGA
chr3 | 71327944 | FWD_1 | 0.1982 | 0.2494 | 1.26 | 0.438 | 0.314 | 1 | NA | TCC
chr3 | 71327987 | FWD_1 | 0.1947 | 0.2450 | 1.26 | 0.438 | 0.309 | 1 | NA | TCA
chr3 | 71327997 | FWD_1 | 0.2 | 0.2517 | 1.26 | 0.438 | 0.317 | 1 | NA | TCA
chr3 | 71328028 | FWD_1 | 0.15 | 0.1888 | 1.26 | 0.438 | 0.238 | 1 | NA | TCA
chr3 | 71328321 | FWD_1 | 0.1951 | 0.2455 | 1.26 | 0.438 | 0.309 | 1 | NA | TCA
chr3 | 71328545 | FWD_1 | 0.2188 | 0.2862 | 1.06 | 0.000 | 0.302 | 1 | NA | TCC
chr3 | 71329130 | FWD_1 | 0.3295 | 0.4311 | 1.06 | 0.000 | 0.455 | 1 | NA | TCA
chr3 | 71329696 | FWD_1 | 0.3671 | 0.4803 | 1.06 | 0.000 | 0.507 | 1 | NA | TCA
chr3 | 78940888 | REV_3 | 0.2525 | 0.3301 | 1.06 | 0.392 | 0.350 | 1 | NA | AGA
chr3 | 78941023 | REV_3 | 0.2589 | 0.3385 | 1.06 | 0.392 | 0.359 | 1 | NA | AGA
chr3 | 78941779 | REV_3 | 0.3368 | 0.4403 | 1.06 | 0.392 | 0.467 | 1 | NA | TGA
chr3 | 78941859 | REV_3 | 0.3488 | 0.4560 | 1.06 | 0.392 | 0.483 | 1 | NA | GGA
chr3 | 78942061 | REV_3 | 0.2989 | 0.3907 | 1.06 | 0.392 | 0.414 | 1 | NA | GGA
chr3 | 78942101 | REV_3 | 0.3077 | 0.4022 | 1.06 | 0.392 | 0.426 | 1 | NA | AGA
chr3 | 78942282 | REV_3 | 0.3222 | 0.4212 | 1.06 | 0.392 | 0.446 | 1 | NA | AGA
chr3 | 78942348 | REV_3 | 0.2892 | 0.3780 | 1.06 | 0.392 | 0.401 | 1 | NA | AGA
chr3 | 78942562 | REV_3 | 0.2115 | 0.2765 | 1.06 | 0.392 | 0.293 | 1 | NA | AGA
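Because each variant carries a kataegis id in the KT column, the table can be collapsed to one row per cluster. The base-R sketch below counts variants and computes the genomic span for each cluster, using six rows typed in by hand from the table above. Grouping on chromosome plus KT (rather than KT alone) is a cautious assumption, since the table does not say whether ids are unique across chromosomes.

```r
# Toy rows copied from the kataegis table above (hypothetical object).
kat_rows <- data.frame(
  CHROM = c("chr2", "chr2", "chr2", "chr2", "chr3", "chr3"),
  POS = c(45006212, 45007103, 45007598, 45007709, 71010245, 71010553),
  KT = c("REV_1", "REV_1", "REV_1", "REV_1", "REV_2", "REV_2"),
  stringsAsFactors = FALSE
)

# Assumption: a cluster is identified by chromosome + KT id.
key <- paste(kat_rows$CHROM, kat_rows$KT, sep = ":")

# One row per cluster: number of variants and distance between the
# first and last variant position.
clusters <- data.frame(
  n_variants = as.vector(tapply(kat_rows$POS, key, length)),
  span_bp = as.vector(tapply(kat_rows$POS, key, function(p) max(p) - min(p))),
  row.names = sort(unique(key))
)

print(clusters)
```

For the REV_1 cluster on chr2 this gives four variants within about 1.5 kb, which illustrates the tight spacing that the kataegis annotation is meant to capture.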
Description
knitr::kable(kat$description, format = "html", caption = "Kataegis column description.")
Kataegis column description.
ID | Description
AF | Allele Frequency, for each ALT allele, in the same order as listed
KT | Forward/reverse kataegis id
MH | Microhomology
PURPLE_AF | Purity adjusted allelic frequency of variant
PURPLE_CN | Purity adjusted copy number surrounding variant location
PURPLE_MACN | Purity adjusted minor allele ploidy surrounding variant location
PURPLE_VCN | Purity adjusted ploidy of variant
SUBCL | Non-zero subclonal likelihood
TNC | Tri-nucleotide context
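In the kataegis rows displayed earlier, PURPLE_VCN is numerically close to PURPLE_AF multiplied by PURPLE_CN, which fits reading "ploidy of variant" as the purity-adjusted allele frequency scaled by the local copy number. The sketch below checks this on four rows typed in by hand from that table. It is an observation about the displayed values only, not a statement of how PURPLE computes the field, and the 0.005 tolerance is an arbitrary allowance for rounding.

```r
# Toy rows copied from the kataegis table (hypothetical object).
v <- data.frame(
  PURPLE_AF = c(0.2601, 0.2399, 0.4054, 0.2494),
  PURPLE_CN = c(1.52, 1.52, 1.16, 1.26),
  PURPLE_VCN = c(0.396, 0.365, 0.469, 0.314)
)

# Product of adjusted allele frequency and local copy number,
# rounded to the precision shown in the table.
v$af_x_cn <- round(v$PURPLE_AF * v$PURPLE_CN, 3)

# Absolute difference from the reported variant copy number.
v$abs_diff <- abs(v$af_x_cn - v$PURPLE_VCN)

print(v)
print(all(v$abs_diff < 0.005))
```

The small residual differences are what one would expect from the inputs having been rounded before display.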
QC
qc <- system.file("extdata/purple/purple.qc", package = "gpgr") |>
  gpgr::purple_qc_read()
qc$summary |>
  knitr::kable(format = "html", caption = "PURPLE QC Summary Table.")
PURPLE QC Summary Table.
n | variable | value | details
1 | QC_Status | WARN_DELETED_GENES | See ‘Description’.
13 | Method | NORMAL | Fit method (NORMAL, HIGHLY_DIPLOID, SOMATIC or NO_TUMOR).
14 | CopyNumberSegments | 1428 (Unsupported: 0) | # of CN segments.
2 | Purity | 0.8600 |
17 | Gender | Amber: MALE; Cobalt: MALE |
14 | DeletedGenes | 7782 | # of homozygously deleted genes.
15 | Contamination | 0.0 | Rate of contamination in tumor sample as determined by AMBER.
16 | GermlineAberrations | NONE | Can be one or more of: KLINEFELTER, TRISOMY_X/21/13/18/15, XYY, MOSAIC_X.
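Some QC values are formatted for display rather than for computation, for example the Gender and CopyNumberSegments entries. The base-R sketch below pulls them apart so they can be checked programmatically, assuming the strings look exactly as in the table above. The variable names are hypothetical and the input strings are typed in by hand.

```r
# Strings copied from the QC summary table above.
gender_txt <- "Amber: MALE; Cobalt: MALE"
seg_txt <- "1428 (Unsupported: 0)"

# "Tool: CALL; Tool: CALL" -> named character vector of calls.
parts <- strsplit(strsplit(gender_txt, "; ", fixed = TRUE)[[1]], ": ", fixed = TRUE)
calls <- stats::setNames(
  vapply(parts, `[`, character(1), 2),
  vapply(parts, `[`, character(1), 1)
)

# The two tools agree when there is a single distinct call.
gender_concordant <- length(unique(calls)) == 1

# "total (Unsupported: n)" -> integer counts.
seg_counts <- as.integer(regmatches(seg_txt, gregexpr("[0-9]+", seg_txt))[[1]])
names(seg_counts) <- c("total", "unsupported")

print(calls)
print(gender_concordant)
print(seg_counts)
```

A discordant gender call or a large number of unsupported segments would be natural things to surface alongside the QC_Status value.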
Session Info
Main packages used in this vignette.
base | 4.2.3 | 2023-07-13 | local
gpgr | 1.5.0 | 2023-08-22 | local
Platform information.
version | R version 4.2.3 (2023-03-15)
os | Ubuntu 22.04.3 LTS
system | x86_64, linux-gnu
ui | X11
language | en
collate | C.UTF-8
ctype | C.UTF-8
tz | Etc/UTC
date | 2023-08-22
pandoc | 3.1.3 @ /home/runner/micromamba/envs/pkgdownenv/bin/ (via rmarkdown)