Summary
QC_Status | PASS
Purity | 0.43 (0.41-0.79)
Ploidy | 2.911 (2.851-4.917)
Gender | MALE
Top MutSigs | Old: Sig3 (11082), Sig13 (6441) - New: SBS13 (6369), SBS39 (4511)
CNVs (Somatic) | Min: -0.52; Max: 11.52; N: 1497
SVs (unmelted) | Sum: 807, INS: 19, DUP: 205, DEL: 375, BND: 208
SVs (melted) | Sum: 15879, INS: 21, DUP: 6788, DEL: 8819, BND: 251
Genome | hg38
AF Summary Stats
set | n | mean | median | mode
Global | 32264 | 0.28 | 0.24 | 0.25
Key genes CDS | 67 | 0.32 | 0.25 | 0.25
Somatic Mutation Profiles
Allelic Frequencies
Summarised below are the allele frequencies (AFs) for somatic variants detected genome-wide (Global) vs. within the coding sequence of ~1,100 UMCCR cancer genes (Key Genes CDS). AFs range from 0 to 1, or 0%-100% (we filter out all novel variants with AF < 10%).
Details
Variants are typically called in bcbio by 3 different callers, and only calls supported by at least 2 of them are used (“ensemble” approach). In some cases only a single caller is used for technical reasons (e.g. a highly mutated FFPE sample).
The following post-processing steps occur:
- somatic_vcf_annotate: annotate VCF against databases of known hotspots, germline variants, low mappability regions, UMCCR panel of normals
- somatic_vcf_filter: filter VCF to remove germline variants and artefacts, but keep known hotspots
- As preparation for the allelic frequencies plots:
  - subset_to_giab: keep variants in ‘high confidence’ regions as determined by the Genome in a Bottle consortium
  - keep only variants with AF above 10%
- Allele frequencies for global and keygenes:
  - afs: grab only the INFO/TUMOR_AF field and output to final txt file
  - afs_keygenes: grab the CHROM, POS, ID, REF, ALT and INFO/TUMOR_AF for variants in the UMCCR cancer gene BED file, and output to final txt file
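As a rough illustration of how the summary statistics for the Global and Key genes CDS sets could be derived from a list of allele frequencies, the sketch below applies the 10% cut-off and summarises what remains. The function name summarise_afs, the inclusive handling of the cut-off and the definition of the mode (most frequent value after rounding to two decimals) are assumptions made for this example; the report may compute the mode differently.

```python
from statistics import mean, median, multimode


def summarise_afs(afs, min_af=0.10, digits=2):
    """Summarise allele frequencies after dropping values below min_af.

    The mode is taken over values rounded to `digits` decimals
    (an assumption for this sketch); ties resolve to the smallest value.
    """
    kept = [af for af in afs if af >= min_af]
    if not kept:
        return {"n": 0, "mean": None, "median": None, "mode": None}
    rounded = [round(af, digits) for af in kept]
    return {
        "n": len(kept),
        "mean": round(mean(kept), digits),
        "median": round(median(kept), digits),
        "mode": min(multimode(rounded)),
    }


# Toy input: the 0.05 variant falls below the cut-off and is dropped.
example = summarise_afs([0.05, 0.12, 0.25, 0.25, 0.40, 0.61])
```

Here example holds n = 5, with a median and mode of 0.25.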

Mutational Signatures
Deciphering the mutational signature of a tumor sample can provide insight into the mutational processes involved in carcinogenesis and help in cancer treatment and prevention. The MutationalPatterns R package is used to generate a mutation signature for the sample. We use the final filtered somatic calls as input.
Context signature

Point mutation spectrum
Description
We can count the mutation type occurrences for the input VCF. For C>T mutations, a distinction is made between C>T at CpG sites and other sites, as deamination of methylated cytosine at CpG sites is a common mutational process. This distinction depends on the sequence context of each mutation, which is why the reference genome is needed.
A mutation spectrum shows the relative contribution of each mutation type in the base substitution catalogs. We can plot the mean relative contribution of each of the 6 base substitution types over all samples. Error bars indicate standard deviation over all samples. The total number of mutations is indicated. We can also distinguish between C>T at CpG sites and other sites.
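The counting described above can be sketched as follows. This is a minimal illustration rather than the MutationalPatterns implementation: the function name classify_snv and the input layout (reference base, alternate base, and the flanking 5' and 3' reference bases on the forward strand) are assumptions for the example. Substitutions with a purine reference base are reverse-complemented so that every change is expressed as C>X or T>X.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_snv(ref, alt, five_prime, three_prime):
    """Return one of the 6 substitution types, splitting C>T by CpG context.

    All bases are given on the forward strand of the reference genome.
    """
    if ref in ("A", "G"):
        # Express the change on the opposite strand (pyrimidine reference).
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        five_prime, three_prime = COMPLEMENT[three_prime], COMPLEMENT[five_prime]
    label = f"{ref}>{alt}"
    if label == "C>T":
        label += " at CpG" if three_prime == "G" else " other"
    return label


# Toy SNVs: (ref, alt, 5' flanking base, 3' flanking base)
snvs = [
    ("C", "T", "A", "G"),  # C>T followed by G: CpG site
    ("G", "A", "C", "T"),  # reverse complement is C>T at a CpG site
    ("C", "T", "A", "A"),  # C>T outside a CpG site
    ("A", "G", "T", "T"),  # reverse complement is T>C
    ("T", "C", "A", "A"),
]
counts = Counter(classify_snv(*snv) for snv in snvs)
total = sum(counts.values())
relative = {label: n / total for label, n in counts.items()}
```

The flanking bases are only available from the reference sequence, which is the dependency the description refers to.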

Transcriptional strand bias
Description
We can determine if a gene mutation is on the transcribed or non-transcribed strand, which can be used to evaluate the involvement of transcription-coupled repair. By convention base substitutions are regarded as C>X or T>X, so we try to determine whether the C or T base is on the same strand as the gene definition. Base substitutions on the same strand as the gene definition are considered ‘untranscribed’, and on the opposite strand ‘transcribed’, since the gene definitions report the coding or sense strand, which is untranscribed. No strand information is reported for base substitutions that overlap with more than one gene on different strands.
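The strand assignment rule above can be written as a small function. This is a sketch under stated assumptions: the function name transcription_strand and the representation of overlapping genes as a set of '+' or '-' strands are hypothetical, not taken from the report's code.

```python
def transcription_strand(ref, gene_strands):
    """Classify a base substitution as 'transcribed' or 'untranscribed'.

    ref          -- reference base on the forward (+) strand
    gene_strands -- set of strands ('+' or '-') of the genes overlapping
                    the position
    Returns None when there is no overlapping gene or when the
    overlapping genes lie on different strands.
    """
    if len(gene_strands) != 1:
        return None
    gene_strand = next(iter(gene_strands))
    # By convention the substitution is expressed as C>X or T>X, so find
    # the strand that carries the C or T of the reference base pair.
    pyrimidine_strand = "+" if ref in ("C", "T") else "-"
    if pyrimidine_strand == gene_strand:
        return "untranscribed"
    return "transcribed"
```

For instance, a reference G inside a gene on the + strand is reported as 'transcribed', because its paired C lies on the strand opposite the gene definition.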


Replicative strand bias
Description
The involvement of replication-associated mechanisms can be evaluated by testing for a mutational bias between the leading and lagging strand. The replication strand is dependent on the locations of replication origins from which DNA replication is fired. However, replication timing is dynamic and cell-type specific, which makes replication strand determination less straightforward than transcriptional strand bias analysis. Replication timing profiles can be generated with Repli-Seq experiments. Once the replication direction is defined, a strand asymmetry analysis can be performed similarly as the transcription strand bias analysis.


Signature Contribution
Description
The contribution of any set of signatures to the mutational profile of a sample can be quantified. This feature is especially useful for mutational signature analyses of small cohorts or individual samples, and for relating one's own findings to known signatures and published findings. The fit_to_signatures function finds the optimal linear combination of mutational signatures that most closely reconstructs the mutation matrix by solving a non-negative least-squares problem.
Shown are signatures with positive Contribution values, along with summarised descriptions and reference signature plots from https://cancer.sanger.ac.uk/cosmic/signatures.
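To make the non-negative least-squares idea concrete, the sketch below fits contributions with a plain projected gradient descent in pure Python. It is not the solver used by fit_to_signatures; the function name fit_contributions, the iteration count and the toy four-channel signatures are assumptions chosen to keep the example self-contained.

```python
def fit_contributions(signatures, profile, iterations=2000):
    """Approximate argmin_x ||S x - m||^2 subject to x >= 0.

    signatures -- list of signatures, each a list of channel probabilities
    profile    -- observed mutation counts per channel (m)
    Uses projected gradient descent with a fixed step size.
    """
    k = len(signatures)
    # Gram matrix S^T S and right-hand side S^T m.
    gram = [[sum(a * b for a, b in zip(si, sj)) for sj in signatures]
            for si in signatures]
    target = [sum(a * b for a, b in zip(si, profile)) for si in signatures]
    # The trace of S^T S bounds its largest eigenvalue, so this step is safe.
    step = 1.0 / sum(gram[i][i] for i in range(k))
    x = [0.0] * k
    for _ in range(iterations):
        grad = [sum(gram[i][j] * x[j] for j in range(k)) - target[i]
                for i in range(k)]
        # Gradient step followed by projection onto x >= 0.
        x = [max(0.0, xi - step * g) for xi, g in zip(x, grad)]
    return x


# Toy signatures over 4 mutation channels (real ones use 96 channels).
sig_a = [0.5, 0.5, 0.0, 0.0]
sig_b = [0.0, 0.0, 0.5, 0.5]
sig_c = [0.0, 0.0, 0.0, 1.0]
# Profile built as 100 * sig_a + 50 * sig_b.
profile = [50.0, 50.0, 25.0, 25.0]
contributions = fit_contributions([sig_a, sig_b, sig_c], profile)
```

The fitted contributions recover roughly 100 for sig_a, 50 for sig_b and 0 for sig_c, which is why only signatures with positive Contribution values are listed.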
OLD
Rank | Signature | Contribution | Description | Plot
1 | Sig3 | 11082 | Breast, ovarian, pancreatic; germline + somatic BRCA1/BRCA2 mut; failure of DNA ds break-repair; many large indels. |
2 | Sig13 | 6441 | In 22 cancers; mostly cervical, bladder, breast; AID/APOBEC; C>G mut; similar to sig2; maybe viral infection, retrotransposon jumping or tissue inflammation; kataegis. |
3 | Sig2 | 4072 | In 22 cancers; mostly cervical + bladder; AID/APOBEC; similar to sig13; maybe viral infection, retrotransposon jumping or tissue inflammation; exon TSB. |
4 | Sig8 | 3270 | Breast cancer, medulloblastoma; many CC>AA subs; weak strand bias for C>A subs. |
5 | Sig19 | 2220 | Only in pilocytic astrocytoma. |
6 | Sig7 | 1985 | Skin, lip, head, neck or oral squamous cancers; ultraviolet light exposure; many CC>TT subs; strong TSB with many C>T mut. |
7 | Sig16 | 1246 | Liver; strong TSB for T>C at ATN context. |
8 | Sig12 | 1093 | Liver; TSB for T>C. |
9 | Sig1 | 1082 | All cancers; correlates with age of diagnosis; few small indels. |
10 | Sig9 | 996 | Chronic lymphocytic leukaemias, malignant B-cell lymphomas; mut pattern associated with polymerase H (implicated with AID during som hypermutation). |
11 | Sig11 | 644 | Melanoma, glioblastoma; alkylating agents; TSB for C>T. |
12 | Sig27 | 66 | Subset of kidney clear cell carcinomas; strong TSB for T>A; many small indels. |
13 | Sig28 | 38 | Subset of stomach cancers. |
14 | Sig10 | 6 | In 6 cancers; mostly colorectal and uterine; hypermutated samples due to altered POLE activity; TSB for C>A at TCT context, T>G at TTT context. |
Rainfall Plot
Rainfall plots show the distribution of mutations along the genome, with mutation types indicated with different colors. The y-axis corresponds to the distance of a mutation from the previous mutation, and is log10 transformed. Drop-downs from the plots indicate clusters or “hotspots” of mutations.
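The y-axis values of a rainfall plot can be computed as below. This is a minimal sketch for a single chromosome; the function name intermutation_distances is hypothetical, and skipping mutations that share a position with the previous one (distance 0, which has no logarithm) is an assumption of this example.

```python
from math import log10


def intermutation_distances(positions):
    """Return (position, log10 distance to the previous mutation) pairs.

    positions -- mutation coordinates on one chromosome, in any order.
    The first mutation has no predecessor and is therefore not returned.
    """
    ordered = sorted(positions)
    return [
        (cur, log10(cur - prev))
        for prev, cur in zip(ordered, ordered[1:])
        if cur > prev
    ]


# Toy positions: two mutations close together, then an isolated one.
points = intermutation_distances([1000, 1100, 1110, 101110])
```

Small y-values (here the 10 bp and 100 bp gaps) are the points that drop down from the bulk of the plot and mark mutation clusters.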

Circos Plots
Circos plots are generated by PURPLE. The first BAF plot is based on PURPLE data and configuration files.
BAF, Total/Minor CN, SVs
Description
- Track1: Chromosomes. Darker shaded areas: gaps in reference genome (centromeres, heterochromatin & missing short arms)
- Track2: Beta Allele Frequency. Given that the BAF points correspond to allele frequencies of heterozygous SNPs that are common in germline samples, there shouldn’t be any in chromosome Y (and chromosome X when male).
- Track3: Total copy number changes adjusted for tumor purity, including focal and chromosomal somatic events. Red = Loss; Green = Gain. Scaled from 0 (complete loss) to 6 (high level gains). If > 6, shown as 6 with a green dot on the outermost green gridline.
- Track4: Minor allele copy numbers. Range from 0 to 3. Expected normal minor allele copy number is 1, and anything below 1 is shown as a loss (Orange), representing an LOH event. Minor allele copy numbers above 1 (Blue) indicate gains of both A and B alleles.
- Track5 (Inner circle): Observed structural variants within or between the chromosomes.
  - Blue = Translocations
  - Red = Deletions
  - Yellow = Insertions
  - Green = Tandem duplications
  - Black = Inversions

SNVs/Indels, Total/Minor CN, SVs
Description
- Track1: Chromosomes. Darker shaded areas: gaps in reference genome (centromeres, heterochromatin & missing short arms)
- Track2: Somatic variants (incl. exon, intron and intergenic regions).
  - outer ring: SNP allele frequencies, corrected for tumor purity and scaled from 0 to 100%. Each dot represents a single somatic variant, coloured according to the type of base change (e.g. C>T/G>A in red).
  - inner ring: short insertion (yellow) and deletion (red) locations.
- Track3: Observed total copy number changes adjusted for tumor purity, including focal and chromosomal somatic events. Red = Loss; Green = Gain. Scaled from 0 (complete loss) to 6 (high level gains). If > 6, shown as 6 with a green dot on the outermost green gridline.
- Track4: Observed minor allele copy numbers. Range from 0 to 3. Expected normal minor allele copy number is 1, and anything below 1 is shown as a loss (Orange), representing an LOH event. Minor allele copy numbers above 1 (Blue) indicate gains of both A and B alleles.
- Track5 (Inner circle): Observed structural variants within or between the chromosomes.
  - Blue = Translocations
  - Red = Deletions
  - Yellow = Insertions
  - Green = Tandem duplications
  - Black = Inversions

Allele Ratios, BAF
Description
- Track1: Chromosomes. Darker shaded areas: gaps in reference genome (centromeres, heterochromatin & missing short arms)
- Track2: Tumor and Normal Allele Ratios
- Track3: Beta Allele Frequency. Given that the BAF points correspond to allele frequencies of heterozygous SNPs that are common in germline samples, there shouldn’t be any in chromosome Y (and chromosome X when male).

Structural Variants
Structural variants are inferred with Manta, adjusted using PURPLE, and prioritised using simple_sv_annotation. Allele frequencies, copy number changes and ploidy are purity-adjusted.
Details
The input file corresponds to umccrised/<batch>/structural/<batch>-manta.tsv.
It’s generated through the following steps:
Step 1: Processing
- Input: Manta structural variant calls from bcbio (final/<tumor-name>/<batch-name>-sv-prioritize-manta.vcf.gz, or <batch-name>-manta.vcf.gz if not prioritised)
- Remove following annotations from Manta VCF: ‘INFO/SIMPLE_ANN’, ‘INFO/SV_HIGHEST_TIER’, ‘FILTER/Intergenic’, ‘FILTER/MissingAnn’, ‘FILTER/REJECT’
- Prioritise variants with simple_sv_annotation
- Output: work/{batch}/structural/prioritize/{batch}-manta.vcf.gz
- Keep PASS variants
- If more than 100,000 variants, keep only variants where INFO/SV_TOP_TIER <= 3
- Output: work/{batch}/structural/keep_pass/{batch}-manta.vcf
- Deal with chromosome capitalisation occurring from SnpEff
- Run BreakPointInspector (BPI) if it was disabled in bcbio
- Output: work/{batch}/structural/maybe_bpi/{batch}-manta.vcf
Step 2: Filtering
- Keep PASS variants (since BPI updates the FILTER column)
- For BND variants require paired reads support (PR) to be higher than split read support (SR)
- Keep all INFO/SV_TOP_TIER <= 2 variants
- For INFO/SV_TOP_TIER > 2 variants require split or paired reads support of at least 5x
- For INFO/SV_TOP_TIER > 2 variants with low allele frequency at any breakpoint (BPI_AF[0 or 1] < 0.1), require SR or PR support of at least 10x
- Output: work/{batch}/structural/filt/{batch}-manta.vcf
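The Step 2 rules can be read as a single predicate per variant, sketched below. This is one interpretation, not the pipeline's code: the function name keep_sv and its arguments are hypothetical, the rules are applied in the order listed (so the BND check comes before the tier check), and "split or paired reads support of at least N" is taken to mean that either SR or PR alone reaches N.

```python
def keep_sv(filter_field, svtype, tier, sr, pr, bpi_afs):
    """Decide whether a structural variant survives the Step 2 filters.

    filter_field -- VCF FILTER value after BPI
    svtype       -- e.g. 'BND', 'DEL', 'DUP', 'INS'
    tier         -- INFO/SV_TOP_TIER (1 = highest priority)
    sr, pr       -- split-read and paired-read support for the alt allele
    bpi_afs      -- BPI allele frequencies at the breakpoints
    """
    if filter_field != "PASS":
        return False
    # BNDs need more paired-read than split-read support.
    if svtype == "BND" and not pr > sr:
        return False
    # High-priority variants are kept without a read support threshold.
    if tier <= 2:
        return True
    # Lower-priority variants need 5x support, or 10x when any breakpoint
    # has a low allele frequency.
    min_support = 10 if any(af < 0.1 for af in bpi_afs) else 5
    return max(sr, pr) >= min_support
```

For example, a tier 3 deletion with SR = 6 and PR = 2 is kept when both breakpoint AFs are 0.4 or higher, but dropped if one of them is 0.05.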
Step 3: PURPLE and FFPE conditional
- If the sample is not FFPE:
  - Feed above filtered SVs to PURPLE, which outputs purple.sv.vcf.gz that contains rescued SVs
  - Prioritise variants (again)
  - Remove ‘INFO/ANN’ annotation
  - Output: {batch}/structural/{batch}-manta.vcf.gz
- If the sample is FFPE:
  - Just copy the filtered variants without further processing (i.e. we don’t want the rescued SVs from PURPLE, since there will likely be a very large number of them). Note that PURPLE will still be fed with the filtered SVs.
  - Output: {batch}/structural/{batch}-manta.vcf.gz
Step 4: TSV final output
- Input: {batch}/structural/{batch}-manta.vcf.gz (VCF)
- Output: {batch}/structural/{batch}-manta.tsv (TSV)
Description of Manta TSV columns
Column | Description
AF_BPI | INFO/BPI_AF: AF at each breakpoint (so AF_BPI1,AF_BPI2)
AF_PURPLE | INFO/PURPLE_AF: AF at each breakend (purity adjusted) (so AF_PURPLE1,AF_PURPLE2)
annotation | INFO/SIMPLE_ANN: Simplified structural variant annotation: ‘SVTYPE | EFFECT | GENE(s) | TRANSCRIPT | PRIORITY (1-4)’
caller | Manta SV caller
chrom | CHROM column in VCF
CN_change_PURPLE | INFO/PURPLE_CN_CHANGE: change in CN at each breakend (purity adjusted) (so CN_change_PURPLE1,CN_change_PURPLE2)
CN_PURPLE | INFO/PURPLE_CN: CN at each breakend (purity adjusted) (so CN_PURPLE1,CN_PURPLE2)
end | INFO/END: End position of the variant described in this record
END_BPI | INFO/BPI_END: BPI adjusted breakend location
ID | ID column in VCF
MATEID | INFO/MATEID: ID of mate breakend
paired_support_PE | FORMAT/PE of tumor sample: ??
paired_support_PR | FORMAT/PR of tumor sample: Spanning paired-read support for the ref and alt alleles in the order listed, for reads where P(allele|read)>0.999
Ploidy_PURPLE | INFO/PURPLE_PLOIDY: Ploidy of variant (purity adjusted)
PURPLE_status | INFERRED if FILTER=INFERRED, or RECOVERED if has INFO/RECOVERED, else blank. INFERRED: Breakend inferred from copy number transition
sample | Tumor sample name
somaticscore | INFO/SOMATICSCORE: Somatic variant quality score
split_read_support | FORMAT/SR of tumor sample: Split reads for the ref and alt alleles in the order listed, for reads where P(allele|read)>0.999
start | POS column in VCF
START_BPI | INFO/BPI_START: BPI adjusted breakend location
svtype | INFO/SVTYPE: Type of structural variant
tier | INFO/SV_TOP_TIER (or 4 if missing): Highest priority tier for the effects of a variant entry
Prioritisation process
- Annotate with SnpEff based on Ensembl gene model
- Subset annotations to APPRIS principal transcripts
- Prioritize variants with simple_sv_annotation 1(high)-2(moderate)-3(low)-4(no interest):
name | abbreviation
3_prime_UTR_truncation | 3UTRtrunc
3_prime_UTR_variant | 3UTRvar
5_prime_UTR_truncation | 5UTRtrunc
5_prime_UTR_variant | 5UTRvar
bidirectional_gene_fusion | BidFusG
chromosome_number_variation | ChromNumV
conservative_inframe_deletion | ConsInframeDel
feature_ablation | DelG
transcript_ablation | DelTx
downstream_gene_variant | DnstreamGV
duplication | Dup
exon_loss_variant | ExonLossV
frameshift_variant | FrameshiftV
feature_fusion | Fus
gene_fusion | FusG
intergenic_region | IntergenReg
intragenic_variant | IntragenV
intron_variant | IntronV
no_func_effect | NoFuncEff
non_coding_transcript_variant | NoncodTxV
no_prio_effect | NoPrioEff
splice_acceptor_variant | SpliceAccV
splice_donor_variant | SpliceDonV
splice_region_variant | SpliceRegV
start_lost | StartLoss
stop_gained | StopGain
stop_lost | StopLoss
TFBS_ablation | TFBSDel
TF_binding_site_variant | TFBSVar
upstream_gene_variant | UpstreamGV
Summary
SV type by top tier before breaking down by annotation
Tier | BND | DEL | DUP | INS | Sum
1 | 0 | 0 | 1 | 0 | 1
2 | 20 | 27 | 35 | 0 | 82
3 | 4 | 27 | 11 | 1 | 43
4 | 184 | 321 | 158 | 18 | 681
Sum | 208 | 375 | 205 | 19 | 807
SV type by individual tier after breaking down by annotation
Tier | BND | DEL | DUP | INS | Sum
1 | 0 | 0 | 1 | 0 | 1
2 | 26 | 117 | 43 | 0 | 186
3 | 6 | 257 | 222 | 1 | 486
4 | 219 | 8445 | 6522 | 20 | 15206
Sum | 251 | 8819 | 6788 | 21 | 15879
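The two summaries differ because a single SV can carry several annotations, and "melting" gives each annotation its own row. The sketch below illustrates this using the ‘SVTYPE | EFFECT | GENE(s) | TRANSCRIPT | PRIORITY’ layout of INFO/SIMPLE_ANN. The function name melt_simple_ann, the comma as separator between annotations, the absence of spaces around the pipes and the placeholder gene and transcript names are all assumptions for this example.

```python
def melt_simple_ann(simple_ann):
    """Split one SIMPLE_ANN value into one row per annotation."""
    rows = []
    for entry in simple_ann.split(","):
        svtype, effect, genes, transcript, priority = entry.split("|")
        rows.append({
            "svtype": svtype,
            "effect": effect,
            "genes": genes,
            "transcript": transcript,
            "tier": int(priority),
        })
    return rows


# Hypothetical annotation string for a single deletion event.
example = "DEL|exon_loss_variant|GENE_A|TX_A1|2,DEL|intron_variant|GENE_B|TX_B1|3"
rows = melt_simple_ann(example)
# The event's top tier is the best (lowest) tier among its annotations.
tier_top = min(row["tier"] for row in rows)
```

In this toy case the event counts once under tier 2 in the unmelted table, and once each under tiers 2 and 3 in the melted table.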
Unmelted Variants
Description
Column | Description
varnum | Original event row number that connects variants to events
TierTop | Top priority of the event (from ‘simple_sv_annotation’: 1 highest, 4 lowest)
Chr | Chromosome
Start | Start position as inferred by BPI (for PURPLE-inferred SVs we use POS)
End | End position. For BNDs this has been assigned to the Chr:Start of the BND’s mate for convenience. Values are inferred by BPI (PURPLE-inferred SVs don’t have an End).
Type | Type of structural variant
ID | ID of BND from Manta (or PURPLE for PURPLE-inferred SVs)
MATEID | ID of BND mate from Manta
BND_ID | ID of BND pair simplified. BNDs with the same BND_ID belong to the same translocation event
BND_mate | ‘A’ or ‘B’ depending on if it’s the first or second mate in the BND pair
SR_PR_alt | Number of Split Reads and Paired Reads supporting the alt allele, for reads where P(allele|read)>0.999
SR_PR_ref | Number of Split Reads and Paired Reads supporting the ref allele, for reads where P(allele|read)>0.999
Ploidy | Ploidy of variant from PURPLE (purity adjusted)
AF_PURPLE | PURPLE AF at each breakend preceded by their average
AF_BPI | BPI AF at each breakend preceded by their average
CNC | Copy Number Change at each breakend preceded by their average
CN | Copy Number at each breakend preceded by their average
SScore | Somatic variant quality score
nann | Number of annotations for given event
Translocations (BNDs)
Main Columns
Description
Column | Description
nrow | Row number that connects variants between tables in same tab set
varnum | Original event row number that connects variants to events
Tier | Priority of the specific event (from ‘simple_sv_annotation’: 1 highest, 4 lowest)
Chr | Chromosome
Start | Start position as inferred by BPI (for PURPLE-inferred SVs we use POS)
End | End position. For BNDs this has been assigned to the Chr:Start of the BND’s mate for convenience. Values are inferred by BPI (PURPLE-inferred SVs don’t have an End).
ID | ID of BND from Manta (or PURPLE for PURPLE-inferred SVs)
BND_ID | ID of BND pair simplified. BNDs with the same BND_ID belong to the same translocation event
BND_mate | ‘A’ or ‘B’ depending on if it’s the first or second mate in the BND pair
Genes | Genes involved in the event. DEL/DUP/INS events involving more than 2 genes are shown in separate table.
Effect | SV effect (based on SnpEff Effect Sequence Ontology - abbreviations are shown under Details in the Effects table)
Detail | Prioritisation detail (from ‘simple_sv_annotation’)
SR_PR_alt | Number of Split Reads and Paired Reads supporting the alt allele, for reads where P(allele|read)>0.999
Ploidy | Ploidy of variant from PURPLE (purity adjusted)
AF_PURPLE | PURPLE AF at each breakend preceded by their average
Other Columns
Description
Column | Description
nrow | Row number that connects variants between tables in same tab set
AF_BPI | BPI AF at each breakend preceded by their average
CNC | Copy Number Change at each breakend preceded by their average
CN | Copy Number at each breakend preceded by their average
SR_PR_ref | Number of Split Reads and Paired Reads supporting the ref allele, for reads where P(allele|read)>0.999
SScore | Somatic variant quality score
ntrx | Number of transcripts for given event
Transcript | Transcripts involved in the event. DEL/DUP/INS events involving more than 2 transcripts are shown in separate table.
PURPLE Inferred
Description
Column | Description
Tier | Priority of the specific event (from ‘simple_sv_annotation’: 1 highest, 4 lowest)
Chr | Chromosome
Start | Start position as inferred by BPI (for PURPLE-inferred SVs we use POS)
Effect | SV effect (based on SnpEff Effect Sequence Ontology - abbreviations are shown under Details in the Effects table)
Detail | Prioritisation detail (from ‘simple_sv_annotation’)
Ploidy | Ploidy of variant from PURPLE (purity adjusted)
CN | Copy Number at each breakend preceded by their average
CNC | Copy Number Change at each breakend preceded by their average
ID | ID of BND from Manta (or PURPLE for PURPLE-inferred SVs)
DEL/DUP/INS
Main Columns
Description
Column | Description
nrow | Row number that connects variants between tables in same tab set
varnum | Original event row number that connects variants to events
TierTop | Top priority of the event (from ‘simple_sv_annotation’: 1 highest, 4 lowest)
Tier | Priority of the specific event (from ‘simple_sv_annotation’: 1 highest, 4 lowest)
Type | Type of structural variant
Chr | Chromosome
Start | Start position as inferred by BPI (for PURPLE-inferred SVs we use POS)
End | End position. For BNDs this has been assigned to the Chr:Start of the BND’s mate for convenience. Values are inferred by BPI (PURPLE-inferred SVs don’t have an End).
Effect | SV effect (based on SnpEff Effect Sequence Ontology - abbreviations are shown under Details in the Effects table)
Genes | Genes involved in the event. DEL/DUP/INS events involving more than 2 genes are shown in separate table.
Transcript | Transcripts involved in the event. DEL/DUP/INS events involving more than 2 transcripts are shown in separate table.
Detail | Prioritisation detail (from ‘simple_sv_annotation’)
SR_PR_alt | Number of Split Reads and Paired Reads supporting the alt allele, for reads where P(allele|read)>0.999
AF_PURPLE | PURPLE AF at each breakend preceded by their average
Other Columns
Description
Column | Description
varnum | Original event row number that connects variants to events
Ploidy | Ploidy of variant from PURPLE (purity adjusted)
AF_BPI | BPI AF at each breakend preceded by their average
CNC | Copy Number Change at each breakend preceded by their average
CN | Copy Number at each breakend preceded by their average
SR_PR_ref | Number of Split Reads and Paired Reads supporting the ref allele, for reads where P(allele|read)>0.999
SScore | Somatic variant quality score
Many Genes
Description
Column | Description
nrow | Row number that connects variants between tables in same tab set
varnum | Original event row number that connects variants to events
Tier | Priority of the specific event (from ‘simple_sv_annotation’: 1 highest, 4 lowest)
Type | Type of structural variant
Chr | Chromosome
Start | Start position as inferred by BPI (for PURPLE-inferred SVs we use POS)
End | End position. For BNDs this has been assigned to the Chr:Start of the BND’s mate for convenience. Values are inferred by BPI (PURPLE-inferred SVs don’t have an End).
Effect | SV effect (based on SnpEff Effect Sequence Ontology - abbreviations are shown under Details in the Effects table)
ngen | Number of genes for given event
Genes | Genes involved in the event. DEL/DUP/INS events involving more than 2 genes are shown in separate table.
Many Transcripts
Description
Column | Description
nrow | Row number that connects variants between tables in same tab set
varnum | Original event row number that connects variants to events
Tier | Priority of the specific event (from ‘simple_sv_annotation’: 1 highest, 4 lowest)
Type | Type of structural variant
Chr | Chromosome
Start | Start position as inferred by BPI (for PURPLE-inferred SVs we use POS)
End | End position. For BNDs this has been assigned to the Chr:Start of the BND’s mate for convenience. Values are inferred by BPI (PURPLE-inferred SVs don’t have an End).
Effect | SV effect (based on SnpEff Effect Sequence Ontology - abbreviations are shown under Details in the Effects table)
ntrx | Number of transcripts for given event
Transcript | Transcripts involved in the event. DEL/DUP/INS events involving more than 2 transcripts are shown in separate table.
Copy Number Variants
The purity and ploidy estimator PURPLE is used to generate a copy number profile for the somatic sample.
QC, Purity and Ploidy Summary
PURPLE outputs a QC status along with a summary for the inferred purity and ploidy of the somatic sample. A failed QC status can be attributed to several factors (see Description below).
Description
QC Status
The QC Status field reflects how we have determined the purity of the sample:
- NORMAL: PURPLE fit the purity using COBALT and AMBER output.
- HIGHLY_DIPLOID: The fitted purity solution is highly diploid (> 95%) with a large range of potential solutions, but somatic variants are unable to help either because they were not supplied or because their implied purity was too low.
- SOMATIC: Somatic variants have improved the otherwise highly diploid solution.
- NO_TUMOR: PURPLE failed to find any aneuploidy and somatic variants were supplied but there were fewer than 300 with observed VAF > 0.1.
QC Failure
There are several reasons PURPLE may classify a sample as failed:
- FAIL_SEGMENT: more than 220 copy number segments not supported at either end by SV breakpoints. Indicates samples with extreme GC bias, with differences in depth of >= 10x between high and low GC regions. GC normalisation is unreliable when the corrections are so extreme, so it is recommended to fail the sample (there are concerns about miscalled deletions or amplifications, or poor sensitivity in high GC regions).
- NO_TUMOR: no aneuploidy found and the number of somatic SNVs found is less than 1,000
- MIN_PURITY: fitted purity < 20%
- FAIL_DELETED_GENES: more than 280 deleted genes. This QC step was added after observing that in a handful of samples with high MB scale positive GC bias we sometimes systematically underestimate the copy number in high GC regions. This can lead us to incorrectly infer homozygous loss of entire chromosomes, particularly on chromosomes 17 and 19.
- FAIL_GENDER: if the AMBER and COBALT inferred genders are inconsistent then the COBALT one is used but the sample is failed.
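Most of the failure criteria above are simple threshold checks, which the sketch below collects in one function. It is an illustration only, not PURPLE's code: the function name purple_qc_failures and its arguments are hypothetical, and the NO_TUMOR check is left out because it depends on an aneuploidy call that is not reported here.

```python
def purple_qc_failures(unsupported_segments, purity, deleted_genes,
                       amber_gender, cobalt_gender):
    """Return the list of failed QC checks (empty list means PASS)."""
    failures = []
    if unsupported_segments > 220:
        failures.append("FAIL_SEGMENT")
    if purity < 0.20:
        failures.append("MIN_PURITY")
    if deleted_genes > 280:
        failures.append("FAIL_DELETED_GENES")
    if amber_gender != cobalt_gender:
        failures.append("FAIL_GENDER")
    return failures


# Values reported for this sample: 15 unsupported segments, purity 0.43,
# 77 deleted genes, and MALE from both AMBER and COBALT.
sample_failures = purple_qc_failures(15, 0.43, 77, "MALE", "MALE")
```

With the values from this report no check fails, which is consistent with the PASS status shown in the summary.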
Variable | value | details
QC_Status | PASS |
Purity | 0.43 (0.41-0.79) | Purity of tumor in the sample (and min-max with score within 10% of best)
Ploidy | 2.911 (2.851-4.917) | Average ploidy of tumor sample after adjusting for purity (and min-max with score within 10% of best)
Gender | MALE |
WGD | true | Whole genome duplication
MSI (indels/Mb) | MSS (0.135) | MSI status (MSI, MSS or UNKNOWN if somatic variants not supplied) & MS Indels per Mb
Polyclonal Prop | 0.287 | Proportion of CN regions that are more than 0.25 from a whole copy number
Diploidy Prop | 0.015 (0.001-0.015) | Proportion of CN regions that have 1 (+- 0.2) minor and major allele
Segment_Pass | TRUE | Score: 15; Unsupported: 15
Gender_Pass | TRUE | Amber: MALE; Cobalt: MALE
DelGenes_Pass | TRUE | count: 77
TMB | 12.714 (HIGH) | Tumor mutational burden per mega base (Status: ‘HIGH’, ‘LOW’ or ‘UNKNOWN’ if somatic variants not supplied)
TML | 0 (LOW) | Tumor mutational load (Status: ‘HIGH’, ‘LOW’ or ‘UNKNOWN’ if somatic variants not supplied)
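The Polyclonal Prop definition in the table above can be illustrated with a short calculation. This sketch counts segments equally; whether PURPLE weights regions by length or by another measure is not stated in this report, so the unweighted proportion and the function name polyclonal_proportion are assumptions.

```python
def polyclonal_proportion(copy_numbers, tolerance=0.25):
    """Fraction of segments whose copy number is more than `tolerance`
    away from the nearest whole number (unweighted, by assumption)."""
    if not copy_numbers:
        return 0.0
    off_integer = [
        cn for cn in copy_numbers if abs(cn - round(cn)) > tolerance
    ]
    return len(off_integer) / len(copy_numbers)


# Toy segment copy numbers: 2.3 and 1.6 are more than 0.25 from an integer.
proportion = polyclonal_proportion([2.0, 2.3, 3.1, 1.6, 4.0])
```

In this toy input two of the five segments qualify, giving a proportion of 0.4.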
UMCCR Gene CNV Calls
Description
PURPLE copy number alterations in the UMCCR Cancer Gene panel (~1,200 genes) - description is from https://github.com/hartwigmedical/hmftools/blob/master/purity-ploidy-estimator/README.md#gene-copy-number-file
Column | Description
gene | Name of gene
minCN/maxCN | Min/Max copy number found in gene exons
chrom/start/end | Chromosome/start/end location of gene transcript
chrBand | Chromosome band of the gene
onco_or_ts | oncogene (‘oncogene’), tumor suppressor (‘tsgene’), or both (‘onco+ts’), as reported by Cancermine
transcriptID | Ensembl transcript ID (dot version)
minMinorAllelePloidy | Minimum allele ploidy found over the gene exons - useful for identifying LOH events
somReg (somaticRegions) | Count of somatic copy number regions this gene spans
germDelReg (germlineHomDeletionRegions / germlineHetToHomDeletionRegions) | Number of regions spanned by this gene that are (homozygously deleted in the germline / both heterozygously deleted in the germline and homozygously deleted in the tumor)
minReg (minRegions) | Number of somatic regions inside the gene that share the min copy number
minRegStartEnd | Start/End base of the copy number region overlapping the gene with the minimum copy number
minRegSupportStartEndMethod | Start/end support of the CN region overlapping the gene with the min CN (plus determination method)
Genome-wide CNV Segments
Description
PURPLE outputs a file with the copy number profile of all contiguous segments of the tumor sample. Description is from https://github.com/hartwigmedical/hmftools/blob/master/purity-ploidy-estimator/README.md#copy-number-file
Column | Description
Chr/Start/End | Coordinates of copy number segment
CN | Fitted absolute copy number of segment adjusted for purity and ploidy
Ploidy Min+Maj | Ploidy of minor + major allele adjusted for purity
BAF | Tumor BAF after adjusting for purity and ploidy
BafCount | Count of AMBER baf points covered by this segment
Start/End SegSupport | Type of SV support for the CN breakpoint at start/end of region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint)
Method | Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (avg of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification)
windowCount | Count of COBALT windows covered by this segment
GC | Proportion of segment that is G or C
PURPLE Charts
PURPLE generates charts for summarising tumor sample characteristics. Description is from the PURPLE docs.
Copy number / Minor allele ploidy
The following figures show the AMBER BAF count weighted distribution of copy number and minor allele ploidy throughout the fitted segments. Copy numbers are broken down by colour into their respective minor allele ploidy (MAP) while the minor allele ploidy figure is broken down by copy number.


Rainfall
If a somatic variant VCF has been supplied, a figure will be produced showing the somatic variant ploidy broken down by copy number as well as a rainfall plot with kataegis clusters highlighted in grey.


Purity/ploidy
The following ‘sunrise’ chart shows the range of scores of all examined solutions of purity and ploidy. Crosshairs identify the best purity / ploidy solution.

Clonality
The following diagram illustrates the clonality model of a typical sample.
The top figure shows the histogram of somatic ploidy for all SNVs and INDELs in blue. Superimposed are peaks in different colours fitted from the sample as described in the docs while the black line shows the overall fitted ploidy distribution. Red filled peaks are below the 0.85 subclonal threshold.
We can determine the likelihood of a variant being subclonal at any given ploidy as shown in the bottom half of the figure.

Segment
The contribution of each fitted segment to the final score of the best fit is shown in the following figure. Each segment is divided into its major and minor allele ploidy. The area of each circle shows the weight (AMBER baf count) of each segment.

Oncoviruses
Oncoviruses and their integration sites. Viral sequences are obtained from the GDC database. Host genes are reported if the integration site falls within a gene or within 100 kbp upstream of the gene start.
No oncoviral content detected in this sample.
Addendum
Conda Pkgs Main
name | version | build | channel | envs
umccrise | 1.0.0 | dev_0 | <develop> | NA
snakemake | 5.17.0 | dev_0 | <develop> | NA
reference-data | 1.0.3 | dev_0 | <develop> | NA
r-base | 3.5.1 | hc461eb7_1012 | conda-forge | _pcgr, _cancer_report
r-base | 3.6.3 | h316533a_2 | conda-forge | _hmf
python | 3.7.3 | h357f687_2 | conda-forge | NA, _pcgr, _hmf
python | 3.8.2 | he5300dc_7_cpython | conda-forge | _cancer_report
pcgr | 0.8.4.10 | py37r35_0 | pcgr | _pcgr
pandocfilters | 1.4.2 | py37_1 | NA | NA
pandoc | 2.9.2.1 | 0 | conda-forge | NA, _pcgr, _hmf, _cancer_report
ngs-utils | 2.6.2 | dev_0 | <develop> | NA
ngs_utils | 2.5.6 | py37_0 | vladsaveliev | _pcgr
multiqc-bcbio | 0.2.8 | pypi_0 | pypi | NA
multiqc | 1.9.dev0 | dev_0 | <develop> | NA
htslib | 1.10.2 | h78d89cc_0 | bioconda | NA, _hmf
htslib | 1.9 | h244ad75_9 | bioconda | _pcgr
hmftools-sage | 1.0 | 0 | bioconda | _hmf
hmftools-purple | 2.40 | 0 | bioconda | _hmf
hmftools-linx | 1.7 | 0 | bioconda | _hmf
hmftools-cobalt | 1.8 | 0 | bioconda | _hmf
hmftools-amber | 3.2 | 0 | bioconda | _hmf
gridss | 2.8.3 | 0 | bioconda | _hmf
cpsr | 0.5.2.6 | 0 | pcgr | _pcgr
conpair | 0.2 | pypi_0 | pypi | NA
cacao | 0.2.1.2 | 0 | pcgr | _pcgr
