Skip to contents

You can specify the RNAsum reference dataset of interest (--dataset option) by using one of the TCGA project IDs (Project column in the following tables).

Primary datasets

The table below summarises TCGA expression data available for 33 cancer types.

NOTE: To reduce the data processing time and the size of the final RNAsum HTML report, the following datasets were restricted to include expression data from 300 patients: BRCA, THCA, HNSC, LGG, KIRC, LUSC, LUAD, PRAD, STAD and LIHC.

No Project Name Tissue code* Samples no.**
1 BRCA Breast Invasive Carcinoma 1 300
2 THCA Thyroid Carcinoma 1 300
3 HNSC Head and Neck Squamous Cell Carcinoma 1 300
4 LGG Brain Lower Grade Glioma 1 300
5 KIRC Kidney Renal Clear Cell Carcinoma 1 300
6 LUSC Lung Squamous Cell Carcinoma 1 300
7 LUAD Lung Adenocarcinoma 1 300
8 PRAD Prostate Adenocarcinoma 1 300
9 STAD Stomach Adenocarcinoma 1 300
10 LIHC Liver Hepatocellular Carcinoma 1 300
11 COAD Colon Adenocarcinoma 1 257
12 KIRP Kidney Renal Papillary Cell Carcinoma 1 252
13 BLCA Bladder Urothelial Carcinoma 1 246
14 OV Ovarian Serous Cystadenocarcinoma 1 220
15 SARC Sarcoma 1 214
16 PCPG Pheochromocytoma and Paraganglioma 1 177
17 CESC Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma 1 171
18 UCEC Uterine Corpus Endometrial Carcinoma 1 168
19 PAAD Pancreatic Adenocarcinoma 1 150
20 TGCT Testicular Germ Cell Tumours 1 149
21 LAML Acute Myeloid Leukaemia 3 145
22 ESCA Esophageal Carcinoma 1 142
23 GBM Glioblastoma Multiforme 1 141
24 THYM Thymoma 1 118
25 SKCM Skin Cutaneous Melanoma 1 100
26 READ Rectum Adenocarcinoma 1 87
27 UVM Uveal Melanoma 1 80
28 ACC Adrenocortical Carcinoma 1 78
29 MESO Mesothelioma 1 77
30 KICH Kidney Chromophobe 1 59
31 UCS Uterine Carcinosarcoma 1 56
32 DLBC Lymphoid Neoplasm Diffuse Large B-cell Lymphoma 1 47
33 CHOL Cholangiocarcinoma 1 34

Extended datasets

There are extended sets available for the BLCA, PAAD, and LUAD cohorts, including neuroendocrine tumours (NETs), intraductal papillary mucinous neoplasm (IPMNs), acinar cell carcinoma (ACC) samples and large-cell neuroendocrine carcinoma (LCNEC).

No Project Name Tissue code* Samples no.**
1 LUAD-LCNEC LUAD dataset including LCNECs (n=14) 1 314
2 BLCA-NET BLCA dataset including NETs (n=2) 1 248
3 PAAD-IPMN Pancreatic Adenocarcinoma dataset including IPMNs (n=2) 1 152
4 PAAD-NET Pancreatic Adenocarcinoma dataset including NETs (n=8) 1 158
5 PAAD-ACC Pancreatic Adenocarcinoma dataset including ACCs (n=1) 1 151

Pan-Cancer dataset

10 samples from each of the 33 datasets were combined to create the Pan-Cancer dataset.

No Project Name Tissue code* Samples no.**
1 PANCAN Samples from all 33 cancer types, 10 samples from each 1 and 3 (LAML samples only) 330

* Tissue codes:

Tissue code Letter code Definition
1 TP Primary solid Tumour
3 TB Primary Blood Derived Cancer - Peripheral Blood

** Each dataset was cleaned based on the quality metrics provided in the Merged Sample Quality Annotations file (TSV available from here), linked to from the TCGA PanCanAtlas initiative webpage (see the TCGA-data-harmonization repository for more details, including sample inclusion criteria).