Datasets and Annotations
Functional / Structural Data
Column head: Description
MUT_ID: Unique identifier of each gene variation reported in the database. This identifier is used in all datasets (tumor variants, polymorphisms, germline).
hg19_Chr17_coordinates: Chromosome coordinate of the variation: start position based on the hg19 human genome assembly.
hg38_Chr17_coordinates: Chromosome coordinate of the variation: start position based on the hg38 human genome assembly.
ExonIntron: Location of the variant in the introns or exons of the TP53 gene for the reference sequence NM_000546.5. Terms occurring in this column are "1-intron" to "11-intron" and "2-exon" to "11-exon". An "i" or "e" in front means that the variant is located within the indicated intron or exon with no information on the precise location.
Codon_number: For variants in exons, codon number at which the variant is located (1-393). If a variant spans more than one codon (e.g. tandem variant or deletion of several bases), only the first (5') codon is entered. For variants in introns, 0 is entered.
WT_nucleotide: Base in the reference sequence at the position of the variant on the coding sequence.
Mutant_nucleotide: Mutant base, described on the coding strand.
Description: Nucleotide change read from the coding sequence. For deletions and insertions, the number of bases deleted (del) or inserted (ins) is given. For more complex variant events, a full description is given as indicated in the original publication.
c_description: Variant nomenclature according to HGVS standards, using the NM_000546.5 coding sequence as reference.
g_description: Variant nomenclature according to HGVS standards, using the GenBank NC_000017.10 (hg19 assembly) genomic sequence as reference.
g_description_GRCh38: Variant nomenclature according to HGVS standards, using the GenBank NC_000017.11 (hg38 assembly) genomic sequence as reference.
Type: Nature of the variant. The terms occurring in this column are "A:T>C:G" (A to C or T to G base change), "A:T>G:C" (A to G or T to C base change), "A:T>T:A" (A to T or T to A base change), "G:C>A:T" (G to A or C to T base change at non-CpG sites), "G:C>A:T at CpG" (G to A or C to T base change at CpG sites), "G:C>C:G" (G to C or C to G base change), "G:C>T:A" (G to T or C to A base change), "tandem" (two consecutive base changes), "ins" (insertion), "del" (deletion) and "complex" (complex changes).
Splice_site: Annotation on the position of the variant within conserved nucleotides of p53 consensus, cryptic or alternative splice sites:
consensus SD or SA= the variant is located at conserved dinucleotides involved in p53 consensus splice sites (SD for splice donor site, SA for splice acceptor site) producing the full-length p53 protein (TA isoform);
cryptic SD or SA= the variant is located at conserved dinucleotides involved in splice sites (gt or ag) that have been observed experimentally in p53 sequences carrying mutated consensus splice sites;
alternative SD or SA= conserved dinucleotides involved in splice sites (gt or ag) responsible for producing p53 isoforms beta and gamma;
alternative = mutated nucleotides are in the "cassette" sequence responsible for producing the p53 delta isoform;
no= the position is outside the above mentioned nucleotides.
Information on splice site
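The base-change labels listed for the Type column pair each substitution with its equivalent on the complementary strand. The following is a minimal sketch of that pairing for single-base substitutions only. The function name `substitution_type` and the `at_cpg` flag are hypothetical and are not part of the database; "tandem", "ins", "del" and "complex" events are outside its scope.

```python
# Complementary bases, used to express a change on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def substitution_type(ref: str, alt: str, at_cpg: bool = False) -> str:
    """Return a Type-style label for a single-base substitution.

    ref and alt are bases on the coding strand. at_cpg is the caller's own
    determination of whether the position lies within a CpG site.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError("expected two different bases among A, C, G, T")
    # Normalise so the reference base is always A or G, as in the labels.
    if ref in ("T", "C"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    label = f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"
    # Only G:C>A:T changes are split by CpG context in the column.
    if label == "G:C>A:T" and at_cpg:
        label += " at CpG"
    return label
```

With this normalisation, a T to G change and an A to C change both map to "A:T>C:G", which matches the way the column groups them.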
CpG_site: "Yes" or "No" indicates whether the position of the variant falls within a CpG site.
Context_coding: Trinucleotide sequence context of variants. The 5' base and 3' base of the start position of the variant are indicated on the left and right, respectively, of the mutated base. This context is provided on the coding strand of the gene sequence and is based on the hg38 TP53 sequence.
Mut_rate: Substitution rates were calculated for all single base substitutions in the coding sequence of p53 according to the dinucleotide substitution rates derived from human-mouse aligned sequences of chromosomes 21 and 10 (Lunter and Hein 2004). The variant probabilities for a given single nucleotide substitution are calculated by averaging the dinucleotide substitution rates at that position for the forward and reverse strands.
WT_codon: For variants in exons, sequence of the codon in which the variant occurred in the NM_000546.5 transcript.
Mutant_codon: Base sequence of the mutated codon in the NM_000546.5 transcript.
WT_AA: Wild-type amino acid encoded at the codon in which the variant occurred (three-letter amino acid abbreviation). Check AA Letter Code and Genetic Code.
Mutant_AA: Mutated amino acid encoded at the codon in which the variant occurred (three-letter amino acid abbreviation). Chain-terminating variants due to single base substitutions are designated by "stop". Check AA Letter Code and Genetic Code.
ProtDescription: Variant description at the protein level as recommended by HGVS, using the Uniprot reference sequence P04637.
Mut_rateAA: Variant rate of amino-acid substitution calculated by summing up the nucleotide substitution rates. This value is only valid for amino-acid substitutions resulting from single nucleotide substitutions.
Effect: Effect of the variant. The terms occurring in this column are: missense (change of one amino acid), nonsense (introduction of a stop codon), FS (frameshift), silent (no change in the protein sequence), splice (variants located in the first two or last two conserved nucleotides of the introns and thus predicted to alter splicing, or variants that have been shown to alter splicing experimentally), other (in-frame deletions or insertions), intronic (variants in introns outside splicing sites), NA (variants upstream in 5' or 3' UTR).
Polymorphism: Polymorphic status of the gene variation.
Validated : MAF > 0.001 in ESP6500, 1000G or gnomAD databases;
No : not reported or reported at MAF < 0.001 in ESP6500, 1000G, or gnomAD databases;
NA : not applicable.
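The Polymorphism rule above is a simple threshold on minor allele frequency (MAF). The sketch below shows one way to apply it, assuming the caller has already collected one MAF per source database. The function name, the dictionary layout and the handling of a MAF exactly equal to 0.001 (treated here as "No", since the rule above leaves that case open) are assumptions. The "NA" status is not derived here because the criteria for it are not stated.

```python
from typing import Mapping, Optional

MAF_THRESHOLD = 0.001  # threshold stated for ESP6500, 1000G and gnomAD


def polymorphism_status(mafs: Mapping[str, Optional[float]]) -> str:
    """Return "Validated" or "No" from per-database MAF values.

    mafs maps a database name to its MAF, or to None when the variant is
    not reported in that database.
    """
    reported = [maf for maf in mafs.values() if maf is not None]
    # A single database above the threshold is enough for "Validated".
    if any(maf > MAF_THRESHOLD for maf in reported):
        return "Validated"
    # Not reported anywhere, or reported only at low frequency.
    return "No"
```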
SNPlink: Link to the NCBI SNP database.
gnomADlink: Link to the gnomAD database.
SourceDatabases: SNP databases from which the variants have been extracted.
PubMedlink: PubMed ID of the publications that reported the polymorphic status of the variant.
Domain function: Function of the domain in which the mutated residue is located.
Residue function: Known function of the wild-type residue. When the function is not known but the structure is known, the solvent accessibility (SA) of the residue is indicated by the terms buried, exposed or partially exposed (SA calculated with Naccess and the 1TSR (chain B) structure of p53: <20 = buried, >=20 and <50 = partially exposed, >=50 = exposed).
Hotspot: "Yes" indicates that the variant is located in a codon defined as a cancer hotspot by Chang (2017).
Structural_motif: 2D and 3D motifs where the variant is located, according to the structures described in Cho et al. (1994) and May and May (1999).
SA: Solvent accessibility of the wild-type residue as calculated with Naccess and the 1TSR (chain B) structure of p53.
AGVGD class: Missense variant functional predictions by an optimized Align-GVGD tool. Variants classified as "C0" are considered tolerated while other classes are considered damaging. Further details in Fortuno (2018).
BayesDel: Missense variant functional predictions by the BayesDel tool (Feng 2017), used without allele frequency.
Score >=0.16: damaging
Score <0.16: tolerated
Further details in Fortuno (2018).
REVEL: Missense variant functional predictions by the REVEL tool (Ioannidis 2016).
Score >=0.5: damaging
Score <0.5: tolerated
Further details in Fortuno (2018).
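The BayesDel and REVEL cut-offs above can be applied to raw scores as shown in this minimal sketch. The thresholds are the ones listed above; the function and dictionary names are hypothetical.

```python
# Score at or above the cut-off is "damaging", below it is "tolerated".
CUTOFFS = {"BayesDel": 0.16, "REVEL": 0.5}


def predicted_class(tool: str, score: float) -> str:
    """Classify a missense variant score for the given tool."""
    if tool not in CUTOFFS:
        raise KeyError(f"no cut-off defined for tool {tool!r}")
    return "damaging" if score >= CUTOFFS[tool] else "tolerated"
```

For example, `predicted_class("BayesDel", 0.16)` returns "damaging" because the cut-off itself falls on the damaging side.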
SIFT class: Functional classification based on the SIFT program using default settings. Missense variants are classified as "damaging" or "tolerated".
PolyPhen2: Functional classification based on PolyPhen2 HVAR annotations retrieved with Annovar software:
D: probably Damaging
P: Possibly damaging
Transactivation: Promoter-specific transcriptional activity measured in yeast functional assays and expressed as percent of wild-type activity. Data from Kato (2003).
TransactivationClass: Functional classification based on the overall transcriptional activity (TA) on 8 different promoters as measured in yeast assays by Kato et al. For each mutant, the median of the 8 promoter-specific activities (expressed as percent of the wild-type protein) is calculated and missense variants are classified as "non-functional" if the median is <=20, "partially functional" if the median is >20 and <=75, "functional" if the median is >75 and <=140, and "supertrans" if the median is >140.
DNE_LOFclass: Functional classification for loss of growth-suppression and dominant-negative activities based on Z-scores from the Giacomelli et al. (2018) study:
DNE_LOF = p53WTNutlin3 Z-score >= 0.61 and Etoposide Z-score <= -0.21;
notDNE_notLOF = p53WTNutlin3 Z-score < 0.61 and Etoposide Z-score > -0.21;
notDNE_LOF = p53WTNutlin3 Z-score < 0.61 and Etoposide Z-score <= -0.21;
unclass = others
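The two threshold-based classifications described above (TransactivationClass and DNE_LOFclass) can be written out as follows. This is a sketch of the stated rules only: the function and argument names are hypothetical, and the inputs are assumed to be the promoter activities (percent of wild-type) and the two Z-scores already extracted by the caller.

```python
from statistics import median
from typing import Sequence


def transactivation_class(promoter_activities: Sequence[float]) -> str:
    """Classify a missense variant from its promoter-specific activities."""
    m = median(promoter_activities)
    if m <= 20:
        return "non-functional"
    if m <= 75:
        return "partially functional"
    if m <= 140:
        return "functional"
    return "supertrans"


def dne_lof_class(nutlin3_z: float, etoposide_z: float) -> str:
    """Classify from the p53WTNutlin3 and Etoposide Z-scores."""
    dne = nutlin3_z >= 0.61      # dominant-negative effect
    lof = etoposide_z <= -0.21   # loss of function
    if dne and lof:
        return "DNE_LOF"
    if not dne and not lof:
        return "notDNE_notLOF"
    if not dne and lof:
        return "notDNE_LOF"
    # Remaining combination (DNE without LOF) is left unclassified.
    return "unclass"
```

Writing the rules this way makes it visible that "unclass" corresponds to a p53WTNutlin3 Z-score >= 0.61 combined with an Etoposide Z-score > -0.21.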
DNE class: Dominant-negative (DN) effect on transactivation by wild-type p53.
Classification established for mutants for which DN activity data on more than 2 p53-response elements are available. Data are based on WAF1 and RGC promoters in various studies (these promoters were the most frequently used in different studies to assess DNE status), and on two large systematic studies (Dearth et al., which includes 76 mutants; Monti et al., which includes 104 mutants).
Mutants were classified as "Yes" if they had dominant-negative activity on both WAF1 and RGC promoters, or on all promoters in the large studies, "Moderate" if they had dominant-negative activity on some but not all promoters, and "No" if they had no dominant-negative activity on both WAF1 and RGC promoters, or none of the promoters in the large studies.
Structure/Function class: Functional predictions derived from a computer model that takes into account the 3D structure of WT and mutant proteins and is trained on the transactivation dataset from Kato et al. Variants are classified as "functional" or "non-functional". (Read more details on Structure-Function Predictions Based on Scores Derived from Delaunay Tessellations.)
EffectGroup3: Variant classification based on protein 3D structure and variant type. This classification has been used to derive genotype-phenotype correlations in sporadic breast cancers (Olivier et al., 2006).
1=missense in DNA-binding loops(L2,H1,L3,L1,S2,S2',H2);
SwissProtLink: SwissProt identification number with link to the variant page of the SwissProt database.
Somatic_count: Number of occurrences in the tumor variant dataset (number of tumors reported to carry this tumor variant). Total count is 29,891 in R20.
Germline_count: Number of occurrences in the IARC germline dataset (number of pedigree/individual carriers of this germline variant). Total count is 1,532 in R20.
CellLine_count: Number of occurrences in the IARC cell-line dataset (number of cell lines reported to carry this variant). Total cell-line count is 2,766 in R20.
COSMIClink: Link to the variant ID in the COSMIC database.
CLINVARlink: Link to the ClinVar database.
TCGA_ICGC_GENIE_count: Sum of variant occurrences from the TCGA (MC3), ICGC (v28) and GENIE (V5) datasets. Total count is 23,570.
Predicted effect on splicing:
Predicted effect on p53 protein isoforms:
The predictions provided are based on whether the variant falls within the specific isoform. Read more on p53 Isoforms
Column head: Description
TAp53alpha: Indicates whether the variant falls within the canonical isoform coding for the full-length p53 protein.
TAp53beta....deltap53alpha: Indicates whether the variant falls within the specified isoform.
SpliceAI Prediction on Canonical Transcript (NM_000546.5):
Column head: Description
SpliceAI_DS_AG: SpliceAI delta score, acceptor gain. The highest delta score of a variant for acceptor gain (AG).
SpliceAI_DS_AL: SpliceAI delta score, acceptor loss. The highest delta score of a variant for acceptor loss (AL).
SpliceAI_DS_DG: SpliceAI delta score, donor gain. The highest delta score of a variant for donor gain (DG).
SpliceAI_DS_DL: SpliceAI delta score, donor loss. The highest delta score of a variant for donor loss (DL).
Each delta score ranges from 0 to 1 and can be interpreted as the probability that the variant affects splicing at any position within a window around it (10kb as maximum distance, as recommended for best performance; references: PMID 34283047 and PMID 33942434). The maximum of the four delta scores (DS_AG, DS_AL, DS_DG, DS_DL) is used as the basis for spliceogenicity predictions.
SpliceAI_DP_AG: SpliceAI delta point, acceptor gain. The location of acceptor gain (AG) relative to the variant position in the pre-mRNA transcript.
SpliceAI_DP_AL: SpliceAI delta point, acceptor loss. The location of acceptor loss (AL) relative to the variant position in the pre-mRNA transcript.
SpliceAI_DP_DG: SpliceAI delta point, donor gain. The location of donor gain (DG) relative to the variant position in the pre-mRNA transcript.
SpliceAI_DP_DL: SpliceAI delta point, donor loss. The location of donor loss (DL) relative to the variant position in the pre-mRNA transcript.
For each delta point, positive values are upstream of the variant and negative values are downstream.
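Since the maximum of the four delta scores is the basis for spliceogenicity predictions, a record can be summarised as in the sketch below. It assumes each record is a dictionary keyed by the column names above; the function name is hypothetical, and no decision threshold is applied because none is stated here.

```python
from typing import Mapping, Tuple

DELTA_SCORE_COLUMNS = (
    "SpliceAI_DS_AG",
    "SpliceAI_DS_AL",
    "SpliceAI_DS_DG",
    "SpliceAI_DS_DL",
)


def max_delta_score(record: Mapping[str, object]) -> Tuple[str, float]:
    """Return (column name, value) of the highest SpliceAI delta score."""
    # Values may arrive as strings when read from a text file.
    scores = {col: float(record[col]) for col in DELTA_SCORE_COLUMNS}
    best = max(scores, key=scores.get)
    return best, scores[best]


example = {
    "SpliceAI_DS_AG": "0.01",
    "SpliceAI_DS_AL": "0.85",
    "SpliceAI_DS_DG": "0.00",
    "SpliceAI_DS_DL": "0.10",
}
best_column, best_score = max_delta_score(example)
```

The column name returned alongside the score indicates which matching delta point column (for instance SpliceAI_DP_AL) locates the predicted change.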
Tumor Variants Found in Human Tumor Samples
This dataset contains TP53 tumor variants identified in human tumor samples (including metastasis and cell-lines). It includes data on the type and position of variations, detailed information on the tumor in which the variants have been found, and on various characteristics of the patients in which the tumor developed. Please note that true somatic origin of the tumor variants may not have been confirmed unless a matched normal sample was used to filter out germline variants. Therefore, it cannot be excluded that a small number of the variants listed may be of germline origin.
Each row in the downloaded tab-delimited text file represents a single variant reported in a tumor sample with an arbitrarily assigned unique identification number. A unique identification number is also attributed to the tumor sample and to the patient. Table content is as follows:
Column head: Description
The first set of columns describes the variant.
Mutation_ID: Unique identification number for a Sample/Variant association. Tandem variants (two adjacent base substitutions) are considered as one variant event; therefore tandem variants have only one identification number and are a single record.
MUT_ID....SwissProtLink: see variant annotations
The second set of columns describes the organ site, tissue and type of lesion in which the variant has been identified. The descriptions given in the publication are translated into the standards of the International Classification of Diseases for Oncology (ICD-O 3rd Edition, World Health Organization, Geneva, 2000) and SNOMED.
For information on tumor classification, grading and staging, check out ICD-O training at SEER and Cancer Information at NCI.
Sample_Name: A sample name is assigned as follows: first 3 letters of the first author's name, year of publication (2 digits), followed by the ID number indicated in the publication. The same name or number can occur several times, as more than one variant has been reported in some samples.
Sample_ID: Unique sample identification number. This number allows the automatic retrieval of samples with multiple variants.
Sample_source...TNM: see tumor annotations
p53_IHC: p53 immunostaining graded as ‘positive’, ‘negative’ or ‘+/-’. ND stands for not done.
Add_Info: Any relevant additional information is entered here.
The third set of columns describes the patient origin and life-style. These columns contain heterogeneous notes, usually comments emphasized by the authors reporting the variants. It should be noted that this information is generally qualitative. No quantitative information on exposure to risk factors is included in the database. This information does not presuppose that a formal, causal link has been established between such factors and the variant described. Moreover, for most exogenous risk factors, individual exposure has not been monitored. This information is given solely to (i) permit the retrieval of variations found in patients belonging to defined groups or having specific risk factors, and (ii) facilitate access to the corresponding publications. For detailed comparison between exposure groups, users are invited to perform their own analysis based on the information given in the original publication.
Individual_ID: Unique identification number for an individual included in the database. It is automatically assigned by the database system.
Sex...Country: see patient annotations
TP53polymorphism: Presence of a polymorphism in the TP53 gene.
Germline_mutation: Germline variant detected in any gene in the patient.
Family_history: Information on the presence or absence of cancers in the family of the patient.
Tobacco: Information on the smoking status of the patient. Terms occurring in this column are 'smoker' (with qualitative amount in brackets), 'non-smoker', 'passive-smoker' and 'chewer'.
Alcohol: Information on the drinking status of the patient. Terms occurring in this column are 'drinker' (with qualitative amount in brackets) and 'non-drinker'.
Exposure: Risk factors to which the patient has been exposed, such as aflatoxins, radon, thorotrast, etc.
Infectious_agent: Pathogen (virus or bacteria) detected in the patient.
Ref_ID: Unique identification number for the reference in which the variant is described.
DS_AG ... DP_DL: See SpliceAI Prediction on Canonical Transcript (NM_000546.5)
PubMed: PubMed reference number provided by NCBI.
Exclude_analysis: Studies that we recommend to exclude from any analysis because of dubious quality. Such studies are identified based on the following criteria: they report several samples with multiple variants, and/or a high proportion of rare variants or variants classified as functional.
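Because each row is one Sample/Variant association and Sample_ID is shared by all variants of a sample, samples with multiple variants can be retrieved by grouping rows on Sample_ID. The sketch below assumes a tab-delimited file with a header row containing the Mutation_ID and Sample_ID columns described above. The function names and the three example rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def variants_by_sample(handle: TextIO) -> Dict[str, List[str]]:
    """Map each Sample_ID to the Mutation_IDs reported for that sample."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[row["Sample_ID"]].append(row["Mutation_ID"])
    return dict(grouped)


def samples_with_multiple_variants(
    grouped: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Keep only samples that carry more than one variant record."""
    return {sid: ids for sid, ids in grouped.items() if len(ids) > 1}


# Made-up rows standing in for the downloaded file.
EXAMPLE = "Mutation_ID\tSample_ID\n1\t100\n2\t100\n3\t101\n"
grouped = variants_by_sample(io.StringIO(EXAMPLE))
multi = samples_with_multiple_variants(grouped)
```

For a real download, `io.StringIO(EXAMPLE)` would be replaced by an opened file handle, for example `open(path, newline="")`.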
This file lists the publications in which the variants are described and gives the method used to detect them. Each row (record) represents a citation with an arbitrarily assigned unique identification number (Ref_ID). See standardized annotations for the description of the column content.
Prevalence of TP53 Tumor Variants by Tumor Site
This dataset contains information on the proportion of tumors that carry a TP53 tumor variant. The data are extracted from publications contained in the tumor variant dataset, from additional publications that do not give a detailed description of the variants (many studies do not provide detailed information on each variant detected but rather report their results in the form of summary tables, which prevents their inclusion in the tumor variant dataset), and from publications reporting negative results (no variant found, thus not included in the tumor variant dataset).
For each study, the total number of tumors or tissue samples analyzed and the number of these samples found to contain a variant are provided. The reference paper, method of variant detection and country of origin of the patients are also indicated. When the same research team published several papers that describe the same set of samples, data from the most recent or most complete paper are used.
Column head: Description
Prevalence_ID: Unique entry identification number.
Topography...Morpho_code: see sample annotations
Sample_analyzed: Number of tumor samples analyzed for TP53 variants.
Sample_mutated: Number of tumor samples with a variant in TP53.
Country...Development: see patient annotations
Comment: Any relevant information.
Ref_ID...PubMed: see reference annotations
Tissue_processing....exon 11: see method annotations
Exclude_Analysis: Studies that we recommend to exclude from any analysis because of data quality issues. Studies are labeled as 'exclude' if: they report several samples with multiple variants in patients with no specific genetic background or exposure to mutagen; they report more variants that are classified as functional or partially functional (based on TA class) than variants classified as non-functional; variants are not precisely described and cannot be fully annotated in the database; several variants in the series are reported with errors (such as position and base that do not fit, or report of neutral polymorphisms as tumor variants).
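A pooled prevalence across studies can be derived from Sample_analyzed and Sample_mutated while skipping studies flagged in Exclude_Analysis. The following is a minimal sketch, assuming each study is a dictionary keyed by the column names above and that flagged studies carry the value 'exclude'. The function name and the example figures are made up, and simple pooling is only one of several ways to combine studies.

```python
from typing import Iterable, Mapping, Optional


def pooled_prevalence(rows: Iterable[Mapping[str, str]]) -> Optional[float]:
    """Return mutated / analyzed over all studies not flagged 'exclude'."""
    analyzed = 0
    mutated = 0
    for row in rows:
        flag = (row.get("Exclude_Analysis") or "").strip().lower()
        if flag == "exclude":
            continue  # study flagged for data quality issues
        analyzed += int(row["Sample_analyzed"])
        mutated += int(row["Sample_mutated"])
    if analyzed == 0:
        return None  # nothing left to pool
    return mutated / analyzed


# Made-up studies for illustration only.
studies = [
    {"Sample_analyzed": "100", "Sample_mutated": "30", "Exclude_Analysis": ""},
    {"Sample_analyzed": "50", "Sample_mutated": "20", "Exclude_Analysis": "exclude"},
    {"Sample_analyzed": "100", "Sample_mutated": "20", "Exclude_Analysis": ""},
]
prevalence = pooled_prevalence(studies)
```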
Prevalence of the R249S TP53 Variant in Liver Cancer
This dataset contains data on the prevalence of the c.747G>T (p.R249S) variant in liver cancers. It includes studies that have screened at least exon 7 of TP53 by sequencing, and studies that have searched for this specific variant by RFLP. The presence of this variant in hepatocellular carcinomas has been linked to exposure to aflatoxins and HBV, and may thus constitute a biomarker of exposure. This dataset was released with the R15 version of the database and has not been updated since then.
The file is a tab-delimited text file that contains the following information:
Column head: Description
Ref_ID...PubMed: see reference annotations
Country: see patient annotations
Sample_analyzed: Number of tumor samples analyzed for the c.747G>T (p.R249S) TP53 variant.
Count_R249S: Number of tumor samples containing the c.747G>T (p.R249S) TP53 variant.
Remark: Any relevant information.
Method: Comment on method if different from sequencing.
Prognostic Value of TP53 Tumor Variants
This dataset includes information on all studies that have analyzed the relationship between p53 variants and cancer prognosis. For each study, the patient cohort, study settings and a summary of the results are described. When the same research team published several papers with increasing number of patients, the most recent paper with the largest dataset is used.
Many of these studies do not provide detailed information on each variant detected but rather report their results in the form of summary tables. Such publications have been included in the prognosis dataset but not in the tumor variant dataset. For some of them, the variants have been published in a previous paper and can be retrieved with the Cross_Ref_ID study identifier (see below).
The downloaded file contains the following information:
Column head: Description
Prognosis_ID: Unique entry identification number.
Topography: see sample annotations
Morphology: see sample annotations
Population: see patient annotations
Country: see patient annotations
Institution: Name of the hospital(s) where the patients have been recruited.
Period: Time period (year) during which the patients have been recruited.
Inclusion criteria: ICD-O (3rd edition) or SNOMED code for morphology
Treatment: Treatment protocol used for most of the patients. SU, surgery; CX, chemotherapy; RX, radiotherapy; pre-op, pre-operative; CP, cyclophosphamide; CISP, cisplatin; doxo, doxorubicin; 5-FU, 5-fluorouracil
Median FU: Median follow-up time of the patients in months.
Range FU: Range of follow-up time of the patients in months.
Cohort: Number of patients/tumors analyzed for TP53 variants.
p53 variants: Number of patients/tumors with a variant in TP53.
Percent mutated: Proportion of mutated tumors (%).
Parameter_analyzed: Clinical parameter analyzed (patient survival and/or tumor response to treatment).
Association: Summary result: association with the presence of a TP53 variant.
Result: Main findings.
Ref_ID...PubMed: see annotations
Exclude_analysis: Papers that we recommend to exclude from analysis because of dubious data quality (report several samples with multiple variants, and/or a high proportion of rare variants or variants classified as neutral or functional).
Germline Variations in LFS/LFL Families
Inherited TP53 variants are associated with a rare autosomal dominant disorder, the Li-Fraumeni syndrome (LFS).
This dataset contains information on individuals who are carriers of a TP53 germline variant and on families in which at least one family member has been identified as a carrier of a germline variant in the TP53 gene. Criteria for inclusion are the following: a) individuals carrying a sequenced TP53 germline variant, affected or not by a cancer; b) individuals affected by a cancer and belonging to a family in which at least one family member has been identified as a carrier of a germline variant in the TP53 gene.
Each row (record) in the downloaded file represents a tumor found in an individual having a TP53 germline variant. The file contains the following information:
Column head: Description
Family_ID: Unique family identification number.
Family_Code: Name or number given in the original publication or an arbitrarily assigned name, usually the first 3 letters of the first author's name and the publication date.
Country: see annotations
Class: Family classification:
LFS = strict clinical definition of Li-Fraumeni syndrome (defined by Li and Fraumeni as a Proband with sarcoma <45 years with a first degree relative with cancer at <45 and another first/second degree relative with cancer at <45 or sarcoma at any age);
LFL = Li-Fraumeni like for the extended clinical definition of Li-Fraumeni (including Birch definition: proband with any childhood cancer or sarcoma, brain tumor or adrenocortical carcinoma at <45 years, with one first or second degree relative with sarcoma, breast cancer, brain tumor, leukemia, or adrenocortical carcinoma at any age, plus one first or second degree relative in the same lineage with any cancer diagnosed under age 60; Eeles definition E1: two different tumors which are part of extended LFS in first or second degree relatives at any age (sarcoma, breast cancer, brain tumor, leukemia, adrenocortical tumor, melanoma, prostate cancer, pancreatic cancer); Eeles definition E2: sarcoma at any age in the proband with two of the following (two of the tumors may be in the same individual): breast cancer at <50 years and/or brain tumor, leukemia, adrenocortical tumor, melanoma, prostate cancer, pancreatic cancer at <60 years or sarcoma at any age).
FH = family history of cancer which does not fulfil LFS or any of the LFL definitions (Birch, Eeles E1 or E2);
No FH (or No) = no family history of cancer;
? = unknown.
Generations_analyzed: Number of generations analyzed in the family.
Germline_mutation: A TP53 germline variant has been identified.
MUT_ID...TAclass: see variant annotations
Individual_ID: Unique identification number for an individual included in the database. It is automatically assigned by the database system.
Individual_code: Code or number given in the original publication or an arbitrarily assigned code, usually the family code followed by the position of the individual in the family tree.
FamilyCase: Family case in the pedigree, such as proband (index case), mother, father...
FamilyCase_group: Degree of relationship to the proband.
Sex: Gender of the individual.
Germline_carrier: TP53 variant status of the individual: confirmed= the individual has been tested for the presence of the variant and the variant has been found; obligatory= the individual has not been tested for the presence of the variant but must be a carrier based on the variant status of the other individuals in the pedigree; 50%prob.= there is a 50% chance that the individual is a variant carrier; negative= the individual has been tested for the presence of the variant and the variant has not been found; NA= the individual has not been tested for the presence of the variant.
Mode_of_inheritance: Mode of variant inheritance: P=paternal, M=maternal, M&P=maternal and paternal, de novo= variant that has not been inherited, "de novo, mosaic"= variant that has not been inherited and is present in a subpopulation of cells, na=not known.
Dead: Living status of the individual at time of follow-up. 0 = alive; 1 = dead.
Unaffected: Disease status of the individual at time of follow-up. 0 = affected by cancer; 1 = not affected by cancer.
Age: Age of the individual at the time of follow-up.
Topography: see annotations
Morphology: see annotations
Age at diagnosis: Age of the individual at the time of diagnosis of the tumor.
Ref_ID: Reference number indicating the publication in which the variant is described. This number corresponds to the Ref_ID number in the GermlineRefR20 file.
DS_AG ... DP_DL: See SpliceAI Prediction on Canonical Transcript (NM_000546.5)
PubMedLink: A link to the PubMed reference in NCBI.
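The Dead and Unaffected columns use 0/1 codes whose meaning follows the column name (Unaffected = 1 means not affected by cancer), and Mode_of_inheritance uses short codes. A small lookup such as the sketch below can make records easier to read. The function name, the output keys and the "unknown" fallback are hypothetical; the code-to-meaning pairs are the ones listed above.

```python
from typing import Dict, Mapping

DEAD = {"0": "alive", "1": "dead"}
UNAFFECTED = {"0": "affected by cancer", "1": "not affected by cancer"}
MODE_OF_INHERITANCE = {
    "P": "paternal",
    "M": "maternal",
    "M&P": "maternal and paternal",
    "de novo": "not inherited",
    "de novo, mosaic": "not inherited, present in a subpopulation of cells",
    "na": "not known",
}


def decode_individual(row: Mapping[str, object]) -> Dict[str, str]:
    """Translate coded germline columns into readable labels."""

    def lookup(table: Dict[str, str], column: str) -> str:
        # Codes may be read as int or str; unexpected values map to "unknown".
        return table.get(str(row.get(column, "")).strip(), "unknown")

    return {
        "vital_status": lookup(DEAD, "Dead"),
        "disease_status": lookup(UNAFFECTED, "Unaffected"),
        "inheritance": lookup(MODE_OF_INHERITANCE, "Mode_of_inheritance"),
    }
```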
Each row represents a reference identified by a unique identification number (Ref_ID). See standard annotations for a description of the column content.
This dataset includes studies reporting TP53 germline variant screening in large cohorts of patients selected based on various criteria (family history of cancer, specific cancer diagnosis, ...) Each row represents the result of the analysis of TP53 germline variant status in a selected cohort.
This new dataset includes studies reporting the frequency of individual TP53 germline variants in case-control series using NGS for the screening of the entire coding regions of TP53.
Column head: Description
MUT_ID....DNEclass: see variant annotations
Freq_cases: Frequency of the TP53 variant in cases.
Freq_controls: Frequency of the TP53 variant in controls.
Total_Cases: Number of patients included in the "case" group.
N_Cases: Number of cases found to carry the specific TP53 variant. Details on variant phenotype may be found in the dataset of germline variants if enough detail is provided on patients and tumor type.
Total_Controls: Number of patients included in the "control" group.
N_Controls: Number of controls found to carry the specific TP53 variant.
DataSource: PubMed ID of the paper from which data have been extracted.
StudyDetails: Description of the case and control groups.
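The count columns allow a carrier frequency to be recomputed for each group. The sketch below assumes that the frequency is the number of carriers divided by the group size (N_Cases / Total_Cases and N_Controls / Total_Controls); this is an assumption, since the exact definition behind Freq_cases and Freq_controls is not spelled out here. The function name and the example counts are made up.

```python
def carrier_frequency(n_carriers: int, total: int) -> float:
    """Return the fraction of individuals in a group carrying the variant."""
    if total <= 0:
        raise ValueError("group size must be positive")
    if not 0 <= n_carriers <= total:
        raise ValueError("carrier count must be between 0 and group size")
    return n_carriers / total


# Made-up counts for illustration only.
freq_cases = carrier_frequency(3, 1200)     # N_Cases, Total_Cases
freq_controls = carrier_frequency(1, 4000)  # N_Controls, Total_Controls
```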
Functional Activities of Missense Variants
Data on the biological properties of p53 mutant proteins in functional assays performed in yeast or human cells are provided in two datasets.
The functional properties of mutant proteins that are included in this dataset are:
- transcriptional activities on various well-described p53-RE
- dominant negative effects on the activities of wild-type p53
- capacity to induce apoptosis, cell-cycle arrest or checkpoints in human cells
- capacity to transactivate promoters that are not induced by wild-type p53
- ability to promote cell growth and confer tumorigenicity
- sensitivity to temperature changes regarding their ability to transactivate specific promoters
The functional results have been organised in 5 columns for (1) conserved wild-type properties, (2) complete or partial loss of wild-type properties, (3) dominant-negative effects, (4) gain of function and (5) temperature sensitivity. The cell system is indicated in two columns and a detailed reference to the published report is given.
Column head: Description
Function_ID: Unique identification number for each entry.
ProtDescription...Structural_motif: see variant annotations
Codon 72: Amino acid at codon 72 of p53 (polymorphism).
Conserved WT function: Functional property of the mutant that is similar to the activity of the wild-type protein.
- Activities of mutant proteins in human or yeast cells:
DNAb = DNA binding capacity tested by gel-shift or ChIP assay;
TA = transactivation of a reporter gene under the control of a p53-response element (indicated in brackets, see a list of p53 Target Genes);
TR = transrepression of a reporter gene under the control of a gene-specific response-element (name of gene indicated in brackets);
TETR = capacity to form tetramers;
x binding = interaction with protein x;
drug sensitivity = conserved capacity to mediate cytotoxic effect of drug (specific drug used is indicated, see List of Abbreviations).
- Activities of over-expressed mutant proteins in human cells:
APO = induction of apoptosis;
GS = growth suppression measured by colony forming assay (CFA), establishment of stable clones, or other proliferation assay;
GA = cell cycle arrest measured by FACS;
TUMOR- = inhibition of tumorigenicity in nude mice;
up/downregulation = induction or repression of an endogenous GENE (in upper-case letters) or protein (in lowercase letters);
HR repression = inhibition of Homologous Recombination;
- Biological effect after over-expression in mouse or rat embryonic fibroblasts:
TRANSF- = ability to counteract the transformation of primary cells induced by the co-transfection of ras or another transforming oncogene, such as HPV E7;
"super" indicates that the activity of the mutant protein is higher than that of wtp53 (on transactivation, induction of apoptosis, DNA binding or growth suppression).
Loss of Function WTp53 functional property that is lost by the mutant protein.
Same annotations as in the previous column, with "partial" indicating that the loss of function is partial (residual activity).
Dominant negative activity Inhibition of the wild-type protein by mutant proteins in transactivation or cell growth assays.
- Yes = the mutant protein counteracts the activity of the wild-type protein when the two proteins are co-expressed in human or yeast cells (the p53-response element or cell growth assay performed is indicated in brackets);
- No = the mutant protein does not counteract the effects of the wild-type protein.
"moderate" indicates that the mutant protein has a partial inhibiting effect on the wild-type protein.
Gain of Function Functional properties displayed by the mutant but not by the wild-type protein.
- Activities of over-expressed mutant proteins in human or yeast cells:
same annotations as in column 9, plus:
TUMOR+ = confer tumorigenic property (in nude mice) to transfected cells;
p73 interference = ability to counteract p73 activity when both proteins are expressed in a cell system;
Drug resistance = confer resistance to a cytotoxic drug (see List of Abbreviations);
Growth advantage = increase growth rate.
- Biological effect after over-expression in mouse or rat embryonic fibroblasts:
TRANSF+ = ability to cooperate with ras or another transforming oncogene, such as HPV E7, in the transformation of primary cells.
"moderate" indicates that the mutant protein has a partial effect on the activity studied; "no" indicates that the mutant protein has no effect on the activity studied.
Temperature sensitivity Sensitivity of mutant to temperature changes in transactivation assays (the p53-RE is indicated in brackets), and in other experimental assays (specified in brackets).
Yes = the activity of the mutant protein is affected by the temperature at which the test is performed;
mut_H = the protein is inactive (mutant) at higher temperatures;
mut_L = the protein is inactive (mutant) at lower temperatures;
No = the activity of the mutant protein is NOT affected by the temperature at which the test is performed.
Note that functional tests are performed at different temperatures in yeast (30℃) and human (37℃) cells. (See Temperature Sensitivity Annotations for the detailed annotation rules.)
Temp_ref Temperature at which experiments have been performed or which has been used as reference for temperature sensitivity assays. Cell assay Human = the activity of the mutant protein has been tested in human cells.
Yeast = the activity of the mutant protein has been tested in yeast.
cellLines Name of cell-line(s) that have been used for testing mutant activities. "(endo)" indicates that activities have been tested on endogenous mutants. Assay design Indicates if the assay has been performed with or without wtp53 as control, or if activity has been tested on endogenous mutant. Method Details on type of experimental assay that was performed to assess function. FRef_ID...PubMed see reference annotations.
Column head Description ProtDescription...codon number see variant annotations WAF1nWT, MDM2nWT, BAXnWT,... Promoter-specific transcriptional activity measured in yeast functional assays and expressed as percent of wild-type activity. WAF1nWT_Saos2, MDM2nWT_Saos2,... Promoter-specific transcriptional activity measured in the human cell-line Saos-2. Values are normalized with p53-null vector values and expressed as percent of wild-type activity. SubG1nWT_Saos2 Induction of apoptosis by overexpression in Saos-2 cells expressed as percent of wild-type activity. Oligomerisation_yeast Capacity of mutant protein to form oligomer:
TETR = can form tetramers,
DIM = can form dimers but not tetramers,
MON = cannot oligomerize.
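When working with a downloaded copy of this dataset, the three Oligomerisation_yeast codes can be expanded with a small lookup. The sketch below is only an illustration: the dictionary and the helper name `describe_oligomerisation` are hypothetical, and the handling of blank or unknown values is an assumption, since the source does not say how missing data are encoded.

```python
# Meanings taken from the column description above.
OLIGOMERISATION_CODES = {
    "TETR": "can form tetramers",
    "DIM": "can form dimers but not tetramers",
    "MON": "cannot oligomerize",
}


def describe_oligomerisation(code: str) -> str:
    """Expand an Oligomerisation_yeast code; unknown values are flagged, not guessed."""
    key = code.strip().upper()
    return OLIGOMERISATION_CODES.get(key, f"unrecognised code: {code!r}")


description = describe_oligomerisation("dim ")
```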
TP53 status of Human Cell-Lines
This dataset includes cell-lines that have been screened for TP53 variants and reported in the scientific literature, in the Sanger cell-line database, or in the Broad Cancer Cell Line Encyclopedia.
|Column head||Description|
|Sample_ID||Unique sample identification number.|
|Sample_name||Name of the cell-line.|
|ATCC_ID||Identification number of the ATCC database.|
|Cosmic_ID||Link to sample cell-line data in the Cancer Cell Line Project of COSMIC databases of the Sanger Institute.|
|depmap_ID||Link to sample cell-line data in the depmap project.|
|Short_topo...Tumor_origin||see sample annotations|
|Add_info|||
|Sex||Sex of the patient from whom the cell-line has been isolated.|
|Age||Age at cancer diagnosis of the patient from whom the cell-line has been isolated.|
|Country...Population||see patient annotations.|
|Germline_mutation||Germline variant in TP53 or any other gene carried by the individual from whom the cell-line has been isolated.|
|Infectious_agent||Infectious agent (virus or bacteria) detected in the individual from whom the cell-line has been isolated.|
|Tobacco||Smoking habit of the individual from whom the cell-line has been isolated.|
|Alcohol||Drinking habit of the individual from whom the cell-line has been isolated.|
|Exposure||Reported exposure of the individual from whom the cell-line has been isolated.|
|KRAS_status||Status of KRAS gene. WT = wild-type; MUT = mutant (base change indicated in brackets).|
|Other_mutations||Name of other genes in which a variant has been identified.|
|TP53status||Status of TP53 gene. WT = wild-type gene sequence; MUT = mutated gene sequence; NULL = entire gene deletion; LOE = loss of gene expression without gene variant.|
|p53_IHC||p53 immuno-staining status.|
|p53_LOH||Loss of heterozygosity at p53 locus. Yes = LOH, No = no LOH, NI = non informative, NA = no information.|
|MUT_ID...||TP53 variant description and functional properties, see variant and function annotations.|
|Ref_ID...||Same as tumor variant Ref_ID, see reference annotations.|
|Tissue_processing...||see method annotations.|
Mouse Models with Engineered TP53
The dataset contains mouse models with engineered p53 that were compiled in the caMOD database or reported in the scientific literature. Data curated at caMOD were kindly provided by the caMOD team. Data reported in the literature but not compiled in caMOD were curated at IARC, and a link to the PubMed abstract is provided. For a detailed description of model genetics and phenotypes, please refer to caMOD and/or the original publication.
Experimentally Induced Variants
This dataset contains a list of variations in the human TP53 gene obtained from mutagenicity assays in the Hupki mouse model (MEF cells treated with the indicated carcinogenic agent) or in a yeast assay. See original papers for detailed methods.
|Column head||Description|
|MUT_ID||Unique ID for the variant, used across datasets.|
|Exposure||Agents to which the cells were exposed.|
|c_description||Variant described on the cDNA sequence. See variant annotations.|
|g_description||Variant described at the genome level. See variant annotations.|
|Model||Experimental assay/model used.|
|Clone_ID||ID of cell clone isolated from the exposed cell population.|
|Add info||Additional details provided on assay or cell clone as derived from original publication.|
|PubMed||PMID with link to the PubMed abstract that describes the model.|
Tumor samples are classified according to standards of the International Classification of Diseases for Oncology (ICD-O 3rd Edition, World Health Organization, Geneva, 2000) and SNOMED.
|Column head||Description|
|Sample_source||Nature of the sample from which the variant has been identified: cell-line, surgery (surgical or autopsy specimen, including fresh samples and archival, pathology specimen), biopsy, xenograft, body fluid (blood, saliva, urine...).|
|Tumor_Origin||Origin of the tumor sample. Terms occurring in this column are: primary, secondary (second primary tumor in the same patient), metastasis (with the localisation of the metastasis in brackets), recurrent (tumor recurrence).|
|Topography||Site of the tumor defined by organ or group of organs, according to the ICD-O nomenclature (examples: "colon", "brain", "bronchus and lung"). Note that some tumors are annotated "Head&Neck,NOS" or "Colorectum,NOS" because no detail is given in the original publication (NOS = not otherwise specified). For the database search tool, a short name is used in place of the ICD-O name (example: "Lung" for "bronchus and lung"). See a numerical list of topographies. For metastasis, the topography corresponds to the primary site of the tumor and the site of metastasis is indicated in brackets in the tumor_origin field.|
|Short_topography||For the database search tool, a short name is used in place of the ICD-O name (example: "Lung" for "bronchus and lung"). See a numerical list of topographies.|
|Topo_code||ICD-O code for topography.|
|Sub_topography||Precise identification of anatomic site, organ or tissue. The description given in the publication is translated to ICD-O nomenclature.|
|Morphology||Tumor type, including morphology and/or histologic type. The terminology used is based on ICD-O (2nd and 3rd editions) and SNOMED classifications. Terms have been added, such as 'normal tissue' or 'na'. See alphabetical list of morphologies.|
|Morpho_code||ICD-O or SNOMED codes for morphology.|
|Grade||Information on tumor grade, as given in the cited publication.|
|Stage||Information on tumor stage, as given in the cited publication.|
|TNM||TNM classification (Tumor size, Node status, Metastasis status) for staging.|
|Column head||Description|
|Sex||Sex of the patient (M for male, F for female).|
|Age||Age of the patient at the time of diagnosis.|
|Ethnicity||Ethnicity of the patient (when available). Groups are defined as: Asian, Black, Caucasian...|
|Country||Country/Region in which the patient was living at the time of surgery. When not otherwise specified in the original publication, the country corresponding to the address of the hospital is entered.|
|Population||Grouping by population. See the Country/Population Classification.|
|Region||Grouping by region. See the Country/Region Classification.|
|Development||Grouping by development status. See the Country/Development Classification.|
|Geo_area||City or region within the patient's country of residence. When not specified in the original publication, the city where the surgery was performed is entered.|
The same references (same Ref_ID) are used for the tumor variant, prevalence and prognosis data sets. Independent references are used for the Function and Germline data sets.
|Column head||Description|
|Ref_ID||Unique identification number for a reference.|
|Cross_Ref_ID||Ref_ID of a reference containing related data or additional information.|
|Title||Title of the publication.|
|Authors||List of authors.|
|Year||Year of publication.|
|Journal||Name of the journal (PubMed catalogue).|
|Volume||Volume number.|
|Start_page||First page number.|
|End_page||Last page of article.|
|PubMed_entry||PubMed identification number from NCBI.|
|Comment||Any relevant information.|
|Exclude_analysis||Papers that we recommend excluding from analysis because of dubious data quality (they report several samples with multiple variants, and/or a high proportion of rare variants or variants classified as neutral or functional).|
|WGS_WXS||Whole genome or whole exome sequencing study.|
Variant Detection Method
List of Abbreviations
|AILD||Angioimmunoblastic Lymphadenopathy with Dysproteinemia|
|ALL||Acute lymphoblastic leukemia|
|AML||Acute myeloid leukemia|
|APC||Adenomatous polyposis coli gene|
|B(a)PDE||Benzo(a)Pyrene Diol Epoxide|
|CIS||Carcinoma in situ|
|CML||Chronic Myeloid Leukemia|
|CMML||Chronic Myelomonocytic Leukemia|
|COSMIC||Catalog Of Somatic Mutations in Cancer|
|DGE||Denaturant gel electrophoresis|
|Duke’s||Classification for colon cancer|
|FAB||FAB classification for ALL|
|FAP||Familial adenomatous polyposis|
|FGO||Female Genital Organs|
|FIGO||Classification for gynecological cancers|
|H’sD, HD||Hodgkin’s disease|
|HBV||Hepatitis B virus|
|HCV||Hepatitis C virus|
|HNPCC||Hereditary Nonpolyposis Colorectal Cancer|
|HPV||Human papilloma virus|
|HNSCC||Head and neck squamous cell carcinoma|
|ICD-O 3rd||International classification of disease for oncology|
|KAT||Potassium Antimony Tartrate|
|LFL||Li-Fraumeni like syndrome|
|LOF||Loss of function|
|MDE||Hydrolink variant detection enhancement|
|MGO||Male Genital Organs|
|Moderately dif.||Moderately differentiated|
|Mx||Presence of distant metastasis according to the TNM classification|
|NA||Not applicable/Not available|
|NES||Nuclear Export Signal|
|NLS||Nuclear Localization Signal|
|NOS||Not otherwise specified|
|NOS2||Nitric oxide synthase 2|
|NSCLC||Non-small cell lung cancer|
|Nx||Extent of regional LN metastasis according to the TNM classification|
|Poorly dif.||Poorly differentiated|
|RA, RAEB, RAS||Classification for Myelodysplastic syndrome|
|RFLP||Restriction fragment length polymorphism|
|SCLC||Small cell lung cancer|
|SSCP||Single strand conformation polymorphism|
|TGGE||Temperature gradient gel electrophoresis|
|Tx||Extent of primary tumor according to the TNM classification|
|TxNxMx||TNM classification (tumor stage, presence or absence of LN metastasis, presence or absence of distant metastasis)|
|Well dif.||Well differentiated|
|WGS||Whole genome screen|
Dataset Exploration Options
Graphs and Search Options
- Functional / Structural Data. This option allows the functional and structural analysis of all possible single nucleotide substitutions in TP53 exonic sequences (including those that have never been reported in cancer). In addition, all other types of variations that have been reported in human samples and validated polymorphisms are included in this dataset. Functional and structural annotations and frequency statistics for these gene variations can be retrieved with this search option. Each dataset entry corresponds to a unique gene variation.
- Tumor variants. This option allows the retrieval and analysis of TP53 variants reported as somatic events in tumor samples and cell-lines. Each dataset entry corresponds to a variant identified in a human sample.
- Tumor variant prevalence. This option allows the analysis of the prevalence of TP53 tumor variants by cancer type and population groups. Each dataset entry corresponds to the prevalence of TP53 variant for a specific type of cancer in a defined human population.
- Germline variants. This option allows the retrieval and analysis of TP53 variants reported as germline events in human individuals. Each dataset entry corresponds to a tumor identified in an individual carrier of a TP53 germline variant. The searchable dataset only includes cancer-affected individuals who are confirmed or obligate carriers of a TP53 variant (data on non-affected or non-confirmed carriers can be retrieved by downloading the full dataset with the 'data downloads' option).
- Germline variant prevalence. This table lists different studies reporting the prevalence of TP53 germline variants in selected groups of individuals.
- Cell-lines. This option allows the retrieval and analysis of TP53 variants reported in human cell-lines. Each dataset entry corresponds to a variant identified in a cell-line.
- Mouse models. This option allows the display or download of the description of mouse models with engineered p53 that are compiled in the caMOD database or reported in the scientific literature. Links to caMOD database are available for further details on the model phenotypes.
Variant Distribution Graphs
- Variant type. Proportion of variations classified by their nature (base change, insertions, deletions....): number of variations of each class divided by the total number of variants selected (% is shown).
- Codon distribution. Proportion of exonic point variants at each codon position: number of variations at each codon position divided by the total number of exonic variants selected (% is shown).
- Exon/intron distribution. Proportion of variations in each exon/intron: number of variations within each Exon/intron divided by the total number of variations selected (% is shown).
- 3D JMOL graph. Residues within the central domain of the p53 protein (codons 96 to 289) are highlighted according to the proportion of exonic variants at this position (start site of variant) among all selected variants: number of variations at each codon position divided by the total number of exonic variants selected. Residues in red are the most frequently mutated, residues in yellow the least frequently mutated, and residues in orange are intermediate.
- Variant effect. Proportion of variations classified according to their predicted effect on protein sequence (missense, nonsense, frameshift ins/del, …): number of variations of each class divided by the total number of variants selected (% is shown).
- Point variant. Proportion of single amino-acid substitutions classified according to their predicted effect on protein sequence (missense, nonsense, silent): number of variations of each class divided by the total number of point variants selected (% is shown).
- Point variant scatter-plot. Each dot represents a specific point variant, colored according to its predicted effect on protein sequence (missense in blue, nonsense in red and silent in green); the x axis shows the proportion of the specific variant in the selected dataset (% of total point variants in the selected dataset); the y axis shows the predicted variant rate for the particular point variant (see variant annotations).
- SIFT. Proportion of missense variants classified according to their predicted deleterious/damaging or neutral/tolerated effect based on SIFT algorithm: number of variations of each class divided by the total number of missense variants selected (% is shown).
- SIFT scatter-plot. Each dot represents a specific point variant, colored according to its predicted deleterious/damaging or neutral/tolerated effect based on SIFT algorithm; the x axis shows the proportion of the specific variant in the selected dataset (% of total point variants in the selected dataset); the y axis shows the predicted variant rate for the particular point variant (see variant annotations).
- Transactivation. Proportion of missense variants classified according to their experimentally measured transactivation activities (based on FASAY): number of variations of each class divided by the total number of missense variants selected (% is shown).
- Transactivation scatter-plot. Each dot represents a specific point variant, colored according to its experimentally measured transactivation activity; the x axis shows the proportion of the specific variant in the selected dataset (% of total point variants in the selected dataset); the y axis shows the predicted variant rate for the particular point variant (see variant annotations).
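Most of the distribution graphs listed here share one calculation: the number of variations in each class divided by the total number of selected variants, shown as a percentage. A minimal Python sketch of that calculation follows; the function name `class_proportions` and the example labels are illustrative assumptions and are not taken from the database code.

```python
from collections import Counter


def class_proportions(labels):
    """Return {class: percent of all selected variants} for a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Made-up selection of five point variants, labelled by predicted effect.
selected_effects = ["missense", "missense", "nonsense", "silent", "missense"]
effect_percent = class_proportions(selected_effects)
```

The same helper applies to codon positions, exon/intron labels or SIFT classes, as long as the denominator matches the one named in the graph description (for example, exonic variants only for the codon distribution).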
Tumor Distribution Graphs
- Germline data. Distribution of tumor sites associated with the selected variants; number of tumors classified by tumor site divided by total number of tumors observed in individual carriers of the selected variants (% is shown).
- Tumor variant data. Distribution of tumor sites associated with the selected variants; number of variations classified by tumor site divided by total number of variations observed in tumors carrying the selected variants (% is shown).
- Gene variation data, tumor variant graph. Proportion of the selected variants among all variants reported in the database by tumor sites; number of selected variants classified by tumor site divided by total number of variations in the database for each tumor sites (% is shown).
- Gene variation data, germline graph. Distribution of tumor sites associated with the selected variants; number of tumors classified by tumor site divided by total number of tumors observed in individual carriers of the selected variants (% is shown).
- Variant prevalence. Proportion of mutated samples by cancer site (topography graph), cancer type (morphology graph), or by country of origin of the patients (country graph); number of mutated samples divided by total number of samples analyzed (% is shown).
Dealing with variants screened from RNA
Descriptions of deletions, insertions, and other complex variants
Some variants present in the tumor variant dataset may not be retrieved with the 'Functional/Structural Data' search option
- Variants retrieved with this option only include gene variations that are fully described, while in the dataset of tumor variants some variants are not fully described.
- Tumor variants may be reported in individuals with different SNP status. If a variant is close to a SNP, it may have a different impact on the protein sequence depending on the SNP status. For example, the variant c.637C>T on the first base of codon 213 will result in a p.R213X change in the protein sequence if the SNP present on the third base of the codon is an A (CGA>TGA), while it will result in a p.R213W change if the SNP present on the third base of the codon is a G (CGG>TGG). Variants not described from the reference sequence are included in the tumor variant dataset, but not in the Gene variation dataset.
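The codon 213 example can be reproduced with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and the codon table holds only the four codons needed here rather than the full genetic code.

```python
# Only the codons used in the codon 213 example; not a complete genetic code table.
CODON_TO_AA = {"CGA": "R", "CGG": "R", "TGA": "X", "TGG": "W"}


def codon_position(c_pos: int):
    """Map a 1-based coding position to (codon number, 0-based index within the codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3


def protein_change(ref_codon: str, c_pos: int, alt_base: str) -> str:
    """Describe the protein change caused by one base substitution in a given codon."""
    codon_number, index = codon_position(c_pos)
    alt_codon = ref_codon[:index] + alt_base + ref_codon[index + 1:]
    return f"p.{CODON_TO_AA[ref_codon]}{codon_number}{CODON_TO_AA[alt_codon]}"


# c.637C>T with the third base of codon 213 being A or G.
change_with_a = protein_change("CGA", 637, "T")  # CGA>TGA
change_with_g = protein_change("CGG", 637, "T")  # CGG>TGG
```

The two calls differ only in the third base of the starting codon, which is why the same nucleotide change is read as a nonsense variant in one background and a missense variant in the other.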
Some variant numbers retrieved from the variant prevalence and variant spectrum datasets may be different
- Data are retrieved from papers, and in many papers the only information that can be extracted is the total number of samples analyzed and the total number of samples mutated (variants are not described in detail). Such variants cannot be included in the tumor variant dataset, so numbers in the prevalence table do not match numbers in the tumor variant dataset (variant spectrum).
- Numbers by histology may also differ. For example, a paper may contain variant details for lung ADC, SCC and LCC (which are all non-small cell lung cancers), while the total numbers of samples analyzed for each histology are not available. In this case, variants corresponding to ADC, SCC and LCC will be entered in the tumor variant dataset, but in the prevalence table the prevalence will be indicated only for non-small cell carcinoma (the group that includes the 3 tumor types).
- Cell-lines are not included in the prevalence count.
- Samples with more than one TP53 variant are counted once in the prevalence table while all variants are entered in the tumor variant dataset.
- The prevalence may be missing for some papers that describe variants included in the tumor variant dataset. The prevalence dataset was added in 2001, whereas the database started in 1994, and not all papers have been reviewed. The non-reviewed papers correspond mainly to publications that describe fewer than 10 variants (about 400 papers). For some papers, the prevalence could not be retrieved from the information provided in the publication (about 100 papers).
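One of the counting differences above, samples with more than one TP53 variant, can be illustrated with a short Python sketch. The sample IDs, the number of samples analyzed and the record layout are made up for illustration; the variant descriptions are used only as example labels.

```python
# Hypothetical variant records from one paper: (sample ID, variant description).
variant_rows = [
    ("S1", "c.637C>T"),
    ("S1", "c.743G>A"),  # second variant found in the same sample
    ("S2", "c.524G>A"),
]
samples_analyzed = 10  # made-up total reported by the paper

# Variant spectrum: every variant is an entry.
n_variants = len(variant_rows)

# Prevalence: each mutated sample is counted once, however many variants it carries.
mutated_samples = {sample_id for sample_id, _ in variant_rows}
prevalence_percent = 100.0 * len(mutated_samples) / samples_analyzed
```

Here the spectrum holds three entries while the prevalence is computed from two mutated samples, so the two datasets legitimately report different numbers for the same paper.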
For specific questions on The TP53 Database and interpretation of search results, please email:
For general questions on the ISB-CGC website platforms, please email:
Provide the type of browser you are using and the steps to recreate the issue.