User Manual

The TP53 Database compiles various types of data and information from the literature and generalist databases on human TP53 gene variants related to cancer.

Data are organized in datasets that can be searched and analyzed with graph tools (see below) or fully downloaded to perform custom analyses. Downloaded files, annotations, and graph tools are described below.

Datasets and Annotations

Functional / Structural Data

MutationView_r20.csv

Column head	Description
MUT_ID	Unique identifier of each gene variation reported in the database. This identifier is used in all datasets (tumor variant, polymorphisms, germline).
hg19_Chr17_coordinates	Chromosome coordinate of variation: start position based on hg19 human genome assembly.
hg38_Chr17_coordinates	Chromosome coordinate of variation: start position based on hg38 human genome assembly.
ExonIntron	Location of the variant in the introns or exons of the TP53 gene for the reference sequence NM_000546.5. Terms occurring in this column are "1-intron" to "11-intron" and "2-exon" to "11-exon". An "i" or "e" in front means that the variant is located within the indicated intron or exon with no information on the precise location.
Codon_number	For variants in exons, codon number at which the variant is located (1-393). If a variant spans more than one codon, (e.g. tandem variant or deletion of several bases) only the first (5') codon is entered. For variants in introns, 0 is entered.
WT_nucleotide	Base in the reference sequence at the position of the variant on the coding sequence.
Mutant_nucleotide	Mutant base, described on the coding strand.
Description	Nucleotide change read from the coding sequence. For deletions and insertions, the number of bases deleted (del) or inserted (ins) is given. For more complex variant events, a full description is given as indicated in the original publication.
c_description	Variant nomenclature according to HGVS standards and using the NM_000546.5 coding sequence as reference.
g_description	Variant nomenclature according to HGVS standards and using the GenBank NC_000017.10 (hg19 assembly) genomic sequence as reference.
g_description_GRCh38	Variant nomenclature according to HGVS standards and using the GenBank NC_000017.11 (hg38 assembly) genomic sequence as reference.
Type	Nature of the variant. The terms occurring in this column are "A:T>C:G" (A to C or T to G base change), "A:T>G:C" (A to G or T to C base change), "A:T>T:A" (A to T or T to A base change), "G:C>A:T" (G to A or C to T base change at non CpG sites), "G:C>A:T at CpG" (G to A or C to T base change at CpG sites), "G:C>C:G" (G to C or C to G base change), "G:C>T:A" (G to T or C to A base change), "tandem" (two consecutive base changes), "ins" (insertion), "del" (deletion) and "complex" (complex changes).
Splice_site	Annotation on the position of the variant within conserved nucleotides of p53 consensus, cryptic or alternative splice sites: consensus SD or SA = the variant is located at conserved dinucleotides involved in p53 consensus splice sites (SD for splice donor site, SA for splice acceptor site) producing the full-length p53 protein (TA isoform); cryptic SD or SA = the variant is located at conserved dinucleotides involved in splice sites (gt or ag) that have been observed experimentally in p53 sequences carrying mutated consensus splice sites; alternative SD or SA = conserved dinucleotides involved in splice sites (gt or ag) responsible for producing p53 isoforms beta and gamma; alternative = mutated nucleotides are in the "cassette" sequence responsible for producing the p53 delta isoform; no = the position is outside the above-mentioned nucleotides. Information on splice site
CpG_site	Yes or No indicates whether the position of the variant falls within a CpG site.
Context_coding	Trinucleotide sequence context of variants. The 5' base and 3' base of the start position of the variant are indicated on the left and right respectively of the mutated base. This context is provided on the coding strand of the gene sequence and is based on hg38 TP53 sequence.
Mut_rate	Substitution rates were calculated for all single base substitutions in the coding sequence of p53 according to the dinucleotide substitution rates derived from human-mouse aligned sequences of chromosomes 21 and 10 (Lunter and Hein 2004). The variant probabilities for a given single nucleotide substitution are calculated by averaging the dinucleotide substitution rates at that position for the forward and reverse strands.
WT_codon	For variants in exons, sequence of the codon in which the variant occurred in NM_000546.5 transcript.
Mutant_codon	Base sequence of the mutated codon in NM_000546.5 transcript.
WT_AA	Wild-type amino acid encoded at the codon in which the variant occurred (three-letter amino acid abbreviation). Check AA Letter Code and Genetic Code
Mutant_AA	Mutated amino acid encoded at the codon in which the variant occurred (three-letter amino acid abbreviation). The chain terminating variants due to single base substitutions are designated by "stop". Check AA Letter Code and Genetic Code
ProtDescription	Variant description at the protein level as recommended by HGVS and using the Uniprot reference sequence P04637.
Mut_rateAA	Variant rate of amino-acid substitution calculated by summing up the nucleotide substitution rates. This value is only valid for amino-acid substitutions resulting from single nucleotide substitutions.
Effect	Effect of the variant. The terms occurring in this column are: missense (change of one amino-acid), nonsense (introduction of a stop codon), FS (frameshift), silent (no change in the protein sequence), splice (variants located in the first two or last two conserved nucleotides of the introns, which are thus predicted to alter splicing, or variants that have been shown to alter splicing experimentally), other (inframe deletions or insertions), intronic (variants in introns outside splicing sites), NA (variants upstream in 5' or 3' UTR).
Polymorphism	Polymorphic status of the gene variation. Validated : MAF > 0.001 in ESP6500, 1000G or gnomAD databases; No : not reported or reported at MAF < 0.001 in ESP6500, 1000G, or gnomAD databases; NA : not applicable.
SNPlink	Link to NCBI SNP database.
gnomADlink	Link to gnomAD database.
SourceDatabases	SNP databases from which the variants have been extracted.
PubMedlink	PubMedID of the publications in which the polymorphic status of the variant was reported.
Domain function	Function of the domain in which the mutated residue is located.
Residue function	Known function of the wild-type residue. When the function is not known but the structure is known, the solvent accessibility (SA) of the residue is indicated by the terms buried, exposed or partially exposed (SA calculated with Naccess and the 1TSR (chain B) structure of p53: <20 = buried, >=20 and <50 = partially exposed, >=50 = exposed).
Hotspot	"Yes" indicates that the variant is located in a codon defined as a cancer hotspot by Chang (2017).
Structural_motif	2D and 3D motifs where the variant is located according to structures described in Cho et al. (1994) and May and May (1999)
SA	Solvent accessibility of the wild-type residue as calculated with Naccess and the 1TSR (chain B) structure of p53.
AGVGD class	Missense variant functional predictions by an optimized Align-GVGD tool. Variants classified as "C0" are considered tolerated while other classes are considered damaging. Further details in Fortuno (2018).
BayesDel	Missense variant functional predictions by BayesDel tool (Feng 2017) used without allele frequency. Score >=0.16: damaging; score <0.16: tolerated. Further details in Fortuno (2018).
REVEL	Missense variant functional predictions by REVEL tool (Ioannidis 2016). Score >=0.5: damaging; score <0.5: tolerated. Further details in Fortuno (2018).
SIFT class	Functional classification based on SIFT program using default settings. Missense variants are classified as "damaging" or "tolerated".
PolyPhen2	Functional classification based on PolyPhen2 HVAR annotations retrieved with Annovar software: D = probably damaging; P = possibly damaging; B = benign.
Transactivation	Promoter-specific transcriptional activity measured in yeast functional assays and expressed as percent of wild-type activity. Data from Kato (2003)
TransactivationClass	Functional classification based on the overall transcriptional activity (TA) on 8 different promoters as measured in yeast assays by Kato et al. For each mutant, the median of the 8 promoter-specific activities (expressed as percent of the wild-type protein) is calculated and missense variants are classified as "non-functional" if the median is <=20, "partially functional" if the median is >20 and <=75, "functional" if the median is >75 and <=140, and "supertrans" if the median is >140.
DNE_LOFclass	Functional classification for loss of growth-suppression and dominant-negative activities based on Z-scores from the Giacomelli et al. (2018) study: DNE_LOF = p53WTNutlin3 Z-score >= 0.61 and Etoposide Z-score <= -0.21; notDNE_notLOF = p53WTNutlin3 Z-score < 0.61 and Etoposide Z-score > -0.21; notDNE_LOF = p53WTNutlin3 Z-score < 0.61 and Etoposide Z-score <= -0.21; unclass = others.
DNE class	Dominant-negative (DN) effect on transactivation by wild-type p53. Classification established for mutants for which DN activity data on more than 2 p53-response elements are available. Data are based on WAF1 and RGC promoters in various studies (these promoters were the most frequently used in different studies to assess DNE status), and on two large systematic studies (Dearth et al., which includes 76 mutants; Monti et al., which includes 104 mutants). Mutants were classified as "Yes" if they had dominant-negative activity on both WAF1 and RGC promoters, or on all promoters in the large studies, "Moderate" if they had dominant-negative activity on some but not all promoters, and "No" if they had no dominant-negative activity on either the WAF1 or the RGC promoter, or on none of the promoters in the large studies.
Structure/Function class	Functional predictions derived from a computer model that takes into account the 3D structure of WT and mutant proteins and is trained on the transactivation dataset from Kato et al. Variants are classified as "functional" or "non-functional". (Read more details on Structure-Function Predictions Based on Scores Derived from Delaunay Tessellations.)
EffectGroup3	Variant classification based on protein 3D structure and variant type. This classification has been used to derive genotype-phenotype correlations in sporadic breast cancers (Olivier et al., 2006). 0=silent+intron; 1=missense in DNA-binding loops(L2,H1,L3,L1,S2,S2',H2); 2=other missense; 3=inFrame del/ins; 4=FS+splice+nonsense.
SwissProtLink	SwissProt identification number with link to the variant page of the SwissProt database.
Somatic_count	Number of occurrences in the tumor variant dataset (number of tumors reported to carry this tumor variant). Total count is 29,891 in R20.
Germline_count	Number of occurrences in the IARC germline dataset (number of pedigree/individual carriers of this germline variant). Total count is 1,532 in R20.
CellLine_count	Number of occurrences in the IARC cell-line dataset (number of cell-lines reported to carry this variant). Total cell-line count is 2,766 in R20.
COSMIClink	Link to variant ID in COSMIC database.
CLINVARlink	Link to ClinVar database.
TCGA_ICGC_GENIE_count	Sum of variation occurrence from TCGA (MC3), ICGC (v28) and GENIE (V5) datasets. Total count is 23,570.
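
Several of the column definitions above are rule-based (Type, TransactivationClass, DNE_LOFclass) and can be restated as small functions. The following Python sketch is only an illustration of the rules as written in this manual, not the code used to build the database. The function and argument names are hypothetical, and inputs are assumed to be already parsed (bases as single letters, activities and Z-scores as numbers).

```python
from statistics import median

# Watson-Crick partner of each base.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def substitution_type(wt, mut, at_cpg=False):
    """Return the Type label for a single base substitution.

    wt, mut: reference and mutant base read on the coding strand.
    at_cpg:  True when CpG_site is "Yes" for this position.
    """
    wt, mut = wt.upper(), mut.upper()
    if wt not in _PAIR or mut not in _PAIR or wt == mut:
        raise ValueError("expected two different bases among A, C, G, T")
    # The labels are written from the A or G side of the base pair, so a
    # change read at T or C is re-expressed on the complementary strand.
    if wt in ("T", "C"):
        wt, mut = _PAIR[wt], _PAIR[mut]
    label = f"{wt}:{_PAIR[wt]}>{mut}:{_PAIR[mut]}"
    if label == "G:C>A:T" and at_cpg:
        label += " at CpG"
    return label


def transactivation_class(promoter_activities):
    """Classify a missense variant from its promoter-specific activities.

    promoter_activities: percent-of-wild-type values, one per promoter
    (the manual describes 8 promoters measured in yeast assays).
    """
    m = median(promoter_activities)
    if m <= 20:
        return "non-functional"
    if m <= 75:
        return "partially functional"
    if m <= 140:
        return "functional"
    return "supertrans"


def dne_lof_class(nutlin3_z, etoposide_z):
    """Apply the Z-score cut-offs listed for DNE_LOFclass."""
    dne = nutlin3_z >= 0.61
    lof = etoposide_z <= -0.21
    if dne and lof:
        return "DNE_LOF"
    if not dne and not lof:
        return "notDNE_notLOF"
    if not dne and lof:
        return "notDNE_LOF"
    return "unclass"  # DNE without LOF falls under "others"


print(substitution_type("C", "T", at_cpg=True))
print(transactivation_class([5, 10, 12, 15, 18, 22, 30, 40]))
print(dne_lof_class(0.8, -0.5))
```

Tandem changes, insertions, deletions and complex events are outside the scope of substitution_type, which handles single base substitutions only.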

Predicted effect on splicing:

Column head	Description
Site Type	Indicates whether the predicted splice site is an acceptor site or a donor site.
p53 Site	Indicates whether the predicted splice site corresponds to a canonical p53 splice site.
WT score	Fit score of the predicted splice site for the non-mutated sequence (scores are specific of prediction tools).
MUT score	Fit score of the predicted splice site for the mutated sequence (scores are specific of prediction tools).
Variation	Predicted effect of the variant on the predicted splice site.
Source	Prediction tool used.

Predicted effect on p53 protein isoforms:

The predictions provided are based on whether the variant falls within the specific isoform. Read more on p53 Isoforms

Column head	Description
TAp53alpha	Indicates whether the variant falls within the canonical isoform coding for the full-length p53 protein.
TAp53beta....deltap53alpha	Indicates whether the variant falls within the specified isoform.

SpliceAI Prediction on Canonical Transcript (NM_000546.5):

Column head	Description
SpliceAI_DS_AG	SpliceAI delta score - acceptor gain: The highest delta score of a variant for acceptor gain (AG). It ranges from 0 to 1 and can be interpreted as the probability that the variant affects splicing at any position within a window around it (10 kb as maximum distance, as recommended for best performance [references: PMID 34283047 and PMID 33942434]). The maximum of the four delta scores (DS_AG, DS_AL, DS_DG, DS_DL) is used as the basis for spliceogenicity predictions.
SpliceAI_DS_AL	SpliceAI delta score - acceptor loss: The highest delta score of a variant for acceptor loss (AL). It ranges from 0 to 1 and can be interpreted as the probability that the variant affects splicing at any position within a window around it (10 kb as maximum distance, as recommended for best performance [references: PMID 34283047 and PMID 33942434]). The maximum of the four delta scores (DS_AG, DS_AL, DS_DG, DS_DL) is used as the basis for spliceogenicity predictions.
SpliceAI_DS_DG	SpliceAI delta score - donor gain: The highest delta score of a variant for donor gain (DG). It ranges from 0 to 1 and can be interpreted as the probability that the variant affects splicing at any position within a window around it (10 kb as maximum distance, as recommended for best performance [references: PMID 34283047 and PMID 33942434]). The maximum of the four delta scores (DS_AG, DS_AL, DS_DG, DS_DL) is used as the basis for spliceogenicity predictions.
SpliceAI_DS_DL	SpliceAI delta score - donor loss: The highest delta score of a variant for donor loss (DL). It ranges from 0 to 1 and can be interpreted as the probability that the variant affects splicing at any position within a window around it (10 kb as maximum distance, as recommended for best performance [references: PMID 34283047 and PMID 33942434]). The maximum of the four delta scores (DS_AG, DS_AL, DS_DG, DS_DL) is used as the basis for spliceogenicity predictions.
SpliceAI_DP_AG	SpliceAI delta point - acceptor gain: The location of acceptor gain (AG) relative to variant position in the pre-mRNA transcript (positive values are upstream of the variant, negative values are downstream).
SpliceAI_DP_AL	SpliceAI delta point - acceptor loss: The location of acceptor loss (AL) relative to variant position in the pre-mRNA transcript (positive values are upstream of the variant, negative values are downstream).
SpliceAI_DP_DG	SpliceAI delta point - donor gain: The location of donor gain (DG) relative to variant position in the pre-mRNA transcript (positive values are upstream of the variant, negative values are downstream).
SpliceAI_DP_DL	SpliceAI delta point - donor loss: The location of donor loss (DL) relative to variant position in the pre-mRNA transcript (positive values are upstream of the variant, negative values are downstream).
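
Since the maximum of the four delta scores is the value used for spliceogenicity predictions, it can be convenient to derive it per record. The sketch below is a hedged illustration in Python: the helper name is hypothetical, the record is assumed to be a mapping such as one produced by csv.DictReader, and missing scores are assumed to appear as empty strings. No cut-off is applied because this manual does not state one. In the tumor variant and germline files the same columns are listed as DS_AG ... DP_DL, so the column tuple would need adjusting there.

```python
# Column names as listed in the SpliceAI table above.
DS_COLUMNS = (
    "SpliceAI_DS_AG",
    "SpliceAI_DS_AL",
    "SpliceAI_DS_DG",
    "SpliceAI_DS_DL",
)


def max_delta_score(row, columns=DS_COLUMNS):
    """Return (column_name, score) for the largest available delta score.

    row: mapping of column name to value (text or number).
    Empty or absent values are skipped; None is returned if no score exists.
    """
    scores = []
    for col in columns:
        raw = row.get(col)
        if raw is None or raw == "":
            continue
        scores.append((col, float(raw)))
    if not scores:
        return None
    return max(scores, key=lambda item: item[1])


# Hypothetical record, for illustration only.
example = {
    "SpliceAI_DS_AG": "0.02",
    "SpliceAI_DS_AL": "0.91",
    "SpliceAI_DS_DG": "0.00",
    "SpliceAI_DS_DL": "0.10",
}
print(max_delta_score(example))
```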

Tumor Variants Found in Human Tumor Samples

TumorVariantDownload_r20.csv

This dataset contains TP53 tumor variants identified in human tumor samples (including metastasis and cell-lines). It includes data on the type and position of variations, detailed information on the tumor in which the variants have been found, and on various characteristics of the patients in which the tumor developed. Please note that true somatic origin of the tumor variants may not have been confirmed unless a matched normal sample was used to filter out germline variants. Therefore, it cannot be excluded that a small number of the variants listed may be of germline origin.

Each row in the downloaded tab-delimited text file represents a single variant reported in a tumor sample with an arbitrarily assigned unique identification number. A unique identification number is also attributed to the tumor sample and to the patient. Table content is as follows:

Column head	Description
The first set of columns describes the variant.
Mutation_ID	Unique identification number for a Sample/Variant association. Tandem variants (two adjacent base substitutions) are considered as one variant event; therefore tandem variants have only one identification number and are a single record.
MUT_ID....SwissProtLink	see variant annotations
The second set of columns describes the organ site, tissue and type of lesion in which the variant has been identified. The descriptions given in the publication are translated into the standards of the International Classification of Diseases for Oncology (ICD-O 3rd Edition, World Health Organization, Geneva, 2000) and SNOMED. For information on tumor classification, grading and staging, check out ICD-O training at SEER and Cancer Information at NCI.
Sample_Name	A sample name is assigned as follows: first 3 letters of the first author's name, year of publication (2 digit), followed by the ID number indicated in the publication. The same name or number can occur several times as in some samples more than one variant has been reported.
Sample_ID	Unique sample identification number. This number allows the automatic retrieval of samples with multiple variants.
Sample_source...TNM	see tumor annotations
p53_IHC	p53 immunostaining graded as 'positive', 'negative' or '+/-'. ND stands for not done.
Add_Info	Any relevant additional information is entered here.
The third set of columns describes the patient origin and life-style. These columns contain heterogeneous notes, usually comments emphasized by authors reporting the variants. It should be noted that this information is generally qualitative. No quantitative information on exposure to risk factors is included in the database. This information does not presuppose that a formal, causal link has been established between such factors and the variant described. Moreover, for most exogenous risk factors, individual exposure has not been monitored. This information is given solely to (i) permit the retrieval of variations found in patients belonging to defined groups or having specific risk factors, and (ii) facilitate access to the corresponding publications. For detailed comparison between exposure groups, users are invited to perform their own analysis based on the information given in the original publication.
Individual_ID	Unique identification number for an individual included in the database. It is automatically assigned by the database system.
Sex...Country	see patient annotations
TP53polymorphism	Presence of a polymorphism in TP53 gene.
Germline_mutation	Germline variant detected in any gene in the patient.
Family_history	Information on the presence or absence of cancers in the family of the patient.
Tobacco	Information on the smoking status of the patient. Terms occurring in this column are 'smoker' (with qualitative amount in brackets), 'non-smoker', 'passive-smoker' and 'chewer'.
Alcohol	Information on the drinking status of the patient. Terms occurring in this column are 'drinker' (with qualitative amount in brackets), and 'non-drinker'.
Exposure	Risk factors to which the patient has been exposed, such as aflatoxins, radon, thorotrast, etc.
Infectious_agent	Pathogen (virus or bacteria) detected in the patient.
Ref_ID	Unique identification number for the reference in which the variant is described.
DS_AG ... DP_DL	See SpliceAI Prediction on Canonical Transcript (NM_000546.5)
PubMed	PubMed reference number provided by NCBI.
Exclude_analysis	Studies that we recommend to exclude from any analysis because of dubious quality. Such studies are identified based on the following criteria: they report several samples with multiple variants, and/or a high proportion of rare variants or variants classified as functional.
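
Because Sample_ID is shared by all variants reported in the same sample, grouping rows on that column retrieves samples with multiple variants. The sketch below illustrates this in Python with the standard csv module. The helper names and the three-row extract are hypothetical, and the delimiter is left as a parameter because the file may be comma- or tab-delimited depending on how it was downloaded.

```python
import csv
import io
from collections import defaultdict


def variants_by_sample(handle, delimiter=","):
    """Map each Sample_ID to the list of Mutation_ID values reported for it."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=delimiter):
        groups[row["Sample_ID"]].append(row["Mutation_ID"])
    return dict(groups)


def samples_with_multiple_variants(groups):
    """Keep only the samples that carry more than one variant record."""
    return {sample: ids for sample, ids in groups.items() if len(ids) > 1}


# Hypothetical extract; real files carry many more columns.
demo = io.StringIO(
    "Mutation_ID,Sample_ID\n"
    "1,10\n"
    "2,10\n"
    "3,11\n"
)
groups = variants_by_sample(demo)
multi = samples_with_multiple_variants(groups)
print(multi)
```

With a real download, the StringIO object would be replaced by a file opened with open(path, newline="").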

TumorVariantRefDownload_r20.csv

This file lists the publications in which the variants are described and gives the method used to detect them. Each row (record) represents a citation with an arbitrarily assigned unique identification number (Ref_ID). See standardized annotations for the description of the column content.

Prevalence of TP53 Tumor Variants by Tumor Site

PrevalenceDownload_r20.csv

This dataset contains information on the proportion of tumors that carry a TP53 tumor variant. Data are extracted from publications contained in the tumor variant dataset, from additional publications that do not give a detailed description of the variants (many studies report their results only in the form of summary tables, which prevents their inclusion in the tumor variant dataset), and from publications reporting negative results (no variant found, thus not included in the tumor variant dataset).

For each study, the total number of tumors or tissue samples analyzed and the number of these samples found to contain a variant are provided. The reference paper, method of variant detection and country of origin of the patients are also indicated. When the same research team published several papers that describe the same set of samples, data from the most recent or more complete paper are used.

Column head	Description
Prevalence_ID	Unique entry identification number.
Topography...Morpho_code	see sample annotations
Sample_analyzed	Number of tumor samples analyzed for TP53 variants.
Sample_mutated	Number of tumor samples with a variant in TP53.
Country...Development	see patient annotations
Comment	Any relevant information.
Ref_ID...PubMed	see reference annotations
Tissue_processing....exon 11	see method annotations
Exclude_Analysis	Studies that we recommend to exclude from any analysis because of data quality issues. Studies are labeled as 'exclude' if: they report several samples with multiple variants in patients with no specific genetic background or exposure to mutagen; they report more variants that are classified as functional or partially functional (based on TA class) than variants classified as non-functional; variants are not precisely described and can not be fully annotated in the database; several variants in the series are reported with errors (such as position and base that do not fit, or report of neutral polymorphisms as tumor variants).
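
A common custom analysis on this file is the proportion of mutated samples per tumor site. The Python sketch below pools Sample_mutated over Sample_analyzed within each Topography value and skips flagged studies. It is only one simple way to aggregate, not a method prescribed by the database. The function name is hypothetical, and the exact text stored in Exclude_Analysis is an assumption (taken here to be "exclude"), so it is exposed as a parameter.

```python
from collections import defaultdict


def pooled_prevalence(rows, exclude_label="exclude"):
    """Return {topography: percent of analyzed samples carrying a variant}.

    rows: iterable of mappings keyed by column name (e.g. csv.DictReader).
    exclude_label: assumed value marking studies to leave out.
    """
    analyzed = defaultdict(int)
    mutated = defaultdict(int)
    for row in rows:
        flag = (row.get("Exclude_Analysis") or "").strip().lower()
        if flag == exclude_label:
            continue
        site = row["Topography"]
        analyzed[site] += int(row["Sample_analyzed"])
        mutated[site] += int(row["Sample_mutated"])
    return {
        site: 100.0 * mutated[site] / analyzed[site]
        for site in analyzed
        if analyzed[site] > 0
    }


# Hypothetical rows, for illustration only.
demo_rows = [
    {"Topography": "LIVER", "Sample_analyzed": "100", "Sample_mutated": "25", "Exclude_Analysis": ""},
    {"Topography": "LIVER", "Sample_analyzed": "50", "Sample_mutated": "5", "Exclude_Analysis": ""},
    {"Topography": "BREAST", "Sample_analyzed": "40", "Sample_mutated": "10", "Exclude_Analysis": "exclude"},
    {"Topography": "BREAST", "Sample_analyzed": "60", "Sample_mutated": "15", "Exclude_Analysis": ""},
]
prevalence = pooled_prevalence(demo_rows)
print(prevalence)
```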

Prevalence of the R249S TP53 Variant in Liver Cancer

This dataset contains data on the prevalence of the c.747G>T (p.R249S) variant in liver cancers. It includes studies that have screened at least exon 7 of TP53 by sequencing, and studies that have searched for this specific variant by RFLP. The presence of this variant in hepatocellular carcinomas has been linked to exposure to aflatoxins and HBV, and may thus constitute a biomarker of exposure. This dataset was released with the R15 version of the database and has not been updated since then.

The file R249Sprevalence_IARC_R15.txt is a tab-delimited text file that contains the following information:

Column head	Description
Ref_ID...PubMed	see reference annotations.
Country	see patient annotations
Sample_analyzed	Number of tumor samples analyzed for the c.747G>T (p.R249S) TP53 variant.
Count_R249S	Number of tumor samples containing the c.747G>T (p.R249S) TP53 variant.
Remark	Any relevant information.
Method	Comment on method if different from sequencing.

Prognostic Value of TP53 Tumor Variants

PrognosisDownload_r20.csv

This dataset includes information on all studies that have analyzed the relationship between p53 variants and cancer prognosis. For each study, the patient cohort, study settings and a summary of the results are described. When the same research team published several papers with increasing number of patients, the most recent paper with the largest dataset is used.

Many of these studies do not provide detailed information on each variant detected but rather report their results in the form of summary tables. Such publications have been included in the prognosis dataset but not in the tumor variant dataset. For some of them, the variants have been published in a previous paper and can be retrieved with the Cross_Ref_ID study identifier (see below).

The downloaded file contains the following information:

Column head	Description
Prognosis_ID	Unique entry identification number.
Topography	see sample annotations
Morphology	see sample annotations
Population	see patient annotations
Country	see patient annotations
Institution	Name of the hospital(s) where the patients have been recruited.
Period	Time period (year) during which the patients have been recruited.
Inclusion criteria	ICD-O (3rd edition) or SNOMED code for morphology
Treatment	Treatment protocol used for most of the patients. SU, surgery; CX, chemotherapy; RX, radiotherapy; pre-op, pre-operative; CP, cyclophosphamide; CISP, cisplatin; doxo, doxorubicin; 5-FU, 5-fluorouracil
Median FU	Median follow-up time of the patients in months.
Range FU	Range of follow-up time of the patients in months.
Cohort	Number of patients/tumors analyzed for TP53 variants.
p53 variants	Number of patients/tumors with a variant in TP53.
Percent mutated	Proportion of mutated tumors (%).
Parameter_analyzed	Clinical parameter analyzed (patient survival and/or tumor response to treatment).
Association	Summary result: association with the presence of a TP53 variant.
Result	Main findings.
Ref_ID...PubMed	see annotations.
Exclude_analysis	Papers that we recommend to exclude from analysis because of dubious data quality (report several samples with multiple variants, and/or a high proportion of rare variants or variants classified as neutral or functional).

Germline Variations in LFS/LFL Families

GermlineDownload_r20.csv

Inherited TP53 variants are associated with a rare autosomal dominant disorder, the Li-Fraumeni syndrome (LFS).

This dataset contains information on individuals that are carriers of a TP53 germline variant and families in which at least one family member has been identified as a carrier of a germline variant in the TP53 gene. Criteria for inclusion are the following: a) individuals carrying a sequenced TP53 germline variant, affected or not by a cancer, b) individuals affected by a cancer and belonging to a family in which at least one family member has been identified as a carrier of a germline variant in the TP53 gene.

Each row (record) in the downloaded file represents a tumor found in an individual having a TP53 germline variant. The file contains the following information:

Column head	Description
Family_ID	Unique family identification number.
Family_Code	Name or number given in the original publication or an arbitrarily-assigned name, usually the 3 first letters of the first author's name and the publication date.
Country	see annotations
Class	Family classification: LFS = strict clinical definition of Li-Fraumeni syndrome (defined by Li and Fraumeni as a proband with sarcoma <45 years with a first degree relative with cancer at <45 and another first/second degree relative with cancer at <45 or sarcoma at any age); LFL = Li-Fraumeni like for the extended clinical definition of Li-Fraumeni (including Birch definition: proband with any childhood cancer or sarcoma, brain tumor or adrenocortical carcinoma at <45 years, with one first or second degree relative with sarcoma, breast cancer, brain tumor, leukemia, or adrenocortical carcinoma at any age, plus one first or second degree relative in the same lineage with any cancer diagnosed under age 60; Eeles definition E1: two different tumors which are part of extended LFS in first or second degree relatives at any age (sarcoma, breast cancer, brain tumor, leukemia, adrenocortical tumor, melanoma, prostate cancer, pancreatic cancer); Eeles definition E2: sarcoma at any age in the proband with two of the following (two of the tumors may be in the same individual): breast cancer at <50 years and/or brain tumor, leukemia, adrenocortical tumor, melanoma, prostate cancer, pancreatic cancer at <60 years or sarcoma at any age); FH = family history of cancer which does not fulfil LFS or any of the LFL definitions (Birch, Eeles E1 or E2); No FH or No = no family history of cancer; ? = unknown.
Generations_analyzed	Number of generations analyzed in the family.
Germline_mutation	A TP53 germline variant has been identified.
MUT_ID...TAclass	see variant annotations
Individual_ID	Unique identification number for an individual included in the database. It is automatically assigned by the database system.
Individual_code	Code or number given in the original publication or an arbitrarily-assigned code, usually the family code followed by the position of the individual in the family tree.
FamilyCase	Family case in the pedigree, such as proband (index case), mother, father...
FamilyCase_group	Degree of relationship to the proband.
Sex	Gender of the individual.
Germline_carrier	TP53 variant status of the individual: confirmed= the individual has been tested for the presence of the variant and the variant has been found; obligatory= the individual has not been tested for the presence of the variant but must be carrier based on the variant status of the other individuals in the pedigree; 50%prob.= there is a chance of 50% that the individual is a variant carrier; negative= the individual has been tested for the presence of the variant and the variant has not been found; NA= the individual has not been tested for the presence of the variant.
Mode_of_inheritance	Mode of variation inheritance: P=paternal, M=maternal, M&P=maternal and paternal, de novo= variant that has not been inherited, "de novo, mosaic"= variant that has not been inherited and is present in a subpopulation of cells, na=not known.
Dead	Living status of the individual at time of follow-up. 0=alive; 1=dead
Unaffected	Disease status of the individual at time of follow-up. 0 = affected by cancer; 1 = not affected by cancer.
Age	Age of the individual at the time of follow-up.
Topography	see annotations
Morphology	see annotations
Age at diagnosis	Age of the individual at the time of diagnosis of the tumor.
Ref_ID	Reference number indicating the publication in which the variant is described. This number corresponds to the Ref_ID number in the GermlineRefR20 file.
DS_AG ... DP_DL	See SpliceAI Prediction on Canonical Transcript (NM_000546.5)
PubMedLink	A link to the PubMed reference in NCBI.
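
Because each row is a tumor record rather than a person, counts per individual require grouping on Individual_ID. The Python sketch below counts tumor records for individuals whose Germline_carrier status is "confirmed". The function name and the demo rows are hypothetical, and it assumes that Unaffected is stored as the text "0" or "1" and that an unaffected carrier appears as a row without a tumor.

```python
from collections import Counter


def tumors_per_confirmed_carrier(rows):
    """Count tumor records per Individual_ID among confirmed carriers.

    rows: iterable of mappings keyed by column name (e.g. csv.DictReader).
    """
    counts = Counter()
    for row in rows:
        if row["Germline_carrier"] != "confirmed":
            continue
        # An unaffected carrier is kept in the result with a count of 0.
        counts[row["Individual_ID"]] += 0 if row["Unaffected"] == "1" else 1
    return dict(counts)


# Hypothetical rows, for illustration only.
demo_rows = [
    {"Individual_ID": "1", "Germline_carrier": "confirmed", "Unaffected": "0"},
    {"Individual_ID": "1", "Germline_carrier": "confirmed", "Unaffected": "0"},
    {"Individual_ID": "2", "Germline_carrier": "obligatory", "Unaffected": "0"},
    {"Individual_ID": "3", "Germline_carrier": "confirmed", "Unaffected": "1"},
]
tumor_counts = tumors_per_confirmed_carrier(demo_rows)
print(tumor_counts)
```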

GermlineRefDownload_r20.csv

Each row represents a reference identified by a unique identification number (Ref_ID). See standard annotations for a description of the column content.

GermlinePrevalenceView_r20.csv

This dataset includes studies reporting TP53 germline variant screening in large cohorts of patients selected based on various criteria (family history of cancer, specific cancer diagnosis, ...). Each row represents the result of the analysis of TP53 germline variant status in a selected cohort.

Column head	Description
Diagnosis	Tumor site or clinical description of the selected cohort.
Cohort	Detailed criteria for patient selection.
Cases analyzed	Number of patients included in the variant screen.
Cases mutated	Number of patients found to carry a TP53 variant. Details on variants can be found in the dataset of germline variants when the information was provided, but many studies do not provide a detailed list of variations.
Variant prevalence	Percent of mutated cases.
Remark	Any further information on the cohort or method.
PubMed	PubMed ID with link to ncbi database.

GermlineFrequencyDownload_r20.csv

This new dataset includes studies reporting the frequency of individual TP53 germline variants in case-control series using NGS for the screening of the entire coding regions of TP53.

Column head	Description
MUT_ID....DNEclass	see variant annotations
Freq_cases	Frequency of the TP53 variant in cases.
Freq_controls	Frequency of the TP53 variant in controls.
Total_Cases	Number of patients included in the "case" group.
N_Cases	Number of cases found to carry the specific TP53 variant. Details on variant phenotype may be found in the dataset of germline variants if enough detail is provided on patients and tumor type.
Total_Controls	Number of patients included in the "control" group.
N_Controls	Number of controls found to carry the specific TP53 variant.
DataSource	PubMed ID of the paper from which data have been extracted.
StudyDetails	Description of the case and control groups.
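
As an illustration of how the count columns relate to the frequency columns, the sketch below recomputes carrier frequencies from N_Cases, Total_Cases, N_Controls and Total_Controls. Two points are assumptions and are not stated in this manual: frequencies are returned as fractions (the stored Freq_cases and Freq_controls values may use another unit), and the odds ratio with a 0.5 continuity correction (Haldane-Anscombe) is only one common way to summarize a case-control comparison, not a statistic provided by the database. The counts are hypothetical.

```python
def carrier_frequencies(n_cases, total_cases, n_controls, total_controls):
    """Return (frequency in cases, frequency in controls) as fractions."""
    return n_cases / total_cases, n_controls / total_controls


def odds_ratio(n_cases, total_cases, n_controls, total_controls):
    """Odds ratio of carrying the variant in cases versus controls.

    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe
    correction), which is an illustrative choice, not a database rule.
    """
    a, b = n_cases, total_cases - n_cases            # carriers / non-carriers in cases
    c, d = n_controls, total_controls - n_controls   # carriers / non-carriers in controls
    if min(a, b, c, d) == 0:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


# Hypothetical study: 5 carriers in 1000 cases, 2 carriers in 2000 controls.
freq_cases, freq_controls = carrier_frequencies(5, 1000, 2, 2000)
print(freq_cases, freq_controls)
print(round(odds_ratio(5, 1000, 2, 2000), 2))
```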

Functional Activities of Missense Variants

Data on the biological properties of p53 mutant proteins in functional assays performed in yeast or human cells are provided in two datasets.

FunctionDownload_r20.csv

In this dataset, data were extracted from publications that report functional assessment of p53 mutant proteins in human or yeast cells, assessed either by transfection and overexpression of mutant proteins, or by assessment of endogenous mutants. Comparison between mutants requires caution since functional assays differ from one study to the other, in particular with respect to the expression vector (which influences the level of expression of the mutant protein), the p53-responsive elements (generic consensus sequence versus gene-specific response elements from WAF1, BAX or PIG3), and the recipient cells that have been used.

The functional properties of mutant proteins that are included in this dataset are:

transcriptional activities on various well-described p53-RE
dominant negative effects on the activities of wild-type p53
capacity to induce apoptosis, cell-cycle arrest or checkpoints in human cells
capacity to transactivate promoters that are not induced by wild-type p53
ability to promote cell growth and confer tumorigenicity
sensitivity to temperature changes regarding their ability to transactivate specific promoters

The functional results have been organised in 5 columns for (1) conserved wild-type properties, (2) complete or partial loss of wild-type properties, (3) dominant-negative effects, (4) gain of function and (5) temperature sensitivity. The cell system is indicated in two columns and a detailed reference to the published report is given.

Column head	Description
Function_ID	Unique identification number for each entry.
ProtDescription...Structural_motif	see variant annotations
Codon 72	Amino-acid at codon 72 of p53 (polymorphism)
Conserved WT function	Functional property of mutant that is similar to the activity of the wild-type protein. - Activities of mutant proteins in human or yeast cells: DNAb = DNA binding capacity tested by gel-shift or ChIP assay; TA = transactivation of a reporter gene under the control of a p53-response element (indicated in brackets, see a list of p53 Target Genes); TR = transrepression of a reporter gene under the control of a gene-specific response-element (name of gene indicated in brackets); TETR = capacity to form tetramers; x binding = interaction with protein x; drug sensitivity = conserved capacity to mediate cytotoxic effect of drug (specific drug used is indicated, see List of Abbreviations). - Activities of over-expressed mutant proteins in human cells: APO = induction of apoptosis; GS = growth suppression measured by colony forming assay (CFA), establishment of stable clones, or other proliferation assay; GA = cell cycle arrest measured by FACS; TUMOR- = inhibition of tumorigenicity in nude mice; up/downregulation = induction or repression of an endogenous GENE (in upper-case letters) or protein (in lowercase letters); HR repression = inhibition of Homologous Recombination. - Biological effect after over-expression in mouse or rat embryonic fibroblasts: TRANSF- = ability to counteract the transformation of primary cells induced by the co-transfection of ras or another transforming oncogene, such as HPV E7; "super" indicates that the activity of the mutant protein is higher than that of wtp53 (on transactivation, induction of apoptosis, DNA binding or growth suppression).
Loss of Function	WTp53 functional property that is lost by the mutant protein. Same annotations as in previous column, with "partial" indicating that the loss of function is partial (residual activity).
Dominant negative activity	Inhibition of the wild-type protein by mutant proteins in transactivation or cell growth assays. - Yes = the mutant protein counteracts the activity of the wild-type protein when the two proteins are co-expressed in human or yeast cells (the p53-response element or cell growth assay performed is indicated in brackets); - No = the mutant protein does not counteract the effects of the wild-type protein. "moderate" indicates that the mutant protein has a partial inhibiting effect on the wild-type protein.
Gain of Function	Functional properties displayed by the mutant but not by the wild-type protein. - Activities of over-expressed mutant proteins in human or yeast cells: same annotations as in column 9, plus: TUMOR+ = confers tumorigenic properties (in nude mice) to transfected cells; p73 interference = ability to counteract p73 activity when both proteins are expressed in a cell system; Drug resistance = confers resistance to a cytotoxic drug (see List of Abbreviations); Growth advantage = increased growth rate. - Biological effect after over-expression in mouse or rat embryonic fibroblasts: TRANSF+ = ability to cooperate with ras or another transforming oncogene, such as HPV E7, in the transformation of primary cells. "moderate" indicates that the mutant protein has a partial effect on the activity studied; "no" indicates that the mutant protein has no effect on the activity studied.
Temperature sensitivity	Sensitivity of mutant to temperature changes in transactivation assays (the p53-RE is indicated in brackets), and in other experimental assays (specified in brackets). Yes = the activity of the mutant protein is affected by the temperature at which the test is performed; mut_H = the protein is inactive (mutant) at higher temperatures; mut_L = the protein is inactive (mutant) at lower temperatures; No = the activity of the mutant protein is NOT affected by the temperature at which the test is performed. Note that functional tests are performed at different temperatures in yeast (30℃) and human (37℃) cells. (Read more on Temperature Sensitivity Annotations to learn about its detailed annotation rules.)
Temp_ref	Temperature at which experiments have been performed or which has been used as reference for temperature sensitivity assays.
Cell assay	Human = the activity of the mutant protein has been tested in human cells. Yeast = the activity of the mutant protein has been tested in yeast.
cellLines	Name of cell-line(s) that have been used for testing mutant activities. "(endo)" indicates that activities have been tested on endogenous mutants.
Assay design	Indicates if the assay has been performed with or without wtp53 as control, or if activity has been tested on endogenous mutant.
Method	Details on type of experimental assay that was performed to assess function.
FRef_ID...PubMed	see reference annotations.
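
The functional columns above combine an activity code (for example TA, APO or GS) with an optional detail in brackets. The sketch below splits such a cell into (code, detail) pairs. It is only an illustration: the assumption that several annotations in one cell are separated by semicolons is not documented in this manual, and the example string is hypothetical, so the separator should be checked against the downloaded file before use.

```python
import re
from typing import List, Optional, Tuple

# One annotation: a code, optionally followed by a detail in round brackets.
ENTRY = re.compile(r"^(?P<code>[^()]+?)\s*(?:\((?P<detail>[^()]*)\))?$")


def parse_annotations(cell: str, separator: str = ";") -> List[Tuple[str, Optional[str]]]:
    """Split a functional annotation cell into (code, bracketed detail) pairs.

    The separator is an assumption; entries that do not fit the simple
    "code (detail)" pattern are returned unchanged with no detail.
    """
    entries = []
    for raw in cell.split(separator):
        raw = raw.strip()
        if not raw:
            continue
        match = ENTRY.match(raw)
        if match is None:
            entries.append((raw, None))
        else:
            entries.append((match.group("code").strip(), match.group("detail")))
    return entries


# Hypothetical cell content using codes defined in the table above.
print(parse_annotations("TA (WAF1); APO; GS (CFA)"))
# [('TA', 'WAF1'), ('APO', None), ('GS', 'CFA')]
```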

FunctionIshiokaDownload_r20.csv

The functional data that are included in this dataset were provided by Chikashi Ishioka and have been published in Kato et al., Kakudo et al., and Kawaguchi et al.

Column head	Description
ProtDescription...codon number	see variant annotations
WAF1nWT, MDM2nWT, BAXnWT,...	Promoter-specific transcriptional activity measured in yeast functional assays and expressed as percent of wild-type activity.
WAF1nWT_Saos2, MDM2nWT_Saos2,...	Promoter-specific transcriptional activity measured in the human cell-line Saos-2. Values are normalized with p53-null vector values and expressed as percent of wild-type activity.
SubG1nWT_Saos2	Induction of apoptosis by overexpression in Saos-2 cells expressed as percent of wild-type activity.
Oligomerisation_yeast	Capacity of mutant protein to form oligomers: TETR = can form tetramers, DIM = can form dimers but not tetramers, MON = cannot oligomerize.
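
Because the promoter-specific columns are all expressed as percent of wild-type activity, they can be summarized per variant. The sketch below takes the median over the yeast columns named in the table (WAF1nWT, MDM2nWT, BAXnWT). The choice of the median, the handling of empty or non-numeric cells, and the example values are assumptions made for illustration; the file contains further promoter columns that are elided in this manual and are therefore not listed here.

```python
from statistics import median

# Only the column names spelled out in the table above; extend as needed.
PROMOTER_COLUMNS = ["WAF1nWT", "MDM2nWT", "BAXnWT"]


def median_activity(row, columns=PROMOTER_COLUMNS):
    """Median percent-of-wild-type activity over the given promoter columns.

    Empty or non-numeric cells are skipped; returns None if nothing is usable.
    """
    values = []
    for name in columns:
        raw = (row.get(name) or "").strip()
        try:
            values.append(float(raw))
        except ValueError:
            continue  # missing or non-numeric value
    return median(values) if values else None


# Hypothetical row for a single missense variant.
example = {"WAF1nWT": "12.5", "MDM2nWT": "8.0", "BAXnWT": "20.1"}
print(median_activity(example))  # 12.5
```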

TP53 status of Human Cell-Lines

CellLineDownload_r20.csv

This dataset includes cell-lines that have been screened for TP53 variants and have been published in the scientific literature, in the Sanger cell-line database, or in the Broad Cancer Cell Line Encyclopedia.

Column head	Description
Sample_ID	Unique sample identification number.
Sample_name	Name of the cell-line.
ATCC_ID	Identification number of the ATCC database.
Cosmic_ID	Link to sample cell-line data in the Cancer Cell Line Project of COSMIC databases of the Sanger Institute.
depmap_ID	Link to sample cell-line data in the depmap project.
Short_topo...Tumor_origin	see sample annotations
Add_info
Sex	Gender of the patient from whom the cell-line has been isolated.
Age	Age at cancer diagnosis of the patient from whom the cell-line has been isolated.
Country...Population	see patient annotations.
Germline_mutation	Germline variant in TP53 or any other gene carried by the individual from which the cell-line has been isolated.
Infectious_agent	Infectious agent (virus or bacteria) detected in the individual from which the cell-line has been isolated.
Tobacco	Smoking habit of the individual from which the cell-line has been isolated.
Alcohol	Drinking habit of the individual from which the cell-line has been isolated.
Exposure	Reported exposure of the individual from which the cell-line has been isolated.
KRAS_status	Status of KRAS gene. WT= wild-type; MUT=mutant (base change indicated in brackets)
Other_mutations	Name of other genes in which a variant has been identified.
TP53status	Status of TP53 gene. WT= wild-type gene sequence; MUT= mutated gene sequence; NULL= entire gene deletion; LOE= loss of gene expression without gene variant.
p53_IHC	p53 immuno-staining status.
p53_LOH	Loss of heterozygosity at p53 locus. Yes = LOH, No = no LOH, NI = non-informative, NA = no information.
MUT_ID...	TP53 variant description and functional properties, see variant and function annotations.
Ref_ID...	Same as tumor variant Ref_ID, see reference annotations.
Tissue_processing...	see method annotations.
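
A typical use of this file is to select cell-lines by TP53status. The sketch below uses the Sample_name and TP53status columns described above and assumes a comma-delimited file with a header row; the cell-line names and values are invented for illustration. Note that a cell-line may occupy several rows if more than one variant is recorded for it, which this simple example does not deduplicate.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; real rows contain many more columns.
SAMPLE = """Sample_name,TP53status
LINE_A,MUT
LINE_B,WT
LINE_C,NULL
LINE_D,MUT
"""


def status_counts(handle):
    """Count rows per TP53status value (WT, MUT, NULL, LOE)."""
    return Counter(row["TP53status"].strip() for row in csv.DictReader(handle))


def lines_with_status(handle, status):
    """Return the Sample_name of every row with the requested TP53status."""
    return [
        row["Sample_name"]
        for row in csv.DictReader(handle)
        if row["TP53status"].strip() == status
    ]


counts = status_counts(io.StringIO(SAMPLE))
mutant_lines = lines_with_status(io.StringIO(SAMPLE), "MUT")
print(counts["MUT"], mutant_lines)  # 2 ['LINE_A', 'LINE_D']
```

To run it on the downloaded file, the `io.StringIO(SAMPLE)` argument would be replaced by an opened file handle.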

Mouse Models with Engineered TP53

The dataset contains mouse models with engineered p53 that were compiled in the caMOD database or reported in the scientific literature. Data curated at caMOD were kindly provided by the caMOD team. Data reported in the literature but not compiled in caMOD were curated at IARC and a link to the PubMed abstract is provided. For a detailed description of model genetics and phenotypes, please refer to caMOD and/or the original publication.

MouseModelView_r20.csv

Column head	Description
Model descriptor	Model name as indicated in caMOD or original publication.
Affected organs	List of organs affected or targeted by transgene.
AA change in human	Amino-acid substitution. Note that amino-acids are numbered according to the human sequence.
caMOD link	Model ID from caMOD database.
PubMed	PMID link to original publication.

Experimentally Induced Variants

This dataset contains a list of variations in the human TP53 gene obtained from mutagenicity assays in the Hupki mouse model (MEF cells treated with the indicated carcinogenic agent) or in a yeast assay. See original papers for detailed methods.

InducedMutationView_r20.csv

Column head	Description
MUT_ID	Unique ID for the variant, used across datasets.
Exposure	Agent(s) to which the cells were exposed.
c_description	Variant described on the cDNA sequence. See variant annotations
g_description	Variant described at the genome level. See variant annotations
Model	Experimental assay/model used.
Clone_ID	ID of cell clone isolated from the exposed cell population.
Add info	Additional details provided on assay or cell clone as derived from original publication.
PubMed	PMID with link to the PubMed abstract that describes the model.

Standardized Annotations

Tumor

Tumor samples are classified according to standards of the International Classification of Diseases for Oncology (ICD-O 3rd Edition, World Health Organization, Geneva, 2000) and SNOMED.

For information on tumor classification, grading and staging, check out ICD-O training at SEER and Cancer Information at NCI.

Column head	Description
Sample_source	Nature of the sample from which the variant has been identified: cell-line, surgery (surgical or autopsy specimen, including fresh samples and archival, pathology specimen), biopsy, xenograft, body fluid (blood, saliva, urine...).
Tumor_Origin	Origin of the tumor sample. Terms occurring in this column are: primary, secondary (second primary tumor in the same patient), metastasis (with the localisation of the metastasis in brackets), recurrent (tumor recurrence).
Topography	Site of the tumor defined by organ or group of organs, according to the ICD-O nomenclature (examples: "colon", "brain", "bronchus and lung"). Note that some tumors are annotated "Head&Neck,NOS" or "Colorectum,NOS" because no detail is given in the original publication (NOS = not otherwise specified). For the database search tool, a short name is used in place of the ICD-O name (example: "Lung" for "bronchus and lung"). See a numerical list of topographies. For metastasis, the topography corresponds to the primary site of the tumor and the site of metastasis is indicated in brackets in the tumor_origin field.
Short_topography	For the database search tool, a short name is used in place of the ICD-O name (example: "Lung" for "bronchus and lung"). See a numerical list of topographies.
Topo_code	ICD-O code for topography.
Sub_topography	Precise identification of anatomic site, organ or tissue. The description given in the publication is translated to ICD-O nomenclature.
Morphology	Tumor type, including morphology and/or histologic type. The terminology used is based on ICD-O (2nd and 3rd editions) and SNOMED classifications. Terms have been added, such as 'normal tissue' or 'na'. See alphabetical list of morphologies.
Morpho_code	ICD-O or SNOMED codes for morphology.
Grade	Information on tumor grade, as given in the cited publication.
Stage	Information on tumor stage, as given in the cited publication.
TNM	TNM classification (Tumor size, Node status, Metastasis status) for staging.

Patient

Column head	Description
Sex	Sex of the patient (M for male, F for female).
Age	Age of the patient at the time of diagnosis.
Ethnicity	Ethnicity of the patient (when available). Groups are defined as: Asian, Black, Caucasian...
Country	Country/Region in which the patient was living at the time of surgery. When not otherwise specified in the original publication, the country corresponding to the address of the hospital is entered.
Population	Grouping by population. See the Country/Population Classification
Region	Grouping by region. See the Country/Region Classification
Development	Grouping by development status. See Country/Development Classification
Geo_area	City or region within the country of living of the patient. When not specified in the original publication, the city where the surgery has been done is entered.

Reference

The same references (same Ref_ID) are used for the tumor variant, prevalence and prognosis data sets. Independent references are used for the Function and Germline data sets.

Column head	Description
Ref_ID	Unique identification number for a reference.
Cross_Ref_ID	Ref_ID of a reference containing related data or additional information.
Title	Title of the publication.
Authors	List of authors.
Year	Year of publication.
Journal	Name of the journal (PubMed catalogue)
Volume	Volume number.
Start_page	First page number.
End_page	Last page of article.
PubMed_entry	PubMed identification number from NCBI.
Comment	Any relevant information
Exclude_analysis	Papers that we recommend to exclude from analysis because of dubious data quality (report several samples with multiple variants, and/or a high proportion of rare variants or variants classified as neutral or functional).
WGS_WXS	Whole genome or whole exome sequencing study.

Variant Detection Method

Column head	Description
Tissue_processing	Indicates if the sample analysed was fresh, fixed or frozen.
Start_material	Indicates if DNA or RNA was screened for variants.
Prescreening/Method	Prescreening method used to select samples to be sequenced: ‘SSCP’ for single strand conformation polymorphism, ‘DGE’ for denaturant gel electrophoresis, ‘FASAY’ for yeast assay, ‘none’ if no prescreening was done, etc…
Material_sequenced	Indicates if the DNA or RNA was cloned or not (direct) before sequencing.
Exon2-11	Exons that have been screened for variants. In the downloaded file, "-1" or "TRUE" indicates that the exon has been screened and "0" or "FALSE" indicates that it has not been screened.
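
Since the exon screening flags can appear in two encodings, it can help to normalize them before filtering. The sketch below maps "-1"/"TRUE" and "0"/"FALSE" to booleans, as described in the table above. The individual column names "Exon2" to "Exon11" are an assumption derived from the "Exon2-11" heading and should be checked against the header of the downloaded file; the example row is hypothetical.

```python
def exon_screened(value) -> bool:
    """Convert an exon flag ("-1"/"TRUE" or "0"/"FALSE") to a boolean."""
    token = str(value).strip().upper()
    if token in {"-1", "TRUE"}:
        return True
    if token in {"0", "FALSE"}:
        return False
    raise ValueError(f"Unexpected exon flag: {value!r}")


def screened_exons(row):
    """Return the exon numbers flagged as screened in one row.

    Column names "Exon2" ... "Exon11" are assumed, not documented.
    """
    return [n for n in range(2, 12) if exon_screened(row[f"Exon{n}"])]


# Hypothetical study that screened exons 5 to 8 only, with mixed encodings.
row = {f"Exon{n}": "0" for n in range(2, 12)}
row.update({"Exon5": "-1", "Exon6": "TRUE", "Exon7": "-1", "Exon8": "TRUE"})
print(screened_exons(row))  # [5, 6, 7, 8]
```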

List of Abbreviations

Abbreviation	Description
5-aza-CdR	5-aza-2’-deoxycytidine
AILD	Angioimmunoblastic Lymphadenopathy with Dysproteinemia
ALL	Acute lymphoblastic leukemia
AML	Acute myeloid leukemia
APC	Adenomatous polyposis coli gene
B(a)P	Benzo(a)Pyrene
B(a)PDE	Benzo(a)Pyrene Diol Epoxide
BLEO	Bleomycin
CHEM	Chemotherapy
CIS	Carcinoma in situ
CDDP	Cisplatin
CML	Chronic Myeloid Leukemia
CMML	Chronic Myelomonocytic Leukemia
COSMIC	Catalog Of Somatic Mutations in Cancer
CPT	Camptothecin
ddF	Dideoxyfingerprinting
DFI	Disease-free interval
DGE	Denaturant gel electrophoresis
DNE	Dominant-negative effect
Dif.	Differentiated
DBM	DNA-binding motif
DOX	Doxorubicin
Duke’s	Classification for colon cancer
EBV	Epstein-Barr Virus
ER	Estrogen Receptor
ETO	Etoposide
FAB	FAB classification for ALL
FAP	Familial adenomatous polyposis
FGO	Female Genital Organs
FH	Family history
FIGO	Classification for gynecological cancers
GEM	Gemcitabine
H’sD, HD	Hodgkin’s disease
HBV	Hepatitis B virus
HCC	Hepatocellular carcinoma
HCV	Hepatitis C virus
HNPCC	Hereditary Nonpolyposis Colorectal Cancer
HPV	Human papilloma virus
HNSCC	Head and neck squamous cell carcinoma
ICD-O 3rd	International classification of disease for oncology
IHC	Immunohistochemistry
KAT	Potassium Antimony Tartrate
LFL	Li-Fraumeni like syndrome
LFS	Li-Fraumeni syndrome
LN	Lymph node
LOF	Loss of function
MTX	Methotrexate
MDE	Hydrolink mutation detection enhancement
MDS	Myelodysplastic syndrome
MGO	Male Genital Organs
Moderately dif.	Moderately differentiated
MMR	MisMatch Repair
MNNG	N-methyl-N'-nitro-N-nitrosoguanidine
MSI	Microsatellite instable
MSS	Microsatellite stable
Mx	Presence of distant metastasis according to the TNM classification
NA	Not applicable/Not available
ND	Not done
NDBL	Non-DNA-binding loop
neg.	Negative
NES	Nuclear Export Signal
NIM	Nimustine
NLS	Nuclear Localization Signal
NHL	Non-Hodgkin’s lymphoma
NOS	Not otherwise specified
NOS2	Nitric oxide synthase 2
NSCLC	Non-small cell lung cancer
Nx	Extent of regional LN metastasis according to the TNM classification
OXA	Oxaliplatin
PJS	Peutz-Jeghers Syndrome
Poorly dif.	Poorly differentiated
pos.	Positive
PR	Progesterone Receptor
RA, RAEB, RAS	Classification for Myelodysplastic syndrome
RAD	Radiotherapy
RFLP	Restriction fragment length polymorphism
SCLC	Small cell lung cancer
SSCP	Single strand conformation polymorphism
TAX	Taxol
TGGE	Temperature gradient gel electrophoresis
TOP	Topotecan
Tx	Extent of primary tumor according to the TNM classification
TxNxMx	TNM classification (tumor stage, presence or absence of LN metastasis, presence or absence of distant metastasis)
Undif.	Undifferentiated
VIN	Vinblastine
VINC	Vincristine
VM26	Teniposide
Well dif.	Well differentiated
WGS	Whole genome screen
XP	Xeroderma pigmentosum
Yeast	Yeast assay
yo	Years old

Dataset Exploration Options

Graphs and Search Options

Search Options
- Functional / Structural Data. This option allows the functional and structural analysis of all possible single nucleotide substitutions in TP53 exonic sequences (including those that have never been reported in cancer). In addition, all other types of variations that have been reported in human samples and validated polymorphisms are included in this dataset. Functional and structural annotations and frequency statistics for these gene variations can be retrieved with this search option. Each dataset entry corresponds to a unique gene variation.
- Tumor variants. This option allows the retrieval and analysis of TP53 variants reported as somatic events in tumor samples and cell-lines. Each dataset entry corresponds to a variant identified in a human sample.
- Tumor variant prevalence. This option allows the analysis of the prevalence of TP53 tumor variants by cancer type and population groups. Each dataset entry corresponds to the prevalence of TP53 variant for a specific type of cancer in a defined human population.
- Germline variants. This option allows the retrieval and analysis of TP53 variants reported as germline events in human individuals. Each dataset entry corresponds to a tumor identified in an individual carrier of a TP53 germline variant. The searchable dataset only includes cancer-affected individuals who are confirmed or obligate carriers of a TP53 variant (data on non-affected carriers or non-confirmed carriers can be retrieved by downloading the full dataset with the 'data downloads' option).
- Germline variant prevalence. This table lists different studies reporting the prevalence of TP53 germline variants in selected groups of individuals.
- Cell-lines. This option allows the retrieval and analysis of TP53 variants reported in human cell-lines. Each dataset entry corresponds to a variant identified in a cell-line.
- Mouse models. This option allows the display or download of the description of mouse models with engineered p53 that are compiled in the caMOD database or reported in the scientific literature. Links to caMOD database are available for further details on the model phenotypes.
Variant Distribution Graphs
- Variant type. Proportion of variations classified by their nature (base change, insertions, deletions, ...): number of variations of each class divided by the total number of variants selected (% is shown).
- Codon distribution. Proportion of exonic point variants at each codon position: number of variations at each codon position divided by the total number of exonic variants selected (% is shown).
- Exon/intron distribution. Proportion of variations in each exon/intron: number of variations within each Exon/intron divided by the total number of variations selected (% is shown).
- 3D JMOL graph. Residues within the central domain of the p53 protein (codons 96 to 289) are highlighted according to the proportion of exonic variants at this position (start site of variant) among all selected variants: number of variations at each codon position divided by the total number of exonic variants selected. Red residues are the most frequently mutated, yellow residues the least frequently mutated, and orange residues are intermediate.
- Variant effect. Proportion of variations classified according to their predicted effect on protein sequence (missense, nonsense, frameshift ins/del, …): number of variations of each class divided by the total number of variants selected (% is shown).
- Point variant. Proportion of single amino-acid substitutions classified according to their predicted effect on protein sequence (missense, nonsense, silent): number of variations of each class divided by the total number of point variants selected (% is shown).
- Point variant scatter-plot. Each dot represents a specific point variant, colored according to its predicted effect on protein sequence (missense in blue, nonsense in red and silent in green); the X axis shows the proportion of the specific variant in the selected dataset (% of total point variants in the selected dataset); the Y axis shows the predicted variant rate for the particular point variant (see variant annotations).
- SIFT. Proportion of missense variants classified according to their predicted deleterious/damaging or neutral/tolerated effect based on SIFT algorithm: number of variations of each class divided by the total number of missense variants selected (% is shown).
- SIFT scatter-plot. Each dot represents a specific point variant, colored according to its predicted deleterious/damaging or neutral/tolerated effect based on the SIFT algorithm; the X axis shows the proportion of the specific variant in the selected dataset (% of total point variants in the selected dataset); the Y axis shows the predicted variant rate for the particular point variant (see variant annotations).
- Transactivation. Proportion of missense variants classified according to their experimentally measured transactivation activities (based on FASAY): number of variations of each class divided by the total number of missense variants selected (% is shown).
- Transactivation scatter-plot. Each dot represents a specific point variant, colored according to its experimentally measured transactivation activity; the X axis shows the proportion of the specific variant in the selected dataset (% of total point variants in the selected dataset); the Y axis shows the predicted variant rate for the particular point variant (see variant annotations).
Tumor Distribution Graphs
- Germline data. Distribution of tumor sites associated with the selected variants; number of tumors classified by tumor site divided by total number of tumors observed in individual carriers of the selected variants (% is shown).
- Tumor variant data. Distribution of tumor sites associated with the selected variants; number of variations classified by tumor site divided by total number of variations observed in tumors carrying the selected variants (% is shown).
- Gene variation data, tumor variant graph. Proportion of the selected variants among all variants reported in the database by tumor site; number of selected variants classified by tumor site divided by total number of variations in the database for each tumor site (% is shown).
- Gene variation data, germline graph. Distribution of tumor sites associated with the selected variants; number of tumors classified by tumor site divided by total number of tumors observed in individual carriers of the selected variants (% is shown).
- Variant prevalence. Proportion of mutated samples by cancer site (topography graph), cancer type (morphology graph), or by country of origin of the patients (country graph); number of mutated samples divided by total number of samples analyzed (% is shown).
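
The graphs listed above all report a count divided by a total, shown in percent. As an example, the "Codon distribution" calculation can be sketched as below, using the Codon_number column (0 for intronic variants). This is an illustration of the stated formula, not the code behind the graph tool; it assumes the input has already been restricted to point variants, and the codon list is hypothetical.

```python
from collections import Counter


def codon_distribution(codon_numbers):
    """Percentage of exonic variants at each codon position.

    Entries with codon number 0 (intronic variants) are excluded from
    both the counts and the denominator.
    """
    exonic = [codon for codon in codon_numbers if codon > 0]
    if not exonic:
        return {}
    counts = Counter(exonic)
    total = len(exonic)
    return {codon: 100.0 * n / total for codon, n in sorted(counts.items())}


# Hypothetical selection: four exonic variants and one intronic variant.
print(codon_distribution([175, 175, 248, 273, 0]))
# {175: 50.0, 248: 25.0, 273: 25.0}
```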

Additional Information

TP53 reference sequences used in the database: TP53 gene and p53 protein sequences

Due to lack of fully described information across data sources, some search results may appear inconsistent. See below for some examples.

Dealing with variants screened from RNA

All TP53 variants are annotated in the database at the genomic level. For variants identified from RNA screening, annotations may not be accurate. For example, a variant described as a deletion of exon 5 at RNA level might in fact be a point variant located in a splice site at the genomic level (inducing skipping of an exon). It is of note that this concerns only a small fraction of the data included in the database. You may exclude studies that have screened RNA by using the 'Start material' control.

Descriptions of deletions, insertions, and other complex variants

The exact locations of deletions, insertions and complex TP53 variants are often poorly described in original reports (often reported at the codon level but not at the genomic level). Annotations for these variants are thus not precise since we annotate variants at the genomic level. For example, if a deletion is described as a deletion of one nucleotide at codon 158, it is entered in the database as a deletion of the first nucleotide of codon 158, while it may in fact be the second or third nucleotide that is actually deleted. This information is thus only reliable at the codon level.
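
The codon-level convention described above can be made concrete with the usual arithmetic between codon numbers and coding-sequence (c.) positions. The sketch below is a minimal illustration, assuming positions are counted from the A of the start codon and that the coding sequence spans codons 1 to 393; the function names are hypothetical.

```python
N_CODONS = 393  # TP53 codons, as given for the Codon_number column


def codon_to_cdna(codon: int):
    """Return the three c. positions covered by a codon."""
    if not 1 <= codon <= N_CODONS:
        raise ValueError("codon must be between 1 and 393")
    first = 3 * (codon - 1) + 1
    return (first, first + 1, first + 2)


def cdna_to_codon(position: int):
    """Return (codon number, position within codon 1-3) for a c. position."""
    if not 1 <= position <= 3 * N_CODONS:
        raise ValueError("position is outside the coding sequence")
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1


# A one-base deletion "at codon 158" is stored at the first base, c.472,
# although c.473 or c.474 may be the base that is actually deleted.
print(codon_to_cdna(158))  # (472, 473, 474)
print(cdna_to_codon(637))  # (213, 1)
```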

Some variants present in the tumor variant dataset may not be retrieved with the 'Functional/Structural Data' search option

There may be two reasons for this:

Variants retrieved with this option only include gene variations that are fully described, while in the dataset of tumor variants some variants are not fully described.
Tumor variants may be reported in individuals with different SNP status. If a variant is close to a SNP, it may have a different impact on the protein sequence depending on the SNP status. For example, the variant c.637C>T on the first base of codon 213, will result in a p.R213X change in the protein sequence if the SNP present on the third base of the codon is an A (CGA>TGA), while it will result in a p.R213W change if the SNP present on the third base of the codon is a G (CGG>TGG). Variants not described from the reference sequence are included in the tumor variant dataset, but not in the Gene variation dataset.
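
The codon 213 example above can be reproduced with a small translation sketch. The codon table below is deliberately reduced to the four codons mentioned in the example (with X denoting a stop codon, as in p.R213X); it is not a full genetic code table, and the function names are hypothetical.

```python
# Only the codons needed for the codon 213 example; X denotes a stop codon.
CODON_TABLE = {"CGA": "R", "CGG": "R", "TGA": "X", "TGG": "W"}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Replace the base at position 1, 2 or 3 of a codon."""
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2 or 3")
    bases = list(codon.upper())
    bases[position - 1] = new_base.upper()
    return "".join(bases)


def protein_change(ref_codon: str, position: int, new_base: str, codon_number: int) -> str:
    """Describe the amino-acid change caused by a single-base substitution."""
    mutant = substitute(ref_codon, position, new_base)
    return f"p.{CODON_TABLE[ref_codon.upper()]}{codon_number}{CODON_TABLE[mutant]}"


# c.637C>T hits the first base of codon 213; the outcome depends on the SNP
# allele at the third base of the codon.
print(protein_change("CGA", 1, "T", 213))  # p.R213X
print(protein_change("CGG", 1, "T", 213))  # p.R213W
```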

Some variant numbers retrieved from the variant prevalence and variant spectrum datasets may be different

Data in the prevalence dataset are "independent" from data included in the tumor variants dataset. Numbers differ for the following reasons:

Because we retrieve data from papers, and in many papers the only information that can be extracted is the total number of samples analyzed and the total number of samples mutated (variants are not described in detail), variants cannot be included in the tumor variant dataset. Thus, numbers in the prevalence table do not match numbers in the tumor variant dataset (variant spectrum).
Numbers by histology may also differ. For example, a paper may contain variant details for lung ADC, SCC and LCC (which are all non-small cell lung cancers), but the total numbers of samples analyzed for each histology are not available. In this case, variants corresponding to ADC, SCC and LCC will be entered in the tumor variant dataset, but in the prevalence table the prevalence will be indicated only for non-small cell carcinoma (the group that includes the 3 tumor types).
Cell-lines are not included in the prevalence count.
Samples with more than one TP53 variant are counted once in the prevalence table while all variants are entered in the tumor variant dataset.
The prevalence may be missing for some papers that describe variants included in the tumor variant dataset. The prevalence dataset was added in a later version of the database (2001, whereas the database started in 1994) and not all papers have been reviewed. The non-reviewed papers correspond mainly to publications that describe fewer than 10 variants (about 400 papers). For some papers, the prevalence could not be retrieved from the information provided in the publication (about 100 papers).
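
One of the reasons listed above, samples with more than one TP53 variant, can be illustrated with a short sketch: every variant adds an entry to the tumor variant dataset, while the sample is counted only once for prevalence. The sample identifiers and the study size below are hypothetical, and the function is only an illustration of the counting rule.

```python
def prevalence_vs_spectrum(samples_analyzed, variant_records):
    """Compare variant entries with mutated-sample counts.

    variant_records is a list of (sample_id, variant) pairs; a sample with
    several variants appears several times but is counted once for prevalence.
    """
    mutated_samples = {sample_id for sample_id, _ in variant_records}
    return {
        "variant_entries": len(variant_records),
        "mutated_samples": len(mutated_samples),
        "prevalence_percent": 100.0 * len(mutated_samples) / samples_analyzed,
    }


# Hypothetical study of 10 tumors; sample S2 carries two variants.
records = [
    ("S1", "c.524G>A"),
    ("S2", "c.743G>A"),
    ("S2", "c.818G>A"),
    ("S3", "c.637C>T"),
]
summary = prevalence_vs_spectrum(10, records)
print(summary)
# {'variant_entries': 4, 'mutated_samples': 3, 'prevalence_percent': 30.0}
```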

Contact

For specific questions on The TP53 Database and interpretation of search results, please email:
tp53-info@isb-cgc.org
For general questions on the ISB-CGC website platforms, please email:
feedback@isb-cgc.org

When addressing an issue, it is important that you supply us with detailed information.
Provide the type of browser you are using and the steps to recreate the issue.