Information concerning UniProtKB/Swiss-Prot human variation pages

Table of contents

   Gene symbol(s)
   Feature Identifier (FTId)
   Residue change
   Blosum score
   Status
   Disease
   Polymorphism
   Comment
   Allele(s)
   Location on the sequence
   Protein features in neighbourhood
   Residue conservation
   3D structure(s)
   Protein features in structural neighbourhood
   Interface(s) involvement
   Surface accessibility
   Physico-chemical properties
   3D homology models
   Reference

Gene symbol(s)

The "Gene symbol(s)" line contains the name(s) of the gene(s) that code for the stored protein sequence.
It often occurs that more than one gene name has been assigned to an individual locus. The Official name given is assumed to be the most correct (or most current) designation. It is usually the gene symbol attributed by the HUGO Gene Nomenclature Committee (HGNC). All the other names are listed as Synonyms.

Swiss-Prot user manual on gene name

Feature Identifier (FTId)

The "FTId" line contains a unique and stable feature identifier (FTId). The format of the FTId specific for a variant is: VAR_number
where the number is a 6-digit number.
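
The format above can be checked mechanically. The following is a minimal sketch, assuming the identifier consists of exactly the prefix "VAR_" followed by six ASCII digits; the function name is_variant_ftid and the sample identifiers are hypothetical and serve only as illustration.

```python
import re

# Pattern derived from the description above: "VAR_" followed by exactly six digits.
# [0-9] is used instead of \d so that only ASCII digits are accepted (an assumption).
FTID_PATTERN = re.compile(r"VAR_[0-9]{6}")


def is_variant_ftid(identifier: str) -> bool:
    """Return True if the whole string has the form VAR_ plus six digits."""
    return FTID_PATTERN.fullmatch(identifier) is not None


# Hypothetical identifiers used only to exercise the check.
print(is_variant_ftid("VAR_000123"))  # well-formed
print(is_variant_ftid("VAR_123"))     # too few digits
```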

Residue change

The "Residue change" line indicates the amino acid change of the variant. The one-letter and three-letter codes for amino acids used in Swiss-Prot are those adopted by the IUPAC-IUB Commission on Biochemical Nomenclature. The change is described, for example, as:
From Gln (Q) to His (H), Q94H
where Gln (Q) is the reference residue, and His (H) is the mutant residue. The number, 94 in this example, indicates the position of the variant.
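
The short notation (Q94H in the example) can be split into its three parts. The sketch below assumes the notation is always one reference letter, a 1-based position and one variant letter, restricted to the 20 standard amino acids; the names ResidueChange and parse_residue_change are hypothetical.

```python
import re
from typing import NamedTuple


class ResidueChange(NamedTuple):
    reference: str  # one-letter code of the reference residue
    position: int   # 1-based position on the sequence
    variant: str    # one-letter code of the variant residue


# The 20 standard amino acids in one-letter code.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_CHANGE = re.compile(rf"([{_AA}])([1-9][0-9]*)([{_AA}])")


def parse_residue_change(text: str) -> ResidueChange:
    """Parse a short residue-change string such as 'Q94H'."""
    match = _CHANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a residue change: {text!r}")
    return ResidueChange(match.group(1), int(match.group(2)), match.group(3))


change = parse_residue_change("Q94H")
print(change.reference, change.position, change.variant)
```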

Blosum score

The "Blosum score" line indicates the score within a Blosum matrix for the corresponding wild-type to variant amino acid change. The log-odds score is the logarithm of the ratio between the likelihood of two amino acids being aligned in related sequences and the likelihood of the same two amino acids being aligned by chance. The Blosum62 substitution matrix is used. This substitution matrix contains scores for all possible exchanges of one amino acid with another.
Lowest score: -4 (low probability of substitution), highest score: 11 (high probability of substitution)
Information on Blosum matrix
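
To make the log-odds idea concrete, here is a minimal sketch of how a single matrix entry can be derived from frequencies. It assumes the half-bit scaling (2 x log2, rounded to the nearest integer) with which Blosum62 is commonly published; the function name log_odds_score and the frequencies in the usage lines are hypothetical illustration values, not real Blosum62 data.

```python
import math


def log_odds_score(p_pair: float, q_a: float, q_b: float, scale: float = 2.0) -> int:
    """Rounded log-odds score for one amino acid pair.

    p_pair: observed frequency of the pair in aligned, related sequences
    q_a, q_b: background frequencies of the two amino acids
    scale: 2.0 gives half-bit units (assumed scaling)
    """
    if p_pair <= 0 or q_a <= 0 or q_b <= 0:
        raise ValueError("frequencies must be positive")
    # Ratio > 1: the pair is seen more often than chance predicts (positive score).
    # Ratio < 1: the pair is seen less often than chance predicts (negative score).
    return round(scale * math.log2(p_pair / (q_a * q_b)))


# Hypothetical frequencies: the chance expectation is (1/16) * (1/16) = 1/256.
print(log_odds_score(1 / 64, 1 / 16, 1 / 16))    # pair four times more frequent than chance
print(log_odds_score(1 / 1024, 1 / 16, 1 / 16))  # pair four times less frequent than chance
```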

Status

The "Status" line gives the classification of the variant. Swiss-Prot systematically classifies each variant into one of three categories: "Disease", "Polymorphism" and "Unclassified".

Disease

The "Disease" line gives the name of the disease associated with the variant. For more information about the disease, the user can refer to the OMIM link provided in the Cross-reference section of the page.

Polymorphism

The "Polymorphism" line gives additional information on the polymorphism described.

Comment

The "Comment" line contains free text comments on the variant. It is used to convey any additional useful information about the variant.

Allele(s)

The "Allele(s)" line contains a list of allele(s) on which the variant can be found. The whole sequence of the allele can be visualized through the hyperlink provided.

Location on the sequence

The "Location on the sequence" line shows the position of the residue change on the sequence. Unless the variant is located near the beginning or the end of the protein sequence, the 20 residues upstream and the 20 residues downstream of the variant are shown.
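
The display can be pictured as a window cut out of the sequence. The following sketch assumes 1-based positions and simply truncates the window at either end of the protein; the function name sequence_window and the toy sequences are hypothetical.

```python
def sequence_window(sequence: str, position: int, flank: int = 20):
    """Return (upstream, residue, downstream) around a 1-based position.

    Up to `flank` residues are kept on each side; fewer are returned
    when the variant lies close to either end of the sequence.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside the sequence")
    index = position - 1                      # convert to 0-based index
    start = max(0, index - flank)             # clip at the N-terminus
    end = min(len(sequence), index + 1 + flank)  # clip at the C-terminus
    return sequence[start:index], sequence[index], sequence[index + 1:end]


# Toy sequence: the variant residue Q sits at position 31 of 61.
toy = "A" * 30 + "Q" + "C" * 30
upstream, residue, downstream = sequence_window(toy, 31)
print(len(upstream), residue, len(downstream))
```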

Protein features in neighbourhood

The "Protein features in neighbourhood" lines describe regions or sites of interest surrounding the variant. In general the features listed are post-translational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references.
The "Protein features in neighbourhood" lines have a fixed format. The columns contain the key name, the 'From' endpoint, the 'To' endpoint, and the description of the feature.

The key name is a fixed abbreviation (of up to 8 characters) with a defined meaning.

The 'From' and 'To' endpoint specifications designate (inclusively) the endpoints of the feature named in the key field. In general, these fields simply contain residue numbers which indicate positions in the sequence as listed. Note that these positions are always specified assuming a numbering of the listed sequence from 1 to n; this numbering is not necessarily the same as that used in the original reference(s). The following should be noted:
The description part contains additional information about the feature.
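
As an illustration of the four columns, the sketch below stores each feature as a (key, from, to, description) record and selects those that overlap a window around the variant. The window size of 20 residues, the names Feature and features_near, and the sample records are all assumptions made for this example; the page itself does not state how the neighbourhood is delimited.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    key: str          # fixed abbreviation of up to 8 characters
    start: int        # 'From' endpoint, inclusive, 1-based
    end: int          # 'To' endpoint, inclusive, 1-based
    description: str  # free-text description


def features_near(features: List[Feature], position: int, flank: int = 20) -> List[Feature]:
    """Return the features overlapping [position - flank, position + flank]."""
    low, high = position - flank, position + flank
    # Two inclusive ranges overlap when each starts before the other ends.
    return [f for f in features if f.start <= high and f.end >= low]


# Hypothetical feature records for a variant at position 94.
records = [
    Feature("ACT_SITE", 90, 90, "hypothetical active site"),
    Feature("HELIX", 100, 120, "hypothetical helix"),
    Feature("DISULFID", 10, 200, "hypothetical disulfide bond"),
    Feature("BINDING", 150, 150, "hypothetical binding site"),
]
for feature in features_near(records, 94):
    print(feature.key, feature.start, feature.end, feature.description)
```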

Residue conservation

The "Residue conservation" line contains information on the conservation score of the residue. The score was calculated using orthologous sequences from the Orthologous Matrix (OMA) project (1). The computation involves several steps:
NB: The diversity of the alignment indicates the information content of the alignment, i.e. it measures how different the sequences used in the alignment are. In general, the more diverse the sequences used in the alignment, the better the alignment.

The line indicates the diversity of the alignment, the number of the sequences aligned, and the conservation score of the position of the variant. A hyperlink is provided that leads the users to a page showing the entire alignment where all the columns are colored according to the degree of conservation. If a representative 3D structure exists for the variant, the conservation score is mapped onto each residue of the structure.

Conservation Score

References:
1) Schneider, A. (2007) Bioinformatics. 23(16): 2180-2182.
2) Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. (2002) Nucleic Acids Res. 30(15): 3059-3066.
3) Valdar, W.S. (2002) Proteins. 48(2): 227-241.

If no OMA group is identified and no conservation score can be computed, the "Residue conservation" line contains a hyperlink to a sequence alignment obtained by a BLAST search. A fragment of the reference protein sequence (40 residues) surrounding the variant position was used as query to search against the Swiss-Prot database using the blastp program.

Additional parameters used were: Up to 20 highest scoring matches are displayed. The result thus gives an idea of the degree of conservation of the variant among species. At the variant position, residues that are the same as the reference residue are highlighted in red, whereas residues that are different from the reference residue are highlighted in green. Residues with a negative score in the PAM-70 substitution matrix are in lower case.
Users can perform their own BLAST search using the ExPASy BLAST interface. For more information, please consult this page.

Physico-chemical properties

The "Physico-chemical properties" line indicates the physico-chemical properties of the reference and variant residues and the resulting change. The physico-chemical properties of each amino acid are listed in the table below.

Residue  Symbol  Size    Type
Gly      G       Small   -
Ala      A       Small   Hydrophobic
Val      V       Medium  Hydrophobic
Ile      I       Medium  Hydrophobic
Leu      L       Medium  Hydrophobic
Pro      P       Medium  Hydrophobic
Thr      T       Medium  Polar
Ser      S       Small   Polar
Met      M       Medium  Hydrophobic
Cys      C       Medium  Polar
Gln      Q       Medium  Polar
Asn      N       Medium  Polar
His      H       Medium  Polar
Lys      K       Large   Basic
Arg      R       Large   Basic
Asp      D       Medium  Acidic
Glu      E       Medium  Acidic
Phe      F       Large   Aromatic
Tyr      Y       Large   Aromatic
Trp      W       Large   Aromatic
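
The table lends itself to a simple lookup. The sketch below copies the size and type values from the table and reports how they differ between the reference and variant residues; the names PROPERTIES and describe_change and the shape of the returned dictionary are hypothetical choices for this example.

```python
# One-letter symbol -> (size, type), copied from the table above.
PROPERTIES = {
    "G": ("Small", "-"),           "A": ("Small", "Hydrophobic"),
    "V": ("Medium", "Hydrophobic"), "I": ("Medium", "Hydrophobic"),
    "L": ("Medium", "Hydrophobic"), "P": ("Medium", "Hydrophobic"),
    "T": ("Medium", "Polar"),       "S": ("Small", "Polar"),
    "M": ("Medium", "Hydrophobic"), "C": ("Medium", "Polar"),
    "Q": ("Medium", "Polar"),       "N": ("Medium", "Polar"),
    "H": ("Medium", "Polar"),       "K": ("Large", "Basic"),
    "R": ("Large", "Basic"),        "D": ("Medium", "Acidic"),
    "E": ("Medium", "Acidic"),      "F": ("Large", "Aromatic"),
    "Y": ("Large", "Aromatic"),     "W": ("Large", "Aromatic"),
}


def describe_change(reference: str, variant: str) -> dict:
    """Compare size and type of the reference and variant residues."""
    ref_size, ref_type = PROPERTIES[reference]
    var_size, var_type = PROPERTIES[variant]
    return {
        "size": (ref_size, var_size),
        "type": (ref_type, var_type),
        "size_changed": ref_size != var_size,
        "type_changed": ref_type != var_type,
    }


# The Q94H example used earlier on this page: Gln to His.
print(describe_change("Q", "H"))
```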

References:
1) Thomas E. Creighton (1993) "Proteins." W.H. Freeman and Company, New York. 2nd Edition.
2) Richards, F.M. (1974) J. Mol. Biol. 82: 1-14. [Van der Waals radii of amino acids]

3D structure(s)

The "3D structure(s)" line indicates the best representative 3D structure(s) for a given variant. This information is available only when an experimentally resolved 3D structure exists. A schematic view provides a linear view of the UniProt sequence, the part for which the 3D structure is resolved, and the position of the variant on the sequence. The corresponding PDB entry can be reached by clicking on the image of the structure.

Protein features in structural neighbourhood

The "Protein features in structural neighbourhood" line contains a link to a page where the local structural environment of the variant position is shown in 3D. This link is provided only if a best representative 3D structure exists for the variant. The page shows all the residues located within a given radius around the variant in a given PDB chain. The radius can be chosen by the user and ranges from 3 (the default value) to 6 angstroms. The whole PDB chain to which the variant belongs can also be displayed. The main function of the page is to show all Swiss-Prot features found in the environment for a given radius. The mapping of the Swiss-Prot features onto 3D structures was performed using SSMap (David, F.P.A. and Yip, Y.L. submitted).

A residue in the environment is colored in blue when it represents a Swiss-Prot feature. Otherwise, the residues are in red, except for the variant, which is shown in green.

Interface(s) involvement

The "Interface(s) involvement" line indicates if the wild type residue is involved in a protein-protein interface. The line contains a link to a page where the chain-chain interface(s) is shown in 3D. This link is provided only if a best representative 3D structure exists for the variant and if this structure contains multiple chains. The page displays the PDB chains involved in the interaction. These chains can belong to the same protein or to different proteins, depending on the PDB entry chosen for the display. All the residues of a chain that interact with another chain are shown in red. The user can choose which interface is shown, as well as the method employed (carbon alpha or Van der Waals*). As in the display of the local structural environment, the mapping of the residues onto the Swiss-Prot sequence can be viewed. All the chains involved can also be viewed in order to have a global 3D view of the context. Finally, this page indicates whether or not the variant is involved in the interface.

* We consider that a residue is involved in the interface if one of its atoms is located within a distance r of an atom of a residue present in another protein chain. In the "carbon alpha" method, only the alpha carbon atom of the residue is considered and the distance r is set to 6 Å. In the "Van der Waals" method, all atoms are taken into consideration, and the distance r is set to 4.5 Å.
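
The distance criterion in the footnote can be sketched as follows. This is an illustration only, not the code used to build the pages: it assumes atoms are given as (atom name, (x, y, z)) pairs with coordinates in Å, that the carbon alpha method compares alpha carbons on both sides, and that "within a distance r" includes r itself. The names in_interface, CA_CUTOFF and VDW_CUTOFF and the coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (atom name, coordinates in Å)

CA_CUTOFF = 6.0   # distance r for the "carbon alpha" method
VDW_CUTOFF = 4.5  # distance r for the "Van der Waals" method


def in_interface(residue: List[Atom], other_chain: List[Atom], method: str = "vdw") -> bool:
    """Return True if any considered atom pair is within the cutoff distance."""
    if method == "ca":
        # Only alpha carbons are considered (assumed on both sides).
        residue = [a for a in residue if a[0] == "CA"]
        other_chain = [a for a in other_chain if a[0] == "CA"]
        cutoff = CA_CUTOFF
    elif method == "vdw":
        cutoff = VDW_CUTOFF  # all atoms are considered
    else:
        raise ValueError("method must be 'ca' or 'vdw'")
    return any(
        math.dist(xyz_a, xyz_b) <= cutoff
        for _, xyz_a in residue
        for _, xyz_b in other_chain
    )


# Hypothetical coordinates: one residue and two atoms of a neighbouring chain.
residue = [("CA", (0.0, 0.0, 0.0)), ("CB", (1.5, 0.0, 0.0))]
neighbour = [("CA", (7.0, 0.0, 0.0)), ("O", (5.8, 0.0, 0.0))]
print(in_interface(residue, neighbour, "ca"), in_interface(residue, neighbour, "vdw"))
```

With these coordinates the two methods disagree, which shows why the page lets the user choose between them.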

Surface accessibility

The "Surface accessibility" line indicates if the wild type residue is surface accessible or buried. Surface accessibility is calculated using the MSMS program. When the residue is surface accessible, its solvent-accessible surface area (SAS) is also indicated.

Reference:
Sanner, M.F., Olson, A.J. & Spehner, J.C. (1996). Biopolymers, 38:305-320.

3D homology models

The "3D homology models" line gives access to available protein homology model(s) showing the location of the variant on the 3D structure. The models were constructed using PromodII, the core program of SWISS-MODEL (Guex, N. and Peitsch, M.C. Electrophoresis 18:2714-2723, 1997).

Protein homology models were constructed only for proteins that have a suitable structural template deposited in the Protein Data Bank (PDB). The sequence identity between the Swiss-Prot protein sequence and the PDB template is at least 70%. In addition, only crystal structures with a resolution better than 2.5 Å are selected as templates. In cases where there are several suitable templates, an additional selection step is performed to keep only templates that are significantly different from each other, i.e. they display a root mean square deviation (rmsd) of more than 1.5 Å.
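
The selection criteria can be sketched as a filter followed by a redundancy check. The thresholds come from the paragraph above; everything else is an assumption made for illustration: the ranking by identity and then resolution, the greedy way of discarding similar templates, the names Template and select_templates, and the template codes and values in the usage lines.

```python
from typing import Callable, List, NamedTuple


class Template(NamedTuple):
    code: str          # template code, e.g. PDB code plus chain
    identity: float    # sequence identity to the Swiss-Prot sequence, in percent
    resolution: float  # crystal structure resolution, in Å


def select_templates(candidates: List[Template],
                     rmsd: Callable[[str, str], float]) -> List[Template]:
    """Keep suitable templates that differ from each other by more than 1.5 Å rmsd."""
    # Thresholds from the text: identity at least 70%, resolution better than 2.5 Å.
    suitable = [t for t in candidates if t.identity >= 70.0 and t.resolution < 2.5]
    # Assumed ranking: highest identity first, then best resolution.
    suitable.sort(key=lambda t: (-t.identity, t.resolution))
    selected: List[Template] = []
    for template in suitable:
        # Assumed greedy step: keep a template only if it is significantly
        # different from every template already kept.
        if all(rmsd(template.code, kept.code) > 1.5 for kept in selected):
            selected.append(template)
    return selected


# Hypothetical candidates and pairwise rmsd values (in Å).
candidates = [
    Template("1AAAA", 95.0, 1.8), Template("1BBBA", 90.0, 2.0),
    Template("1CCCA", 65.0, 1.5), Template("1DDDA", 80.0, 3.0),
    Template("1EEEA", 75.0, 2.2),
]
pairwise = {
    frozenset(("1AAAA", "1BBBA")): 0.8,
    frozenset(("1AAAA", "1EEEA")): 2.1,
    frozenset(("1BBBA", "1EEEA")): 2.0,
}
chosen = select_templates(candidates, lambda a, b: pairwise[frozenset((a, b))])
print([t.code for t in chosen])
```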

The template codes are constructed according to the following rule:
PDBCODE+ChainID
Example: chain A of the protein structure 1CPC is coded 1CPCA
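
The naming rule amounts to a simple concatenation. The sketch below assumes a four-character PDB code and a one-character chain identifier; the function names template_code and split_template_code are hypothetical.

```python
from typing import Tuple


def template_code(pdb_code: str, chain_id: str) -> str:
    """Build a template code as PDBCODE+ChainID."""
    if len(pdb_code) != 4 or len(chain_id) != 1:
        raise ValueError("expected a 4-character PDB code and a 1-character chain ID")
    return pdb_code.upper() + chain_id


def split_template_code(code: str) -> Tuple[str, str]:
    """Split a template code back into (PDB code, chain ID)."""
    if len(code) != 5:
        raise ValueError("expected a 5-character template code")
    return code[:4], code[4]


print(template_code("1CPC", "A"))
print(split_template_code("1CPCA"))
```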

For further information about the principle of homology modelling or the construction of these models, please consult the online course or contact us.

Two methods of visualisation are available:

Reference

Yip Y.L., Scheib H., Diemand A.V., Gattiker A., Famiglietti L.M., Gasteiger E., Bairoch A.
The Swiss-Prot Variant Page and the ModSNP Database: A Resource for Sequence and Structure information on Human Protein Variants
Hum. Mutat. 23:464-470(2004).

Full text