Information concerning UniProtKB/Swiss-Prot human variation pages
- Gene symbol(s)
- Feature Identifier (FTId)
- Residue change
- Blosum matrix score
- Status
- Disease
- Polymorphism
- Comment
- Allele(s)
- Location on the sequence
- Protein features in neighbourhood
- Residue conservation
- 3D structure
- Protein features in structural neighbourhood
- Interface(s) involvement
- Surface accessibility
- Physico-chemical properties
- 3D homology models
- Reference
The "Gene symbol(s)" line contains the name(s) of the gene(s) that code for the stored protein sequence.
It often occurs that more than one gene name has been assigned to an individual locus. The Official name given is assumed to be the most correct (or most current) designation. It is usually the gene symbol attributed by the HUGO Gene Nomenclature Committee (HGNC). All the other names are listed as Synonyms.
Swiss-Prot user manual on gene name
Feature Identifier (FTId)
The "FTId" line contains a unique and stable feature identifier (FTId). The format of the FTId specific for a variant is: VAR_number
where the number is a 6-digit number.
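The identifier format described above can be checked with a short pattern. This is a minimal sketch: the function name `is_variant_ftid` is hypothetical, and the pattern assumes only what is stated here (the prefix `VAR_` followed by exactly six digits).

```python
import re

# Assumed from the description above: "VAR_" followed by exactly six digits.
FTID_PATTERN = re.compile(r"VAR_[0-9]{6}")


def is_variant_ftid(text: str) -> bool:
    """Return True if `text` is a well-formed variant feature identifier."""
    return FTID_PATTERN.fullmatch(text) is not None
```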
The "Residue change" line indicates the amino acid change of the variant. The one-letter and three-letter codes for amino acids used in Swiss-Prot are those adopted by the commission on Biochemical Nomenclature of the IUPAC-IUB. The change is described for example as:
From Gln (Q) to His (H), Q94H
where Gln (Q) is the reference residue, and His (H) is the mutant residue. The number, 94 in this example, indicates the position of the variant.
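The short notation (for example Q94H) can be split into its three parts and expanded back into the long description. The sketch below is illustrative only; `ResidueChange`, `parse_residue_change` and `describe` are hypothetical names, and only the 20 standard amino acids are handled.

```python
import re
from typing import NamedTuple

# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


class ResidueChange(NamedTuple):
    reference: str  # one-letter code of the reference residue
    position: int   # 1-based position on the sequence
    variant: str    # one-letter code of the variant residue


def parse_residue_change(code: str) -> ResidueChange:
    """Parse a short description such as 'Q94H'."""
    match = re.fullmatch(r"([A-Z])([0-9]+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a residue change: {code!r}")
    reference, position, variant = match.groups()
    if reference not in THREE_LETTER or variant not in THREE_LETTER:
        raise ValueError(f"unknown amino acid code in {code!r}")
    return ResidueChange(reference, int(position), variant)


def describe(change: ResidueChange) -> str:
    """Format a change in the long form used in the example above."""
    ref, pos, var = change
    return (f"From {THREE_LETTER[ref]} ({ref}) to "
            f"{THREE_LETTER[var]} ({var}), {ref}{pos}{var}")
```

With these definitions, `describe(parse_residue_change("Q94H"))` reproduces the example line given above.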
The "Blosum matrix score" line indicates the score, within a Blosum matrix, of the corresponding wild-type to variant amino acid change. The log-odds score is the logarithm of the ratio between the likelihood of two amino acids being aligned in related sequences and the likelihood of their being aligned by chance. The Blosum62 substitution matrix is used. This substitution matrix contains scores for all possible exchanges of one amino acid with another.
Lowest score: -4 (low probability of substitution), highest score: 11 (high probability of substitution)
Information on Blosum matrix
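As a worked illustration of the log-odds idea, the sketch below computes a score from an observed and a chance-expected pair frequency. The function name and the input frequencies are hypothetical and are not taken from the Blosum62 data; the factor of 2 reflects the half-bit scaling used for Blosum62.

```python
import math


def log_odds_score(observed: float, expected: float) -> int:
    """Rounded half-bit log-odds score: 2 * log2(observed / expected).

    `observed` is the frequency with which two residues are seen aligned,
    `expected` the frequency with which they would be aligned by chance.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("frequencies must be positive")
    return round(2 * math.log2(observed / expected))
```

A pair seen four times more often than expected by chance scores 4, a pair seen four times less often scores -4, and a pair seen exactly as often as expected scores 0.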
The "Status" line gives a classification of the variant. Swiss-Prot systematically classifies each variant into one of three categories: "Disease", "Polymorphism" and "Unclassified".
- Disease: A variant is classified as "Disease" when it is found in patients and disease-association is reported in literature. However, this classification is not a definitive assessment of pathogenicity;
- Polymorphism: A variant is classified as "Polymorphism" if no disease-association has been reported;
- Unclassified: A variant is classified as "Unclassified" if disease-association remains unclear.
The "Disease" line gives the name of the disease associated with the variant. For more information about the disease, the user can refer to the OMIM link provided in the Cross-reference section of the page.
The "Polymorphism" line gives additional information on the polymorphism described.
The "Comment" line contains free text comments on the variant. It is used to convey any additional useful information about the variant.
The "Allele(s)" line contains a list of allele(s) on which the variant can be found. The whole sequence of the allele can be visualized through the hyperlink provided.
The "Location on the sequence" line shows the position of the residue change on the sequence. The 20 residues upstream and the 20 residues downstream of the variant are shown; fewer are shown on one side when the variant lies near the beginning or the end of the protein sequence.
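The window around the variant can be sketched as a simple slice. The function name `variant_window` is hypothetical; the position is taken to be 1-based, as in the feature tables, and the window is truncated at either end of the sequence.

```python
def variant_window(sequence: str, position: int, flank: int = 20) -> str:
    """Return the variant residue with up to `flank` residues on each side.

    `position` is 1-based. Near either end of the sequence the window is
    simply shorter on that side.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    start = max(0, position - 1 - flank)          # 0-based, inclusive
    end = min(len(sequence), position + flank)    # 0-based, exclusive
    return sequence[start:end]
```

For a variant in the middle of a long sequence this yields 41 residues; for a variant at position 3 it yields 23.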
Protein features in neighbourhood
The "Protein features in neighbourhood" lines describe regions or sites of interest surrounding the variant. In general the features listed are post-translational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references.
The "Protein features in neighbourhood" lines have a fixed format. The columns contain the key name, the 'From' endpoint, the 'To' endpoint, and the description of the feature.
The key name is a fixed abbreviation (of up to 8 characters) with a defined meaning.
The 'From' and 'To' endpoint specifications designate (inclusively) the endpoints of the feature named in the key field. In general, these fields simply contain residue numbers which indicate positions in the sequence as listed. Note that these positions are always specified assuming a numbering of the listed sequence from 1 to n; this numbering is not necessarily the same as that used in the original reference(s). The following should be noted:
- If the 'From' and 'To' specifications are identical, the feature involves one single amino acid;
- When a feature is known to extend beyond the end(s) of the sequenced region, the endpoint specification will be preceded by '<' for features which continue to the left end (N-terminal direction) or by '>' for features which continue to the right end (C-terminal direction).
The description part contains additional information about the feature.
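The endpoint conventions described above can be read with a small helper. This is only a sketch: `Endpoint`, `parse_endpoint` and `feature_covers` are hypothetical names, and the input is assumed to be the bare 'From' or 'To' field, optionally prefixed with '<' or '>'.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    position: int     # residue number, counted from 1 on the listed sequence
    open_ended: bool  # True if the feature extends beyond the sequenced region


def parse_endpoint(spec: str) -> Endpoint:
    """Parse a 'From' or 'To' field such as '94', '<1' or '>250'."""
    spec = spec.strip()
    if spec[:1] in ("<", ">"):
        return Endpoint(int(spec[1:]), True)
    return Endpoint(int(spec), False)


def feature_covers(from_spec: str, to_spec: str, position: int) -> bool:
    """Return True if `position` lies within the feature (endpoints inclusive)."""
    start = parse_endpoint(from_spec).position
    end = parse_endpoint(to_spec).position
    return start <= position <= end
```

Because the endpoints are inclusive, a feature whose 'From' and 'To' fields are both 94 covers exactly one residue.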
The "Residue conservation" line contains information on the conservation score of the residue. The score was calculated using orthologous sequences from the Orthologous Matrix (OMA) project (1). The computation involves several steps:
- Identify the OMA group to which the UniProt sequence belongs;
- Perform a multiple sequence alignment of all the sequences belonging to the OMA group identified above using the MAFFT alignment program (2);
- Compute the diversity of the alignment as well as the conservation score of each residue (or position) of the UniProt sequence using the program SCORECONS (3).
NB: The diversity of the alignment indicates the information content of the alignment, i.e. it measures how different the sequences used in the alignment are. In general, the more diverse the sequences used in the alignment, the better the alignment.
The line indicates the diversity of the alignment, the number of the sequences aligned, and the conservation score of the position of the variant. A hyperlink is provided that leads the users to a page showing the entire alignment where all the columns are colored according to the degree of conservation. If a representative 3D structure exists for the variant, the conservation score is mapped onto each residue of the structure.
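To give an idea of what a per-position conservation score looks like, the sketch below scores one alignment column by its Shannon entropy. This is a deliberately simple stand-in, not the SCORECONS method, and its values will differ from those shown on the variant pages. The function names are hypothetical, gaps are written as '-', and all rows of the alignment are assumed to have the same length.

```python
import math
from collections import Counter
from typing import List


def column_conservation(column: str) -> float:
    """Entropy-based conservation of one alignment column.

    Returns 1.0 for a fully conserved column and lower values for more
    variable ones. Gap characters ('-') are ignored.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(residues).values())
    return 1.0 - entropy / math.log2(20)  # normalised by 20 amino acids


def conservation_at(alignment: List[str], reference_row: int,
                    position: int) -> float:
    """Score the column holding residue `position` (1-based, gaps not
    counted) of the reference sequence."""
    seen = 0
    for col, char in enumerate(alignment[reference_row]):
        if char != "-":
            seen += 1
            if seen == position:
                return column_conservation(
                    "".join(row[col] for row in alignment))
    raise ValueError("position lies outside the reference sequence")
```

Note that the position is counted on the ungapped reference sequence, so gap columns in the reference row are skipped before the column is scored.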
References:
1) Schneider, A. (2007) Bioinformatics. 23(16): 2180-2182.
2) Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. (2002) Nucleic Acids Res. 30(15): 3059-3066.
3) Valdar, W.S. (2002) Proteins. 48(2): 227-241.
If no OMA group is identified and no conservation score can be computed, the "Residue conservation" line contains a hyperlink to a sequence alignment obtained by a BLAST search. A fragment of the reference protein sequence (40 residues) surrounding the variant position was used as query to search against the Swiss-Prot database using the blastp program.
Additional parameters used were:
- Comparison matrix: PAM-70
- E threshold: 1e-10
Up to 20 highest scoring matches are displayed. The result thus gives an idea of the degree of conservation of the variant among species. At the variant position, residues that are the same as the reference residue are highlighted in red, whereas residues that are different from the reference residue are highlighted in green. Residues with a negative score in the PAM-70 substitution matrix are in lower case.
Users can perform their own BLAST search using the ExPASy BLAST interface. For more information, please consult this page.
Physico-chemical properties
The "Physico-chemical properties" line indicates the physico-chemical properties of the reference and variant residues and the resulting change. The physico-chemical properties of each amino acid are listed in the table below.
| Residue | Symbol | Size | Type |
| Gly | G | Small | - |
| Ala | A | Small | Hydrophobic |
| Val | V | Medium | Hydrophobic |
| Ile | I | Medium | Hydrophobic |
| Leu | L | Medium | Hydrophobic |
| Pro | P | Medium | Hydrophobic |
| Thr | T | Medium | Polar |
| Ser | S | Small | Polar |
| Met | M | Medium | Hydrophobic |
| Cys | C | Medium | Polar |
| Gln | Q | Medium | Polar |
| Asn | N | Medium | Polar |
| His | H | Medium | Polar |
| Lys | K | Large | Basic |
| Arg | R | Large | Basic |
| Asp | D | Medium | Acidic |
| Glu | E | Medium | Acidic |
| Phe | F | Large | Aromatic |
| Tyr | Y | Large | Aromatic |
| Trp | W | Large | Aromatic |
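The table can be used as a lookup to compare the reference and variant residues. The sketch below transcribes the table as given; the names `PROPERTIES`, `PropertyChange` and `property_change` are hypothetical, and the output format is not the wording used on the variant pages.

```python
from typing import NamedTuple, Tuple

# (size, type) per one-letter symbol, transcribed from the table above.
PROPERTIES = {
    "G": ("Small", "-"),            "A": ("Small", "Hydrophobic"),
    "V": ("Medium", "Hydrophobic"), "I": ("Medium", "Hydrophobic"),
    "L": ("Medium", "Hydrophobic"), "P": ("Medium", "Hydrophobic"),
    "T": ("Medium", "Polar"),       "S": ("Small", "Polar"),
    "M": ("Medium", "Hydrophobic"), "C": ("Medium", "Polar"),
    "Q": ("Medium", "Polar"),       "N": ("Medium", "Polar"),
    "H": ("Medium", "Polar"),       "K": ("Large", "Basic"),
    "R": ("Large", "Basic"),        "D": ("Medium", "Acidic"),
    "E": ("Medium", "Acidic"),      "F": ("Large", "Aromatic"),
    "Y": ("Large", "Aromatic"),     "W": ("Large", "Aromatic"),
}


class PropertyChange(NamedTuple):
    size: Tuple[str, str]          # (reference size, variant size)
    residue_type: Tuple[str, str]  # (reference type, variant type)


def property_change(reference: str, variant: str) -> PropertyChange:
    """Look up the size and type of both residues (one-letter symbols)."""
    ref_size, ref_type = PROPERTIES[reference]
    var_size, var_type = PROPERTIES[variant]
    return PropertyChange((ref_size, var_size), (ref_type, var_type))
```

For the Q94H example, both residues are listed as medium and polar, so neither property changes; a Gly to Arg change goes from small to large and gains a basic side chain.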
References:
1) Thomas E. Creighton (1993) "Proteins." W.H. Freeman and Company, New York. 2nd Edition.
2) Richards, F.M. (1974) J. Mol. Biol. 82:1-14. [Van der Waals radii of amino acids]
The "3D structure(s)" line indicates the best representative 3D structure(s) for a given variant. This information is available only when an experimentally resolved 3D structure exists. A schematic view provides a linear view of the UniProt sequence, the part for which the 3D structure is resolved, and the position of the variant on the sequence. The corresponding PDB entry can be reached by clicking on the image of the structure.
Protein features in structural neighbourhood
The "Protein features in structural neighbourhood" line contains a link to a page where the local structural environment of the variant position is shown in 3D. This link is provided only if a best representative 3D structure exists for the variant. The page shows all the residues located within a given radius of the variant in a given PDB chain. The radius can be chosen by the user and ranges from 3 Å (the default value) to 6 Å. The whole PDB chain to which the variant belongs can also be visualized. The main function of the page is to show all the Swiss-Prot features found in the environment for a given radius. The mapping of the Swiss-Prot features onto 3D structures was performed using SSMap (David, F.P.A. and Yip, Y.L. submitted).
A residue in the environment is colored blue when it is part of a Swiss-Prot feature. Otherwise, residues are colored red, except for the variant, which is shown in green.
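The selection of residues around the variant can be sketched as a distance filter over atom coordinates. This is an illustration under stated assumptions: the function name is hypothetical, the input is a plain mapping from residue number to atom coordinates (in Å) for a single chain, and a residue is counted as a neighbour when any of its atoms lies within the radius of any atom of the variant residue. How the page itself measures the distance is not specified here.

```python
import math
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float, float]


def residues_within(residue_atoms: Dict[int, List[Coordinate]],
                    variant: int, radius: float = 3.0) -> List[int]:
    """Return the residue numbers lying within `radius` Å of the variant.

    Assumption: distance is taken between the closest pair of atoms.
    """
    if not 3.0 <= radius <= 6.0:
        raise ValueError("radius must be between 3 and 6 Å")
    centre = residue_atoms[variant]
    neighbours = []
    for number, atoms in residue_atoms.items():
        if number == variant:
            continue
        if any(math.dist(a, b) <= radius for a in atoms for b in centre):
            neighbours.append(number)
    return sorted(neighbours)
```

Increasing the radius from the default towards 6 Å can only add residues to the result, which matches the behaviour expected when the user widens the environment.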
The "Interface(s) involvement" line indicates if the wild-type residue is involved in a protein-protein interface. The line contains a link to a page where the chain-chain interface(s) is shown in 3D. This link is provided only if a best representative 3D structure exists for the variant and if this structure contains multiple chains. The page displays the PDB chains involved in the interaction. These chains can belong to the same protein or to different proteins, depending on the PDB entry chosen for the display. All the residues of a chain that interact with another chain are shown in red. The user can choose which interface is shown, as well as the method employed (carbon alpha or Van der Waals*). As in the display of the local structural environment, the mapping of the residues onto the Swiss-Prot sequence can be viewed. All the chains involved can also be viewed in order to have a global 3D view of the context. Finally, this page indicates whether or not a variant is involved in the interface.
* We consider that a residue is involved in the interface if one of its atoms is located within a distance r of an atom of a residue present in another protein chain. In the "carbon alpha" method, we only consider the alpha carbon atom of each residue and the distance r is set to 6 Å. In the "Van der Waals" method, all atoms are taken into consideration, and the distance r is set to 4.5 Å.
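The two interface criteria can be sketched as follows. The data layout, the names `CUTOFFS` and `interface_residues`, and the use of "less than or equal to" for the cutoff are assumptions made for illustration; only the two distance thresholds come from the description above.

```python
import math
from typing import Dict, List, Tuple

# (atom name, (x, y, z) in Å); one list of atoms per residue number.
Atom = Tuple[str, Tuple[float, float, float]]
Chain = Dict[int, List[Atom]]

# Distance r (in Å) for each method, as given in the text.
CUTOFFS = {"carbon alpha": 6.0, "van der waals": 4.5}


def interface_residues(chain: Chain, partner: Chain,
                       method: str = "carbon alpha") -> List[int]:
    """Return residues of `chain` that are in contact with `partner`."""
    cutoff = CUTOFFS[method]

    def points(atoms: List[Atom]):
        # The carbon alpha method uses only the CA atom of each residue;
        # the Van der Waals method uses every atom.
        if method == "carbon alpha":
            return [xyz for name, xyz in atoms if name == "CA"]
        return [xyz for _, xyz in atoms]

    partner_points = [xyz for atoms in partner.values()
                      for xyz in points(atoms)]
    return sorted(
        number for number, atoms in chain.items()
        if any(math.dist(a, b) <= cutoff
               for a in points(atoms) for b in partner_points)
    )
```

The two methods can disagree: two residues whose alpha carbons are 5.5 Å apart are in contact under the carbon alpha criterion, but not under the Van der Waals criterion unless some pair of their atoms comes within 4.5 Å.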
The "Surface accessibility" line indicates if the wild-type residue is surface accessible or buried. Surface accessibility is calculated using the MSMS program. When the residue is surface accessible, its solvent-accessible surface area (SAS) is also indicated.
Reference:
Sanner, M.F., Olson, A.J. & Spehner, J.C. (1996). Biopolymers, 38:305-320.
The "3D homology models" line gives access to available protein homology model(s) showing the location of the variant on the 3D structure. The models were constructed using PromodII, the core program of SWISS-MODEL (Guex, N. and Peitsch, M.C. Electrophoresis 18:2714-2723, 1997).
Protein homology models were constructed only for proteins that have a suitable structural template deposited in the Protein Data Bank (PDB). The sequence identity between the Swiss-Prot protein sequence and the PDB template is at least 70%. In addition, only crystal structures with a resolution better than 2.5 Å are selected as templates. In cases where there are several suitable templates, an additional selection step is performed to keep only templates that are significantly different from each other, i.e. that display a root mean square deviation (rmsd) of more than 1.5 Å.
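The template criteria can be sketched as a two-stage filter. The thresholds are those given above; everything else is an assumption made for illustration, in particular the greedy order (highest identity first, then best resolution), the `Template` record, and the `rmsd` callback, which stands in for whatever structural comparison was actually used.

```python
from typing import Callable, Iterable, List, NamedTuple


class Template(NamedTuple):
    code: str          # template code (PDB code + chain ID)
    identity: float    # sequence identity to the Swiss-Prot sequence, in %
    resolution: float  # crystal structure resolution, in Å


def select_templates(candidates: Iterable[Template],
                     rmsd: Callable[[str, str], float]) -> List[Template]:
    """Keep suitable templates that differ from each other by > 1.5 Å rmsd."""
    suitable = [t for t in candidates
                if t.identity >= 70.0 and t.resolution < 2.5]
    # Assumption: prefer higher identity, then better (lower) resolution.
    suitable.sort(key=lambda t: (-t.identity, t.resolution))
    selected: List[Template] = []
    for template in suitable:
        if all(rmsd(template.code, kept.code) > 1.5 for kept in selected):
            selected.append(template)
    return selected
```

A different ordering in the second stage could keep a different representative from a group of near-identical templates; the sketch only guarantees that every pair of selected templates differs by more than 1.5 Å.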
The template codes are constructed according to the following rule:
PDBCODE+ChainID
Example: chain A of the protein structure 1CPC is coded 1CPCA
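The rule amounts to a simple concatenation. In the sketch below the function names are hypothetical, and splitting a code back into its parts assumes a four-character PDB code followed by the chain ID.

```python
from typing import Tuple


def template_code(pdb_code: str, chain_id: str) -> str:
    """Build a template code from a PDB code and a chain ID."""
    return pdb_code + chain_id


def split_template_code(code: str) -> Tuple[str, str]:
    """Split a template code, assuming a four-character PDB code."""
    if len(code) <= 4:
        raise ValueError(f"template code too short: {code!r}")
    return code[:4], code[4:]
```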
For further information about the principle of homology modelling or the construction of these models, please consult the online course or contact us.
Two methods of visualisation are available:
- ExPASy: Provides a view of the wild-type and the variant structures. For a more interactive display, users need to install Chime. Otherwise, a static view can be obtained by choosing [statics image].
- AstexViewer: A program that allows an interactive display of both the wild-type and the variant structures via a Java applet. Users can choose to view the whole structure [Groups->FtID] or to centre on the changed residue [Groups->centre]. Please consult the documentation for the conditions and use of AstexViewer.
Yip Y.L., Scheib H., Diemand A.V., Gattiker A., Famiglietti L.M., Gasteiger E., Bairoch A.
The Swiss-Prot Variant Page and the ModSNP Database: A Resource for Sequence and Structure information on Human Protein Variants
Hum. Mutat. 23:464-470(2004).
Full text