UniProtKB/Swiss-Prot is a manually annotated protein knowledgebase established in 1986. Since 2003 it has been maintained by the UniProt Consortium, a collaboration between the Swiss Institute of Bioinformatics (SIB) and the Department of Bioinformatics and Structural Biology of the University of Geneva, the European Bioinformatics Institute (EBI), and the Georgetown University Medical Center's Protein Information Resource (PIR).
UniProtKB/Swiss-Prot, together with UniProtKB/TrEMBL, its computer-annotated supplement, constitutes the UniProt Knowledgebase (UniProtKB), a major project of the UniProt Consortium. Together, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL give access to all publicly available protein sequences.
The UniProt Knowledgebase consists of sequence entries. Sequence entries are composed of different line types, each with its own format. For standardization purposes, the format of the UniProt Knowledgebase follows as closely as possible that of the EMBL Nucleotide Sequence Database.
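As a rough illustration of the line-type structure described above, the following Python sketch groups the lines of a flat-file entry by line code. It assumes the conventional layout in which each line begins with a two-character code followed by three blanks, sequence data lines begin with blanks, and an entry ends with `//`. The sample entry, including its accession number, is invented for illustration, and `group_by_line_code` is a hypothetical helper, not part of any UniProt tool.

```python
from collections import defaultdict

# Invented, heavily abbreviated entry used only for illustration.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;         10 AA.
AC   P00000;
DE   RecName: Full=Example protein;
KW   Phosphoprotein; Zinc-finger.
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
"""


def group_by_line_code(entry_text):
    """Map each two-character line code to the list of its line contents."""
    groups = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):
            # Assumed entry terminator: stop at the end of the first entry.
            break
        code = line[:2]      # e.g. "ID", "AC", "KW"; two blanks for sequence data
        content = line[5:]   # skip the code and the three blanks that follow it
        groups[code].append(content)
    return dict(groups)


lines_by_code = group_by_line_code(SAMPLE_ENTRY)
print(sorted(lines_by_code))
```

Keying sequence data lines under a two-blank code keeps the helper uniform; a real parser would more likely attach them to the preceding SQ line.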
The UniProtKB/Swiss-Prot database distinguishes itself from other protein sequence databases by three distinct criteria:
Annotation
Data integrated into UniProtKB/Swiss-Prot, including the protein sequence and current
knowledge on each protein, are manually checked and continuously updated.
Each UniProtKB/Swiss-Prot entry contains core data (sequence data, bibliographical references, and taxonomic data, i.e. a description of the biological source of the protein) and annotation, which describes the following items:
Function(s) of the protein
Post-translational modification(s). For example carbohydrates, phosphorylation,
acetylation, GPI-anchor, etc.
Domains and sites. For example calcium binding regions, ATP-binding sites, zinc fingers,
homeobox, kringle, etc.
Secondary structure
Quaternary structure. For example homodimer, heterotrimer, etc.
Similarities to other proteins
Disease(s) associated with deficiency(ies) in the protein
Sequence conflicts, variants, etc.
A special emphasis is laid on the annotation of biological events which generate protein diversity that cannot be predicted at the genomic level. Alternative products (alternative splicing), RNA editing and post-translational modifications (PTMs) are extensively annotated. For additional information, see Boeckmann et al., C.R.Biol. (2005) [16286078].
Our main sources of data are scientific publications that report new sequence data and review articles, which we use to periodically update the annotations of families or groups of proteins. We also draw on external experts, who have been recruited to send us their comments and updates concerning specific groups of proteins.
The annotation is mainly found in the comment lines (CC), in the feature table (FT) and in the keyword lines (KW).
Most comments are classified by 'topics'; this approach permits the easy retrieval of specific categories of data from the database.
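To make topic-based retrieval concrete, here is a minimal Python sketch that collects comment (CC) lines by topic. It assumes that each comment block opens with `CC   -!- TOPIC: text` and that continuation lines are indented under it; the comment texts are invented, and `comments_by_topic` is a hypothetical helper rather than an official interface.

```python
import re

# Invented comment lines used only for illustration.
SAMPLE_CC = """\
CC   -!- FUNCTION: Binds calcium ions (invented example text that
CC       continues on a second line).
CC   -!- SUBUNIT: Homodimer.
CC   -!- SIMILARITY: Belongs to the example family.
"""

# Assumed shape of the first line of a comment block: "CC   -!- TOPIC: text".
TOPIC_START = re.compile(r"^CC   -!- ([A-Z][A-Z ]*): (.*)$")


def comments_by_topic(text):
    """Return a dict mapping each topic name to a list of comment strings."""
    topics = {}
    current = None
    for line in text.splitlines():
        match = TOPIC_START.match(line)
        if match:
            current = match.group(1)
            topics.setdefault(current, []).append(match.group(2).strip())
        elif current is not None and line.startswith("CC       "):
            # Continuation line: append its text to the most recent comment.
            topics[current][-1] += " " + line[9:].strip()
    return topics


comments = comments_by_topic(SAMPLE_CC)
print(comments["SUBUNIT"])
```

Because every comment is keyed by its topic, pulling one category of data (for example all SUBUNIT comments) becomes a single dictionary lookup.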
Minimal redundancy
In order to have minimal redundancy and to improve sequence reliability, all protein sequences encoded by the same gene are merged into a single UniProtKB/Swiss-Prot entry. Differences found between various sequencing reports are analysed and fully described in the feature table (for example alternative splicing events, polymorphisms or conflicts).
Integration with other databases
Detailed expertise that goes beyond the scope of UniProtKB/Swiss-Prot is made available via cross-references to specialised data collections such as the EMBL/GenBank/DDBJ nucleotide sequence databases, the 3D structure database (PDB), various protein domain and family characterisation databases, etc.
UniProtKB/Swiss-Prot is currently cross-referenced with about 60 different databases. Cross-references indicated in the DR lines are used to provide 'explicit' links to many databases; additionally, 'implicit' links are created on the fly by the ExPASy server.
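The explicit links carried by DR lines can be read with a few lines of Python. This sketch assumes that a DR line holds semicolon-separated fields, with the database name first and the primary identifier second, and that the line ends with a period. The identifiers below are invented, and `parse_dr_lines` and `ids_by_database` are hypothetical helpers.

```python
# Invented cross-reference lines used only for illustration.
SAMPLE_DR = """\
DR   EMBL; X00000; CAA00000.1; -; mRNA.
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-10.
DR   PROSITE; PS00000; EXAMPLE_PATTERN; 1.
"""


def parse_dr_lines(text):
    """Return (database, primary_id, other_fields) tuples for each DR line."""
    refs = []
    for line in text.splitlines():
        if not line.startswith("DR   "):
            continue
        # Drop the assumed trailing period, then split on semicolons.
        fields = [field.strip() for field in line[5:].rstrip(".").split(";")]
        if len(fields) < 2:
            continue  # skip lines that do not fit the assumed layout
        refs.append((fields[0], fields[1], fields[2:]))
    return refs


def ids_by_database(refs):
    """Group primary identifiers by the database they point to."""
    grouped = {}
    for database, primary_id, _ in refs:
        grouped.setdefault(database, []).append(primary_id)
    return grouped


refs = parse_dr_lines(SAMPLE_DR)
grouped_ids = ids_by_database(refs)
print(grouped_ids)
```

The meaning of the fields after the primary identifier differs from one database to another, so the sketch keeps them as an uninterpreted list.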