ExPASy logo ExPASy Home page Site Map Search ExPASy Contact us Swiss-Prot
 Hosted by br flag LNCC Brazil Mirror sites: Australia  Canada  China  Korea  Switzerland


                    SWISS-PROT RELEASE 26.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 26.0  of SWISS-PROT  contains 31808 sequence entries, comprising
   10'875'091 amino acids abstracted from 30967 references. This represents
   an increase  of 6.5% over release 25. The recent growth of the data bank
   is summarized below.

   Release    Date   Number of entries  Number of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
   24.0       12/92              28154             9 545 427
   25.0       04/93              29955            10 214 020
   26.0       07/93              31808            10 875 091
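
   The 6.5% growth quoted above can be checked against the last two rows of
   the table. The following is a minimal sketch (the names RELEASES and
   percent_increase are ours, introduced only for illustration); it suggests
   that the figure refers to the number of amino acids rather than to the
   number of entries.

```python
# Figures copied from the rows for releases 25.0 and 26.0 of the table above.
RELEASES = {
    "25.0": {"entries": 29955, "amino_acids": 10_214_020},
    "26.0": {"entries": 31808, "amino_acids": 10_875_091},
}


def percent_increase(old: int, new: int) -> float:
    """Return the relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


entry_growth = percent_increase(
    RELEASES["25.0"]["entries"], RELEASES["26.0"]["entries"]
)
residue_growth = percent_increase(
    RELEASES["25.0"]["amino_acids"], RELEASES["26.0"]["amino_acids"]
)

print(f"entries:     {entry_growth:.1f}%")
print(f"amino acids: {residue_growth:.1f}%")
```

   With these inputs the entry count grows by about 6.2% and the amino acid
   count by about 6.5%.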

   1.2  Source of data

   Release 26.0  has been  updated using protein sequence data from release
   36.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 35.0 of the
   EMBL Nucleotide Sequence Database.

   As an indication of the source of the sequence data in the SWISS-PROT
   data bank, we list here the statistics concerning the DR (Database
   cross-references) pointer lines:

   Entries with pointer(s) to PIR entries only:            4485
   Entries with pointer(s) to EMBL entries only:           4471
   Entries with pointer(s) to both EMBL and PIR entries:  22181
   Entries with no pointer lines:                           671
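
   Statistics of this kind could be derived from the flat file along the
   following lines. This is only a sketch, not the procedure actually used to
   build the release: it assumes that each entry ends with a "//" line and
   that the database name is the first ';'-separated field of a DR line. The
   entries in SAMPLE are invented for illustration, and entries pointing only
   to other databases are counted here as having no EMBL or PIR pointer,
   which is our assumption.

```python
from collections import Counter

# Counts as reported in the table above.
REPORTED = {"only PIR": 4485, "only EMBL": 4471, "both": 22181, "none": 671}

# Hypothetical entries, reduced to ID and DR lines for illustration.
SAMPLE = """\
ID   TST1_HUMAN
DR   EMBL; X00001; HSTST1.
DR   PIR; A00001; A00001.
//
ID   TST2_ECOLI
DR   PIR; B00002; B00002.
//
ID   TST3_YEAST
DR   EMBL; X00003; SCTST3.
DR   PROSITE; PS00001; ASN_GLYCOSYLATION.
//
ID   TST4_MOUSE
//
"""


def classify_entry(lines):
    """Classify one entry by the databases named on its DR lines."""
    databases = set()
    for line in lines:
        if line.startswith("DR   "):
            # The database name is the first ';'-separated field.
            databases.add(line[5:].split(";")[0].strip())
    has_embl = "EMBL" in databases
    has_pir = "PIR" in databases
    if has_embl and has_pir:
        return "both"
    if has_embl:
        return "only EMBL"
    if has_pir:
        return "only PIR"
    return "none"


def count_sources(text):
    """Count the entries of a flat file by category of DR pointers."""
    counts = Counter()
    entry = []
    for line in text.splitlines():
        if line.startswith("//"):
            if entry:
                counts[classify_entry(entry)] += 1
            entry = []
        else:
            entry.append(line)
    return counts


print(count_sources(SAMPLE))
print(sum(REPORTED.values()))
```

   The four reported counts add up to 31808, the total number of entries in
   this release.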




      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 25


   2.1  Sequences and annotations

   About 1875 sequences have been added since release 25, the sequence data
   of 286 existing entries has been updated, and the annotations of 4100
   entries have been revised. In particular, we have used review articles
   to update the annotations of the following groups or families of
   proteins:

   -  6-hydroxy-D-nicotine oxidase and reticuline oxidase
   -  ABC-2 type transport system integral membrane proteins
   -  Acyl-CoA-binding protein
   -  Beta-eliminating lyases
   -  Calsequestrins
   -  CAP-Gly domain protein
   -  Carbamoyl-phosphate synthase
   -  Chaperonins clpA/B
   -  Chloroplast encoded proteins
   -  Cys/Met metabolism enzymes
   -  D-alanine--D-alanine ligases
   -  Dihydroxy-acid and 6-phosphogluconate dehydratases
   -  Galanin
   -  General secretion pathway (GSP) proteins
   -  Guanylate kinase family
   -  GTP cyclohydrolase I
   -  Hox complex proteins
   -  Indoleamine 2,3-dioxygenase
   -  MCM2/3/5 family
   -  Nickel-dependent hydrogenases b-type cytochrome subunit
   -  Ornithine/DAP/arginine decarboxylases family 2
   -  Osteopontin
   -  'POU' domain proteins
   -  Prephenate dehydratase
   -  Proteins containing cyclic nucleotide-binding domain(s)
   -  Proteasome A-type and B-type subunits
   -  Sodium:alanine symporters
   -  Sodium:galactoside symporters
   -  Sodium:neurotransmitter symporters


   2.2  Weekly updates of SWISS-PROT

   Since release 24, we have provided weekly updates of SWISS-PROT. These
   updates are available by anonymous FTP. Three files are updated every week:

   new_seq.dat    Contains all the new entries since the last full release.
   upd_seq.dat    Contains the entries for which the sequence data has been
                  updated since the last release.
   upd_ann.dat    Contains the  entries for  which one  or more  annotation
                  fields have been updated since the last release.

   Currently these  files are  available on  the  following  anonymous  ftp
   servers:

   Organization   EMBL ftp server
   Address        ftp.embl-heidelberg.de (or 192.54.41.33)
   Directory      /pub/databases/swissprot/new

   Organization   Basel Biozentrum Biocomputing server (EMBnet SWISS node)
   Address        bioftp.unibas.ch (or 131.152.8.1)
   Directory      /archive_data/brand_new/swissprot/updates

   Organization   ExPASy (Geneva University Expert Protein Analysis System)
   Address        expasy.hcuge.ch  (or 129.195.254.61)
   Directory      /databases/swiss-prot/updates

   Organization   National Center for Biotechnology Information (NCBI)
   Address        ncbi.nlm.nih.gov (or 130.14.20.1)
   Directory      /repository/swiss-prot/updates
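
   As an illustration, the three update files could be retrieved from any of
   these servers with a short script such as the sketch below, which uses the
   Python standard library module ftplib. The host names and directories are
   copied from the list above and may no longer be reachable; the names
   SERVERS, UPDATE_FILES and fetch_updates are ours and purely illustrative.

```python
from ftplib import FTP

# Host names and directories as listed above; label -> (host, directory).
SERVERS = {
    "EMBL": ("ftp.embl-heidelberg.de", "/pub/databases/swissprot/new"),
    "Basel": ("bioftp.unibas.ch", "/archive_data/brand_new/swissprot/updates"),
    "ExPASy": ("expasy.hcuge.ch", "/databases/swiss-prot/updates"),
    "NCBI": ("ncbi.nlm.nih.gov", "/repository/swiss-prot/updates"),
}

# The three files that are refreshed with each weekly update.
UPDATE_FILES = ("new_seq.dat", "upd_seq.dat", "upd_ann.dat")


def fetch_updates(host, directory, files=UPDATE_FILES, timeout=30):
    """Download the update files from one server into the current directory."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # called without arguments: anonymous login
        ftp.cwd(directory)
        for name in files:
            with open(name, "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)
```

   A call such as fetch_updates(*SERVERS["ExPASy"]) would then fetch all
   three files from the ExPASy server, provided that the server responds.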


   !!! Important notes !!!

   Although we try to follow a regular schedule, we do not promise to
   update these files every week. In some cases two weeks will elapse
   between two updates.

   Due to the current mechanism used to build a release, the entries that
   are provided in these updates are not guaranteed to be error free. Also,
   for the same reason, new entries do not contain an OC (Organism
   Classification) line.



                            3. ENZYME AND PROSITE

   3.1  The ENZYME data bank

        3.1.1  Statistics

   Release 13.0 of the ENZYME data bank is distributed with release 26 of
   SWISS-PROT. ENZYME release 13.0 contains information relating to 3489
   enzymes.


        3.1.2  Important change to the User's manual

   A form is now provided that can be used to fill in the information
   necessary for the Nomenclature Committee of the IUBMB to assign an EC
   number or to correct errors in existing entries. This form can be found
   at the end of the User's manual of the ENZYME data bank (Appendix 1).

   The NC-IUBMB  will regularly  send  us  updates  and  additions  to  the
   nomenclature so  that they  can be  integrated into  the data  bank in a
   timely manner.


        3.1.3  Citation

   A paper  has been published that briefly describes the ENZYME data bank.
   You can use it if you want to cite ENZYME in a publication:

      Bairoch A.
      The ENZYME data bank
      Nucleic Acids Res. 21:3155-3156(1993).


   3.2  The PROSITE data bank

   Release 10.2 of the PROSITE data bank is distributed with release 26 of
   SWISS-PROT. Release 10.2 contains 635 documentation chapters that
   describe 803 different patterns. Release 10.2 does not really represent
   a new release; the only changes between releases 10.0, 10.1 and 10.2 are
   updates to the pointers to the SWISS-PROT entries whose names have been
   modified between releases 25 and 26. The next release of PROSITE (11.0)
   will be distributed with release 27 of SWISS-PROT.

   Normally release 11 should have been distributed with release 26 of
   SWISS-PROT, but technical difficulties have delayed this new release,
   which will offer many new patterns.



                            4. WE NEED YOUR HELP!

   We welcome feedback from our users. We would especially appreciate it
   if you notified us when sequences belonging to your field of expertise
   are missing from the data bank. We would also like to be notified about
   annotations that need updating, for example if the function of a
   protein has been clarified or if new post-translational information
   has become available.


                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.68   Gln (Q) 4.03   Leu (L) 9.19   Ser (S) 7.06
   Arg (R) 5.25   Glu (E) 6.26   Lys (K) 5.80   Thr (T) 5.83
   Asn (N) 4.43   Gly (G) 7.08   Met (M) 2.35   Trp (W) 1.31
   Asp (D) 5.26   His (H) 2.25   Phe (F) 3.99   Tyr (Y) 3.22
   Cys (C) 1.79   Ile (I) 5.52   Pro (P) 5.04   Val (V) 6.53

   Asx (B) 0.005  Glx (Z) 0.005  Xaa (X) 0.02


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala,  Gly, Ser,  Val, Glu,  Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
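
   The ranking above follows directly from the percentages in A.1.1. A minimal
   sketch of the derivation is shown below; the dictionary COMPOSITION simply
   restates the table for the twenty standard amino acids and is named by us
   for illustration.

```python
# Percentages copied from table A.1.1 (B, Z and X are left out).
COMPOSITION = {
    "Ala": 7.68, "Arg": 5.25, "Asn": 4.43, "Asp": 5.26, "Cys": 1.79,
    "Gln": 4.03, "Glu": 6.26, "Gly": 7.08, "His": 2.25, "Ile": 5.52,
    "Leu": 9.19, "Lys": 5.80, "Met": 2.35, "Phe": 3.99, "Pro": 5.04,
    "Ser": 7.06, "Thr": 5.83, "Trp": 1.31, "Tyr": 3.22, "Val": 6.53,
}

# Sort the three-letter codes by decreasing frequency.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
```

   Sorting by decreasing percentage reproduces the order given in A.1.2, from
   Leu down to Trp.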



   A.2  Distribution of the sequences by their organism of origin

   Total number of species represented in this release of SWISS-PROT: 4052

        A.2.1  Table of the frequency of occurrence of species

        Species represented 1x: 1844
                            2x:  662
                            3x:  385
                            4x:  243
                            5x:  179
                            6x:  135
                            7x:   87
                            8x:   76
                            9x:   81
                           10x:   45
                       11- 20x:  147
                       21- 50x:   98
                       51-100x:   32
                         >100x:   38
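
        A table of this kind is a "frequency of frequencies". The sketch
        below shows one way to compute it, assuming that the species of
        every entry is available as a list; the function name
        occurrence_table and the toy input are hypothetical.

```python
from collections import Counter

# Species counts per class, copied from the table above (1x ... >100x).
REPORTED = [1844, 662, 385, 243, 179, 135, 87, 76, 81, 45, 147, 98, 32, 38]


def occurrence_table(species_per_entry):
    """Map a number of entries to the number of species having that many."""
    per_species = Counter(species_per_entry)  # species -> number of entries
    return Counter(per_species.values())      # number of entries -> species


# Toy input: one species name per entry (not real data).
toy = ["Human", "Human", "Mouse", "Rat", "Rat", "Pea"]
print(occurrence_table(toy))
print(sum(REPORTED))
```

        The reported classes add up to 4052, the total number of species
        given above.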













        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        2454          Human
         2        2222          Escherichia coli
         3        1439          Mouse
         4        1339          Rat
         5        1220          Baker's yeast (Saccharomyces cerevisiae)
         6         634          Bovine
         7         560          Fruit fly (Drosophila melanogaster)
         8         477          Chicken
         9         454          Bacillus subtilis
        10         362          African clawed frog (Xenopus laevis)
        11         340          Salmonella typhimurium
        12         333          Rabbit
        13         298          Pig
        14         251          Vaccinia virus (strain Copenhagen)
        15         222          Maize
        16         193          Human cytomegalovirus (strain AD169)
        17         177          Arabidopsis thaliana (Mouse-ear cress)
                   177          Rice
        19         176          Vaccinia virus (strain WR)
        20         167          Bacteriophage T4
        21         161          Pea
        22         159          Tobacco
                   159          Wheat
        24         151          Pseudomonas aeruginosa
        25         142          Caenorhabditis elegans
        26         141          Fission yeast (Schizosaccharomyces pombe)
        27         133          Barley
        28         129          Staphylococcus aureus
        29         127          Spinach
        30         125          Soybean
        31         123          Sheep
        32         122          Slime mold (Dictyostelium discoideum)
        33         119          Marchantia polymorpha (Liverwort)
        34         118          Rhodobacter capsulatus
        35         115          Dog
        36         113          Pseudomonas putida
        37         110          Neurospora crassa
                   110          Klebsiella pneumoniae


   A.3  Distribution of the sequences by size



               From   To  Number             From   To   Number
                  1-  50    1915             1001-1100      306
                 51- 100    3248             1101-1200      188
                101- 150    4568             1201-1300      151
                151- 200    3081             1301-1400       95
                201- 250    2653             1401-1500       80
                251- 300    2358             1501-1600       41
                301- 350    2165             1601-1700       43
                351- 400    2244             1701-1800       36
                401- 450    1657             1801-1900       40
                451- 500    1851             1901-2000       31
                501- 550    1247             2001-2100       12
                551- 600     871             2101-2200       37
                601- 650     614             2201-2300       47
                651- 700     445             2301-2400       16
                701- 750     437             2401-2500       17
                751- 800     351             >2500           94
                801- 850     256
                851- 900     270
                901- 950     166
                951-1000     177


   Currently the ten largest sequences are:


                            RYNR_RABIT  5037 a.a.
                            RYNR_HUMAN  5032 a.a.
                            APB_HUMAN   4563 a.a.
                            APOA_HUMAN  4548 a.a.
                            RRPA_CVMJH  4488 a.a.
                            DYHC_TRIGR  4466 a.a.
                            GRSB_BACBR  4451 a.a.
                            PLEC_RAT    4140 a.a.
                            POLG_BVDV   3988 a.a.
                            VGF1_IBVB   3951 a.a.
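
   The size classes used in the table above are 50 residues wide up to 1000
   residues and 100 residues wide from 1001 to 2500, with a single class for
   longer sequences. A minimal sketch of this binning follows; the function
   name size_bin is ours, and the lengths are those of the ten largest
   sequences listed above.

```python
from collections import Counter

# Lengths of the ten largest sequences, copied from the list above.
LARGEST = {
    "RYNR_RABIT": 5037, "RYNR_HUMAN": 5032, "APB_HUMAN": 4563,
    "APOA_HUMAN": 4548, "RRPA_CVMJH": 4488, "DYHC_TRIGR": 4466,
    "GRSB_BACBR": 4451, "PLEC_RAT": 4140, "POLG_BVDV": 3988,
    "VGF1_IBVB": 3951,
}


def size_bin(length: int) -> str:
    """Return the label of the size class that a sequence length falls in."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1
    return f"{low}-{low + width - 1}"


histogram = Counter(size_bin(n) for n in LARGEST.values())
print(histogram)
```

   All ten of these sequences fall in the >2500 class, which contains 94
   sequences in this release.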







                         APPENDIX B: ON-LINE EXPERTS


   B.1  List of on-line experts for PROSITE and SWISS-PROT


Field of expertise            Name               Email address
---------------------------   ------------------ ----------------------------
Alcohol dehydrogenases        Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt.persson@embl-
                                                 heidelberg.de
Aldehyde dehydrogenases       Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt.persson@embl-
                                                 heidelberg.de
Alpha-crystallins/HSP-20      Leunissen J.A.M.   jackl@caos.caos.kun.nl
                              de Jong W.         u629000@hnykun11.bitnet
Alpha-2-macroglobulins        Van Leuven F.      fred@blekul13.bitnet
AA-tRNA synthetases class II  Leberman R.        leberman@frembl51.bitnet
Apolipoproteins               Boguski M.S.       boguski@ncbi.nlm.nih.gov
AraC family HTH proteins      Ramos J.L.         jlramos@cnbvx3.cnb.uam.es
Arrestins                     Kolakowski L.F.Jr. kolakowski@helix.mgh.
                                                 harvard.edu
ATP synthase c subunit        Recipon H.         recipon@ncbi.nlm.nih.gov
Band 4.1 family proteins      Rees J.            jrees@vax.oxford.ac.uk
Beta-lactamases               Brannigan J.       jab5@vaxa.york.ac.uk
Beta-transducin family        Boguski M.S.       boguski@ncbi.nlm.nih.gov
C-type lectin domain          Drickamer K.       drick@cuhhca.hhmi.columbia.
                                                 edu
Chalcone/stilbene synthases   Schroeder J.       raf@sun1.ruf.uni-freiburg.de
Chaperonins cpn10/cpn60       Georgopoulos C.    georgopo@cmu.unige.ch
Chaperonins TCP1 family       Willison K.R.      willison@icr.ac.uk
Chitinases                    Henrissat B.       bernie@cermav.grenet.fr
Clusterin                     Peitsch M.C.       peitsch@ulbio1.unil.ch
Cold shock domain             Landsman D.        landsman@ncbi.nlm.nih.gov
CTF/NF-I                      Mermod N.          nmermod@ulys.unil.ch
                              Gronostajski R.    gronosr@ccsmtp.ccf.org
Cytochromes P450              Holsztynska E.J.   ela@netcom.uucp
                                                 netcom!ela@apple.com
DEAD-box helicases            Linder P.          linder@urz.unibas.ch
dnaJ family                   Kelley W.          kelley@cmu.unige.ch
EF-hand calcium-binding       Cox J.A.           cox@sc2a.unige.ch
                              Kretsinger R.H.    rhk5i@virginia.bitnet
Elongation factor 1           Amons R.           wmbamons@rulgl.leidenuniv.nl
Enoyl-CoA hydratase           Hofmann K.O.       khofmann@biomed.biolan.uni-
                                                 koeln.de
Fatty acid desaturases        Piffanelli P.      piffanelli@jii.afrc.ac.uk
fruR/lacI family HTH proteins Reizer J.          jreizer@ucsd.edu
GATA-type zinc-fingers        Boguski M.S.       boguski@ncbi.nlm.nih.gov
GDP/GTP dissociation stimul.  Boguski M.S.       boguski@ncbi.nlm.nih.gov
GltP family of transporters   Hofmann K.O.       khofmann@biomed.biolan.uni-
                                                 koeln.de
Glucanases                    Henrissat B.       bernie@cermav.grenet.fr
                              Beguin P.          phycel@pasteur.bitnet
Glutamine synthetase          Tateno Y.          ytateno@genes.nig.ac.jp
G-protein coupled receptors   Chollet A.         arc3029@ggr.co.uk
                              Attwood T.K.       bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins    Boguski M.S.       boguski@ncbi.nlm.nih.gov
HIT family                    Seraphin B.        seraphin@embl-heidelberg.de
HMG1/2 and HMG-14/17          Landsman D.        landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases    Kolakowski L.F.Jr. kolakowski@helix.mgh.
                                                 harvard.edu
Integrases                    Roy P.H.           2020000@saphir.ulaval.ca
Kringle domain                Ikeo K.            kikeo@genes.nig.ac.jp
Lipocalins                    Boguski M.S.       boguski@ncbi.nlm.nih.gov
                              Peitsch M.C.       peitsch@ulbio1.unil.ch
lysR family HTH proteins      Henikoff S.        henikoff@sparky.fhcrc.org
MAC components / perforin     Peitsch M.C.       peitsch@ulbio1.unil.ch
Malic enzymes                 Glynias M.         mglynias@ncsa.uiuc.edu
MAM domain                    Bork P.            bork@embl-heidelberg.de
MIP family proteins           Reizer J.          jreizer@ucsd.edu
Myelin proteolipid protein    Hofmann K.O.       khofmann@biomed.biolan.uni-
                                                 koeln.de
Pancreatic trypsin inhibitor  Ikeo K.            kikeo@genes.nig.ac.jp
PEP requiring enzymes         Reizer J.          jreizer@ucsd.edu
pfkB carbohydrate kinases     Reizer J.          jreizer@ucsd.edu
Phytochromes                  Partis M.D.        partis@gcri.afrc.ac.uk
Protein kinases               Hanks S.           hanks@vuctrvax.bitnet
                              Hunter T.          hunter@salk-sc2.sdsc.edu
PTS proteins                  Reizer J.          jreizer@ucsd.edu
Restriction-modification      Bickle T.          bickle@urz.unibas.ch
            enzymes           Roberts R.J.       roberts@neb.com
Ribosomal protein S15         Ellis S.R.         srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases    Harayama S.        sharayam@ddbj.nig.ac.jp
Signal sequence peptidases    von Heijne G.      gvh@csb.ki.se
                              Dalbey R.E.        rdalbey@magnus.acs.ohio-
                                                 state.edu
Sodium symporters             Reizer J.          jreizer@ucsd.edu
Subtilases                    Brannigan J.       jab5@vaxa.york.ac.uk
                              Siezen R.J.        nizo@caos.caos.kun.nl
Thiol proteases               Turk B.            turk@ijs.ac.mail.yu
Thiol proteases inhibitors    Turk B.            turk@ijs.ac.mail.yu
TNF family                    Jongeneel C.V.     vjongene@isrecmail.unil.ch
TPR repeats                   Boguski M.S.       boguski@ncbi.nlm.nih.gov
Transit peptides              von Heijne G.      gvh@csb.ki.se
Type-II membrane antigens     Levy S.            levy@cellbio.stanford.edu
Uracil-DNA glycosylase        Aasland R.         aasland@bio.uib.no
Vitamin K-depend. Gla domain  Price P.A.         pprice@ucsd.edu
XPGC protein                  Clarkson S.G.      clarkson@cmu.unige.ch
Xylose isomerase              Jenkins J.         jenkins@frira.afrc.ac.uk
WAP-type domain               Claverie J.-M.     jmc@ncbi.nlm.nih.gov
ZP domain                     Bork P.            bork@embl-heidelberg.de

African swine fever virus     Yanez R.J.         ryanez@cbm2.uam.es
Bacteriophage P4              Halling C.         chh9@midway.uchicago.edu
Chloroplast encoded proteins  Hallick R.B.       hallick@arizona.edu
Drosophila                    Ashburner M.       ma11@phx.cam.ac.uk
Escherichia coli              Rudd K.            rudd@ncbi.nlm.nih.gov
Salmonella typhimurium        Rudd K.            rudd@ncbi.nlm.nih.gov
Snakes                        Stocklin R.        stocklin@cmu.unige.ch


Requirements for becoming an on-line expert
===========================================

An expert should be a scientist working with a specific family (or families)
of proteins, or with specific domains, who would:

  a) Review the protein sequences in SWISS-PROT and the patterns/matrices
     in PROSITE relevant to their field of research.
  b) Agree to be contacted by people who have obtained new sequence(s)
     which seem to belong to "their" family(ies) of proteins.
  c) Have access to electronic mail and be willing to use it to send and
     receive data.

 If you are willing to be part of this scheme please contact Amos Bairoch
 at the following electronic mail address:

                       bairoch@cmu.unige.ch


           APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES

   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:

                                                       **********************
                        ***********************        * EPD [Euk. Promot.] *
                        *  EMBL Nucleotide    * <----> **********************
                        *  Sequence Data      *
******************      *  Library            *        **********************
* FLYBASE        * <--> *********************** <----- * ECD [E. coli map]  *
* [Drosophila    *                ^         ^          **********************
* genomic d.b.]  * <------+       |         |
******************        |       |         +--------- **********************
                          |       |                    * TFD [Trans. fact.] *
                          |       |         +--------> **********************
                          |       |         |  
******************        v       v         v          **********************
* REBASE         *      ***********************        * ENZYME [Nomencl.]  *
* [Restriction   * <--- *  SWISS-PROT         * <----- **********************
*  enzymes]      *      *  Protein Sequence   *            |
******************      *  Data Bank          *            v
                        ***********************        **********************
******************       ^  ^  |  |  ^   ^  |          * OMIM   [Diseases]  *
* EcoGene/EcoSeq *       |  |  |  |  |   |  +--------> **********************
* [E. coli]      * <-----+  |  |  |  |   |
******************          |  |  |  |   +-----------> **********************
                            |  |  |  |                 * ECO2DBASE     [2D] *
                            |  |  |  |                 **********************
******************          |  |  |  |
* PROSITE        * <--------+  |  |  +---------------> **********************
* [Patterns]     *             |  |                    * SWISS-2DPAGE  [2D] *
******************             |  +---------------+    **********************
             |                 v                  |                          
             |          ***********************   |    **********************
             +--------> * PDB [3D structures] *   +--> * Aarhus/Ghent  [2D] *
                        ***********************        **********************