ExPASy Home page |
Site Map | Search ExPASy | Contact us | Swiss-Prot | Proteomics tools |
HTML - BLAST native output format with hyperlinks and some formatting.
NiceBlast - View with full descriptions and organism sources.
Plain Text - Text format with no links.
Programs available on ExPASy

| Program | Description |
|---|---|
| blastp | Compares a protein query sequence against a protein sequence database. |
| tblastn | Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames. |
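
To make "translated in all reading frames" concrete, here is a minimal Python sketch of a six-frame translation. It assumes the standard genetic code and plain A/C/G/T input; the function names are illustrative and are not part of BLAST or ExPASy, and real tools handle ambiguity codes and alternative genetic codes that this sketch ignores.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCC").items():
        print(label, protein)
```

Stop codons appear as `*` in the output; a search program would typically treat them as breaks in the translated sequence.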
Programs available elsewhere

| Program | Description |
|---|---|
| blastn | Compares a nucleotide query sequence against a nucleotide sequence database. |
| blastx | Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. Available at EMBnet Switzerland. |
| tblastx | Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. |
| PSI-BLAST | Position-Specific Iterated BLAST detects weak homologs by building a profile from a multiple alignment of the highest-scoring hits in an initial BLAST search. Available at NCBI. |
| PHI-BLAST | Pattern-Hit Initiated BLAST combines matching of regular expressions with local alignments surrounding the match. |
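
The choice among the five basic programs described above depends only on the type of the query, the type of the database, and whether nucleotide sequences are compared as translations. The following Python sketch encodes that decision; the function name and its parameters are hypothetical helpers for illustration, and PSI-BLAST and PHI-BLAST are deliberately left out because they are search strategies rather than query/database pairings.

```python
def choose_blast_program(query: str, database: str, translated: bool = False) -> str:
    """Pick a basic BLAST program name.

    query, database: either "protein" or "nucleotide".
    translated: only relevant for nucleotide vs. nucleotide searches;
        True selects the six-frame translated comparison (tblastx).
    """
    pair = (query, database)
    if pair == ("protein", "protein"):
        return "blastp"
    if pair == ("protein", "nucleotide"):
        return "tblastn"  # database is translated in all reading frames
    if pair == ("nucleotide", "protein"):
        return "blastx"   # query is translated in all reading frames
    if pair == ("nucleotide", "nucleotide"):
        return "tblastx" if translated else "blastn"
    raise ValueError(f"unsupported combination: {pair!r}")


if __name__ == "__main__":
    print(choose_blast_program("protein", "nucleotide"))
```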
UniProt Knowledgebase (UniProtKB)

UniProt (Universal Protein Resource) is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR. The UniProt Knowledgebase consists of two sections: Swiss-Prot, containing manually annotated records with information extracted from literature and curator-evaluated computational analysis, and TrEMBL, a section with computationally analyzed records that await full manual annotation. The database is updated biweekly and includes splice variants.
Since UniProtKB contains a huge number of sequences, it may be useful to restrict the search using the following criteria:

| Database | Description |
|---|---|
| UniRef100, UniRef90 and UniRef50 | The UniProt Non-redundant Reference (UniRef) databases combine closely related sequences into a single record to speed searches. The UniRef100 database combines identical sequences and sub-fragments of the UniProt Knowledgebase (from any species) into a single UniRef entry, displaying the sequence of a representative protein, the accession numbers of all the merged UniProt entries, and links to the corresponding UniProt and UniParc records. UniRef90 and UniRef50 are built by clustering UniRef100 sequences with 11 or more residues such that each cluster is composed of sequences that have at least 90% or 50% sequence identity, respectively, to the representative sequence. UniRef90 and UniRef50 yield a database size reduction of approximately 40% and 65%, respectively, providing for significantly faster sequence searches. |
| PDB | Protein Data Bank for protein 3D structures. Sequences extracted from the PDB SEQRES lines are processed into a non-redundant set where identical sequences are merged into a single record. |
| Translated EST | Protein sequences derived from EST sequencing data (human, mouse, rat, zebrafish, drosophila, bovine, arabidopsis). This database contains many potential errors because of the low quality of the data. |
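
The idea of merging identical sequences and sub-fragments into a single record with one representative can be illustrated with a small Python sketch. This is only a toy model under simple assumptions (a fragment is any exact substring of a longer sequence, and the longest sequence becomes the representative); it is not the procedure UniProt uses to build UniRef100, and the accessions and function name below are made up for the example.

```python
def merge_identical_and_fragments(entries: dict) -> list:
    """Group sequences so that identical sequences and exact sub-fragments
    share one record.

    entries: mapping of accession -> sequence (hypothetical input format).
    Returns a list of records with a representative accession, its
    sequence, and the accessions of all merged entries.
    """
    records = []
    # Visit longer sequences first so fragments always meet their parent.
    ordered = sorted(entries.items(), key=lambda item: (-len(item[1]), item[0]))
    for accession, sequence in ordered:
        for record in records:
            if sequence in record["sequence"]:  # identical or sub-fragment
                record["members"].append(accession)
                break
        else:
            records.append(
                {"representative": accession,
                 "sequence": sequence,
                 "members": [accession]}
            )
    return records


if __name__ == "__main__":
    toy = {
        "A1": "MKTAYIAKQR",
        "A2": "MKTAYIAKQR",  # identical to A1
        "A3": "TAYIA",       # sub-fragment of A1
        "B1": "MSSHEG",      # unrelated
    }
    for record in merge_identical_and_fragments(toy):
        print(record["representative"], record["members"])
```

The nested loop is quadratic in the number of entries, which is fine for illustration but would not scale to a database of the size discussed here.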
All databases are subdivided into taxonomic sections, selectable from the Taxonomic groups drop-down list.
| Database | Description |
|---|---|
| All EMBL + GSS | All entries from the EMBL database (equivalent to GenBank and DDBJ). |
| HTG | Unverified data from high-throughput genomic sequencing. Usually in the form of cosmids. |
| dbEST | Expressed sequence tag database from the NCBI. |
| EST contigs | Database of contigs based on EST clusters from Unigene (human, mouse, rat, bovine, zebrafish) and SwissClusters (Drosophila melanogaster, Arabidopsis thaliana). |
| Unigene EST | Database of EST clusters (lists of ESTs known to match the same cDNA) from the NCBI (updated occasionally). This database also contains useful information such as STS matches, tissue distribution, and transcript maps. |
| Complete genomes | Genomes released in the form of a complete, assembled sequence. |
| Select a microbial genome | One of the genomes released in the form of a complete, assembled sequence. |
| Query length | Substitution matrix |
|---|---|
| <35 | PAM-30 |
| 35-50 | PAM-70 |
| 50-85 | BLOSUM-80 |
| >85 | BLOSUM-62 |
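
The table above can be read as a lookup from query length to substitution matrix. The Python sketch below encodes it. Because the ranges in the table share their endpoints, the sketch assumes that a length of exactly 50 falls in the 35-50 row and a length of exactly 85 in the 50-85 row; that boundary choice and the function name are assumptions made for illustration.

```python
def suggest_matrix(query_length: int) -> str:
    """Return the substitution matrix listed for a given query length.

    Boundary handling is an assumption: 35..50 -> PAM-70, 51..85 -> BLOSUM-80.
    """
    if query_length < 1:
        raise ValueError("query length must be a positive integer")
    if query_length < 35:
        return "PAM-30"
    if query_length <= 50:
        return "PAM-70"
    if query_length <= 85:
        return "BLOSUM-80"
    return "BLOSUM-62"


if __name__ == "__main__":
    for length in (20, 40, 70, 300):
        print(length, suggest_matrix(length))
```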
The expectation value (E) threshold is a statistical measure of the number of matches expected to occur by chance when searching a random database of the same size. The lower the E-value, the more likely the match is to be significant. Matches with E-values between 0.1 and 10 are generally dubious, and matches with E-values over 10 are unlikely to have biological significance. In all cases, matches need to be verified manually. You may need to increase the E threshold in the following cases:
BLAST Frequently Asked Questions at NCBI (includes error messages)
The Statistics of Sequence Similarity Scores by Altschul