Science Functional Genomics Resources
Featured Sites

Site of the Month

DNA Patent Database
For patented methods ranging from the construction of DNA cloning vectors to the synthesis of amino acids, explore the DNA Patent Database. This joint project of Georgetown University's Kennedy Institute of Ethics and the Foundation for Genetic Medicine allows free public access to the full text and analysis of all DNA patents issued by the United States Patent and Trademark Office. You can search for patents using keywords or browse the collection by name and title. An accompanying user's guide describes the database search feature and even includes a step-by-step strategy for finding your patent of interest.




Gateways to General Resources




Nine Great General Biology, Biochemistry, and Bioinformatics Sites



Selected Research Centers

Brutlag Bioinformatics Group
The Brutlag Bioinformatics Group is a Stanford-based organization with a focus on predicting protein structure and function from primary sequence. Products developed by the group (EMOTIF, EMATRIX, and 3MOTIF) are used in assigning functions to unidentified genomic sequences. Other software products developed include LOCK and 3DSEARCH, which are used for comparing protein structures and searching structural databases.
Munich Information Center for Protein Sequences (MIPS)
MIPS is a bioinformatics center at the Max Planck Institute involved in numerous bioinformatics and sequencing projects. The center contributes to the International Protein Sequence Databank, the European Yeast Functional Analysis Program, the European Molecular Biology Network (EMBNET), PEDANT (proteomics tool for genomic analysis), and ProtFam (protein family database). Genome sequencing projects include Arabidopsis thaliana and Neurospora crassa.
National Center for Biotechnology Information (NCBI)
The NCBI is a bioinformatics-based center of the NIH. Its major efforts are creation of public databases, research in computational biology, software for analyzing genomic data, and dissemination of biomedical information. Through Entrez, the NCBI provides access to GenBank, protein sequences, completed genomes, structure databases, evolutionarily related sequences, taxonomy, and the Online Mendelian Inheritance in Man database of human genes involved in disease.
The Institute for Genomic Research (TIGR)
TIGR is a not-for-profit research institute that has had a significant role in revealing structural, functional, and comparative features of genomes and gene products in dozens of organisms, including humans. The TIGR site provides access to numerous genome sequences, software tools, and microarray information.

Links to other major research centers




Sequence Data

DNA Data Bank of Japan (DDBJ)
The lesser known of the world's public DNA sequence databases, DDBJ is the sole DNA databank in Japan certified to serve as an official repository of DNA information. While the DDBJ serves to collect information from Japanese sources, it also accepts data from other international contributors. DDBJ shares information with both GenBank and EMBL on a daily basis so that the same data can be viewed on any of the services virtually simultaneously.
NCBI-UniGene
NCBI's UniGene system is designed to partition GenBank sequences into a non-redundant set of gene-oriented clusters. The database uses both established genes and novel expressed sequence tag (EST) sequences in defining clusters. Clusters contain unique gene sequences, the tissue types in which the gene has been expressed, and their map locations.
Protein Information Resource (PIR)
The PIR, hosted by Georgetown University, is one of several international databases of protein amino acid sequence information. In collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), PIR collects, publishes, and distributes information about protein sequences, alignments, and groups of proteins to aid understanding of molecular evolution and protein function. The PIR repository also provides supplementary sequence and annotation databases, including NRL 3D (from Brookhaven crystallographic data), PIR-ALN (protein sequence alignments), and RESID (a database of covalent protein modifications).

Other sites focused on nucleotide, genome, or protein sequence data




Expression Data

BodyMap
BodyMap offers a database of the expression of human and mouse genes, using data generated by random sequencing of cDNAs from carefully isolated tissues. Visitors to the site can select a tissue and get a list of the frequency of occurrence of each mRNA in it.
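
The frequency listing described above can be illustrated with a minimal sketch. It assumes each randomly sequenced cDNA clone from a tissue has already been assigned to a gene. The function name and the clone assignments below are hypothetical illustrations, not BodyMap's own code or data.

```python
from collections import Counter


def transcript_frequencies(clone_assignments):
    """Summarize how often each gene appears among randomly sequenced clones.

    clone_assignments: list of gene labels, one per sequenced cDNA clone
    (a hypothetical input format chosen for this sketch).
    Returns (gene, clone_count, fraction_of_all_clones) tuples,
    most abundant first.
    """
    counts = Counter(clone_assignments)
    total = sum(counts.values())
    return [(gene, n, n / total) for gene, n in counts.most_common()]


# Made-up assignments for five clones from a single tissue library.
liver_clones = ["ALB", "ALB", "APOA1", "ALB", "TF"]
for gene, n, fraction in transcript_frequencies(liver_clones):
    print(f"{gene}\t{n}\t{fraction:.2f}")
```

Because the clones are picked at random, the fraction of clones assigned to a gene serves as a rough estimate of that mRNA's relative abundance in the tissue.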
MIRAGE (Molecular Informatics Resource for the Analysis of Gene Expression)
A self-described "experimental Web resource" at the Institute For Transcriptional Informatics (IFTI), MIRAGE provides several tools for studying gene expression. They include downloadable software products, an impressive (albeit somewhat disorganized) collection of links, and access to the object-oriented Transcription Factor Database (ooTFD). A particularly strong point of the site is its emphasis on transcription factor images.
MRC HGU Mouse Atlas and Gene Expression Home Page
The Mouse Atlas and Gene Expression site takes a different approach to describing gene expression: 3D models. These are derived from expression measurements mapped onto the anatomy of the developing mouse embryo. Though still in the construction phase, the site has the potential to expand the dimensions of understanding of gene expression in the mouse.
Stanford Microarray Database
Microarray technology provides the most comprehensive methodology for studying genome-wide gene expression. The Stanford Microarray Database not only provides access to extensive array data in yeast but is also a good source of information about microarray use in general.

Other gene expression and microarray sites




Protein Structure

3DB Browser
Provides an excellent mechanism for accessing Protein Data Bank (PDB) structures. The user interface consists of a search engine with many user-specified criteria, such as keyword, ID, text, or author. Additional user-specified restrictions include the method of structure determination (X-ray, NMR, or theoretical), resolution, organism, deposit date, chain size, and data source. FASTA searches are also available.
CATH Protein Classification
The CATH database provides a view of protein structures in the Brookhaven crystallographic database organized in a hierarchical domain-based fashion. The hierarchy consists of sortings according to class (secondary structure composition), architecture (overall domain structure), topology (fold families), homologous superfamilies (protein domains with common ancestors), and sequence families (sequence identity relationships).
Database of Macromolecular Movements
What PDB is to protein structure and IMB-JENA is to macromolecular visualization, the Database of Macromolecular Movements is to macromolecular motion. Visitors to the Database of Macromolecular Movements can watch complex molecular motions (primarily in proteins), create movies with the online Morph tool, download software for plotting molecular geometries, and read informative papers on the subject.
PRINTS
As stated at the site, PRINTS is a database of protein fingerprints. Fingerprints comprise groups of conserved motifs in proteins that can be used to characterize a family. Several homology-based mechanisms of searching the database are provided, including BLAST, InterPro, and SPRINT (search a PRINTS relational database), in addition to simple retrievals via accession number, text, title, sequence, or author.
SCOP (Structural Classification of Proteins)
The most broadly encompassing and widely used database of protein structure classes, SCOP relies on structural and evolutionary knowledge to identify structural domains. Numerous links and search options are available, including hierarchical searches by class, by entry of a PDB identifier, by sequences corresponding to SCOP domains, by superfamily searches, and others. A site used in virtually every protein structure analysis.

Other sites on protein structure and function




Proteomics

BCM Search Launcher
Gateway to an extensive set of analysis tools for DNA and protein sequences. The centralized search launcher provides access to standard sequence searches (multiple BLAST and FASTA algorithms), pattern searches (Prosite, Blocks, COGs), sequence alignments (CLUSTALW, CAP, PIMA), gene feature identification (exon/intron boundaries, promoter/transcription factor binding, open-reading-frame identification), secondary structure prediction (coiled-coils, transmembrane, hydrophilicity/hydrophobicity), and miscellaneous sequence utilities (FASTA format conversion, restriction cutters, etc.). A great one-stop shop.
EMBL DALI
The DALI server provides email- or Web-based querying of an unknown protein structure against the Protein Data Bank structures in 3D. Users submit the coordinates of an unknown and receive back multiple alignments of similar 3D homologs.
InterPro
InterPro attempts to unify information stored in structure databases, such as PROSITE, PRINTS, Pfam, and ProDom, into a format that can be accessed easily from a single site. Strong points include a simple interface, excellent information compilations from many sources, and very helpful hyperlinking. InterPro can be either downloaded by FTP or accessed via the Web at the address above.
NCBI HomoloGene
HomoloGene takes a different approach to identifying orthologs (genes in different species with a common origin). Human, mouse, rat, and zebrafish genes in the UniGene and LocusLink databases were compared for nucleotide sequence similarity. Putative orthologs were then identified as the pairs of UniGene clusters whose sequences are each other's best match. Datasets from HomoloGene can be downloaded by FTP.
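
The "each other's best match" criterion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not HomoloGene's actual pipeline: similarity scores are assumed to be precomputed and supplied as dictionaries keyed by (query, hit) pairs, and the cluster identifiers and scores are invented for the example.

```python
def best_matches(scores):
    """Map each query to its highest-scoring hit.

    scores: dict of (query, hit) -> similarity score
    (a hypothetical input format chosen for this sketch).
    On a tie, the hit encountered first is kept.
    """
    best = {}
    for (query, hit), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (hit, score)
    return {query: hit for query, (hit, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which a's best hit is b and b's best hit is a."""
    best_ab = best_matches(a_vs_b)
    best_ba = best_matches(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented scores between human (Hs) and mouse (Mm) clusters.
human_vs_mouse = {
    ("Hs.1", "Mm.1"): 95.0,
    ("Hs.1", "Mm.2"): 80.0,
    ("Hs.2", "Mm.2"): 70.0,
    ("Hs.3", "Mm.2"): 88.0,
}
mouse_vs_human = {
    ("Mm.1", "Hs.1"): 95.0,
    ("Mm.2", "Hs.3"): 88.0,
    ("Mm.2", "Hs.2"): 70.0,
}
print(reciprocal_best_matches(human_vs_mouse, mouse_vs_human))
```

In this toy data, Hs.2's best hit is Mm.2, but Mm.2's best hit is Hs.3, so Hs.2 is left without a putative ortholog. Requiring the match in both directions is what filters out such one-sided similarities.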
PredictProtein
PredictProtein provides a portal through which users can submit unknown sequences for comparison against common protein databases (EMBL, SWISS-PROT, TREMBL, and PDB). Similarity matching methods provided by PredictProtein include BLAST, COILS, ProSite, ProDom, and others. A one-stop shop for working with unknown protein sequences.
Protein Topology (TOPS)
The TOPS language of describing protein structure provides the underpinnings of this site, which allows visitors to search structure databases for structural domains. In addition, users can supply a protein structure/domain and have the TOPS site compare it against all domains in the TOPS Atlas or against the entire PDB set of domains (over 24,000).

Links to other proteomics resources




Medical Resources

GeneCards (human genes, maps, proteins and diseases)
One of the most comprehensive sources of information about genes involved in disease, GeneCards is an elegantly designed site at the Weizmann Institute. The database is accessed by a simple text-based search engine on the opening page or by specification of chromosome region or marker name. Cards of information retrieved by the program contain text with hyperlinks to dozens of other Web analyses for protein structure, relation to disease, sequence, alignments, homologues, and references.
Human Chromosome Launchpad
Designed to provide an easy-to-use interface to human genetic information, the opening page of the Human Chromosome Launchpad presents hyperlinks for each numbered chromosome. Clicking on one of the links brings up an extensive set of hyperlinked information about sequences in that particular chromosome. While the approach is rather broad-based for gene information, the site functions well at the chromosomal level due to its simple design.

Other resources on genomics and medicine


 

Page content: ahernk{at}ucs.orst.edu, Oregon State University/DaVinci Press, Inc.

 


Database QuickSearch

Integrated databases

Sequence Retrieval System (EBI/LION Bioscience)
Entrez (NCBI)

Sequences

GenBank nucleotides
GenBank genomes

Sequence homology

NCBI BLAST server (using NCBI-BLAST 2.1):
Basic search
Advanced search
Blast two nucleotide or protein sequences
BLAST overview
BLAST tutorial

EMBL BLAST servers (using WU-BLAST 2.0):
EBI BLAST
Advanced BLAST search at EMBL
Ensembl BLASTView

Gene expression

NCBI:
Gene Expression Omnibus (GEO)
Serial Analysis of Gene Expression (SAGE)
UniGene

Other sources:
Stanford Microarray Database
ChipDB (Whitehead Institute)
Bodymap (human and mouse expression data)

Protein sequence/proteomics

ExPASy Servers:
Sequence Retrieval System (SRS) (gateway to ExPASy databanks)
Swiss-Prot/TrEMBL (protein sequences)
Prosite (protein families and domains)
Swiss-2D PAGE (gel electrophoresis database)
ENZYME (enzyme nomenclature)

MIPS:
ATLAS Gateway (multiple protein sequence databases)
MIPS Sequence Retrieval System (SRS)
ProtFam (protein families and superfamilies)

Other sources:
NCBI protein search
Proteome Databases (Proteome, Inc.)

Protein structure

RCSB Protein Data Bank
3-D protein structure at NCBI
Swiss-Model (automated comparative protein structure modeling from ExPASy)
Swiss-3DImage (images of 3D protein structures from ExPASy)
Other protein structure servers

Other databases

Online Mendelian Inheritance in Man (NCBI/Johns Hopkins)
Online Mendelian Inheritance in Animals (University of Sydney)




AutoAlert Services

Cite-Track
E-mail updates of new content in Science and other journals (Science Online)

PubCrawler
Scans daily for updates to NCBI's Medline and GenBank databases (Trinity College, Dublin)

Swiss-Shop
E-mail reports of new SWISS-PROT protein sequence entries relevant to specific fields of interest (ExPASy)

Automated Entrez Queries
User-customized daily, weekly, or monthly queries of various NCBI databases (Arizona Research Laboratories)

Sequence Alerting System
E-mail updates on homologues of user-specified DNA and protein sequences (EMBL)

Journal Abstracts Delivered Electronically (JADE)
Weekly search of new Medline entries (NCEMI)

BioMail
New references from Medline to users' e-mail accounts, based on user-customized criteria (SUNY, Stony Brook)




Seen anything interesting on the Web in genomics or post-genomics lately? Send us the URL!




Copyright © 2004 by The American Association for the Advancement of Science. All rights reserved.