About

In the post-genomic era, the sheer volume of new genotypic and phenotypic information makes it extremely difficult for researchers to keep up to date, yet understanding the relations between genotypic and phenotypic information is crucial to biological research. This creates a pressing need for user-friendly interfaces that visualize many dimensions of genotypic and phenotypic information across multiple databases. Our goal is to develop a method that satisfies the basic requirements for visualizing multidimensional biological databases; we currently focus on genotypic-phenotypic relations obtained from bibliographic and biological databases.

We developed a novel, flexible and generalizable visualization tool called PGviewer, and used it to display gene-phenotype relations extracted from articles by a Natural Language Processing (NLP) tool called BioMedLEE and from the human-curated database OMIM. Data obtained from multiple databases were first integrated into a uniform structure and then manipulated by PGviewer. PGviewer provides a query interface that allows dynamic selection and ordering of any desired dimension in the databases. Query results are displayed as expandable trees whose levels follow the dimensions and ordering that users choose according to their research interests. We believe that this method, which lets users dynamically organize and visualize multiple dimensions, is a promising tool that should substantially facilitate biological research.
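The idea of ordering dimensions and browsing the result as a tree can be illustrated with a minimal Python sketch. It is not PGviewer's implementation: the function names `build_tree` and `render`, the record layout, and the placeholder values are all assumptions made for illustration.

```python
def build_tree(records, dimensions):
    """Group flat records into nested dicts, one tree level per dimension.

    `records` is a list of dicts sharing the same keys (hypothetical layout);
    `dimensions` is the user-chosen ordering of keys, outermost level first.
    Leaves are the lists of records that share every selected value.
    """
    if not dimensions:
        return records
    first, rest = dimensions[0], dimensions[1:]
    groups = {}
    for record in records:
        groups.setdefault(record[first], []).append(record)
    return {key: build_tree(group, rest) for key, group in sorted(groups.items())}


def render(tree, depth=0):
    """Return the tree's node labels as indented lines (leaves are not printed)."""
    lines = []
    if isinstance(tree, dict):
        for key, subtree in tree.items():
            lines.append("  " * depth + str(key))
            lines.extend(render(subtree, depth + 1))
    return lines


# Placeholder records, not real OMIM or MGI data.
records = [
    {"gene": "GENE_A", "disorder": "disorder_1", "year": 2001},
    {"gene": "GENE_A", "disorder": "disorder_2", "year": 2003},
    {"gene": "GENE_B", "disorder": "disorder_1", "year": 2003},
]

# Changing the order of dimensions yields a different view of the same records.
by_disorder = build_tree(records, ["disorder", "gene"])
by_gene = build_tree(records, ["gene", "disorder"])
print("\n".join(render(by_disorder)))
```

Because the tree is rebuilt from the flat records for each query, reordering or dropping a dimension requires no change to the stored data, only a different `dimensions` list.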

PGviewer version 5: A work in progress

Human genomics dataset

The first dataset shows gene-phenotype relations collected in OMIM, which were obtained by manual curation. The human genomics dataset was obtained from the entire OMIM Gene Map table downloaded from the OMIM website, which, as of May 15, 2004, contained 9,042 entries of gene-disorder relations. From this table we extracted the gene name, gene location, disorder and OMIM number. We also obtained the bibliographic information for each OMIM entry using a script that reads the OMIM website. To help elucidate the molecular mechanisms of human hereditary diseases, we added GO terms for each OMIM entry via LocusLink (Maglott, Katz et al. 2000). The files we used are mim2loc and loc2go, downloaded from the OMIM website. The human genomics dataset has nine dimensions: 1) OMIM_ID (including OMIM title), 2) gene location, 3) gene, 4) GO_term, 5) disorder, 6) PubMed_ID (including article titles), 7) year, 8) journal and 9) authors.
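Adding GO terms "via LocusLink" amounts to a two-step lookup: from OMIM number to LocusLink ID, then from LocusLink ID to GO term. The sketch below shows that join in Python under stated assumptions: it treats mim2loc and loc2go as plain lists of (key, value) pairs, which may not match the actual file layouts, and the function names and identifiers are hypothetical placeholders.

```python
from collections import defaultdict


def index_pairs(pairs):
    """Turn (key, value) pairs into a dict mapping each key to a set of values."""
    index = defaultdict(set)
    for key, value in pairs:
        index[key].add(value)
    return index


def go_terms_for_omim(mim_to_locus_pairs, locus_to_go_pairs):
    """Join OMIM numbers to GO terms through the shared LocusLink ID.

    Both arguments are assumed to be iterables of two-element tuples;
    parsing the real mim2loc and loc2go files is left out of this sketch.
    """
    mim_to_locus = index_pairs(mim_to_locus_pairs)
    locus_to_go = index_pairs(locus_to_go_pairs)
    result = {}
    for mim_id, locus_ids in mim_to_locus.items():
        terms = set()
        for locus_id in locus_ids:
            # A locus with no GO annotation contributes nothing.
            terms |= locus_to_go.get(locus_id, set())
        result[mim_id] = sorted(terms)
    return result


# Placeholder identifiers, not real OMIM, LocusLink or GO entries.
mim2loc = [("MIM_1", "LOC_10"), ("MIM_1", "LOC_11"), ("MIM_2", "LOC_12")]
loc2go = [("LOC_10", "GO_a"), ("LOC_11", "GO_a"), ("LOC_11", "GO_b")]
annotated = go_terms_for_omim(mim2loc, loc2go)
```

An OMIM entry whose loci carry no GO annotation keeps an empty list, so it still appears in the dataset under its other dimensions.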

You see this message because your browser does not support Java applets. (Email yves.lussier@dbmi.columbia.edu for a username and password.)

Mouse genomics dataset

The second dataset shows gene-phenotype relations extracted from a subset of MGI using an automatic NLP program called BioMedLEE. The mouse genomics dataset comes from three databases: 1) a subset of MEDLINE citations related to tumorigenesis in MGI, 2) gene-phenotype relations extracted from these articles using BioMedLEE, with phenotypes encoded using the Unified Medical Language System (UMLS) (Lindberg 1990), and 3) a UMLS-GO mapping database (Sarkar, Cantor et al. 2003), which maps UMLS terms to GO terms.

People

PGviewer was developed by Ying Tao under the guidance of Dr. Carol Friedman and Dr. Yves A. Lussier. Lyudmila Shagina, Hua Xu and Jianrong Li also contributed to this project. The authors thank Judith A. Blake, Janan T. Eppig and Joanna Amberger for their assistance in understanding the MGI and OMIM genomics databases.

Grants

This study is partially supported by National Institute of Allergy and Infectious Diseases grant #1U54 AI 57159-01, by National Library of Medicine grants #R01 LM007659-01 and #1K22 LM008308-01, and by NYSTAR grant #5-67674.
