About
During the post-genomic era, the
large volume of new genotypic and phenotypic information makes it
extremely difficult for researchers to keep up to date. However,
understanding relations among genotypic and phenotypic information is
crucial to biological research. This creates a pressing need for
visualizing a large number of dimensions of genotypic and phenotypic
information across multiple databases using user-friendly
interfaces. Our goal is to develop a method that satisfies the basic
requirements for visualizing multidimensional biological databases; we
currently focus on genotype-phenotype relations obtained from
bibliographic and biological databases. We developed a novel, flexible,
and generalizable visualization tool called PGviewer and used it to
display gene-phenotype relations from two sources: relations extracted
from articles by BioMedLEE, a Natural Language Processing (NLP) tool, and
relations from the human-curated database OMIM. Data obtained from
multiple databases were first integrated into a uniform structure and
then manipulated by PGviewer. PGviewer provides a query interface that
allows users to dynamically select and order any dimension in the
databases. Query results are visualized as expandable trees that present
the views users specify according to their research interests. We
believe that this method, which lets users dynamically organize and
visualize multiple dimensions, could substantially facilitate biological
research.
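The idea of letting users select and order dimensions, then browse the results as an expandable tree, can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not PGviewer's implementation: each row of the integrated "uniform structure" is assumed to be a flat mapping from dimension name to value, and the names `build_tree`, `render`, and `example_records` (placeholder values, not real OMIM or MGI data) are hypothetical.

```python
from typing import Dict, List, Sequence

# ASSUMPTION: one integrated row = dimension name -> value.
# Dimension names here are illustrative, not PGviewer's actual schema.
Record = Dict[str, str]


def build_tree(records: Sequence[Record], dimensions: Sequence[str]) -> dict:
    """Nest records level by level in the order the user chose the dimensions.

    Inner nodes are dicts keyed by dimension values; each leaf stores how
    many records share that full path.
    """
    if not dimensions:
        raise ValueError("select at least one dimension")
    tree: dict = {}
    for rec in records:
        node = tree
        # Descend one level per selected dimension, except the last one.
        for dim in dimensions[:-1]:
            node = node.setdefault(rec[dim], {})
        leaf = rec[dimensions[-1]]
        node[leaf] = node.get(leaf, 0) + 1
    return tree


def render(tree: dict, depth: int = 0) -> List[str]:
    """Flatten the tree into indented text lines, a stand-in for an
    expandable tree widget: '+' marks an expandable node, '-' a leaf."""
    lines: List[str] = []
    for key in sorted(tree):
        child = tree[key]
        indent = "  " * depth
        if isinstance(child, dict):
            lines.append(f"{indent}+ {key}")
            lines.extend(render(child, depth + 1))
        else:
            lines.append(f"{indent}- {key} ({child})")
    return lines


# Hypothetical placeholder records, not real data.
example_records = [
    {"gene": "GENE_A", "disorder": "DISORDER_X", "GO_term": "GO_TERM_1"},
    {"gene": "GENE_A", "disorder": "DISORDER_Y", "GO_term": "GO_TERM_1"},
    {"gene": "GENE_B", "disorder": "DISORDER_X", "GO_term": "GO_TERM_2"},
]

if __name__ == "__main__":
    # Same data, two user-chosen orderings of the dimensions.
    print("\n".join(render(build_tree(example_records, ["disorder", "gene"]))))
    print("\n".join(render(build_tree(example_records, ["gene", "disorder"]))))
```

Reordering the dimension list, for example putting `gene` before `disorder`, regroups the same records without touching the underlying data, which is what makes user-defined views cheap in a design like this.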
PGviewer version 5: A work in progress
Human genomics dataset
The first dataset contains gene-phenotype relations collected in OMIM
through manual curation. It was built from the entire OMIM Gene Map table
downloaded from the OMIM website, which, as of May 15, 2004, contained
9,042 gene-disorder entries. From this table we extracted the gene name,
gene location, disorder, and OMIM number. We also retrieved the
bibliographic information for each OMIM entry using a script that read
the OMIM website.
To shed light on the molecular mechanisms of human hereditary diseases,
we added Gene Ontology (GO) terms to each OMIM entry via LocusLink
(Maglott, Katz et al. 2000), using the mim2loc and loc2go files
downloaded from the OMIM website. The human genomics dataset has nine
dimensions: 1) OMIM_ID (including OMIM title), 2) gene location, 3) gene,
4) GO_term, 5) disorder, 6) PubMed_ID (including article titles), 7)
year, 8) journal, and 9) authors.
(Email yves.lussier@dbmi.columbia.edu for a username and password.)
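The GO enrichment step for the human dataset amounts to composing two lookup tables: OMIM number to LocusLink ID (mim2loc) and LocusLink ID to GO term (loc2go). The Python sketch below shows that join under stated assumptions. The file layout (source and target IDs in the first two tab-separated columns) is our assumption and may not match the real files; `load_pairs` and `omim_to_go` are hypothetical names, and the identifiers in the usage lines are placeholders rather than real records.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_pairs(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Read a tab-separated mapping into key -> set of values.

    ASSUMPTION: the source ID is in column 1 and the target ID in column 2;
    extra columns are ignored, and blank or '#'-prefixed lines are skipped.
    The real mim2loc/loc2go layouts may differ.
    """
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        mapping[row[0].strip()].add(row[1].strip())
    return dict(mapping)


def omim_to_go(mim2loc: Dict[str, Set[str]],
               loc2go: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Compose OMIM -> LocusLink and LocusLink -> GO into OMIM -> GO.

    OMIM entries whose loci have no GO annotation are left out.
    """
    result: Dict[str, Set[str]] = {}
    for mim, locus_ids in mim2loc.items():
        go_terms: Set[str] = set()
        for locus in locus_ids:
            # Union the GO terms of every locus linked to this OMIM entry.
            go_terms |= loc2go.get(locus, set())
        if go_terms:
            result[mim] = go_terms
    return result


if __name__ == "__main__":
    # Hypothetical placeholder identifiers, not real OMIM/LocusLink/GO IDs.
    mim2loc = load_pairs(["MIM_1\tLOC_1", "MIM_2\tLOC_2", "MIM_2\tLOC_3"])
    loc2go = load_pairs(["# comment line", "LOC_1\tGO_A", "LOC_3\tGO_B"])
    print(omim_to_go(mim2loc, loc2go))
```

Keeping the values as sets means an OMIM entry linked to several loci simply accumulates the union of their GO terms, and duplicates collapse automatically.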
Mouse genomics dataset
The second dataset contains gene-phenotype relations extracted from a
subset of MGI by BioMedLEE, an automatic NLP program. The mouse genomics
dataset comes from three databases: 1) a subset of MEDLINE citations in
MGI related to tumorigenesis, 2) gene-phenotype relations extracted from
these articles by BioMedLEE, with phenotypes encoded using the Unified
Medical Language System (UMLS) (Lindberg 1990), and 3) a UMLS-GO mapping
database (Sarkar, Cantor et al. 2003) that maps UMLS terms to GO terms.
(Email yves.lussier@dbmi.columbia.edu for a username and password.)