A Structured Interface to the Object-Oriented Genomics Unified Schema for XML-Formatted Data

Authors: Clark, Terry1; Jurek, Josef2; Kettler, Gregory2; Preuss, Daphne2

Source: Applied Bioinformatics, Volume 4, Number 1, 2005 , pp. 13-24(12)

Publisher: Adis International

Key:
Free Content - Free Content
New Content - New Content
Subscribed Content - Subscribed Content
Free Trial Content - Free Trial Content

Abstract:

Data management systems are fast becoming required components in many biology laboratories as the role of computer-based information grows. Although the need for data management systems is on the rise, their inherent complexities can deter the full and routine use of their computational capabilities. The significant undertaking to implement a capable production system can be reduced in part by adapting an established data management system. In such a way, we are leveraging the Genomics Unified Schema (GUS) developed at the Computational Biology and Informatics Laboratory at the University of Pennsylvania as a foundation for managing and analysing DNA sequence data in centromere research projects around Arabidopsis thaliana and related species. Because GUS provides a core schema that includes support for genome sequences, mRNA and its expression, and annotated chromosomes, it is ideal for synthesising a variety of parameters to analyse these repetitive and highly dynamic portions of the genome. Despite this, production-strength data management frameworks are complex, requiring dedicated efforts to adapt and maintain. The work reported in this article addresses one component of such an effort, namely the pivotal task of marshalling data from various sources into GUS. In order to harness GUS for our project, and motivated by efficiency needs, we developed a structured framework for transferring data into GUS from outside sources. This technology is embodied in a GUS object-layer processor, XMLGUS. XMLGUS facilitates incorporating data into GUS by (i) formulating an XML interface that includes relational database key constraint definitions, (ii) regularising traversal through that XML, (iii) realising automatic processing of the XML with database key constraints and (iv) allowing for special processing of input data within the framework for automated processing. The application of XMLGUS to production pipeline processing for a sequencing project and inputting the Arabidopsis genome into GUS is discussed. XMLGUS is available from the Flora website (http://flora.ittc.ku.edu/).

Keywords: Bioinformatics; Computers; Genomics

Document Type: Research article

Affiliations: 1: 1 Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, Kansas, USA 2: 2 Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, USA

The full text electronic article is available for purchase. You will be able to download the full text electronic article after payment.

$57.95 plus tax      Refund Policy

 

OR

Back to top

Key:
Free Content - Free Content
New Content - New Content
Subscribed Content - Subscribed Content
Free Trial Content - Free Trial Content
Share this item with others: These icons link to social bookmarking sites where readers can share and discover new web pages.
Page Help Click here for Page Help
Shopping cart
Tools
Sign in






Need to register?
Sign up here
Text size: A | A | A | A