
Architecture

Blip is a data integration toolkit. It wraps a variety of file formats and data resources (called bioresources), providing a unified query layer over them.

Diagram

This figure shows how a general-purpose application (in this case AmiGO-NG, a serval web application) queries data from a variety of sources.

TODO: imagemap this png

TODO: fix diagram - prolog xml transform now used in place of XSL for SBML data

The core of AmiGO-NG is a generic ontology class and instance browser. Any biological model or dataset can be represented as classes and instances, so AmiGO-NG can act as a generic data browser. The architecture also allows plugins to create user views over specific data types, with the help of the corresponding blip db modules.

Bioresources

At the bottom of the diagram are the bioresources. For AmiGO these will typically be files containing ontology class and instance data (in OBO or OWL format). Different AmiGO plugins extend the basic class-and-instance views, and these may require different input files.

The prolog database

Bioresources are typically loaded into the in-memory prolog database using a variety of mechanisms. The prolog database is partitioned into separate modules: class and instance data live in either the rdf_db module or the ontol_db db module, organismal taxonomy in the taxon_db db module, and so on. Each of these db modules has a set of extensional and intensional predicates (i.e. the actual data predicates and the view predicates derived from them). Partitioning into modules avoids predicate clashes. Predicates can either be imported into the user space or referred to by prefixing the module name.
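
As a minimal sketch of this partitioning, the following SWI-Prolog fragment defines two hypothetical modules, demo_ontol_db and demo_taxon_db. The module names, the predicates class/2, subclass/2 and subclass_t/2, and the toy facts are all assumptions made for illustration; they are not blip's actual schema.

```prolog
% Hypothetical demo modules: names, predicates and facts are illustrative
% assumptions, not blip's real db modules.

% Extensional predicates: facts stored directly in the database.
demo_ontol_db:class(c1, nucleus).
demo_ontol_db:class(c2, organelle).
demo_ontol_db:class(c3, cell_part).

demo_ontol_db:subclass(c1, c2).
demo_ontol_db:subclass(c2, c3).

% Intensional predicate: a view derived from the stored facts,
% here the transitive closure of subclass/2.
demo_ontol_db:subclass_t(X, Y) :-
    demo_ontol_db:subclass(X, Y).
demo_ontol_db:subclass_t(X, Y) :-
    demo_ontol_db:subclass(X, Z),
    demo_ontol_db:subclass_t(Z, Y).

% A second module can define its own class/2 without clashing.
demo_taxon_db:class(t1, mammalia).

% Code in the user space refers to the predicates by module prefix.
named_superclass(Name, SuperName) :-
    demo_ontol_db:class(C, Name),
    demo_ontol_db:subclass_t(C, S),
    demo_ontol_db:class(S, SuperName).
```

With these toy facts, the query named_superclass(nucleus, S) enumerates organelle and then cell_part, while demo_taxon_db:class/2 remains a separate predicate.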

Parsers

Some file formats can be read in directly; for example, the NH and NHX phylogenetic tree formats can be parsed using the parser_nhx module. For OWL files, SWI-Prolog comes with rdf_db and owl modules (themselves layered on the SWI sgml module, which can parse XML).

The user or application programmer does not need to worry about using parsers directly - this is all handled by the io module.

Prolog fact files

Many bio file formats are ad-hoc text formats. Whilst prolog is an ideal language for implementing parsers, we have chosen not to reinvent the wheel here - we use existing parsers (BioPerl, go-perl and XSLT stylesheets) to convert native file formats into prolog fact files which can then be loaded directly into the prolog in-memory database. This approach is not ideal, as many existing perl parsers are slow.

The user or application programmer does not need to worry about the details of converting bio formats to fact files - this is all handled by the io module.
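
Although the io module hides this step, a fact file is simply a file of prolog facts. The sketch below uses only standard SWI-Prolog predicates, not blip's io module, and the predicates class/2 and subclass/2 and their toy contents are assumed for illustration. It writes a small fact file, as a converter script might, and then loads it into the in-memory database.

```prolog
% Sketch only: class/2, subclass/2 and the toy facts are assumptions;
% real fact files follow the schema of the target db module.

% Write a small fact file, as an external converter might.
write_demo_fact_file(File) :-
    tmp_file_stream(text, File, Out),
    format(Out, "class(c1, nucleus).~n", []),
    format(Out, "class(c2, organelle).~n", []),
    format(Out, "subclass(c1, c2).~n", []),
    close(Out).

% Load the facts into the in-memory database, then remove the file.
load_demo_facts :-
    write_demo_fact_file(File),
    consult(File),
    delete_file(File).
```

After calling load_demo_facts, the facts can be queried like any other predicate, for example with class(ID, nucleus).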

XML mapping modules

Many data resources exist in XML format. SWI-Prolog comes with an XML parser, and blip comes with a prolog specification language for mapping XML files to data predicates.

The specifications themselves live in xmlmap modules; for example, seqfeature_xmlmap_chaos is a mapping between Chaos-XML files and the data predicates defined in the seqfeature_db module. Currently the mapping is one-way, but it will soon be reversible.
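
The general idea of mapping XML elements to data predicates can be sketched with the SWI-Prolog sgml library alone. This is not blip's mapping language: the XML layout, the demo_xml/1 and xml_feature/1 predicates, and the feature/2 term are hypothetical, and the layout is not the Chaos-XML schema.

```prolog
:- use_module(library(sgml)).    % SWI-Prolog SGML/XML parser
:- use_module(library(lists)).

% Sketch only: hypothetical XML input, not Chaos-XML.
demo_xml("<features><feature id='f1' type='gene'/><feature id='f2' type='exon'/></features>").

% Parse the XML text into a DOM of element(Name, Attrs, Children) terms.
demo_dom(DOM) :-
    demo_xml(XML),
    setup_call_cleanup(
        open_string(XML, In),
        load_structure(stream(In), DOM, [dialect(xml), space(remove)]),
        close(In)).

% Mapping rule: each <feature> element yields one feature(Id, Type) term.
xml_feature(feature(Id, Type)) :-
    demo_dom(DOM),
    member(element(features, _, Children), DOM),
    member(element(feature, Attrs, _), Children),
    memberchk(id=Id, Attrs),
    memberchk(type=Type, Attrs).
```

Each solution of xml_feature(F) could then be asserted into a db module. A declarative mapping specification would express the same element-to-predicate correspondence without hand-written traversal code.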

Bridge modules

Bridge modules can be used where two db modules cover similar domains. For example, the SWI-Prolog owl.pl module and the blip db modules have similar functionality. Applications such as AmiGO-NG use the ontol_db interface. AmiGO-NG can be used seamlessly over OWL ontologies using ontol_bridge_to_owl. Bridge modules can also be used to provide abstract relational views over ontology instance data.
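
The bridging pattern can be sketched as a set of view predicates: the interface of one module is defined in terms of the data held by another. The modules demo_owl and demo_ontol, their predicates and the toy triples below are assumptions for illustration, not the real owl.pl or ontol_db interfaces.

```prolog
% Sketch only: hypothetical modules and toy data.

% Source module: ontology stored as subject-predicate-object triples.
demo_owl:triple(nucleus, sub_class_of, organelle).
demo_owl:triple(nucleus, label, 'cell nucleus').
demo_owl:triple(organelle, label, 'organelle').

% Bridge: the target module's predicates are views over the source
% module, so no data is copied.
demo_ontol:subclass(A, B) :-
    demo_owl:triple(A, sub_class_of, B).
demo_ontol:class(ID, Name) :-
    demo_owl:triple(ID, label, Name).

% Application code written against the demo_ontol interface only.
superclass_name(Name, SuperName) :-
    demo_ontol:class(C, Name),
    demo_ontol:subclass(C, S),
    demo_ontol:class(S, SuperName).
```

Because superclass_name/2 only calls the demo_ontol predicates, the underlying store could be swapped for another source by replacing the bridge clauses.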

SQL Databases

Prolog is both a database and an application language. Sometimes it may be desirable to swap in an existing relational database in place of the in-memory prolog database. This can be done seamlessly by using the sqldb bridges, for example ontol_bridge_from_gosql, which maps tables in the GO SQL database (see amigo) to the data predicates defined in ontol_db.

Applications

Applications can integrate various resources using the io module, and then query them using the predicates defined in any particular db module. Utility modules such as bioseq provide useful operations on the different kinds of data.

See applications for some blip apps