Position

Currently employed by the Howard Hughes Medical Institute as a bioinformatics specialist, working with the BDGP and the GO Consortium.

Research Interests

Projects

My current projects, responsibilities and roles include the following:

Software

Current software

I am still actively using and developing the following systems and tools:

  • Obol is both an ontological reasoner and a system for deriving meaning encoded in natural language descriptions of class names. Obol is currently being used by The Gene Ontology and The Plant Ontology Consortium, and has so far detected hundreds of errors. See Comparative and Functional Genomics, Volume 5, Issue 6-7, 2004, pages 509-520, and this article in The Scientist.
  • The Chado schema (co-authored by David Emmert). Chado is a modular schema for modeling biological data in relational databases, using ontologies such as The Sequence Ontology. Currently there are modules for sequence features, IDs, genetics, phylogeny, maps, bibliographic data and expression data. No publication as yet.
  • Chaos-XML and supporting software and documentation (scripts, library, XSLTs, DTD). Publication forthcoming. Chaos is an integral part of CGL.
  • go-perl, go-db-perl and the GO MySQL schema, parts of the go-dev toolkit. go-dev also includes AmiGO (written by Brad Marshall and ShengQiang Shu) and OBO-Edit (written by John Day-Richter)
  • Stag is a framework for manipulating nested tag-value data in Perl, and for mapping between XML and SQL databases.
  • Skam is a functional-logical replacement for Makefiles, specifically designed for automating large bioinformatics pipelines on Beowulf clusters. Skam is an example of a bottom-up domain-specific language.
  • The BioPerl Unflattener. This is for discovering and normalising gene models from lossy representations such as GenBank. Available as part of BioPerl (1.5+ recommended).
  • Blipkit: Biomedical Logic Programming Knowledge Integration Toolkit. A collection of SWI-Prolog modules for bioinformatics and ontologies. Website forthcoming; contact me for details.

Previous software projects

I no longer support or maintain the following pieces of software:

  • The GadFly genome annotation database, pipeline and browser. This has been superseded by Chado and is no longer supported. Still in use at the BDGP and JGI. Described in Genome Biology. Software available from BDGP CVS, or on request.
  • sim4wrap - a simple standalone C program and Perl wrapper for speeding up sim4 analyses using BLAST, described in the above publication.
  • The Anubis map viewer, which I developed whilst employed at the Roslin Institute. I also contributed to ArkDB. Both Anubis and ArkDB are still in operation at multiple sites across the world.

Availability

All the software I have written is available under an open source license, unless otherwise indicated. Software should be available from the project sites listed above, or in some cases from the following repositories:

Please contact me if you have trouble obtaining any of these.

Publications

See my publications (PDF file)

Selected Presentations

Education

BSc Hons in Artificial Intelligence and Computer Science from the University of Edinburgh. Currently about to embark on a PhD by research publication, also from the University of Edinburgh.

Contact

240C Building 64
Lawrence Berkeley National Lab
1 Cyclotron Road
Berkeley CA 94720

Email  --> Name @ Domain
Name   --> cjm
Domain --> fruitfly.org