Genotator: A Workbench for Sequence Annotation and Browsing

Sequencing centers such as the Human Genome Center at LBNL are producing an ever-increasing flood of genetic data. Annotation can greatly enhance the biological value of these sequences. Useful annotations include homologies to known genes, possible gene locations, gene signals such as promoters, etc. We are developing a workbench for automatic sequence annotation and annotation viewing and editing. The goal is to run a series of sequence analysis tools and display the results in such a way that the various predictions can be compared. Researchers will then be able to examine all of the annotations (for example, the genes predicted by various gene-finding methods) and select the ones that look the best.

Nomi Harris has developed Genotator (formerly known as Genotater), an annotation workbench consisting of a back end that runs various sequence analysis programs and a standalone annotation browser.

The Genotator back end

The Genotator back end runs several gene finders, homology searches (using BLAST), and signal searches, and saves the results in .ace format. Genotator thus automates the tedious process of running a dozen different sequence analysis programs with a dozen different input and output formats. Genotator can be run via command-line arguments or with the GUI shown below.

The Genotator browser: The map display

The Genotator browser, developed in collaboration with Gregg Helt (formerly with the UC Berkeley Drosophila Genome Project; now with Affymetrix), uses Gregg's bioTkperl sequence and map display widgets (which were inspired by the bioTk widgets by David Searls). Color-coded sequence annotations for both strands are displayed on a canvas that can be scrolled and zoomed. Clicking on an annotation displays additional information about it.

The annotations currently computed and displayed by Genotator are:

Magenta: NNPP promoter predictions
Red: GenPept hits (using blastx). GenPept consists of all the GenBank coding regions translated to amino acids
Orange: EST hits (using blastn)
Yellow: Human repeat sequence hits (using blastn)
Chartreuse: xpound exon predictions
Green: GeneFinder exon predictions (using human tables)
Turquoise: GRAIL exon predictions
Dark Blue: Genie (UCSC/LBNL collaboration) exon predictions
Purple: GenBank CDS (exons)
Magenta/Red/Orange: Open reading frames (colored by frame)
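
To illustrate what "colored by frame" means for open reading frames, here is a minimal Python sketch that scans the three forward reading frames for ATG-to-stop stretches and tags each with a color. It is not Genotator's code: the ORF definition (ATG to the first in-frame stop), the `min_codons` threshold, the restriction to the forward strand, and the assignment of magenta, red, and orange to frames 0, 1, and 2 are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Assumed mapping; the page only says ORFs are colored magenta/red/orange by frame.
FRAME_COLORS = {0: "magenta", 1: "red", 2: "orange"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame, color) for forward-strand ORFs.

    start and end are 1-based and inclusive; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame, FRAME_COLORS[frame]))
                start = None
    return orfs


orfs = find_orfs("CATGAAATAGG")
```

In this toy sequence the only ORF starts at the second base, so it falls in frame 1 and gets that frame's color. A reverse-strand scan would run the same loop over the reverse complement.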

Here the Genotator browser is shown displaying the annotations on HUMTFPB (a human sequence obtained from GenBank). The annotations were generated automatically by Genotator. (Splice site predictions and start/stop codons can also be displayed.)


The user has clicked on one of the red GenPept blast hits. The browser put a black frame around the hit and printed information about the hit in the box labeled "Annotation".

The Genotator browser: The sequence display

The Genotator browser can display the actual DNA sequence (or its complement) in a separate window; this is shown below. Interaction between the map and sequence displays is bidirectional. When a region is selected in the map display, it is automatically highlighted in the sequence display in the appropriate color. Here, for example, the selected GenPept hit is highlighted in red in the sequence display. When a region is selected in the sequence display, it is boxed in the map display.
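
For readers unfamiliar with the term, the complement of a DNA sequence swaps A with T and C with G at each position. The following Python sketch shows the idea; the function names are hypothetical and this is not the browser's own code.

```python
# Translation table for base complements; other characters (such as N) pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Complement each base, keeping the original left-to-right order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement the sequence and reverse it, giving the opposite strand read 5' to 3'."""
    return complement(seq)[::-1]


same_order = complement("ATGC")
opposite_strand = reverse_complement("ATGC")
```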

Adding personal annotations

The Genotator browser allows users to add new annotations to either the map or the sequence display. These personal annotations are saved along with the precomputed annotations.

Primer selection

To help the user design primers for a region of interest, Genotator can call Primer3, a primer selection program developed at the Whitehead Institute (Rozen and Skaletsky, 1996). Genotator users can select a sequence region, select "Design Primers" from the menu, and change any of the default Primer3 options if desired. Once the user is satisfied with the option settings, the best forward and reverse primers are printed to the terminal (so that they can be cut and pasted into a primer order form) and are also indicated in the sequence display.

Searching for patterns

Another feature lets users look for sequence patterns (such as restriction sites) or regular expressions in a sequence. For example, suppose you wanted to find all instances of an A followed by either a C or a G, followed by one or more Ts, followed by an A. The Unix-style regular expression for that pattern is "A[CG]T+A". Genotator will locate and highlight all subsequences that match the specified pattern.
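
The pattern above can be tried outside Genotator with any regular expression library. Here is a small sketch using Python's standard `re` module; the function name `find_pattern` and the test sequence are made up for illustration. Note that `re.finditer` reports only non-overlapping matches, and whether Genotator reports overlapping ones is not stated here.

```python
import re


def find_pattern(sequence, pattern):
    """Return (start, end, text) for each non-overlapping match.

    start and end are 1-based, inclusive sequence coordinates.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]


hits = find_pattern("GGACTTAAGTAC", "A[CG]T+A")
for start, end, text in hits:
    print(start, end, text)
```

On this toy sequence the search finds two matches: ACTTA at positions 3 to 7 and AGTA at positions 8 to 11.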

Viewing GenBank records for sequences that were hit

A recently added feature lets you bring up the GenBank records for subject sequences (sequences to which a homology was found). Genotator interacts directly with your Netscape browser in order to pull up the GenBank records (if it can't find Netscape on your system, it simply creates a text box).

Obtaining Genotator

Genotator runs on Unix workstations; it has been tested on Suns (running Solaris 1 or 2), SGIs, and DEC Alphas. The browser is straightforward to install offsite; the back end is more complex because it expects to find the various sequence analysis tools (BLAST, gene finders, etc.), some of which are not yet available offsite. The back end is set up so that it can run with any subset of the analysis tools it is expecting, but if you're missing most of them, the results won't be very interesting.
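
One simple way a pipeline can run with whatever subset of tools is installed is to check for each executable before scheduling it. The Python sketch below shows that idea only; it is not how the Genotator back end is implemented, and the executable names in `EXPECTED_TOOLS` are hypothetical placeholders.

```python
import shutil

# Hypothetical executable names; the real names on your system may differ.
EXPECTED_TOOLS = ["blastall", "grail", "genefinder", "genie", "xpound"]


def split_by_availability(tools, which=shutil.which):
    """Split tool names into (found, missing) based on a PATH lookup.

    The lookup function is a parameter so it can be replaced in tests.
    """
    found = [tool for tool in tools if which(tool) is not None]
    missing = [tool for tool in tools if which(tool) is None]
    return found, missing


found, missing = split_by_availability(EXPECTED_TOOLS)
print("Will run:", ", ".join(found) or "nothing")
print("Skipping:", ", ".join(missing) or "nothing")
```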

For information on obtaining Genotator, please contact Nomi Harris.

Bibliography

A paper about Genotator appeared in Genome Research in 1997:
Harris, N.L. (1997). Genotator: A workbench for sequence annotation. Genome Research 7(7):754-762.



Genotator and this Web page were written by Nomi Harris (nlharris@lbl.gov)