Genotator: A Workbench for Sequence Annotation and Browsing
Sequencing centers such as the Human Genome Center at LBNL are producing
an ever-increasing flood of genetic data. Annotation can greatly enhance
the biological value of these sequences. Useful annotations include
homologies to known genes, possible gene locations, gene signals such as
promoters, etc.
We are developing a workbench for automatic sequence annotation and
for viewing and editing annotations. The goal is to run a series of
sequence analysis tools and display the results in such a way that the various
predictions can be compared. Researchers will then be able to examine
all of the annotations (for example, the genes predicted by various
gene-finding methods) and select the ones that look the best.
Nomi Harris has developed Genotator (formerly known as Genotater), an annotation workbench consisting of
a back end that runs various sequence analysis programs and a standalone annotation browser.
The Genotator back end
The Genotator back end runs several gene finders, homology searches (using
BLAST), and signal searches, and saves the results in .ace format. Genotator thus
automates the tedious process of running a dozen different sequence
analysis programs with a dozen different input and output formats.
Genotator can be run via command-line arguments or with the GUI shown
below.
The Genotator browser: The map display
The Genotator browser, developed in collaboration with Gregg Helt (formerly with the
UC Berkeley Drosophila Genome Project; now with Affymetrix), uses Gregg's
bioTkperl sequence and map display widgets (which were inspired by the bioTk widgets by David Searls).
Color-coded sequence annotations for both strands are displayed on a
canvas that can be scrolled and zoomed. Clicking on an annotation
displays additional information about it.
The annotations currently computed and displayed by Genotator are:
| Magenta | NNPP promoter predictions |
| Red | GenPept hits (using blastx): GenPept consists of all the GenBank coding regions translated to amino acids |
| Orange | EST hits (using blastn) |
| Yellow | Human repeat sequence hits (using blastn) |
| Chartreuse | xpound exon predictions |
| Green | GeneFinder exon predictions (using human tables) |
| Turquoise | GRAIL exon predictions |
| Dark Blue | Genie (UCSC/LBNL collaboration) exon predictions |
| Purple | GenBank CDS (exons) |
| Magenta/Red/Orange | Open reading frames (colored by frame) |
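To illustrate what "colored by frame" means for open reading frames, here is a minimal sketch in Python. It is not Genotator's code: the function name, the ATG-to-stop definition of an ORF, the restriction to the forward strand, and the assignment of magenta, red, and orange to frames 0, 1, and 2 are all assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Assumed mapping of reading frame to display color; the page does not
# say which frame gets which color.
FRAME_COLORS = {0: "magenta", 1: "red", 2: "orange"}

def open_reading_frames(seq, min_codons=2):
    """Return (frame, start, end, color) for each ATG..stop stretch on the
    forward strand. Coordinates are 0-based, half-open; the stop codon is
    included and counted toward min_codons."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3, FRAME_COLORS[frame]))
                start = None
    return orfs
```

For the toy sequence "CCATGAAATAGG", the only ORF found is ATG AAA TAG in frame 2, so the sketch would color it orange.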
Here the Genotator browser is shown displaying the annotations
on HUMTFPB (a human sequence obtained from GenBank). The annotations
were generated automatically by Genotator. (Splice site predictions and start/stop codons can
also be displayed.)
The user has clicked on one of the red GenPept blast hits. The browser put a black frame
around the hit and printed information about the hit in the box labeled "Annotation".
The Genotator browser: The sequence display
The Genotator browser can display the
actual DNA sequence (or its complement) in a separate window; this
is shown below. Interaction
between the map and sequence displays is bidirectional. When a region is selected
in the map display, it is automatically highlighted in the sequence
display in the appropriate color. Here, for example, the selected GenPept
hit is highlighted in red in the sequence display. When a region is selected in the sequence display, it is boxed
in the map display.
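The complement view, and the way a region selected on the map corresponds to a stretch of sequence, can be illustrated with a short Python sketch. This is not taken from Genotator: the function names and the 1-based, inclusive coordinate convention are assumptions made for the example.

```python
# Translation table for complementing DNA bases (case is preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(seq):
    """Return the base-by-base complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)

def reverse_complement(seq):
    """Return the opposite strand read in its own 5'-to-3' direction."""
    return complement(seq)[::-1]

def selected_region(seq, start, end, strand="+"):
    """Return the bases from start to end (1-based, inclusive; an assumed
    convention). For the '-' strand the reverse complement is returned."""
    region = seq[start - 1:end]
    return region if strand == "+" else reverse_complement(region)
```

For example, selected_region("AACCGGTT", 3, 6) gives "CCGG", and the same call with strand "-" on positions 1 to 4 gives "GGTT".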
Adding personal annotations
The Genotator browser allows users to add new annotations to either
the map or the sequence display. These personal annotations are saved
along with the precomputed annotations.
Primer selection
In order to help the user design primers for a region of interest,
Genotator can call Primer3, a primer selection program
developed at the Whitehead Institute (Rozen and Skaletsky, 1996).
Genotator users can select a sequence region,
select "Design Primers" from the menu, and change any of the default
Primer3 options if desired.
Once the user is satisfied with the option settings, the best forward and
reverse primers are printed to the terminal (so that they can be cut and
pasted into a primer order form) and are also indicated in the sequence display.
Searching for patterns
Another feature lets users look for sequence patterns (such as
restriction sites) or regular expressions in a sequence.
For example, suppose you wanted to find all instances of
an A followed by either a C or a G, followed by one or more Ts, followed
by an A. The Unix-style regular expression for that pattern is "A[CG]T+A".
Genotator will locate and highlight all subsequences that match the specified pattern.
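The same pattern can be tried outside Genotator with any regular expression library. The sketch below uses Python's standard re module; the function name, the case-insensitive matching, and the 1-based inclusive coordinates are assumptions for illustration, not a description of how Genotator implements its search.

```python
import re

def find_pattern(sequence, pattern):
    """Return (start, end, matched_text) for each non-overlapping match.
    Coordinates are 1-based and inclusive (an assumed convention)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(m.start() + 1, m.end(), m.group())
            for m in regex.finditer(sequence)]

hits = find_pattern("GGACTACCAGTTTAGG", "A[CG]T+A")
for start, end, text in hits:
    print(start, end, text)
```

On this toy sequence the pattern matches ACTA at positions 3-6 and AGTTTA at positions 9-14. Note that finditer reports non-overlapping matches only, so a match that begins inside an earlier match would not be listed.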
Viewing GenBank records for sequences that were hit
A recently added feature lets you bring up the GenBank records for
subject sequences (sequences to which a homology was found). Genotator
interacts directly with your Netscape browser in
order to pull up the GenBank records (if it can't find Netscape on your
system, it simply creates a text box).
Obtaining Genotator
Genotator runs on Unix workstations; it has been tested on Suns (running Solaris 1 or 2), SGIs, and DEC Alphas.
The browser is straightforward to install offsite; the back end is more complex
because it expects to find the various sequence analysis tools (BLAST, gene finders, etc.), some of which are not yet available offsite. The back end is set up so that it can run with any subset of the analysis tools it is expecting,
but if you're missing most of them, the results won't be very interesting.
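The idea of running with whatever subset of tools is installed can be sketched as follows. This is only an illustration in Python, not Genotator's own logic: the function name is hypothetical, and the executable names in the usage note are placeholders that may not match the programs the back end actually looks for.

```python
import shutil

def available_tools(tool_names, which=shutil.which):
    """Split tool_names into (found, missing) according to whether an
    executable of that name is on the PATH. The lookup function is a
    parameter so it can be replaced when testing."""
    found, missing = [], []
    for name in tool_names:
        (found if which(name) else missing).append(name)
    return found, missing
```

A caller might run available_tools(["blastn", "blastx", "some_gene_finder"]), invoke only the programs in the first list, and report the second list so the user knows which annotations will be absent.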
For information on obtaining Genotator, please contact Nomi Harris.
A paper about Genotator appeared in
Genome Research in 1997:
Harris, N.L. (1997). Genotator: A workbench for sequence annotation.
Genome Research 7(7):754-762.
Genotator and this Web page were written by Nomi Harris (nlharris@lbl.gov)