Genotator User Manual

Nomi L. Harris

Introduction: What is Genotator?

Genotator is a workbench for automated sequence annotation and annotation browsing. It consists of two main portions, a back end and a browser. The back end runs a series of sequence analysis tools on a DNA sequence, handling the various input and output formats required by the tools. Genotator currently runs five different gene finding programs, three homology searches, and searches for promoters, splice sites, and ORFs.

The results of the analyses run by Genotator can be viewed with the interactive graphical browser. The browser displays color-coded sequence annotations on a canvas that can be scrolled and zoomed, allowing the annotated sequence to be explored at multiple levels of detail. The user can view the actual DNA sequence in a separate window; when a region is selected in the map display, it is automatically highlighted in the sequence display, and vice-versa. By displaying the output of all of the sequence analyses, Genotator provides an intuitive way to identify the significant regions (for example, probable exons) in a sequence. Users can interactively add personal annotations to label regions of interest. Additional capabilities of Genotator include primer design and pattern searching. A new feature in version 1.51 of the browser lets you retrieve the GenBank record for a sequence that has significant homologies to your sequence.

Genotator runs on UNIX workstations. The back end is written in perl and Tkperl, and calls out to various sequence analysis programs written here and elsewhere. The front end, which is also written in perl and Tkperl, uses Gregg Helt's bioTkperl widgets.

Running the Genotator back end

The Genotator back end runs a series of analyses on a sequence file and saves the results for later browsing. Of the many available sequence analysis tools, I chose a reasonable subset to integrate into Genotator. The analysis programs called by Genotator fall into three main categories: gene finders (Genie (Kulp et al., 1997), GRAIL (Xu et al., 1994), GeneFinder (Green, 1994), xpound (Thomas and Skolnick, 1994), and GENSCAN (Burge and Karlin, 1997)); database homology searches (BLASTN (Altschul et al., 1990) against dbEST and against a database of human or Drosophila repeat sequences; BLASTX against GenPept (Benson et al., 1993)); and sequence feature predictors (start/stop codons, open reading frames (ORFs), promoters (Reese and Eeckman, 1994), splice sites (Reese et al., 1997), and tRNA genes (Lowe and Eddy, 1997)). The promoter and splice site predictors and the Genie gene finder were developed by Martin Reese, a member of my group at LBNL. Most of the other programs are freely available (see need.html for information on where to obtain them).

The Genotator back end can be invoked via the graphical user interface (GUI) or with command-line options. To invoke Genotator via the GUI, type

        ~nomi/genotator/genotator
(for local LBNL users; offsite users will use a different directory).
The Genotator GUI looks like this:

Click the file selection box (here labeled "humtfpb" because the user has already selected humtfpb as the sequence to be annotated) to bring up a file selection menu, and use it to select the sequence file you wish to annotate. The sequence file may be in plain, FASTA, or GenBank format.

Here is an example of a sequence in FASTA format:
>gb|J02846|HUMTFPB Human tissue factor gene, complete cds.
GAATTCTCCCAGAGGCAAACTGCCAGATGTGAGGCTGCTCTTCCTCAGTCACTATCTCTG
GTCGTACCGGGCGATGCCTGAGCCAACTGACCCTCAGACCTGTGAGCCGAGCCGGTCACA
CCGTGGCTGACACCGGCATTCCCACCGCCTTTCTCCTGTGCGACCCGCTAAGGGCCCCGC
[etc.]
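Before a long run, it can be useful to check what a sequence file actually contains. The following Python sketch splits FASTA text into header/sequence pairs. It is not Genotator's own reader, which is written in perl. The function name `parse_fasta` is hypothetical, and treating lines that appear before any `>` header as an unnamed plain-format sequence is an assumption. GenBank-format files are not handled.

```python
def parse_fasta(text):
    """Split FASTA (or plain) sequence text into (header, sequence) pairs.

    The header is whatever follows '>' on a definition line; the sequence is
    every following non-blank line joined together and uppercased. Sequence
    lines that appear before any header get an empty header, which is an
    assumption about plain-format files.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                header = ""  # plain file: no definition line
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records
```

For example, `parse_fasta(open("humtfpb").read())` should return a single pair. Its header would be the `gb|J02846|HUMTFPB ...` definition line.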

Other options that can be configured via the Genotator GUI:

If you wish to run Genotator on a bunch of sequence files, it may be easier to do so with command-line options. Usually, you will simply type
        ~nomi/genotator/genotator -batch seq1 seq2 seq3 ...
where seq1, etc. are the names of plain or FASTA-format sequence files, and -batch tells Genotator not to bring up the GUI. Invoking the Genotator back end with the -h option causes it to print out a list of legal command-line options:
Usage: genotator [seqfile1 [seqfile2 ...]]
     [-human or -drosophila] [-none] [-nomail] [-exit] [-d(ebug)]
     [-noblast] [-nomask] [-dir output_dir] [-ace] [-exon] [-homol] [-all] [-batch]
     [-grail] [-genefinder] [-genie] [-genscan] [-xpound] [-genemark]
     [-genpept] [-est] [-repeats]
     [-promoters] [-splice] [-trnascan] [-orf]

[-h(elp)]:      print this help message
[seqfile ...]:  name of sequence file in plain, FASTA, or GenBank format
                (You may specify multiple sequence files.)
[-human or -drosophila]:  which organism your sequence is from (human is default).
[-none]:        start with no analysis boxes checked
[-nomail]:      don't send email upon completion  (default is to send email)
[-exit]:        exit upon completion  (default is not to exit)
[-d(ebug)]:     debug mode (for developers)--print what Genotator would do, but don't really do it.
[-noblast]:     try to reuse old BLAST output, but redo blast postprocessing
[-nomask]:      don't mask out repeats before BLASTing dbEST and GenPept
[-dir output_dir]:  store results in (subdirectory of) output_dir
[-exon]:        run genefinders only (in batch mode)
[-all]:         run all analyses in batch mode
[-batch]:       run some analyses in batch mode; analyses to run will be specified by other arguments

The remaining arguments are the analyses functions that can be specified in conjunction
with the -batch option:
[-grail] [-genefinder] [-genie] [-genscan] [-xpound] [-genemark]:  Run that gene finder
[-repeats]:     blastn against appropriate organism's database of repeats
[-est]:         blastn against EST database
[-genpept]:     blastx against GenPept
[-promoters]:   find promoters
[-splice]:      find splice sites (as well as start/stop codons)
                (When the browser is invoked, these are not displayed automatically,
                but they can be turned on from the Display menu.)
[-trnascan]:    look for tRNA genes
                (The results aren't shown in the browser, since there virtually never are any.)
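If you run batches regularly, it can be convenient to wrap the invocation in a script. Below is a minimal, hypothetical Python sketch that assembles the command line from the options listed above and runs it with `subprocess`. Several details are assumptions rather than documented behavior:

- the helper names;
- passing `-nomail` by default;
- placing the options before the file names, which follows the `-batch seq1 seq2 seq3` example.

The program path is the local LBNL one used in this manual.

```python
import os
import subprocess

# Local LBNL path used in this manual; offsite installations will differ.
GENOTATOR = os.path.expanduser("~nomi/genotator/genotator")

# Analysis flags from the back end's -h output (without the leading dash).
ANALYSES = {
    "grail", "genefinder", "genie", "genscan", "xpound", "genemark",
    "repeats", "est", "genpept", "promoters", "splice", "trnascan", "orf",
}


def build_batch_command(seqfiles, analyses=(), organism="human",
                        output_dir=None, nomail=True, program=GENOTATOR):
    """Return the argument list for one batch run over all of seqfiles."""
    seqfiles = list(seqfiles)
    if not seqfiles:
        raise ValueError("no sequence files given")
    if organism not in ("human", "drosophila"):
        raise ValueError(f"unsupported organism: {organism}")
    unknown = set(analyses) - ANALYSES
    if unknown:
        raise ValueError(f"unknown analyses: {sorted(unknown)}")

    cmd = [program, "-batch", "-" + organism]
    if nomail:
        cmd.append("-nomail")        # skip the completion email
    if output_dir:
        cmd += ["-dir", output_dir]  # results go under output_dir
    cmd += ["-" + name for name in analyses]
    return cmd + seqfiles            # file names last, as in "-batch seq1 seq2"


def run_batch(seqfiles, **options):
    """Run the back end; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_batch_command(seqfiles, **options), check=True)
```

For example, `build_batch_command(["seq1", "seq2"], analyses=["grail", "est"])` produces the program path followed by `-batch -human -nomail -grail -est seq1 seq2`.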

Invoking the Genotator browser

After a sequence has been run through Genotator, the Genotator browser provides an interactive graphical view of the annotations. The main display in the browser shows a horizontal axis representing the sequence, with forward-strand annotations displayed above the axis and reverse-strand annotations below the axis. Each type of annotation (for example, GRAIL exons) is displayed in its own row, in its own color.

The Genotator browser can be invoked with the name of an annotated sequence file as an argument, e.g.:

    ~nomi/genotator/genotator-browser humtfpb
If it is invoked with no arguments, a list of annotated sequences is displayed, with the sequences annotated by the invoking user listed first. The other sequence directories are collapsed and indicated by ...:

To see the sequences in a collapsed directory, double-click on the directory name (e.g. "liepe...") and the sequences (or subdirectories) in that directory will appear in the list:

If there are a lot of sequence names in the list, you can use the Find button to help you search for the one you want. When you find the sequence name you are looking for, double-click it (or single-click and hit Select). The selection list will disappear, and the browser will load the annotations for the selected sequence.
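The list layout described above can be modeled in a few lines of Python: your own sequences come first, and everyone else's directories stay collapsed until double-clicked. This is a hypothetical sketch, not the browser's code. The following details are all assumptions:

- the dictionary layout;
- the alphabetical ordering of other directories;
- the indentation of expanded entries;
- the plain substring match standing in for Find.

```python
def selection_list(dirs, user, expanded=()):
    """Build the sequence selection list.

    dirs maps a directory (owner) name to the sequence names in it.
    The invoking user's sequences come first; every other directory is
    shown as one collapsed "name..." entry unless it has been expanded.
    """
    rows = list(dirs.get(user, []))
    for owner in sorted(d for d in dirs if d != user):
        if owner in expanded:
            rows.append(owner)
            rows.extend("  " + seq for seq in dirs[owner])
        else:
            rows.append(owner + "...")
    return rows


def find(rows, text):
    """Index of the first row containing text, or -1 if none does."""
    for i, row in enumerate(rows):
        if text in row:
            return i
    return -1
```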

Changing axis numbering

Two new (version 1.62) command-line arguments let you control the axis numbering. If you don't want the axis to be numbered at all, invoke the browser with the -noaxis option, e.g.
    ~nomi/genotator/genotator-browser -noaxis humtfpb
If you want the axis numbering to start at some number other than 0, you can specify the start position with the -axis option, e.g.:
    ~nomi/genotator/genotator-browser -axis 35000 humtfpb
Be sure to specify the axis start position in bases, even though the axis is marked in kilobases (kb). Also note that you may have to scroll slightly to the left (using the scrollbar at the bottom of the map display) in order to see the whole number at the left edge of the axis.
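Because -axis takes bases while the axis is labeled in kilobases, a quick conversion helps when reading positions off the display. Here is a minimal Python sketch that converts in both directions. It assumes that the first base of the sequence sits at the axis start, as the default start of 0 suggests. The function names are hypothetical.

```python
def axis_kb(offset, axis_start=0):
    """Axis label (kb) for a point `offset` bases into the sequence."""
    return (axis_start + offset) / 1000


def sequence_offset(kb_label, axis_start=0):
    """Inverse: how many bases into the sequence an axis reading (kb) lies."""
    return round(kb_label * 1000) - axis_start
```

For example, with -axis 35000, the left end of the axis reads 35 kb. A feature drawn at the 37.5 kb mark lies 2,500 bases into the sequence.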

Map display

Genotator's main display is called the map display. The map display shows color-coded sequence annotations for both strands. The display can be zoomed and scrolled to examine interesting regions in more detail. To zoom, you can drag the zoom bar with your mouse, or position the cursor next to the zoom bar and click to zoom in gradually. To scroll, drag the scroll bar that's under the map display.

In the figure below, the Genotator browser is shown displaying the annotations on HUMTFPB, a human tissue factor gene sequence obtained from GenBank.

Each colored rectangle represents a sequence region that has been annotated. The type of annotation is identified by the color of the rectangle and also the row in which it appears. The row labels on the left (e.g. "GenPept hits") can be clicked for more information about that row. Clicking on an annotation rectangle puts a black box around the selected rectangle and displays additional information about that particular annotation in the text window at the top of the browser. This includes the start and end positions of the annotation, possibly a score, and other information. For example, if a BLAST hit is clicked, the text window might say, "BLASTX GenPept hit from 864 to 1112 with sequence gp|K01228|HUMCG1PA1_1 (33% identity)". This concise description identifies the database sequence that was hit (gp|K01228|HUMCG1PA1_1 is its GenPept ID), the region that was found to be similar to this database sequence (bases 864 to 1112), and the percentage sequence identity for the hit (33%).
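If you collect these one-line descriptions (for example, by copying them into notes), a small parser makes them easier to sort or compare. The following Python sketch is hypothetical and rests on two assumptions. First, every description has exactly the shape of the example above, which is the only form shown in this manual. Second, the middle field of an NCBI-style `gp|...|...` subject ID is the GenBank accession.

```python
import re

# Modeled on the single example description shown in this manual.
HIT_RE = re.compile(
    r"^(?P<program>\S+) (?P<database>\S+) hit from (?P<start>\d+) to (?P<end>\d+) "
    r"with sequence (?P<subject>\S+) \((?P<identity>\d+)% identity\)$"
)


def parse_hit(message):
    """Turn a hit description into a dict; raises ValueError if it doesn't match."""
    m = HIT_RE.match(message.strip())
    if m is None:
        raise ValueError(f"unrecognized hit description: {message!r}")
    start, end = int(m["start"]), int(m["end"])
    hit = {
        "program": m["program"],
        "database": m["database"],
        "start": start,
        "end": end,
        "length": abs(end - start) + 1,  # inclusive span, in bases
        "subject": m["subject"],
        "identity": int(m["identity"]),
    }
    parts = m["subject"].split("|")
    if len(parts) == 3:  # e.g. gp|K01228|HUMCG1PA1_1
        hit["accession"] = parts[1]
    return hit
```

For the example above, this yields a 249-base region (864 to 1112 inclusive) with accession K01228. That accession is what you would look up in GenBank.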

Viewing BLAST hits in more detail

BLAST hits can be double-clicked to view them in more detail. (Note that because of Tkperl's pickiness, you have to double-click pretty quickly and carefully. It may be easier to get the cursor centered in an annotation rectangle if you zoom in first.) For BLASTN hits (against nucleotide sequences), the complete alignment pops up in a separate window (which can be saved or printed):

BLASTX hits against GenPept can be viewed in Blixem, a BLAST hit viewer from the Sanger Centre. Blixem can be a useful tool for examining BLAST hits in more detail, but its user interface is kind of confusing.

When it comes up, Blixem shows black lines to represent hits in the region near where you clicked. The vertical position of the lines represents their percent identity. A blue box shows the region that is expanded below to show the actual hit alignments. You can move the blue box with your middle mouse button. Since BLASTX compares your DNA sequence with an amino acid database, the hits are shown in all three frames. The exact and similar matches are highlighted in color.
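You can work out which of the three frames a forward-strand hit falls in from its start position. This short Python sketch uses the usual convention: frame +1 reads codons starting at base 1, +2 at base 2, and +3 at base 3. It assumes 1-based coordinates like those in the browser's hit descriptions, and it covers the forward strand only.

```python
def forward_frame(start):
    """Reading frame (1, 2 or 3) of a forward-strand hit starting at 1-based `start`.

    Frame 1 codons start at bases 1, 4, 7, ...; frame 2 at 2, 5, 8, ...;
    frame 3 at 3, 6, 9, ...
    """
    if start < 1:
        raise ValueError("positions are 1-based")
    return (start - 1) % 3 + 1
```

If the GenPept hit starting at base 864 is on the forward strand, `forward_frame(864)` gives 3, since 864 - 3 = 861 is a multiple of 3.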

One of the least obvious things about Blixem is how to quit. Give up? If you click your right mouse button on some empty gray area in the Blixem display, it will pop up a menu, in which one of the choices is Quit.

Theoretically, Blixem should be able to pull up the GenBank records for subject sequences. Unfortunately, I haven't been able to get that to work; it seems to be configured specifically for the Sanger Centre. However, I recently enabled Genotator to pull up GenBank records for subject sequences; this function is available through the Display menu, and is discussed in that section.

Genotator browser functions: File menu

Genotator browser functions: Display menu

Sequence display

The Genotator browser can display the actual DNA sequence (or its complement) in a separate window; this is shown in the next figure. When a user selects an annotation in the map display, the corresponding region is highlighted in the appropriate color in the sequence display. Here, for example, the selected GenPept hit is highlighted in red in the sequence display.

One thing to keep in mind (I often forget this, so you might, too) is that if the forward strand is displayed in the sequence window, then clicking on annotations in the forward strand will highlight them in the sequence window, but clicking on annotations in the reverse strand will not. (If you are currently displaying the complement, of course, only annotations in the reverse strand will be highlighted.)
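The rule is simple enough to state as a function. This Python sketch is illustrative only. The '+'/'-' strand codes and the function name are assumptions, not names from Genotator.

```python
FORWARD, REVERSE = "+", "-"


def highlighted_in_sequence_window(annotation_strand, showing_complement):
    """True if selecting an annotation on this strand highlights it in the sequence window."""
    if annotation_strand not in (FORWARD, REVERSE):
        raise ValueError("strand must be '+' or '-'")
    window_strand = REVERSE if showing_complement else FORWARD
    return annotation_strand == window_strand
```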

Interaction between the map and sequence displays is bidirectional: when a region is selected in the sequence display, it is boxed in the map display. Please be aware that when mousing out a region in the sequence display, if you happen to release the mouse when the cursor is midway between two rows, the boxed region in the map display will erroneously start at zero. This is a known Tkperl bug and is not fixable within Genotator.

Functions on the sequence display

Adding personal annotations

The Genotator browser allows users to add new annotations to either the map or the sequence display. These personal annotations are saved along with the precomputed annotations. The figure below shows the interface for dealing with personal annotations. In order to add a personal annotation to the map or sequence display, the user selects some region of the sequence, types the annotation text in the text box, and then clicks "Add Annotation to Map" or "Add Annotation to Sequence". The color of each personal annotation can be specified independently. Clicking on the button that says "forestgreen" brings up a menu of color choices. (Changing the annotation color only affects annotations you are about to add; it doesn't change those you've already added.) You may wish to use different colors to represent different types of personal annotations.

Annotations that refer to a sizable portion of the sequence are generally added to the map; those referring to a small region (such as a primer) are more appropriately added to the sequence. All personal annotations are saved in the database along with the automatically generated annotations. Examples of personal annotations can be seen in the map display figure ("Personal annotation" and "Reverse strand annotation") and the sequence display figure ("personal annotation in sequence"). To delete personal annotations in the map display, mouse-select a box around the annotation you wish to delete (this may be easier to do if you zoom in first). The annotation will disappear from the display; however, it is not permanently gone until you hit Save Annotations. (If you quit the browser without saving, you will be asked if you want to save your personal annotations.) There is no "undo" function for deletion, but if you mistakenly delete an annotation, you can use "Reload" to reload all the personal annotations that have been saved in the database (of course, any new ones you have not yet saved will be lost).
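The delete/save/reload behavior described above amounts to keeping two copies of your personal annotations: the saved copy in the database and the working copy on the display. The Python sketch below models that. The class and method names are hypothetical. It is also an assumption that an annotation counts as selected for deletion only when it lies entirely inside the mouse-selected box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    label: str
    color: str = "forestgreen"


class PersonalAnnotations:
    """Saved copy (the database) versus working copy (the display)."""

    def __init__(self, saved=()):
        self._saved = list(saved)
        self.current = list(saved)

    def add(self, annotation):
        self.current.append(annotation)

    def delete_in_box(self, left, right):
        # Assumption: only annotations lying entirely inside [left, right] go.
        self.current = [a for a in self.current
                        if not (left <= a.start and a.end <= right)]

    def save(self):
        """Like Save Annotations: make the working copy permanent."""
        self._saved = list(self.current)

    def reload(self):
        """Like Reload: restore the saved copy, dropping unsaved changes."""
        self.current = list(self._saved)

    def has_unsaved_changes(self):
        return self.current != self._saved
```

Note that reload() undoes a deletion and also discards any unsaved additions, just as the text warns.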

Because personal annotations in the sequence display may overlap, the procedure for deleting them is slightly different. If you select Delete Sequence Annotation, a window will pop up showing the positions and labels of the sequence annotations. If you single-click on one of the annotations, that annotation will be highlighted in cyan on the sequence display. Double-clicking, or pressing the Delete button, will delete that annotation. The sequence display window will vanish and reappear without the deleted annotation.

Primer selection

To help the user design primers for a region of interest, Genotator can call Primer3, a primer selection program developed at the Whitehead Institute. First select a sequence region, either by mousing out a region in the map or sequence display or by clicking on an annotation in the map display. Then select "Design Primers" from the menu, and change any of the default Primer3 options if desired. Once the user is satisfied with the option settings, the best forward and reverse primers are printed to the terminal (so that they can be cut and pasted into a primer order form) and are also indicated in the sequence display, as shown below.
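Primer3's command-line program, `primer3_core`, reads its input from standard input as Boulder-IO records. Each record is a series of TAG=value lines, ended by a line containing only `=`. The Python sketch below builds such a record for a selected region. It is not how Genotator itself calls Primer3, and several details are assumptions:

- The tag names are those of current Primer3 2.x releases. Older releases, contemporary with Genotator, used names such as `SEQUENCE` and `TARGET` instead.
- Positions are 0-based, following Primer3's default `PRIMER_FIRST_BASE_INDEX`.
- The product size range is an arbitrary example value.

```python
def boulder_record(seq_id, template, target_start, target_len,
                   product_size="100-300"):
    """Build one Boulder-IO record asking Primer3 to flank a target region.

    target_start is 0-based; the target must lie within the template.
    """
    if target_len < 1 or target_start < 0 or target_start + target_len > len(template):
        raise ValueError("target lies outside the template")
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        f"SEQUENCE_TARGET={target_start},{target_len}",
        f"PRIMER_PRODUCT_SIZE_RANGE={product_size}",
        "=",  # a line holding only '=' ends the record
    ]
    return "\n".join(lines) + "\n"
```

The resulting string could then be handed to Primer3. For example: `subprocess.run(["primer3_core"], input=record, text=True, capture_output=True)`.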


Which gene finders are displayed?

Genotator knows how to interact with six different gene finding programs (Grail, GeneFinder, Genie, GENSCAN, xpound, and GeneMark). The current default behavior is for Genotator to use the first four of those; however, this is configurable. If you want to change which gene finders the back end calls, you can edit the back end program (genotator) and change the line that says
@genefinders = ("GRAIL grail", "GeneFinder genefinder", "Genie genie", "GENSCAN genscan");
(The first thing in each pair is the "pretty" name by which the gene finder will be described to the user; the second is the name of the function that calls that gene finder.) For example, if you wanted Genotator to run GeneMark instead of Genie, you could change the line to say:
@genefinders = ("GRAIL grail", "GeneFinder genefinder", "GeneMark genemark", "GENSCAN genscan");
When the browser is invoked on an annotated sequence, it dynamically creates a row for any gene finder that was run--if it can't find output from a given gene finder, it doesn't create a row for it. The browser has a default order for displaying the gene finder results (Genie, GRAIL, GeneFinder, GENSCAN, xpound, GeneMark). To change the order, you can edit the code somewhere around line 466 of genotator-browser.
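To make the pairing and the display-order rule concrete, here is a small Python model. It is hypothetical, not code from genotator or genotator-browser. It does two things:

- splits entries of the @genefinders form into (pretty name, function name) pairs;
- lists the rows the browser would create, given the set of gene finders whose output it found, using the default order given above.

Matching output to rows by pretty name is an assumption.

```python
# Browser's default top-to-bottom order for gene finder rows (from this manual).
DEFAULT_ORDER = ["Genie", "GRAIL", "GeneFinder", "GENSCAN", "xpound", "GeneMark"]


def parse_genefinders(entries):
    """Split entries like "GRAIL grail" into (pretty name, function name) pairs."""
    pairs = []
    for entry in entries:
        pretty, function = entry.split()  # ValueError unless exactly two words
        pairs.append((pretty, function))
    return pairs


def browser_rows(found_output):
    """Gene finder rows the browser would create: default order, output only."""
    return [name for name in DEFAULT_ORDER if name in found_output]
```

With the default back-end list, browser_rows returns Genie, GRAIL, GeneFinder, GENSCAN in that order. So Genie appears above GRAIL even though GRAIL comes first in @genefinders.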

Genotator installation requirements

Genotator relies on a number of sequence analysis programs and other utilities. need.html has instructions on obtaining all of the programs Genotator needs. Some are included with the Genotator distribution; for others, you will have to contact the relevant authors.

Copyright (c) 1996-1998 The Regents of the University of California.
Genotator was written by Nomi Harris (nlharris@lbl.gov).

This document was updated for Genotator version 1.21 and Genotator browser version 1.65.
Last modified: May 3 2000