Genome Annotation Assessment Project - GASP1

Community Wide Experiment to Assess Gene Prediction on Long Eukaryotic Genomic Sequences: The Adh region (2.9 Mbases) in Drosophila melanogaster

Martin Reese, Nomi Harris, George Hartzell, Uwe Ohler and Suzanna Lewis

Drosophila Genome Center
Department of Molecular and Cell Biology
539 Life Sciences Addition
University of California, Berkeley
Berkeley, CA 94720-3200

The experiment has been renamed "Genome Annotation Assessment Project" (GASP1).

The goal of this experiment is to obtain an in-depth and objective assessment of the current state of the art in gene and functional site predictions in genomic DNA. To this end, participants will predict as much as possible about a sample genomic region that has been studied intensively in the past. All participants will be provided with datasets that can be used to help make predictions or to train computational methods. There will be no winners or losers. We are interested in seeing what level of genome annotation is achievable when the community works together. Results of the experiment will be made available through this web site after the ISMB '99 meeting.

The sample sequence provided for the community to annotate is 2.9 Mb of genomic DNA recently completed by the Berkeley Drosophila Genome Center. This region has been studied and annotated extensively by Drosophila researchers. The analysis of the region has been submitted for publication (M. Ashburner, S. Misra, J. Roote, S. Lewis, R. Blazej, T. Davis, C. Doyle, R. Galle, R. George, N. Harris, D. Harvey, L. Hong, K. Houston, R. Hoskins, C. Martin, A. Moshrefi, M. Palazzolo, A. Spradling, G. Tsang, K. Wan, K. Whitelaw, B. Kimmel, S. Celniker and G. M. Rubin). We will compare the annotations made by participants in this experiment with the annotations in the Ashburner et al. paper.

About the annotation experiment
Data sets
    Sequence data, including the sample 2.9 Mbase Drosophila melanogaster sequence
    Standard data sets, released 7/30/99 and used for evaluating participants' submissions for the ISMB '99 annotation experiment.
    Transposon data, containing the 17 transposable elements in the Adh region (GFF format)
    Twelve groups participated in the annotation experiment.

    Annotations (All.gff) from all 12 submitting groups plus the Standard data set annotations (GFF format). Annotations included coding sequences, splice sites, start codons, stop codons, untranslated exons, promoter elements including the transcription start site, repeat elements, homology assignments, EST/cDNA alignments and gene function associations.
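Because the annotation and transposon files are distributed in GFF format, a short script is enough to tally which kinds of features each group submitted. The following is a minimal sketch, assuming the files follow the usual tab-separated GFF layout (seqname, source, feature, start, end, score, strand, frame, and an optional group field). The sample records, the source names "groupA", "groupB" and "standard", and the feature labels are hypothetical and are not taken from All.gff.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class GffRecord(NamedTuple):
    seqname: str
    source: str   # assumed to identify the submitting group or program
    feature: str  # e.g. a coding sequence or splice site label
    start: int
    end: int
    score: str    # kept as text, since "." marks a missing value
    strand: str
    frame: str
    group: str    # optional ninth column; empty string when absent


def parse_gff(lines: Iterable[str]) -> Iterator[GffRecord]:
    """Yield one record per data line, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
        group = fields[8] if len(fields) > 8 else ""
        yield GffRecord(fields[0], fields[1], fields[2],
                        int(fields[3]), int(fields[4]),
                        fields[5], fields[6], fields[7], group)


def count_features(records: Iterable[GffRecord]) -> Counter:
    """Count records per (source, feature) pair."""
    return Counter((r.source, r.feature) for r in records)


# Hypothetical records for illustration only.
SAMPLE = "\n".join([
    "## hypothetical sample, not real GASP1 data",
    "\t".join(["Adh", "groupA", "CDS", "100", "250", ".", "+", "0", "gene1"]),
    "\t".join(["Adh", "groupA", "CDS", "400", "520", ".", "+", "1", "gene1"]),
    "\t".join(["Adh", "standard", "CDS", "100", "250", ".", "+", "0"]),
    "\t".join(["Adh", "groupB", "splice5", "251", "252", "0.9", "+", "."]),
])

if __name__ == "__main__":
    for (source, feature), n in sorted(count_features(parse_gff(SAMPLE.splitlines())).items()):
        print(source, feature, n)
```

To run the same tally on a downloaded file, pass an open file handle to parse_gff in place of SAMPLE.splitlines(). The column interpretation should be checked against the file itself before relying on the counts.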

