Sequence Data for the Genome Annotation Experiment

Sequence to be annotated

The sequence to be annotated is a finished, high-quality Drosophila melanogaster genomic DNA contig. The contig lies on chromosome 2 (arm 2L) and is 2.9 Mb long. It is generally referred to as the Adh region because it contains the alcohol dehydrogenase ("Adh") gene. A detailed analysis by our center found over 200 genes, some corresponding to previously known Drosophila genes and others novel.

Adh sequence in FASTA format: FTP:Adh.fa
gzipped: Adh.fa.gz
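The contig is distributed as FASTA, both plain and gzipped. The following is a minimal Python sketch for loading either file into memory and checking the contig length. It assumes only the standard FASTA layout (a ">" header line followed by sequence lines). The function name read_fasta is hypothetical and not part of any distributed tool.

```python
import gzip
from typing import Dict, List


def read_fasta(path: str) -> Dict[str, str]:
    """Read a FASTA file (plain or .gz) into a {record_id: sequence} dict.

    Hypothetical helper: the record id is taken to be the first
    whitespace-delimited token of each ">" header line.
    """
    # gzip.open and the built-in open both accept text mode "rt".
    opener = gzip.open if path.endswith(".gz") else open
    records: Dict[str, List[str]] = {}
    current = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Header line: start a new record.
                current = line[1:].split()[0] if len(line) > 1 else ""
                records[current] = []
            elif current is not None:
                # Sequence line: normalise to upper case and accumulate.
                records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}
```

For example, iterating over read_fasta("Adh.fa.gz").items() and printing len(seq) for each record should report a length of roughly 2.9 Mb for the contig. The whole sequence is held in memory, which is unproblematic at this size.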

Additional Drosophila melanogaster sequence sets

The following data sets are provided to facilitate tool training on specific Drosophila melanogaster sequences.
1. Curated Drosophila nuclear DNA "coding sequences" (CDS)
This data set is provided by Takis Benos (EBI), Leyla Bayraktaroglu (Harvard) and Michael Ashburner (EBI & Cambridge) with help from Aubrey de Grey (Cambridge), Joe Chillemi (Harvard) and Martin Reese (LBNL). Additional information on this data set: README.v2.7. This data set is being actively updated, and its version number may change during the experiment. Version 1.5 of this data set was used to train the EBI Genefinder.

FTP:nuclear_cds_set.embl.v2.8.5.Z
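The curated CDS, repeat and transposon sets are EMBL flat files compressed with Unix compress (.Z). Python's gzip module cannot read .Z files, so the sketch below assumes the file has first been decompressed on the command line, for example with uncompress or gzip -d (gzip can also decompress .Z files). It is a minimal reader that keeps only the entry name from the ID line and the sequence from the SQ block, relying on the standard EMBL line codes (ID, SQ, and // as the entry terminator). The function name parse_embl is hypothetical.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def parse_embl(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (entry_name, sequence) pairs from an EMBL flat file.

    Hypothetical minimal reader: feature tables and annotation lines
    are ignored; only the ID name and the SQ sequence block are kept.
    """
    name: Optional[str] = None
    seq_parts: List[str] = []
    in_seq = False
    for line in handle:
        if line.startswith("ID"):
            # ID line: the entry name is the first token after the code,
            # possibly followed by ";" in newer-style ID lines.
            tokens = line[2:].split()
            name = tokens[0].rstrip(";") if tokens else ""
            seq_parts, in_seq = [], False
        elif line.startswith("SQ"):
            in_seq = True  # sequence data starts on the next line
        elif line.startswith("//"):
            # End of entry: emit what was collected.
            if name is not None:
                yield name, "".join(seq_parts).upper()
            name, seq_parts, in_seq = None, [], False
        elif in_seq:
            # Sequence lines hold blocks of bases plus a trailing position count.
            seq_parts.extend(tok for tok in line.split() if not tok.isdigit())
```

After decompressing, the entries can be streamed with a loop such as for name, seq in parse_embl(open("nuclear_cds_set.embl.v2.8.5")). Using a generator avoids loading the whole set at once.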

2. Curated Drosophila genomic DNA data (416 gene sequences)
This data set is provided by Martin Reese (LBNL) with help from Uwe Ohler (University of Erlangen), David Kulp (UCSC) and Andrew Gentles (Stanford). Additional information on this data set: README.

3. Drosophila 5' and 3' splice sites (from 275 unrelated multi-exon genes)
This data set is provided by Martin Reese (LBNL). Additional information on this data set: README.
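The splice-site set is intended for training signal detectors. A common first step is to summarise aligned site windows as a position frequency matrix. The sketch below assumes the sites have already been loaded as equal-length strings aligned on the splice junction; the actual file layout is described in the README and is not reproduced here. The function name and the pseudocount parameter are hypothetical illustrations, and the same approach could be applied to the start codon and promoter sets.

```python
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def position_frequency_matrix(
    sites: List[str], pseudocount: float = 0.0
) -> List[Dict[str, float]]:
    """Per-column base frequencies for equal-length, aligned site windows.

    Hypothetical helper: ambiguous bases (e.g. N) are not counted, and an
    optional pseudocount is added to each of A, C, G and T.
    """
    if not sites:
        raise ValueError("no sites given")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all site windows must have the same length")
    matrix = []
    for col in range(width):
        counts = Counter(s[col].upper() for s in sites)
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        matrix.append(
            {b: ((counts[b] + pseudocount) / total if total else 0.0) for b in BASES}
        )
    return matrix
```

For 5' splice sites, the columns at the start of the intron should be dominated by G and T, reflecting the canonical GT donor dinucleotide. A small pseudocount keeps unseen bases from receiving zero probability when the matrix is later converted to log-odds scores.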

4. Drosophila start codon sites (from 275 unrelated multi-exon genes)
This data set is provided by Martin Reese (LBNL). Additional information on this data set: README.

5. Drosophila promoter sites (256 unrelated regions around the transcription start site)
This data set is provided by Uwe Ohler (University of Erlangen) and Martin Reese (LBNL). Additional information on this data set: README.

6. Drosophila repeat sequences
This data set is provided by Takis Benos (EBI), Leyla Bayraktaroglu (Harvard) and Michael Ashburner (EBI & Cambridge) with help from Aubrey de Grey (Cambridge), Joe Chillemi (Harvard) and Guochun Liao (UCB). Additional information on this data set: README.v2.7.

FTP:repeat_sequence_set.embl.v2.1.Z

7. Transposon site sequences
This data set is provided by Takis Benos (EBI), Leyla Bayraktaroglu (Harvard) and Michael Ashburner (EBI & Cambridge) with help from Aubrey de Grey (Cambridge), Joe Chillemi (Harvard) and Guochun Liao (UCB). Additional information on this data set: README.v2.7.

FTP:transposon_sequence_set.embl.v3.7.Z

8. Drosophila cDNA sequences
This data set was collected by Erwin Friese (UCB). Additional information on this data set: README.

FTP:na_gb.dros.cDNA.unique.fa.gz

9. Drosophila EST sequences
This data set is provided by the EST sequencing project at BDGP. Additional information on this data set: EST group.

FTP:na_EST.dros.fa.Z


Last modified: Sun Dec 26 15:07:36 PST 1999