We recently reported the analysis of >80,000 ESTs to select a set of 5,849 non-redundant cDNA clones (Rubin et al., Science 287: 2222). This is not a complete set for the Drosophila genome, which is reported to have 13,601 genes, but more cDNA clones will be added to later versions of the Drosophila Gene Collection (DGC).
To facilitate the widespread distribution of these cDNA clones to the community, we will donate copies of the collection to 50 laboratories selected by the National Drosophila Board. The 4,534 clones in the pOT vector will be distributed before the end of June; the remaining clones, in the pBluescript vector, will follow a few weeks later. These clones will also be made available to commercial distributors. As soon as these laboratories and commercial distributors are determined, we will post their contact information on this site. Individual laboratories that wish to obtain a copy should read the letter from the Fly Board. Commercial distributors that wish to obtain a copy should contact cDNA@fruitfly.berkeley.edu.
We have also updated the table of Predicted genes grouped by sequence similarity. These sets of similar genes, created by BLASTing all of the fly transcript sequences against each other, can be browsed by molecular function.
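As a rough illustration of how all-against-all hits can be turned into sets of similar genes, the sketch below uses single-linkage grouping: any two genes connected by a chain of hits end up in the same set. This is an assumption made for illustration, not a description of the procedure actually used to build the table. The function name, gene identifiers, and hit pairs are all hypothetical, and the hits are assumed to have already been filtered by whatever similarity cutoff is appropriate.

```python
from collections import defaultdict


def group_by_similarity(gene_ids, hits):
    """Single-linkage grouping: genes linked by any chain of hits share a set.

    gene_ids: iterable of gene identifiers.
    hits: iterable of (query, subject) pairs that passed a similarity cutoff.
          Every identifier in hits is assumed to appear in gene_ids.
    """
    # Union-find: each gene starts as the root of its own set.
    parent = {g: g for g in gene_ids}

    def find(g):
        # Walk up to the root, shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups = defaultdict(list)
    for g in parent:
        groups[find(g)].append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical identifiers and hits, for illustration only.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
hits = [("geneA", "geneB"), ("geneB", "geneC"), ("geneD", "geneD")]
groups = group_by_similarity(genes, hits)
for members in groups:
    print(members)
```

Single linkage is the simplest choice but can chain loosely related genes together through multi-domain proteins; a stricter cutoff or a different clustering rule may be preferable in practice.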
The BDGP and Celera Genomics report the sequencing and annotation of the euchromatic genome of Drosophila melanogaster. The results appear in the March 24 issue of the journal Science.
The annotation was done in a unique collaboration among Celera, BDGP, and other members of the scientific community. The results of the annotation are stored in GadFly, the FlyBase Genome Annotation Database of Drosophila. This new database and the chromosome arm sequence can be queried by gene name, cytological region, molecular function, or protein domain. The annotated genome can also be browsed graphically with our new Java display tool, GeneScene.
The genomic sequence and annotations are preliminary. Over the next year, the BDGP plans to finish the sequence systematically to high quality and, together with FlyBase, to refine and improve the annotations.
At this time, ~92% of the genome is in contigs larger than 30kb, and ~78% is in contigs larger than 100kb; most gaps are small (3kb or less) and are due to genomic repeats, such as transposons. These contigs are ordered and oriented with respect to the genome: >95% of the euchromatic sequence is in 14 large scaffolds and is freely available on our sequence download page. The sequences of all the scaffolds (including heterochromatin), predicted transcripts, and predicted proteins are also available on that page.
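For readers who want to compute comparable statistics on downloaded sequence, the following minimal sketch shows one way to calculate the fraction of bases that lie in contigs longer than a threshold. It assumes the denominator is the total assembled length; the percentages quoted above may have been computed against a different estimate of genome size. The function name and the contig lengths are made up for illustration.

```python
def fraction_in_large_contigs(contig_lengths, min_length):
    """Fraction of all assembled bases found in contigs longer than min_length."""
    total = sum(contig_lengths)
    if total == 0:
        return 0.0
    large = sum(n for n in contig_lengths if n > min_length)
    return large / total


# Made-up contig lengths in base pairs, for illustration only.
lengths = [250_000, 120_000, 40_000, 20_000, 3_000]

for threshold in (30_000, 100_000):
    share = fraction_in_large_contigs(lengths, threshold)
    print(f"contigs > {threshold // 1000}kb hold {share:.1%} of assembled bases")
```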
The exon-intron structure of annotations will often be incorrect initially, but full-length sequencing of cDNAs corresponding to these genes should provide the correct gene structures. We recommend that you repeat sequence similarity searches yourself using BLAST.
Because the sequence will be in flux, we ask that you record short molecular sequence tags, e.g., 30bp of unique sequence from your region of interest, rather
than recording the coordinates of a region based on absolute numbers.
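To illustrate why a short sequence tag is more durable than a coordinate, the sketch below looks up a 30bp tag by exact match on both strands of a sequence. When bases are inserted upstream in a later release, the coordinate shifts but the tag still finds the region. The function names, the tag, and the two toy "releases" are hypothetical; a real lookup would more likely use BLAST, which also tolerates sequence corrections inside the tag.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def locate_tag(sequence, tag):
    """Return sorted (0-based start, strand) pairs for every exact match of tag."""
    sequence, tag = sequence.upper(), tag.upper()
    found = []
    for strand, query in (("+", tag), ("-", reverse_complement(tag))):
        start = sequence.find(query)
        while start != -1:
            found.append((start, strand))
            start = sequence.find(query, start + 1)
    return sorted(found)


# Hypothetical 30bp tag and two toy "releases" of the same region.
tag = "ATGGCCATTGTAATGGGCCGCTGAAAGGGT"
old_release = "CCCC" + tag + "TTTT"
new_release = "CCCC" + "ACGTACGTAC" + tag + "TTTT"  # 10 bases inserted upstream

print(locate_tag(old_release, tag))
print(locate_tag(new_release, tag))
```

A tag that matches more than once is not unique enough to identify the region, so it is worth checking that exactly one position is returned before recording the tag.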
Our efforts will be greatly facilitated if the public reports changes to the sequence and annotation to us using our new Update Form; we will make these comments public in the annotation reports.
We are aware that many laboratories have sequences of genes that have not yet been submitted to the nucleotide sequence databases. We encourage you to submit these sequences. This will have three advantages. First, you will get the credit for having identified and sequenced the gene first! Second, we will be able to include the identification of these genes on the annotated sequence when it is published. Third, if your sequence is of a cDNA, it will help us get the correct gene structure.
The BDGP has sequenced a complete 2.9-megabase region of the
Drosophila melanogaster genome and exhaustively analyzed it. The
complete results are available on this page, as well as a
Java applet for interactive browsing of the annotations on the Adh region.
The BDGP and Celera Genomics are working together to complete the sequence
of the Drosophila melanogaster genome. Celera has produced whole-genome
shotgun data to an average depth of 10X coverage. In addition to 26.5 Mb of
completed genome sequence, the BDGP is producing a BAC-based genome
physical map and defining a tiling path of overlapping BAC and P1 clones
spanning the euchromatic portion of the genome. BDGP is generating
low-coverage (~1.5X) shotgun sequence of each BAC and P1 clone in the
tiling path that has not already been sequenced to greater coverage or
completion. This "scaffold sequence" is being used to assist in assembly
and finishing of the whole-genome shotgun data.
These new tables show work in progress and will be updated regularly.
Results of the community-wide experiment to assess gene prediction on long eukaryotic genomic sequences have been published and are available on our web site. They were presented at ISMB99.
The BDGP and Celera Genomics have signed a Memorandum of Understanding (MOU) outlining how they will work together on sequencing the Drosophila genome. This collaboration was first announced in the June 5th issue of Science. The goal of this collaboration is to produce a high-quality, publicly available sequence of Drosophila euchromatin by the end of this year.
This page last updated on: 11/9/04
Please send comments or questions about the web site to bdgp@fruitfly.org