A P1-based Physical Map of the Drosophila Euchromatic Genome
Genome Research 6:414-430, 1996
William Kimmerly*†, Karen Stultz*†, Suzanna
Lewis*§, Keith Lewis*†, Veronica Lustre*†, Raul
Romero*†, Julie Benke*†, Dan Sun*†, Gail
Shirley*†, Chris Martin*†, Michael
Palazzolo*†§
*Drosophila Genome Center, †Human Genome Center of the
Lawrence Berkeley National Laboratory
§Department of Molecular and Cell Biology, University of
California, Berkeley, 94720.
William Kimmerly's current address is: Human Genetics Department,
Glaxo Wellcome Research Institute, 5 Moore Dr., Research Triangle
Park, NC 27709
Please send all correspondence to: Michael J. Palazzolo, Human
Genome Center, Lawrence Berkeley National Laboratory, Building 74,
Berkeley, CA 94720, Tel: 510-486-6932, Fax: 510-486-6816
ABSTRACT
A PCR-based sequence-tagged site (STS) content mapping strategy has
been used to generate a physical map with 90% coverage of the 120-Mb
euchromatic portion of the Drosophila genome. To
facilitate map completion, the bulk of the STS markers was chosen
in a nonrandom fashion. To ensure that all contigs were localized
in relation to each other and the genome, these contig-building
procedures were performed in conjunction with a large-scale in situ
hybridization analysis of randomly selected clones from a
Drosophila genomic library that had been generated in a P1
cloning vector. At this point, the map consists of 649 contigs with
an STS localized on average every 50 kb. This is the first whole
genome that has been mapped based on a library constructed with
large inserts in a vector that is maintained in Escherichia coli
as a single-copy plasmid.
INTRODUCTION
The polytene chromosomes of Drosophila served as the
original physical map (Bridges 1935). They allowed technical
approaches to correlate the genetic map with physical locations on
the chromosomes. With the development of in situ hybridization
procedures, the polytene chromosomes became an organizing resource
that helped catalogue information concerning cloned sequences
(Pardue et al. 1977). The polytene chromosomes also played
facilitating and validating roles in the initiation and error
correction of the initial positional cloning experiments (Bender et
al. 1984).
The development of clone-based physical maps in other organisms
identified some of the limitations of the polytene chromosomes
(Coulson et al. 1986; Olson et al. 1986; Kohara et al.
1987). The direct access to mapped genomic regions
in the clone-based maps eliminated the laborious and iterative
procedures of chromosome walking. In addition, with the ambition
toward whole genome sequencing, the clone-based maps promise to
serve as intermediate sources of templates. To facilitate the
experiments of the Drosophila research community and as a
prelude to large-scale genomic sequencing, we set out to construct
a physical map of the Drosophila genome.
All those attempting physical mapping experiments face a similar
set of strategic choices. The first choice concerns the type of
cloning vector into which the fragments to be mapped are inserted.
Common vector choices for library construction include
λ (Kohara, Akiyama and Isono 1987), cosmids (Coulson et
al. 1986; Olson et al. 1986), P1 (Sternberg 1990), bacterial
artificial chromosomes (BACs; Kim et al. 1992), P1 artificial
chromosomes (PACs; Ioannou et al. 1994), and yeast artificial
chromosomes (YACs; Burke et al. 1987). The second choice involves
the experimental mechanisms by which the overlaps will be detected,
such as fingerprinting (Coulson et al. 1986; Olson et al. 1986;
Sulston et al. 1992), in situ hybridization (Ajioka et al. 1991),
or sequence-tagged sites (STS) content mapping (Olson et al. 1989).
If the STS method of overlap determination is chosen for
contig-building procedures, then the source of the STS probes must
be decided.
The physical map presented in this paper is based on a genomic
library (Smoller, Petrov and Hartl 1991) constructed in a P1 vector
system (Sternberg 1990). The overlaps were detected
by a PCR-based STS content-mapping strategy (Olson et al. 1989;
Green and Olson 1990) in which the STSs were derived from the ends
of the cloned P1 inserts that were selected from the library for
STS generation in a nonrandom fashion (Palazzolo et al.
1991).
It is important to note that this work was done as part of a larger
collaboration within the Drosophila Genome Center (DGC).
Other laboratories in the DGC include those of Gerald Rubin, Allan
Spradling, and Daniel Hartl. It was possible to take advantage of
other work in the center to provide experimental mechanisms to
connect the contig-building experiments both to chromosomal
locations and to the genetic map of Drosophila while
simultaneously developing tools for error identification and
correction. While the work described in this paper was being
initiated, Hartl's laboratory was generating a framework map of the
genome by using a large number of randomly selected P1 clones as a
substrate for in situ hybridization to the Drosophila
polytene chromosomes (Hartl et al. 1994). By using cytogenetically
mapped P1 clones as sources for STS markers, all contigs were
automatically assigned to specific euchromatic genomic locations.
In addition, inconsistencies between the map based on in situ
hybridization and the results derived from STS content mapping can
be identified immediately and used to direct subsequent error
correction experiments.
To provide a more direct link between the physical map and the
genetic map, the organized set of P1 clones in this physical map is
being used for additional STS content-mapping experiments using STS
markers that are also genetically mapped. Hartl's group is using
known genetic markers that have been cloned and sequenced by the
research community to design PCR primers for STS
mapping assays. In addition, Rubin and Spradling have collected a
large number of fly lines, each with an independently generated
P-element transposition event as part of a large-scale gene
disruption experiment (Spradling et al. 1995). These P-elements not
only cause mutations but also carry easy-to-score dominant eye
color mutations. The sites of P-element insertion are being used as
sources of STS probes in an effort to increase the density of
markers that link the genetic and physical maps.
Taken together, these reagents promise to provide a useful set of
tools for the Drosophila research community. The bulk of
the euchromatic genome is now represented by mapped P1 contigs that
have been localized to the genome and related to each other using
molecular and genetic methods. Finally, large-scale sequencing of
the genome is under way using these P1-based contigs as a source
for template generation.
RESULTS
The mapping project has passed through three stages: The first was
the preliminary acquisition of resources; the second phase was the
establishment of the production environment; and the third stage
was the process of map building. Each stage is described in detail
below.
Stage 1: Acquisition Of Resources
The project was initiated with the acquisition of three resources.
The first of these was a multihit P1 library that was constructed
using genomic DNA prepared from adult flies (Smoller, Petrov and
Hartl 1991). The second tool was a P1-based framework map generated
by an in situ hybridization analysis of a large number of randomly
selected P1 clones (Hartl et al. 1994). The third tool was the
implementation of a strategy that moves rapidly toward complete
coverage with a minimum number of mapped STS markers (Palazzolo et
al. 1991).
Construction Of A Bacteriophage P1 Genomic Library With Drosophila Inserts.
A Drosophila melanogaster genomic library was constructed
(Smoller et al. 1991) using genomic DNA isolated from a mixture of
both male and female flies from the isogenic strain iso1
(y; cn bw sp). Recombinant P1 clones were generated in two
separate ligation experiments using two similar P1 cloning vectors,
pNS583tet14Ad10 and pAd10sacBII (Fig. 1). A total of 3840 clones
were isolated from the ligation of insert with the former vector. A
second library, containing >18,000 clones, was generated in the
latter vector. The average insert size of clones in the two
libraries has been determined to be slightly larger than 80 kb by
restriction digestion and contour-clamped homogeneous electric
field (CHEF) gel analysis of plasmid DNA prepared from a random
sample of clones.
Figure 1. Two types of P1 vectors. The P1 mapping library
consists of clones generated in two related P1 cloning vectors. As
described in the text, 3,840 clones were inserted into the vector
known as pNS583tet14Ad10 and 5,376 clones were from the ligation
mixture using the vector called pAd10sacBII. Both vectors contain
genes that allow resistance to the antibiotic kanamycin, a plasmid
origin of replication that provides for a single copy, a lytic
origin of replication that can be induced by IPTG to allow 20-fold
amplification of the plasmid, a loxP site (the cis-acting
site-specific recognition signal for the P1 recombinase), and a
gene that allows resistance to tetracycline. Inserts cloned in the
vector pNS583tet14Ad10 have been introduced into the BamHI site in
the gene allowing tetracycline resistance. A sacBII cassette was
introduced by Sternberg (1990) as a means of selecting for clones
with inserts. The cloned inserts in this vector system are
introduced between SP6 and T7 RNA polymerase promoters that are
positioned upstream of the Bacillus subtilis genes found in the
sacBII cassette. The sequences between the loxP sites (containing
the Ad10 sequences) are deleted during library construction and
thus are not present when the clones are mapped and eventually
sequenced.
A subset of the clones was then selected for the mapping
experiments. This subset included all the clones from the first
ligation and 5376 clones from the second ligation. These 9216
clones were arrayed in 96-well microtiter plates and provide an
estimated five- to sixfold coverage of the genome. Assuming random
cloning efficiency for each region, it is reasonable to expect that
the assignment of each of these clones to mapped contigs would
provide coverage approaching 99% of the euchromatic genome. The
remaining clones are being held in reserve and will likely be used
in map closure experiments that aim towards mapping
under-represented regions.
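The coverage estimate above can be checked with a short calculation. The sketch below is not the authors' method: it assumes the simple Poisson (Clarke-Carbon) model in which inserts are placed uniformly at random, rounds the insert size to 80 kb, and uses the 120 Mb euchromatic genome size quoted in the abstract.

```python
import math

GENOME_KB = 120_000      # euchromatic genome size from the abstract (120 Mb)
INSERT_KB = 80           # rounded; the text says "slightly larger than 80 kb"
N_CLONES = 3840 + 5376   # clones taken from the two ligations

def fold_coverage(n_clones, insert_kb, genome_kb):
    """Total cloned DNA divided by the size of the target genome."""
    return n_clones * insert_kb / genome_kb

def expected_fraction_covered(fold):
    """Poisson model: chance that a given base lies in at least one clone."""
    return 1.0 - math.exp(-fold)

fold = fold_coverage(N_CLONES, INSERT_KB, GENOME_KB)
print(f"clones: {N_CLONES}, raw fold coverage: {fold:.2f}")
for c in (5.0, fold):
    frac = expected_fraction_covered(c)
    print(f"{c:.2f}-fold -> expected fraction in clones: {frac:.4f}")
```

Under these assumptions even fivefold coverage leaves less than 1% of the genome unrepresented, which agrees with the "approaching 99%" figure. The raw calculation ignores clones derived from heterochromatin and any cloning bias, both of which lower the effective euchromatic coverage.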
A Framework Genome Map Based On In Situ Hybridization.
With the development of the Drosophila P1 library, the
Hartl laboratory embarked on the generation of a map based on the
cytogenetic localization of 2653 clones randomly selected from the
mapping library (Hartl et al. 1994). This mapping resource provided
three advantages: First, it tested the quality and randomness of
the P1 library; second, it provided rapid P1-based coverage of ~80%
of the genome to the Drosophila research community; and
third, it established a large set of clones that could be used to
generate an STS content map in which every third clone had already
been localized to the genome by an independent experimental
approach.
The results from Hartl's laboratory suggest that the clones
analyzed by in situ hybridization can be distributed into two
different categories. The larger class contained 2317 members, each
of which hybridized strongly to a single (or occasionally two or
three) euchromatic sites. These results are compatible with the
view that the bulk of the genome is organized into unique
euchromatic regions. The presence of multiple hybridization
signals, with one of the signals of significantly greater strength,
is compatible with what might be obtained if a clone used as a
hybridization probe carries one or more dispersed repetitive
elements (a situation that should be common in a library that
represents the Drosophila genome). The smaller class of
clones contained 336 members, each of which hybridized to the
chromocenter (the pericentric region and the Y chromosome,
typically underreplicated in salivary gland cells) and/or multiple
strongly hybridizing euchromatic sites. Clones that hybridize to
the chromocenter are most likely to represent heterochromatic
regions of the genome.
A Non-Random Strategy For STS Selection Coupled To The Framework Map.
One advantage of using a PCR-based STS content mapping strategy for
the construction of physical maps is that the data from the ongoing
project can be used in an effective fashion to guide the project
towards completion. Specifically, computer simulations suggest that
if, as the mapping project progresses, STSs are derived exclusively
from the ends of the shrinking set of mapping clones that have not
yet been assigned to contigs, then complete coverage is achieved
through the mapping of three- to fourfold fewer STSs than if the
STSs are selected randomly (Palazzolo et al. 1991). Experiments that
coupled this nonrandom STS strategy with the framework map based on
in situ hybridization analysis promise to lead directly to a map
that provides nearly complete coverage of the euchromatic genome in
which all contigs are oriented relative to one another and to
specific locations on the polytene chromosome map.
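The contrast between random and nonrandom STS selection can be illustrated with a toy simulation. This is a minimal sketch, not a reproduction of the simulations in Palazzolo et al. (1991). It assumes a linear genome, equal-sized inserts placed uniformly at random, error-free screening, and it treats "every clone contains at least one mapped STS" as a proxy for complete coverage. The function names and parameter values are hypothetical.

```python
import random

def make_library(n_clones, genome_kb, insert_kb, rng):
    """Place clone inserts uniformly at random on a linear genome."""
    return [rng.uniform(0, genome_kb - insert_kb) for _ in range(n_clones)]

def count_sts(starts, insert_kb, nonrandom, rng):
    """Count STSs mapped before every clone contains at least one STS."""
    unassigned = set(range(len(starts)))
    n_sts = 0
    while unassigned:
        # Nonrandom: the source clone comes only from clones not yet hit.
        # Random: the source clone is drawn from the whole library.
        candidates = sorted(unassigned) if nonrandom else range(len(starts))
        source = rng.choice(candidates)
        # The STS sits at the left or right end of the source insert.
        pos = starts[source] + rng.choice((0.0, insert_kb))
        n_sts += 1
        # Screening the library finds every clone whose insert spans the STS.
        hits = {i for i in unassigned
                if starts[i] <= pos <= starts[i] + insert_kb}
        unassigned -= hits
    return n_sts

rng = random.Random(1)
INSERT_KB = 80.0
# 300 clones of 80 kb on a 4 Mb genome gives sixfold coverage.
starts = make_library(n_clones=300, genome_kb=4000.0,
                      insert_kb=INSERT_KB, rng=rng)
random_count = count_sts(starts, INSERT_KB, nonrandom=False, rng=rng)
nonrandom_count = count_sts(starts, INSERT_KB, nonrandom=True, rng=rng)
print(f"random end selection:    {random_count} STSs")
print(f"nonrandom end selection: {nonrandom_count} STSs")
```

In the nonrandom mode each new STS is guaranteed to assign at least its own source clone, so the count can never exceed the number of clones. A single run is only illustrative; averaging over many seeds would be needed before comparing the ratio with the published estimate.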
Stage 2: Establishing a production environment
To approach the construction of a whole-genome physical map, it was
important to develop a set of robust procedures that integrated the
necessary biological, automation, and computational components. The
biological requirements included effective procedures for the
preparation of PCR templates from the P1 clones, PCR-based
screens that employ an economical pooling strategy,
criteria for assessing the quality of the mapping library, and a
means to sequence the ends of the cloned P1 inserts to generate the
STS markers. The automation needs included mechanisms to acquire,
store, and retrieve rapidly the data associated with the agarose
gel analysis of the PCR-based STS assays. The computational aspects
of the project demanded that software tools be developed to manage
and organize the data involved with this iterative mapping strategy
in a logical framework, analyze the results of the mapping assays
and organize the data into an accurate representation of the
overlapping clones (or contigs), and integrate the data developed
by the other projects in the DGC.
Biological aspects of the production environment
P1 DNA pools for PCR-based STS content mapping.
The pooling strategy is a two-tiered one. The first level of
screening uses 96 plate pools. Each plate pool contains the pooled
P1 DNA from all 96 clones in a library plate. The second tier of
pools consists of eight 12-member row pools and twelve 8-member
column pools, a total of 20 secondary pools, per plate. To
facilitate library screens that provide complete clonal
identification, 2016 DNA pools were prepared. With this approach
the mapping of a typical STS marker with five hits in the library
requires 196 PCR reactions: 96 for the primary plate pool screen,
followed by five sets of 20 second-round row and column pools to
identify the five individual clones that share the STS marker (Fig.
2A).
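The pool and reaction counts in this paragraph follow from simple arithmetic, sketched below. The function names are hypothetical, and the reaction count assumes that each positive clone lies in a different plate, so that every hit needs its own set of row and column assays.

```python
PLATES = 96          # library plates, one primary pool per plate
ROWS, COLS = 8, 12   # row and column pools per 96-well plate

def total_pools(plates=PLATES, rows=ROWS, cols=COLS):
    """Primary plate pools plus the secondary row and column pools."""
    return plates + plates * (rows + cols)

def pcr_reactions(positive_plates, plates=PLATES, rows=ROWS, cols=COLS):
    """Reactions to map one STS: plate screen, then row/column follow-up."""
    return plates + positive_plates * (rows + cols)

def unpooled_reactions(plates=PLATES, rows=ROWS, cols=COLS):
    """Reactions needed if every clone were assayed individually."""
    return plates * rows * cols

print(total_pools())          # pools prepared for the whole library
print(pcr_reactions(5))       # reactions for an STS with five hits
print(unpooled_reactions())   # clone-by-clone screening, for comparison
```

Compared with assaying each of the 9,216 clones separately, the two-tier scheme reduces the work for a five-hit STS by roughly 47-fold.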
Figure 2A. Example of STS mapping data
The two images in this figure are of agarose gels stained with
ethidium bromide that are representative of the data generated in
the course of the PCR-based STS content mapping assays.
(Top) A gel of the PCR assays used to screen the first
tier of the plate pools. There are 96 plate pools, each containing
96 clones. Each of the 96 assays is loaded in an individual well
of a single triple-combed gel that can accommodate > 100 lanes.
Lane 17 in all three panels of the gel is a genomic DNA marker. In
this image seven of the plate pools are positive. (Bottom)
Second tier of the STS screens. The row-column assays are performed
only on the positive pools identified in the first round. For each
positive pool it is necessary to screen 8 row pools (containing DNA
prepared from 12 clones) and 12 column pools (containing DNA
isolated from 8 clones). The signals used to identify an individual
clone are always part of a matching pair of positive hits: one in
a given row pool and one in a given column pool. In this image,
each panel of the gel represents a row-column assay. Using the
Angel analysis program, each assay identifies a row and a
column hit lining up with the genomic marker in lane 17.
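The matching-pair logic of the second-tier screen can be sketched as follows. This is an illustration only: the well-address format and function name are hypothetical, and the actual analysis was carried out with the Angel program, whose internals are not described here.

```python
from itertools import product

ROW_LABELS = "ABCDEFGH"            # 8 row pools, 12 clones each
COL_LABELS = tuple(range(1, 13))   # 12 column pools, 8 clones each

def candidate_wells(plate, positive_rows, positive_cols):
    """Return the well addresses consistent with the positive pools."""
    for r in positive_rows:
        if r not in ROW_LABELS:
            raise ValueError(f"unknown row pool: {r!r}")
    for c in positive_cols:
        if c not in COL_LABELS:
            raise ValueError(f"unknown column pool: {c!r}")
    # A clone is identified only by a matching pair: one row, one column.
    if not positive_rows or not positive_cols:
        return []
    pairs = product(sorted(positive_rows), sorted(positive_cols))
    return [f"{plate}-{r}{c}" for r, c in pairs]

print(candidate_wells(42, {"C"}, {7}))          # one unambiguous clone
print(candidate_wells(42, {"C", "F"}, {2, 7}))  # four candidates to resolve
```

When a plate contains two positive clones in different rows and columns, the row and column pools alone yield four candidate wells, so the ambiguous candidates would need an individual follow-up assay.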
The DNA PCR template pools were generated by growing up the clones
of an individual microtiter plate in a titer tube box. Each tube
contained 0.4 ml Terrific Broth (TB) plus kanamycin. The titer tube
boxes were incubated at 37°C overnight with shaking. Each
clone was grown individually with good aeration, prior to pooling,
to avoid sib competition. Assays using templates prepared from
growth without agitation resulted in an inordinate number of
false-negative results in control experiments. The pools from the
aerated cultures were made by combining the appropriate clones. The
DNA from the large insert plasmids was obtained by following a
standard Triton-lysozyme boiling DNA preparation protocol (Smoller
et al. 1993). The amounts of purified template in each of the pools
were sufficient to allow the performance of > 15,000 STS
screens.
Pilot scale tests to assess the quality of the library and the effectiveness of the PCR-based STS assays.
Initially, the validity of the pooling and PCR mapping procedures
was tested on a small scale by building contigs that covered two
Drosophila homeobox gene clusters: the Bithorax
complex (BX-C) and the Antennapedia complex (Ant-C) (Fig.
2B). The initial approach to building a contig covering the BX-C
involved designing STS markers using published cDNA and exon
sequences from the regions that were available in GenBank. We used
cDNA sequences encoding the 5' and 3' exons of abd-A
(Karch et al. 1990), the 5'-most and 3'-most exons of Ubx
(Kornfeld et al. 1989), and an exon from abd-B (Celniker et
al. 1989) to design primers for PCR. These primer
pairs were then used to screen the P1 library to identify P1 clones
containing the STS sequences. After screening the library with
these STS markers, a contig of four P1 clones was constructed that
covered the genomic region encoding abd-A and
Ubx. However, a gap remained between this four-member
contig and a P1 containing the abd-B gene. To close this
gap, we designed two additional STS markers to the ends of two P1
clones flanking the gap. When these STS markers were used to screen
the library, an additional P1 clone was identified that contained
both STS markers. Thus the resulting minimal tiling path contig
contained six P1 clones (out of a total of 21 P1s identified by all
screens) and was defined by six STS markers. Using a similar
strategy, we next built a contig covering the Ant-C, a related
multi-gene complex located on chromosome 3R at polytene divisions
84A-B (Wakimoto et al. 1984). We designed a total of nine STS
markers to the following genes: Antp (3 exons), pb (2 exons),
ftz, lab, Scr, and bcd. These nine STS markers identified 21
clones. A subset of these P1 clones formed a minimal tiling path that
represented both Hox clusters. Clones from this minimal set were
used to generate probes for in situ hybridization assays to
the polytene chromosomes. All clones tested in this fashion were
localized in the appropriate unique genomic regions, simultaneously
verifying the quality of the library, the template pools, and the
effectiveness of the PCR-based STS mapping assays.
These fourteen P1 clones were selected for directed genomic
sequencing. The entire Bithorax complex (Martin et al.
1995) and > 95% of the Antennapedia complex have now
been sequenced at the Drosophila Genome Center. All of the
mapping and sequencing results to date are completely consistent
with the notion that the P1 clones in these contigs accurately and
faithfully represent these two extensively studied regions. These
data will be described more fully in a manuscript currently being
prepared that describes the sequencing approach that we are
using.
Figure 2B. Example of the Ant-C contig
A diagrammatic representation of the contig developed to represent
the Antennapedia complex. Only the minimal tiling path of
P1 clones is presented by the solid rectangles in the upper part of
the drawing. Each P1 clone is identified by its unique number
(DS#). The numbers below the shaded rectangles (Dm#) correspond to
the names of the STS sequences used to screen the library to
identify each of the P1 clones. In parentheses below each of the
named STSs is the number of P1 clones identified by that STS. For
example, STS Dm0073 identified six P1 clones. The scale bar is in
kbp. The drawings at the bottom of the figure represent
the known exonic sequences, from which most of the STS markers were
derived.
Developing end-specific STS markers.
A crucial aspect of using a strategy based on the acquisition of
STS probes from the ends of cloned P1 inserts is the ability to
elucidate the sequences of the inserts immediately adjacent to the
vector cloning sites. It is also important to note that successful
primer design is dramatically enhanced if the derived sequence is
of relatively high quality. With this in mind, we have attempted to
develop approaches that aim towards the single pass generation of
at least 350 bp at each end of the cloned insert with an accuracy
rate of ~ 98%.
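The link between read accuracy and primer design success can be made concrete with a rough calculation. The sketch below assumes that sequencing errors are independent and uniformly distributed along the read, and uses a 20-base primer length as an illustrative value; neither assumption comes from the text.

```python
READ_LENGTH = 350   # target single-pass read length in bp (from the text)
ACCURACY = 0.98     # target per-base accuracy (from the text)
PRIMER_LEN = 20     # assumed primer length, for illustration only

def expected_errors(length, accuracy):
    """Expected number of miscalled bases in a read."""
    return length * (1.0 - accuracy)

def p_error_free(primer_len, accuracy):
    """Chance that a primer-sized window contains no miscalled base."""
    return accuracy ** primer_len

def p_pair_error_free(primer_len, accuracy):
    """Chance that both primers of a pair are free of errors."""
    return p_error_free(primer_len, accuracy) ** 2

print(f"expected errors per read: {expected_errors(READ_LENGTH, ACCURACY):.1f}")
for acc in (0.95, ACCURACY, 0.99):
    single = p_error_free(PRIMER_LEN, acc)
    pair = p_pair_error_free(PRIMER_LEN, acc)
    print(f"accuracy {acc:.2f}: primer {single:.2f}, pair {pair:.2f}")
```

In practice errors cluster toward the ends of a read and primers can be placed in the most reliable stretch, so these figures understate the real success rate; the calculation only shows how steeply the odds fall as accuracy drops.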
Four different protocols have been developed and implemented within
DGC to generate end sequence-specific STSs. The first procedure is
based on the direct sequencing of P1 DNA purified from a 15-ml
overnight culture using an alkaline lysis methodology (Kimmerly et
al. 1994). This approach is most effective for the clones inserted
into the pAd10sacBII version of the vector in which the cloning
sites are flanked by SP6 and T7 primer binding sites. We have not
yet been able to develop effective primers for direct sequencing of
pNS583tet14Ad10 P1 clones.
The second procedure developed for end sequencing is a variation on
the first. It is based on the finding that more robust templates
can be produced after transfer of the P1 plasmid to Escherichia
coli strain DH10B from the NS3529 strain. Post-transfer
processing of plasmid DNA was identical to the alkaline lysis
plasmid preparation procedure mentioned above. The transfer has
been effected via several mechanisms, but the technique judged most
straightforward on a large scale requires two bacterial mating
steps: The first introduces an F factor into NS3529 containing a P1
clone of interest, and the second transfers both the P1 clone and
the F factor to DH10B. There are two possible reasons for the
difference in template quality of P1 DNA isolated from DH10B
versus the library host strain NS3529. First, the endA
mutation has been reported to increase the quality of DNA
sequencing templates prepared in such strains (Taylor et al. 1993).
In contrast to NS3529, DH10B carries an endA mutation.
Another potentially significant difference is that NS3529 carries
the cre recombinase, an enzyme that is not present in
DH10B. It has been noted that plasmids that contain lox
sites (as the P1 cloning vectors do) can be difficult to isolate
from strains that express the cre recombinase (Palazzolo
et al. 1990).
The third technique used to sequence insert ends is that of
"bubble" PCR (Riley et al. 1990; Smith 1992; Nurminsky and Hartl
1993; Hartl et al. 1994). Ligation of a double-strand linker to
fragments of restriction digested P1 DNA followed by PCR using a
primer to the vector junction region and a primer complementary to
the linker promotes the selective amplification of fragments that
are adjacent to known vector sequences. These amplified molecules
can then be sequenced in either orientation using primers that bind
to sites in the vector or primers that bind to the linker.
A fourth approach is a technology based on bacterial transposon
insertion coupled to PCR. After F'-mediated transfer of a P1 clone
to another host strain, a transposon is acquired and stably
maintained as an insert in the P1 plasmid. PCR with a primer
specific to the P1 vector sequence near the cloning site and an
inverted-repeat primer can generate a product with a pool of mating
transductants as template. The smallest products typically
outcompete larger ones so that in the majority of cases an
end-specific fragment, a few hundred base pairs in size, can be
amplified from a pool of a few hundred P1 clones, each of which
contains a different transposition event. Direct sequencing of PCR
products generated by either of these two schemes allows the
generation of STS markers from P1 clones that cannot be sequenced
directly.
These four approaches to end sequencing differ both in degree of
difficulty and reliability. The easiest protocol is the one based
on direct sequencing. However, this protocol only produces
sequences that can be routinely used to design successful PCR
primers about one-third of the time. In contrast, the same protocol
can be used to successfully prepare effective templates for direct
sequencing of P1 clones ~ 70% of the time if the P1 plasmid has
been transferred to the DH10B strain. The bubble PCR approach is
attempted to obtain end sequence from the ~30% of clones that are
refractory to a direct sequencing approach. Templates that fail to
generate useful data when sequencing attempts using both direct and
ligation-mediated PCR approaches are examined using the
transposon-facilitated approach.
As mentioned above, it is noteworthy that we have been able to use
the direct sequencing protocols only on the clones in the
pAd10sacBII vector. For this reason, the bulk of the STSs in the
map was derived from this subset of the library. Bubble PCR-based
sequencing is the first-choice protocol for clones in the
pNS583tet14Ad10 vector. To date, we have developed 2394 STSs from
the ends of P1 clone inserts. There are only 40 targets (40/3632
preps or 1%) that have proven refractory to all of the
potential methods for end sequencing described here.
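The order in which the four end-sequencing protocols are tried can be summarized as a fallback chain. The sketch below is one reading of the text, not a description of laboratory software: the function names are hypothetical, and the assumption that clones in pNS583tet14Ad10 proceed from bubble PCR to the transposon approach is an inference.

```python
DIRECT = "direct sequencing"
TRANSFER = "direct sequencing after transfer to the endA, cre-free host"
BUBBLE = "bubble PCR"
TRANSPOSON = "transposon-facilitated PCR"

def protocol_order(vector):
    """Return the protocols to try, easiest first, for a given vector."""
    if vector == "pAd10sacBII":
        # Cloning site is flanked by SP6 and T7 primer binding sites.
        return [DIRECT, TRANSFER, BUBBLE, TRANSPOSON]
    if vector == "pNS583tet14Ad10":
        # No effective direct-sequencing primers are available.
        return [BUBBLE, TRANSPOSON]
    raise ValueError(f"unknown vector: {vector!r}")

def sequence_end(vector, attempt):
    """Try each protocol in turn; attempt(protocol) returns a read or None."""
    for protocol in protocol_order(vector):
        read = attempt(protocol)
        if read is not None:
            return protocol, read
    return None, None   # refractory to every method

# Example with a stand-in for the wet-lab step: only bubble PCR succeeds.
def fake_attempt(protocol):
    return "ACGT" if protocol == BUBBLE else None

print(sequence_end("pAd10sacBII", fake_attempt))
print(f"refractory fraction: {40 / 3632:.1%}")
```

The refractory fraction printed at the end uses the 40 failures out of 3632 preparations reported above.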
Automation aspects of the production environment.
The map generated at this point has required the performance of
~3,500 sequencing reactions, the synthesis of ~2,470
oligonucleotides, ~605,000 PCR reactions, and ~8,250 agarose gels.
At the beginning of this project fluorescent sequencers and
thermocyclers were available commercially. Although oligonucleotide
synthesizers were also available commercially, custom services were
also an option. We decided to use this available instrumentation to
meet our production needs. Specifically, the sequencing reactions
used to develop all the STS markers have been acquired using an ABI
373 sequencer, the PCR reactions were all performed on Perkin Elmer
9600 thermocyclers, and the oligonucleotides were obtained
commercially from Genset Corp (La Jolla, CA).
Agarose Gel Imaging Station.
Our mapping strategy entailed the analysis of thousands of agarose
gels. The major missing piece of automation was an instrument that
could acquire and store retrievable images of the agarose gel
analyses of the PCR-based STS mapping assays. To solve this problem
the Automation group at the Lawrence Berkeley National Laboratory
(LBNL) Human Genome Center, under the direction of Joseph Jaklevic,
developed an automated gel imaging station. This instrument will be
described in detail elsewhere, but is summarized briefly here. The
image station is a computer coordinated system that includes a
cooled charge-coupled device (CCD) camera, a UV light source, and
associated instrument control and image analysis software. A
digitized image of an agarose gel can be acquired in a few seconds
by Optimas imaging software. The image files are initially stored
in a proprietary PMI format and later converted to GIF format for
analysis on a Sun workstation using the gel analysis program,
Angel (see below). An image annotation program written in
Visual Basic by Terri Fleming of the LBNL HGC Informatics (HGCI)
group is associated with image acquisition. The program provides
data fields for the entry of information that describes each image
such as plate pool versus row/column experiment, STS information,
PCR parameters of the mapping experiment, and the identity of the
individual who performed the experiment. This information is
retained as a text file that remains associated with the image file
for subsequent analysis.
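The annotation record described here amounts to a small set of labeled fields stored alongside each image. The sketch below shows one way such a sidecar text record could be modeled; it is not the original Visual Basic program, and the class name, field names, file name, and example values are all hypothetical.

```python
from dataclasses import dataclass, asdict

@dataclass
class GelAnnotation:
    """Metadata kept with one gel image (field names are illustrative)."""
    image_file: str
    experiment_type: str   # "plate_pool" or "row_column"
    sts_name: str
    pcr_parameters: str
    operator: str

    def to_text(self):
        """Render the record as 'key: value' lines for a sidecar text file."""
        return "\n".join(f"{key}: {value}" for key, value in asdict(self).items())

note = GelAnnotation(
    image_file="gel_0001.gif",
    experiment_type="plate_pool",
    sts_name="Dm0073",
    pcr_parameters="annealing 55 C, 35 cycles",
    operator="initials of the experimenter",
)
print(note.to_text())
```

Keeping the record as plain text next to the image means that a downstream analysis program can read the experiment type and STS name without access to the acquisition software.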
Custom Agarose Gel Hardware.
Automated and semiautomated analysis of agarose gels requires that
the gels are always generated in a common fashion with a fixed lane
and sample loading format. This system was also developed by the
LBNL Automation group and will be described in detail elsewhere.
However, its utility for the physical mapping project is discussed
below.
The agarose gels used to analyze the STS mapping experiments were
molded using custom 14 x 14-cm gel casting trays, fitted Plexiglas
support plates, and a custom triple-comb assembly built by the
Automation group at LBNL. The triple-comb assembly divides each gel
into three panels, each containing 33 lanes. Each panel therefore
has sufficient lanes to load 32 plate pool samples plus a control
sample, which is the PCR product generated by assaying
Drosophila genomic DNA for the given STS. Therefore, on a
single agarose gel, an entire plate pool experiment representing
the 9,216-member P1 clone library can be analyzed. Agarose gels for
the analysis of row/column experiments were cast in the same
format. The PCR reactions representing the twenty row and column
pools and a genomic DNA standard were loaded on a single panel of
these gels. Thus in contrast to a plate pool gel, a row/column gel
treated each panel as a separate experiment; each panel represented
a different set of row and column pools, and perhaps a different
STS marker. The gels were loaded manually using either 8-channel
multipipettes for plate pools or 12-channel multipipettes for
row/column gels directly from the Perkin Elmer 9600 trays. The 14 x
14-cm gel size is compatible with the commercially available
electrophoresis boxes from Biorad (Hercules, CA). The lids of these
gel boxes were customized by the Automation group to provide fans
for cooling. The use of fans allows the gels to be run at much
higher voltages without the generation of heat-induced mobility
artifacts.
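The fixed gel format makes it possible to compute where each plate pool sample is expected on the image. The sketch below assumes that samples are loaded in numerical order and that the genomic control occupies lane 17 of each panel, as in the Figure 2A caption; the loading order and the function name are assumptions.

```python
PANELS = 3
LANES_PER_PANEL = 33
SAMPLES_PER_PANEL = 32   # plate pools per panel, plus one control lane
CONTROL_LANE = 17        # genomic DNA control (per the Figure 2A caption)

def lane_for_plate_pool(pool):
    """Map a plate pool number (1-96) to a 1-based (panel, lane) pair."""
    if not 1 <= pool <= PANELS * SAMPLES_PER_PANEL:
        raise ValueError(f"plate pool out of range: {pool}")
    panel, offset = divmod(pool - 1, SAMPLES_PER_PANEL)
    lane = offset + 1
    if lane >= CONTROL_LANE:
        lane += 1   # skip over the control lane in the middle of the panel
    return panel + 1, lane

for pool in (1, 16, 17, 32, 33, 96):
    print(pool, lane_for_plate_pool(pool))
```

A row/column gel uses the same three-panel layout, but each panel holds the twenty row and column pools for one positive plate plus the genomic standard, so each panel is read as a separate experiment.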
Informatics issues in the production environment.
As mentioned above, the informational needs of the project are
numerous and varied (Fig. 3). The first task is the maintenance and
integration of the data from the different projects in a fashion
that is accessible to all the researchers on the project. Second,
it is important to be able to deal with the P1-end sequence data
and convert these sequences into STS markers that can be used
efficiently in contig-building experiments. Third, it is important
to analyze rapidly and correctly the PCR-based mapping assays and
correctly report the results to the mapping database. Furthermore,
it is useful to re-examine data in those cases where conflicts are
apparent. The fourth and final data analysis task is to bring all
the information together and develop a graphical representation of
how all the data from the mapping and sequencing projects fits
together in a unified vision of the current status of the
map.
Figure 3. Data flow
The data flow in the mapping project has to allow the introduction
of sequencing trace files from the Macintosh computers associated
with the ABI 373 sequencers and the gel image files associated with
the Gateway PCs that drive the image acquisition on the LBNL gel
imaging station. The information flow is described in detail in the
text.
Flydb
Suzanna Lewis developed a database for the Drosophila
Genome Project called Flydb. Flydb is a variant of ACeDB
(a Caenorhabditis elegans database), which was originally written and
developed by Richard Durbin (Sanger Center, Hinxton, England) and
Jean Thierry-Mieg (CNRS Montpellier, France) to support work in
C. elegans (Dunham et al. 1994). The initial purpose of
Flydb was to provide support to the four laboratories in the
mapping collaboration. To achieve this goal, Flydb supports the
collection of data from each contributor, curation and
consolidation of this data into a master database, summarization of
these results in concise graphical displays, and distribution to
the collaborating labs. The customization required a new set of
graphical displays, modification and enhancement to graphic
utilities and tools to allow programmatic access to the
database.
One chromosome arm at a time is represented in the graphical
display, shown as a simple line that functions as a scroll bar,
allowing users to visualize a specific region in detail. In
addition, the display can be positioned simply by clicking on a
particular chromosomal division. A graphical representation of the
polytene chromosome bands in which the length of a band is
proportional to its DNA content is also displayed. The remainder of
the screen shows a variety of genetic and clone markers. Users can
select the sets of markers that are displayed. In the original
version of Flydb that served as our lab notebook, the choice of
markers included STSs, YACs, P1 clones, P1 contigs, and P elements.
In addition, users can retrieve more detailed descriptions of the
data.
STS primer design.
Fly_by_night is software developed by Suzanna Lewis, Henry
Cobb, Gregg Helt (Drosophila Genome Center), and Sam Pitluck
(Lawrence Berkeley National Laboratory) that manages the data flow
associated with the process of the conversion of P1-insert end
sequence into PCR primers that can be used to screen the mapping
library. Using this software, trimmed and edited trace files
associated with each new sequence are searched using the BLASTN
algorithm (Altschul et al. 1990) against a local database that
contains all the previously designed STS markers, all known fly
repetitive elements, and members of gene families such as histones
or tubulins. If an identity is found in the local database, the
sequence is filtered out and an STS is not generated from it. Novel
sequences are presented to the
Primer 0.5 program
developed at the Whitehead Genome Center (S. E. Lincoln, M. J.
Daly, and E.S. Lander). The program identifies primers that can be
used to generate a PCR product in the range of 15-300 bp. Once
primers have been designed,
fly_by_night assigns an STS
number chronologically to each primer pair/sequence set. The
software then searches GenBank using BLASTN to identify any
previously characterized fly sequences. About 5% of the P1 end
sequences match a
Drosophila sequence that has been
previously deposited in the public databases. Next the script
archives all files associated with each STS to appropriate storage
directories and all data concerning the STS sequences, primers, and
BLASTN search results are read into Flydb. Finally, PCR primer
orders can be automatically e-mailed to a commercial oligonucleotide
supplier or to the LBNL oligonucleotide synthesizer (Sindelar and
Jaklevic 1995).
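The filter-then-number logic described above can be sketched as follows. This is a minimal illustration, not the fly_by_night code: a shared 20-mer is used as a crude stand-in for a BLASTN hit, and the names `StsRegistry`, `submit`, and `K` are hypothetical. Primer design and the GenBank search are omitted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

K = 20  # hypothetical k-mer length; a shared k-mer stands in for a BLASTN hit


def kmers(seq: str, k: int = K) -> set:
    """Return every k-length substring of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


@dataclass
class Sts:
    number: int
    source_clone: str
    sequence: str


class StsRegistry:
    """Filters new end sequences and numbers accepted ones chronologically."""

    def __init__(self, screen_sequences):
        # Seed the local database, e.g. with repeats and gene-family members.
        self._seen = set()
        for s in screen_sequences:
            self._seen |= kmers(s)
        self._numbers = count(1)
        self.accepted = []

    def submit(self, source_clone: str, sequence: str) -> Optional[Sts]:
        """Return a new Sts, or None if the sequence matches the local database.

        Sequences shorter than K have no k-mers and are never filtered here.
        """
        new_kmers = kmers(sequence)
        if new_kmers & self._seen:
            return None  # filtered out: no STS is generated from it
        sts = Sts(next(self._numbers), source_clone, sequence.upper())
        self._seen |= new_kmers  # later submissions are screened against it
        self.accepted.append(sts)
        return sts
```

Because every accepted sequence is added to the screening set, a second end sequence that overlaps an earlier STS is rejected, which mirrors the role of the local database of previously designed markers.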
Analysis of STS mapping assays
Angel is a gel image analysis program, adapted by Sam
Pitluck and Terri Fleming of the LBNL HGCI group, to allow a
computer-assisted mechanism for scoring and interpretation of the
STS mapping gels.
Angel can operate in either a plate pool
or row/column pool analysis mode. By examining the digitized images
from the image station, the user identifies the positive control
(the PCR product generated from genomic DNA) on each panel of the
gel. Corresponding positive signals are then identified by clicking
on each band with the mouse. The only automated function
Angel provides is lane finding. Specifically, the position
identified by the operator is translated, with reference to an
underlying X-Y coordinate map of the gel image, to the appropriate
pool that corresponds to the lane in question. The deconvolution of
the pool information is written to a text file along with
appropriate attributes. These files are read directly into Flydb
where pool and clonal identity are calculated. The user-assisted
analysis mode facilitates the correct scoring of weak signals and
also affords the trained eye an opportunity to sort through
specific results in mapping experiments where nonspecific bands are
also present.
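As an illustration of how pool results resolve to clone identities, the sketch below intersects the positive row and column pools of one plate. It is an assumption-laden example rather than the Angel or Flydb code: the `plate-RowColumn` address format and the function name are hypothetical.

```python
from itertools import product

ROWS = "ABCDEFGH"        # 8 rows of a 96-well microtiter plate
COLUMNS = range(1, 13)   # 12 columns


def candidate_clones(plate, positive_rows, positive_columns):
    """Return the wells on one plate consistent with the row/column pool hits.

    One positive row and one positive column identify a single clone.
    Several positives on the same plate give every row/column combination,
    which is ambiguous and would need an additional assay to resolve.
    """
    for r in positive_rows:
        if r not in ROWS:
            raise ValueError(f"unknown row {r!r}")
    for c in positive_columns:
        if c not in COLUMNS:
            raise ValueError(f"unknown column {c!r}")
    return [
        f"{plate}-{r}{c}"
        for r, c in product(sorted(positive_rows), sorted(positive_columns))
    ]


single = candidate_clones(17, {"C"}, {5})
ambiguous = candidate_clones(17, {"C", "F"}, {5, 9})
```

Here `single` holds one address, while `ambiguous` holds four candidates, of which only two can be true positives if each positive row pairs with one positive column.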
Contig building. The contig assembly algorithm utilized in the
Drosophila project is called Spam and was developed by Suzanna
Lewis. The difficult computational
issue in physical mapping is reconstructing a representation of
each chromosome given data describing likely overlaps among members
of a fragment library derived from genomic DNA. The essential
algorithm relies upon the observation that any set of STSs hitting
an individual clone should appear consecutively in the finished
ordering. Conceptually, the input data are a matrix in which an
individual STS is a row, an individual clone is a column, and the
value of each cell indicates the positive or negative result for
that STS. If the data are error-free then there must exist some
ordering of the rows (STSs) such that the positive scores appear
consecutively, without any gaps, for every clone column. As the
data are not error-free the goal is to recover the most likely
underlying order and enumerate and describe those data that prevent
this map from being perfectly ordered. The current approach uses
the PQ-tree algorithm of Booth and Leuker (1976) to set the upper
bound for a subsequent branch and bound algorithm. This restricted
set of possible errors is thus available to the biologists for
their use in choosing the most informative experiments to perform
to resolve these discrepancies in the data.
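The consecutive-ones condition can be made concrete with a small check. The sketch below is not Spam, and it uses neither a PQ-tree nor branch and bound: it simply tests a proposed STS order and, for a handful of markers, enumerates all orders by brute force. The clone and STS names are made up.

```python
from itertools import permutations


def has_consecutive_ones(order, hits):
    """True if, for every clone, the STSs that hit it are adjacent in `order`.

    `hits` maps a clone name to the set of STS names scored positive for it.
    """
    position = {sts: i for i, sts in enumerate(order)}
    for sts_set in hits.values():
        idx = sorted(position[s] for s in sts_set)
        # Contiguous positions span exactly as many slots as there are STSs.
        if idx and idx[-1] - idx[0] + 1 != len(idx):
            return False
    return True


def consistent_orders(sts_names, hits):
    """Brute-force search; only feasible for a few STSs (n! orderings)."""
    return [o for o in permutations(sts_names) if has_consecutive_ones(o, hits)]


# Toy error-free data: three clones that overlap in a chain.
toy_hits = {
    "clone1": {"sts1", "sts2"},
    "clone2": {"sts2", "sts3"},
    "clone3": {"sts3", "sts4"},
}
orders = consistent_orders(["sts1", "sts2", "sts3", "sts4"], toy_hits)
```

For the toy data only the order sts1, sts2, sts3, sts4 and its mirror image satisfy the condition, which reflects the fact that STS content data alone cannot orient a contig.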
Two additional types of information are available and incorporated
into our map-building algorithms. Almost 30% of the P1 clones have
been localized by in situ hybridization. The level of resolution of
this analysis is about 50-100 kb. A position can be assigned to
each localized clone that is expressed in kb and is derived from
the polytene chromosome assignment, which has been translated using
estimates of DNA content per band (Sorsa 1988) to a distance
measured from one end of the chromosome arm. These localization
data are used to screen for potential false positives, chimeric
clones, and repeat sequences. The second type of information
available is that most of the STS markers are paired. Because both
ends of a P1 clone are sequenced, most of the STS probes
originating from the library have a corresponding STS that shares
the same P1 source and these two STSs are separated by ~80 kb. This
relationship is useful for comparisons, as one can expect these
paired STSs to identify clones that represent a common chromosomal
region.
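The translation from a band assignment to a kb position amounts to a cumulative sum over band DNA contents. The following sketch shows the idea with invented numbers: the band names and sizes in `BAND_KB` are hypothetical placeholders, not published estimates, and the 100 kb tolerance is an assumption taken from the upper end of the stated in situ resolution.

```python
from itertools import accumulate

# HYPOTHETICAL band sizes in kb, ordered from one end of a chromosome arm.
# Real values would come from published DNA-content estimates per band.
BAND_KB = [("band_1", 120.0), ("band_2", 80.0), ("band_3", 150.0), ("band_4", 60.0)]


def band_midpoints(bands):
    """Map each band name to the kb distance of its midpoint from the arm end."""
    ends = list(accumulate(kb for _, kb in bands))
    return {name: end - size / 2 for (name, size), end in zip(bands, ends)}


def positions_agree(kb_a, kb_b, tolerance_kb=100.0):
    """True if two kb positions lie within the assumed in situ resolution."""
    return abs(kb_a - kb_b) <= tolerance_kb


midpoints = band_midpoints(BAND_KB)
# A hit between clones localized to band_1 and band_3 would be flagged
# as a possible false positive, chimera, or repeat.
suspicious = not positions_agree(midpoints["band_1"], midpoints["band_3"])
```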
Stage 3: The current state of the P1-based STS content map
of the Drosophila euchromatic genome.
Strategy and expectations.
As described above, one of the key strategic decisions made in the
experimental design was to couple the results of the framework in
situ hybridization map with a nonrandom STS content mapping
approach. Specifically, a list of the euchromatic clones localized
by Hartl's group was maintained throughout the project. Clones
shown by their polytene localizations to be nonoverlapping were
selected as STS sources, and the ends of these cloned inserts were
sequenced. PCR primers were designed on these sequences, and the P1
library was then screened. This process was repeated in an
iterative fashion, excluding as STS sources all clones already
assigned to contigs by the previous PCR-based mapping assays.
Since the library is approximately five hit, each STS assay should
identify 5-6 additional clones, and 11-13 member contigs should be
generated with each paired mapping assay. Additionally, each of
these contigs should cover, on average, ~200 kb. The directed
nature of the STS selection scheme continually forces the mapping
of regions that are not yet covered by contigs. Furthermore, all
contigs are genomically localized because every P1 clone that is
used as a source of paired STS markers has already been localized
by in situ hybridization. Computer simulations of the strategy,
based on the assumptions that the euchromatic genome is 120 Mb and
that the cloned inserts are ~80 kb in size, suggest that 1800
paired, non-random STS mapping assays would provide coverage that
slightly exceeds 99%.
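A simplified version of such a simulation is sketched below. It is not the simulation used in this work and is not expected to reproduce the figures quoted above. It assumes uniformly random clone positions, identical 80-kb inserts, error-free PCR, and perfect in situ localization, and it visits localized clones in random order, skipping any that an earlier assay has already placed in a contig. The function name and default parameters are illustrative.

```python
import bisect
import random

GENOME_KB = 120_000   # euchromatic genome size assumed in the text
INSERT_KB = 80        # cloned insert size assumed in the text


def simulate(n_clones=9216, n_localized=2352, seed=0):
    """Return (paired assays run, fraction of genome covered by mapped clones)."""
    rng = random.Random(seed)
    starts = sorted(rng.uniform(0, GENOME_KB - INSERT_KB) for _ in range(n_clones))
    mapped = [False] * n_clones
    # Clones with an in situ localization, visited in random order as sources.
    sources = rng.sample(range(n_clones), n_localized)
    assays = 0
    for src in sources:
        if mapped[src]:
            continue  # already in a contig, so excluded as an STS source
        assays += 1
        for sts in (starts[src], starts[src] + INSERT_KB):  # both insert ends
            # A clone contains the STS if start <= sts <= start + INSERT_KB.
            lo = bisect.bisect_left(starts, sts - INSERT_KB)
            hi = bisect.bisect_right(starts, sts)
            for i in range(lo, hi):
                mapped[i] = True
    # Length of the union of mapped intervals (starts are already sorted).
    covered, reach = 0.0, 0.0
    for s, is_mapped in zip(starts, mapped):
        if is_mapped:
            end = s + INSERT_KB
            covered += max(0.0, end - max(s, reach))
            reach = max(reach, end)
    return assays, covered / GENOME_KB


assays, coverage = simulate()
print(f"{assays} paired assays, {coverage:.1%} of the genome in contigs")
```

Varying `n_localized` shows how strongly the outcome depends on the number of cytogenetically localized clones available as STS sources.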
The current state of the mapping effort is summarized in Table 1.
The main features summarized include the number of STS markers
mapped, the number of clones localized to contigs, the number and
size of contigs, and the fractional coverage of the genome provided
by the mapped contigs. Each of the results is described in more
detail in Table 1 and below.
Table 1 Summary of the current status of the physical map
of the Drosophila genome
| Results | Experimental category |
| 2155 | Number of P1-end STS markers mapped (1622 paired and 533 unpaired) |
| 2352* | Number of euchromatic clones localized by in situ hybridization |
| 261 | Number of euchromatic clones cytogenetically localized but not yet assigned to contigs by STS content mapping |
| 336 | Number of P1 clones identified by in situ hybridization to chromocentric regions of the polytene chromosomes |
| 6384 | Total number of clones assigned to cytogenetically localized contigs |
| 2832 | Total number of clones not yet assigned to contigs |
| ~110 Mb | Estimated euchromatic coverage |
| 649 | Estimated number of contigs |
| 170 kb | Average contig size |
| 5.04 | Total hit average |
| 5.2 | Autosomal hit level |
| 3.8 | X-chromosome hit level |
*35 of the clones localized by in situ hybridization were done by
Todd Laverty (BDGP).
STS markers and contigs.
To this point, 2397 STS markers have been mapped completely. These
probes have been developed from the sequences at the ends of 1344
cloned P1 inserts that had previously been identified as
euchromatic. Most of the markers (1622) are paired -- derived from
opposite ends of the same cloned insert. The remainder (533)
represent sequences at one end of a cloned P1 insert. We made
significant efforts to work exclusively with paired STS markers.
However, in some instances the ends were resistant to sequencing,
the developed PCR primers failed after two attempts, or repeat
sequences were identified at one of the ends. A major feature of
the map is that there is now a sequenced and mapped STS on average
every 50 kb in the
Drosophila euchromatic genome.
A major initial goal of the mapping project was to assign all the
euchromatic clones to contigs by STS mapping. Of the 2317 clones
identified in the course of the framework effort of the Hartl
group, 2091 have now been positioned into sets of overlapping
clones with the detection of molecular overlaps. Only 88 clones
from this euchromatic collection remain as localized singlets.
Repeated efforts to develop unique STS markers from the ends of the
inserts in these clones have failed.
It is important to note that the majority of the clones in the
canonical mapping library were unselected by the Hartl group in the
generation of their map based on in situ hybridization. A large
fraction of these clones have now been localized as part of the
contig-building experiments. Specifically, of 9216 clones, 6384 are
now members of localized contigs, whereas 2832 clones remain
unlocalized by STS mapping assays. Of these clones, 197 were
characterized as heterochromatic in the in situ-based mapping
experiments. There remain 2017 clones that are currently
unassociated with any mapping data. That is, these 2017 clones have
not been used as probes for in situ mapping, as sources of STS
markers, or been hit by any of the STS markers mapped to
date.
Estimating contig sizes and coverage in STS-based mapping projects
can be problematic. In contrast to fingerprinting mapping
techniques, STS content procedures provide little insight into
physical size between markers. For this reason, contig size and
coverage are frequently based largely on statistical
considerations. At this point 6,384 clones have been assigned to
localized contigs. If one assumes that the euchromatic genome is
120 Mb and all the mapped clones are 80 kb in length, then the
total DNA content of the mapped clones provides representation of
more than four genome equivalents. If one makes the further
assumption that the mapped clones represent the genome in a Poisson
fashion, then the coverage might exceed 97%. However, there are a
number of reasons why the mapped clone collection might deviate
from randomness. For example, most cloning systems suffer a
representation bias based on biological mechanisms. Thus, we
believe our current map to be less than the statistical ideal but
think it likely that it provides coverage of 90% of the
euchromatic genome at this point.
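The statistical estimate above can be reproduced in a few lines. The sketch assumes, as the text does, a 120-Mb genome, uniform 80-kb inserts, and Poisson-distributed clone positions, under which the expected covered fraction at redundancy c is 1 - exp(-c). The function names are illustrative.

```python
import math


def genome_equivalents(n_clones, insert_kb, genome_kb):
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * insert_kb / genome_kb


def poisson_coverage(c):
    """Expected fraction of the genome covered at least once at redundancy c."""
    return 1.0 - math.exp(-c)


c = genome_equivalents(6384, 80, 120_000)
coverage = poisson_coverage(c)
print(f"{c:.3f} genome equivalents, {coverage:.1%} expected coverage")
```

The calculation gives slightly more than four genome equivalents and an expected coverage above 97%, in line with the figures quoted in the text. Any cloning bias lowers the true value.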
The contig assembly program outlined above uses the STS mapping
data to assemble P1 contigs of overlapping clones. According to
this analysis, the current physical map is distributed in 649
contigs, 4 of which are > 500 kb. If, as suggested above, the
estimated coverage is ~110 Mb, then the average contig size is ~170
kb.
Heterochromatin in the P1 Library. The in situ
hybridization analysis of 2653 euchromatic P1 clones by Hartl and
co-workers identified 336 P1s that exhibited a repetitive or
chromocentric staining pattern. As the data suggest these clones
are likely to be heterochromatic, they were intentionally not
selected in the initial mapping effort as sources for STS markers.
However, these clones are members of the canonical library from
which the template pools were generated for STS content mapping.
Most of these clones were not hit in the screens. This is not
surprising as the STS markers were selected from clones thought to
be euchromatic. However, the data from the STS assays suggest that
151 of these clones may be euchromatic as they are hit by at least
one presumptive euchromatic STS. In addition, 44 of these
repetitive P1 clones appear to be convincingly euchromatic because
they are hit by at least two STS markers, and in each case both
markers exhibited a consistent genomic localization. The true
fraction of clonable heterochromatin in the library and the ability
to assign these clones to unambiguous contigs is the focus of
ongoing experiments (see Discussion).
False Positives, False Negatives, and Chimeric
Clones. Like most experimental procedures, physical mapping is
prone to both false negatives and false positives. As the STS
markers have been derived from identified clones the false-negative
rate can be estimated. Specifically, an estimate of the false
negative rate is derived as the number of cases where an STS
designed to a P1 insert end does not identify its source divided by
the total number of mapped P1 end-derived STS markers. The current
figure for this event is 12% (252/2155). We believe this figure to
be a maximum estimate of the false negative rate because there are
other explanations of the failure of a P1 end STS to identify its
source. One cause of over-estimation of the false-negative rate is
attributable to misidentity of the source clone. Human error
occasionally results in a mapping clone being misnamed. When
mapped, the STS markers derived from the clone would not hit the
presumptive source but instead hit the actual source. These
errors are often easy to identify, and their correction is an
ongoing process.
One way to estimate the false-positive rate in the STS mapping
experiments relies on the cross-correlation of in situ
hybridization data and contig placement data to accept or reject
hits used for contig building. An estimate of the false positive
rate in the STS mapping experiments is obtained by dividing the
total number of unused (rejected) hits among all STS markers by the
total number of hits. This figure is currently 1.7% (181/10,812).
In this case we believe the figure to be an underestimate of the
true false-positive rate because the identification of
false-positives often relies on in situ hybridization data, and
such data exist for only 26% (2352/9216) of the clones in the
library.
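The two error estimates are simple ratios, and the false-negative count can be derived directly from hit data. The sketch below uses hypothetical STS and clone names for the toy example and the counts quoted in the text for the project-wide figures. The function names are illustrative and do not correspond to Flydb routines.

```python
def false_negative_rate(source_of, hits):
    """Fraction of end-derived STSs that fail to hit their own source clone.

    `source_of` maps an STS to its source clone; `hits` maps an STS to the
    set of clones it identified in the library screen.
    """
    misses = sum(
        1 for sts, clone in source_of.items() if clone not in hits.get(sts, set())
    )
    return misses / len(source_of)


def false_positive_rate(rejected_hits, total_hits):
    """Fraction of all hits rejected during contig building."""
    return rejected_hits / total_hits


# Toy data: sts3 does not recover its source, cloneC.
toy_sources = {"sts1": "cloneA", "sts2": "cloneB", "sts3": "cloneC", "sts4": "cloneD"}
toy_hits = {
    "sts1": {"cloneA", "cloneX"},
    "sts2": {"cloneB"},
    "sts3": {"cloneY"},
    "sts4": {"cloneD"},
}
toy_fn = false_negative_rate(toy_sources, toy_hits)

# Counts quoted in the text.
paper_fn = 252 / 2155
paper_fp = false_positive_rate(181, 10_812)
print(f"false negatives {paper_fn:.1%}, false positives {paper_fp:.1%}")
```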
Chimeric clones result from genomic DNA fragments representing two
unlinked regions of the genome ligated into a single P1 vector.
Such chimerism has often plagued mapping projects based on YAC
libraries. As many as 50% of the clones in some human YAC libraries
are thought to consist of such artifactually joined inserts. The
biological constraints imposed by the P1 packaging extracts should
limit the frequency of the formation of chimeric clones. Our
results confirm this hypothesis.
It is difficult to define unambiguously and precisely the rate of
chimerism at this stage in the mapping project. However, some of
the results can be used to gain a preliminary estimate. We are
currently using the operational definition that a clone is likely
to be chimeric if it meets the following criteria: (1) The in situ
hybridization results identify two unlinked hybridization sites and
(2) STS markers developed from opposite ends of the same cloned
insert identify P1 clones that clearly belong to unlinked contigs.
To date, we have identified only 10 clones that meet these
criteria. This is a small fraction of the ~1500 clones that have
been both mapped by in situ hybridization and used as a source of
paired STS markers.
Redundancy in the library.
The early estimates predicted that each genomic region was likely
to be represented, on average, five to six times, based on average
insert size, the number of clones, and the estimated size of the
euchromatic genome. Thus far, an average of 5.04 hits/STS has been
observed based on a sample of 2397 completely mapped STS markers.
Two factors might cause this assessment of genomic coverage to be
incorrect: First, the STSs are derived from clones in the library
and thus might be biased towards regions that clone with greater
frequency; and second, the redundancy in the library should vary
for the autosomes and the sex chromosomes. As the library was
generated from an equal mixture of males and females, those
sequences on the sex chromosomes should be represented at a
slightly lower frequency when compared to regions on the second and
third chromosomes.
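The expected X-chromosome representation follows from copy number alone. The sketch below assumes that males and females contributed equal amounts of DNA, ignores the Y chromosome, and ignores any cloning bias. The function name is illustrative.

```python
from fractions import Fraction


def x_to_autosome_ratio(female_fraction):
    """Expected X-linked representation relative to autosomal sequences."""
    f = Fraction(female_fraction)
    x_copies = 2 * f + 1 * (1 - f)   # XX females, XY males
    autosome_copies = 2              # both sexes are diploid for autosomes
    return x_copies / autosome_copies


ratio = x_to_autosome_ratio(Fraction(1, 2))
expected_x_hits = 5.2 * float(ratio)   # 5.2 is the autosomal hit level in Table 1
print(f"ratio {ratio}, expected X hit level {expected_x_hits:.1f}")
```

Under these assumptions the X chromosome should be represented at three-quarters of the autosomal level, or about 3.9 hits per STS, which is close to the observed X-chromosome hit level of 3.8 in Table 1.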
Nature of the contigs and mechanisms of error
correction.
A major drawback to physical mapping experiments is that the data
are not associated with figures of merit. This is in marked
contrast to the results obtained with other mapping methods,
specifically, genetic recombination mapping and radiation hybrids.
We have attempted to minimize these limitations by using approaches
that obtain redundant information using multiple and distinct
experimental methods. There are two types of experimental objects
in the mapping experiments described in this paper: P1 clones and
STS markers.
The quality control of the data associated with these objects can
be seen by an examination of an example contig presented in Figure
4. This contig consists of 48 individual P1 clones. Of these
clones, 19 have been assigned to this region of the genome by both
STS content mapping and in situ hybridization. All of the STS
markers have been derived from sources that have been localized by
the chromosomal hybridizations. Sixteen of the STS markers in this
contig were derived from the ends of cytogenetically localized P1
clones and one comes from a cytogenetically localized P
element.
Figure 4. Example of an individual contig
This contig is found at region 58 on the Drosophila polytene map.
The rectangles (top) have been drawn to resemble the
cytogenetic regions of this portion of the right arm of the second
chromosome. The long, thin rectangles with solid circles represent
individual P1 clones that have been localized to this region of the
genome. The P1 rectangles that are filled in have been assigned by
both in situ hybridization and by STS content mapping. The open P1
rectangles represent P1 clones that have been assigned by STS
assays alone. The circles represent STSs assigned to this region.
All the circles that line up in a vertical row correspond to the
same STS marker. The solid circles correspond to STS markers that
hit the clones positioned in this genomic region. The open circles
correspond to STS markers that are likely to be false negatives.
The inverted triangle represents the site of a P-element insertion
that has been localized to this region by both cytogenetic means
and by PCR-based STS-content mapping.
Inconsistencies in the mapping data become obvious immediately when
the two complementary mapping methods are used on the same set of
mapping clones. Clones assigned to one region by one technique and
to a second by another technique can be identified readily. Such
clones are discarded by the contig building algorithm. However, the
data are tracked and can be retrieved as they may be useful in
subsequent error correction experiments.
Tracking the data associated with the STS markers can also be
helpful in eliminating both experimental errors and artifactual
results. Experience suggests that the most prevalent source of
error associated with these markers is a mishandling of the data or
the reagents during the series of procedures that are involved
during the journey from a clone in a microtiter well to a sequence
to a map position. For example, clones can be incorrectly selected
from the microtiter plates prior to sequencing, clones can be
mislabeled at any step along the way, and misordering of the PCR
primers can occur. However, the nature of our experimental
organization allows a specific prediction to be made: STSs derived
from the ends of P1 clones should hit their source clones when the
library is screened. STSs that identify their source when used to
screen can be assigned a higher confidence level than those that do
not. Each of the STS probes used to develop the contig presented as
an example in Figure 4 hit its source clone and provides
confirming data that validate the manipulations used to develop and
map each of these STSs. The combination of multiple independent
mapping mechanisms and the requirement that the end-derived STS
markers hit their source clones strengthens the overall quality of
the map developed in the course of this work.
DISCUSSION
Strategic choices
In the introductory section, several mapping choices were
mentioned, that is, vector type, method of overlap detection, and
source of STS markers. There are several criteria upon which these
mapping decisions can be considered. Minimal benchmarks include a
comparison of the fraction of the genome covered by the ordered
arrays (or contigs), the average size of the contigs, and the cost,
in terms of time and resources, required to generate the map. Other
standards include the flexibility of the map, community access, the
potential to use the map to provide substrates for large-scale
genomic sequencing, the biological content of the map, and the
correlation of the clone-based physical map with the genetic map
and the polytene map.
When we began these experiments, maps of the
Drosophila
genome based on libraries in yeast artificial chromosomes (YACs)
had been developed (Ajioka et al. 1991; Cai et al. 1994). A
deliberate decision was made to not pursue closure with these maps
and libraries. Some cloned inserts in YACs are known to be
predisposed to certain classes of artifacts, such as instability
and chimerism. In addition, it is important to note that the insert
sequences in YAC clones have proven difficult to purify in amounts
comparable to what can be obtained from
E. coli -based
plasmid sources. This is a key consideration if the clones from the
physical map are going to be used as a source of templates for
large-scale genomic sequencing.
Cosmids have been the most common choice for physical mapping
efforts associated with genomic sequencing projects. The cloned
inserts in cosmids are less than half the size of the fragments in
P1 clones. Thus, if we had decided to base our mapping project on
cosmid libraries then the mapping set would have to be more than
twice as large to provide the same degree of coverage, twice as
many STSs would have to be mapped in order to assign all clones in
the library to contigs, and the average size of each contig would
be half as large. Furthermore, if the mapping library is generated
in a P1 vector, fewer clones will have to be purified and subcloned
in the subsequent large-scale sequencing effort.
Evidence has accumulated from other laboratories indicating that
large inserts are more stable in single copy plasmids (such as P1
clones, PACs, and BACs) than they are when cloned into multicopy
plasmids (such as cosmids) (Kim et al. 1992). Our experiments do
not directly address these issues. However, the work to date has
yet to identify even a single unstable clone. With this issue in
mind, it is important to point out that all pooling, library
replication, and DNA preparation protocols were carried out on
saturated overnight cultures. In contrast, cosmids are prone to
instability with such "rough" handling.
The completion of this stage of the
Drosophila map
verifies the utility of the nonrandom STS selection scheme that we
devised several years ago. Similar results were obtained in the
generation of the map for
Schizosaccharomyces pombe using
a similar mapping strategy (Mizukami et al. 1993). Three to four
times as many STS markers would have been needed to reach this same
state of completion if the STSs had been chosen in a random
fashion.
Data accessibility
These mapping experiments were conducted not only to provide
templates for large-scale sequencing but also to promote the
positional cloning efforts of the
Drosophila research
community. Sixteen copies of the mapping library were generated at
LBNL using the library replication system developed by Joe
Jaklevic's LBNL HGC Automation group. These copies were then mailed
to individual laboratories in diverse geographic locations that
agreed to assume the responsibility for distributing clones to
researchers in their region. The distribution centers cover North
America, Europe, and Asia. The information concerning P1 in situ
hybridization results and STS content mapping has been made
available to the research community through a collaboration
between Flybase and the Berkeley DGC.
Further, we decided to use an STS content mapping strategy to
identify clone overlaps because this approach would provide a
benefit to the community. STS content maps are based not only on a
set of clones but also on the information content of the genome. All of
the mapped sequence tags have also been made available to the
community. For almost every genomic region it is now possible to
find small patches of sequence that can be used to screen a genomic
library made in any vector.
Future directions
To date, almost all of the clones identified as euchromatic by
Hartl's hybridization experiments have now been assigned to contigs
by STS content mapping. This has resulted in 649 P1-based contigs
that together probably represent > 110 Mb of the 120-Mb
euchromatic genome. Still, 2832 clones remain unmapped. This class
includes clones about which there is, as of yet, no mapping
information. It is likely that this collection of unassigned clones
represents the remaining unmapped euchromatic regions as well as
regions of cloneable heterochromatin. Experiments to incorporate
these clones into the map are currently underway. These map
completion experiments will continue to utilize the paired STS
strategy that takes advantage of a nonrandom approach for STS
selection and in situ hybridization to the polytene chromosomes of
Drosophila.
METHODS
Generation of DNA Pools For PCR-Based
Mapping.
The pools were generated by growing up the clones of an individual
microtiter plate in a titer tube box containing in each tube 0.4 mL
TB plus 25 µg/ml of kanamycin at 37°C overnight. Thus, each
clone was grown separately prior to pooling to avoid sib
competition. The pools were then made by combining the appropriate
clones in a box after growth. For the generation of a plate pool,
one titer tube box per plate was grown. For the generation of the
row and column pools, two titer tube boxes were grown for any one
plate. One box was used to make the 12-row pools, and the other was
used to make the 8-column pools. Once the clones were pooled
appropriately, the cells were sedimented, washed with sterile
water, and resedimented. For preparation of the plate pools we
followed a standard triton-lysozyme boiling DNA preparation
protocol (Smoller et al. 1993). The crude DNA preparation was
further treated with RNase A to degrade contaminating RNA, then
extracted with phenol:chloroform to remove any residual proteins.
After a final ethanol precipitation the plate-pool DNA was
suspended in 0.5 mL 10 mM Tris-HCl, 1 mM EDTA (TE). For the
preparation of the
row and column pools, essentially the same protocol was followed
except the phenol:chloroform extraction was omitted and the row and
column pools were suspended in 0.1 ml TE rather than 0.5 mL as for
the plate pools. The amounts of DNA pools generated are sufficient
for >15,000 STS screens. This represents a vast excess of the
number of screens needed to complete the clone-limited phase of the
mapping project, and likely the map closure phase of the project as
well.
STS Content Mapping By Polymerase Chain
Reaction.
To map an STS marker the PCR screening procedure is organized into
two levels. The top level screen uses the 96-plate pool samples.
Using the data derived from this first screen the appropriate set
of 20-row and -column pools is then selected to analyze for the
second level of screening. This pooling strategy is designed to
retain the 8 x 12 microtiter format which allows for more
opportunities to incorporate informatics and automation solutions.
In addition, the pooling strategy lowers the number of PCRs
required to map a single STS marker. In a five-hit library pooled
as described, a typical STS requires 196 PCRs to complete: 96 for
the top level plate pool and five sets of 20 PCRs for the ensuing
row/column experiments.
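The PCR count for the two-level screen can be written out explicitly. The sketch below assumes 96 plates of 96 wells, with 8 row pools and 12 column pools per plate. The function name is illustrative.

```python
PLATES = 96
ROWS, COLUMNS = 8, 12   # standard 96-well microtiter layout


def pcrs_per_sts(positive_plates):
    """PCRs needed to map one STS: all plate pools, then row and column
    pools for each plate that scored positive in the first screen."""
    return PLATES + positive_plates * (ROWS + COLUMNS)


typical = pcrs_per_sts(5)            # five-hit library
unpooled = PLATES * ROWS * COLUMNS   # one PCR per clone, without pooling
print(f"{typical} PCRs with pooling versus {unpooled} without")
```

For a typical STS with five positive plates this gives 196 reactions, compared with 9216 if every clone were assayed individually.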
The PCR reactions are run in the Perkin Elmer 9600 thermocyclers.
The PCR reactions are in a final volume of 15 µL that contains
0.4 µM each forward and reverse primer, 0.2 mM dNTPs, and 0.04
U/µL
Taq polymerase. The sample of P1 pool DNA used
is equivalent to a 1:500 dilution of stocks generated as described
above. The reactions are first denatured at 95°C for 2.5
minutes, then 30-35 cycles using the following parameters are
carried out: 96°C for 15 sec, 58°C for 15 sec, 72°C
for 30 sec. Occasionally some primer pairs work better at an
annealing temperature of 55°C or 52°C. This parameter is
determined empirically for each primer pair by testing them in PCR
using fly genomic DNA as template. After PCR is carried out using
the P1 DNA pools as template, the samples are analyzed by agarose
gel electrophoresis using 2% agarose gels and standard
Tris/borate/EDTA buffer to determine which plate pools generate a
PCR product of the appropriate size.
Acknowledgements
We would like to thank all of our colleagues in the HGC of the LBNL
and the Berkeley-based DGC. Special thanks to Joe Jaklevic and the
LBNL Automation Group, Frank Eeckman and the LBNL Computation
group, and the Informatics group of the DGC. We acknowledge Todd
Laverty for performing the in situ hybridization experiments that
verified the positions of some of the mapped contigs. Thanks go to
Mohan Narla and Gerald Rubin, the DGC Directors. We would also like
to acknowledge the superb administrative assistance of Joyce Pfeiffer.
This work is part of the consortium known as the DGC supported by
grants from the National Center for Human Genome Research
(P50-HG00750 to Gerald Rubin, Allan Spradling, M.J.P. and C.H.M.)
and HGC of the LBNL which is supported by the U.S. Department of
Energy under Contract no. DE-AC03-76SF00098. M.J.P. is a Lucille P.
Markey Scholar and his effort was funded in part by the Lucille P.
Markey Charitable Trust. Finally we thank Gerry Rubin and Allan
Spradling for a critical reading of the manuscript prior to
publication.
REFERENCES
Ajioka, J. W., D. A. Smoller, R. W. Jones, J. P. Carulli, A. E.
Vellek, D. Garza, A. J. Link, I. W. Duncan and D. L. Hartl, 1991
Drosophila genome project: one-hit coverage in yeast artificial
chromosomes. Chromosoma
100: 495-509.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers and D. J. Lipman,
1990 Basic local alignment search tool. J. Mol. Biol.
215: 403-410.
Bender, W., M. Akam, F. Karch, P. A. Beachy, M. Peifer, P. Spierer,
E. B. Lewis and D. S. Hogness, 1983 Molecular genetics of the
bithorax complex in Drosophila melanogaster. Science
221: 23-29.
Booth, K., and G. Lueker, 1976 Testing for the consecutive ones
property, interval graphs and graph planarity using PQ-tree
algorithms. Journal of Computer and System Sciences
:
335-379.
Bridges, C. B., 1935 Salivary chromosome maps. J. Heredity
26: 60-64.
Burke, D. T., G. F. Carle and M. V. Olson, 1987 Cloning of large
segments of exogenous DNA into yeast by means of artificial
chromosome vectors. Science
236: 806-812.
Cai, H. J., P. Kiefel, J. Yee and I. Duncan, 1994 A yeast
artificial chromosome clone map of the Drosophila genome. Genetics
4: 1385-1399.
Celniker, S. E., D. J. Keelan and E. B. Lewis, 1989 The molecular
genetics of the bithorax complex of Drosophila: characterization of
the products of the Abdominal-B domain. Genes Dev.
3: 1425-1437.
Coulson, A. R., J. Sulston, S. Brenner and J. Karn, 1986 Toward a
physical map of the genome of the nematode Caenorhabditis elegans.
Proc. Natl. Acad. Sci. USA
83: 7821-7825.
Dunham, I., R. Durbin, J. Thierry-Mieg and D. R. Bentley, 1994
Physical Mapping Projects and ACEDB. In Guide to Human Genome
Computing (ed. M. J. Bishop), pp. 111-158. Academic Press, San
Diego, CA.
Green, E. D., and M. V. Olson, 1990 Systematic screening of yeast
artificial-chromosome libraries by use of the polymerase chain
reaction. Proc. Natl. Acad. Sci. USA
87: 1213-1217.
Hartl, D. L., D. I. Nurminsky, R. W. Jones and E. R. Lozovskaya,
1994 Genome structure and evolution in Drosophila: applications of
the framework P1 map. Proc. Natl. Acad. Sci. USA
91: 6824-6829.
Ioannou, P. A., C. T. Amemiya, J. Garnes, P. M. Kroisel, H. Shizuya,
C. Chen, M. A. Batzer and P. J. de Jong, 1994 A new bacteriophage
P1-derived vector for the propagation of large human DNA fragments.
Nature Genetics
6: 84-89.
Karch, F., W. Bender and B. Weiffenbach, 1990 abdA expression in
Drosophila embryos. Genes Dev.
4: 1573-1587.
Kim, U., H. Shizuya, P. J. de Jong, B. Birren and M. I. Simon, 1992
Stable propagation of cosmid sized human DNA inserts in an F factor
based vector. Nucleic Acids Res.
20: 1083-1085.
Kimmerly, W. J., A. L. Kyle, V. M. Lustre, C. H. Martin and M. J.
Palazzolo, 1994 Direct sequencing of terminal regions of genomic
P1 clones. GATA
11(5-6): 117-128.
Kohara, Y., K. Akiyama and K. Isono, 1987 The physical map of the
whole E. coli chromosome: application of a new strategy for rapid
analysis and sorting of a large genomic library. Cell
50: 495-508.
Kornfeld, K., R. B. Saint, P. A. Beachy, P. J. Harte, D. A. Peattie
and D. S. Hogness, 1989 Structure and expression of a family of
Ultrabithorax mRNAs generated by alternative splicing and
polyadenylation in Drosophila. Genes Dev.
3: 243-258.
Martin, C. H., C. A. Mayeda, C. A. Davis, C. L. Ericsson, J. D.
Knafels, D. R. Mathog, S. E. Celniker, E. B. Lewis and M. J.
Palazzolo, 1995 Complete sequence of the bithorax complex of
Drosophila. Proc. Natl. Acad. Sci. USA
92: 8398-8402.
Mizukami, T., W. I. Chang, I. Garkavtsev, N. Kaplan, D. Lombardi,
T. Matsumoto, O. Niwa, A. Kounosu, M. Yanagida, T. G. Marr et al.,
1993 A 13 kb resolution cosmid map of the 14 Mb fission yeast
genome by nonrandom sequence-tagged site mapping. Cell
73: 121-132.
Nurminsky, D. I., and D. L. Hartl, 1993 Amplification of the ends
of DNA fragments cloned in bacteriophage P1. Biotechniques
15: 201-202, 206-208.
Olson, M., L. Hood, C. Cantor and D. Botstein, 1989 A common
language for physical mapping of the human genome.
Science
245: 1434-1435.
Olson, M. V., J. E. Dutchik, M. Y. Graham, G. M. Brodeur, C. Helms,
M. Frank, M. MacCollin, R. Scheinman and T. Frank, 1986
Random-clone strategy for genomic restriction mapping in yeast.
Proc. Natl. Acad. Sci. USA
83: 7826-7830.
Palazzolo, M. J., B. A. Hamilton, D. Ding, C. H. Martin, D. A.
Mead, R. C. Mierendorf, K. V. Raghavan, E. M. Meyerowitz and H. D.
Lipshitz, 1990 Phage lambda cDNA cloning vectors for subtractive
hybridization, fusion-protein synthesis and Cre-loxP automatic
plasmid subcloning. Gene
88: 25-36.
Palazzolo, M. J., S. A. Sawyer, C. H. Martin, D. A. Smoller and D.
L. Hartl, 1991 Optimized strategies for sequence-tagged-site
selection in genome mapping. Proc. Natl. Acad. Sci. USA
88: 8034-8038.
Pardue, M. L., L. H. Kedes, E. S. Weinberg and M. L. Birnstiel,
1977 Localization of sequences coding for histone messenger RNA in
the chromosomes of Drosophila melanogaster. Chromosoma
63: 135.
Riley, J., R. Butler, D. Ogilvie, R. Finniear, D. Jenner, S.
Powell, R. Anand, J. C. Smith and A. F. Markham, 1990 A novel,
rapid method for the isolation of terminal sequences from yeast
artificial chromosome (YAC) clones. Nucleic Acids Res.
18: 2887-2890.
Sindelar, L. E., and J. M. Jaklevic, 1995 High-throughput DNA
synthesis in a multichannel format. Nucleic Acids Research
23: 982-987.
Smith, D. R., 1992 Ligation-mediated PCR of restriction fragments
from large DNA molecules. PCR Meth. Appl.
2: 21-27.
Smoller, D. A., W. J. Kimmerly, O. Hubbard, C. Ericsson, C. H.
Martin and M. J. Palazzolo, 1993 A Role for the P1 Cloning System
in Genome Analysis. In Automated DNA Sequencing and Analysis
Techniques, pp. 89-95. Academic Press, New York, NY.
Smoller, D. A., D. Petrov and D. L. Hartl, 1991 Characterization of
bacteriophage P1 library containing inserts of Drosophila DNA of
75-100 kilobase pairs. Chromosoma
100: 487-494.
Sorsa, V., 1988 Chromosome Maps of Drosophila. CRC Press,
Boca Raton, Florida.
Spradling, A. C., D. Stern, I. Kiss, J. Roote and G. M. Rubin, 1995
Gene disruption using P transposable elements: an integral
component of the Drosophila genome project. Proc. Natl. Acad. Sci. USA
92: 10824-10830.
Sternberg, N., 1990 Bacteriophage P1 cloning system for the
isolation, amplification, and recovery of DNA fragments as large as
100 kilobase pairs. Proc. Natl. Acad. Sci. USA
87: 103-107.
Sulston, J., Z. Du, K. Thomas, R. Wilson, L. Hillier, R. Staden, N.
Halloran, P. Green, J. Thierry-Mieg, L. Qiu et al., 1992 The C.
elegans genome sequencing project: a beginning.
Nature
356: 37-41.
Taylor, R. G., D. C. Walker and R. R. McInnes, 1993 E. coli host
strains significantly affect the quality of small scale plasmid DNA
preparations used for sequencing. Nucleic Acids Res.
21: 1677-1678.
Wakimoto, B. T., F. R. Turner and T. C. Kaufman, 1984 Defects in
embryogenesis in mutants associated with the Antennapedia gene
complex of Drosophila melanogaster. Developmental Biology
102: 147-172.