About the neural network method

NNPP (Neural Network Promoter Prediction) is a method that finds eukaryotic and prokaryotic promoters in a DNA sequence. Transcription initiation at the promoter is one of the most complex processes in molecular biology. Multiple functional sites in the primary DNA sequence have been shown to take part in polymerase binding. In eukaryotes, elements such as the TATA-box and the region around the transcription start site (the "Initiator") are known to act as binding sites for RNA polymerase II, transcription factors, and other proteins involved in transcription initiation. These promoter elements occur in various combinations and are separated by varying distances in the sequence.

The basis of the NNPP program is a time-delay neural network (see the References for details). The time-delay network consists mainly of two feature layers, one that recognizes the TATA-box and one that recognizes the "Initiator", the region spanning the transcription start site. Both feature layers feed into a single output unit, which produces a score between 0 and 1. The neural network method is described in detail in:
(1) Reese, M.G.
(2) Reese, M.G. and Eeckman, F.H. (1995)
(3) Reese, M.G., Harris, N.L. and Eeckman, F.H. (1996)

Please cite these when quoting NNPP output.
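To make the architecture described above more concrete, here is a minimal Python sketch of a time-delay style network with two feature layers and one sigmoid output unit. It is not NNPP's implementation: the real program's trained weights, window sizes, and input encoding are not given here, so the consensus-based weights (`TATAAA` for the TATA-box, `TCAGT` as an illustrative Initiator-like motif), the biases, the output weights, and the max-over-positions pooling are all assumptions chosen for readability.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as one 4-element vector (A, C, G, T) per base; other letters become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def consensus_weights(consensus):
    """Hypothetical window weights: +1 for the consensus base at each position, -1 otherwise."""
    return [[1.0 if base == c else -1.0 for base in BASES] for c in consensus]


def feature_layer(encoded, weights, bias):
    """Apply one shared weight window at every offset (the time-delay idea).

    Returns the strongest tanh response over all offsets; this max-pooling
    step is an assumed simplification, not NNPP's documented behaviour.
    """
    width = len(weights)
    if len(encoded) < width:
        raise ValueError("sequence is shorter than the feature window")
    best = -math.inf
    for start in range(len(encoded) - width + 1):
        activation = bias + sum(
            weights[k][j] * encoded[start + k][j]
            for k in range(width)
            for j in range(len(BASES))
        )
        best = max(best, math.tanh(activation))
    return best


# Hypothetical parameters: biases are set so that only a full consensus
# match gives a positive activation (e.g. 6 matches - 5.5 = 0.5 for TATAAA).
TATA_LAYER = (consensus_weights("TATAAA"), -5.5)
INR_LAYER = (consensus_weights("TCAGT"), -4.5)
OUTPUT_WEIGHTS = (2.0, 2.0)
OUTPUT_BIAS = 0.0


def promoter_score(seq):
    """Combine both feature layers in one sigmoid output unit, giving a score in (0, 1)."""
    x = one_hot(seq)
    h_tata = feature_layer(x, *TATA_LAYER)
    h_inr = feature_layer(x, *INR_LAYER)
    z = OUTPUT_WEIGHTS[0] * h_tata + OUTPUT_WEIGHTS[1] * h_inr + OUTPUT_BIAS
    return 1.0 / (1.0 + math.exp(-z))


# Usage on two made-up sequences
print(round(promoter_score("GGGG" + "TATAAA" + "G" * 20 + "TCAGT" + "GGGG"), 2))  # prints 0.86
print(round(promoter_score("G" * 40), 2))  # prints 0.02
```

Because every offset shares the same weights, each feature layer detects its element wherever it occurs, which is one simple way to tolerate the variable spacing between promoter elements; the published network itself is described in the references above.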
Estimated accuracy of prediction
Eukaryotes

A careful 4-fold cross-validation test was run on two data sets: 429 eukaryotic RNA polymerase II promoters from the Eukaryotic Promoter Database (EPD, version 50; see "...eukaryotic POL II promoter sequences." Nucl. Acids Res. 14, 10009-10026, and "...Polymerase II Promotor Elements Derived from 502 Unrelated Promotor Sequences." J. Mol. Biol. 212, 563-578), and 305 unrelated genes with less than 50% pairwise sequence identity (the gene data set). It gave the following results, averaged over both test sets:
+-----------+------------+-----------+-------------+
| Threshold | Promoters  | False     | Correlation |
|           | recognized | positives | coefficient |
|           |            |           | (CC)        |
+-----------+------------+-----------+-------------+
| 0.99      | 10%        | 0.0%      | 0.38        |
| 0.97      | 20%        | 0.0-0.1%  | 0.38        |
| 0.92      | 30%        | 0.1-0.3%  | 0.50        |
| 0.85      | 40%        | 0.1-0.4%  | 0.60        |
| 0.70      | 50%        | 0.8-1.0%  | 0.65        |
| 0.38      | 60%        | 1.0-3.1%  | 0.61        |
| 0.20      | 70%        | 2.2-5.3%  | 0.58        |
| 0.12      | 80%        | 5.1-12.5% | 0.52        |
+-----------+------------+-----------+-------------+
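As a sketch of how the eukaryotic table can be read, the Python below picks the strictest tabulated threshold that reaches a desired recognition rate and then keeps predicted sites scoring at or above it. The row values are copied from the table; the function names, the (position, score) input format, and the assumption that a site counts as a predicted promoter when its score is at least the threshold are illustrative, not part of NNPP.

```python
# Rows copied from the eukaryotic table:
# (threshold, % promoters recognized, (low, high) % false positives, CC)
EUKARYOTE_TABLE = [
    (0.99, 10, (0.0, 0.0), 0.38),
    (0.97, 20, (0.0, 0.1), 0.38),
    (0.92, 30, (0.1, 0.3), 0.50),
    (0.85, 40, (0.1, 0.4), 0.60),
    (0.70, 50, (0.8, 1.0), 0.65),
    (0.38, 60, (1.0, 3.1), 0.61),
    (0.20, 70, (2.2, 5.3), 0.58),
    (0.12, 80, (5.1, 12.5), 0.52),
]


def threshold_for_recognition(target_pct, table=EUKARYOTE_TABLE):
    """Return the row with the highest threshold that still recognizes >= target_pct % of promoters.

    Rows run from the strictest to the most permissive threshold, so the first
    row reaching the target is the strictest adequate setting.
    """
    for row in table:
        if row[1] >= target_pct:
            return row
    raise ValueError(f"no tabulated threshold recognizes {target_pct}% of promoters")


def call_promoters(scored_sites, threshold):
    """Keep hypothetical (position, score) pairs whose score is at least the threshold."""
    return [(pos, score) for pos, score in scored_sites if score >= threshold]


threshold, recognized, fp_range, cc = threshold_for_recognition(45)
print(threshold, recognized, fp_range, cc)  # prints 0.7 50 (0.8, 1.0) 0.65
sites = [(120, 0.95), (480, 0.40), (910, 0.71)]  # made-up scores
print(call_promoters(sites, threshold))  # prints [(120, 0.95), (910, 0.71)]
```

Lowering the threshold trades more recognized promoters for a higher false-positive rate, so the lookup returns the strictest row that meets the target rather than simply the lowest threshold.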
These percentages are defined by:

$$\text{promoters recognized} = \frac{TP}{TP + FN} = \frac{\text{observed promoters predicted as promoters}}{\text{all observed promoters}}$$

$$\text{false positives} = \frac{FP}{TN + FP} = \frac{\text{observed non-promoters predicted as promoters}}{\text{all observed non-promoters}}$$

$$\text{correlation coefficient (CC)} = \frac{(TP \times TN) - (FN \times FP)}{\sqrt{(TP + FN)(TN + FP)(TP + FP)(TN + FN)}}$$

where:

TP = true positives = observed promoters predicted as promoters (promoters recognized)
TN = true negatives = observed non-promoters predicted as non-promoters (non-promoters recognized)
FP = false positives = observed non-promoters predicted as promoters
FN = false negatives = observed promoters predicted as non-promoters
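The three definitions above translate directly into code. The following is a minimal Python sketch; the function name `promoter_metrics` and the counts in the usage example are hypothetical, not NNPP output.

```python
import math


def promoter_metrics(tp, tn, fp, fn):
    """Return (promoters recognized, false-positive rate, CC) from confusion counts.

    tp: observed promoters predicted as promoters
    tn: observed non-promoters predicted as non-promoters
    fp: observed non-promoters predicted as promoters
    fn: observed promoters predicted as non-promoters
    Requires at least one observed promoter and one observed non-promoter.
    """
    recognized = tp / (tp + fn)          # fraction of all observed promoters
    false_positives = fp / (tn + fp)     # fraction of all observed non-promoters
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # CC is undefined when a marginal total is zero; report 0.0 in that case
    cc = ((tp * tn) - (fn * fp)) / denominator if denominator else 0.0
    return recognized, false_positives, cc


# Hypothetical counts: 100 promoters (50 found) and 1000 non-promoter windows (10 mis-called)
rec, fpr, cc = promoter_metrics(tp=50, tn=990, fp=10, fn=50)
print(f"recognized={rec:.0%}  false positives={fpr:.1%}  CC={cc:.2f}")
# prints: recognized=50%  false positives=1.0%  CC=0.62
```

The CC formula is the Matthews correlation coefficient; unlike the two percentages, it summarizes recognition and false positives in a single number, which is why it peaks at an intermediate threshold in the tables.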
Prokaryotes

A careful cross-validation test on 272 prokaryotic (E. coli) promoters, collected and described in "...sequences." Nucl. Acids Res. 15, 2343-2361, gave the following results:

+-----------+------------+-----------+-------------+
| Threshold | Promoters  | False     | Correlation |
|           | recognized | positives | coefficient |
|           |            |           | (CC)        |
+-----------+------------+-----------+-------------+
| 0.9       | 50%        | 0.3%      | 0.71        |
| 0.8       | 60%        | 0.4%      | 0.72        |
| 0.65      | 70%        | 0.9%      | 0.73        |
| 0.55      | 75%        | 1.3%      | 0.72        |
| 0.35      | 80%        | 1.7%      | 0.72        |
| 0.15      | 90%        | 2.7%      | 0.70        |
| 0.03      | 95%        | 4.7%      | 0.63        |
+-----------+------------+-----------+-------------+

The performance per base position was tested on the pBR322 vector:

+-----------+------------+-----------+-------------+
| Threshold | Promoters  | False     | Correlation |
|           | recognized | positives | coefficient |
|           |            |           | (CC)        |
+-----------+------------+-----------+-------------+
| 0.96      | 30%        | 0.03%     | 0.38        |
| 0.92      | 50%        | 0.11%     | 0.48        |
| 0.89      | 80%        | 0.16%     | 0.51        |
+-----------+------------+-----------+-------------+
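In the prokaryotic cross-validation table the correlation coefficient peaks at an intermediate threshold (0.73 at 0.65). A minimal Python sketch for finding such a row, optionally under a cap on the false-positive rate, follows; the function name `best_cc_threshold` and its `max_fp_pct` parameter are hypothetical helpers, and the rows are copied from that table.

```python
# Rows copied from the prokaryotic (E. coli) cross-validation table:
# (threshold, % promoters recognized, % false positives, CC)
PROKARYOTE_TABLE = [
    (0.9, 50, 0.3, 0.71),
    (0.8, 60, 0.4, 0.72),
    (0.65, 70, 0.9, 0.73),
    (0.55, 75, 1.3, 0.72),
    (0.35, 80, 1.7, 0.72),
    (0.15, 90, 2.7, 0.70),
    (0.03, 95, 4.7, 0.63),
]


def best_cc_threshold(table, max_fp_pct=None):
    """Return the row with the highest CC, optionally only among rows with % false positives <= max_fp_pct."""
    candidates = [row for row in table if max_fp_pct is None or row[2] <= max_fp_pct]
    if not candidates:
        raise ValueError("no row satisfies the false-positive limit")
    # max() keeps the first (strictest) row when several rows share the best CC
    return max(candidates, key=lambda row: row[3])


print(best_cc_threshold(PROKARYOTE_TABLE))                  # prints (0.65, 70, 0.9, 0.73)
print(best_cc_threshold(PROKARYOTE_TABLE, max_fp_pct=0.5))  # prints (0.8, 60, 0.4, 0.72)
```

A false-positive cap is useful for long genomic sequences, where even a small per-position false-positive rate can produce many spurious predictions.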
Another promoter finder on the Web

An additional program, SIGNALSCAN, developed by Dr. Dan Prestridge, can be used to search for transcription factor binding sites in promoter regions. It can be accessed at two different WWW sites: SIGNALSCAN at NIH or SIGNALSCAN in Singapore.
Please send comments or questions about the web site to bdgp@fruitfly.org