see http://stag.sf.net for more details.
Click here to see the SQL table creation statements;
gene
gene_function
transcript
protein
transcript_to_exon
exon
Protein-centric tree:
protein
transcript
gene
gene_function
transcript_to_exon
exon
SELECT *
FROM
(gene NATURAL JOIN gene_function)
NATURAL JOIN
(transcript NATURAL JOIN protein)
NATURAL JOIN
exon_to_transcript NATURAL JOIN
exon;
|
The resulting relation looks like this:
| gene_acc | symbol | cytology | mol_function | transcript_acc | mrna_seq | protein_acc | protein_seq | exon_acc | seq_start | seq_end | seq_strand | chromosome_arm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gn00002 | foo | 35A7 | transmembrane receptor | gn00002A | ATGAATTGCGGCTTNNNNNNNATT | gn00002P1 | MQVVQVQVQVQSCSCS | gn00002X1 | 1 | 100 | 1 | 2L |
| gn00002 | foo | 35A7 | transmembrane receptor | gn00002A | ATGAATTGCGGCTTNNNNNNNATT | gn00002P1 | MQVVQVQVQVQSCSCS | gn00002X2 | 201 | 200 | 1 | 2L |
| gn00003 | bar | 60F1 | glutamine-tRNA ligase | gn00003A | ATGCCGTGATAGCTAGCGTA | gn00003P1 | MSVXXXXXQLQLQ | gn00003X1 | 1001 | 1200 | 1 | X |
| gn00003 | bar | 60F1 | ATP binding | gn00003A | ATGCCGTGATAGCTAGCGTA | gn00003P1 | MSVXXXXXQLQLQ | gn00003X1 | 1001 | 1200 | 1 | X |
| gn00003 | bar | 60F1 | glutamine-tRNA ligase | gn00003B | ATGCCGTGATAGTTTTGGGGGG | gn00003P2 | MSVXXXXXSSSSSSSVVV | gn00003X1 | 1001 | 1200 | 1 | X |
| gn00003 | bar | 60F1 | ATP binding | gn00003B | ATGCCGTGATAGTTTTGGGGGG | gn00003P2 | MSVXXXXXSSSSSSSVVV | gn00003X1 | 1001 | 1200 | 1 | X |
| gn00003 | bar | 60F1 | glutamine-tRNA ligase | gn00003A | ATGCCGTGATAGCTAGCGTA | gn00003P1 | MSVXXXXXQLQLQ | gn00003X2 | 2001 | 2200 | 1 | X |
| gn00003 | bar | 60F1 | ATP binding | gn00003A | ATGCCGTGATAGCTAGCGTA | gn00003P1 | MSVXXXXXQLQLQ | gn00003X2 | 2001 | 2200 | 1 | X |
| gn00003 | bar | 60F1 | glutamine-tRNA ligase | gn00003B | ATGCCGTGATAGTTTTGGGGGG | gn00003P2 | MSVXXXXXSSSSSSSVVV | gn00003X3 | 1401 | 1600 | 1 | X |
| gn00003 | bar | 60F1 | ATP binding | gn00003B | ATGCCGTGATAGTTTTGGGGGG | gn00003P2 | MSVXXXXXSSSSSSSVVV | gn00003X3 | 1401 | 1600 | 1 | X |
When you feed the same query to DBStag, it will take the results of the natural join, and decompose it into a hierarchically structured tag-value pairs, which can be expressed in XML.
The following piece of code:
$stag = $dbh->selectall_stag("SELECT * FROM (gene NATURAL JOIN
gene_function) NATURAL JOIN (transcript NATURAL JOIN protein) NATURAL
JOIN exon_to_transcript NATURAL JOIN exon");
print $stag->xml;
|
| Click here to download |
Produces this output:
<dataset>
<gene>
<gene_acc>gn00002 </gene_acc>
<symbol>foo</symbol>
<cytology>35A7</cytology>
<gene_function>
</gene_function>
<transcript>
<transcript_acc>gn00002A </transcript_acc>
<gene_acc>gn00002 </gene_acc>
<protein_acc>gn00002P1 </protein_acc>
<mrna_seq>ATGAATTGCGGCTTNNNNNNNATT</mrna_seq>
<protein>
<protein_acc>gn00002P1 </protein_acc>
<molecular_weight>62</molecular_weight>
<protein_seq>MQVVQVQVQVQSCSCS</protein_seq>
</protein>
<exon_to_transcript>
<exon>
<exon_acc>gn00002X1 </exon_acc>
<seq_start>1</seq_start>
<seq_end>100</seq_end>
<seq_strand>1</seq_strand>
<chromosome_arm>2L</chromosome_arm>
</exon>
<exon>
<exon_acc>gn00002X2 </exon_acc>
<seq_start>201</seq_start>
<seq_end>200</seq_end>
<seq_strand>1</seq_strand>
<chromosome_arm>2L</chromosome_arm>
</exon>
</exon_to_transcript>
</transcript>
</gene>
<gene>
<gene_acc>gn00003 </gene_acc>
<symbol>bar</symbol>
<cytology>60F1</cytology>
<gene_function>
</gene_function>
<transcript>
<transcript_acc>gn00003A </transcript_acc>
<gene_acc>gn00003 </gene_acc>
<protein_acc>gn00003P1 </protein_acc>
<mrna_seq>ATGCCGTGATAGCTAGCGTA</mrna_seq>
<protein>
<protein_acc>gn00003P1 </protein_acc>
<molecular_weight>55.8</molecular_weight>
<protein_seq>MSVXXXXXQLQLQ</protein_seq>
</protein>
<exon_to_transcript>
<exon>
<exon_acc>gn00003X1 </exon_acc>
<seq_start>1001</seq_start>
<seq_end>1200</seq_end>
<seq_strand>1</seq_strand>
<chromosome_arm>X </chromosome_arm>
</exon>
<exon>
<exon_acc>gn00003X2 </exon_acc>
<seq_start>2001</seq_start>
<seq_end>2200</seq_end>
<seq_strand>1</seq_strand>
<chromosome_arm>X </chromosome_arm>
</exon>
</exon_to_transcript>
</transcript>
<transcript>
<transcript_acc>gn00003B </transcript_acc>
<gene_acc>gn00003 </gene_acc>
<protein_acc>gn00003P2 </protein_acc>
<mrna_seq>ATGCCGTGATAGTTTTGGGGGG</mrna_seq>
<protein>
<protein_acc>gn00003P2 </protein_acc>
<molecular_weight>57.2</molecular_weight>
<protein_seq>MSVXXXXXSSSSSSSVVV</protein_seq>
</protein>
<exon_to_transcript>
<exon>
<exon_acc>gn00003X1 </exon_acc>
<seq_start>1001</seq_start>
<seq_end>1200</seq_end>
<seq_strand>1</seq_strand>
<chromosome_arm>X </chromosome_arm>
</exon>
<exon>
<exon_acc>gn00003X3 </exon_acc>
<seq_start>1401</seq_start>
<seq_end>1600</seq_end>
<seq_strand>1</seq_strand>
<chromosome_arm>X </chromosome_arm>
</exon>
</exon_to_transcript>
</transcript>
</gene>
<gene>
<gene_acc>gn00004 </gene_acc>
<symbol>whee</symbol>
<cytology>58C2</cytology>
<gene_function>
</gene_function>
<transcript>
<transcript_acc>gn00004A </transcript_acc>
<gene_acc>gn00004 </gene_acc>
<protein_acc>gn00004P1 </protein_acc>
<mrna_seq>GTGATGTGGCTGNNNNNTTGTTAA</mrna_seq>
<protein>
<protein_acc>gn00004P1 </protein_acc>
<molecular_weight>82.3</molecular_weight>
<protein_seq>MLSTRQQCSCSXXXXLV</protein_seq>
</protein>
<exon_to_transcript>
<exon>
<exon_acc>gn00004X1 </exon_acc>
<seq_start>4389</seq_start>
<seq_end>4418</seq_end>
<seq_strand>1</seq_strand>
<chromosome_arm>3L</chromosome_arm>
</exon>
<exon>
<exon_acc>gn00004X2 </exon_acc>
<seq_start>4890</seq_start>
<seq_end>5101</seq_end>
<seq_strand>1</seq_strand>
<chromosome_arm>3L</chromosome_arm>
</exon>
<exon>
<exon_acc>gn00004X3 </exon_acc>
<seq_start>5812</seq_start>
<seq_end>6043</seq_end>
<seq_strand>1</seq_strand>
<chromosome_arm>3L</chromosome_arm>
</exon>
</exon_to_transcript>
</transcript>
</gene>
</dataset>
Which can be expressed visually like this:
| gene_acc | symbol | cytology | gene_function | transcript | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gn00002 | foo | 35A7 |
|
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| gn00003 | bar | 60F1 |
|
|
DBStag comes with a program selectall_html that produces nested HTML tables from SQL queries.
For example, the following script will query and pretty-print genes from the database:
$stag = $dbh->selectall_stag("SELECT * FROM (gene NATURAL JOIN
gene_function) NATURAL JOIN (transcript NATURAL JOIN protein) NATURAL
JOIN exon_to_transcript NATURAL JOIN exon");
foreach $gene ($stag->get_gene) {
print "*****\n";
printf "Gene:%s\nCytology:%s\n\n",
$gene->get_symbol,
$gene->get_cytology;
foreach $func ($gene->get_gene_function) {
printf " Function: %s\n", $func->get_mol_function;
}
foreach $transcript ($gene->get_transcript) {
printf " Transcript:%s\n",
$transcript->get_transcript_acc;
printf " Transcript Seq:%s\n",
$transcript->get_mrna_seq;
my $protein = $transcript->get_protein;
printf " Protein:%s\n",
$protein->get_protein_acc;
printf " Transcript Seq:%s\n",
$protein->get_protein_seq;
printf " Exons:\n";
my @exons = $transcript->find_exon;
foreach my $exon (@exons) {
printf " [%d, %d] strand:%s on %s\n",
$exon->get_seq_start,
$exon->get_seq_end,
$exon->get_seq_strand,
$exon->get_chromosome_arm;
}
}
}
|
|
Autogenerared method calls are highlighted
Click here to download |
This script produces this as output:
*****
Gene:foo
Cytology:35A7
Function: transmembrane receptor
Transcript:gn00002A
Transcript Seq:ATGAATTGCGGCTTNNNNNNNATT
Protein:gn00002P1
Transcript Seq:MQVVQVQVQVQSCSCS
Exons:
[1, 100] strand:1 on 2L
[201, 200] strand:1 on 2L
*****
Gene:bar
Cytology:60F1
Function: glutamine-tRNA ligase
Function: ATP binding
Transcript:gn00003A
Transcript Seq:ATGCCGTGATAGCTAGCGTA
Protein:gn00003P1
Transcript Seq:MSVXXXXXQLQLQ
Exons:
[1001, 1200] strand:1 on X
[2001, 2200] strand:1 on X
Transcript:gn00003B
Transcript Seq:ATGCCGTGATAGTTTTGGGGGG
Protein:gn00003P2
Transcript Seq:MSVXXXXXSSSSSSSVVV
Exons:
[1001, 1200] strand:1 on X
[1401, 1600] strand:1 on X
*****
Gene:whee
Cytology:58C2
Function: GPCR
Function: taste receptor
Transcript:gn00004A
Transcript Seq:GTGATGTGGCTGNNNNNTTGTTAA
Protein:gn00004P1
Transcript Seq:MLSTRQQCSCSXXXXLV
Exons:
[4389, 4418] strand:1 on 3L
[4890, 5101] strand:1 on 3L
[5812, 6043] strand:1 on 3L
Note that this is all achieved *without* a domain object model -
the domain model is purely relational.