This directory contains all files pertinent to the publication
Frise, Hammonds, Celniker (2009).

Contact for questions: erwin@fruitfly.org

Contents
--------

.
|-- GO-associations
|   |-- GOcorr-all.txt
|   |-- GOcorr-clustermembers.txt
|   `-- GOcorr-graphcontents.txt
|-- GO-enrichment
|   |-- Amigo
|   |   |-- genes-clus01.txt
|   |   |-- genes-clus02.txt
|   |   ...
|   |   `-- genes-clus39.txt
|   |-- amigo.tar.gz
|   `-- Figure 6
|       |-- go-dotplot.csv
|       `-- go-dotplot.txt
|-- TriangulatedImages
|   |-- Coordinates
|   |   |-- points.txt
|   |   `-- triangle_corners.txt
|   |-- EFF
|   |   |-- ti_200907.eff
|   |   |-- ti_stage4-6.eff
|   |   `-- ti_stage4-6_pat.eff
|   `-- Matlab
|       |-- ti_200907.mat
|       |-- ti_stage4-6.mat
|       |-- ti_stage4-6_pat.mat
|       |-- ti_stage9-10.mat
|       `-- ti_stage9-10_pat.mat
|-- clusters
|   |-- cluster_assignment.csv
|   |-- cluster_assignment.mat
|   |-- gene_assignments
|   |   |-- genes-clus01.txt
|   |   |-- genes-clus02.txt
|   |   ...
|   |   `-- genes-clus39.txt
|   `-- gene_assignments.tar.gz
|-- code
|   | ...
|-- dataset
|   |-- stage4-6_reference.csv
|   |-- stage4-6_reference.mat
|   |-- stage9-10_reference.mat
|   |-- image-details.csv
|   `-- imageinfo.csv
|-- domains
|   |-- all_genes.csv
|   `-- named_genes.csv
`-- outline
    |-- outlines_0-79.tar.gz
    `-- sample
        `-- img_dir_9.eff

Common file formats
-------------------

- EFF files (.eff)
  EFF stands for "Embryo File Format" and was loosely modeled after the
  common GFF format. The first line is a descriptor/parsing verification
  of the file format and the version. Currently it is always "EFF 1.0".
  Subsequent lines are in a multicolumn format, with each row describing
  one file or reference. The columns are as follows:

    Reference\tProgram\tOutput category\tScore\tNumber of data points\tImage size\tdata

    \t = Tab space
    Reference = File name of the image file, or reference to the content
      of a csv file
    Program = Name of the program that produced that line: "Matlab fembryo"
      for segmentation outlines, "Matlab mesh" for TI, or "genes filtered"
      for the condensed and filtered dataset.
    Output category = Category of the data line: "outline" for a
      segmentation outline or "mesh" for a TI
    Score = Score (if any) produced by the program. For embryo segmentation
      outlines, this is a bitmap score with bit1 = single embryo,
      bit2 = passed quality control, bit3 and up = multiple touching
      embryos and which separation step was passed successfully.
      Bit3 and up did not prove useful.
    Number of data points = For category "outline", the number of x/y
      coordinate pairs (usually 360); for "mesh", the number of TI
      triangles (311).
    Image size = 0.5; all processed images were reduced by 1/2
    data = x/y coordinate pairs ("outline") or staining intensities of the
      TI triangles ("mesh"). Individual data points are separated by
      commas without spaces.

- Matlab matrices (.mat)
  Matlab 7.x matrices of the corresponding eff or csv files.

- Comma separated value files (.csv)
  Standard CSV format with commas separating the entries and text entries
  enclosed in quotes. Can be opened in any spreadsheet application
  (e.g. Excel) or imported with a standard CSV parser.

Files & directories
-------------------

- dataset
  CSV files of the dataset used for the analysis.

  imageinfo.csv contains the gene association and stages for each
  file name. The columns should be self-explanatory.

  image-details.csv contains the orientation of the images. The columns
  are as follows:
    image_path = name of the jpg file
    dv = embryo orientation: lateral, ventral or dorsal
    handedness = orientation adjustment. Can be:
      AP_inverted = flip anterior/posterior axis
      DV_inverted = flip dorsal/ventral axis
      AP_DV_inverted = flip both
    image_processing_flags = set if there is a problem with the image
      and/or the mesh representation:
      image_reject = something is wrong with the image; do not use it
      mesh_reject = something is wrong with the mesh (usually a
        segmentation error)

  stage4-6_reference.csv/mat contains the reference for the reduced dataset.
  The columns are as follows:
    reference = reference number, used in the Triangulated Image data files
    order = sorting of the images from early to later development
    patterned = 1 if selected as a gene with a distinct expression pattern,
      0 otherwise
    symbol = gene symbol
    FBgn = FlyBase reference number of the gene

  stage9-10_reference.mat is provided as a second example for later stages.

- outline
  x/y coordinates of the segmentation outlines for each embryo. Data are
  provided as compressed EFF files (one for each 1000 images) and an
  uncompressed sample EFF file for images insitu9001.jpe - insitu10000.jpe.
  The file names img_dirX denote the leading digits of the image file name
  (e.g. insitu22320.jpe is in img_dir22).

- TriangulatedImages
  Triangulated images (TIs) from the publication, in EFF format and as
  Matlab matrices. Every TI is represented as a struct in the following
  format (here named sample):
    sample.p = x/y coordinates of the triangle corner points, numbered 1-180
      (also in Coordinates/points.txt)
    sample.p_scale = sample.p scaled to a more display-friendly scale
    sample.t = the 311 triangles, each represented as 3 corner points in p
      (also in Coordinates/triangle_corners.txt)
    sample.stain = expression intensities (one for each of the 311 triangles)
    sample.files = file names corresponding to the TIs

  The matrix with all TIs additionally contains the following triangle
  connectivity data (used by the MRF functions):
    ti.tn = for each of the 311 triangles, which other triangles share
      corner points
    ti.t_conn = connectivity matrix for each of the 311 triangles;
      1 indicates two shared corner points
    ti.t_edge = for each of the 311 triangles, the number of corners at the
      boundary
  The calculations behind the connectivity data are shown in one of the
  code examples (triangle_math.txt).

  ti_200907
    All triangulated images in the dataset. Updated TIs will appear with
    later dates.
  ti_stage4-6
    Reduced dataset for all stage 4-6 embryos.
  ti_stage4-6_pat
    Selection of the patterned TIs.
    This dataset was used for clustering.
  As a later-stage example, stage 9-10 reduced TIs and patterned TIs are
  also provided.

- clusters
  Contents of the clusters.
  cluster_assignment.csv/mat contain the cluster assignments with reference
  to dataset/stage4-6_reference.csv/mat.
  gene_assignments contains the gene symbols in each cluster, as one text
  file per cluster or all together as a tar archive.

- GO-enrichment
  Results of the GO-enrichment analysis.
  Amigo contains the raw output for enriched GO terms with p < 0.001.
  Figure_7 contains the detailed contents of the GO-dotblot in Figure 7B
  of the paper.
  Figure_7/go-dotblot.csv has the following format:
    cluster = cluster number
    parent_goid = GO-id for the parent umbrella term
    parent_goterm = corresponding GO-term for parent_goid (as shown in
      Figure 7B)
    goid = GO-id for the enriched function associated with the parent term
    goterm = GO-term for the goid entry
  Figure_7/go-dotblot.txt contains the detailed terms and ids for the
  entire Figure 7B in human-readable format, categorized by parent terms
  and shown for each cluster.

- GO-associations
  Contains the raw data for the GO-association analysis (Figure 6C).
  GOcorr-all.txt contains all pairs of associations between GO-terms
  (separated by dashes ("-")).
  GOcorr-clustermembers.txt contains the clusters inside each node in the
  graph.
  GOcorr-graphcontents.txt contains only the pairs represented by the graph.

- domains
  Contains genes corresponding to TIs in 14 expression domains.
  named_genes.csv is identical to Supplemental Table II; all_genes.csv
  also contains unnamed genes.

- code
  See README-MSB.txt and README-online.txt.
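
The EFF layout described under "Common file formats" can be read with a
few lines of Python. The following is only a minimal sketch and is not
part of the distributed code. The names parse_eff and decode_outline_score,
the dictionary keys and the sample record are made up for illustration.
It assumes that the seven tab-separated columns appear in the order listed
in this README, that the score (when present) is an integer with "bit1"
being the least significant bit, and that outline data is a flat list
x1,y1,x2,y2,...

```python
import io


def parse_eff(stream):
    """Yield one dict per data line of an EFF 1.0 text stream."""
    header = stream.readline().strip()
    if header != "EFF 1.0":
        raise ValueError(f"unexpected EFF header: {header!r}")
    for line in stream:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 7:
            raise ValueError(f"expected 7 tab-separated columns, got {len(fields)}")
        reference, program, category, score, count, size, data = fields[:7]
        values = [float(v) for v in data.split(",") if v]
        record = {
            "reference": reference,
            "program": program,
            "category": category,
            # Assumption: the score, when present, is an integer bitmap.
            "score": int(score) if score.strip() else None,
            "count": int(count),
            "image_size": float(size),
            "values": values,
        }
        if category == "outline":
            # Assumption: outline data is a flat list x1,y1,x2,y2,...
            record["points"] = list(zip(values[0::2], values[1::2]))
        yield record


def decode_outline_score(score):
    """Decode the two documented low bits of an outline score."""
    # Assumption: "bit1" is the least significant bit.
    return {"single_embryo": bool(score & 1), "passed_qc": bool(score & 2)}


# Made-up record with two coordinate pairs, for illustration only.
sample = "EFF 1.0\ninsitu9001.jpe\tMatlab fembryo\toutline\t3\t2\t0.5\t10,20,30,40\n"
records = list(parse_eff(io.StringIO(sample)))
print(records[0]["points"])
print(decode_outline_score(records[0]["score"]))
```

For a real file, pass an open text file handle instead of the
io.StringIO object. For "mesh" lines, the "values" entry holds the
staining intensities of the triangles.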
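
The triangle connectivity data described under "TriangulatedImages"
(ti.tn and ti.t_conn) can be derived from the list of triangle corner
points alone. The sketch below illustrates one way to do this in Python.
It is not the calculation from triangle_math.txt, which remains the
reference. The function name triangle_connectivity and the three example
triangles are hypothetical, and the sketch assumes that "share corner
points" means at least one common corner. Note that Python indices start
at 0, whereas the Matlab data is numbered from 1.

```python
def triangle_connectivity(triangles):
    """Return (neighbours, conn) for a list of 3-corner triangles.

    neighbours[i] lists the triangles sharing at least one corner with
    triangle i (assumed meaning of ti.tn); conn[i][j] is 1 if triangles
    i and j share exactly two corners, i.e. an edge (as in ti.t_conn).
    """
    n = len(triangles)
    corner_sets = [set(t) for t in triangles]
    neighbours = [[] for _ in range(n)]
    conn = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(corner_sets[i] & corner_sets[j])
            if shared >= 1:
                neighbours[i].append(j)
                neighbours[j].append(i)
            if shared == 2:
                conn[i][j] = conn[j][i] = 1
    return neighbours, conn


# Hypothetical mini-mesh: triangles 0 and 1 share an edge (corners 2, 3),
# triangles 1 and 2 share only corner 4.
triangles = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]
neighbours, conn = triangle_connectivity(triangles)
print(neighbours)
print(conn)
```

The pairwise loop is quadratic in the number of triangles, which is
unproblematic for the 311 triangles used here.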