Pheno-XML
Chris Mungall, 2006

The RNC is the master version; the RNG and XSD are autogenerated from the RNC.

A document consists of some combination of genotypes and phenotypes, manifestations of phenotypes via genotypes, and genetic features.

File format version: should match this schema version (e.g. 0.01).

Each genotype record represents a single individual or a group of genetically similar individuals.

NCBI Taxon ID.

Another genotype.

A genotype is a collection of alleles. In general, only the mutant alleles are specified. Sequence variants can also be specified.

A reference to a genetic feature.

A genetic feature may be a gene (allele), transcript, SNP, etc., typed by SO. This is a partial model of genetic features; a full model can be found in Chado.

Unique, stable, global identifier. This can be used to reference more detailed information in a sequence/genetics database.

Optional: same as genotype.organism_id if not specified.

SO type: "gene" for a mutant allele. May also be some kind of sequence_variant (e.g. deletion, P_insertion, SNP). Defaults to SO:gene (i.e. an allele).

A single genotype can be associated with multiple phenotypes in different experiments and different environments.

May be from a genetic context ontology, e.g. dominant, codominant.

A phenotype manifestation is an observation of a phenotype in one or more individual organisms under experimental conditions, linked with the genetic and environmental causes of that phenotype.

Each manifestation can be identified, but this is optional. Note that the ID is for the manifestation of the phenotype in either a single individual organism or a population of indistinguishable individuals in an experiment or trial.

Free-text summary of the entire phenotype manifestation.

Genotypes of the individual or individuals that show the phenotype.
The collection of characteristics exhibited in this manifestation.

Support or evidence; experimental details.

There is an implicit 'normal' environment. If multiple environments are specified, the association holds for ALL of them.

A phenotype is a collection of characteristics possessed by an organism or collection of organisms (or in some cases, the environment of the organism, as a result of that organism's actions; for example, a termite's nest). The phenotype arises as a result of the organism's genotype and the environment of the organism.

Examples of phenotypes are:
* "red eyes and notched wings"
* "reduced apoptosis and increased susceptibility to cancer"
* "Initially a single lateral is formed, but it is thicker than normal and sometimes individual bundles of actin filaments can be seen. The laterals split near the distal tip as they elongate and at later stages they appear to be split over more of their length."
* "interrupted imaginal disc development and stunted wings at the adult stage"

The phenotype is the entire collection of characteristics exhibited by that organism (or collection of effectively indistinguishable organisms). Each organism has a single phenotype throughout its lifespan. The characteristics that comprise that phenotype may change; this is reflected in the phenotype_character records.

Each phenotype record represents a specific phenotype, which may be instantiated in one or more organisms. If two or more organisms have indistinguishable phenotypes, the same record may be used. If the phenotypes are distinguishable, a different record should be used.

An identifier for a phenotype term. If supplied, this would come from a full ontology of pre-coordinated phenotype terms such as MP or the plant_trait ontology.

A phenotype character is one or more qualities inhering in a single bearer entity.
For example, a phenotype character (PC) with quality type "red" and bearer type "eye" (written here for convenience as "red eye") represents a particular instantiation of "red eye"/"red eyes" in a particular organism or collection of like organisms.

Examples of phenotype character types: red hair, low bone mass, square jaw, long thick bristle, 2cm long tail, spermatocyte devoid of asters, elevated blood pressure, lacking wings, having an additional digit relative to a wild-type individual, having a brain size during embryonic development that is larger than a typical individual of the same genus/order, short attention span, having an abnormal shape, abnormal fusion between digits, lacking red spots, abnormally low numbers of B-cells after infection, lacking circularity in the cross-section of an arista lateral.

A phenotype character record represents an instance of a characteristic possessed by a single organism, or a collection of identical characteristics possessed by like organisms.

Free-text summary of the phenotype character.

The type of entity in which the qualities inhere. Can be a process type or a continuant type.

The quality types (attributes, properties) that are the hallmarks of this character.

Each 'bearer' record represents a particular bearer of some general bearer type: for example, a particular wing of a particular fly. Entities can be spatial objects (3D, e.g. anatomical parts) or spatiotemporal entities (4D, e.g. processes). Qualities can be borne by processes or continuants; the bearer can thus be drawn from GO process, component, or from an ontology of cells or larger anatomical entities.

Aka attribute, aka property. Any one particular bearer can have multiple qualities inhering in it. For example, a particular tail can have qualities such as shape, length, etc. Each quality inheres in a single bearer throughout the duration of that entity's existence. For example, my height is a single quality that inheres in my body throughout my life.
This quality can take on different states throughout the lifetime of the bearer.

Not required: it can be deduced from the quality type. If provided, the value of this field is always an OBO-style identifier from PATO. May be useful for report purposes; e.g. the color of this entity is red (where 'color' is the determinable).

Certain kinds of quality implicitly or explicitly refer to numbers. Examples of these are the qualities of "having supernumerary parts" or the qualities of "lacking parts". It can be useful to state the exact number here, rather than overloading the ontology of qualities. The number will refer to either the relative or absolute quantity...? This is purely optional, as it is usually sufficient to state that a particular hand has supernumerary fingers, without saying how many extra.

Quality type. Comes from PATO.

Optional free text.

Relational qualities inhere in additional entities. This should be used for relational qualities (e.g. sensitivity, distance_from) and not for monadic qualities. Other examples include: "spermatocytes *devoid_of* asters", "thorax *lacks* wing", "hand *has-supernumerary-parts* finger". In these cases, the related_entity records the _type_ of part that is lacking or present in lower or higher numbers.

Sometimes a state is only instantiated if a certain condition holds.

For relative qualities we may want an explicit "yardstick".

Not all phenotype characters can be specified in qualitative terms using classes from an ontology of quality states. Sometimes unit-based quantitative measurements must be used, for example weight in kg. Rather than overloading the 'type' field, we treat measurements separately. Measurements complement quality space annotations; a phenotype character can be both "shorter_than" wild-type and also 2cm in length. Either or both of these can be specified in the annotation.

Qualities can change over time: a quality may instantiate 'hot' at one time and 'cold' at a later time.
We may know the exact time stage (e.g. larval to pupal), or we may have an open range (e.g. post-larval), or repeated ranges. Multiple qualifiers can be specified (e.g. before t2 and after t1). If no qualifiers are specified then there is no implied time: it is assumed to happen at some point in the organism's lifetime, UNLESS it is constrained by the bearer type. For example, if the bearer is "larva" then the attached state occurs during the larval stage. For this to be inferred, the appropriate relations must be specified in the anatomical ontology.

Quality types can be absolutist (square, red) or relative (high pressure, short height). Relative always implies a context. The default context is always an equivalent phenotype character in a typical individual of the same species. We can be more explicit if we like by specifying in_comparison_to.

Examples: an allele which has more or less pronounced effects in different genetic backgrounds; a yellow eye which is red in wild-type.

If left blank, this is inferred from the quality type; for example, if Q="high pressure", then the relation is higher-than or increased-with-respect-to.

Can be an ID of a particular phenotype manifestation, or an ID for a whole taxon.

What the quality type would be in wild-type, or whatever the comparison target is. Can specify multiple values if these are exhibited by the target. For example, if color is red or pink in wild-type, then we can specify these two here. Specifying target types can lead to redundancy; this is only required when, for example, our canonical anatomical ontologies are not specific w.r.t. quality types.

What typical measurements would be in wild-type, or whatever the comparison target is.
Can specify multiple values if these are exhibited by the target.

For abnormal/normal. These should come from the PATO modifier terminology.

Measurement. Examples: length measurement (with ruler, in cm); spot intensity.

Real-world measurements (excluding count measurements) are inexact and hence ranges.

The unit should be a reference to an SI unit.
Q: do SI units have an identifier system, or do we just use e.g. "mm"?
Q: do we need the ability to specify composite units, e.g. lb/kg, or is this handled by SI?

The only time this would be omitted is if we only have a range.

Q: does the range reflect our lack of knowledge (e.g. accuracy of measuring instrument) or an actual range of values? Or should we represent an actual range as multiple measurements?

TODO: reference to a particular experiment.

Evidence: could refer to papers, figures in papers, raw images, etc.

The type of environment. For now this is just a pre- or post-coordinated type reference. In future we may want to add more under here.

A reference to a type. Here type is used in the ontology sense of a "kind" or "universal". A type can be referenced simply by giving a term ID from an ontology. More complex types can be "post-coordinated".

Base/generic type. This is often all that is required. The ID is OBO-compliant and can be from any ontology.

We can post-compose references to more detailed types for which no defined term exists.

For cross-products, these are the characteristics that distinguish a specific type from a general type. E.g. suppose we want to annotate to a bearer "scutum bristle" and the anatomical ontology does not contain this term, only "scutum" and "bristle". In this case the bearer type_id (i.e. the primary term) would be the ID for "bristle". We would have one differentium, "part_of scutum".

temporal_qualifier examples: during larval stage; after pupal stage; at 7pm on 2000/01/01; overlapping M1-phase. We do not tie ourselves to any one theory of time; time ranges and points are represented using time_range.

E.g. before, after, during. Would come from an ontology of temporal relations. This is optional. If not specified, treated as the *overlap* relation.

Represents an interval or point in time (we are neutral w.r.t. points vs intervals).

id: OPTIONAL. This is typically not used. You would use it if you wish to grant an identifier to a particular time point or range. This would allow you to make partial-order statements, e.g. t1 before t2.

The interval may be typed by a term/class from an ontology of developmental stages, processes or life cycles.

An OPTIONAL measurement of the time, e.g. in seconds, hours, years, etc. Measured either from when the particular attribute (and hence bearer) came into existence, or in absolute time.

An optionally-attributed textual description.

E.g. PubMed ID.
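
As an illustration only, the post-coordinated type reference described earlier (a base type_id plus differentia, as in the "scutum bristle" example) and the temporal qualifier default might be modelled as below. This is a minimal Python sketch, not part of the schema: the class names TypeRef, Differentium and TemporalQualifier, the describe() helper, and the "EX:" identifiers are all hypothetical placeholders, and modelling a time range as a bare type reference is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Differentium:
    """One distinguishing characteristic: a relation to another type."""
    relation: str   # e.g. "part_of"
    target_id: str  # ID of the related type (placeholder IDs used below)


@dataclass(frozen=True)
class TypeRef:
    """A type reference: a base term ID plus optional differentia."""
    type_id: str
    differentia: Tuple[Differentium, ...] = ()

    def describe(self) -> str:
        # A pre-coordinated reference is just the base term ID.
        if not self.differentia:
            return self.type_id
        clauses = " and ".join(
            f"{d.relation} {d.target_id}" for d in self.differentia
        )
        return f"{self.type_id} that is {clauses}"


@dataclass(frozen=True)
class TemporalQualifier:
    """A time range plus an optional temporal relation."""
    time_range: TypeRef        # assumption: the range is given only by its type
    relation: str = "overlap"  # per the text, an unspecified relation is treated as overlap


# "scutum bristle": primary term "bristle", one differentium "part_of scutum".
# The EX: identifiers are made up for this sketch.
scutum_bristle = TypeRef(
    "EX:bristle", (Differentium("part_of", "EX:scutum"),)
)
print(scutum_bristle.describe())

# A qualifier with no explicit relation falls back to "overlap".
larval = TemporalQualifier(TypeRef("EX:larval_stage"))
print(larval.relation)
```

Keeping the base term and its differentia as separate fields means a consumer that only understands pre-coordinated terms can still read type_id and ignore the rest.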