Pheno-XML
Chris Mungall 2006
The RNC is the master version;
the RNG and XSD are autogenerated from the RNC
A document consists of some combination of genotypes and
phenotypes, manifestations of phenotypes via genotypes, and
genetic features
: File format version :
should match this schema version (eg 0.01)
Each genotype record represents a single individual or group of genetically similar individuals
NCBI Taxon ID
another genotype
A genotype is a collection of alleles
In general, only the mutant alleles are specified.
sequence variants can also be specified
a reference to a genetic feature
A genetic feature may be a gene (allele), transcript, SNP, etc
typed by SO
This is a partial model of genetic features - a full
model can be found in Chado
unique stable global identifier - this can be used to
reference more detailed info in a sequence/genetics db
optional: same as genotype.organism_id if not specified
[
SO type. "gene" for mutant allele. May also be some
kind of sequence_variant (eg deletion, P_insertion, SNP)
defaults to SO:gene (ie an allele)
]
a single genotype can be associated with multiple phenotypes
in different experiments and different environments
[
may be from genetic context ontology
e.g. dominant, codominant
]
A phenotype manifestation is an observation of a phenotype in one
or more individual organisms under experimental conditions, linked
with the genetic and environmental causes of that phenotype
Each manifestation can be identified, but this is optional.
Note that the ID is for the manifestation of the phenotype
in either a single individual organism, or in a population of
indistinguishable individuals in an experiment or trial
[
free text summary of the entire phenotype manifestation;
]
[
genotypes of the individual or individuals that
show the phenotype.
]
[
The collection of characteristics exhibited in this
manifestation
]
[
support or evidence; experimental details
]
[
there is an implicit 'normal' environment
if multiple environments are specified, the association
holds for ALL of them
]
a phenotype is a collection of characteristics possessed by
an organism or collection of organisms
(or in some cases, the environment of the organism, as a
result of that organism's actions - for example a termite's nest)
the phenotype arises as a result of the organism's genotype
and the environment of the organism
Examples of phenotypes are:
* "red eyes and notched wings"
* "reduced apoptosis and increased susceptibility to cancer"
* "Initially a single lateral is formed, but it is thicker than normal
and sometimes individual bundles of actin filaments can be seen. The
laterals split near the distal tip as they elongate and at later
stages they appear to be split over more of their length."
* "interrupted imaginal disc development and stunted wings at the adult stage"
The phenotype is the entire collection of characteristics exhibited
by that organism (or collection of effectively indistinguishable
organisms). Each organism has a single phenotype throughout
its lifespan - the characteristics that comprise that phenotype
may change; this is reflected in the phenotype_character records
each phenotype record represents a specific phenotype which
may be instantiated in one or more organisms. If two or more organisms have
indistinguishable phenotypes, the same
record may be used. If phenotypes are distinguishable then a
different record should be used.
[
An identifier for a phenotype term. If supplied, this would
come from a full
ontology of pre-coordinated phenotype terms such as MP or
the plant_trait ontology
]
A phenotype character is one or more qualities inhering in a single
bearer entity. For example, a PC with quality type "red" and bearer
type "eye" (written here for convenience as "red eye") represents a
particular instantiation of "red eye"/"red eyes" in a particular
organism or collection of like organisms
examples of phenotype character types:
red hair, low bone mass, square jaw, long thick bristle,
2cm long tail, spermatocyte devoid of asters, elevated blood pressure,
lacking wings, having an additional digit relative to a wild type individual,
having a brain size during embryonic development that is larger than
a typical individual of the same genus/order, short attention span, having
an abnormal shape, abnormal fusion between digits, lacking red spots,
abnormally low numbers of B-cells after infection, lacking circularity in
the cross-section of an arista lateral
A phenotype character record represents an instance of a characteristic
possessed by a single organism, or a collection of identical characteristics
possessed by like organisms.
[
free text summary of the phenotype character
]
[
the type of entity in which the qualities inhere
can be a process type or continuant type
]
[
the quality types (attributes, properties) that are the
hallmarks of this character
]
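As a rough illustration of the structure described above, a phenotype character can be pictured as one bearer type plus one or more quality types. The sketch below is in Python rather than the schema's own XML. The class name, field names, and the `EX:` identifiers are hypothetical placeholders, not names taken from the schema or from any real ontology.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PhenotypeCharacter:
    """Hypothetical in-memory view of one phenotype character record."""
    bearer_type_id: str                # type of entity in which the qualities inhere
    quality_type_ids: Tuple[str, ...]  # one or more quality types
    description: Optional[str] = None  # free text summary

    def __post_init__(self) -> None:
        # A character is "one or more qualities inhering in a single bearer",
        # so an empty quality list is rejected.
        if not self.quality_type_ids:
            raise ValueError("a phenotype character needs at least one quality")


# "red eye": quality type "red" inhering in bearer type "eye".
# EX:eye and EX:red are made-up IDs used only for this example.
red_eye = PhenotypeCharacter(
    bearer_type_id="EX:eye",
    quality_type_ids=("EX:red",),
    description="red eye",
)
```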
each 'bearer' record represents a particular bearer of some
general bearer type: for example, a particular wing of a
particular fly
entities can be spatial objects (3D e.g. anatomical parts)
or spatiotemporal entities (4D e.g. processes)
[
qualities can be borne by processes or continuants;
the bearer can thus be drawn from GO process, component,
or from an ontology of cells or larger anatomical entities.
]
aka attribute aka property
any one particular bearer can have multiple qualities inhering
in it. For example, a particular tail can have qualities such
as shape, length, etc
each quality inheres in a single bearer throughout the duration
of that entity's existence. For example, my height is a single
quality that inheres in my body throughout my life. This
quality can take on different states throughout the lifetime
of the bearer.
not required: it can be deduced from the quality type.
if provided, the value of this field is always an OBO-style identifier from PATO
may be useful for report purposes; eg the color of this entity is red
(where 'color' is the determinable)
certain kinds of quality implicitly or explicitly refer to
numbers. Examples of these are the qualities of "having
supernumerary parts" or the qualities of "lacking parts". It
can be useful to state the exact number here, rather than
overloading the ontology of qualities.
the number will refer to either the relative or absolute
quantity....?
This is purely optional, as it is usually sufficient to
state that a particular hand has supernumerary fingers,
without saying how many extra
[
quality type. comes from PATO
]
[
optional free text
]
Relational qualities inhere in additional entities.
This should be used for relational qualities
(e.g. sensitivity, distance_from)
and not for monadic entities.
other examples include:
"spermatocytes *devoid_of* asters", "thorax *lacks* wing",
"hand *has-supernumerary-parts* finger"
in these cases, the related_entity records the _type_ of part
that is lacking or present in lower or higher numbers
sometimes a state is only instantiated if a certain condition holds
[
for relative qualities we may want an explicit "yardstick"
]
[
not all phenotype characters can be specified in
qualitative terms using classes from an ontology of
quality states. Sometimes unit-based quantitative measurements
must be used - for example, weight in kg
rather than overloading the 'type' field we treat measurements
separately. Measurements complement quality space annotations;
a phenotype character can be both "shorter_than" wild-type
and also 2cm in length. Either or both of these can be
specified in the annotation
]
[
Qualities can change over time: a quality may instantiate 'hot' at one time and 'cold' at a later time.
we may know the exact time stage (eg larval to pupal), or we may have an open range (eg post-larval) or repeated
ranges.
multiple qualifiers can be specified (eg before t2 and after t1)
if no qualifiers are specified then there is no implied time - it is assumed to happen at some point
in the organism's lifetime, UNLESS it is constrained by the bearer type.
for example, if the bearer is "larva" then the attached state occurs during the larval stage. for this
to be inferred, the appropriate relations must be specified in the anatomical ontology
]
quality types can be absolutist (square, red) or
relative (high pressure, short height). relative
always implies a context. The default context is
always an equivalent phenotype character in a
typical individual of the same species. We can be
more explicit if we like by specifying in_comparison_to
Examples:
an allele which has more or less pronounced effects
in different genetic backgrounds;
A yellow eye which is red in wildtype;
if left blank, this is inferred from the quality type;
for example, if Q="high pressure", then the relation is
higher-than or increased-with-respect-to
Can be an ID of a particular phenotype manifestation, or
an ID for a whole taxon
[
What the quality type would be in wild-type, or
whatever the comparison target is. Can specify multiple
values if these are exhibited by the target.
for example, if color is red or pink in wildtype, then
we can specify these two here.
specifying target types can lead to redundancy; this
is only required when, for example, our canonical
anatomical ontologies are not specific w.r.t. quality types
]
[
What typical measurements would be in wild-type, or
whatever the comparison target is. Can specify multiple
values if these are exhibited by the target
]
for abnormal/normal
these should come from the PATO modifier terminology
: measurement :
examples:
length measurement (with ruler, in cm)
spot intensity
real-world measurements (excluding count measurements) are inexact
and are hence expressed as ranges.
the unit should be a reference to an SI unit
Q: do SI units have an identifier system,
or do we just use eg "mm"
Q: do we need the ability to specify composite units,
eg lb/kg - or is this handled by SI?
the only time this would be omitted is if we only have a range
Q: does the range reflect our lack of knowledge (eg accuracy
of measuring instrument) or an actual range of values?
or should we represent an actual range as multiple measurements?
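The measurement notes above can be sketched as a small Python record: a unit, an optional point value, and an optional range, with the value allowed to be absent only when a range is given. All names here are hypothetical, and two choices are assumptions of this sketch rather than anything the schema states: that a range with only one bound is acceptable, and that the unit is stored as a plain string such as "cm".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical measurement record: unit plus a value and/or a range."""
    unit: str                          # plain string here; the ID scheme is an open question
    value: Optional[float] = None      # may be omitted only if a range is given
    range_min: Optional[float] = None
    range_max: Optional[float] = None

    def has_range(self) -> bool:
        # Assumption: a single bound is enough to count as a range.
        return self.range_min is not None or self.range_max is not None

    def __post_init__(self) -> None:
        if self.value is None and not self.has_range():
            raise ValueError("a measurement needs a value, a range, or both")
        if (self.range_min is not None and self.range_max is not None
                and self.range_min > self.range_max):
            raise ValueError("range_min must not exceed range_max")


# A length measured with a ruler, and the same length known only as a range.
exact = Measurement(unit="cm", value=2.0)
ranged = Measurement(unit="cm", range_min=1.9, range_max=2.1)
```

This sketch does not settle whether a range stands for instrument accuracy or for an actual spread of values. It only records the bounds.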
TODO: reference to a particular experiment
evidence: could refer to papers, figures in papers, raw images, etc
The type of environment.
For now this is just a pre- or post-coordinated type reference. In future
we may want to add more under here
A reference to a type. Here type is used in the ontology sense of
a "kind" or "universal". A type can be referenced simply by giving a
term ID from an ontology. More complex types can be "post-coordinated"
Base/generic type. This is often all that is required.
the ID is OBO-compliant and can be from any ontology
[
We can post-compose references to more detailed types
for which no defined term exists
]
for cross-products
these are the characteristics that distinguish a specific type
from a general type
e.g. if we want to annotate to a bearer "scutum bristle" and the
anatomical ontology does not contain this term, only "scutum" and
"bristle". In this case the bearer type_id (ie the primary term) would
be the ID for "bristle". We would have one differentium "part_of scutum"
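The "scutum bristle" case can be sketched in Python as a base type plus a list of distinguishing relations. This is only an illustration of the idea of post-coordination. The class names, the `render` helper, and the `EX:` identifiers are hypothetical and do not come from the schema or from a real anatomical ontology.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Differentium:
    """One distinguishing characteristic: a relation to another type."""
    relation: str        # e.g. "part_of"
    target_type_id: str  # ID of the related type


@dataclass(frozen=True)
class TypeRef:
    """A reference to a type: a base term ID plus optional differentia."""
    type_id: str
    differentia: Tuple[Differentium, ...] = ()

    def is_post_coordinated(self) -> bool:
        # Pre-coordinated references carry only the base term ID.
        return bool(self.differentia)

    def render(self) -> str:
        # Human-readable form, for display only.
        if not self.differentia:
            return self.type_id
        clauses = " and ".join(
            f"{d.relation} {d.target_type_id}" for d in self.differentia
        )
        return f"{self.type_id} that {clauses}"


# The ontology has "bristle" and "scutum" but no "scutum bristle" term,
# so the primary term is "bristle" with one differentium "part_of scutum".
bristle = TypeRef(type_id="EX:bristle")
scutum_bristle = TypeRef(
    type_id="EX:bristle",
    differentia=(Differentium("part_of", "EX:scutum"),),
)
print(scutum_bristle.render())
```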
temporal_qualifier
examples:
during larval stage;
after pupal stage;
at 7pm on 2000/01/01;
overlapping M1-phase;
we do not tie ourselves to any one theory of time - time ranges and points are represented using time_range
e.g. before, after, during - would come from an ontology of temporal relations
this is optional. If not specified, treated as *overlap* relation
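A minimal Python sketch of the defaulting rule just described: when no temporal relation is given, the qualifier is read as an overlap. The class, field, and method names are hypothetical, and the time range is shown as a plain string for brevity rather than as a full time_range record.

```python
from dataclasses import dataclass
from typing import Optional

# Relation assumed when a qualifier does not name one.
DEFAULT_RELATION = "overlap"


@dataclass(frozen=True)
class TemporalQualifier:
    """Hypothetical temporal qualifier: an optional relation plus a time range."""
    time_range: str                 # simplified to a label for this sketch
    relation: Optional[str] = None  # e.g. "before", "after", "during"

    def effective_relation(self) -> str:
        # Fall back to the default only when no relation was supplied.
        return self.relation if self.relation is not None else DEFAULT_RELATION


during_larval = TemporalQualifier(time_range="larval stage", relation="during")
unspecified = TemporalQualifier(time_range="M1-phase")
```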
represents an interval or point in time
(we are neutral w.r.t points vs intervals)
: id : OPTIONAL
this is typically not used. You would use it if you wish to grant an identifier to
a particular time point or range. This would allow you to make partial-order statements;
e.g. t1 before t2
[
The interval may be typed by a term/class from an ontology of
developmental stages, processes or life cycles.
]
[
an OPTIONAL measurement of the time; eg in seconds, hours,
years, etc. Measured either from when the particular attribute
(and hence bearer) came into existence, or in absolute time.
]
an optionally-attributed textual description
e.g. PubMed ID