PATIKA Ontology
We have described an ontology [1] to model networks of cellular processes through integration of information on individual pathways. Our ontology is suitable for modeling incomplete information and abstractions of varying levels for complexity management. Furthermore, it facilitates concurrent modifications and extensions to existing data while maintaining its validity and consistency.
PATIKA Objects
Every first-class object in the PATIKA ontology is a PATIKA object, which describes the common functionality and information. A PATIKA object has an ID that, together with a version, uniquely identifies it; an author (the user who first created the object); and an experimental data source, which describes how the phenomenon was observed and points to the literature references. A data source can be:
Every PATIKA object also has a name and a description, which comply with external naming conventions and vocabularies (such as Human Genome Nomenclature [2]) whenever possible. Finally, every PATIKA object can optionally be associated with a set of GO terms [3].
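The fields listed above can be pictured as a small record type. The following is only a minimal sketch: the class name, field names, types, and sample values are assumptions made for illustration and are not part of the PATIKA software.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class PatikaObject:
    """Hypothetical record holding what every first-class object carries."""
    object_id: str      # identifier; unique together with the version
    version: int
    author: str         # the user who first created the object
    data_source: str    # how the phenomenon was observed, plus references
    name: str
    description: str = ""
    go_terms: FrozenSet[str] = field(default_factory=frozenset)  # optional


# All values below are made up for illustration.
p53 = PatikaObject(
    object_id="obj-1",
    version=1,
    author="example_user",
    data_source="placeholder data source",
    name="TP53",
)
print(p53.name, p53.version, len(p53.go_terms))
```

Making the record immutable is a design choice of this sketch; it fits the idea that a changed object is a new version rather than an in-place edit.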
Bioentities
Actors, especially macromolecules, often share a common path of synthesis and/or are chemically very similar. For example, a p53 protein may be in native, phosphorylated, and MDM2-bound forms. Another example is cytoplasmic and extracellular calcium. These molecules usually have different information contexts. It is possible to model all of them as separate entities; however, this is not practical, as these groupings are very natural and comply with the current biological paradigm. Moreover, there is a wealth of information at this level of detail, so we address entities at this level as well. It is therefore preferable to maintain such biological or chemical groupings as bioentities, while representing these "minor" changes in their information context with states.
In most genomic and proteomic databases such as GeneCards, SwissProt or GO, and in high-throughput data such as microarray and Y2H, bioentities form the unit entries. Each bioentity stores a set of external references mapping to these databases, and acts as a gateway to the external resources.
We hope to cover other entities, such as operons, in future versions of the ontology.
States
We model the actors of these events as states. The term is very generic and encapsulates macromolecules (e.g., DNAs, RNAs, and proteins), small molecules (e.g., ions, ATP, and lipids), or even physical actors (e.g., heat, radiation, and mechanical stress). States also represent molecular complexes, or conceptual abstractions that behave like a state. Depending on their nature, states are classified as either compound or simple.
Simple States
Simple states represent tangible and unit phenomena. They belong to a bioentity. Each state of a bioentity represents a change in the information context. Those changes are represented with the following bioentity variables:
Any combination of bioentity variables forms a unique state of this bioentity. Note that only a very small portion of the state space actually occurs in biological systems.
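To make the size of the state space concrete, here is a minimal sketch that enumerates every combination of variable values for one bioentity. The variable names and their value domains are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical variables and value domains for a single bioentity.
variables = {
    "phosphorylated": (False, True),
    "bound_to_mdm2": (False, True),
    "compartment": ("cytoplasm", "nucleus"),
}


def state_space(variables):
    """Return every combination of variable values as a frozenset of pairs."""
    names = sorted(variables)
    domains = (variables[name] for name in names)
    return [frozenset(zip(names, values)) for values in product(*domains)]


all_states = state_space(variables)

# Only some combinations are ever observed; this one is a made-up example.
observed = {
    frozenset({
        ("phosphorylated", True),
        ("bound_to_mdm2", False),
        ("compartment", "nucleus"),
    })
}

print(len(all_states))  # 8
print(len(observed))    # 1
```

The state space grows as the product of the domain sizes, which is why only the states that actually occur are worth storing.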
An important side note is that in pathway drawings it is common to represent different states as a single biological entity, even when the mechanistic detail is known. This is an oversimplification, as different states can have very different and sometimes conflicting effects. Mapping such information to PATIKA graphs might not be trivial, as in most cases the mechanistic detail is unknown. PATIKA allows defining relations at both the bioentity and the state level to address these levels of detail and abstraction.
In some cases, a bioentity's states are also labeled with various logical (as opposed to physical) tags, such as active form, open channel or resting state. It is best to capture such tags in the state's name, as they are context dependent.
Another point is that states can be ubiquitous, i.e. they can participate in a very large number of reactions. Most of the time this is true for small molecules such as ATP or water, which have generic and structural roles. Such states can be problematic for visualization and analysis. PATIKA allows labeling states as ubique, and handles them differently during visualization (e.g. it splits up cytosolic ATP for each reaction it participates in) and querying (e.g. it ignores ATP during a shortest path query).
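The effect of ignoring ubique states in a shortest path query can be sketched with a plain breadth-first search. This is an illustrative sketch, not PATIKA's query implementation; the function name, the adjacency-list format, and the toy graph are assumptions.

```python
from collections import deque


def shortest_path(graph, source, target, ubiques=frozenset()):
    """Breadth-first shortest path that never steps onto a ubique node."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents and neighbor not in ubiques:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # no path exists


# Toy graph: ATP links A to D directly, but only through a generic role.
graph = {
    "A": ["ATP", "B"],
    "ATP": ["D"],
    "B": ["C"],
    "C": ["D"],
}

print(shortest_path(graph, "A", "D"))                   # ['A', 'ATP', 'D']
print(shortest_path(graph, "A", "D", ubiques={"ATP"}))  # ['A', 'B', 'C', 'D']
```

Without the ubique label, the query reports a short path through ATP that carries little biological meaning; with it, the longer but more informative path is returned.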
States map to a class of molecules/entities rather than a single molecule, and this class need not be totally homogeneous. For example, in most cases it is not desirable to model the rotamers of a protein as different states: there are combinatorially many of them, they are very short lived (in the range of nanoseconds), and switching from one to another is almost instantaneous. However, the PATIKA ontology does not draw hard lines for the level of abstraction, as it readily provides a framework for modeling and representing multiple levels of detail. State variables can therefore be incomplete and overlapping. An example of an incomplete state variable is "phosphorylated p53". This representation poses a subtle problem: since we do not know at which site the p53 is phosphorylated, the relationship between this state and p53 phosphorylated at 153Arg is not clear. It might be that two authors are actually talking about different states, or that the latter is a non-proper subset of the former. A sensible approach is to delegate this issue to the submitter and to the expert, as it is really hard to come up with a context-free resolution rule. In the first case, the submitter must modify the phosphorylated p53 entry to bring it to the correct level of detail. In the second case, they must turn the phosphorylated p53 entry into an incomplete state to indicate the different levels of detail.
There are still many important issues to cover, the most important being combinatorial states, generics, polymerization, and semi-quantitative phosphorylations. We are constantly evaluating examples (use cases) from the biological literature in order to extend our ontology to these "hard" cases.
Compound States
A compound state is a grouping of other PATIKA objects, which exhibits a state-like behavior, and needs to be addressed at this level. There are two types of compound states, complex and abstraction.
In biological systems, molecules often form clusters to perform their tasks, behaving like a single state. We consider each member of a molecular complex as a new state of its biological entity. The function of a molecular complex is affected by the specific binding relations within it, so these binding relations must be represented in the model as well. Moreover, members of a molecular complex may independently participate in different transitions; thus one should be able to address each member individually. In addition, a molecular complex may contain members from multiple neighboring compartments. In that case, one of those compartments is always a member type compartment. It is actually possible to model complexes in a similar fashion to membrane-spanning proteins.
A complex state has a set of simple state members, which are called complex members.
Complex states do not have a bioentity, as they are not simple. However, their members have their own bioentities. This information is used for complexes as well, e.g. for querying.
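As a rough illustration of how member bioentities can serve a complex during querying, consider the sketch below. The class names, field names, and the example complex are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimpleState:
    name: str
    bioentity: str  # every simple state belongs to a bioentity


@dataclass(frozen=True)
class ComplexState:
    name: str
    members: Tuple[SimpleState, ...]  # the complex members

    def bioentities(self):
        """A complex has no bioentity of its own; it borrows its members'."""
        return {member.bioentity for member in self.members}


def complexes_containing(bioentity, complexes):
    """Query helper: complexes with at least one member of the bioentity."""
    return [c for c in complexes if bioentity in c.bioentities()]


# Each member is a new state of its bioentity (illustrative names).
p53_mdm2 = ComplexState(
    name="p53:MDM2",
    members=(
        SimpleState("p53 (MDM2-bound)", bioentity="p53"),
        SimpleState("MDM2 (p53-bound)", bioentity="MDM2"),
    ),
)

print(sorted(p53_mdm2.bioentities()))  # ['MDM2', 'p53']
print([c.name for c in complexes_containing("MDM2", [p53_mdm2])])  # ['p53:MDM2']
```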
An important question is "What is a complex, really? Do we model short-lived binding relations as complexes or as activation relationships?" PATIKA's answer is "never use a compound graph unless you need to", and this also applies to complexes. If an activation relation would do the trick, it is best to use it. If that level of detail is not sufficient for another user, they can re-edit it to add a complex at that point.
Transitions
A cell is not a static entity, and neither are its actors. Molecules in a cell are constantly synthesized, modified, transported and degraded to respond to changes in the environment or to accomplish a task. One can model such changes as quantitative chemical reactions. However, this would reduce the coverage of the model, as both molecular concentrations and rate constants are currently unknown for most of these reactions. It is often preferred to represent these changes qualitatively, since this better suits current experimental data.
A transition has a set of states as its substrates (inputs) and products (outputs). A transition occurs only when all of its substrates are present and its activation conditions, which are a function of certain other states, are satisfied. These other states are called the effectors of the transition. Two types of effector relations are defined, activator and inhibitor, for positive and negative regulation respectively. When a transition occurs, all of its products are generated. We take great care to make all PATIKA transitions compatible with the canonical biochemical paradigm.
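The qualitative firing rule can be sketched as follows. The ontology only says that activation is a function of the effector states; the specific rule used here (every activator present and no inhibitor present) is an assumption made for illustration, as are all class, method, and state names.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Transition:
    substrates: FrozenSet[str]
    products: FrozenSet[str]
    activators: FrozenSet[str] = frozenset()
    inhibitors: FrozenSet[str] = frozenset()

    def can_occur(self, present):
        """True when all substrates are present and activation is satisfied."""
        present = set(present)
        substrates_ok = self.substrates <= present
        # ASSUMED activation rule: all activators present, no inhibitor present.
        activation_ok = self.activators <= present and not (self.inhibitors & present)
        return substrates_ok and activation_ok

    def occur(self, present):
        """Return the states present afterwards; all products are generated.

        The model is qualitative, so substrates are not consumed here.
        """
        present = set(present)
        if not self.can_occur(present):
            return present
        return present | self.products


phosphorylation = Transition(
    substrates=frozenset({"p53"}),
    products=frozenset({"phospho-p53"}),
    activators=frozenset({"kinase"}),
    inhibitors=frozenset({"inhibitor"}),
)

print(phosphorylation.can_occur({"p53", "kinase"}))               # True
print(phosphorylation.can_occur({"p53"}))                         # False
print(phosphorylation.can_occur({"p53", "kinase", "inhibitor"}))  # False
```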
PATIKA uses a pragmatic approach for formally defining transitions: any event that changes one or more states to another set of states is a transition. This definition delegates the exact definition of a transition to the exact definition of a state and, as mentioned above, the level of modeling detail for PATIKA states is very flexible. It follows that the PATIKA ontology can model transitions at multiple levels, allowing high coverage without losing content. Two transitions are equal if they have the same set of substrates and products. This reveals two invariants for transitions:
Although transitions span a very large spectrum, we expect most of them to fall into certain classes. Those classes are captured by the PATIKA transition tree. Under certain circumstances, multiple transitions having the same state as a substrate may affect each other by depleting this common substrate. This happens when the equilibrium constant of one transition is much higher than those of the others. If such a difference occurs among the equilibrium constants of the transitions, we call the transition with the highest equilibrium constant exhaustive over the other transitions for the common substrate. Transitions whose equilibrium constants are of the same order, on the other hand, are said to be cooperative. Transitions that have one or more common substrates are exhaustive on each other through a depleting substrate. Which of these substrates is likely to deplete is up to the modeler.
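The distinction between exhaustive and cooperative transitions can be sketched as a comparison of equilibrium constants on a logarithmic scale. The ontology gives no numeric criterion for "much higher", so the one-order-of-magnitude threshold below, like the function and label names, is purely an assumption.

```python
import math


def relation(k_a, k_b, magnitude_gap=1.0):
    """Compare two transitions that share a substrate.

    k_a and k_b are their equilibrium constants. magnitude_gap is an ASSUMED
    threshold, in orders of magnitude, for "much higher".
    """
    difference = math.log10(k_a) - math.log10(k_b)
    if difference >= magnitude_gap:
        return "a exhaustive over b"
    if difference <= -magnitude_gap:
        return "b exhaustive over a"
    return "cooperative"


print(relation(1e6, 1e2))  # a exhaustive over b
print(relation(2.0, 5.0))  # cooperative
```

In practice the modeler decides which substrate is likely to deplete, so a threshold like this could at most act as a default.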
Two transitions are called inverses of each other iff one transition's product set is the other's substrate set and vice versa.
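Both the equality rule and the inverse relation reduce to comparisons of substrate and product sets. The following minimal sketch uses hypothetical type and function names.

```python
from typing import FrozenSet, NamedTuple


class Transition(NamedTuple):
    substrates: FrozenSet[str]
    products: FrozenSet[str]


def are_equal(a, b):
    """Equal iff both have the same substrate set and the same product set."""
    return a.substrates == b.substrates and a.products == b.products


def are_inverse(a, b):
    """Inverse iff each one's products are the other's substrates."""
    return a.products == b.substrates and a.substrates == b.products


# Illustrative state names.
phosphorylation = Transition(frozenset({"p53"}), frozenset({"phospho-p53"}))
dephosphorylation = Transition(frozenset({"phospho-p53"}), frozenset({"p53"}))
duplicate = Transition(frozenset({"p53"}), frozenset({"phospho-p53"}))

print(are_equal(phosphorylation, duplicate))            # True
print(are_inverse(phosphorylation, dephosphorylation))  # True
print(are_inverse(phosphorylation, duplicate))          # False
```

Effectors are deliberately absent from both comparisons, since the equality rule mentions only substrates and products.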
The following properties are missing from the current ontology; they were discussed at some point and left for future versions.
Transition Rules
The term transition logic covers a rather wide spectrum. In modeling transition logic, we can use Boolean predicates, linear equations, stochastic models, pi-calculus, etc. The PATIKA ontology assumes that the representation and equality of a transition are independent of its transition logic. We assume that the transition logic is represented in the transition rule, which is not an internal part of the ontology. Currently, the only way of associating a transition rule with a transition is via a custom user object.
Effector Combinations
Currently we assume that any combination of effectors can regulate a transition. This might not be the case: for example, two inhibitors may never be present together in the cell, or they may cancel each other out when both are present. One point of view is therefore to think of each effector combination as a separate transition. If it turns out that only a small number of all effector combinations are actually significant, a possible approach is to use the existing compound graph notion and include child nodes in the transition for all significant combination sets, so that they can be addressed separately.
Interactions
Relations between bioentities, states and transitions are described using interactions, which can be directed or undirected. Interactions are divided into two groups based on their level of detail.
Mechanistic Interactions
Mechanistic interactions define relations between states and transitions at the chemical level of detail. There are five types of them:
Bioentity Interactions
Bioentity interactions describe relations between bioentities but not states. They represent incomplete information, and always map to one or more mechanistic-level interactions, although the latter might not have been identified yet. There are six types of bioentity interactions:
Compartments
A significant number of transitions transport molecules between cellular compartments. Transitions that a state can participate in are strictly related to its compartment; thus a change in the compartment means a change in the state's information context. We choose to incorporate the state's compartment in the model.
As the compartments and their adjacencies are cell type dependent, compartmental structure should be modeled as part of the ontology.
Membranes pose an additional problem, since a molecule may not only be located completely inside the membrane but may also span one or both of its neighboring compartments. For membranes there are four types of sub-locations: the two sides of the membrane, inside the membrane, and spanning the membrane.
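The four membrane sub-locations map naturally onto an enumeration. This is a minimal sketch; the member names and the Location record are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MembraneSublocation(Enum):
    """The four sub-locations available within a membrane compartment."""
    FIRST_SIDE = "on the side facing the first neighboring compartment"
    SECOND_SIDE = "on the side facing the second neighboring compartment"
    INSIDE = "completely inside the membrane"
    SPANNING = "spanning the membrane"


@dataclass(frozen=True)
class Location:
    compartment: str
    sublocation: Optional[MembraneSublocation] = None  # used for membranes only


receptor_location = Location("plasma membrane", MembraneSublocation.SPANNING)
calcium_location = Location("cytoplasm")

print(len(MembraneSublocation))            # 4
print(receptor_location.sublocation.name)  # SPANNING
```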
Abstractions
The network of molecular interactions derived from current biological data is incomplete and complicated. The complete network of cellular events is clearly beyond human perception. Different levels of abstraction are necessary to analyze cellular processes effectively and to deal with complexity better.
Representing a cellular pathway as a single process, or grouping related processes under a certain cellular mechanism, enhances the comprehensibility of the network of events (Figure 3). Such mappings are already present and may also be valuable for querying. We model such groupings using regular abstractions. Regular abstractions can be arbitrarily nested and can intersect. However, they cannot be addressed directly, i.e. they have no incident edges.
Since the data on cellular processes is not complete, different levels of information may be available for certain events. When it is not known which state among a set of states constitutes the substrate, product or effector of a transition, or when the target transition of an effector is obscure, we may need to abstract these states (transitions) as a single state (transition) to represent the available information despite its incomplete nature. An edge defined on an incomplete state means that it is actually defined on at least one state inside, but the exact state is not known. Similar semantics apply to incomplete transitions.
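The "at least one member" reading of an edge on an incomplete state can be sketched as an expansion into candidate edges. The function names, the tuple layout, and the member states below are hypothetical.

```python
def candidate_edges(incomplete_members, role, transition):
    """Expand an edge on an incomplete state into one candidate per member.

    At least one candidate is real; which one is not known.
    """
    return [(member, role, transition) for member in sorted(incomplete_members)]


def is_consistent(incomplete_members, confirmed_state):
    """A newly confirmed state fits the abstraction only if it is a member."""
    return confirmed_state in incomplete_members


# "phosphorylated p53" abstracts over site-specific states (made-up sites).
phosphorylated_p53 = {"p53 phosphorylated at site A", "p53 phosphorylated at site B"}

edges = candidate_edges(phosphorylated_p53, "activator", "some transition")
print(len(edges))  # 2
print(is_consistent(phosphorylated_p53, "p53 phosphorylated at site A"))  # True
print(is_consistent(phosphorylated_p53, "native p53"))                    # False
```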
In biological systems, a gene is often duplicated in the course of its evolution, with the copies serving different functions. A special case occurs when this differentiation serves as a specialization of a generic mechanism. For example, when referring to the wnt gene, we actually mean nineteen similar genes in human [4]. These genes are all activated by different stimuli in different tissues and can lead to different responses, even though the signal processing mechanism is similar. Bhalla also describes common process motifs in signaling pathways, which are even more elementary operations that are reused throughout the network [5]. Our ontology supports the representation of such homologies using abstractions.
References
[1] E. Demir, O. Babur, U. Dogrusoz, A. Gursoy, A. Ayaz, G. Gulesir, G. Nisanci and R. Cetin-Atalay (2004) "An Ontology for Collaborative Construction and Analysis of Cellular Pathways", Bioinformatics, 20(3), 349-356.
[2] H.M. Wain, M.J. Lush, F. Ducluzeau, V.K. Khodiyar, S. Povey (2004) Genew: the Human Gene Nomenclature Database, 2004 updates. Nucleic Acids Res. 32 Database issue:D255-7. (PMID: 14681406)
[3] Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium (2000) Nature Genet. 25: 25-29.
[4] J. Miller (2001) The Wnts. Genome Biol., 3, reviews 3001.1-3001.15.
[5] U. Bhalla (2002) The chemical organization of signaling interactions, Bioinformatics, 18, 855-863.