2010-09-02

An inarticulate account of ontologies

This post was inspired by An Introduction to Ontology: from Aristotle to the Universal Core, a training course of eight lectures delivered by the notable ontologist Barry Smith, which is also high-quality web content.

We share the world and we can also share our descriptions of this world. If we want to share our descriptions of the world with computers, these descriptions might take the form of ontologies. We use natural languages to communicate our descriptions of the world, and we use ontological languages to express a formalized conceptualization of it. There are different maps of the structure of reality and ontologies are just one kind of them. Ontologies can be seen as windows through which we look at the universals in reality.

When two descriptions of the world share the same language, they can be combined and integrated. Different levels of integration are possible, depending on what the descriptions have in common. The first level is sharing the data model, which means that the structure of the descriptions matches. The next level is sharing the conceptualization of the part of the world the descriptions are about. The third level is sharing the concepts from that conceptualization.
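The three levels can be sketched in code. This is only one possible reading of them, not something taken from the lectures: descriptions are modelled as sets of (subject, predicate, object) triples, a "conceptualization" is approximated by a vocabulary prefix, and the prefixes `bib` and `art`, the sample data, and the function names are all made up for illustration.

```python
# Assumption: each description is a set of (subject, predicate, object) triples,
# and terms of the form "prefix:name" come from a vocabulary named by the prefix.

def concepts(description):
    """Vocabulary terms used as predicates or objects in a description."""
    return {term
            for _, predicate, obj in description
            for term in (predicate, obj)
            if ":" in term}

def vocabularies(description):
    """Prefixes of the vocabularies a description draws its terms from."""
    return {term.split(":", 1)[0] for term in concepts(description)}

def integration_level(a, b):
    """Return 1, 2 or 3 depending on how much two descriptions share."""
    level = 1  # both are triple sets, so the data model is shared by construction
    if vocabularies(a) & vocabularies(b):
        level = 2  # both draw on the same vocabulary (conceptualization)
        if concepts(a) & concepts(b):
            level = 3  # both actually use some of the same concepts
    return level

# Hypothetical sample descriptions.
library = {("book1", "bib:author", "Aristotle"), ("book1", "bib:type", "bib:Book")}
catalogue = {("item7", "bib:author", "Kant")}
archive = {("doc9", "bib:publisher", "Unknown")}
museum = {("vase3", "art:creator", "Unknown")}

print(integration_level(library, museum))     # shared structure only
print(integration_level(library, archive))    # shared vocabulary, no shared term
print(integration_level(library, catalogue))  # shared concept bib:author
```

The higher the level, the less mapping work is needed before the two descriptions can be merged.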

The necessary condition for collaboration in science is sharing a way of describing the world. In the medieval era of scholasticism, Latin was established as the language of science, a controlled vocabulary that every researcher of the time used.

To create a language in which scientists can speak to each other we need ontologists. The first ontologist was probably Aristotle, as he proposed a standard classification of the human knowledge available at that time. Carl Linné was by this standard also an influential ontologist because he made a taxonomy of plants and animals that was used extensively in the following years. On the other hand, Immanuel Kant was a reverse-ontologist because of his claim that the structure of language is the key to the structure of reality. He in effect substituted one particular description of reality (a natural language) for the natural granularity present in reality itself.

The need for a shared ontology may be more obvious in natural sciences such as physics or biomedicine than in the social sciences and humanities. The natural sciences deal with the physical world, so ontologies for this domain must present conceptualizations of it.

The physical world has holes in it. Places are essentially holes, which is important because we can be in (occupy) such holes. Also, physical reality has a natural granularity, which permits only certain ways of partitioning it. These partitions are shared social constructs that form multiple transparent layers stacked on each other, expressing different levels of granularity.

Discrepancies can arise when you try to partition a continuous category, such as colour. For such categories there is no natural granularity, so their conceptualizations must be seen as purely arbitrary human constructs. However, most categories are not like that, and for them some conceptualizations can be shown to be better than others. This is the domain of scientific research.

Scientific research can also be seen as a process of obtaining finer-grained partitions of reality. When we encounter a fringe instance of a category, we try to find a new conceptualization with a higher granularity that can explain the instance better. For example, an ostrich is a fringe instance of the category of birds because it cannot fly, so we try to find an explanation for this instance belonging to this category on a finer-grained level, say on the level of DNA.

Likewise, we might see the evolution of science as a convergence to a set of shared ontologies. The converging aspect is important because it enables us to compare different sources. Once two information sources use the same ontology they become comparable. This also implies that the sources can be integrated. For example, using the same concepts from an ontology of standard units of measure, say kilograms, means that two measurements expressed in kilograms can be compared. Having comparable science and comparable research findings is essential for the further progress of science. Barry Smith, the ontologist whose lecture series inspired this post, proposed principles for comparable science:

  1. Scientific theories must be common resources that cannot be bought or sold.
  2. They must be intelligible to a human being.
  3. They must use open publishing venues.
  4. They must constantly evolve to reflect results of scientific experiments, which means they must be evidence-based.
  5. They must be synchronized by a common system of units and common mathematical theories that are built by adults.
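The point about a common system of units, and the kilogram example before the list, can be illustrated with a minimal sketch. The unit identifiers below are hypothetical stand-ins for concepts from a shared ontology of units (the `example.org` URLs are placeholders, not a real vocabulary), and the class and function names are invented for this example.

```python
from dataclasses import dataclass

# Hypothetical identifiers standing in for concepts from a shared units ontology.
KILOGRAM = "http://example.org/units#kilogram"
POUND = "http://example.org/units#pound"

@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str  # identifier of the unit concept the value is expressed in

def comparable(a, b):
    """Two measurements are comparable when they refer to the same unit concept."""
    return a.unit == b.unit

def heavier(a, b):
    """Return the larger measurement, refusing to compare across unit concepts."""
    if not comparable(a, b):
        raise ValueError("measurements use different unit concepts")
    return a if a.value >= b.value else b

sample_a = Measurement(2.5, KILOGRAM)
sample_b = Measurement(4.0, KILOGRAM)
sample_c = Measurement(4.0, POUND)

print(comparable(sample_a, sample_b))  # both reference the kilogram concept
print(comparable(sample_a, sample_c))  # different concepts, so not directly comparable
```

Comparing `sample_a` with `sample_c` would first require a mapping between the two unit concepts, which is exactly the kind of knowledge a shared ontology is meant to provide.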

Barry Smith expresses his concern for shared ways of describing the world when he says that scientists should not be free to take existing terms and give them new meaning. In fact, he stands in opposition to the linked data initiative, which favors building order and shared understanding from the bottom up, when he says:

"The attitudes of Tim Berners-Lee, which are in favour of freedom and anarchy, and creativity, and all those nice things, mitigate against the coordination which is necessary to make good scientific ontology work - in a way good science works."

The linked data publishing model depends on the availability of lightweight ontologies. However, Barry Smith advocates a more scientific approach to developing an ontology, one that is based on the best scientific theories available at the time. The benefit of this approach is that domain experts can then help you validate your ontology. Feedback from the community of users is an important requirement in the development of an ontology and, as Barry Smith says, ontology is not something people should do alone, without public supervision.

In my opinion, this is a difficult question to resolve, and it is unclear whether we can converge to a shared description of the world from the bottom up or whether we need to get to one description by a centralized effort based on the current state of science. The success of the Web proved that decentralized structures can work very efficiently; however, there's no proof yet that we can decentralize our descriptions of the world (ontologies) the same way we did with our data.
