Metric-based stochastic conceptual clustering for ontologies
Conceptual clustering for the Semantic Web
Multi-relational learning methods have recently been devised for knowledge bases in the Semantic Web (henceforth SW), expressed in the standard representations. Indeed, the most burdensome maintenance tasks, such as ontology construction, refinement and evolution, which enable SW applications, call for such automation. These tasks can be assisted by specific supervised [5], [30], [22], [3] or unsupervised learning methods [26], [16], [13].
In this work, we investigate unsupervised
Preliminaries on the representation
In the following, we assume that resources, concepts and their relationships may be defined in terms of a generic ontology language that may be mapped to some DL language with the standard model-theoretic semantics (see the DLs handbook [1] for a thorough reference). Hence, the methods presented in the following apply to generic OWL-DL ontologies.
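In the reference DL framework, a knowledge base $\mathcal{K} = \langle \mathcal{T}, \mathcal{R}, \mathcal{A} \rangle$ contains a TBox $\mathcal{T}$, an RBox $\mathcal{R}$ and an ABox $\mathcal{A}$. $\mathcal{T}$ is a set of concept definitions: $C \equiv D$, where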
A genetic programming approach to conceptual clustering
Many similarity-based clustering algorithms (see [23] for a survey) can be applied to semantically annotated resources, exploiting the measures discussed in the previous section. We focus on techniques based on stochastic methods that can also determine an optimal number of clusters, instead of requiring it as a parameter. However, the algorithm can easily be modified to exploit this information, which may dramatically reduce the search space.
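The idea of a stochastic search that also selects the number of clusters can be illustrated with a minimal sketch. It is not the algorithm of this paper: the genome encoding (a variable-size set of medoid indices), the mutation operators, the `penalty` term in `fitness` and all function names are assumptions made for illustration, and the numeric dissimilarity in the usage lines is a stand-in for a semantic dissimilarity measure over individuals.

```python
import random
from typing import Callable, FrozenSet, List

Dissimilarity = Callable[[int, int], float]
Genome = FrozenSet[int]  # a set of medoid indices; its size is the cluster count


def fitness(genome: Genome, n: int, d: Dissimilarity, penalty: float) -> float:
    """Lower is better: dissimilarity of each individual to its nearest medoid,
    plus a fixed cost per cluster (an assumed, illustrative criterion)."""
    scatter = sum(min(d(i, m) for m in genome) for i in range(n))
    return scatter + penalty * len(genome)


def mutate(genome: Genome, n: int, rng: random.Random) -> Genome:
    """Randomly add, drop, or swap one medoid, so the genome size can change."""
    medoids = set(genome)
    outside = [i for i in range(n) if i not in medoids]
    op = rng.choice(["add", "drop", "swap"])
    if op == "drop" and len(medoids) > 1:
        medoids.remove(rng.choice(sorted(medoids)))
    elif op == "add" and outside:
        medoids.add(rng.choice(outside))
    elif op == "swap" and outside:
        medoids.remove(rng.choice(sorted(medoids)))
        medoids.add(rng.choice(outside))
    return frozenset(medoids)


def evolve(n: int, d: Dissimilarity, pop_size: int = 20, generations: int = 100,
           penalty: float = 1.0, seed: int = 0) -> List[int]:
    """Elitist evolutionary search over medoid sets of varying size."""
    rng = random.Random(seed)
    # Start from random medoid sets of random size (1..n).
    population = [frozenset(rng.sample(range(n), rng.randint(1, n)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(g, n, rng) for g in population]
        candidates = set(population) | set(offspring)
        # Keep the best distinct genomes; ties are broken by medoid indices.
        ranked = sorted(candidates,
                        key=lambda g: (fitness(g, n, d, penalty), sorted(g)))
        population = ranked[:pop_size]
    return sorted(population[0])


# Usage: six individuals on a line, with absolute difference as dissimilarity.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
medoids = evolve(len(points), lambda i, j: abs(points[i] - points[j]))
print(medoids)
```

Because each genome carries its own number of medoids, the search compares clusterings of different sizes directly. If the number of clusters is known in advance, restricting `mutate` to the swap operator would keep the genome size fixed and shrink the search space accordingly.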
Conceptual clustering requires also to
Evaluation
The clustering algorithm was evaluated experimentally on various knowledge bases selected from standard repositories. The option of randomly generating assertions for artificial individuals was discarded because it might have biased the procedure. Only populated ontologies were considered suitable for the experiments.
Extensions
Some natural extensions may be foreseen for the presented algorithm. One concerns upgrading the algorithm so that it outputs hierarchical clusterings level-wise, in order to produce (or reproduce) terminologies, possibly introducing new concepts elicited from the ontology population. Moreover, this process may be automated with a method for distinguishing drifting concepts from novel emerging ones.
Related work
The unsupervised learning procedure presented in this paper is mainly based on two factors: the semantic dissimilarity measure and the clustering method. To the best of our knowledge, the literature offers very few examples of similar clustering algorithms that work on complex representations suitable for knowledge bases of semantically annotated resources. Thus, in this section, we briefly discuss sources of inspiration for our procedure and some related approaches.
Concluding remarks
This work has presented a framework for stochastic conceptual clustering that can be applied to standard multi-relational representations adopted for knowledge bases in the SW context. It is intended for discovering interesting groupings of semantically annotated resources, and it can be applied to a wide range of concept languages. Moreover, new concepts may be induced from such clusters, which allows the clusters to be accounted for from an intensional viewpoint.
The method exploits a
Acknowledgments
The authors would like to thank the anonymous reviewers who provided suggestions for the improvement of the paper and for further investigations.
References (39)
- et al., Conceptual clustering of structured objects: a goal-oriented approach, Artificial Intelligence (1986)
- F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider (Eds.), The Description Logic Handbook, Cambridge...
- et al., Some new indexes of cluster validity, IEEE Transactions on Systems, Man, and Cybernetics (1998)
- S. Bloehdorn, Y. Sure, Kernel methods for mining instance data in ontologies, in: K. Aberer, et al. (Eds.), Proceedings...
- A. Borgida, T. Walsh, H. Hirsh, Towards measuring similarity in description logics, in: I. Horrocks, U. Sattler, F....
- C. d’Amato, N. Fanizzi, F. Esposito, Reasoning by analogy in description logics through instance-based learning, in: G....
- C. d’Amato, N. Fanizzi, F. Esposito, Query answering and ontology population: an inductive approach, in: S. Bechhofer,...
- C. d’Amato, S. Staab, N. Fanizzi, On the influence of description logics ontologies on conceptual similarity, in: A....
- C. d’Amato, S. Staab, N. Fanizzi, F. Esposito, Efficient discovery of services specified in description logics...
- et al., Pattern Classification (2001)