Elsevier

Information Systems

Volume 34, Issue 8, December 2009, Pages 792-806
Metric-based stochastic conceptual clustering for ontologies

https://doi.org/10.1016/j.is.2009.03.008

Abstract

A conceptual clustering framework is presented which can be applied to multi-relational knowledge bases storing resource annotations expressed in the standard languages for the Semantic Web. The framework adopts an effective and language-independent family of semi-distance measures defined for the space of individual resources. These measures are based on a finite number of dimensions corresponding to a committee of discriminating features represented by concept descriptions. The clustering algorithm expresses the possible clusterings in terms of strings of central elements (medoids, w.r.t. the given metric) of variable length. The method performs a stochastic search in the space of possible clusterings, exploiting a technique based on genetic programming. Moreover, the number of clusters is not necessarily required as a parameter: a natural number of clusters is determined autonomously, since the search spans a space of strings of different lengths. Experiments with real ontologies demonstrate the feasibility of the clustering method and its effectiveness in terms of standard validity indices. The framework is completed by a subsequent phase, in which a newly constructed intensional definition, expressed in the adopted concept language, can be assigned to each cluster. Finally, two possible extensions are proposed. One allows the induction of hierarchies of clusters. The other applies clustering to concept drift and novelty detection in the context of ontologies.

Section snippets

Conceptual clustering for the Semantic Web

Multi-relational learning methods have recently been devised for knowledge bases in the Semantic Web (henceforth SW), expressed in the standard representations. Indeed, the most burdensome maintenance tasks that enable SW applications, such as ontology construction, refinement and evolution, call for such automation. These tasks can be assisted by specific supervised [5], [30], [22], [3] or unsupervised learning methods [26], [16], [13].

In this work, we investigate on unsupervised

Preliminaries on the representation

In the following, we assume that resources, concepts and their relationship may be defined in terms of a generic ontology language that may be mapped to some DL language with the standard model-theoretic semantics (see the DLs handbook [1] for a thorough reference). Hence, the methods presented in the following apply to generic OWL-DL ontologies.

In the reference DL framework, a knowledge base K = ⟨T, R, A⟩ contains a TBox T, an RBox R and an ABox A. T is a set of concept definitions: C ≡ D, where C

A genetic programming approach to conceptual clustering

Many similarity-based clustering algorithms (see [23] for a survey) can be applied to semantically annotated resources, exploiting the measures discussed in the previous section. We focus on techniques based on stochastic methods that can also determine an optimal number of clusters, instead of requiring it as a parameter. However, the algorithm can easily be modified to exploit this information when it is available, which may dramatically reduce the search space.
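As a rough illustration of the idea, the sketch below encodes a clustering as a variable-length list of medoid indices and searches over such lists with a simple elitist evolutionary loop. It is a minimal sketch under stated assumptions, not the authors' algorithm: the feature vectors (one value per committee feature), the Minkowski-style form of `distance`, the `fitness` function with its penalty on the number of clusters, and the three mutation operators are all hypothetical choices made for this example.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector, p: int = 2) -> float:
    """Assumed semi-distance: Minkowski-style norm over per-feature values,
    divided by the number of features."""
    m = len(a)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p) / m


def cost(medoids: List[int], data: List[Vector]) -> float:
    """Sum of distances from every individual to its nearest medoid."""
    return sum(min(distance(x, data[m]) for m in medoids) for x in data)


def fitness(medoids: List[int], data: List[Vector]) -> float:
    """Hypothetical fitness: higher for compact clusterings with few medoids."""
    k = len(medoids)
    return 1.0 / ((k + 1) ** 0.5 * (1.0 + cost(medoids, data)))


def mutate(medoids: List[int], n: int, rng: random.Random) -> List[int]:
    """Copy the genome and try one insertion, deletion or replacement.
    The copy is returned unchanged when the chosen operation is not applicable."""
    child = list(medoids)
    outside = [i for i in range(n) if i not in child]
    op = rng.choice(["insert", "delete", "replace"])
    if op == "insert" and outside:
        child.append(rng.choice(outside))
    elif op == "delete" and len(child) > 1:
        child.remove(rng.choice(child))
    elif op == "replace" and outside:
        child[rng.randrange(len(child))] = rng.choice(outside)
    return child


def evolve(data: List[Vector], pop_size: int = 20,
           generations: int = 50, seed: int = 0) -> List[int]:
    """Elitist search over medoid lists of varying length."""
    rng = random.Random(seed)
    n = len(data)
    # Random initial genomes: each is a set of 1..n distinct medoid indices.
    population = [rng.sample(range(n), rng.randint(1, n)) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(g, n, rng) for g in population]
        ranked = sorted(population + offspring,
                        key=lambda g: fitness(g, data), reverse=True)
        population = ranked[:pop_size]  # keep the fittest genomes
    return max(population, key=lambda g: fitness(g, data))
```

Because genomes can grow and shrink, the number of clusters is an outcome of the search rather than an input. If the number of clusters were known in advance, the insertion and deletion operators could be dropped so that only fixed-length genomes are explored.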

Conceptual clustering requires also to

Evaluation

The clustering algorithm was evaluated experimentally on various knowledge bases selected from standard repositories. The option of randomly generating assertions for artificial individuals was discarded because it might have biased the procedure. Only populated ontologies were considered suitable for the experiments.

Extensions

Some natural extensions may be foreseen for the presented algorithm. One concerns upgrading the algorithm so that it outputs hierarchical clusterings level by level in order to produce (or reproduce) terminologies, possibly introducing new concepts elicited from the ontology population. Furthermore, this process may be mechanized with a method for detecting drifting concepts as distinct from novel emerging ones.

Related work

The unsupervised learning procedure presented in this paper is mainly based on two factors: the semantic dissimilarity measure and the clustering method. To the best of our knowledge, the literature contains very few examples of similar clustering algorithms working on complex representations that are suitable for knowledge bases of semantically annotated resources. Thus, in this section, we briefly discuss sources of inspiration for our procedure and some related approaches.

Concluding remarks

This work has presented a framework for stochastic conceptual clustering that can be applied to standard multi-relational representations adopted for knowledge bases in the SW context. It is intended for discovering interesting groupings of semantically annotated resources and can be applied to a wide range of concept languages. Moreover, new concepts may be induced from such clusters, which allows the clusters to be accounted for from an intensional viewpoint.

The method exploits a

Acknowledgments

The authors would like to thank the anonymous reviewers who provided suggestions for the improvement of the paper and for further investigations.

References (39)

  • R.E. Stepp et al., Conceptual clustering of structured objects: a goal-oriented approach, Artificial Intelligence (1986)
  • F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider (Eds.), The Description Logic Handbook, Cambridge...
  • J. Bezdek et al., Some new indexes of cluster validity, IEEE Transactions on Systems, Man, and Cybernetics (1998)
  • S. Bloehdorn, Y. Sure, Kernel methods for mining instance data in ontologies, in: K. Aberer, et al. (Eds.), Proceedings...
  • A. Borgida, T. Walsh, H. Hirsh, Towards measuring similarity in description logics, in: I. Horrocks, U. Sattler, F....
  • C. d’Amato, N. Fanizzi, F. Esposito, Reasoning by analogy in description logics through instance-based learning, in: G....
  • C. d’Amato, N. Fanizzi, F. Esposito, Query answering and ontology population: an inductive approach, in: S. Bechhofer,...
  • C. d’Amato, S. Staab, N. Fanizzi, On the influence of description logics ontologies on conceptual similarity, in: A....
  • C. d’Amato, S. Staab, N. Fanizzi, F. Esposito, Efficient discovery of services specified in description logics...
  • R. Duda et al., Pattern Classification (2001)
  • F. Esposito et al., Incremental learning and concept drift in INTHELEX, Journal of Intelligent Data Analysis (2004)
  • N. Fanizzi, C. d’Amato, F. Esposito, Randomized metric induction and evolutionary conceptual clustering for semantic...
  • N. Fanizzi et al., Approximate measures of semantic dissimilarity under uncertainty
  • N. Fanizzi, C. d’Amato, F. Esposito, Conceptual clustering for concept drift and novelty detection, in: S. Bechhofer,...
  • N. Fanizzi, C. d’Amato, F. Esposito, DL-Foil: concept learning in description logics, in: F. Zelezný, N. Lavrač (Eds.),...
  • N. Fanizzi et al., Evolutionary conceptual clustering based on induced pseudo-metrics, Semantic Web Information Systems (2008)
  • N. Fanizzi, L. Iannone, I. Palmisano, G. Semeraro, Concept formation in expressive description logics, in: J.-F....
  • A. Ghozeil, D. Fogel, Discovering patterns in spatial data using evolutionary programming, in: J. Koza, et al. (Eds.),...
  • M. Halkidi et al., On clustering validation techniques, Journal of Intelligent Information Systems (2001)