Abstract
Nowadays, because data are so widely distributed, it is frequently impossible to generate a unified representation of many heterogeneous data sources in a single step. Dividing the integration process into smaller subtasks and executing them in parallel can solve this problem. Unfortunately, this approach makes it difficult to initially classify the data sources into groups that can be integrated independently and then serve as input for the final integration step. The problem becomes even more complicated when the system must integrate not only raw data but also more expressive heterogeneous knowledge representations, such as ontologies. In our previous work [10] we proved, both analytically and experimentally, that such an approach to the integration task can increase its effectiveness in terms of the time required to obtain the final result. In this article we explore the issue of selecting initial classes of ontologies based on the novel notion of knowledge increase. This indicator can be computed before the integration and, moreover, answers the question of whether the integration is viable. The indicator not only simplifies the initial distribution of the aforementioned subtasks, but can also serve as a stopping condition during subsequent steps of the integration.
