Abstract:
Data integration, the task of providing a unified view over a set of data sources, is undoubtedly a major challenge for the knowledge graph community. Indeed, such flexible data structure allows to model the characteristics of source schemata, rich semantics for the global schema and the mappings between them. Yet, the design of such data integration systems still entails a manually arduous task. This becomes aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged semi-automatic and incremental data integration approach. By considering all tasks that compose the end-to-end data integration workflow (i.e., bootstrapping, schema matching, schema integration and generation of querying constructs, we are able to address them in a unified manner.We provide algorithms for each task, as well as theoretically prove the correctness of our approach and experimentally show its practical applicability.