Abstract:
Effective integration of heterogeneous statistical datasets remains a key challenge in semantic data publishing. Traditional approaches, ranging from ETL pipelines to OLAP and ontology-based solutions, often struggle with schema rigidity, limited reusability and complex transformation logic. This paper introduces a modeling-based integration approach that shifts the integration effort to the design of modular and reusable Data Structure Definitions (DSDs) within the RDF Data Cube framework. The method follows a clear sequence of modeling steps (including DSD construction, component and codelist definition, dataset description, semantic transformation and SPARQL querying) that support integration directly at the modeling stage. To operationalize this approach, we present CubeModeler, a lightweight semantic modeling environment that enables declarative integration through coded component hierarchies and facilitates dynamic querying via SPARQL over semantically aligned dimensions. Two real-world use cases, sports analytics and environmental measurements, demonstrate how the approach and its implementation in CubeModeler simplify integration and querying across domains. A set of representative SPARQL queries illustrates the approach's expressiveness across contextual and temporal aggregations, while a comparative evaluation highlights its workflow simplicity, modular scalability and reusability for semantic multidimensional data integration.