Semantic Integration of Multidimensional Statistical Data: The CubeModeler Framework

Submitted by Charalampos Bratsas on 12/24/2025 - 15:38

Tracking #: 3995-5209

Authors:

Panagiotis Marios Filippidis

Euclid Keramopoulos

Rigas Kotsakis

Lazaros Ioannidis

Charalampos Bratsas

Responsible editor:

Harald Sack

Submission type:

Full Paper

Abstract:

Effective integration of heterogeneous statistical datasets remains a key challenge in semantic data publishing. Traditional approaches, ranging from ETL pipelines to OLAP and ontology-based solutions, often struggle with schema rigidity, limited reusability and complex transformation logic. This paper introduces a modeling-based integration approach that shifts the integration effort to the design of modular and reusable Data Structure Definitions (DSDs) within the RDF Data Cube framework. The method follows a clear sequence of modeling steps — including DSD construction, component and codelist definition, dataset description, semantic transformation and SPARQL querying — that support integration directly at the modeling stage. To operationalize this approach, we present CubeModeler, a lightweight semantic modeling environment that enables declarative integration through coded component hierarchies and facilitates dynamic querying via SPARQL over semantically aligned dimensions. Two real-world use cases, sports analytics and environmental measurements, demonstrate how the approach and its implementation in CubeModeler simplifies integration and querying across domains. A set of representative SPARQL queries illustrates its expressiveness in various contextual and temporal aggregations, while a comparative evaluation highlights its workflow simplicity, modular scalability and reusability for semantic multidimensional data integration.

Full PDF Version:

swj3995.pdf

Tags:

Reviewed

Decision/Status:

Major Revision

Solicited Reviews:

Review #1

Anonymous submitted on 14/Mar/2026

Suggestion:
Major Revision

Review Comment:

First of all, I would like to emphasize that the fundamental idea of investigating statistical data — particularly their representation using the RDF Data Cube model — and developing corresponding processes, tools, and architectures is highly welcome to me. Approaches that facilitate more efficient integration of heterogeneous statistical datasets, as well as their linkage with corresponding raw data, are of considerable relevance from both scientific and practical perspectives.

Within the paper, the authors introduce CubeModeler. This system is described as a collection of tools and services designed to support the process from data acquisition to the management of RDF Data Cube components. A particular emphasis is placed on ensuring consistency and connectivity within a data lifecycle that, however, is not described in detail. In this context, the authors address several dimensions of semantic data integration, including the RDF transformation of data, their semantic enrichment, and the adaptation or tailoring of existing datasets. To illustrate both the underlying challenges and possible solution approaches, the paper refers to several real-world examples.

Despite this promising premise, the paper still requires substantial revision in order to fully meet the standards expected of an excellent scientific publication.

One of the main aspects that also increased the effort required for the review concerns the organization of content within the individual chapters. Particularly in the introductory sections, the paper frequently shifts between conceptual descriptions, concrete implementation details, and illustrative examples. This mixture makes it more difficult to clearly understand the underlying architecture and methodology. For example, already in the first chapter (page 3), two RDF Data Cube concepts are mentioned that could instead be treated purely conceptually at this stage. The corresponding vocabulary is introduced only later, and therefore the prefixes used at this point are not yet easily understandable for the reader.
In addition, some passages appear unnecessarily verbose and partially redundant. Certain aspects are repeated in slightly different wording, although a more concise and precise formulation would be possible. Overall, the paper occasionally gives the impression of being in a relatively early stage of development. This is particularly noticeable in the inconsistent and sometimes delayed introduction of acronyms, which are later used in varying forms. While such inconsistencies can occur in collaboratively written publications, they should be harmonized before final submission.

From a conceptual perspective, several design decisions also remain unclear. For example, the paper uses a specific triple store. While this is entirely reasonable, it would be helpful for readers to understand whether CubeModeler provides generic interfaces — such as SPARQL-based components — that would allow the system to be deployed in alternative infrastructure environments.

Another point concerns the use of SKOS in the context of semantic transformation (Section 4.4). The motivation for this design decision is not entirely clear. In particular, it would be useful to explain why, for example, skos:prefLabel is preferred over alternatives such as rdfs:label.

From Chapter 5 onwards, several use cases are presented. However, parts of these examples are already anticipated in earlier chapters. A clearer structural separation would therefore be beneficial: the preceding chapters could focus more strongly on describing the model, architecture, and processes on a conceptual level, while Chapter 5 could then introduce concrete application domains.
In this chapter, several SPARQL queries are presented that are executed on an RDF Data Cube / RDF Graph created with CubeModeler. While this is appropriate, an introduction to the underlying generated data model would be very helpful. Without this context, it is difficult to understand how the presented queries are structurally derived. In addition, it remains unclear where these queries originate from. The URIs used in the examples to reference RDF Graphs (e.g., http://basketontology.org/) are currently not dereferenceable, which makes it difficult to reproduce or verify the examples.

The evaluation chapter primarily focuses on integration tasks during the modeling of Data Cubes, particularly the reuse of components, adaptability under schema drift, and the maintenance of semantic consistency in evolving datasets.
Section 6.1 evaluates the reuse of existing components in several scenarios. However, it is not entirely clear whether the term scenarios is appropriate here. From my understanding, the main objective is to create new RDF Data Cubes and assign existing components to the required Data Structure Definitions (DSDs). Apparently, the scenarios define differences between the concepts used, and the successful mapping between them is used as the success criterion. However, the complexity of this task is unfortunately not described in a sufficiently transparent manner. Furthermore, the systems CubeModeler, Karma, and RMLMapper are compared without providing short explanations, references, or version information for these tools.
Section 6.2 evaluates performance costs. For this purpose, a Windows client with 8 GB of RAM is used, and measurements are taken with the Windows Performance Monitor, apparently with a second run for comparison. This experimental setup appears somewhat unusual and is likely to provide only indicative results rather than reproducible ones. Moreover, it is not entirely clear whether the tools compared in Table 4 were evaluated under comparable conditions. According to my interpretation, CubeModeler uses datasets of up to approximately 20,000 rows and 25 columns. The RML Modeller appears to use similarly sized datasets (being faster but requiring more memory). However, the remaining tools listed in the comparison are evaluated using different and sometimes significantly larger datasets. The rationale behind this experimental design therefore remains unclear.
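
One way to make such measurements more repeatable is sketched below, under stated assumptions: the `transform` callable is a hypothetical stand-in for whichever tool's transformation step is being measured, and the warm-up and run counts are arbitrary. The sketch reports the median and spread over several timed runs rather than a single run plus one comparison run.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(transform: Callable[[], object], warmup: int = 2, runs: int = 10) -> Dict[str, float]:
    """Time `transform` repeatedly and summarise wall-clock durations in seconds."""
    for _ in range(warmup):
        transform()  # warm-up runs are discarded (caches, lazy initialisation)
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        transform()
        durations.append(time.perf_counter() - start)
    return {
        "runs": float(runs),
        "median_s": statistics.median(durations),
        "stdev_s": statistics.stdev(durations) if runs > 1 else 0.0,
        "min_s": min(durations),
        "max_s": max(durations),
    }


def dummy_transform() -> int:
    # Hypothetical workload; a real study would invoke the tool under test here.
    return sum(i * i for i in range(20_000))


if __name__ == "__main__":
    print(benchmark(dummy_transform))
```

Reporting the same summary for every tool, on the same input files and hardware, would make the rows of a table such as Table 4 directly comparable.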
The paper concludes with a short conclusion and an outlook on future work. In addition, the appendix provides further insights into the modeling workflow with CubeModeler.
In summary, the manuscript still contains a number of smaller issues which, taken together, significantly affect the readability and clarity of the paper. Should the paper be accepted, I would therefore recommend a substantial revision. The underlying idea of the proposed implementation is interesting and addresses a topic that is becoming increasingly important in the context of publishing statistical data.

Further remarks (not exhaustive):
* Page 3: The description of an “alignment mechanism based on RDF properties and SKOS codelists” remains too vague.
* It would be advisable to introduce a consistent acronym for RDF Data Cubes (e.g., “DC”) and use it consistently throughout the paper. Currently, the terminology alternates between “Data Cubes”, “RDF Data Cubes”, and later simply “cubes”.
* The same issue applies to the term “RDF Data Cube Vocabulary”, which is also used inconsistently (e.g., on pages 2, 6, 7, and 9).
* From page 3 onwards, the term Data Structure Definition (DSD) is introduced. After the first introduction, the acronym DSD should be used consistently (the full form "Data Structure Definition (DSD)" is repeated many times).
* Page 5: The LOD2 project could be cited (appropriate papers are available).
* Figures: It would be helpful to add a legend explaining the meaning of colors, dotted lines, and arrows (e.g., in Figure 1).
* Page 9: The statement “DSDs provide a template structure” should be clarified. The structure appears to relate to a set of interlinked qb:dataset instances rather than a single dataset.
* Page 10: Some explanations could be more precise, for example regarding measure properties and the statement that an RDF Data Cube may contain multiple measures.
* Also on page 10: The explanation of “slicing and dicing” appears somewhat simplified. The efficiency of such queries strongly depends on the underlying triple store implementation.
* Page 11: The term “component (dimension)” is misleading, since components in the Data Cube model may also represent measures or attributes.
* Page 11: The phrase “model narrower components” is somewhat unfortunate terminologically and could be clarified with respect to specialization versus abstraction.
* Page 11: The statement “in a Greek dataset the list of air stations is fixed” appears problematic. It does not necessarily hold for all Greek datasets and does not take the open-world assumption of semantic data models into account.
* Page 13: The phrase “it may damage the knowledge graph” is unclear, since the paper previously refers to RDF Data Cubes or RDF datasets. It should be clarified whether the authors refer to inconsistent modeling, reduced semantic expressivity, or another issue.
* It is not possible to access the GitLab repository.
* No datasets are provided.
* No evaluation results are provided in reproducible form.
* Demonstrators or links are not provided.

Review #2

By Benedikt Kämpgen submitted on 04/Apr/2026

Suggestion:
Major Revision

Review Comment:

The paper "Semantic Integration of Multidimensional Statistical Data: The CubeModeler Framework" addresses the question of how to more easily integrate heterogeneous statistical data, such as from the World Wide Web, e.g. in CSV format.

Its goal is to present a modelling tool and accompanying methodology for statistical data and to evaluate its broad applicability.

Its approaches are:
* the reuse and elaborate application of a standardised vocabulary for statistical data on the semantic web, the RDF Data Cube Vocabulary (QB).
* an implemented tool for modelling and integrating statistical data using the QB vocabulary when storing the RDF in a database (RDF triple store, with SPARQL support).
* a methodology of how to use the modelling tool
* two examples of using the tool in a use case (in sports analytics, and in environmental monitoring)
* an evaluation with two other approaches w.r.t. modelling effort (in clicks) and performance (in time for preprocessing/loading)

The conclusions of the paper are that its approach, despite some limitations and several interesting items of open work, is broadly applicable, easy to use and allows efficient preprocessing/loading.

There are several things I like about the paper:

The central question asked by the paper is both interesting and important. At a time when users pose questions more and more to chatbots and those chatbots need to find relevant information via search engines, databases and other tools, it is all the more relevant to provide harmonised access to as many statistical datasets as possible.

I personally like the RDF Data Cube vocabulary and find it useful and (all the more) promising to bridge the gap between (mostly internally built and used) data warehouses / data analytics tools and the (mostly used for communicating with external parties) semantic web. Therefore, I appreciate research and applications around it.

Also, the appropriate methods are used to address the question, i.e., a method to reuse RDF Data Cube vocabulary features such as declarative statements between dimension properties in hierarchies, an implementation and methodology, two practical applications in use cases, and a quantitative evaluation.

Also, the data from the evaluation does support the conclusion of a potentially useful and widely applicable tool.

Yet, I found the following major issues with the paper:

1) The paper lacks significance of results with respect to the approach. The superiority of the introduced method is not clearly explained. For instance:

* The paper stems from expertise in ontology mapping and knowledge graph construction; I like this viewpoint and am convinced it brings in some novelty. Still, in the current state of the work, this contribution is not made clear enough. For instance, the SDM-RDFizer and Morph-KGC are cited as having duplicate-aware operators; however, with statistical linked data this seems rather far off, and no example or explanation from the current work is given.

* One method is the use of rdfs:subPropertyOf between different dimension properties as a good balance between rigid and flexible modelling. Although this is an interesting idea, I do not see its benefit. If all datasets are modelled while loading them into the system, one can directly try to use the same modelling as the basis for integration. Similarly, yes, it is easier to match properties via rdfs:subPropertyOf instead of using owl:equivalentProperty or the same URIs since it is not as rigid, but still respective modelling efforts are needed. The same goes for code lists and slices/dices. Yes, you can more loosely relate different data cubes. But in the end, it will be more difficult to query those data cubes together.

* I was disappointed to read "Further data modeling challenges regarding data cubes exceed the scope of the current paper". How many more are there? Is this just a tiny bit of possible heterogeneities? From a journal paper, I expect some kind of "completeness" for some view on the problem/question at hand. I would at least list (and give names to) different challenges.

* The SPARQL queries are difficult to understand. For instance, for Listing 1, the description says "This query combines two types of cubes (games, player statistics) for different leagues for a specific player.", however, in the SPARQL query, I only find one data cube with ?statline its observations and without any hint that there are indeed several cubes (qb:DataSet or qb:DataStructureDefinitions) queried. Maybe I am missing some greater contribution as to querying the "unified view", but at the moment I mainly see the rather large effort in both modelling and querying. The use of "BIND" constructs in the query adds further complexity and is not explained. This does not fulfil the claim of "showing how complex yet intuitive querying across semantically aligned datasets can be performed".

* Yes queries could be filled in by a form with a simple name input. Yes, that is true. Still, coming to the query does not seem intuitive/easy to me (even as a person having worked with this kind of data a lot).

* "traditional integration would require flattening these datasets into a single schema or building complex mappings between each structure, something that can be fragile and time-consuming." Even though this statement is correct, it does not say much about the paper's work. The work lies between those two extremes. How the approach of the paper finds a better balance between those two extremes remains unclear.

* The term "component" is not used consistently throughout the paper which makes understanding difficult. From its first mention in the abstract "The method follows a clear sequence of modeling steps — including DSD construction, component and codelist definition, dataset description, semantic transformation and SPARQL querying.", over its mention in the introduction "This is also the case for statistical data, which often rely on metadata and contextual dimensions such as time, location, or measurement unit. These are core components of their structure, essential for interpreting and comparing values in semantic data integration." to its mention in "creating or reusing components and codelists".

* The limitations chapter discusses possible open work such as on query optimization. The possibility of "alternative integration approaches outperforming CubeModeler" is very nicely but rather (too) briefly described. Since there is a separate section on "Future Work", I would recommend separating those topics more clearly: a "discussions" chapter on the interpretation of the results with respect to the actual goals of the paper (such as w.r.t. alternative approaches), and an "open work/future work" section describing possible work beyond the scope of the paper.
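
To make the point above about querying loosely related cubes concrete, a minimal sketch follows. It assumes hypothetical `ex:` properties, cubes and observations (none are taken from the paper) and a store without RDFS inference. In SPARQL the same expansion would typically be written with a property path such as `rdfs:subPropertyOf*`.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

SUBPROP = "rdfs:subPropertyOf"

# Hypothetical data: two cubes use different dimension properties for "player",
# both declared as sub-properties of a shared ex:player dimension.
TRIPLES: List[Triple] = [
    ("ex:nbaPlayer", SUBPROP, "ex:player"),
    ("ex:euroPlayer", SUBPROP, "ex:player"),
    ("ex:obs1", "qb:dataSet", "ex:nbaCube"),
    ("ex:obs1", "ex:nbaPlayer", "ex:playerA"),
    ("ex:obs2", "qb:dataSet", "ex:euroCube"),
    ("ex:obs2", "ex:euroPlayer", "ex:playerA"),
    ("ex:obs3", "qb:dataSet", "ex:euroCube"),
    ("ex:obs3", "ex:euroPlayer", "ex:playerB"),
]


def sub_properties(triples: List[Triple], prop: str) -> Set[str]:
    """Return `prop` plus every property below it via rdfs:subPropertyOf (transitively)."""
    children: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == SUBPROP:
            children.setdefault(o, set()).add(s)
    found, stack = {prop}, [prop]
    while stack:
        for child in children.get(stack.pop(), set()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def observations_for(triples: List[Triple], dimension: str, value: str) -> Dict[str, str]:
    """Map each matching observation to its cube, expanding `dimension` to its sub-properties."""
    props = sub_properties(triples, dimension)
    cube_of = {s: o for s, p, o in triples if p == "qb:dataSet"}
    return {s: cube_of[s] for s, p, o in triples if p in props and o == value and s in cube_of}


if __name__ == "__main__":
    print(observations_for(TRIPLES, "ex:player", "ex:playerA"))
```

Every cross-cube query has to carry this expansion step, or rely on reasoning in the store, which is the additional querying effort referred to above.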

2) The paper lacks significance of results with respect to the evaluation. The chosen and measured quality metrics are poorly chosen, and a reproducibility of evaluation is not given. For instance:

* In the evaluation, the usage of Karma and RMLMapper is not sufficiently explained for understandability and reproducibility. Both tools/methods could have been explained and related to the current work in more detail in the related work section. Instead, they are very superficially introduced, there. At the least, this could have been added to the appendix.

* I appreciate the elaborate evaluation method used for comparing workflow simplicity, model reusability and semantic flexibility between CubeModeler, Karma and RMLMapper. However, the times for different tasks seem rather arbitrary to me. Similarly, the Reuse Ratio does not seem helpful to me; how come reuse differs between CubeModeler, Karma, and RMLMapper if they have the same expressivity? Also, Precision/Recall does not seem to me a good metric for integration scenarios. At least, I have so far not seen them used. In general, it would have helped the evaluation if (peer-reviewed) papers had been cited that use similar evaluation methods.

* In the evaluation, do the scenarios increase in complexity? If so, it would be good if a column in Table 1 showed that. Only then will it make sense to speak of a trend over complexity w.r.t. reuse ratio between different approaches in Figure 12.

* It is good to have performance evaluations included in the paper. However, in my opinion, the performance of integration/loading is much less interesting than the performance of querying the data. For one thing, the preprocessing and transformation may show more the focus/willingness of the respective developers (of each tool) regarding proper software engineering, and may show less the appropriateness of the method/approach. For another, in applications, the focus is much more on query performance than ETL performance. By the way, the loading of different sizes of data could have been shown as a line chart to show some trend, e.g., of linear or exponential growth. ETL and query performance may be related since more complex integration/loading may lead to more (or less) complex querying. However, the paper does not discuss this. The paper discusses some preliminary query performance tests only.

* The supplementary material (SWJ_CubeModeler.zip) only includes the LaTeX source of the paper. The actual source code is only available upon request and may be published in Q1 2026. I think this should be clarified before publication.
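
For reference, the metrics questioned above are usually computed against a gold standard along the following lines. This is a sketch under assumptions: the mapping pairs are hypothetical, and the definition of the Reuse Ratio as reused components divided by all components in a DSD is a guess, since the definition used in the paper is not reproduced here.

```python
from typing import Set, Tuple

Mapping = Tuple[str, str]  # (source column, target component); names are hypothetical


def precision_recall(predicted: Set[Mapping], gold: Set[Mapping]) -> Tuple[float, float]:
    """Precision and recall of predicted mappings against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def reuse_ratio(reused: int, total: int) -> float:
    """Assumed definition: share of DSD components taken from existing definitions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return reused / total


if __name__ == "__main__":
    gold = {("team", "ex:team"), ("season", "ex:season"),
            ("points", "ex:points"), ("player", "ex:player")}
    predicted = {("team", "ex:team"), ("season", "ex:season"), ("points", "ex:score")}
    print(precision_recall(predicted, gold))  # two of three predictions are in the gold set
    print(reuse_ratio(reused=3, total=4))
```

Stating the definitions and the gold standard this explicitly in the paper would let readers judge whether the metrics are meaningful for the integration scenarios.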

3) And the quality of writing/presentation can be improved, most importantly with a common thread throughout the paper.

* Introduction and Related Work build a lot of suspense, e.g., "increasing volume and variety of statistical datasets published in decentralized ways highlights the need for a consistent and streamlined data integration process", or "alternative modeling-based approach to data integration is explored, shifting the integration process to the modeling stage". I recommend to be more clear about the contribution of the paper.

* The chapter "Dynamic evolution" mixes results and discussions of results, which I find confusing. By which number is the statement "near minimal-effort scalability" confirmed? This may also be improved by more descriptive chapter titles, e.g., instead of "Dynamic evolution" maybe "Applicability of CubeModeler with increasing number of datasets". Similarly, having one chapter called "Modular Scalability and Integration" and one "Performance Overview" shows potential for more descriptive naming with a common thread throughout the paper.

* In general, I think the evaluation (effort in modelling, expressivity in modelling, performance of translation) could have been brought more into alignment with the research question / problem in the introduction. For me, they came rather "out of the blue" and were not systematically led to. This is also related to my comment of "building suspense" in the introduction without fulfilling it in later chapters; instead I would have let the readers know earlier and more clearly what will be done in the paper.

* For instance, it may help to define a "persona" in the introduction or use cases who has the investigated problem of integrating various statistical datasets (e.g., for themselves or their group/department). This person may or may not also be the user querying the data.

* Figure 1 mixes data flow and knowledge graph which is difficult to understand. Do the colors carry any specific meaning? If so, please explain, if not, consider leaving out the colors.

* Figure 2 again mixes different meanings of arrows. It is not clear what the visualisation should explain. Maybe add a descriptive text to the figure as to make clear its message, otherwise consider leaving it out.

In summary, with some novelty about solving the problem of statistical data integration, the paper shows potential for being published at SWJ.
Yet, given its current significance of results and quality of presentation, I can only recommend a "major revision" of the paper.

Ideally, the paper contains:

* A well defined set of possible heterogeneities between tabular datasets with statistics.
* A well defined set of methods to deal with those heterogeneities with as little effort as possible, preferably by adding declarative knowledge and having a clean translation process from tables to RDF data cubes.
* A well defined set of queries with placeholders as a blueprint for querying the unified view of data cubes.
* An application of the approach to one or more clear use cases, from the administrator's view managing the unified view to the end user having information needs fulfilled.
* An evaluation based on an (open source) implementation applied to the two use cases, with all resources made openly available for reproducibility.

For acceptance, much less will be required, but - in my opinion - the paper should clearly go into that direction.
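
The third item in the list above, queries with placeholders, could look roughly like the following sketch. The blueprint, the `ex:`-style IRIs passed in, and the helper names are hypothetical; only the `qb:` terms come from the RDF Data Cube Vocabulary. The placeholders are validated before substitution so that user input cannot alter the query structure.

```python
from string import Template

# Hypothetical blueprint: $dimension and $value are filled in per request.
# SPARQL variables use "?" so they do not clash with Template's "$" syntax.
QUERY_BLUEPRINT = Template("""\
PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?dataset ?obs WHERE {
  ?obs a qb:Observation ;
       qb:dataSet ?dataset ;
       ?dim <$value> .
  ?dim rdfs:subPropertyOf* <$dimension> .
}
""")

# Characters that may not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = set('<>"{}|^`\\')


def check_iri(iri: str) -> str:
    """Reject values that could break out of the <...> delimiters."""
    if not iri or any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in iri):
        raise ValueError(f"not a safe IRI: {iri!r}")
    return iri


def build_query(dimension_iri: str, value_iri: str) -> str:
    """Fill the blueprint for one shared dimension and one dimension value."""
    return QUERY_BLUEPRINT.substitute(
        dimension=check_iri(dimension_iri),
        value=check_iri(value_iri),
    )


if __name__ == "__main__":
    print(build_query("http://example.org/dim/player", "http://example.org/player/A"))
```

A small catalogue of such blueprints, one per typical information need, would document how the unified view is meant to be queried without requiring end users to write SPARQL.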

Minor comments:

* I like how the paper introduces slices as a means for better querying.

* Figures 3 and 4 are nice, and I see some meaning in the coloring. Figure 5 is nice, too.

* What if datasets are modelled in QB vocabulary already? Can we simply import them?

* I found a few spelling errors, e.g. "cases are prsented".

* Some sentences / paragraphs are too informal for my taste, e.g., "alignment isn’t obvious, CubeModeler offers a “smart slice mapping” feature that automatically detects shared values across slice-level components to generate grouped slices. The algorithm ensures each slice includes all observations with matching values in the relevant columns" or "if a semantic transformation task for the same DSD has to take place later with another dataset, it can leverage the respective configuration workflow and automatically all settings will be imported and adjusted directly to the mapping interface. In this way the user can simply click the execute button to proceed to the RDF transformation and have the final RDF file in a blink of an eye." or "Finally, the dataset is totally transformed to RDF in TTL format".

* With respect to cited papers: Several references are missing their publication year. Many cited papers are rather old. There are no previous workshop or conference publications on the topic by the authors (some previous work on that topic at ESWC or ISWC workshops / conferences by the authors would give some credibility). A lot of papers about ontology matching and knowledge graph construction are cited, and the authors show high knowledge of the field; about the topic of statistical linked data, this is less the case. Also, I noted that several similar papers by the same authors were cited multiple times (e.g., by Huang W, and by Chaves-Fraga D); this could be better balanced.

* I would suggest to consider the following papers for your work as related work or foundation. As a co-author of the papers, I am biased. Still, I think they fit nicely with your research goal of easy integration to a unified view based on the QB vocabulary.

Bischof, S., Harth, A., Kämpgen, B., Polleres, A., & Schneider, P. (2018). Enriching integrated statistical open city data by combining equational knowledge and missing value imputation. Journal of Web Semantics, 48, 22–47. https://doi.org/10.1016/j.websem.2017.09.003

For instance, the paper defines a unified view as the basis for querying, and describes methods to convert measures from one unit to another, and to predict missing values.

Kämpgen, B., Stadtmüller, S., & Harth, A. (2014). Querying the Global Cube: Integration of Multidimensional Datasets from the Web. EKAW 2014, 250–265. https://doi.org/10.1007/978-3-319-13704-9_20

For instance, the paper defines a unified view (Global Cube) and describes different integration challenges such as different dimensions, different dimension names, different levels of detail, and different units of measurement.
It describes a method to convert measures (with declarative descriptions, from one unit to another or one or more measurements to another) and to merge two data cubes.

Kämpgen, B., & Harth, A. (2011). Transforming Statistical Linked Data for Use in OLAP Systems. I-Semantics 2011, (Mdm). http://www.aifb.kit.edu/web/Inproceedings3211

For instance, the paper includes reasoning over equivalent dimensions (based on owl:sameAs) to integrate one or more cubes.

* The cover letter by the authors contains information that is better placed directly in the paper, e.g., the "Summary of the manuscript’s key contributions" and the description of supplementary files in a GitLab repository.

* As the presented tool "CubeModeler" seems very mature, maybe consider publishing it in the category "Reports on tools and systems".

Review #3

Anonymous submitted on 11/May/2026

Suggestion:
Major Revision

Review Comment:

The paper presents CubeModeler, a data integration framework based on the W3C RDF Data Cube standard. The framework follows a five-step workflow: (i) defining a Data Structure Definition (DSD), (ii) creating or reusing components and codelists, (iii) describing dataset-level metadata, (iv) performing semantic transformation (“RDFizing”) of tabular data, and (v) enabling integrated access through SPARQL queries.

The main contribution is the proposal of an integration workflow centered on RDF Data Cubes and the provision of tool support through the CubeModeler platform. The approach supports integrated-by-design data models through shared vocabularies, focusing on tabular sources, and supports direct mappings with no value transformations, such as merging or splitting cell values.

(+) practical applicability of the approach for statistical data
(+) a good overview of related work and gap identification
(+) An initial demo website of CubeModeler is available for evaluation

(-) The lack of description of the architectural design and components of the CubeModeler framework
(-) The usage flow of the approach is not clearly explained in the main part of the paper
(-) unclear reasoning behind the evaluation setup; no source code for the evaluation setup available
(-) The information about the work scope is not clear from the text, e.g., target users, type of source data for integration
(-) Example data is only available in a private repo and reviewers can only access them if they share their contact details - which is not desirable for me

While the framework is well motivated, the conceptual foundation is not entirely novel. At a high level, the proposed integration strategy resembles the Hybrid OBDI approach described by Wache et al. (2001), which integrates heterogeneous data sources through a shared vocabulary. Similarly, the first three steps of the workflow largely reflect standard practices for modeling RDF Data Cubes through DSDs. The more interesting aspects of the paper are the KG population (“RDFizing”) process, which is based on Data Cubes definition, and the CubeModeler tool itself, both of which represent valuable and practical contributions to the community.

The CubeModeler tool is currently available as an online demo (cubemodeler.com), and the authors state that an open-source release is planned for Q1 2026. However, the current demo lacks documentation and explanatory material, making it difficult to understand how the framework operates in practice. Although the paper includes several screenshots of the graphical interface, the actual workflow and internal mechanisms only become clear in the appendix. Since these technical details are central to understanding the contribution, the authors should consider moving them into the main body of the paper rather than deferring them to supplementary material.

Another weakness is the lack of a clear architectural overview of the CubeModeler framework. Figure 6 outlines the general integration workflow, but this process is fairly generic and applicable to any use case that uses RDF Data Cubes. The paper would benefit from a clearer presentation of the framework’s architecture, components, and internal interactions, particularly to clarify how CubeModeler differs from existing approaches.

The evaluation section also raises several concerns. CubeModeler is compared against Karma and RMLMapper (Section 6.1), but the rationale for selecting these baselines is not discussed. In another section (Section 6.2), CubeModeler is compared with a different set of tools (cf. Table 4), but the evaluation setup is not described clearly enough, e.g., for replication by reviewers.

Furthermore, the paper does not clearly identify its target users—whether the system is intended for lay users, domain experts, or knowledge engineers. Likewise, the motivation for focusing on performance-oriented evaluation rather than user-centered evaluation is not adequately justified. If performance is the main focus, the experimental setup should be described in greater detail to ensure reproducibility.

The paper additionally suffers from issues of length and focus. At nearly 40 pages excluding references and appendices, the manuscript is considerably longer than necessary given the amount of content presented. Some sections, such as the background discussion on RDF Data Cubes, could be shortened and referenced to the official Data Cubes specifications. Condensing the paper to approximately 25 pages would improve readability and help readers focus on the actual contributions.

Overall, the paper demonstrates promise, particularly through the RDFization workflow and the CubeModeler tool. However, substantial revisions are necessary to provide readers with a clearer architectural overview, a more accessible explanation of the framework internals, and a stronger evaluation design. In addition, the relationship between the proposed approach, Hybrid OBDI, and the standard use of DSDs in RDF Data Cubes should be clarified, as should the target users for the framework. Finally, condensing the paper to the suggested length (approximately 25 pages) would improve readability and accessibility for reviewers and readers alike.
