JenTab: Bridging Tabular Data and Knowledge Graphs – A Detailed System Overview

Tracking #: 3743-4957

Authors: 
Nora Abdelmageed
Sirko Schindler
Birgitta Koenig-Ries

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Full Paper
Abstract: 
Semantic Table Annotation (STA) stands as a crucial process in the realm of data interpretation and knowledge extraction, especially within the context of big data and the Semantic Web. Tables, ubiquitous across diverse domains from scientific literature to business reports, contain a wealth of structured information waiting to be unveiled. However, this information remains largely untapped in the absence of effective methods for semantic annotation. In essence, STA enriches raw tables with semantic metadata such as entities, classes, and relations obtained from Knowledge Graphs (KGs). It bridges the gap between unstructured data and structured knowledge representation, enabling sophisticated data analytics, information retrieval, and decision-making processes. It unlocks the potential of tabular data in the era of data-driven decision-making. However, automating this semantic annotation, particularly for noisy tabular data, remains a formidable challenge. In this paper, we give a detailed overview of our STA system, JenTab. We developed and tested JenTab under the umbrella of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) 2020-2022. However, we extended the evaluation of JenTab beyond the scope of these challenges. JenTab is a core system for STA; it ranked among the top-3 participating systems throughout its years of development. In addition, we present a detailed evaluation of its individual components, an extensive discussion of JenTab and its limitations, and a demonstration of the system's configuration, execution, and output.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Ioannis Dasoulas submitted on 18/Dec/2024
Suggestion:
Minor Revision
Review Comment:

This paper provides a detailed overview of JenTab, an STA system, presenting an evaluation of its accuracy, the way that it functions, and its limitations. It is a well-written paper that explains in detail related work, the system’s architecture and the different strategies it employs. JenTab is evaluated over many different datasets from the SemTab competition throughout the years, demonstrating its strengths and weaknesses, and summarising the system’s progress. What I didn’t find very clear were some of the claims made by the authors in the system architecture and audit results, which I explain below. Also, I missed a general discussion about the authors’ opinion on the current state of STA and its applicability in real-world scenarios. It is not entirely original, as a lot of the results have been presented in previous papers; however, in general, I think this is a good paper that provides a system overview of JenTab and showcases the strength of semantic web community systems. Thus, I believe it should be accepted after minor revisions.

Detailed comments:
Related work section: Previous system architectures are explained in detail. I would expect some further reference to modern deep-learning-based solutions, such as HYTREL and DODUO, which have shown very good results, as well as general-purpose LLMs like GPT. I believe these are important to look into and discuss their promise, even if they are not explained as thoroughly as the existing related work architectures.

JenTab toolkit section: The discrete components are explained in detail. I appreciated the figures and the examples. Some comments:
- It is not clear to me how subject cells are identified. Is the subject column considered known for JenTab or is it identified?
- ‘represent the same characteristic of the corresponding tuples’: This is not clear to me
- The Filter stage seems reasonable, as it seemingly rejects some candidates, boosting the speed of the system. However, I was wondering whether the authors have estimated the candidate coverage before and after filtering candidates? Is it possible that some valid candidates are filtered out? In other words, does filtering introduce a speed-accuracy trade-off? It is mentioned later that ‘sometimes’ this indeed happens, but this is somewhat vague. Have the authors tested JenTab without filtering?
- The default pipeline ‘results from experimentation with available input tables’. I understand that it can change, depending on the nature of tables. Still, in this form it is a bit hard to follow. Maybe it can be generalised further with some more high-level steps. Still, I appreciated the authors’ explanation for different variations.

Evaluation section:
- All tested datasets have been created with ground-truth labels from Wikidata or DBpedia in mind. It would be interesting to have experiments or at least a discussion afterwards about JenTab’s performance for tables outside the SemTab competition or datasets whose data do not originate from the JenTab target KGs. What do the authors expect? This is an important question for the future of STA, in general.
- CEA Creation: It is not clear to me whether all strategies are always employed or the authors manually select which strategies should be employed for each dataset. If it’s the former, how is ‘most used strategy’ calculated? By looking at candidate coverage? If it’s the latter, there should be a discussion regarding the generalisation of the approach.
- Similarly for CEA Selection. It is not clear what ‘solves cases’ means. Does that mean that the authors use one strategy after the other and calculate its accuracy? For example, for the Hard Tables, how do the authors know that String Similarity solves all cases? Does that mean perfect candidate coverage or perfect accuracy (i.e., the top candidate is the correct one)? In general, I believe the Audit Results can be clarified a bit more. In theory, a strategy may be employed more by the system, but still not have the best results.
- tFood: I recall that the TorchicTab system reported good results for tFood at the SemTab 2023 competition. These could be added in the evaluation or at least get mentioned.

Discussion section:
- As previously mentioned, I would additionally expect a discussion about the authors’ opinion on STA systems’ applicability after their years of working in this field. Yes, the reported systems produce good results for SemTab datasets. What about real-world datasets? BiodivTab is still annotated with general-purpose KG labels; this is not exactly a real-world scenario. Is STA, and specifically JenTab, ready to be applied in real applications? Is STA as a field mature enough for this? If not, why? What are the limitations of the field and what are its strengths in today’s world?
- I would also expect a small discussion about deep-learning systems (since they are not included in the evaluations section). What is the authors’ opinion? Do they show the same promise as heuristic-based systems like JenTab?

Minor comments:
Section 2: ‘(Figure 2c’ —> ‘(Figure 2c)’
Section 5:
- ‘CEA Creation Figure 10 shows how much each strategy across general domain benchmarks.’: Wrong syntax
- “We hosted added”: Wrong syntax

Review #2
Anonymous submitted on 14/Feb/2025
Suggestion:
Reject
Review Comment:

Summary:
The paper gives an overview of the JenTab semantic table annotation tool. It compares its different modules and pipeline configurations with some existing state of the art. The paper covers all previous work regarding JenTab and showcases how the authors iteratively improved their system over the years. Evaluations were performed on well-known benchmark datasets from the SemTab competition, hosted over multiple years at the ISWC conference.

Overall comment:
The paper provides some insights on how to build a semantic table annotation system and what the challenges are to make this an efficient and well-performing system. Still, the paper reads to me as yet another semantic table annotation system. The authors mention their main strengths as: being open-source and receiving an “Artifact Availability” badge. I think the authors should think about how they would position their work compared to the state of the art and how they would like to support their advantages with well-motivated empirical results. At this point, I don’t see the novelty of this work compared to the current state of the art. The paper also lacks in clarity and is at some points hard to read or follow. The related work misses some of the more recent works. Within the current experimental setup it is still sometimes unclear which configurations were tested and evaluated. The time results are not compared against the current state of the art, which makes it hard to position JenTab in this perspective. The discussion section does not reflect on JenTab with regard to the related work. The discussion also does not give the reader insights into why we should use JenTab in the first place and what the competitive strengths of this work are. I advise the authors to go through the more detailed comments below.

Detailed comments:
- Introduction P1: “STA transforms raw data tables into rich sources of knowledge that machines can comprehend and analyze” —> This is somewhat vague to me. What are the reasons why we want to do STA? I think the authors should incorporate the claims of improved data integration (e.g. standardising the datasets over a given taxonomy) and data validation or consistency checks in the end.
- Introduction P1: “enabling advanced data analytics, information retrieval, and decision-making processes.” —> What do I have to understand about: "an advanced decision making process"? Do you have examples where STA was used in this perspective? These claims should ideally be supported with citations.
- Introduction P2: “discovery across heterogeneous sources” —> I don't understand the claim that the authors make on how STA facilitates knowledge discovery across heterogeneous data sources. To me, a table is not directly an example of a heterogeneous data source. Maybe a definition of what the authors expect regarding this is necessary here. Heterogeneous data types can also refer to images, time series data, video, which I typically don't see in a table.
- Introduction P2: “Moreover, STA plays a crucial role in various applications, including information extraction from documents, database augmentation, and semantic search.” —> This is not supported with proper references. As a reader I want to see some references regarding these applications.
- Introduction P2: “(naming according to [2]): Cell Entity Annotation (CEA) matches cells to individuals, whereas Column Type Annotation (CTA) does the same for columns and classes. Furthermore, Column Property Annotation (CPA) captures the relationship between pairs of columns.” —> I think STA requires a better definition and description of what it is about. There are also more categories (such as providing annotations to whole tables) as stated by the authors later in the paper. At this point, in the introduction, it is unclear from the text why the authors only focused on these categories here.
- Introduction P2: JenTab is being introduced but I have no idea why we needed JenTab in the first place, what the current gaps are within STA and what JenTab brings to this field. It is claimed that JenTab can handle large KGs, but was this an issue of the other, already existing systems?
- Introduction P2: What is the goal of this paper? Is the goal that you compared different modules of the CFS steps to investigate which ones are the most efficient? If this is the case, I would make it more clear within the introduction.
- Introduction P2: “We developed and tested JenTab during the participation in the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)2 challenge 2020-20223. JenTab received a second place prize (Usability Track) of the SemTab challenge sponsored by IBM Research4 during ISWC 2021 [5]. In addition, JenTab was awarded the Artifacts Availability Badge by SemTab 2022” —> Is this relevant information for within the introduction?
- Introduction P2: Might be a better idea to provide the contributions of this work thoroughly without referring to other papers for more information. I understand that the authors combined some of their earlier work together and use this journal paper to evaluate, relate and discuss their work, but then, as a reader, I want all relevant information directly.
- Introduction P2: First contribution —> I found this weird to state as a contribution for a journal publication, as I expect an overview of the state of the art should always be the case in a paper.
- Introduction P2: Third contribution —> I do not understand from the introduction why a new benchmark was needed. Are there current gaps within the currently available datasets that are relevant? In the remainder of the paper, this contribution is only partly mentioned and never thoroughly discussed again.
- Background P3: For me, def 2.2 is just the formal definition of a graph. It might be better to provide the definition of a knowledge graph in the context of STA, referring also to the underlying schema/ontology.
- Background P3: “A table could be just layout or be encapsulating a certain amount of information [23]. The former is used for visualisation” —> I don't understand the purpose and meaning of this sentence. Can you provide an example? Do layout tables have no information at all?
- Background P3: “We express a genuine table in two dimensions” —> I do not understand this classification. Is it for example possible to have a matrix inner-relational table with vertical orientation?
- Background P3: “considering both horizontal and vertical headers” —> It is the first time that the authors mention that tables can have a header but in the context of semtab, this would require some more attention.
- Background P3: STA tasks —> In the introduction 1) you speak about CEA, CTA, CPA while here Cell to instance etc. and 2) in total you have 5 tasks but in the remainder of the paper you ignore 2 of them completely? There is no reason provided why JenTab only focuses on CEA, CTA and CPA in the end.
- Background P4: “Finally, Topic Detection (TD) classifies the entire table to a topic. Wikipedia articles are a perfect source of the expected output from this task. In the given example, https://en.wikipedia.org/wiki/Country would be a solution.” —> Where is the "semantic" part in this particular task? This is not linked to a KG ever?
- Background P4: I think SemTab still ran in 2024, but I understand the authors had to make some cutoff dates. Just make this more clear.
- Background P4: “So far, SemTab, used Wikidata, DBPedia, and Schema.org as target KGs” —> Later on in this paper, in your Table 1, the authors provided YAGO as a possible KG for one of the currently available systems. Nobody has a system for Schema.org, or was it not evaluated? So why is it provided here?
- Related work P5: The current related work misses LLM-based solutions, which for some tasks were already available in 2023, such as:
Dasoulas, Ioannis, et al. "TorchicTab: Semantic Table Annotation with Wikidata and Language Models." CEUR Workshop Proceedings. CEUR Workshop Proceedings, 2023.
Korini, Keti, and Christian Bizer. "Column type annotation using chatgpt." arXiv preprint arXiv:2306.00745 (2023).
Huynh, Viet-Phi, Yoan Chabot, and Raphaël Troncy. "Towards Generative Semantic Table Interpretation." VLDB Workshops. 2023.
There are also newer LLM-based systems resulting from SemTab 2024 which need to be incorporated within this overview.
- Related work P5: I'm also missing all the current limitations of the existing systems. Sometimes you mention these issues (like for Magic) but not for all systems. Again, it should be somehow clear what the strengths of JenTab are compared to related works.
- Related work P6: “They supported Wikidata as a target KG” MTab —> Could you clarify or state more explicitly why MTab dropped their DBpedia support? Because the original 2019 version did support DBpedia as a KG.
- Related work P6: “CSV2KG [32] addresses the three tasks of STA” —> Three or two? Because I don't see a description of the CPA task in this provided text (which I think does exist).
- Related work P8: “All these features are weighted through a machine learning framework.” —> Might be worth it to provide more details about the ML framework as this is the reason why it is different from the heuristic tf-idf systems.
- Related work P9: Deep Learning Techniques —> I would like to see the specific deep learning aspects of all these methods (hence the reason why you provided them in this category)
- JenTab toolkit P10: 4.1 Architecture —> I don't think the JenTab toolkit section should start with the architecture. I think it might be better to build it up from the functional building blocks and working towards the architecture. More concrete: starting with the explanation of what the Solver does (which is probably the main component of JenTab). It might be a good idea to also explain what they see as “the actual pipeline” in this perspective.
- JenTab toolkit P10: 4.1 Architecture —> The architecture makes a large number of assumptions for which no clear implementation section is provided. It looks like this is implemented as a microservice architecture and can therefore benefit from scaling and modular decoupling, but no evidence is provided (at least in the paper). It would be a good idea if the authors provide more details in a separate implementation section after the architecture section.
- JenTab toolkit P10: “Figure 4 illustrates the most recent system design of JenTab. For the previous architecture, please refer to [13].” —> I would state this differently. It might be better to showcase that this was an evolving process and refer to the previous versions
- JenTab toolkit P10: “The Manager’s dashboard contains information on the current state of the overall system, i.e., processed and still queued tables.” —> Would be a good idea to provide the reader here already with a visual of the dashboard (now it is in the discussion).
- JenTab toolkit P10: “A Runner coordinates the processing of a single table at a time through a series of calls to different services.” —> It is not directly clear from this text if these services are within the runner itself or shared across the runners. I assume this is the case due to the box around the runner and solver in Figure 4?
- JenTab toolkit P10: “First, caches for computationally expensive tasks or external dependencies increase the overall system performance.” —> This should be clarified more. Cache hits work when multiple similar requests are made. This means that you expect that multiple requests will be made when annotating a large amount of tables. Why should we expect this is the case for STA?
- JenTab toolkit P10: “when the target KG is to be substituted, all necessary changes, like adjusting SPARQL queries, are concentrated within just two locations: the corresponding lookup and endpoint services.” —> From an architectural point of view, I can't find the necessary information on how the authors realised this. Did the authors use some kind of listener paradigm such that new proxies can be registered to the solvers? Or is it more hardcoded within the "pipeline" configuration?
- JenTab toolkit P11: “2. We use a regular expression to split up terms that are missing spaces like in 1stGlobal Opinion Leader’sSummit into 1st Global Opinion Leader’s Summit” —> This is vague. The authors might be better provide the regular expression here or if it is too long within an appendix for completeness.
- JenTab toolkit P11: “The result of these steps is stored as a cell’s clean value” —> At this point, it is not clear where the authors store these clean values
- JenTab toolkit P11: “We deprecated the autocorrect step in 2021 due to its unstable behavior.” —> Did the authors compare alternatives to autocorrect? Such as the commonly used textblob for spelling correction, translations etc.? Because now an important step is, in my opinion, being deprecated due to the malfunctioning of an external library.
- JenTab toolkit P11: Datatype prediction —> Can this be extended to other datatypes as well or is this fixed. How generic is this module?
- JenTab toolkit P11: “We determine the primitive datatype of each column.” —> But how? How do you decide if something is a String or a Date? Is it based on casting towards that datatype property and checking if it fails or not? What about columns which have e.g. one spelling mistake in one cell such that it can be cast to a number? How do you handle these cases in JenTab? This should be clarified within the text.
- JenTab toolkit P11: “We extracted the unique values from all tables of a given dataset and matched those against labels of the respective KG using an optimized Jaro-Winkler Similarity implementation based” —> So to state it differently: you stored the corresponding similar textual descriptions between the different values in the dataset and all labels within the KG upfront? And this similarity is calculated through Jaro-Winkler? This now looks like a real fixed static approach. What about new information becoming available in KG or new unseen tables? Why not other, newer alternatives here, like using text embeddings?
- JenTab toolkit P12: “We converted all the given cells into the embedding space using fasttext [60] to avoid the out-of-vocab (OOV) problem.” —> But this is only the cell entity and not the context? I do not fully understand what fasttext did here because word embeddings consider the context in the sentence to create the embedding space. Again, if you created the embedding space, what about new unseen entities? The whole embedding space has to be recreated to encounter these new tables. This seems like a real specific fix for this particular case, but what about more long-lasting solutions?
- JenTab toolkit P12: Disambiguation contexts —> Is this something novel? Does e.g. mtab not also provide such disambiguation patterns?
- JenTab toolkit P12: “It is based on the premise that all cells within a column represent the same characteristic of the corresponding tuples.” —> Isn't this the purpose of a column in a table? Can the authors provide examples when this is not the case?
- JenTab toolkit P13: “Further to query the KG, we rely on the official SPARQL endpoint of Wikidata” —> So this implies JenTab is only applicable to Wikidata?
- JenTab toolkit P13: Generic Strategy —> This is the precomputed option right? If so, it might be a good idea to refer to the generic lookup
- JenTab toolkit P13: “We iteratively lower the selection threshold from 1 (exact matches) to 0.9 until a set of candidates is returned.” —> Why this option? Why can't JenTab query all the candidates above the 0.9 threshold and order them according to their similarity measure?
- JenTab toolkit P13: “All Tokens Strategy splits a cleaned cell value into tokens removing all stopwords. The lookup service is then queried with all possible orderings of these tokens.” —> How is this done? This requires some more information because there exist a large amount of tokenization strategies nowadays.
- JenTab toolkit P13: “Autocorrection Strategy uses the autocorrected value from the preprocessing to query the lookup service.” —> The authors previously mentioned that this was a deprecated option but at this point I’m confused whether it is or not.
- JenTab toolkit P13: “for the subject column” —> The authors did not state the cases when there is no subject column?
- JenTab toolkit P13: “we retrieve entities from the KG that are instances of a subject column candidate” —> How did you determine these column candidates? Is it a CTA task that determines the column types first? Or is it provided directly in the benchmarks? If this is the case, this introduces a certain requirement for JenTab in order to work correctly.
- JenTab toolkit P14: “it will fetch all entities from the Knowledge Base (KB) that are directly connected to a subject cell candidate.” —> So it is reversed in the sense that JenTab queries the properties of the subject candidate and look for a match? What string distance function was used here?
- JenTab toolkit P14: “Multi Hops creates a more general tree of parents following subclass of (wdt:P279) relations.” —> Where did JenTab stop to build the tree? What was the root node?
- JenTab toolkit P15: CEA by string distance —> I don't understand the filter aspect here as many create functions also compare the labels using e.g. Levenshtein distances. What is the difference here?
- JenTab toolkit P16: “For the remaining candidates, we calculate the string distance to the original cell value using the Levenshtein distance” —> Like, again? Because in create and filter, this distance was also calculated to A) first come up with a set of candidates, B) filter them even more to a smaller set, and now C) reduce the set to 1? From the current pipeline it is not clear why the second step (filtering) is needed here. I guess it is due to the iterative refinement of other cells that you keep these candidates and filter them accordingly. But this is not clear in the current and previous sections of the paper.
- JenTab toolkit P16: “Next, we compute the support for all candidates similar to the respective filtering-function” —> How is this support calculated? Can you update the figure to state why these numbers are 24? Is this due to the fact that you take into account the parent classes in the candidates? Is it not that you first enrich the possible candidates with their parents?
- JenTab toolkit P16: “We remove all candidates with support less than the maximum” —> Maximum is here determined as the number of cells within a column? As this can be problematic if one cell has completely different column type candidates…
- JenTab toolkit P16: “it only considers the direct connections of an entity” —> Why? What is the reason behind this decision?
- JenTab toolkit P17: Default pipeline —> In my opinion, this should be at the beginning of the JenTab Toolkit section where you explain your pipeline. Do not provide details of the functions within the blocks but just represent them generically as groups 1, 2, 3, ..., 9. Each block has its functionality and contributes to the end goal. In subsequent sections you can provide details about the different inner mechanisms of these groups based on create, filter and select functions. This is for me the core of the JenTab system.
- JenTab toolkit P17: Group 9 —> Why do you create candidates for CPA in group 6 as they are never followed by a select/filter operation in 7, 8 or 9?
- JenTab toolkit P17: Other pipelines —> Just provide schematic figures of these pipelines. This will let the reader more easily see the differences with the full pipeline.
- JenTab toolkit P17: “In particular, each step runs only once” —> What do the authors mean with each step is only run once? As I see, CTA support is run multiple times… In which group is it run within the essential pipeline? Which parts are neglected here?
- JenTab toolkit P17: “It became necessary initially as some tables proved too demanding when executed using pipeline_full” —> What is too demanding in this context? Large amount of cells? Columns?
- JenTab toolkit P18: “Indeed, such method suits datasets that contain meaningful header.” —> Which relies on a header detection module for the table? Or is it based on additional information (datasets containing only tables with headers?)
- Evaluation P19: “we start with assessing the preprocessing during the “Type Prediction” step” —> I need some more information for the Type Prediction step. Do I have to see this as an ML model or as a heuristic?
- Evaluation P20: —> In an evaluation and results section, I expect a description of the results and how they were obtained. I expect the interpretation of the results in a discussion section where the authors can provide a more subjective opinion.
- Evaluation P20: “This lookup represents our primary and only source to fix spelling mistakes since 2021.” —> So does this sentence indicate that you reran this code for the 2020 datasets or not? Or are these still the autocorrect results?
- Evaluation P20: Table 2 —> Why are there no results for BioTables, BiodivTab and tFood here? This is also not explained in the discussion.
- Evaluation P20: Audit results —> The authors should give more information about what the readers should understand from these results and how these results were obtained, what we can see in the plots and how we should interpret them. Why, e.g., a log scale was used.
- Evaluation P20: “all benchmarks except HardTables where the “String Similarity” managed to solve all cases” —> And what about the tFood dataset? It is hard to follow and understand the obtained results in this section.
- Evaluation P22: Result tables —> What are we seeing in these tables? Is it the JenTab configuration at that particular year or a full rerun of the best possible configuration? Why are there no comparisons between JenTab versions and configurations? How can we as readers know which version/configuration is used? Please indicate within the tables and text.
- Evaluation P22: “For CEA, over the years, JenTab gained much higher scores on this dataset from 37% to slightly above 80%, given Wikidata as a target KG.” —> Is this also in a table?
- Evaluation P24: “In 2022, CEA results have significantly improved due to a sophisticated cleaning module for such datasets.” —> But did the benchmark tables remain the same? Is there a comparison on the same dataset for different versions of the dataset to investigate the accuracy over different JenTab configurations?
- Evaluation P24: “We set up three different experiments to test the effect of CTA selection strategies” —> Why only CTA in this Runtime Performance evaluation?
- Evaluation P24: “the processing time for all four rounds” —> Four rounds of? What do you mean with rounds here? On which datasets is the evaluation made?
- Evaluation P24: “We excluded the “Multi-Hops” from further use and limit ourselves” —> I don't understand why. Isn't this like fixed? Or can this not be an iterative tree building process? Once you reach an already found hop, it is just adding a branch to a tree? I don’t understand why this is so computationally heavy in this case. It might be a good idea that the authors provide a clarification in the discussion.
- Evaluation P24: “Table 10 shows the runtime of JenTab during its participation in 2021 and 2022. This table shows the configuration yielding the best scores.” —> So these are the results of the whole pipeline? Or is it a subset of certain modules as this sentence confuses me.
- Evaluation P25: Result tables —> Might be a good idea to highlight (bold) the top results such that it is easy to compare JenTab with the current best obtained state-of-the-art results.
- Discussion P25: For me, this is not a discussion section at all. It does not reflect on JenTab with regard to the state of the art. We don't know if JenTab belongs to the group of heuristic models and how it compares against each group. Moreover, there is no reflection on time and memory consumption against the state of the art. How does it compare against e.g. DAGOBAH, even if this method requires more workers? Even the effect of preprocessing the text values - labels (on the processing time) is ignored within the discussion here. Overall, the discussion does not show the reader the strengths of JenTab. It is now yet another STA method. Why should I use JenTab? Is the fact that it can switch between KGs unique? Is the easy-to-use Docker execution unique?
- Discussion P25: “For example, it provides an easy way to change the target KG and enable various settings for solving the STA tasks” —> Again as stated before, I did not see a direct proof for this in the paper... Does the code make generic api calls to a separate module or?
- Discussion P28: “Further, Chaves-Fraga and Dimou [68] used our artifacts to compare fully automatic systems versus the declarative mapping rules.” —> What does this imply exactly and why is this important for JenTab in the first place?
- Evaluation P29: Table 10 —> No BioTables for P31, why?
- Conclusion P31: Again, I miss some clear statements why we should use JenTab in the first place and what problems it eventually solved. By not providing such statements, it is hard to assess the novelty, applicability and quality of JenTab within the STA field.
- Conclusion P31: Also, a clear statement where we can find all JenTab resources is missing. Now the GitHub repository is only mentioned as a footnote within the abstract of the paper.

Minor:
P2: KG is used as an abbreviation but not introduced before
P3: Semantic Table Annotation (STA) is introduced as an abbreviation for the second time
P3: Missing ) for (Figure 2c
P4: SemTab is introduced for a second time
P5: SemTab is introduced for a third time
P6: “It is started with DBpedia lookup and endpoint in 2019.” —> I don't understand this sentence here
P7: “The authors use a regular AccessMediaWiki API search” —> reference of the AccessMediaWiki API?
P7: “It adopts the approach of generating comparison matrices, namely INK embeddings, to speed up computational efficiency” —> I think INK embeddings are generic and require its own citation?
P9: DAGOBAH-DL [52] Solves the three STA tasks The authors —> no . Between tasks and The
P11: We group them in “Data Cleaning” and “Datatype Prediction”. —> This is "clean cells"; be consistent in the figure and text about how you call each process.
P11: fifty —> The creators of fifty stated within their repository that they would like to be cited according to a given BibTeX
P11 “Besides locating the entities of interest that would be mapped to CEA, such classification helps us develop a datatype-specific property matching technique. For example, we solve the CPA task differently if the given column is DATE or NUMBER.” —> I don’t understand this sentence here
P13: creation and filtration blocks —> What is a filtration block?
P15: “For create-functions depending on previously generated candidates, this can substantially reduce the queries required and overall running time.” —> I do not understand this claim as the create function is independent of the filter?
P15: “We compute the support of a candidate as the number of cells in the same column it can be connected to” —> I don't understand the same column aspect here
P16: “We define popularity as the number of triples the respective candidate appears in them.” —> What do you mean with "appears in them"?
P16: “Given an example of Court Cases in R3,” —> What is R3 here?
P17: “It was applied in tasks that featured only CEA and CTA targets and omitted any CPA ones” —> So it is only applied to a particular type of tables?
P19: BioTables —> What was the target KG here?
P19: “we extract the unique values from all tables of a dataset and match them against labels (and aliases) of the target KG using an optimized Jaro-Winkler Similarity implementation” —> This information is already provided in previous sections
P23: “In contrast, for the synthetic dataset, HardTables 2021, DAGOBAH achieved the maximum F1-score 97.4%/99% (CEA / CTA)” —> Why are you mentioning HardTables results within the BiodivTab results subsection?
P23: “We gained our lowest scores on this benchmark as the case for other systems.” —> I don't understand this sentence
P24: “Starting from 2021, we hosted added a virtual machine” —> hosted added
P25: “addition, it supports an easy-to-use execution via Docker containers, as seen in Listing 1” —> This listing does not add anything to the paper in my opinion. They are just generic Docker commands.
P27: Table 9 —> Why is the evaluation stated in days?
P28: “are open-source data and code, well-documented, and have open-source and reasonable dependencies.” —> I don't understand this as a benefit here. It was already mentioned previously.
P29: “The output format is shown in Figure 19 where Wikidata is the target KG. First, CEA results include the file/table name without extension, row id, column id (together they point to a specific cell), and the mapped entity from KG. Second, CTA results consist of file/table name, target column id, and the mapped semantic type or class from the KG. Finally, CPA output contains the file/table name, subject and object columns id, and their semantic property or relation from the KG.” —> This is just implied by the SemTab challenge, I assume? What does this information bring to the discussion?

Review #3
By Franck Michel submitted on 17/Mar/2025
Suggestion:
Minor Revision
Review Comment:

JenTab carries out the semantic annotation of tables (web tables, CSV spreadsheets, etc.). It is the result of several years of development. This article provides a detailed presentation of the methods, architecture and strategies implemented in JenTab. The paper first goes through a comprehensive technical description of the modular design of JenTab (which allows describing different pipelines tailored to specific contexts). Then it describes the benchmark datasets and provides a comprehensive evaluation of JenTab's performance in terms of accuracy and runtime, and a comparison with other SotA approaches.

JenTab is certainly an impressive work and is the fruit of a very substantial development and refinement effort. The article is well written and easy to follow; however, I fail to understand what new contributions this article brings, apart from a detailed summary of multiple other articles published over the years in conferences and workshops.

The related work section provides a description of the other candidate approaches that competed in the SemTab challenge. Regrettably, this remains a mere list and does not come with a side-by-side description of how JenTab compares with these approaches. This is however completed with several performance comparison tables in section 5.

There is no insight into how to reuse JenTab to annotate tables with a custom target knowledge graph, that is, other than Wikidata and DBpedia. Sections 6 and 7 mention that this is possible, yet we have no idea of how simple/complicated this is. For instance, is the lookup service dedicated to Wikidata? What does it take to use another graph, is there some preprocessing to be carried out? Does this require large hardware resources? etc.

In the perspectives, I would like to see a discussion on how LLMs may change the landscape of Semantic Table Annotation approaches, if on-going works rely on LLMs to carry out STA or at least steps thereof, and whether we can expect a real change in the performances of the current approaches.

Sentence "JenTab has a reasonable amount of open-source dependencies" is very curious. What does "reasonable" mean here? What is the impact? Is that to say that there are some reuse issues because of non open-source dependencies? Please elaborate further.

The authors provide URL https://github.com/fusion-jena/JenTab as a "Long-term Stable Link to Resources". Note that GitHub cannot be considered as long-term and stable. See for example what happened to Google Code. Also, GitHub does not provide a citable permanent identifier. I would strongly recommend backing up the repo to Software Heritage (which gives a SWHID) or Zenodo which gives a DOI.

Typos:
p3 line 31 "or Entity (Figure 2c": no closing parenthesis
p8 line 33: tackels -> tackles
Tables 3 and 4: the legend mentions columns R and AR but there are no such columns.