A systematic literature review and classification of approaches for keyword search over graph-shaped data

Tracking #: 3789-5003

Authors: 
Leila Feddoul
Frank Löffler
Sirko Schindler

Responsible editor: 
Mehwish Alam

Submission type: 
Survey Article
Abstract: 
Knowledge graphs provide machine-interpretable data that allow automatic data understanding and deduction of new facts. However, machines are not the only consumers of such semantic data. Human users could also benefit from graph-structured data by browsing and exploring it to detect interesting associations and draw conclusions. To achieve that, methods that allow for search over knowledge graphs are highly sought after. Keyword search is an intuitive and common way to retrieve relevant data (e.g., documents) and can also be leveraged to search over knowledge graphs. In this survey paper, we derive the typical architecture of a system for keyword search over graph-shaped data, formally define the problem, highlight related challenges, and compare our work with existing relevant surveys to identify the gaps. We conduct a comprehensive review of studies dealing with the topic of keyword search over graph-shaped data (e.g., knowledge graphs) following a systematic method. Based on that, we derive and define different aspects for classifying existing works. We also give an overview of how those systems are evaluated and highlight possible future research directions.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Stefano De Giorgis submitted on 06/Apr/2025
Suggestion:
Accept
Review Comment:

The proposed work presents a comprehensive survey on keyword search over graph-shaped data, particularly knowledge graphs, providing a systematic review of existing approaches, deriving a typical system architecture, formally defining the problem and related challenges, classifying works according to various aspects, examining evaluation methods, and identifying future research directions.

Given the previous round of reviews, the paper now seems well organised and, to the best of my knowledge, covers the relevant work.
Some minor improvements could be made, as follows.

p. 5 "Summary-based. Operate on a summarized version of the data..." --> it would be better to make explicit before this point *how* it can be summarized.
p. 14 "[16] state..." --> "states"
p. 18 "[42] aims at retrieving another type of results called r-clique..." --> this has been mentioned previously; either move the short explanation to the first mention, or rephrase it to connect it to the previous mention

Review #2
Anonymous submitted on 09/Apr/2025
Suggestion:
Minor Revision
Review Comment:

The paper has been significantly revised compared to the earlier version. Overall, I am satisfied with the changes, though I have a few minor concerns outlined below:

- Section 2 could optionally include a general overview of the surveyed research papers and their statistics. While this is mentioned later in Section 7, providing a high-level view earlier would give readers better context on the scope and activity within the research space. This would help situate the reader in the broader landscape of work the survey covers.

- Figure 2 is a nice addition. However, the text should provide a clearer explanation of the figure, particularly clarifying where the discussion refers to schema-level or instance-level aspects.

- Section 8 has been improved in terms of readability and the connections made between the different papers. A minor improvement here would be to introduce the timeline of the various works, so readers can better understand how the different approaches have evolved over time. Citing papers using author names and publication years could help implicitly convey this, but explicitly referencing the years where relevant would also add more value.

- Additionally, in Section 8, it might be helpful to indicate at the beginning how many papers are covered in each subsection, as this grouping is not directly inferable from the tables.

- Finally, it would be worthwhile to add a few lines in both the Introduction and Conclusion about who the target audience for this survey is, and how they might benefit from it.

Review #3
Anonymous submitted on 01/Jul/2025
Suggestion:
Accept
Review Comment:

I would like to thank the authors for taking into account my comments in this new version of the article. I think the paper now has a better structure that allows the reader to gain significant insights into the field of keyword search over graph-based data.

I have only minor remarks:
- Section 2 could be enriched with a summary table comparing the existing surveys to the present one, making it easy to position the current survey. In this version, the text and the numbered references sometimes make the positioning difficult to grasp.
- Section 3: The "Keyword query" paragraph also contains the definition of keyword nodes. I would rename it to "Keyword query and nodes" or separate the two.
- Average precision is defined "based on the calculation of precision and recall at every position", whereas only precision appears in the formula.
- I find it strange that time measures never appear in Table 1, since Section 6 discusses that approaches can be evaluated on their effectiveness and efficiency, and not all works consider both.
- I also find Table 1 a bit difficult to read: column labels such as "Query creation" and "Ground truth" may take the values manual, random, generated, or extracted from a dataset, and the column "Dataset" does not actually reference the test benchmark but the original dataset from which queries / ground truth are manually created or extracted (in the case of a test benchmark). I wonder whether renaming columns or adding further columns would help readers quickly grasp their meaning.
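The average-precision remark above can be illustrated with a small sketch. It assumes the common binary-relevance definition over a single ranked list, which is not necessarily the exact formula used in the paper. Under that assumption, AP can be written either as the mean of precision@k over the ranks holding a relevant result, or as a sum of precision@k weighted by the change in recall at rank k. The two forms coincide, which is where recall enters the definition implicitly. The function names are hypothetical.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int], num_relevant: int) -> float:
    """AP as the mean of precision@k over the ranks k that hold a relevant result.

    relevance: binary judgements (1 = relevant, 0 = not) in ranked order.
    num_relevant: total number of relevant results for the query (assumed known).
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # precision at rank k
    return total / num_relevant


def average_precision_via_recall(relevance: Sequence[int], num_relevant: int) -> float:
    """Same AP written as the sum of P(k) * (R(k) - R(k-1)) over all ranks k."""
    if num_relevant == 0:
        return 0.0
    hits = 0
    prev_recall = 0.0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        hits += rel
        precision = hits / k
        recall = hits / num_relevant
        # The recall increment is non-zero only at ranks holding a relevant result.
        total += precision * (recall - prev_recall)
        prev_recall = recall
    return total


if __name__ == "__main__":
    ranking = [1, 0, 1, 0, 0]  # hypothetical judgements for one query
    print(average_precision(ranking, 3), average_precision_via_recall(ranking, 3))
```

For the ranking [1, 0, 1, 0, 0] with 3 relevant results in total, both functions return 5/9 (about 0.556): precision is 1 at rank 1 and 2/3 at rank 3, and each of these ranks raises recall by 1/3.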