SINFIO – A User-Friendly Hybrid Semantic Search Engine

Tracking #: 1839-3052

Authors: 
Kinga Schumacher
Harald Sack

Responsible editor: 
Thomas Lukasiewicz

Submission type: 
Full Paper
Abstract: 
Hybrid semantic search aims to close the gap between fact retrieval and semantic document retrieval. The combination of facts and documents throughout the entire search process is required in order to exploit available information in its various representation forms to the full extent. This leads to numerous challenges, such as combining fact and document retrieval, merging facts and documents into hybrid results, ranking differently structured results, etc. Moreover, to achieve user acceptance, the complexity of the system must be hidden from the user. Especially in the case of novel systems, an understandable presentation plays a key role. SINFIO, the hybrid semantic search engine described in this paper, offers a solution to the challenges mentioned above. Besides demonstrating the gain in effectiveness over a hybrid semantic search solution which does not combine facts and documents throughout the entire search process, this paper also presents the results of evaluations with respect to ranking and the user interface. The user studies show a clear acceptance of SINFIO. Despite the novelty of hybrid results, users prefer SINFIO over the solution that does not combine facts and documents across the entire search process, over standalone fact retrieval as well as standalone document retrieval.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 01/May/2018
Suggestion:
Major Revision
Review Comment:

This paper presents a hybrid semantic search engine that retrieves and combines both facts from semantic sources and documents as results.

The paper is well written and the topic is very relevant. I believe combining results from the IR and semantic fields is the way forward for search engines. The hybrid approach presented here combines structured and unstructured information to answer user queries. This isn't novel in itself, and there is a lot of state of the art on hybrid semantic search engines, even from commercial companies such as Google, which uses the Google Knowledge Graph to improve results and even to give straight answers to user queries where available. However, this paper presents a user-based evaluation. This evaluation focuses on important aspects such as user acceptability / usability and efficiency in terms of better answering user information needs, in a way that is meaningful and that can be understood by users with no semantic background.

The authors also did a good job presenting the state of the art. They claim the novelty of the proposed approach is that structured and unstructured data are combined throughout the entire search process, while most approaches (such as the Google KG) seem to present the facts and concepts independently from each other. Hybrid semantic search approaches, as defined in this paper, either include the partial results of one retrieval method into another (e.g., PowerAqua) or perform reconciliation at the end between matched facts and documents (K-Search). SINFIO claims to perform fact and semantic retrieval “interdependently”, to accept formal, informal and hybrid queries as input (basically through semantic autocompletion?), and in addition it can present results as facts, documents, or hybrids. This is shown nicely in Table 1.

In the architecture section (3.1) it isn't clear to me what performing fact and document retrieval “interdependently” really means; this should be made a bit more explicit. That is, does it mean that semantic entities are used to find documents (i.e., expanding the search / query expansion) and documents to find semantic facts? Or that both the entity search and the document index search are performed at the same time (using and combining partial / intermediate results) instead of one after another?

To close the gap between the user and system knowledge, a semantic autocompletion component is used to support the user in creating queries with as many formal parts as possible; to do that, the formal knowledge is extended with lexically related words from WordNet. If the user does not choose one of the recommendations of the autocompletion, n-gram matching is applied to match a query term against the KB (in this case the Dice similarity is computed for the n-gram values).
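To make this kind of lexical matching concrete, a minimal sketch of what n-gram matching with a Dice coefficient could look like is given below (the function names, the trigram size and the threshold are my own assumptions for illustration, not taken from the paper):

def ngrams(s, n=3):
    """Character n-grams of a lowercased string (n=3 is an assumed setting)."""
    s = s.lower()
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 0))}

def dice_similarity(a, b, n=3):
    """Dice coefficient over the character n-gram sets of two strings."""
    na, nb = ngrams(a, n), ngrams(b, n)
    if not na or not nb:
        return 0.0
    return 2 * len(na & nb) / (len(na) + len(nb))

def match_term_against_kb(term, kb_labels, threshold=0.6):
    """Return (label, score) pairs above the similarity threshold, best match first."""
    scored = ((label, dice_similarity(term, label)) for label in kb_labels)
    return sorted((p for p in scored if p[1] >= threshold),
                  key=lambda p: p[1], reverse=True)

A worked example would help here too, e.g., which KB labels a misspelled query term such as "Marshal" is matched to and with which similarity scores.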

In my view the key contribution of this paper is the user-based evaluations that compare user acceptability of the hybrid results with respect to standalone fact or document retrieval, i.e., testing the hypothesis that the complexity of the search process can be hidden from the user, while at the same time providing novel content that can be meaningfully understood by the users to answer their information needs more efficiently.

Regarding the selected approach, a triple-based fact retrieval and a graph-traversal (spreading activation) based semantic document retrieval algorithm have been chosen as the methods for combining fact and document retrieval. First, the labels and synonyms of the resources matched in the query are used for query expansion. This first set of ranked resources and documents then constitutes the starting point for the hybrid algorithm: if the fact retrieval is successful, a hybrid semantic search is performed, otherwise just a semantic document retrieval. Fact retrieval is limited to two hops to avoid irrelevant inferences. In the case of a one-term query, all resources and triples dependent on the resource the term was (lexically) matched to are returned; if there is more than one term, the algorithm iterates over pairs of adjacent terms until all query terms are matched or no more terms match existing triples. As a result, a subgraph of triple sets is identified, where each triple is connected to another by the same subject or the same object (excluding classes).
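To make this reading explicit, here is a rough sketch of how I understand the described procedure (the kb object and its methods are hypothetical helpers of my own; this is my reconstruction, not the authors' algorithm):

def hybrid_fact_retrieval(query_terms, kb):
    """Reconstruction of the described fact retrieval: iterate over adjacent
    query-term pairs and collect the triples in which both terms match; the
    result is a subgraph whose triples are connected by a shared subject or
    object (connections via classes are excluded in the paper)."""
    if len(query_terms) == 1:
        resource = kb.match_resource(query_terms[0])   # lexical match of the single term
        return kb.triples_of(resource)                 # all triples dependent on it

    subgraph = []
    for left, right in zip(query_terms, query_terms[1:]):
        triples = kb.find_statements(left, right)      # triples where both terms match
        if not triples:
            break                                      # unmatched pair: stop exploring
        subgraph.extend(triples)
    return subgraph

If this reconstruction is roughly correct, it would be helpful to state it as explicitly in the paper.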

The paper would benefit from a more detailed example of how the ranking is done, so that it is easier to follow the explanation in sections 3.2.3 and 3.3, e.g., following the style of the example in Figure 7 to show the weights assigned to the various nodes / edges.

The matched resources are then used to expand the query and perform a keyword search in the document ranking; in other words, the entities found are used for the document search. Why is this “interdependent” (e.g., the document search does not influence the entity search)? Also, if I understood correctly, the entities found in the intermediate steps, which are not final answers (e.g., “I Love Trouble”, a movie starring Julia Roberts but not directed by Garry Marshall), are used to find documents. If so, what is the impact of this approach in terms of precision / recall (vs. just using the query terms and the final set of answers once all terms in the query are considered)?

In the examples in Figures 8-12 it would be good to show what the user input query is.

The evaluations are generally thorough and convincing, and I think this paper could potentially get many citations because there aren't many papers that perform user evaluations and consider usability; that is why, for me, this is the key contribution. There are, however, some limitations that should perhaps be mentioned and discussed:

- Only 20 NL queries are randomly selected from DBpedia logs, which is a very limited sample. These are also fact queries, so this may bias the results presented here; e.g., for queries that are a mix of fact-based and document-based queries (i.e., not fully covered by DBpedia) the users may find the hybrid approach less adequate.

- With respect to IR measures, SINFIO is compared against just fact or document retrieval. This is fine for evaluating whether the hybrid approach improves over a standalone approach, but the drawback is that it is difficult to assess the efficiency of the presented approach with respect to other search engines / the state of the art.

Basically, while the approach presented here is interesting, there are some design choices made, for example: 1) when building the subgraph of triples connected by subjects / objects, connections via the same classes are excluded; 2) terms in the graph are searched by pairs based on the adjacent term in the sentence, e.g., (“director”, “Garry Marshall”) instead of (“film”, “Garry Marshall”). The design choices are sufficiently justified in the paper, but their impact on precision / recall is not known. For example, properties are notoriously difficult to match and often ambiguous when users pose questions in NL; yet, to find the intermediate answers, properties such as “director” are used together with “Garry Marshall” in Figure 7. If no results are returned because the property could not be matched, the combination of “film” and “Garry Marshall” is not explored further. It is difficult to assess the impact of these design choices on precision / recall because there is no other system to compare to.

While I do not expect the authors to include another evaluation, since their main purpose is to validate the hybrid approach with respect to a non-hybrid one, I would like to know why they did not use known evaluation benchmarks and gold standards, such as QALD (if limited to fact queries with no aggregations, comparisons, etc.) or any of the Freebase evaluation standards based on simple queries. That would have made it easier to see whether the performance of SINFIO is “comparable” to other NL-based QA systems, at least in terms of fact retrieval for questions (even if the results across systems are not really directly comparable, but just to get an idea of what P/R is achieved using a well-known gold standard).

- Could you publish the 20 questions used in the evaluation?

- Could you comment on the performance of this approach in terms of the average time taken by the system to provide results to the users for the given queries?

The hypotheses to be evaluated are nicely presented in 4.1 to guide the comparison of the hybrid vs. the standalone approaches. Again, the only issue I find here is that only 3 of the 20 queries produced hybrid results, which is a very limited set from which to draw conclusions on the effectiveness of the approach. Also, the P/R values presented in Table 2 are quite low. There is no baseline or other system to compare to (besides the standalone semantic document / fact retrieval, which has even lower P/R values). I am not sure what to make of these low values and I would like to see some discussion on this: why is the F-measure for the semantic document retrieval only 0.35? And why does the precision of SINFIO seem to be so low (good recall but less than 0.5 precision)? How could this be improved? Would a simple index-based document retrieval be more precise than a semantic document retrieval (with similar recall)? Is this because of the lack of methods to resolve ambiguity when no facts are retrieved?

The authors mention two adverse points, mainly regarding the completeness of the answer. I do not see this as a big issue, as the interface could clearly show whether there are more than 10 results and therefore allow the user to explore them (if I understood that correctly, to me this is more of a UI issue that can be fixed); however, the low precision value is an important issue to discuss here.

The semantic autocompletion validation is based on 5 predefined questions; could you explain how and why those 5 questions were selected? Here, metrics such as task duration are given too, but again I am not sure how meaningful these are without a baseline to compare to.

Despite the low number of queries, I find the validation of the users' preference for the hybrid results to answer their information needs convincing; however, I am less convinced by the validation of the semantic autocompletion. Semantic autocompletion, while useful to avoid ambiguity, is known to be cumbersome when the knowledge bases to query are very large and ambiguous. This is because the user can easily be overwhelmed when there is a long list of candidate possibilities to complete their sentence, or when their choice is either not there or ranked very low in the list. It also requires that the users express the query in a way that follows the ontology structure, which may imply having to reformulate the query several times. The 5 examples presented here may not reflect this issue, and I believe a larger set of examples is needed for this validation, as well as to show more clearly the advantages and disadvantages of autocompletion. For example, how do the users react when autocompletion fails to present the right choice (in particular taking into account that the coverage of WordNet for finding lexically related words is limited, and that ambiguity introduces a lot of noise)?

In sum, I really like the topic of the paper and the directions taken to create and present a hybrid approach, and it is really good to see that a lot of attention has been paid to evaluating the usability aspects of the system with users. This is worthy of publication; however, the limitations should be noted and the discussion needs to be expanded on various aspects.

Review #2
Anonymous submitted on 15/Aug/2018
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The paper presents SINFIO, a hybrid semantic search engine. While the system proposes a very interesting idea, that of merging structured and unstructured information to better satisfy the user's information needs, it is unclear what the key contribution of the work is (since the architecture of the system has been presented in a previous work - reference [16] in the paper - http://www.scitepress.org/Papers/2011/33407/33407.pdf). It seems the paper presents some new evaluations of the system. However, this extension may not be sufficient to consider the work for publication. I recommend that the authors clarify what the key contributions/extensions/innovations of the paper are with respect to the aforementioned work. The explanation of the system/approach also does not seem to be self-contained and hence reproducible by other researchers. A demo/video of the system is also not provided.

Detailed comments
------------------------
Regarding the state-of-the-art review, the paper stresses as the key difference/enhancement between SINFIO and previous hybrid approaches the fact that SINFIO “combines facts and documents across the entire search process”. It is, however, unclear within the paper what this means in practice. In the introduction the authors mention that previous approaches: (i) analyse results independently and (ii) offer highly limited search possibilities. They state that SINFIO not only finds facts and documents but also uses relations that exist between the different items of data at every stage of the search process, and integrates them into the search results. I have carefully read sections 2 and 3 to understand these statements, but it is still unclear to me what the authors mean by “combining facts and documents across the entire search process” (do you refer by this to the spreading activation step?).

The paper compares SINFIO against three previous approaches: PowerAqua, K-Search and CE2. However, please note that other hybrid approaches exist (or existed) that may be worth mentioning (faceted search systems [4], commercial systems such as Hakia, Powerset, Search Monkey, etc., and academic systems [2,3], among others). Also, some of the statements about the presented systems are not entirely correct. For example, PowerAqua allows the user to refine the query by selecting concepts/triples. You may want to take a look at the following paper about PowerAqua [1] as well as some of PowerAqua's available demos: http://technologies.kmi.open.ac.uk/poweraqua/

The contribution of the paper focuses on the online process of SINFIO (i.e., the system assumes that a search space exists where structured knowledge has already been extracted and annotated with documents). However, the architecture of this system has already been presented in a previous paper (reference [16] - http://www.scitepress.org/Papers/2011/33407/33407.pdf), so it seems the contribution of this paper is reduced to the different proposed evaluations.

Some details about the presented approach (search matching/ranking) also seem vague or incomplete and hence not reproducible. For example, section 3.2.1 mentions that the KB is extended with WordNet (but there is no description of how this is done). Similarly, it is mentioned that synonyms, abbreviations, homonyms, variants in spelling, singular/plural forms and phrases must be recognised in order to identify concepts (but, again, there is no explanation of how this is done).
In section 3.2.2 it is unclear how the Dice coefficient is used when unigrams are considered as query terms (since Dice considers the number of n-grams shared by two strings - reference [24] of the paper).
Similarly, in this section it is mentioned that query templates are used to perform SPARQL queries, but it is unclear how these templates are created. Looking at the function findStatements, it would seem that you only aim to find triples where both terms exist (is that correct?).
In the matching of a query term to a literal (equation 5), do you restrict the matching to the label of the resource? Note that if you allow the query term to match any literal associated with a resource, you may produce erroneous matches. The formula does not seem to specify any restrictions.

In section 3.2.3 it is unclear: (1) how documents are instantiated and made part of the KB, (2) how the initial weights are set, and (3) how the final set of resources is selected (it is mentioned that highly ranked resources are selected, but it is unclear what “high” means in this context - is there a threshold, a fixed number, etc.?).
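To make points (2) and (3) concrete, here is a generic constrained spreading-activation sketch (my own illustration, not the authors' implementation; all parameter names and values are assumptions). The unclear aspects correspond to how initial_weights is populated and which threshold or cut-off defines a “highly ranked” resource:

def spreading_activation(graph, initial_weights, decay=0.5, threshold=0.1, iterations=3):
    """Generic spreading activation over a node -> neighbours adjacency dict.
    initial_weights: activation of the query-matched resources (unclear point 2);
    threshold / decay: determine which nodes end up 'highly ranked' (unclear point 3)."""
    activation = dict(initial_weights)
    for _ in range(iterations):
        spread = {}
        for node, value in activation.items():
            neighbours = graph.get(node, [])
            if value < threshold or not neighbours:
                continue
            share = value * decay / len(neighbours)   # distribute decayed activation
            for nb in neighbours:
                spread[nb] = spread.get(nb, 0.0) + share
        for node, value in spread.items():
            activation[node] = activation.get(node, 0.0) + value
    return {n: v for n, v in activation.items() if v >= threshold}

Stating the corresponding parameters of SINFIO at this level of detail would make the ranking reproducible.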

In the semantic document retrieval it is mentioned that only document nodes are initially activated when no facts are found (but it is unclear how the mapping is done from the original query to the documents when no facts are found - see Figure 2).

In terms of evaluation (section 4), the authors select 20 queries from a previous work in which 485 queries are specified (reference [37] in the paper); however, there is no description of which 20 queries were selected, nor of why those 20 queries and not others.

SINFIO is compared to “a hybrid approach which does not combine structured and semi- or unstructured content throughout the entire search process” (FSDR). However, no description is provided of how FSDR is implemented.

References
----------------------
[1] Fernández, M., Cantador, I., López, V., Vallet, D., Castells, P., & Motta, E. (2011). Semantically enhanced information retrieval: An ontology-based approach. Web Semantics: Science, Services and Agents on the World Wide Web, 9(4), 434-452. https://www.sciencedirect.com/science/article/pii/S1570826810000910
[2] Mayfield, J., & Finin, T. (2003). Information retrieval on the Semantic Web: Integrating inference and retrieval. Proceedings of the SIGIR Workshop on the Semantic Web.
[3] Cohen, P. R., & Kjeldsen, R. (1987). Information retrieval by constrained spreading activation in semantic networks. Information Processing & Management, 23(4), 255-268.
[4] Arenas, M., Cuenca Grau, B., Kharlamov, E., Marciuska, S., Zheleznyakov, D., & Jimenez-Ruiz, E. (2014). SemFacet: Semantic faceted search over YAGO. Proceedings of the 23rd International Conference on World Wide Web (pp. 123-126). ACM.

Review #3
Anonymous submitted on 24/Sep/2018
Suggestion:
Major Revision
Review Comment:

The article proposes SINFIO, an engine for semantic search over a combination of knowledge bases and text. This is an interesting, relevant and timely topic. Also, the article covers a lot: a search paradigm is proposed, a new system is designed and implemented, the system is evaluated and there is even a user study. However, there are five major weaknesses, which need to be addressed very carefully before this article can be published:

1. Strongly related work is not mentioned, let alone compared to. This is almost inexcusable, since a simple Google search for the keywords "semantic search text knowledge bases" (or similar searches) would have pointed the authors to several missing papers.

2. The authors also touch on translating natural language queries into structured queries, unstructured queries, or a combination of the two. These are huge fields of research and the authors do not even start to survey the state of the art in these fields, let alone compare themselves to these works. I guess the authors are simply unaware of this. See below for some more details.

3. The proposed system is evaluated only against variants of itself or simple baselines. I leave it to the editor to judge this. In the research communities familiar to me, this would be unacceptable: if there are related systems (and there are many in this line of work), you have to compare with at least one or several state-of-the-art representatives. All the more so if demos and code are available (like, for example, for Mimir, Broccoli and QLever). What is the point of just proposing "another system"? The authors have to clarify in which way it differs from previous systems, where it is better, and whether it is better.

4. What exactly can this search engine do? One has to read very far into the technical parts of the paper before this becomes clearer, and it never becomes as clear as it should be. The article starts out very vague and then incrementally adds more detail here and there. This is very hard to follow for a reader.

5. What exactly are the contributions of the paper? Again, one has to read the whole paper, and even then it does not become entirely clear, because so many different aspects are covered, the "description part" and the "evaluation part" are not clearly separated in many places, and new elements and concepts are introduced throughout the whole article. And, like I said, the conceptual differences and performance differences to previous work are not clear.

Here are some more details concerning these points:

@1: The authors seem to be unaware of the works on ESTER [DBLP:conf/sigir/BastCSW07], Broccoli [DBLP:journals/corr/abs-1207-2615 + follow-up work] and QLever [DBLP:conf/cikm/BastB17]. These were all explicitly built for combined search on text and knowledge bases, which is the core problem considered in this article (the authors call it "hybrid search"). The aforementioned systems also provide sophisticated autocompletion features, which is one of the selling points of this article. To me it seems that these systems are the closest competitors to the system proposed by the authors, and the article has to deal with that. Also, the aforementioned systems are mentioned in some of the papers cited in the article, for example [8]. I think it is fair to say that the authors have not done their homework properly in this respect. There is a recent survey on semantic search on text and knowledge bases [DBLP:journals/ftir/BastBH16] which you might want to consult. This survey contains a whole chapter dedicated to what you call "hybrid search", discussing the state of the art and some key systems from the past.

@2: By allowing natural-language queries, the article also touches upon the areas of "Question Answering on Text" and "Question Answering on Knowledge Bases". There is a huge literature base for both of those. Again, the authors might want to check the survey [DBLP:journals/ftir/BastBH16], which has a chapter on each of these fields (and also one on the combination of the two). Actually, compared to the state of the art, the methods proposed in this article concerning the natural language processing are very simplistic.

@3: In the evaluation, SINFIO is compared to FSDR (which can be considered a simpler version of SINFIO) and two simple baselines (called "Fact Retrieval" and "Sem. Doc. Retrieval" in Table 2). This is not enough; the system has to be compared to the state of the art somehow, see above.

@4: The introduction is quite vague about what SINFIO actually does. For example, the introduction talks about "browsing facts and documents" (very vague; what exactly is that supposed to mean?), "combining them during the search process" (ditto), and "rank them in an appropriate order" (what is an appropriate order?). About the queries it is said that "SINFIO supports structured, unstructured and hybrid queries while the user formulates the information need using natural language queries". This also lacks concreteness and, moreover, sounds suspicious, because the authors here list in one sentence all existing query modes and claim that their engine can do them all. Also, the authors repeat what their search engine does several times, but each time say something slightly different. For example, in the introduction it is first written that "SINFIO supports ... natural language queries", then it says "... query formulation (structured, unstructured and hybrid)" (natural language not mentioned here), and a bit later "we define hybrid semantic search as a process ..." (saying similar things on a different level).

I strongly recommend having a very clear description of the capabilities of the search engine in the introduction. Everybody reading this article wants to know that and should not be forced to read the whole article several times just to understand what the search engine actually does. This should be accompanied by one or two very carefully chosen examples. That is, one or two concrete(!) example queries by the user, with a clear description of what the search engine does with them in principle, and what kinds of results are then returned. Note that this can be done without having to understand the technical parts of the paper. Actually, that is the purpose of an introduction. And the problem should really be defined ONCE and CLEARLY, and not several times, unclearly, and each time a little differently.

@5: The introduction should provide a list of the concrete contributions and results of this work. It should be in one place, clearly marked as such, and not scattered over the whole article.

There is a lot more feedback to give on a lower and more local level. But given the aforementioned shortcomings on the high level, I stop this review here. Let me just mention one minor thing from the introduction:

You mention the phrase "fact retrieval" and the word "fact" or "facts" several times in your article. While this is not uncommon, it is a problematic term in the context of knowledge bases, because "fact" implies truth and there is no guarantee of truth in a knowledge base whatsoever. More common terms are therefore "triples" or "statements".

I leave it to the editor how to turn this review into a decision. My impression is that a lot of work has been done and it might well be worthwhile. But the article needs a complete overhaul to address the issues above. In particular, this includes significant additional work. This could be done in the form of a re-submission from scratch or in the form of a major revision. Either way, I want to encourage the authors to do this, good luck!