# Continuous Multi-Query Optimization for Subgraph Matching over Dynamic Graphs

### Tracking #: 2782-3996

Authors:
Xi Wang
Qianzhen Zhang
Deke Guo
Xiang Zhao

Responsible editor:
Guest Editors Web of Data 2020

Submission type:
Full Paper
Abstract:
There is a growing need to perform real-time analytics on dynamic graphs in order to deliver the values of big data to users. An important problem from such applications is continuously identifying and monitoring critical patterns when fine-grained updates at a high velocity occur on the graphs. Various approaches for incremental processing of queries have been proposed over the years. Unfortunately, current approaches are mainly designed for a single query: optimizing and answering each query separately. When multiple queries arrive at the same time, sequential processing is not always the most efficient. In this paper, we study the problem of continuous multi-query optimization for subgraph matching over dynamic graph data. (1) We propose annotated query graph, which is obtained by merging the multi-queries into one. (2) Based on the annotated query, we employ a concise auxiliary data structure to represent partial solutions in a compact form. (3) In addition, we propose an efficient maintenance strategy to detect the affected queries for each update and report corresponding matches in one pass. (4) Extensive experiments over real-life and synthetic datasets verify the effectiveness and efficiency of our approach and confirm a two orders of magnitude improvement of the proposed solution.
Tags:
Reviewed

Decision/Status:
Major Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
Anonymous submitted on 08/May/2021
 Suggestion: Major Revision Review Comment: The paper studies the continuous subgraph matching problem with multiple queries. The problem is important as the query is widely used in real-world applications. Overall, the algorithm proposed in this paper is good. However, some important problems should be addressed. The details are listed as follows. Thanks. 1. The problem definition is problematic: the definition says that "the query is to identify all satisfied query graphs". What does "identify all satisfied query graphs" mean? Based on Example 1, the query should report the positive or negative matches in the data graph, rather than output some query graphs. It seems that the definition contradicts the example. So the problem definition can be significantly improved. 2. How to merge all queries into a single query, i.e., AQG? I think this is the core design of the algorithm, while this paper fails to give the details. Section 3.1 says that AQG is the union of vertexes and edges of queries. However, it seems that it is hard to process some cases. For example, the first case is that queries contain multiple vertices with the same label. The second case is that a query is "A-B-C", and another query is "D-E-F". Then, how to merge them? The authors may want to give more details of the algorithm merging queries. 3. It seems that the authors misunderstand the design of TurboFlux. In Section 1, the authors claim that "It maintains candidate query vertices for each data vertex using a graph structure such that a data vertex can appear at most once". This is not correct. Actually, TurboFlux maintains candidate data vertexes for each query vertex, and a data vertex can appear multiple times in the auxiliary data structure. For example, given a query with two vertices u0 and u1 labeled as "A", each data vertex labeled with "A" can be the candidates of u0 and u1, respectively. Minor issues: - In Example 1, "When $\delta g$ occurs" -> "When $\delta g_1$ occurs". - In Section 2.1, "$E \in V \times V$" -> "E \subset V \times V". - In Definition 2, "homomorphism to" -> "homomorphic to". - At the last line in page 2, "graph homomorphism" -> "subgraph homomorphism". - At Line 9 in Algorithm 1, how can an operation be equal to an edge? The algorithm should be formalized carefully.
Review #2
Anonymous submitted on 21/Jun/2021