Review Comment:
The authors introduce a pattern mining approach for knowledge graphs (KGs) based on the Minimum Description Length (MDL) principle. It aims to mine a relatively small number of compact patterns, which can represent frequent patterns in the KG. Informally, a pattern is a labelled graph that can be embedded into the KG via an injective function with its vertex and edge labels subsumed by those of the mapped vertices and edges in the KG. For example, "a book is authored by a person" is a pattern which may have many occurrences (or embeddings) in a KG about books and authorship. Graph mining methods based on MDL have provided ways to calculate the description lengths of patterns for the selection of sets of patterns with minimum lengths. Yet, no MDLbased pattern minding methods have been proposed for KGs previously.
The authors adapt and extend the graph mining method GraphMDL+ [18] for KGs, as GraphMDL+ does not restrict the shapes of the patterns. To achieve this, the proposed approach first preprocesses the KG and then searches for patterns based on the adapted MDL metrics. Intuitively, the search starts with a large number of primitive patterns and iteratively combines them to form more representative patterns. While the algorithm looks intuitive, I have two major concerns. First, it appears that the code table, i.e., the list of patterns, keeps increasing as new patterns are iteratively added (Alg. 1, l9). I would expect the combined patterns would replace the original patterns, instead of simply being added. If the code table only increases in size, how can the description length reduce (Alg. 1, l10)? Maybe I missed something important here.
My another concern is that the search performed in such a straightforward manner appears to be inefficient. The initial code table is already very large, which contains the singleton patterns corresponding to all the vertices and edges of the KG. Then, the number of combinations between each pair of existing patterns is also huge (Alg. 1, l6), while many of the combinations may turn out to be undesired. At some points, I assume it needs to decide whether two patterns in the code table are equivalent through graph isomorphism, and maybe even pattern redundancy through injective homomorphism. Moreover, the computation of description lengths after each new pattern is added to the code table (Alg. 1, l10) is rather expensive. I wonder how the algorithm could scale over mediumsized KGs. The only heuristic mentioned is used to select pattern combinations (Alg. 1, l6), where only a reference is provided. It is unclear to me whether such a heuristic has a major impact on the efficiency of the search algorithm.
The computation of description lengths is rather involved, but it is left unexplained what such a complex computation achieves. That is, the intuition behind the equations in Fig. 6 should be explained. For instance, I wonder why a simple count of the occurrences of symbols is insufficient. I am sure the design of the computation has much merit, which is based on the research on MDL principle. Yet, it should be explained for readers who are unfamiliar with MDL literature.
For the experiments, the choice of datasets could best further explained. It is unclear why some widely used datasets extracted from common KGs are not considered, such as FB15K237 from Freebase and NELL995. Again, the numbers of patterns in Tab. 2 suggest that during the search, besides adding new patterns, some old patterns are removed. It is also interesting to know the range of sizes of individual patterns in terms of vertices and edges.
Overall, the idea of computing representative patterns of relatively small numbers and sizes is interesting, and the experiment results also look promising. Although the approach is based on GraphMDL+, the authors have clearly explained the novel contributions compared to GraphMDL+. The paper is generally easy to follow, but I have some doubts on the pattern search algorithm as mentioned above. Also, some clarification on the experiment design and results would be desired.
Some detailed comments are as follows:
 p1, l23, the powerlaw may worth a brief explanation when it is first mentioned.
 p1, l27, a "humansized" set of patterns is mentioned several times but it does not sound right. It sounds like the size of a human being. Maybe "humancomprehensible"?
 p2, l6, it says existing graph mining algorithms were designed for small undirected graphs such as molecules. I do not think this claim holds for the significant amount of research on network mining, such as social networks.
 p6, l26, rdf:nil as subject => as an object.
 p6, l2930, can this property of the converted KGs be stated formally?
 p7, l27, port vertices are undefined, and are easily confused with ports.
 p12, l26, "more than 80% of the labels are described by a pattern" => by the patterns
