Review Comment:
Overall Comment:
In the submitted manuscript the authors address the challenge of unsupervised network embedding. They propose the Hierarchical Graph Convolutional Network (HGCN) architecture to perform this task. The HGCN combines embeddings from shallow and deep neural network models and includes two GCNs. Different existing embedding models can be combined in the HGCN, giving rise to the variants HGCN-L and HGCN-DL. The manuscript addresses an interesting problem, and the idea of combining different models, and especially local as well as global information of the network, is a promising approach. Unfortunately, I found it very difficult to read the submitted manuscript. In many places the language is imprecise, concepts are left undefined, and the mathematical notation is undefined or abused. I also believe that a more quantitative assessment of the results is required for publication. In my opinion the presentation of the submitted material needs to be fundamentally revised. Therefore, I have to conclude that the submitted work is not publishable without major revisions.
Specific Points:
-In Section 2:
--- In Section 2.1 you discuss PCA and MDS applied to the adjacency matrix as traditional network embedding methods. It might be suitable to add a discussion of the spectral clustering technique or, more generally, spectral embeddings for dimension reduction, as these are much more commonly applied to the Laplacian and adjacency matrix than PCA and MDS.
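To make this suggestion concrete, the kind of spectral embedding I have in mind can be sketched as follows (a minimal numpy illustration on a toy adjacency matrix of my own choosing, not taken from the manuscript):

```python
import numpy as np

# Toy symmetric adjacency matrix of a small undirected graph
# (two triangles joined by a single edge); any graph would do.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

d = A.sum(axis=1)
L = np.diag(d) - A               # unnormalised graph Laplacian
vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order

# Skip the trivial constant eigenvector (eigenvalue 0 for a
# connected graph); the next k eigenvectors give a k-dimensional
# spectral embedding of the nodes.
k = 2
embedding = vecs[:, 1:1 + k]     # shape (n_nodes, k)
```

A comparison of such Laplacian eigenvector embeddings against PCA/MDS on the adjacency matrix would situate the related-work discussion in the more common practice.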
--- In Section 2.2, it is unclear to me what you mean by the word "expand" in "Nevertheless, unsupervised graph embedding is difficut to expand and [...]". There are several other places throughout the manuscript where imprecise language is used.
-In Section 3:
--- It would be nice to discuss the complementarity of your definitions of "Local Structural Information" and "Global Structural Information" or alternative measures. Since your definitions appear to be model-dependent, it would be good to provide further intuition about these concepts. Furthermore, you mention the "linear structure" of graphs in several places. It would be good to define more rigorously what you mean by this.
---The discussion of the equations is not sufficiently precise, e.g., you say "where u_i is the vector of v_i". It would be better if you explicitly stated how this vector u_i was obtained, how its content can be interpreted and that v_i denotes a given node in the network.
---I suggest renaming the second computational step of your model. Computations in GNNs are often referred to as being split into an aggregate and update step [1]. So it might cause confusion to call your second step containing several GNNs the Update layer.
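For reference, the aggregate/update vocabulary of [1] can be sketched schematically as follows (a toy numpy illustration with mean aggregation; all names here are mine, not the authors'):

```python
import numpy as np

def gnn_layer(A, H, W):
    # Aggregate step: each node pools its neighbours' features
    # (mean aggregation over the adjacency matrix A).
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
    aggregated = (A @ H) / deg
    # Update step: each node combines its own state with the
    # aggregate via a trainable weight matrix W and a nonlinearity.
    return np.tanh((H + aggregated) @ W)

# Tiny example: 3-node path graph, 4-dim features, 4 -> 2 update.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.ones((3, 4))
W = np.full((4, 2), 0.1)
H_next = gnn_layer(A, H, W)   # shape (3, 2)
```

Since "update" already names the second half of each such layer, using "Update layer" for an entire stack of GNNs invites exactly this terminological collision.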
---It is unclear to me how the two GCNs in your architecture interact. This is likely due to confusion about your use of the word labels in the following sentence: "Meanwhile, node labels Y′ are predicted, and the second GCN is used to learn the updated features E_D2 and labels." Y′ is not used anywhere else in the manuscript.
--- You should clearly define which mathematical operation you refer to as splicing since this operation is fundamental to the combination of the different embeddings obtained in your architecture. You should furthermore add a discussion of how this operation alleviates the issue of oversmoothing in GCNs. It also seems too general to me to say: "after the second training iteration, the trained GCN model will have an embedding that is too smooth, [...]". A more nuanced discussion of this problem would be nice.
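If, as I suspect, "splicing" denotes feature-wise concatenation of the embeddings, even a one-line definition along the following lines would help (a hypothetical numpy illustration; the shapes and names are my own guesses, not the authors'):

```python
import numpy as np

# Hypothetical local (shallow) and deep embeddings of the same 4 nodes.
E_S = np.random.rand(4, 8)    # e.g. shallow/local embedding, 8 dims
E_D = np.random.rand(4, 16)   # e.g. deep/GCN embedding, 16 dims

# If "splicing" means concatenation along the feature axis:
E_all = np.concatenate([E_S, E_D], axis=1)   # shape (4, 24)
```

If this reading is correct, the claim about oversmoothing presumably rests on the concatenated shallow features remaining distinguishable even when the GCN features collapse, and this mechanism should be spelled out explicitly.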
--- In Section 3.2.2 you write "In the real world, due to the lack of privacy and tagged data, GCN is not suitable for graphs with only structural information." Firstly, in my understanding a lack of privacy would result in a greater number of labelled datasets. Therefore, these two concepts don't seem to support a single point to me. Furthermore, GCNs can successfully be used for datasets only containing structural information (see for example [2]). So, your conclusion in the quoted sentence seems to be stated too generally.
- In Section 4:
---Your benchmark model DL should be presented in greater detail. The current introduction "We directly join the local linear embedding and global linear embedding to get the model, named DL." seems to be insufficient to me.
---You show the hyperparameters and details of the training of your baselines. It would be nice if you could provide the same information for your proposed models.
---In Tables 3 and 4 and in Figure 4 you should add estimated confidence intervals. Some of your conclusions are drawn based on small absolute differences and estimated confidence intervals could help reinforce your points.
---In the discussion of Figures 5, 6 and 7 you discuss the separation of classes within certain regions in the embeddings. It is unclear to me how these regions are chosen. It would furthermore be nice if you could quantitatively assess the separation of the classes within these boxes by giving the number of samples belonging to the different classes contained in them. The current qualitative discussion does not always convince me, especially for Figure 7.
[1] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, "Neural message passing for quantum chemistry," In: Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 1263--1272, 2017.
[2] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, "How powerful are graph neural networks?," In: International Conference on Learning Representations (ICLR), 2019.
In addition, I list several minor issues in the presentation:
- The words "Linear" and "Nonlinear" are misspelled in Figures 1 and 2.
- the reference numbering in the text of Section 2 does not correspond to the numbering in the reference list.
- In Section 2.1 lower-case m and upper-case M are used to refer to the same quantity.
- I wonder whether the last paragraph in Section 2.3 discussing GNNs could be moved to Section 2.4 titled "Graph Neural Networks."
- Page 4 Line 9: a square bracket is opened and never closed.
- Section 3 Line 1 graphs are defined as G=(V,E), in Section 3.2.1 as G=(V,A), where A seems to denote the adjacency matrix, and in Equation (6) they are defined as G=(V,E, S_{S-G}(W_v_i),Y). This is a clash of notation. Also in the definition G=(V,E) the accompanying text can be misunderstood to mean that V denotes a single node and E a single edge.
- Section 3 Paragraph 1, it might be nice to give the domain of E_s, E_D and E_{All} to give the reader a better understanding of these quantities.
- Section 3.1 Paragraph 1 you discuss the eigenvectors of the Laplacian as allowing the observation of network characteristics in the absence of structural information. However, the eigenvectors are obtained from the Laplacian defining the structural information. So it seems misleading to me to say "in the absence of structural information."
- Following Equation (1) you write "p(.,.) is a nxn vector". I think you mean to say matrix and I could not find p(.,.) in another place in the paper.
- Section 3.2.1 following Equation (4) it seems more appropriate to me to say that P is a normalised adjacency matrix and "th" should be placed in the superscript of l and (l-1).
- Section 3.2.1: trainable weight matrices in the GCN are denoted by lower-case w and upper-case W. You should pick one notation.
- In Equation (4) you show the computations made in a convolutional layer of a GCN not a graph convolution. The graph convolution operation does not involve an activation function or trainable weights.
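The distinction can be made explicit in a few lines (a toy numpy sketch of my own; P is the symmetrically normalised adjacency matrix with self-loops, as in the usual GCN formulation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

# Two-node toy graph with a single edge.
A = np.array([[0, 1], [1, 0]], dtype=float)
A_hat = A + np.eye(2)                       # add self-loops
D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
P = D_inv_sqrt @ A_hat @ D_inv_sqrt         # normalised adjacency

H = np.eye(2)                               # input node features
W = np.eye(2)                               # trainable weights (identity here)

propagated = P @ H              # graph convolution / propagation alone
layer_out = relu(P @ H @ W)     # full GCN layer: adds W and an activation
```

Equation (4) describes `layer_out`, i.e. a convolutional layer of a GCN, whereas the graph convolution proper is only the `propagated` part.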
- In Equations (6) and (7) the notation of w is inconsistent being both subscripted by v_i and taking v_i as a functional input, respectively.
- The table and figure captions are uninformative. More detail might be better.
- Table 2 should be redesigned. It is very difficult to extract information from it in its current state.
- Figures 6 and 7 are distributed over several pages. This makes it difficult to read them.
- In the discussion of Figure 7, it is unclear to me what you refer to by the "middle edge" of the figure and "blurred" classes.
- In Figures 5, 6 and 7 you should order your legends for clarity of exposition. Furthermore, it might be better to choose a smaller marker size, as many samples cover each other, which makes it difficult to feel comfortable with your visual assessment of these figures.
- In several places throughout the manuscript you use abbreviations without ever spelling them out such as AP clustering.