Network representation learning method embedding linear and nonlinear network structures

Tracking #: 2660-3874

Hu Zhang
Jingjing Zhou
Ru Li

Responsible editor: 
Guest Editors DeepL4KGs 2021

Submission type: 
Full Paper

With the rapid development of neural networks, increasing attention has focused on network embedding for complex network data, which aims to learn low-dimensional embeddings of nodes in a network and to effectively apply the learned representations to various graph-based analytical tasks. Two typical model families are shallow random-walk network representation methods and deep learning models such as graph convolutional networks (GCNs). The former capture the linear structure of the network using depth-first search (DFS) and breadth-first search (BFS). The hierarchical GCN (HGCN) is an unsupervised graph embedding method that describes the global nonlinear structure of the network by aggregating node information. However, these two existing kinds of models cannot capture the nonlinear and linear structure information of nodes simultaneously. Therefore, the nodal characteristics of nonlinear and linear structures were examined in this study, and an unsupervised representation method based on HGCN that jointly learns shallow and deep models is proposed. Experiments on node classification and dimension-reduction visualization were carried out on citation, language, and traffic networks. The results show that, compared with the existing shallow network representation model and deep network model, the proposed model achieves better performance in terms of micro-F1, macro-F1, and accuracy scores.

Major Revision

Solicited Reviews:
Review #1
By Johannes Lutzeyer submitted on 30/Apr/2021
Major Revision
Review Comment:

Overall Comment:
In the submitted manuscript the authors address the challenge of unsupervised network embedding. They propose the Hierarchical Graph Convolutional Network (HGCN) architecture to perform this task. The HGCN combines embeddings from shallow and deep neural network models and includes two GCNs. Different existing embedding models can be combined in the HGCN, giving rise to the different HGCN variants, HGCN-L and HGCN-DL. The manuscript addresses an interesting problem, and the idea of combining different models, and especially local as well as global information of the network, is a promising solution approach. Unfortunately, I found it very difficult to read the submitted manuscript. In many places the language is imprecise, concepts are left undefined, and the mathematical notation is undefined or abused. I also believe that a more quantitative assessment of the results is required for publication. In my opinion the presentation of the submitted material needs to be fundamentally revised. Therefore, I have to conclude that the submitted work is not publishable without major revisions.

Specific Points:

-In Section 2:
--- In Section 2.1 you discuss PCA and MDS applied to the adjacency matrix as traditional network embedding methods. It might be suitable to add a discussion of the spectral clustering technique, or more generally spectral embeddings for dimension reduction, as these are much more commonly applied to the Laplacian and adjacency matrices than PCA and MDS.
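To make the suggested baseline concrete, a minimal sketch of such a spectral embedding (illustrative only; the function name and the toy graph are my own) could look as follows:

```python
# Minimal sketch: spectral embedding via the symmetric normalized Laplacian
# L = I - D^{-1/2} A D^{-1/2}, keeping the eigenvectors with the smallest
# nonzero eigenvalues as node coordinates.
import numpy as np

def spectral_embedding(A, d=2):
    """Return a d-dimensional spectral embedding of the graph with adjacency A."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, 1:d + 1]             # skip the trivial constant-direction eigenvector

# Toy example: two triangles joined by a single bridge edge.
A = np.zeros((6, 6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
emb = spectral_embedding(A, d=2)
```

The first retained column is the Fiedler direction, which already tends to separate the two triangles in this toy graph.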

--- In Section 2.2, it is unclear to me what you mean by the word expand in "Nevertheless, unsupervised graph embedding is difficult to expand and [...]". There are several other places throughout the manuscript where imprecise language is used.

-In Section 3:
--- It would be nice to discuss the complementarity of your definitions of "Local Structural Information" and "Global Structural Information" or alternative measures. Since your definitions appear to be model dependent it would be good to provide further intuition about these concepts. Furthermore, you mention "linear structure" of graphs in several places. It would be good to more rigorously define what you mean by this.

---The discussion of the equations is not sufficiently precise, e.g., you say "where u_i is the vector of v_i". It would be better if you explicitly stated how this vector u_i was obtained, how its content can be interpreted and that v_i denotes a given node in the network.

---I suggest renaming the second computational step of your model. Computations in GNNs are often referred to as being split into an aggregate and update step [1]. So it might cause confusion to call your second step containing several GNNs the Update layer.

---It is unclear to me how the two GCNs in your architecture interact. This is likely due to confusion about your use of the word labels in the following sentence: "Meanwhile, node labels Y′ are predicted, and the second GCN is used to learn the updated features E_D2 and labels." Y′ is not used anywhere else in the manuscript.

--- You should clearly define which mathematical operation you refer to as splicing since this operation is fundamental to the combination of the different embeddings obtained in your architecture. You should furthermore add a discussion of how this operation alleviates the issue of oversmoothing in GCNs. It also seems too general to me to say: "after the second training iteration, the trained GCN model will have an embedding that is too smooth, [...]". A more nuanced discussion of this problem would be nice.

--- In Section 3.2.2 you write "In the real world, due to the lack of privacy and tagged data, GCN is not suitable for graphs with only structural information." Firstly, in my understanding a lack of privacy would result in a greater number of labelled datasets. Therefore, these two concepts don't seem to support a single point to me. Furthermore, GCNs can successfully be used for datasets only containing structural information (see for example [2]). So, your conclusion in the quoted sentence seems to be stated too generally.

- In Section 4:

---Your benchmark model DL should be presented in greater detail. The current introduction "We directly join the local linear embedding and global linear embedding to get the model, named DL." seems to be insufficient to me.

---You show the hyperparameters and details of the training of your baselines. It would be nice if you could provide the same information for your proposed models.

---In Tables 3 and 4 and in Figure 4 you should add estimated confidence intervals. Some of your conclusions are drawn based on small absolute differences and estimated confidence intervals could help reinforce your points.

---In the discussion of Figures 5, 6 and 7 you discuss the separatedness of classes within certain regions in the embeddings. It is unclear to me how these regions are chosen. It would furthermore be nice if you could quantitatively assess the separatedness of the classes within these boxes by giving the number of samples belonging to the different classes contained in them. The current qualitative discussion does not always convince me, especially for Figure 7.

[1] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals & G. E. Dahl, "Neural message passing for quantum chemistry," In: Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 1263--1272, 2017.
[2] K. Xu, W. Hu, J. Leskovec & S. Jegelka, "How powerful are graph neural networks?," In: International Conference on Learning Representations (ICLR), 2019.

In addition, I now list several minor issues in the presentation:

- the words "Linear" and "Nonlinear" are misspelled in Figures 1 and 2.
- the reference numbering in the text of Section 2 does not correspond to the numbering in the reference list.
- In Section 2.1 lower-case m and upper-case M are used to refer to the same quantity.
- I wonder whether the last paragraph in Section 2.3 discussing GNNs could be moved to Section 2.4 titled "Graph Neural Networks."
- Page 4 Line 9 a square bracket is opened and never closed.
- Section 3 Line 1 graphs are defined as G=(V,E), in Section 3.2.1 as G=(V,A), where A seems to denote the adjacency matrix, and in Equation (6) they are defined as G=(V,E, S_{S-G}(W_v_i),Y). This is a clash of notation. Also in the definition G=(V,E) the accompanying text can be misunderstood to mean that V denotes a single node and E a single edge.
- Section 3 Paragraph 1, it might be nice to give the domain of E_s, E_D and E_{All} to give the reader a better understanding of these quantities.
- Section 3.1 Paragraph 1 you discuss the eigenvectors of the Laplacian as allowing the observation of network characteristics in the absence of structural information. However, the eigenvectors are obtained from the Laplacian defining the structural information. So it seems misleading to me to say "in the absence of structural information."
- Following Equation (1) you write "p(.,.) is a nxn vector". I think you mean to say matrix and I could not find p(.,.) in another place in the paper.
- Section 3.2.1 following Equation (4) it seems more appropriate to me to say that P is a normalised adjacency matrix and "th" should be placed in the superscript of l and (l-1).
- Section 3.2.1 trainable weight matrices in the GCN are denoted by lower-case w and uppercase W. You should pick one notation.
- In Equation (4) you show the computations made in a convolutional layer of a GCN not a graph convolution. The graph convolution operation does not involve an activation function or trainable weights.
- In Equations (6) and (7) the notation of w is inconsistent being both subscripted by v_i and taking v_i as a functional input, respectively.
- The table and figure captions are uninformative. More detail might be better.
- Table 2 should be redesigned. It is very difficult to extract information from it in its current state.
- Figures 6 and 7 are distributed over several pages. This makes it difficult to read them.
- In the discussion of Figure 7, it is unclear to me what you refer to by the "middle edge" of the figure and "blurred" classes.
- In Figures 5, 6 and 7 you should order your legends for clarity of exposition. Furthermore, it might be better to choose a smaller marker size, as many samples cover each other, which makes it difficult to feel comfortable with your visual assessment of these figures.
- In several places throughout the manuscript you use abbreviations without ever spelling them out such as AP clustering.
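On the point above about Equation (4), the distinction I have in mind (written in my own notation, following the standard GCN formulation) is between the parameter-free graph convolution and the full convolutional layer:

```latex
% Graph convolution (propagation) alone: no trainable weights, no activation
\tilde{H} = P\,H^{(l-1)}, \qquad P = \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}

% Full GCN layer: propagation followed by a trainable linear map and a nonlinearity
H^{(l)} = \sigma\!\left(P\,H^{(l-1)}\,W^{(l)}\right)
```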

Review #2
Anonymous submitted on 25/May/2021
Major Revision
Review Comment:

The paper presents a new approach for graph embedding that combines linear and nonlinear network structures and intends to overcome shortcomings of existing shallow and deep learning models.
The approach is based on Graph Convolution Networks (GCNs), a deep learning model for network representation, and more specifically Hierarchical GCNs (HGCN). Two methods are proposed: HGCN-L which extends HGCN with local linear structure embedding from shallow models, and HGCN-DL extending HGCN with global and local linear structure embedding from shallow models.

The local structural information is captured by the first-order proximity, i.e., the similarity given by edge information in the network. The LINE method is used to describe the local shallow linear structure.
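For reference, first-order proximity in LINE (as I understand the cited method; here u_i denotes the embedding of node v_i and w_ij an edge weight) is modelled and optimised as:

```latex
p_1(v_i, v_j) = \frac{1}{1 + \exp\!\left(-\,\mathbf{u}_i^{\top}\mathbf{u}_j\right)},
\qquad
O_1 = -\sum_{(i,j) \in E} w_{ij}\,\log p_1(v_i, v_j)
```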

The global structural information is captured by random walks on the undirected graph, using the skip-gram language model to learn node embeddings. The DeepWalk algorithm is used to describe the global shallow linear structure.
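A minimal sketch of the walk-generation step of such a DeepWalk-style method (illustrative only; the toy graph, function name, and parameters are my own, and the subsequent skip-gram training, typically done with a Word2Vec implementation, is omitted):

```python
# Sketch: uniform random walks over a graph, whose node sequences would then
# be fed to skip-gram as "sentences" (skip-gram step omitted here).
import random

def random_walks(adj, num_walks=10, walk_length=5, seed=0):
    """adj: dict mapping node -> list of neighbours (undirected graph)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):          # several passes over all start nodes
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:            # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj, num_walks=2, walk_length=4)
```

Every walk is a sequence of adjacent nodes, so co-occurrence within a skip-gram window reflects graph proximity.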

The approach is evaluated with four different datasets on two downstream tasks: clustering and dimension-reduction visualization. The experiments on the clustering task show, in most cases, better performance of the proposed model in terms of micro-F1, macro-F1, and accuracy. Experiments on 2-dimensional projections show enhanced capabilities to extract distinguishing features of labeled nodes.

Overall, the idea presented in the paper is interesting. However, the writing quality should be revised. The reader can more or less follow the flow of ideas through the paper, despite a structuring of paragraphs and sentences that can sometimes be confusing. The originality and significance of the results are moderate.

Section 1: Introduction
The flowchart in Figure 1 shows a fine-tuning step at the end that is not presented in the text. What does this step consist of?
Typo: in figure 1, ‘a’ is missing in Nonliner and Liner.

Section 2: Related work
The first sentence in Section 2.2 is not clear and should be re-written.
Sentence “The typical methods.. specific professional fields and manual..”: give some examples of extracted fields and how this is done.
Typo: “The other method primarily..” -> The other category of methods primarily examine..
Based on which parameters is the optimization done in “solving an optimization problem, which enables..”?
In “..can ensure more comprehensive features.. closely related to the prediction network accuracy.”, how is this concluded (comprehensive features, closely related, etc.)? I guess this is described in citation 10, but more concrete details should be given here.
Give metrics or references for the claim “unsupervised graph embedding is difficult to expand and has high training complexity.”
In 2.3, the notions of first-order and second-order similarity are used many times but never introduced.
In 2.4, typos: “it can make GCN..” -> “..more efficient by providing generalized inference”; “[while both are..” -> “while both are..”
What does “the correct extra feature” refer to in “M3S uses the correct extra feature..”?

Section 3: Model
In 3.1, the section starts with a state-of-the-art description mixed with the rest, making it difficult to distinguish the proper limits of the proposed approach.
Typo: in 3.1 “ linear representation LINE..” -> linear representation. LINE..
Typo: in paragraph (1) “..pair of undirected (i,j)..” -> should be pair of undirected edges ?
In the line following formula (1), it is not clear what p(.,.) refers to.
Typo: in paragraph (2) “..Global network embedding..” -> global
In 3.2.1, the formulas are not clearly defined and symbols are not used rigorously. For example, what exactly does A in G=(V,A) refer to? The adjacency matrix? It seems to be the set of edges, which was earlier referred to as E.
In 3.2.3 the whole section (sentences structure) should be revised.
The same remark for figure 2, add ‘a’ to Nonliner and Liner.

Section 4: Experiments
In 4.2, the choice of the cited models as baselines is not justified: what are their differences with the proposed approach, and what does the comparison with each of these models bring in terms of evaluation of the approach?
In Table 2, why this choice of model parameters? We know that the performance of such models depends in major part on this choice. Were the parameters determined experimentally? A random choice can affect the obtained results and distort the evaluation.
Typo: in 4.4, “ is 0,6% higher than HGCN in micro-f1..” -> micro-F1
In 4.5: English to be revised.
In 4.6: typos in the first paragraph -> “For a more intuitive comparison model, three different kinds of datasets.. language networks, are visualized..”
In 4.6.1: what do the categories correspond to in the Citeseer network?
In 4.6.2: typo “Figure 6 (d) shows that HGCN-DL..” -> Figure 6 (d) shows that for HGCN-DL

Review #3
Anonymous submitted on 27/Jun/2021
Major Revision
Review Comment:

This paper presents a method to address representation learning by creating unsupervised graph embeddings using a method called Hierarchical Graph Convolutional Networks (HGCN).

The idea seems interesting and, to my knowledge, is novel enough. On the downside, the paper has many problems regarding clarity and, in particular, the evaluation setup. I think that for these reasons it must undergo a major review process.

Here there are some more detailed comments:

- Fig.1 and 2: the words "liner" -> "linear" and "nonliner" -> "nonlinear"
- p.4 "[while both are derived" unnecessary square bracket?
- Sec.4: you should use italics any time you use a variable in text (q, p, d...)

Node classification experiments: please clarify what "node classification" means in this context and how it is performed. Is it that some nodes are removed from the graph and then reassigned to the correct "parent" node? How many of them?
Without this information it is almost impossible to understand the contribution from the experiments.
Moreover, if there is the need of a "parent" node, it seems that the method could only be used for taxonomies or taxonomy-like networks in which each node has a parent.
- Table 4: some scores are higher than those highlighted (e.g. Wikipedia and DeepWalk). Also, what do the asterisks mean? You should specify it in the caption to improve the understanding of the table.
I also suggest having the paper reviewed by a native speaker. I am not one either, but some constructions seem odd.