| Review Comment: 
 This paper proposes a method to predict the relation between two entities given a textual context. Unlike existing methods, the proposed method is end-to-end trainable. The authors also show that their method has advantages when training resources are scarce.  The paper addresses a relevant problem for the Semantic Web, namely extracting knowledge from text, in the realistic setting of low training resources. The authors also show that the proposed method achieves positive results, and the method introduces some original ideas. However, there are many issues with the writing of the paper that make it impossible for me to fully understand the method, or even the problem. I therefore recommend a major revision.  The major issue is that the problem is not clearly explained. Only in the Methodology section did I understand some aspects of the problem. Equation 1 defines an optimization problem that predicts the relation r maximizing the probability P(r | x, h, t), where r is a relation, x is the text, and h and t are two entities that could be related. Since h and t are entities, it is implicit that there is a set of entities E, and that the entities should be referred to in the text x. It is not clear whether some highlighted subtexts of x are linked to the entities. For example, Figure 1 shows some examples of highlighted text but does not specify the existence of entities.  Figure 1 has several issues that make it difficult to understand. A figure like this should distinguish between data and processes. There is an arrow going from what I take to be the input text to the Relation Instance Database. This could be interpreted as meaning that the Relation Instance Database was created from the text by an unspecified process. I guess that arrows represent processes between data, because the retrieval process is denoted with an arrow. However, the relation extractor is denoted not with an arrow but with a box. 
This lack of uniform notation makes the figure very confusing. Also, the Relation Extractor takes as input some text with some subtexts highlighted, and a document that should contain a list of sentences extracted from the Relation Instance Database. I understand that this extraction from the Relation Instance Database is performed by the Retriever (called “retrieval process” in Figure 1). So, the document should be z, and the text with two highlighted subtexts should be the tuple (x, h, t). Thus, the Relation Extractor has the parameters (x, h, t, z), which are the parameters of the function on the right side of equation 4. It required significant effort to understand this figure and reconcile it with what is stated in the text.  I understand that the authors propose replacing the probability expression P(r | x, h, t) with the model TextGenerationModel(r, x, h, t, Retriever(x, h, t, D)) defined in equations 3 and 4. Note that I replaced the notation “|” with “,” because “|” is only used for probabilities.  The output of the Retriever is an additional parameter for the model, which is called a prompt, but it should be made clear that it is a text like x. Thus, the retriever outputs a text z, and not a set of instances as is suggested in lines 32-24, page 5.  In this paper, a retriever returns a text z, but in other works a retriever returns a set of instances. Maybe I do not understand what a retriever is.  The database D is never defined. On page 5, lines 25 and 43, it is said that D contains instances. So, I may guess that D is a set of instances. However, it is never explained what an instance is. Is an instance a synonym for entity, or is an instance a text with two highlighted text spans and a relation connecting them, as depicted in the result of the retrieval process in Figure 1? I cannot understand what D is.  On page 5, line 22, the term “verbalized input” appears without any explanation of what it is. Is there a non-verbalized input? 
If the inputs and outputs of the problem were formally defined, these adjectives would not be necessary.  On page 6, line 36, you write “single discrete instance.” What does this mean? Is a discrete instance different from an instance? What is an instance? Equation 6 defines a set K. It also needs to state that dᵢ is in D (which is said on the previous page, but formulas must be self-contained). Similarly, this formula does not explain what sᵢ is. I can infer that it is a distance because it appears in Algorithm 1, but this is not clear enough.  There are several terms whose meaning is not defined: prompt, soft prompting, virtually selected, instance, and entity. If I am not mistaken, the method samples the instances. How does this sampling affect the accuracy of the relation extraction? |