Review Comment:
This paper compares Transformer-Based Architectures and Large Language Models regarding the Semantic Event Extraction task. To this end, the authors develop two methods, T-SEE and L-SEE, which represent the Transformer-Based architectures and the Large Language Models, respectively. The design of these two models assumes the separation of the Semantic Event Extraction task into two tasks: the first classifies events and the second extracts relations. Additionally, the authors include two baselines: Text2Event and EventGraph.
Regarding reproducibility, the authors provide access to the code and to a dataset archived for long-term preservation on Zenodo.
In general, I found this an interesting paper because comparing different approaches can provide new insights into the applicability of these methods. However, I have some questions that lead me to recommend a major revision:
Q1. Why do you define two datasets if you consider three sources? On page 1, line 32, you state that the sources are Wikipedia, Wikidata and DBpedia.
Q2. On page 3, line 19, you state that a pair is a relation. However, this does not follow the general notion of a relation (here, a binary relation) from mathematics and database theory. The pair contains the information to state a relationship, labeled country, from the event to Poland. A relation is usually a set of relationships (or a set of tuples). You do the same on page 4, line 29, where you say that relations and edges are the same, instead of stating that a relation is defined by a set of edges.
Q3. I would not call what is stated in Definition 1 an event ontology. According to Definition 1, an event ontology is just a pair of sets, so the pairs ({1}, {2,3}) and ({1,2}, {2,3}) would qualify as event ontologies. It seems that what you intend to define here is the vocabulary of the ontology.
Q4. On page 4, line 24, you should write ⊆ instead of = because you don't want to define R as the set of all possible relationships.
Q5. Suggestion: I do not recommend using $p_{type}$ to denote the predicate that defines the types of elements, because subscripts are less readable and readers may think that p is another predicate. It would be simpler to write (e, type, C). Furthermore, your current definition allows writing (e, type, d) where d is not in C. To fix the definition, you can say that triples with property type only allow elements of C in the third component. Alternatively, you can avoid introducing such a restriction by encoding the types of entities as another relation T ⊆ E × C. Do you want to define types only for events or for any element in E?
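To make the second option in Q5 concrete, the typing could be stated along the following lines. This is only a sketch of the notation I have in mind: the symbol T is introduced here and does not appear in the paper, and E and C are assumed to be the sets of elements and classes from your definitions.

```latex
% Typing as a separate relation (T is a name introduced in this review).
T \subseteq E \times C, \qquad (e, c) \in T \iff \text{$e$ has type $c$}
```

With this formulation, no triple of the form (e, type, d) with d outside C can be written, so no extra restriction on the triples is needed.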
Q6. Definition 3 is not clear about what is expected of the extracted relations. I assume the goal is to extract triples of the form (e_t, p, o) with p in P and o in E ∪ L. However, this is not explicit, and as written one may also extract triples whose subject is not an event.
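For Q6, one possible way to make the expected output explicit is shown below. This is my reading of the intended definition, not a statement of what the paper defines; in particular, the condition that the subject e_t belongs to the set V of extracted events is my assumption.

```latex
% Expected shape of the extracted triples (reviewer's assumption).
\{\, (e_t, p, o) \mid e_t \in V,\ p \in P,\ o \in E \cup L \,\}
```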
Q7. The problem statement should indicate what assumptions are made, and what data is provided to the models to learn the task.
Q8. In Figures 1 and 2, you refer to queries as pairs. You should make explicit what these queries mean, because the word query has a specific meaning in graph database systems, and I do not understand what your queries do.
Q9. You have an extra parenthesis in Algorithm 1, line 9.
Q10. In Algorithm 1, line 9, I do not understand why the method ECM.classifyEvents(t, O) returns a set of events. The name suggests it should return a set of pairs (e, C) or a set of triples (e, type, C) (see my question Q5), where e is an extracted event and C is the class of the extracted event. However, in line 13, you introduce the notation e.c, which implies that the class is an attribute of the events generated in line 9. These notations are confusing because in Definition 2 you used V as a set of elements, which is a subset of E, whereas in Algorithm 1 the elements of V are objects that have attributes like c. That is, c is not a class, but the name of the attribute of the object e that contains the class of e. You should stick with the notation that was already introduced.
Note that this is especially confusing on page 8, line 24, where you write e_t.c = c. The symbol c is used with two different meanings in this identity.
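To illustrate the notation suggested in Q10, here is a minimal sketch in which classification returns a set of (event, class) pairs and events remain plain elements. All names are hypothetical and are not taken from the paper's code; the toy input already pairs each mention with a candidate class, which only stands in for whatever the actual model predicts.

```python
# Hypothetical sketch; names are illustrative and not taken from the paper's code.

def classify_events(mentions, classes):
    """Stand-in for ECM.classifyEvents(t, O): return the typing relation T
    as a set of (event, class) pairs instead of objects with an attribute c."""
    return {(e, c) for e, c in mentions if c in classes}


def classes_of(event, typing):
    """Look up the classes of an event in T, replacing the e.c notation."""
    return {c for e, c in typing if e == event}


classes = {"Battle", "Election"}  # C, the classes of the ontology (toy values)
mentions = [("e1", "Battle"), ("e2", "Election"), ("e3", "Concert")]

T = classify_events(mentions, classes)  # T is a subset of E x C
V = {e for e, _ in T}                   # events stay plain elements of E
```

With this representation, the identity e_t.c = c on page 8 would become the membership statement (e_t, c) ∈ T, and the symbol c would have a single meaning.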
Q11. On page 7, line 4, you write, "[...] additional constraints can be applied to remove queries from Q." Can be applied or are applied?
Q12. On page 8, line 1, you write, "traditional multilabel classification approaches [...]." The word traditional is subjective. Instead, you should cite the methods you want to refer to. You also use the adjective "traditional" in a vague way on page 9, line 37.
Q13. On page 8, line 25, in the definition of the set of queries, you should add space around the symbol | denoting "such that." You can use the LaTeX command \mid.
Q14. I think that Figure 5 does not clearly indicate where the language model is accessed. For example, the event classification shows two arrows that end in the classified elements. Does this mean that the arrows represent the processing of the prompts by the LLM, and the separated results are combined into the classified events, or that the prompts are combined, and then the LLM returns the classified events? Are all the arrows calls to the LLM? What about the last arrow? It appears that there is no prompt associated with that arrow.
Q15. Sections 4.1 and 4.2 are very short. I would appreciate a more detailed description of the prompts used.
Q16. What LLM did you use in your evaluation?