Capturing the semantics of individual viewpoints on social signals in interpersonal communication
This is a revised submission after an "accept with major revisions". The reviews for the original submission are below.
Solicited review by Andreas Hotho:
Summary:
The main contribution of this work is a study of interpersonal communication signals within a job interview with the goal of better understanding the exchange of such social signals. Semantics is used to model the observed social signals described by a set of different observers. Based on the semantic model the divergent information is aggregated and combined. A framework is proposed and evaluated on a case study with 10 users commenting on 8 job interviews.
Overall the paper is readable, has a clear contribution and contains interesting findings. Unfortunately, the dataset used for the experiments is rather small, the method is not totally clear and the related work can be improved. Therefore I cannot recommend accepting the work.
Significance to Field
I'm not able to judge the significance to the field as I'm not an expert in social signal analysis. For the semantic web and social web parts I think that it is an interesting application. The proposed methodology and the experiments give promising insights into the application field and show what is possible with semantic web technology.
Relevance to Journal
The work is partially relevant to the journal as semantic web technology is used to solve the task of analyzing social signals. Unfortunately, I think that a number of details in the semantic part of the work are missing, which lowers the benefit for the journal audience.
Methodology
It is not totally clear which steps are performed manually and which are intended to be done automatically. The influence of the ontologies used and of the modelling decisions made is not always clear, and the whole experiment is very difficult to repeat due to missing information.
Data Analysis
Unfortunately, this is the weakest part. The paper uses a range of data and performs many analyses, but the dataset is very small and involves, as far as I understood, so many manual steps that the concluding results are very difficult to judge.
Literature Review
The related work is not focussed enough and discusses topics which I think are not at the core of the paper. One main task, the extraction of information from the gathered comments, is missing.
Writing Style/Clarity
The paper is readable but the structure could be improved by providing a better introduction combined with a motivation.
Details:
The introduction is well written and states nicely the main research question including the main contribution. Nevertheless I miss something. Therefore, I suggest improving the structure of the paper. Section 2 could be integrated into the introduction to provide a better introduction and motivation for the analysis of social signals, as the readers of the journal are not experts in interpersonal communication but rather in semantics. Such an integration should reflect the targeted audience.
Sec. 3.1: The description of the generic layer seems reasonable and fairly straightforward. This observation holds for the database schema as well, and I think that the lessons learnt are rather few.
For 3.2: I think the description is reasonable and applies state-of-the-art methods. Please provide references for the applied tools like the Stanford parser and explain why the filtering should be applied. The most critical issue is the word sense detection. As far as I know this is possible but by no means straightforward. Either point to a paper which describes the method you have applied ([8] is only a generic reference to WordNet) or describe the method yourself. Maybe it is also a misunderstanding and no WSD is done. Then I suggest rephrasing the last bullet in 3.2.1.
3.2.2: I do not understand how WordNet is used to exclude words which are not significant for the domain. Please define what is meant by "significant" and provide a more concrete algorithm. Given the description I'm not able to rebuild your approach. The same holds for (ii) in sec. 3.2.2. Please explain the term information extraction in this situation, as I do not understand what kind of information is extracted from what.
Please explain the algorithm in fig. 3 in more detail and provide a motivation for the steps applied in the heuristics. I think it is good to have the pseudo code, but it contains so many parameters which are not substantiated that I suggest adding such an explanation.
The way the knowledge statement extraction works as explained in sec. 3.3 is not totally clear. The description is too high-level and not precise enough. Please explain the way the semantic information is utilized and how the semantic information and the knowledge extraction help to solve the application task.
Some comments on sec. 4. I really like the presented case study and the results are interesting, but I think that the size of the dataset is rather small, which limits the generality of the findings, and the dataset is not available, which does not allow the experiments to be repeated. As the results are strongly influenced by all the additional resources like ontologies and so on, these should be made available as well. It would be helpful if the URLs of the analyzed videos were made available as supplementary material.
Another issue is the way the steps are carried out. As far as I understood, most of the steps are done manually. If so, this should be stated more clearly. Further, I think that it does not make sense to add arbitrary additional steps like the removal of the word "due" in sec. 4.3. Either there are clear rules which can be applied and are generalizable, or the results are difficult to interpret as the influence of every step is not clear. Please add a clear algorithm or a set of rules which allows the intended analysis to be carried out.
Sec. 5: The results are quite nice and the quantitative analysis of the mismatches is very interesting. To be honest I do not understand the conclusions of the findings in sec. 5.3. What I got is that the semantic representation and processing is not precise enough to capture the information from the comments. Is this correct?
One of the questions which came to my mind while reading the work was: Does the cultural background of the observer have any influence on the statements he or she makes?
Another issue of the proposed evaluation is the missing baseline, which is needed to demonstrate the effect of the semantic representation used, the reasoning and the retrieved distributional words. One baseline could be, e.g., a very simple and naive approach utilizing a word list to detect important words; without this the effect of the utilized semantic information remains unclear and not measurable.
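A minimal sketch of what such a word-list baseline might look like is given here for illustration. The lexicon, the function names and the scoring helper are hypothetical assumptions and are not taken from the paper under review; they only show that the baseline needs no ontology, reasoning or distributional expansion.

```python
import re

# Hypothetical lexicon of social-signal terms; illustrative only, not from the paper.
SIGNAL_TERMS = {"nervous", "smile", "eye contact", "fidgeting", "confident"}


def wordlist_baseline(comment, lexicon=SIGNAL_TERMS):
    """Return the lexicon terms occurring in a comment (case-insensitive, whole-word match)."""
    text = comment.lower()
    found = set()
    for term in lexicon:
        # \b anchors prevent partial matches such as "smile" inside "smiles".
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.add(term)
    return found


def precision_recall(predicted, expected):
    """Compare baseline output with a set of expert-annotated terms."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall
```

Scoring this baseline and the semantic pipeline against the same expert annotations would make the contribution of the semantic components measurable.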
I'm not an expert in the area of signal processing, but I guess that work in the area of ontology-based information extraction is another topic which is most closely related to the topics of your work. Regarding the extraction of semantics from tags, I do not see the relationship to your method. Please add a better description here.
Minor:
the following reference seems to be broken: I. Fernández-Tobías, et al., "Ignacio Fernández-Tobías, Iván Cantador, Alejandro Bellogín," in Proceedings of the International Workshop on Semantic Adaptive Social Web, in connection with the 19th International Conference on User Modeling, Adaptation and Personalization, UMAP 2011, Girona, Spain, 2011.
Solicited review by David Vallet:
This paper presents a framework that analyzes user comments in order to extract concepts related to emotions and non-verbal communication (referred to as social signals in the paper). A social signal ontology is also introduced in order to represent non-verbal communication.
Whereas most of the paper is well-written and structured, I would recommend revising the abstract in order to clarify what is the goal of the paper. The first part of the paper (up to "This paper presents") confused me and it was not until I read the second part together with the introduction that I got an idea of what the paper was about. Also the abstract should clarify what is the purpose of using semantic technologies in this domain.
The inclusion of the experimental evaluation is welcome, but I felt that it could easily be improved in order to analyze the benefit of introducing semantics into your proposed framework. There is no baseline (a simple one with no semantic processing would suffice), so there is no way of comparing your approach. Additionally, I would advise comparing your approach with a similar approach without ontological inference or without DISCO. In this way, it would be clearer what the benefit of using additional information to perform the emotion extraction is, and how large it is. I suppose this would require an increase in the number of users in the study.
I would also like to see some examples of real life scenarios in which information from the Social Web can be used in your system. Your experimental setup seems to be quite specific, as it required the users to directly input the opinion related to video snippets about job-interview videos. What about real setups, e.g. YouTube videos pertaining to any category, or Twitter and Facebook feeds -- mentioned in the paper?
Although I could see why this paper was submitted to this special issue, sometimes I felt that it could fit better in a social science publication.
Minor comments
- Sometimes the flow of the text is interrupted by a figure or table (e.g., Table 12). Normally the flow of the left column would continue until the end of the page before changing to the right column. Better yet, figures could be grouped at the top or bottom of the page (similar to what LaTeX does).
- Sec 4.1.1 => "YouTube terms of user?"
- Sec 6 => "Table 15.It" => "Table 15. It"
Solicited review by Tsvi Kuflik:
The paper nicely presents a generic framework for automatically annotating video snippets using textual comments assigned by users. It is done as part of a larger scope, aiming at using these annotations for personalizing simulated training in the future.
Reading the paper, a few questions popped up. The first is about the need for such an application. It is unclear to me why users who comment on video snippets would not use available domain-specific and generic ontologies to annotate them, rather than providing textual comments (and by doing that, simply skip the automatic process). Since I am not an expert in that domain, I am curious to understand that. I believe that this should also be explained in the motivation for the work.
Another question concerns the "domain knowledge" defined in the framework. Even though the paper describes a framework and aims at demonstrating it on job interviews, there is nothing in the example that can be defined as "domain specific". Surely behavior/body language can be interpreted differently in this context than in other contexts, but this results from the concepts and the comments, not from any domain knowledge base. Hence I fail to understand the "domain specific" part of the framework.
Another question is a methodological one – the authors used available ontologies "as-is", so it is no wonder that they are incomplete and relevant concepts are missing. I believe this could have been expected and should not come as a surprise given the recall results.
Finally, I wonder why there was no attempt to involve more experts than one for validating categories and two for evaluating the comments, and why no additional effort was invested in achieving a higher level of agreement between the domain experts (using the Delphi method, for instance).
Since the paper presents a step in an ongoing larger project, I suggest that the authors consider improving the methodological aspects (although this is beyond the scope of this paper).
All in all, the paper nicely presents an interesting and what seems to be novel framework, but the work needs to be better motivated.
The paper itself needs some minor revisions as detailed below. It seems also that it can benefit from language editing.
Detailed minor comments:
The social web content provides an excellent resource to capture personal experiences related to real world human activities
The content of the social Web (?)
The abstract, until "This paper…", is unclear. I simply could not understand it. I suggest rewriting the first part of it.
Introduction:
First paragraph: It is unclear what the following section refers to "This could be an expensive process if the collection and linking of these examples are done manually." (it seems disconnected from the previous section).
"Can digital traces from the social spaces be used to construct a model of the real-world activity and context, and how can this model improve adaptation in simulated environments for learning and enable intelligent content retrieval?"
I understand the first part – using real-world examples for modeling behavior in various situations and using it to improve simulation. How can it contribute to intelligent retrieval?
"this paper presents a novel framework approach…" A framework or approach?
"A generic framework is proposed in this paper…" – repetition
"In ImREAL, learners will be involved in simulated situations and perform activities that resemble actual job activities,.." : "job activities or job interview activities?
"Therefore adaptation can be enhanced, if taking into account that different personal experiences which may lead to diverse interpretations of situations, in this context, for understanding social signals as part of soft skills development." – complicated and unclear sentence
>>> Intro summary:
There is a contradiction between: "this paper presents a novel framework approach to collect and analyse user comments on social media, particularly comments on videos populated in a YouTube-like environment, in order to: Identify key concepts related to specific activity and determine individual viewpoints on the activity."
And the research question: research challenge: "Can digital traces from the social spaces be used to construct a model of the real-world activity and context, and how can this model improve adaptation in simulated environments for learning and enable intelligent content retrieval?"
I suggest to focus on the framework from the very beginning or to make clear that the paper addresses only part of the abstract research challenge.
3
3.2.1
"Sentence detection and splitting to achieve performance;" What do you mean by achieve performance?
"Sentence tokenization to detect words and punctuation;" Tokenization is clear, no need to explain it
Syntactic filtering – usually punctuation marks are removed during tokenization, isn't it the case also here?
Figure 3 is unclear
I suggest explaining the process described by figure 3 since the pseudo code is not self explanatory
3.3 "removal of punctuation" I thought punctuation marks were already removed.
I suggest accompanying the description in section 3 with a concrete example – from the very beginning until the end. This will help improve the readability of the section which is a bit hard to understand (especially section 3.3)
Section 4.1.1
At the end of the second paragraph there is a question mark, why?
"to make comments" -> to comment (?)
The first two sections describe what seems to be a generic interface while the third describes another (?) system developed to annotate YouTube movies (?)
4.1.2 –
Either list all the WordNet categories you used or none. Why did you present only examples? Consider listing all of them and explaining what guided the selection. The same applies to all SUMO concepts (probably in an appendix).
Section 2- the validation of the categories and concepts by a single domain expert may be a bit problematic. I suggest that you consider using the Delphi method with more than one expert for the selection process in the future.
4.1.3
Change the font to superscript when you point to footnote 6 (knowledge base number 2)
How complete is the ontology? How was it validated?
According to figure 1, I expected to find also a domain-specific knowledge base – something that limits the scope of the two general knowledge bases to the specific domain and gives some domain specific knowledge, but there is none.
The same comment keeps popping up – please give all details, maybe in an appendix, not just examples.
From what population were the users selected at random? It looks like they were students with relatively (too) little experience. This is a pity.
Consider making the snippets available to the readers (maybe the one used for the example?).
4.4
While the knowledge elicitation about discomfort-anxiety-negative emotion is clear, how did you get to "frustration" in the body language?
By the way, why do you have discomfiture (French) in the example and not discomfort?
linking to emotion – linking to emotions
5
conducted by experiences social scientists -> …experienced…
It would be better if Table 11 included a form that had already been filled in by an evaluator.
Given the number of "I do not know" answers, the evaluation is quite problematic.
On what basis do you consider the results significant?
If you were following the Delphi approach here, another session of discussion might have helped you reach a higher level of agreement.
7
The work presented in the paper focuses on automatic annotation and enrichment of textual descriptions of videos. As such, related work regarding the use of the social/semantic web for personalization/information retrieval does not seem relevant (about 1/2 page is dedicated to it). However, given the abstract goal of the research (of which the current work is only a part), these aspects can be discussed, but in my opinion in a discussion section, not just as related work.
The same comment is true also for the personalized training section
8
"Towards exploiting digital traces from the social web spaces, to improve adaptation in simulated environments for learning, a framework to semantically augment user comments on an activity has been presented." – This is a long, complicated and unclear sentence
Maybe this is the place to explain how this work forms part of a larger research effort aimed at a personalized simulated training environment – in this case the section can be extended with parts of the related work mentioned earlier – this discussion may provide the necessary context for discussing the ideas and the potential of the work presented here.

