Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.
This paper describes an empirical analysis of user roles in ontology engineering projects on the web. Collaborative knowledge engineering at large scale has been gaining traction in recent years, as shown by thriving projects such as Wikidata, which makes this topic very important and timely.
The breadth of the analysis, which covers a large number of projects from the WebProtégé platform, makes this work useful and relevant in its field. Furthermore, it is well-written, pleasant to read, and clearly structured.
Nonetheless, a few points still need to be addressed; we therefore advise major revisions. We truly believe the work could have substantial impact on the community, and the changes proposed below would strengthen it.
In particular, the paper presents very promising results but, after raising some interesting issues that we would have expected to see addressed, it veers instead into an analysis of the transitions between roles. Furthermore, although the paper is intended to fill the gap in the literature on large-scale studies of collaborative ontology engineering projects, the discussion falls short of explaining how its findings compare to prior work, e.g. whether they contradict or support it. More detailed comments are listed per section below.
We look forward to reading the revisions and responses of the authors.
Section: Related work
p. 3, Detecting user roles paragraph: This paragraph covers three aspects of your approach: the type of features used, the clustering algorithm, and the scale of the study. At the moment, these three aspects are covered together by listing a number of studies investigating user roles in ontology engineering projects, and the rationale behind your choices remains unclear.
In order to clarify this point, we suggest that you: (1) separate the three aspects mentioned above, or at least make them emerge clearly from the text; (2) explain why your choices (especially the clustering algorithm and the features) were suitable for your study; and (3) articulate the added value of carrying out the analysis at a large scale.
Section: Materials and methods
The data collection and preprocessing are adequately described, and the methods adopted are sound and sufficiently grounded in previous research. Some improvements could be made, though.
p.3, 18-26: a screenshot of the WebProtégé interface would give readers who are not familiar with it a visual reference. Furthermore, you could add a graphical representation of the relationships between projects, ontologies, and metadata.
p.3, 42: ‘high-level action’: it would be good to define what this term refers to.
p. 4, Figure 1: this figure could be larger, spanning two columns.
p.4, line 29: ‘we remove all projects with fewer than 250 total log entries’, why 250? How did you determine this threshold?
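A simple sensitivity analysis, reporting how many projects survive under different cut-offs, would help justify the chosen threshold. A minimal sketch of such a check (the `log_counts` array below is synthetic, standing in for the real per-project log-entry counts):

```python
# Hypothetical sensitivity check for the project-size threshold.
# `log_counts` is synthetic illustrative data, NOT the paper's dataset:
# the real values would be the number of log entries per project.
import numpy as np

rng = np.random.default_rng(1)
log_counts = rng.lognormal(mean=5, sigma=2, size=1000).astype(int)

# Report how many projects would be retained at each candidate threshold
for threshold in (50, 100, 250, 500):
    retained = int((log_counts >= threshold).sum())
    print(f"threshold={threshold}: {retained} projects retained")
```

Reporting such a table (and noting whether the resulting clusters are stable across thresholds) would make the choice of 250 far easier to defend.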
p.4, lines 33-36: ‘we define a lower threshold of two edit actions and remove all users that contributed only a single change to their project.’ This is a necessary step given the approach you follow, and casual users are often left out of similar studies. Nonetheless, some authors [1] include editors with a small number of contributions in order to take ‘marginal profiles’, such as occasional users and vandals, into account. It would be interesting to quantify these marginal users (relative to the total number of users) and provide details about the actions they perform.
Section: Results
p. 7, Figure 2: we understand the need to plot several variables, but 3D charts are hard to read. We suggest exploring a different approach to visualising the differences between plots.
p.7, lines 28 onwards: please add further details about how you defined the edit actions for each principal component, so that others can reproduce what you have done. Moreover, the clusters would be more rigorously defined by applying a significance test to the edit actions of their members, e.g. Tukey’s test.
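To illustrate the kind of test we have in mind, here is a minimal sketch using Tukey's HSD as implemented in SciPy. The cluster labels and edit counts are entirely synthetic placeholders, not the paper's data:

```python
# Hypothetical sketch: pairwise comparison of per-cluster edit-action counts
# with Tukey's HSD. The three samples below are simulated stand-ins for the
# edit counts of users in three discovered role clusters.
import numpy as np
from scipy.stats import tukey_hsd

rng = np.random.default_rng(0)
cluster_a = rng.poisson(lam=5, size=40)    # e.g. a "casual editor" cluster
cluster_b = rng.poisson(lam=20, size=40)   # e.g. a "content editor" cluster
cluster_c = rng.poisson(lam=21, size=40)   # e.g. a similar, heavier cluster

result = tukey_hsd(cluster_a, cluster_b, cluster_c)
# result.pvalue[i, j] holds the p-value for the pairwise comparison of
# samples i and j; small values indicate the clusters genuinely differ
# on this edit action.
print(result.pvalue)
```

Showing that the clusters differ significantly on the edit actions used to characterise them would make the role definitions considerably more convincing.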
p.9, lines 29-31: ‘Our results indicate that in 62% of all projects, there exists a single editor role which all users of the project assume.’ This is a very interesting insight, and we would have expected some further analysis of its implications. One way to expand the analysis would be to look at what being a ‘single-role’ project implies for structural features of the resulting ontology, such as average or maximum depth or inheritance richness.
p.9, Figure 5: The figure is hard to read, partly because of the overlapping cluster labels. One solution would be to make it larger, spanning two columns, and to place the labels externally.
Section: Discussion and conclusion
Several works already exist around user roles in collaborative ontology development projects, some of which you cite in your work. You motivate your study with the lack of comprehensive insights about how the community at large works on ontology engineering projects in the wild. How do your insights connect to (and differ from) previous findings in the field? Without this framing in previous literature it is difficult for the reader to fully appreciate the contribution of your work.
The limitations section could be expanded as well: e.g. what does using k-means imply, compared to other clustering algorithms?
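One concrete way to address this would be a robustness check, re-clustering the same feature matrix with an alternative algorithm and measuring agreement. A minimal sketch (the feature matrix is synthetic; the real check would use the paper's user-feature matrix):

```python
# Hypothetical robustness check: k-means assumes roughly spherical,
# similarly sized clusters. Re-clustering with agglomerative clustering
# and comparing partitions via the adjusted Rand index indicates how
# sensitive the role clusters are to the algorithm choice.
# X is synthetic data, NOT the paper's feature matrix.
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
ag_labels = AgglomerativeClustering(n_clusters=4).fit_predict(X)

# Agreement close to 1.0 suggests the clusters are robust to the choice
# of algorithm; low agreement would flag the k-means assumptions.
print(round(adjusted_rand_score(km_labels, ag_labels), 2))
```

Even a brief report of such a comparison in the limitations section would reassure readers that the identified roles are not an artefact of the k-means assumptions.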
[1] Arazy, O., Daxenberger, J., Lifshitz-Assaf, H., Nov, O., & Gurevych, I. (2016). Turbulent stability of emergent roles: The dualistic nature of self-organizing knowledge coproduction. Information Systems Research, 27(4), 792-812.