Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.
This paper focuses on the problem of crowd-sourcing a few of the sub-tasks in ontology engineering. Because little is known about the effectiveness of crowd-sourcing for ontology engineering, this is a welcome and novel piece of research that contributes to the literature on methods for ontology construction.
The authors have clearly articulated three research questions: Which ontology engineering tasks can be crowd-sourced? How should they be implemented? Can they scale? The authors present a good review of the prior work and suggest three broad areas: building vocabularies, aligning vocabularies, and annotating data. Their own work falls in the category of building vocabularies.
They identify four major tasks that concern building vocabularies: verification of term relatedness, verification of relation correctness, verification of relation type, and verification of domain relevance. All four of these are micro-tasks. Verification of relation correctness and relation type require a higher degree of knowledge engineering expertise than the other two tasks.
The authors have implemented a Protege plugin to crowd-source each of these four tasks. Most of the features in the tool are straightforward and obvious. The feature to perform recursive tasks takes advantage of the taxonomic structure of the ontology to efficiently set up tasks to be crowd-sourced.
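To make my reading of this feature concrete, here is a minimal sketch of how recursive task set-up over a taxonomy might work; the data, function names, and pruning rule are my own illustration, not the authors' implementation. The efficiency comes from pruning: once the crowd judges a class irrelevant, no micro-tasks are generated for its descendants.

```python
# Hypothetical illustration (not the authors' code): recursively generating
# domain-relevance micro-tasks over a class taxonomy, pruning any subtree
# whose root class is judged irrelevant by the crowd.

taxonomy = {
    "Thing": ["FinancialInstrument", "Animal"],
    "FinancialInstrument": ["Bond", "Stock"],
    "Animal": ["Dog"],
    "Bond": [], "Stock": [], "Dog": [],
}

def crowd_judgement(term):
    # Stand-in for an actual crowd-sourced verification; here anything
    # under "Animal" is treated as irrelevant to a finance ontology.
    return term not in {"Animal", "Dog"}

def generate_tasks(root, tasks=None):
    """Depth-first task generation that skips irrelevant subtrees."""
    if tasks is None:
        tasks = []
    tasks.append(root)            # one micro-task per visited class
    if crowd_judgement(root):     # relevant: descend into children
        for child in taxonomy[root]:
            generate_tasks(child, tasks)
    return tasks                  # irrelevant: whole subtree is pruned

print(generate_tasks("Thing"))
# ['Thing', 'FinancialInstrument', 'Bond', 'Stock', 'Animal']
```

Note that "Dog" never becomes a task because its parent "Animal" was already judged irrelevant; on a large taxonomy this is where the efficiency gain would come from.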
The evaluation is reasonably well thought out and uses metrics for time, cost, usability, and the quality of the ontology produced. The authors use four small ontologies in the domains of finance, wine, tennis, and climate change, and one large human ontology for their scaling experiments.
The results show that by crowd-sourcing these tasks the authors are able to observe significant reductions in cost and time for the domain relevance, subsumption correctness, and instance-of correctness tasks. Usability was observed at approximately 85%.
For the relation domain specification task, the inter-rater agreement went down. This was expected, as it is a more complex task and the authors did nothing to ensure good accuracy. For the scalability experiments, the authors were able to show similar improvements.
While I found the research reported in the paper to be systematic and thorough, it is still limited to very simple tasks. It is unclear what fraction of the overall ontology development cycle is devoted to each of the tasks studied. For example, if the domain relevance task takes only 1% of the overall ontology development time, then even a 50% improvement on it makes little contribution to reducing the overall cost.
The authors seem to be stuck in the old-fashioned paradigm for crowd-sourcing in which each micro-task is a low-skill task. It is unclear whether ontology development really lends itself to such decomposition, or whether it will always require higher-quality, and perhaps even paid, crowd labor. To see one example of going beyond low-skill micro-tasks, see the recent work on flash teams: http://stanfordhci.github.io/flash-teams/
The authors really need to think beyond the micro-task/low-skill crowd labor model.
Given that ontology engineering will require a higher level of skill, could the authors think of ways in which some of the required expertise could be codified in the tool? Or could the crowd be enlisted to go through some minimal training in exchange for paid work? One possible approach is to build in relation-selection guidelines that provide a framework for users choosing relations. For example, consider the work reported in: http://cogsys.org/pdf/paper-9-3-17.pdf