Human Computation and Crowdsourcing meet the Semantic Web: A Survey

Tracking #: 1066-2277

Amna Basharat
I. Budak Arpinar
Khaled Rasheed

Responsible editor: 
Guest Editors Human Computation and Crowdsourcing

Submission type: 
Survey Article

Abstract:
Challenges associated with large-scale adoption of semantic web technologies continue to confront the researchers in the field. Researchers have recognized the need for human intelligence in the process of semantic content creation and analytics, which forms the backbone of any semantic application. Realizing the potential that human computation, collective intelligence and the fields of the like such as crowdsourcing and social computation have offered, semantic web researchers have effectively taken up the synergy to solve the bottlenecks of human experts and the needed human contribution in the semantic web development processes. In this paper, we present a comprehensive survey of the intersection of semantic web and the human computation paradigm. We adopt a two fold approach towards understanding this intersection. As the primary focus, we analyze how the semantic web domain has adopted the dimensions of human computation to solve the inherent problems. We present an in-depth analysis of the need for human computation in semantic web tasks such as ontology engineering and linked data management. We provide a 'collective intelligence genome' adapted for the semantic web as means to analyze the threads of composing semantic web applications using human computation methods. As a secondary contribution we also analyze existing research efforts through which the human computation domain has been better served with the use of semantic technologies. We present a comprehensive view of the promises and challenges offered by the successful synergy of semantic web and human computation. In conclusion, we discuss several key outstanding challenges and propose some open research directions.

Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 17/Jul/2015
Major Revision
Review Comment:

The paper presents a survey of solutions that originate from the cross-fertilization of Human Computation, Crowdsourcing, and the Semantic Web.

= general comment =

The coverage of the state of the art is adequate for a survey. The authors collected the large majority of the articles published in the last 5-7 years in the Semantic Web area that use Human Computation and Crowdsourcing techniques. However, I’m not satisfied by the result. Only readers who want to check out a large number of short summaries of existing works will find some value in this survey. For a reader who is aware of the state of the art, this paper has almost no value because it lacks insight. I recommend that the authors take the opportunity to tell the readers what *they* have learnt by pulling together all this material. For instance, in Section 3, the authors made a tremendous effort in analysing the state of the art, but where is the synthesis? What do they know now that they did not know before? Can they tell the reader something that was not stated before by those who analysed just a fraction of the papers? The same applies to Sections 4 and 5: the work is massive but lightweight. The authors only express their criticism; they do not provide insights that can lead the readers to broaden their understanding of the domain and perform further research. The same holds for Section 7, where as a reader I was expecting to see the authors’ point of view, but actually found a list of discussions and analyses by others, which the authors summarized.

= specific dimension =

== (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. ==

The authors make some inexact statements about the semantic web (see detailed comments), which may be misleading. I recommend changing them all.

The introduction to human computation and crowdsourcing is of average quality. I recommend increasing the level of formality. This is possible at least in the GWAP setting, where concepts like input agreement, output agreement, and the inversion problem are well formalised. The introduction to the Semantic Web is probably not so important given the target audience; therefore, even though I found it too lightweight, it is acceptable.
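For readers unfamiliar with the GWAP mechanisms the reviewer names: in an output-agreement game (e.g. the ESP Game), two players label the same input independently and score only on the labels they have in common. A minimal Python sketch of that scoring rule follows; the function name and the example labels are illustrative assumptions, not taken from any surveyed system:

```python
def output_agreement(labels_a, labels_b):
    """Score one round of an output-agreement game: two players label the
    same input independently and are rewarded only for matching labels."""
    agreed = set(labels_a) & set(labels_b)
    return sorted(agreed), len(agreed)

# Both players tagged the same image; only the shared labels score.
labels, score = output_agreement(["cat", "animal", "pet"], ["dog", "animal", "cat"])
print(labels, score)  # ['animal', 'cat'] 2
```

Input-agreement games invert this: players exchange outputs and must decide whether they were shown the same input, which is why these settings admit the crisp formalisation the reviewer asks for.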

The insights of the authors are really missing, together with a clear path for progressing in the field that only they (after having reviewed all this material) can provide. Details follow below.

== (2) How comprehensive and how balanced is the presentation and coverage. ==

The study is comprehensive. There are only a few works that I know of that are not listed in the authors’ list.

== (3) Readability and clarity of the presentation. ==

The paper is readable and its presentation is linear.

== (4) Importance of the covered material to the broader Semantic Web community. ==

The topic deserves a survey at this stage.

= Detailed comments =
- page 1, left column: I would prefer to read a short introduction to Sections 1.1 and 1.2 before Section 1.1 starts
- page 1, left column: I disagree with the opening sentence of the paper: “after more than a decade of semantic web research, researchers remain challenged by the large scale adoption of the semantic technologies”. With embedded web data (RDFa, microdata, microformats) being adopted by most of the top domains of the Web, the Semantic Web can fairly claim to have reached large-scale adoption.
- page 1, right column: there is a full stop at the start of a new line.
- page 2, left column: I also disagree with the sentence “[semantic web vision], which was largely built on the vision of computers populating the web of machines”. The definition of the semantic web, given by its inventors in their seminal paper in Scientific American Magazine [1], is: “The Semantic Web is not a separate Web, but an extension of the current one, in which information is given well-defined meaning, ***better enabling computers and people to work in cooperation***”.
- page 2, left column: the paragraph containing the sentence “In this paper, we not only take a detailed look at […]” should belong to section 1.2 where the authors present their contribution and not to 1.1 where they present the context.
- page 3, left column: before starting section 1.2 I would recommend adding a section describing the structure of the paper
- page 3, left column: the authors in section 2 claim that they want to describe the theoretical foundations of human computation, crowdsourcing and the Semantic Web. I find the term “theoretical foundations” a bit strong for a section that stays at an abstract level but lacks formality. I recommend not using this term.
- page 4, left column: Tim Berners-Lee is not the sole “inventor” of the Semantic Web; J. Hendler and O. Lassila were co-authors of the famous Scientific American paper [1].
- page 4, right column: I disagree that ontologies form the backbone of the semantic web. Ontologies (i.e., the formal and expressive ones used in knowledge-intensive domains) play only a small role in the semantic web. The RDF data model, the SPARQL query language, the Linked Data best practices and a few lightweight shared vocabularies are the backbone of the Semantic Web.
- page 4, right column: the crisp distinction between T-Box and A-Box is typical of the knowledge representation community; the semantic web community has tried to soften the distinction. In RDFS and SKOS, which are both products of Semantic Web research, such a distinction does not exist.
- page 5, right column: [35] : First, to -> [35]. First, to
- page 5, right column: I disagree with the way section 2.3.1 is presented. Of course humans are needed, but if I were the author, I would have argued about the lack of precision in systems that try to automate the ontology engineering processes. In the way they wrote this section, it appears that no effort has been made to automate any of those processes. This is false. Human computation is useful when automated solutions exist but lack precision. In this way, computers handle the volume and humans improve the quality.
- section 2.3.1 lacks references. I recommend that the authors search for reviews, surveys and key papers on Ontology Development, Semantic Annotation, Ontology Learning, and Ontology Evaluation.
- pages 6-8: section 2.3.2 has the same problem as section 2.3.1. Reading the section, one may get the idea that little effort exists to automate linked data annotation, production, quality assessment and query processing. This is not the case: the state of the art is very rich. The automated systems may benefit from human computation and crowdsourcing in the same way image processing benefited from the ESP game.
- page 7, end of the right column: I do not understand why the authors believe it is natural that query processing requires humans. If there is one task on the semantic web that can be fully automated, it is query processing.
- page 8, left column: I do not see any irony in the humans’ lack of interest in creating semantic content. So why do the authors use the term “ironically”? Moreover, bringing back a comment I made earlier in this list, how can the authors claim this in the age of RDFa, microdata, etc.? Publishers care about semantic content, and for a couple of years we have been witnessing a tremendous effort in annotating web pages with some form of semantic content. The authors could have observed this to be the most effective crowdsourcing effort for the semantic web, ever.
- page 9, left column: problem(task) -> problem (task)
- page 10, left column: the sentence “a generic approach to crowdsourcing approaches” can be reworded to avoid using the term “approach” twice.
- page 10, right column: the sentence “The classification is broadly classified” can be reworded to avoid using “classify” twice.
- page 14, table 2: what is an mGWAP?
- page 15, right column: investments(understanding -> investments (understanding
- page 16, right column: the sentence “that completion of a complete goal” can be reworded to avoid using “complete” twice.
- page 16, right column: genome, however We have included -> genome, but we included
- page 17, left column: I’m not sure how I should read the sentence “Traditionally in the semantic web research much of semantic content creation tasks have been performed by experts.” This sentence could have been true in 2004, but since the rise of DBpedia, a large part of Semantic Web data has been crowdsourced without experts.
- page 19, left column:
- The authors say, “This leaves room for more studies to experiment and dwell further with more genes as presented in the genome.” Why should this be true? Did other communities show that it makes sense to do so? Where was the usage of the other genes successful? Why do the authors believe that bringing those experiences into the semantic web domain can also lead to success?
- The sentence “One limitation that is felt […] of semantic content creation” is too long. I recommend breaking it down.
- page 19, left column, end of the page: if the authors found something interesting enough to make them propose a change to the “genome”, why haven’t they done it yet? They made some changes; why not this one?
- page 24, right column: section 5.3 argues that the Semantic Web community missed an opportunity and should take up the challenge, but it is unclear whether any other community caught the opportunity and was successful. If that is the case, references are important, and some insight about how to port those experiences to the Semantic Web would really interest the reader.
- pages 25-27, Section 7
- the first paragraph is too long as an introduction to the rest of the section
- many subsections are too short. A bullet list would have better served the authors’ needs
- every subsection should end with the authors answering the following questions: is the Semantic Web community in line with others in terms of findings? If not, does it make sense to try looking for similar findings? If yes, what path do the authors recommend to those who are willing to do this research?
- the references contain many errors:
- there are a large number of strange characters scattered throughout
- references 5, 12, 14, 15, 33, 58, 60, 62, 63, 72, 73, 76, 82, 85, 88, 91, 94, 114
- in reference 17 the last author is mentioned twice
- reference 46 is missing a link to the online publication
- in reference 49, the venue is not formatted according to the template.
- in reference 69, part Ii --> part II
- references 122 and 123 appear identical


Review #2
By Marcin Wylot submitted on 22/Jul/2015
Major Revision
Review Comment:

The article presents an overview of approaches from the intersection of two fields: the Semantic Web and Human Computation. It analyses the approaches along two questions: 1) how HC influences the SW, and 2) how the SW influences HC. The problem itself is very interesting and emerging in both fields. The survey has the potential to be a good introduction to the intersection of those areas, and it is very relevant to the Special Issue.

The authors provide good motivation and background information in Sections 1 and 2. The list of approaches leveraging HC for the SW is comprehensive, and they are classified along multiple dimensions (task, genre, genome, etc.) in Sections 3 and 4. Next, the authors compare five of the systems in more detail in Section 5. Section 6 aims to present the Semantic Web approaches facilitating Human Computation; this is a bit vague and should be extended. In Section 7, the authors attempt to highlight the prospective challenges and research questions.

While I believe the article can provide a very good introductory text for new researchers at the intersection of the two fields, there is still a lot of work to be done (details below) for this paper to be ready for publication. Overall, the article is well structured at the top level, with some pitfalls at the lower levels (some sections are very short; details below). It is also very hard to digest: the sentences are quite long, and it is hard to follow the idea, so readability and clarity should be improved. There are many missing references, as well as statements and assumptions without any support such as a reference or an example (details below). In addition, there are assumptions made about the background of the reader that need to be addressed; the reader is expected to know the surveyed approaches (details below). Moreover, some editorial work is also needed, as there are typos, capitalization issues in names, and naming inconsistencies (details below). The list of references also has a lot of incorrect characters. From my point of view, a major revision is required.

To improve the article, the authors can follow the detailed comments mentioned below, as well as work on similar issues throughout the article; due to the repetitiveness of many pitfalls, I did not list all of them here. The authors could also try to make the style lighter, so that an early PhD student does not get discouraged reading the survey.

Strong points:
S1) Has big potential to become a good introductory text for getting started on the covered topic.
S2) The survey appears to be quite comprehensive and suitable for the special issue.
S3) The approaches are classified along multiple dimensions.

Weak points:
W1) Very hard to digest, convoluted sentences.
W2) Many missing references and statements without any support.
W3) Very demanding for the reader, who has to know the surveyed approaches to understand the points made by the authors.
W4) Some subsubsections are very short (one sentence, 3-5 lines). Section 6 is vague.
W5) Many typos.

Detailed comments

missing references, statements and assumptions without any support
Section 1.1
- Semantic technologies have been deployed in the context of a wide range of information management tasks, for which machine driven algorithmic techniques aiming at full automation do not reach a level of accuracy and reliability to ensure usable systems.
-- A source (reference) of this information is needed.

- Researchers have started augmenting automatic techniques with human computation capabilities in an effort to solve the inherent problems.
-- A source (reference) of this information is needed. Some examples would make it more credible.

- The challenge for the semantic web community, is to rethink the original semantic web vision, which was largely built on the vision of computers populating the web of machines[10].
-- It is unclear if the SW vision comes from the paper of A. Bernstein [10], or there is a missing reference to the original vision [2] by Tim Berners-Lee, James Hendler, and Ora Lassila.

- The entrance barrier for many semantic applications is said to be high, given the dependence on expertise in knowledge engineering, logics and more. In short, semantic web lacks the sufficient user involvement in various aspects.
-- A source (reference) of this information is needed.

- Semantic web research can be seen as experiencing a shift from increasingly expert driven to one embracing the larger community and the users involved in the semantic content creation process.
-- A source (reference) of this information is needed.

- Two major genres of research may be seen emerging in the last few years, in an attempt to bring human computation methods to the semantic web:
-- A source (reference) of this information is needed.

- While the potential is clearly evident in going about such a synergy, effectively realizing the synergy of semantic web and human compution will bear its own set of challenges.
-- A reference supporting the statement is needed.

Section 1.2
- As the primary focus, we analyze how the semantic web domain has adopted the dimensions of human computation to solve the inherent problems.
-- What kind of “inherent problems”? Reference, explanation, and examples are needed.

- the two most common genres in human computation namely Games With A Purpose (GWAP) and Micro-Task Crowdsourcing
-- References and explanations for the terms are needed. The explanations can as well be provided in Section 1.1.

- Recent research in crowdsourcing and semantic web has also seen the emergence of some workflow systems designed to meet the need of providing a generic framework for automating human-machine computation workflows.
-- A reference is needed. Which “recent research”?

Section 2.11
- The problems fit the general paradigm of computation
-- Reference is needed. What is “the general paradigm of computation”?

Section 2.2
- Tim Berners-Lee envisioned a ’semantic web’
-- Reference is needed. [2]

- Tremendous amount of data is published on the Web according to the linked data principles.
-- More recent works focusing on such statistics are available [5, 7, 8].

Section 2.3.1
- A variety of tools are available
-- References and examples are needed.

- The notion of achieving an automated process of ontology evaluation generic enough to be applied across domains is hardly feasible
-- A reference supporting the statement is needed.

Section 2.3.2
- However the seamless consumption and integration of linked open data is challenged by the several quality issues and problems that the linked data paradigm is facing. As researchers remark, many of these quality issues are not possible to be fixed automatically rather, require manual human effort.
-- A reference supporting the statement is needed. Some specific examples would also be helpful.

- the LOD tends to emphasize the relationships and links between the entities, rather than classification of entities
-- A reference supporting the statement is needed.

Section 2.4
- After more than a decade of semantic web research, researchers remain challenged by the large scale adoption of the semantic technologies.
-- A reference supporting the statement is needed.

- content cannot be created automatically but requires to a significant degree, human contribution
-- A reference supporting the statement is needed.

- Research clearly indicates that combining human computation and semantic web is of mutual benefit to both domains
-- A reference supporting the statement is needed. Which research clearly indicates that? Why is it so clear?

Section 3.2.1
- Often, a common practice to allow assignments of the same task to multiple workers. Therefore the results may be aggregated using majority voting or other sophisticated techniques such as a probability distribution or by taking into account some estimate of the expertise and skills of the works.
-- Some references and examples are needed.

Section 3.1.3
- There are tradeoffs to both approaches.
-- What kind of tradeoffs? Who described them? Not specific enough. References are needed.
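To make the aggregation techniques referenced in the Section 3.2.1 comment above concrete, here is a minimal sketch of majority voting over redundant crowd answers, with an optional per-worker skill weighting; the worker IDs and weights are purely illustrative assumptions, not data from any surveyed system:

```python
from collections import defaultdict

def aggregate(answers, weights=None):
    """Aggregate redundant crowd answers for one task by (weighted) majority vote.

    answers: list of (worker_id, label) pairs.
    weights: optional dict mapping worker_id to a skill estimate; every worker
    defaults to weight 1.0, which reduces to plain majority voting.
    """
    tally = defaultdict(float)
    for worker, label in answers:
        tally[label] += (weights or {}).get(worker, 1.0)
    return max(tally, key=tally.get)

# Three workers judge the same candidate triple; w3's vote is down-weighted.
votes = [("w1", "correct"), ("w2", "correct"), ("w3", "incorrect")]
print(aggregate(votes))                                        # -> correct
print(aggregate(votes, {"w1": 0.9, "w2": 0.8, "w3": 0.3}))     # -> correct
```

More sophisticated schemes replace the fixed weights with probability estimates learned jointly with the labels, but the redundancy-then-aggregation structure is the same.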

assumptions made about the background of the reader
Section 1.1
- 1) Mechanized Labour and 2) Games with a Purpose for the Semantic Web.
-- Assumption that the reader is familiar with those terms. A definition and references are needed.

Section 1.2
- collective intelligence genome
-- A definition and a reference are needed. The reference is provided too late.

Sections 3, 4, and 5
The authors discuss the surveyed approaches, but they assume the reader knows all of them. The reader has to read and understand all of the papers presented in the survey before actually reading the survey. It would be helpful to have them briefly described in the article. Some level of technical detail about the approaches could be provided.

Section 3.2.1
- PROTON ontology
-- An assumption the reader is familiar with the ontology. A reference and explanation are needed.

Section 5.2.4
- The specific research questions that the authors attempt to address
-- An assumption the reader knows the “research questions”. Details needed.

Section 5.2.5
- CrowdMap relies on CrowdFlower aggregation methods and uses Precision and Recall measures for evaluating the results.
-- Hard to understand if we do not know the presented work. Details needed.

- Qualification questions were employed however according to [70] these did not affect the results much.
-- What kind of “qualification questions”? Again, if we do not know the cited paper [70], it is difficult to understand.

typos, capitalization of names issues, naming inconsistency
Section 1.1
- computers or machines
-- What is the difference, how do you define the terms?

- semantic Web
--The authors use several ways to capitalize the name in the article: “semantic Web”, “semantic web”, “Semantic Web”.

First page, right column, there is a dot moved to the new line.

- ’The Global Brain Semantic Web’
-- Incorrect quotation marks ([1], page 65). All quotation marks in the article are incorrect.

- and the fields of the like such
-- Probably a missing word.

- compution
-- “computation” (?)

Section 2.1.2
- footnotes 1, 2, and 3 point to the same webpage; this seems to be a mistake

Section 2.2.2
I feel that the term Linked Open Data is used interchangeably with Linked Data, whereas they are not exactly the same [2, 3, 4, 5, 6]

Section 2.3.1
- three key stages of Ontology Engineering stages described
-- One of the “stages” is probably redundant.

- Semantic Annotation Automation
-- a missing semicolon

Section 2.3.2
- doesnot
-- does not

Section 2.4
- useful semantic content as this content cannot
-- Probably one of the “content” is redundant.

- but requires to a significant degree, human contribution
-- significant degree of human (?)

Section 5.2.2
- illustrate some 14 distinct annotation
-- It looks like a typo, either “some” or “14”.

Section 5.2.3
- restrict ourselves to to combine
-- redundant “to”

- Dealing with Motivational, Cognitive and Error Diversity: Because people are involved
-- looks like a typo or some missing/redundant words

Some subsections are very short; it might be worth either writing more there or merging them. Sections: 2.11; 7.1.13; 7.2.1; 7.2.2; 7.2.5

Section 3.2.1
Some of the elements are described in a very vague way: Verification of Domain Relevance, Annotation of Text and Multimedia, Annotation of Web Content, Domain Specific Vocabulary and Relation Building

- It is obvious from Table 1 and Figure 5
-- Why is it obvious? The statement is quite strong, yet not specific enough.

Section 3.2.2
All of the elements described in the section could be more detailed. It is difficult to understand without knowing the previous works.

Section 4.3
- oftentimes
-- “The adverb oftentimes is an unnecessary variant of often. While using it is not an error, exactly, the word always bears replacement with the shorter word.” [9] There are more words like this one in the article.

Section 5.2.1
- some automated decision making is applied using probabilistic models to reduce the candidate mappings that need verification from the crowd
-- What kind of models? What “decision making is applied”? It is not specific enough.

Section 5.2.4
Some parts are vague: ZenCrowd, CrowdLink, CrowdTruth.

Section 7.3
- The interleaving of human, machine, and semantics even have the potential to overcome some of the issues currently surrounding Big Data.
-- What kind of issues? Details needed.

Tables and Figures
Titles of all the tables and figures could be more descriptive (self-contained); it is hard to understand them without analyzing the text of the article.

Strange characters in the references. Some titles have incorrect capitalization.

[1] Zobel, Justin. Writing for computer science. Vol. 8. New York NY: Springer, 2004.
[2] Berners-Lee, Tim, James Hendler, and Ora Lassila. "The semantic web." Scientific American 284.5 (2001): 28-37.
[3] Bizer, Christian, Tom Heath, and Tim Berners-Lee. "Linked data-the story so far." Semantic Services, Interoperability and Web Applications: Emerging Concepts (2009): 205-227.
[4] Heath, Tom, and Christian Bizer. "Linked data: Evolving the web into a global data space." Synthesis lectures on the semantic web: theory and technology 1.1 (2011): 1-136.
[5] Bizer, Chris, Anja Jentzsch, and Richard Cyganiak. "State of the LOD Cloud." Version 0.3 (September 2011) 1803 (2011).
[7] Auer, Sören, et al. "LODStats–an extensible framework for high-performance dataset analytics." Knowledge Engineering and Knowledge Management. Springer Berlin Heidelberg, 2012. 353-362.
[8] Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the linked data best practices in different topical domains." The Semantic Web–ISWC 2014. Springer International Publishing, 2014. 245-260.
[10] Bernstein, Abraham. "The Global Brain Semantic Web–Interleaving Human-Machine Knowledge and Computation." International Semantic Web Conference. 2012.

Review #3
By Deniz Iren submitted on 27/Nov/2015
Minor Revision
Review Comment:

In this paper the authors present a survey study which focuses on the intersection of the semantic web and human computation domains. They take a specific approach to systematically show how the semantic web domain benefits from human computation, and how human computation is improved by using semantic technologies. This paper also accurately addresses the challenges posed by the need for human intervention in semantic technologies. As claimed by the authors, this survey study has several contributions: a review of the challenges posed by the need for human intervention in the semantic web domain, a thorough analysis of how human computation was adopted by semantic web researchers and practitioners, and finally an analysis of how semantic technologies support human computation.
According to the authors’ refined classification, the human computation genres are games with a purpose and microtask crowdsourcing. Human computation initiatives, which can be classified into one or both of these genres, can be applied to the semantic web for ontology engineering and linked data management. The authors provide a comprehensive mapping between human computation applications and certain types of tasks in the semantic web domain. They provide various examples for most cases of this mapping. The coverage of topics, as well as citations to related works, is adequate. However, the manuscript seems to have been prepared in late 2014 and not updated since. The authors are advised to do a brief update. Several recommendations can be found below.
Even though there are examples of both domains benefiting each other, it is apparent that the number of cases in which human computation is used to assist semantic technologies is significantly higher. Understandably, the authors focus more on the utilization of human computation in the semantic web.
The language of the manuscript is simple and highly understandable. The use of easy-to-understand figures and tables improves the quality of the paper as an introductory text for researchers and practitioners. The research approach and the design of this paper make it easy to comprehend.
However, there are many typographical mistakes and several cases of inconsistent use of acronyms, terminology, bold fonts, and uppercase/lowercase in section subheadings and keyword terms. The authors are advised to revise the manuscript thoroughly with the help of a modern word processor. A list of errors was sent to the authors as well as the editor.
It is well known that human computation and crowdsourcing significantly aid semantic web research and applications. This paper provides a comprehensive list of human intervention needs in semantic technologies and a detailed mapping between various types of human computation applications and semantic web research. By doing so, this paper proves to be an important interdisciplinary contribution which may be useful for the semantic web community.


Some other up-to-date papers which may contribute to this work:
Quality of human computation applied to the semantic web:
Aroyo, L., & Welty, C. (2015). Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation. AI Magazine, 36(1), 15-24.
Applications on Linked Data Management:
Roengsamut, B., & Kuwabara, K. (2015). Interactive Refinement of Linked Data: Toward a Crowdsourcing Approach. In Intelligent Information and Database Systems (pp. 3-12). Springer International Publishing.
Acosta, M. (2015, May). A Hybrid Approach to Perform Efficient and Effective Query Execution Against Public SPARQL Endpoints. In Proceedings of the 24th International Conference on World Wide Web Companion (pp. 469-473). International World Wide Web Conferences Steering Committee.
Niche applications:
There are some applications which utilize semantic technologies to understand and categorize inputs made by crowd members. These inputs may be in the form of social media posts, microblog posts, or other content harvested from the crowd. There are several such applications, but here is an example of usage in smart city research:
Bocconi, S., Bozzon, A., Psyllidis, A., Titos Bolivar, C., & Houben, G. J. (2015, May). Social Glass: A Platform for Urban Analytics and Decision-making Through Heterogeneous Social Data. In Proceedings of the 24th International Conference on World Wide Web Companion (pp. 175-178). International World Wide Web Conferences Steering Committee.

Review (fine detail)
[R01] – Page 1 – Paragraph 1
Extra space before dot (‘.’) at the end of the following sentence:
“The notion of ’The Global Brain Semantic Web’ [10] - a semantic Web interleaving a large number of human and machine computation – has come to be seen as a vision with great potential to overcome some of the issues of the current semantic web .”
[R02] – Page 2 – Paragraph 2
Space is missing between the word “indispensable” and the reference “[23]” in the following sentence:

“Humans are simply considered indispensable[23] for the semantic web to realize its full potential.”

[R03] - Page 2 – Subsection 1.2 - Paragraph 1
Typographical error in the following sentence. “by” instead of “be”
“These threads are considered useful for possible further investigation to be taken up be researchers.”
[R04] – References
There are some unrecognized characters in the following citations:
2, 3, 7, 17, 18, 19, 22, 25, 30, 33, 34, 40, 55, 62, 63, 66, 81, 86, 90, 93, 94, 105, 112, 114, 116, 120, 122, 123, 124.
Please check other references for characters which are not displayed correctly.
[R05] – Page 3 – Subsection 2.1.1
The reference [108] is not a dissertation; it is a conference talk, though it can still be used as a reference here. The authors should either change the sentence or change the reference.
[R06] – Page 3 – Subsection 2.1.2 - Footnotes
Footnotes 1 and 3 should be corrected.
[R07] – Page 3 – Last Paragraph
Typographical error in the following sentence (“thse” instead of “these”):
“There are several fundamental issues in crowdsourcing, however, most prominent of thse are: nature of tasks that can be crowdsourced, reliability of crowdsourcing, crowdsourcing workflows.”
Also, the authors mention fundamental issues, but the first and last items in the list that follows are not phrased as issues. The sentence should be rephrased.
[R08] - Page 4 – Paragraph 1
The following sentence can be omitted, as the same references are given a few sentences earlier:
“A number of surveys on existing crowdsourcing systems exist [24,59,117], which may be referred to for an in-depth analysis of the issues involved.”
[R09] – Page 4 – Subsection 2.2
Missing reference at the beginning:
“Tim Berners-Lee envisioned a ’semantic web’, capable of providing automated information access based on machine-processable semantics of data and heuristics that utilize this metadata.”
[R10] – Page 4 – Subsection 2.2
The word ‘aquisitioners’ does not appear in the dictionary. The authors may consider using “acquisitors” instead.
[R11] – Page 4 – Subsection 2.2
Typographical error in the following sentence (“resaerch” instead of “research”):
“…pioneers of semantic web resaerch claimed, was envisioned as the enabling force to build a brain of and for mankind...”

[R12] – Page 5 – Subsection 2.3 - title
Inconsistent usage of capital letters (lowercase / uppercase) in the subsection title:
“Human contribution in the process of semantic content creation”
This issue occurs throughout the paper. The authors are advised to review and correct the inconsistencies in all subsection headings.
[R13] – Page 7 – Subsection titles
Inconsistent usage of bold letters in subsection headers.
“Linked Data Annotation and Production:”
Authors are advised to review and correct inconsistencies in all subsection headings.
[R14] – Page 8 – Subsection 3.1.1
Author’s name is written incorrectly in the following sentence:
“Siorpaes and Hepp [95] adopted the Lui von Ahn’s "games with a purpose" [107,109,110] paradigm for creating the next generation of the semantic web.”
[R15] – Page 9 – Paragraph 1
Authors should consider using “GWAPs” instead of “GWAPS”.
[R16] – Page 9 – Paragraph 1
Acronyms and abbreviations should be used consistently. The authors should either provide the expansion of an acronym at its first usage and then always use the acronym, or not use the acronym at all.
This issue occurs throughout the paper.
[R17] – Overall
Consistent usage of terminology, acronyms and abbreviations throughout the paper:
AMT or MTurk for Amazon Mechanical Turk
Mechanized labor, microtask crowdsourcing
HIT – Human Intelligence Task

[R18] – Page 9 – Subsection 3.1.2
Missing space between the word “problem” and the parenthesis ‘(’ in the following sentence:
“…provides a platform where users (requesters) can post a given problem(task) that other users (turkers) can solve.”
[R19] – Page 10 – Paragraph 1
Authors should consider revising the following sentence to improve readability:
“In this section, we adopt a generic approach to crowdsourcing approaches inclusive of both types of tasks.”
[R20] – Page 10 – Paragraph 1
Inconsistent use of bold characters in subsection headings. Please see [R13].
[R21] – Page 10 – Subsection “Verifiability”
The authors should revise the following sentence and use a more suitable word instead of “requiring”, perhaps “requesting”:
“Open ended tasks such as requiring the definition of a term or translation requires means to deal with iteration such as one discussed in [54,56].”
[R22] – Page 12 – Paragraph 1
Inconsistent usage of uppercase / lowercase characters in terms. This issue is encountered throughout the whole paper. Authors should revise all terms and write them in a consistent manner. Some of the terms are as follows:
Ontology Engineering, Linked Data, Linked Open Data, Linked Data Management, Human Computation, Collective Intelligence, Social Computing, …
[R23] – Page 13 – Paragraph 1
Authors should either provide a reference or a footnote in the following sentence, for the platform Silk:
“… human computation architecture that can be set-up by extending interlinking platforms such as Silk with direct interfaces to popular microtask platforms such as Amazon’s Mechanical Turk”
[R24] – Page 13 – Subsection: “Ordering or Ranking Facts”
The word “RISQ!” is written in capital letters, but in the related reference ([115]) it is written in lowercase. It should be written consistently.
[R25] – Page 14 – Paragraph 1
The word “quiz” should be written in lowercase letters:
“…approach to generate ground truth by requiring the players to answer questions on a Quiz”
[R26] – Page 15 – Paragraph 1
The reference “[67]” is repeated twice in the same sentence:
“…in social computation [67], and demonstrate the importance and management of provenance of crowdsourced disruption reports [67].”
[R27] – Page 15 – Paragraph 2
The word “incorrecte” is misspelled.
[R28] – General: Figures
Figures have low resolution. In the version of the PDF that I have, some symbols cannot be understood and the text is distorted. The authors should check the figures and, if the problem appears on their end as well, provide figures with better resolution.