Using the Semantic Web in Digital Humanities: Shift from Data Publishing to Data-analysis and Knowledge Discovery

Tracking #: 2214-3427

Eero Hyvonen

Responsible editor: 
Guest Editor 10-years SWJ

Submission type: 
This paper envisions and discusses a shift of focus in research on Cultural Heritage semantic portals, based on Linked Data. While ten years ago the research focus in semantic portal development was on data harmonization, aggregation, search, and browsing ("1st generation systems"), the rise of Digital Humanities research is shifting the focus to providing the user with integrated tools for solving research problems in interactive ways ("2nd generation systems"). This trend sets new challenges for both computer scientists and humanist researchers.
Full PDF Version: 

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 05/Jun/2019
Major Revision
Review Comment:

This manuscript does not seem to fulfill its role of a vision article. At its core, it discusses various iterations of Sampo systems co-created by the author, so it seems more of a historical overview (of one specific family of systems). The "new challenges" promised in the abstract, which would be the main focus point in a vision, are largely absent. They are enumerated as part of the conclusion/summary, but are so broad that they apply to any domain ("knowledge extraction, data visualization, machine learning, and knowledge discovery").

I fail to see how this will inspire readers, and as such am not sufficiently confident in its added value. So unfortunately, I cannot recommend acceptance.

Detailed comments:

- When mentioning things like "document centric models, such as Dublin Core and its dumb down principle", it is not clear whether you consider them (partial) successes or failures; this would likely be good to mention.

- From the abstract and 1.2 section, it is not clear whether the second generation is the future, or whether it is already happening (and to what extent).

- To what extent is the Sampo model new? The example about combining data from different sources sounds like early Web 2.0 ideas, which surely have their merit, but are not novel anymore. Do these belong in a vision article?

- The conclusion does not provide or combine insights, but is rather a summary with a very generic listing of future challenges that seem hardly specific to the digital humanities.

Review #2
By Rafael Goncalves submitted on 01/Jul/2019
Minor Revision
Review Comment:

The vision paper discusses the shift in focus of cultural heritage / digital humanities research from the so-called "first generation systems" that facilitate data integration and search, to the so-called "second generation systems" that provide users with tools for solving research problems in interactive ways (i.e., without (as much) need for technical expertise in the underlying data representation languages, standards, etc.).

Main comments

The paper is easy to read and overall it is well structured. At times it is a bit vague and would benefit from more concrete examples (see detailed comments).

Throughout the paper, "humanist research(er)" should be replaced with "humanities research(er)". I can be a humanist researcher and not a researcher of humanities.

The paper mentions that a great deal of effort has gone into developing languages and standards to facilitate data aggregation and integration; however, it is unclear from the paper (section 1.1) how these advances have played a role in cultural heritage research.

Generally, I think the paper would benefit from focusing less on the Sampo series of services (Section 4) and more on the envisioned impact of the shift to 2nd generation systems: Are we there yet? What will it take to get there? Who can help, and how?

Detailed comments

L40: "From a SW research point of view..." - This sentence is too long, and it is difficult to understand your point. Consider breaking it apart.
L43: "and structured data in different forms" - Examples would be helpful.

L34: "As a result" - Result of what?
L43: Consider explaining what CIDOC and LRM are; there's no description of them in the paper.
L4: "In the following" - In the following "section"?
L5: "will be 1st generation CH..." - Will be "referred to as"?

L29: "tools are not integrated with LD formats" - Sounds odd to say they're not integrated with formats. They might not 'support' those formats.
L31: "into forms required" -> into "formats"? What formats are those?
L1: Fig. 1 should either be better described in the text, or have a more self-contained caption explaining the figure.
L15: "solving DH research problems" - It would be helpful to have an illustrative example of such problems, and to hear how semantic web technologies help(ed) solve these problems.
L39: Could not access CultureSampo service -- server error. Unclear whether TravelSampo is online/working -- I couldn't find a link to the portal, only documentation about it.

L16: "mutually aligned metadata" - How were they aligned; automatically, manually? What specific standards were used for the alignments?
L19: "data is automatically linked" - How is this done automatically?
L23: "it can be enriched (linked)" - Again, how is this done? A more concrete example or scenario would be helpful here.
L38: "Sampo model... tested in a series of several practical case studies" - I suggest briefly describing these case studies to make the paper a bit more self-contained.

L50: "target group is analyzed" - What does this mean; how exactly was the group/data "analyzed"?
L1-2: "analysis in DH is done partly by machine partly by human" - What does each of these do/analyze, and how?
L7: "In statistic charts ... histograms are used" - Are used for what? I think the sentence is missing words / the intended meaning is unclear.
L21: "there are lots software packages" -> "there are a lot of".
L32: "and the problems are solved using faceted search" - It's unclear how these problems are solved just based on search; wouldn't a "1st generation system" suffice then?

L11: "some kind of commonness or average in them" - In the context of studying life histories, "average" sounds odd. What's the intended meaning here with "commonness or average"? Could you give an example?
L48: "finding interesting/serendipitous connections" - Interesting in what sense? What's the goal of this application?

L33: It would be helpful to have a description of these tools that solve digital humanities problems: how they work, what kinds of problems they can solve or help solve, etc.

Review #3
By Peter Haase submitted on 28/Jul/2019
Minor Revision
Review Comment:

The author describes the development of semantic technologies in cultural heritage, from first generation systems that focus on publishing, integration, and search, to second generation systems with a focus on interactive analytical tooling that support users in concrete research tasks.
The article is well written and interesting. It combines a general description of the history of the area with personal experience in the form of a concrete case study.

For a 10 year vision paper, there is relatively little focus on challenges, what is next, or visionary elements generally.
Perhaps there is room for adding some remarks in this direction.
As part of the discussion of the state-of-the-art / related work, the author might consider mentioning other visionary approaches in the digital humanities / cultural heritage, such as

Minor comments:

- please use consistent punctuation after references, e.g. “. [17]” “. [13]” (on page 3)
- “se-rendi-pious” (separation, page 3)