The Semantic Web identity crisis: in search of the trivialities that never were

Tracking #: 2227-3440

Authors: 
Ruben Verborgh
Miel Vander Sande

Responsible editor: 
Guest Editor 10-years SWJ

Submission type: 
Other
Abstract: 
For a domain with a strong focus on unambiguous identifiers and meaning, the Semantic Web research field itself has a surprisingly ill-defined sense of identity. Started at the end of the 1990s at the intersection of databases, logic, and Web, and influenced along the way by all major tech hypes such as Big Data and machine learning, our research community needs to look in the mirror to understand who we really are. The key question amid all possible directions is pinpointing the important challenges we are uniquely positioned to tackle. In this article, we highlight the community’s unconscious bias toward addressing the Paretonian 80% of problems through research—handwavingly assuming that trivial engineering can solve the remaining 20%. In reality, that overlooked 20% could actually require 80% of the total effort and involve significantly more research than we are inclined to think, because our theoretical experimentation environments are vastly different from the open Web. As it turns out, these formerly neglected “trivialities” might very well harbor those research opportunities that only our community can seize, thereby giving us a clear hint of how we can orient ourselves to maximize our impact on the future.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Aidan Hogan submitted on 23/Jun/2019
Suggestion:
Accept
Review Comment:

The central position of the paper is that the Semantic Web research community should focus more on the research challenges that arise when putting the Semantic Web into practice, and less on trending topics that, perhaps, do not relate directly to the original vision of the Semantic Web.

The paper begins by putting forward the idea that the community lacks identity in terms of the research questions it addresses. Next the authors discuss how, despite detailed results in terms of the theory and practice of semantics, it remains unclear how "much" semantics makes sense for the Web. The next section discusses how the Web has been largely ignored, and that assumptions that hold for local experiments do not hold in the Web environment. Thereafter, the authors discuss the relation between the Semantic Web (Linked Data) and Big Data, arguing that while the former encourages diversity, the latter encourages uniformity, and thus loses some essential aspects of data. Addressing Machine Learning, the authors argue that both approaches (the inductive/imprecise paradigm of learning and the deductive/precise paradigm of logic inference) are complementary. Taking a step back, the authors argue that a lot of what the research community tends to "dismiss" as "engineering" actually would yield important research questions, and that the 20% left as an exercise for the reader is actually where 80% of the work may be required. The authors also call for further dogfooding, where we create useable tools we ourselves are willing to use.

I think that the authors enjoyed writing this paper, and this comes across in the lively discussion that gets to the heart of various philosophical issues with Semantic Web research. I think it meets a lot of the criteria one would expect from a position paper, stoking a conversation that seems important to have. In general I find myself agreeing more often than I disagree. Overall, I recommend an accept.

On the other hand, I do have some criticism of the paper, which is perhaps to be expected of a paper of this nature. Admittedly, given the (subjective/personal) nature of the paper, I am not always sure how to turn this into directly constructive criticism, but I leave the following remarks that perhaps the authors can take into consideration.

* My main criticism of the paper is that, in my opinion, it is too broad (or maybe better phrased as too "divergent"). Per the first paragraph of this review, it is not easy to summarise precisely what the common theme of the position is. The general thrust of the paper is that what the research community often dismisses as "engineering" actually is where the real challenges lie. But the authors also take various detours that are hard to connect with this central position, roughly summarised as a critique of the community's tendency to jump on bandwagons. As a critical paper, rather than focusing on one aspect and laying siege to it, the authors instead conduct a type of guerilla warfare, taking a couple of shots at one thing, before moving on to the next target and doing the same. As a result, the paper itself is perhaps, in my opinion, guilty of the same crime it accuses the Semantic Web community of: lacking focus/identity.

* The paper is quite critical of the community in certain aspects, but somehow avoids identifying or addressing head-on the main theme. Put another way, is it harmful for the Semantic Web to have these papers addressing Knowledge Graphs, or Blockchain, or Big Data, or Deep Learning, or "Descriptive Logics"? Of course not. But it may be harmful if this is all the community is concerned with. The real theme here is that there is not enough work looking at the challenges of putting Semantic Web "theory" into practice, on the Web. Instead of being direct about this, the paper comes across as somewhat accusatory in terms of certain topics being pursued within the Semantic Web community, which, in my opinion, borders sometimes on being pretentious (which perhaps the authors have licence to be in their position paper). The authors put a lot of focus in Section 6 on research that works in the real-world, for example, but such research is often forged from years of research that never worked in the real-world. I think a better meta-narrative would be to be more positive, to avoid taking shots at people interested in other aspects or forms of research, to say we need more work in a particular direction, and to lead by example (which the authors are in a position to do).

* The style in which the paper is written sometimes makes it unclear what the authors mean. The following are places where I really struggled to understand what the authors wish to say, and would have liked more clarity:

- "In search of the trivialities that never were" When I first saw the title I had no idea what the authors wanted to say. My best guess now is that it continues "that never were ... trivialities", but this best guess was not easy to arrive at. I like the title, but it's hard work.

- "have also frequently been labeled" By whom?

- "The major search engines backing Schema.org is illustrative of this fact, but also the increasing popularity of the shape languages ..." I don't really understand so well the relation between the two, or how shape languages illustrate a preference for "vocabularies" over ontologies.

- "Upon closer reflection, our fears about the Web" (what fears?) "are probably justified, our scientific conclusions and their presumed external validity perhaps a little less." I am a bit lost here. It seems like the authors are taking a shot at peer review, but I'm not sure what the target is precisely.

- "default semantics of simple SPARQL queries" I cannot really guess at what semantics the authors are referring to, or what they mean by "simple SPARQL queries".

- "'Linked' as bigger than 'Big'" I am a bit lost here.

- "Big Data solutions derive their strength from a rigorous, extensive schema ..." Again I disagree here, though maybe the authors mean something else by "schema".

- "highly normalised triple format" I do not follow what this means (specifically the word "normalised").

- There are various claims and sentences in the Big Data section that I either did not understand, or did not agree with, or both. The section could be revised for clarity.

- "Almond and Snips are directly ..." I don't know what (who?) these are.

- "Maybe this is the better way ..." I get the sense that there's some "meta-commentary" here, but I'm not quite able to understand what it is; the reference to reinforcement learning seems too specific in this context otherwise.

- "Credibility and fairness aside ..." Not sure what this is in reference to.

In general, I think the lively style is a plus for the paper, but I think the authors should be careful about sacrificing clarity for the sake of style.

MINOR COMMENTS:
- "let alone they would" -> "let alone would they"
- "t[h]reats to validity"
- "as [a] compl[e]ment to machine learning"

Review #2
By Armin Haller submitted on 28/Jul/2019
Suggestion:
Minor Revision
Review Comment:

The paper discusses an identity crisis that Semantic Web research is facing when it comes to solving the remaining 20% of an engineering challenge, and argues that it may well require 80% of the time to solve this challenge. I couldn't agree more with the assessment: how often have I spoken to Web developers telling me that "they are not using RDF and SPARQL, because it is awful to program with." To which I often respond that JavaScript is also terrible. But then again, while RDF/SPARQL have been developed somewhat elegantly in a top-down fashion, JavaScript has largely been developed bottom-up and there is now a TON of tool support. The same cannot be said of the Semantic Web.

The authors rightly point to several programming issues that have not been addressed yet and discuss at quite some length some of the issues with Linked and, at the same time, (Big) Data. They point to some success stories where there is either a direct measurable benefit to developers (schema.org) or developers can restrict the expected data model to a particular graph shape (with SHACL and ShEx). Some deeper analysis of the potential leap forward that SHACL and ShEx could enable for Semantic Web engineering would have been beneficial.

I would have also liked to see a deeper discussion of which parts of our engineering stack are ready for prime time, and which parts of the software engineering challenges that have existed forever have not been addressed properly yet, e.g. the lack of a proper RDF-to-object mapping (something that we saw in the early days with Sesame's Alibaba and some research attempts in Ruby on Rails), the lack of easy-to-use instance viewing/editing tools for triple stores aimed at Web developers (similar to phpMyAdmin for SQL databases), schema analysis tools for triple stores, or Web annotation frameworks supporting RDFa and JSON-LD. However, there are potentially many more of these engineering challenges that the authors probably thought of, but have not necessarily spelt out. As another example, we as a community also don't have a "linked data" search engine or a widely-used all-encompassing ontology repository (beyond domain repositories) that would help developers identify ontologies and Linked Data that they can reuse.

Overall, this vision paper is a solid opinion piece that argues for a stronger focus on the more practical 20% of the theoretical problem that may well turn out to take 80% of our time going forward. With a bit more explicit guidance on what these challenges are, it would make an even stronger contribution to the vision issue of the SWJ.

Review #3
By Axel Polleres submitted on 01/Sep/2019
Suggestion:
Accept
Review Comment:

This is a timely and excellent contribution, fulfilling the purpose of the 10-years special issue: it is well written, provocative and - in many respects - spot on, arguing that the SW community should (re-)focus on what it is good at and on what it was founded for: the Web. The authors argue sharply and in an opinionated manner, which is not bad at all for such a vision/position statement.

I have to admit I regret not having read the paper earlier - I just gave a keynote at DEXA which made a couple of similar points, cf. http://polleres.net/presentations/20190827DEXA_keynote.pdf (which I don't mean to be cited or anything; I think the article is mostly complete in what it wants to pursue, but maybe the authors may want to have a look), while coming from a slightly different angle - I think the community has achieved quite some take-up, but maybe not as visible and maybe not in the open and decentralized way we wished for.

Anyway, we need not agree 100% there, and I think the conclusions are similar: there's a lot left to be done and to work on, to make the story continue... and here is where the authors might think of adding a more conciliatory end, i.e. highlighting/summarizing the open research questions and topics, and - as they suggest - prioritizing them from their perspective. This is IMHO the missing part, where the current end of the conclusions comes across as a bit too negative (while these open challenges *are* spread over the paper, I think it would be worthwhile/valuable to make them more explicit again in the conclusions).

I have a couple of editorial/detailed remarks, to follow, which should help the final version:

p.2 When you first mention ShEx and SHACL, I thought it might be worthwhile to give references... think of the people not tied closely to our community - if you want to make the paper readable for them as well, you shouldn't expect them to know all acronyms of specs and technologies. It might make sense to read over the paper again with this in mind, which would make it more accessible.

p.3

and we risk baby being thrown --should be-->
and we risk being the baby thrown
or
and we risk the baby being thrown
?

ex Tela quod libet --should be--> ex falso quodlibet

p.4

"Linked Data" as bigger than "Big"
-->
"Linked Data" is bigger than "Big"

"Big Data solutions derive their strength from
a rigorous, extensive schema, which strongly contrasts
with rdf’s highly normalized triple format.
While there have been solutions that leverage Big Data technologies
to address rdf use cases such as querying [16], they
require reformatting data to fit the Big Data paradigm."

I think this needs to be toned down, as it's too narrow (doesn't apply to *all* big data technologies). Suggestion:

"Many Big Data solutions derive their strength from
a rigorous, extensive schema, which strongly contrasts
with rdf’s highly normalized triple format.
While there have been solutions that leverage Big Data technologies
to address rdf use cases such as querying [16], they
often require reformatting data to fit the Big Data paradigm."

p.5

"By keeping
data in millions of small personal data stores close to
people, we are in a much better position to safeguard
people’s most precious digital assets. The challenge
then of course is in connecting these distributed pieces
of data at runtime, which the Solid project [21] does
through Linked Data."

While I agree this is one challenge that is interesting to work on, it should be mentioned that there are many more challenges here, including reversing network effects and/or providing an appealing/convenient user experience, that additionally need to be overcome, not all of which are technological. This might be worth a mention.

As for the AI and ML section, my feeling was - honestly - that this one got a bit carried away and wasn't as clear and understandable as the other ones. I have a couple of remarks/questions there:

"Developing such approaches is crucial to reduce the high manual
currently required for participating in the SemanticWeb."
-->
"Developing such approaches is crucial to reduce the high manual effort
currently required for participating in the SemanticWeb."

"For instance, semantics and inference
can pre-label data that improve the accuracy of models"
-->
"For instance, semantics and inference
can pre-label data that improves the accuracy of models" ???
I am not sure I got what you want to say here. More details/reformulate?

"Or, post-execution explainability could be achieved by
reasoning over semantic descriptions of nodes."
--> what do you mean by a node here? not clear again what this means.

"Some more fundamental
questions also need to be answered, such as training
a model under the open world assumption."
Again, I do not understand what you mean here exactly by open world; can you be a bit more specific? Example? I mean, don't most/all ML applications in AI learn from a partial observation of the world and generalise the models...?

"Semantic inference and first-order logic might lead to
less spectacular conclusions"
... less spectacular than what?

"Maybe this is the better way to position ourselves in one
of the next waves to come: reinforcement learning."
??? How do you get to reinforcement learning here now? I think you need to explain this jump a bit (it might actually be justified to pull this out of the drawer - referring to the 2001 SW article again and arguing with agents needing to plan and expose behavior to act rationally on our behalf... and that the community went down that route already at some point (semantic web services) to some degree, but also forgetting about the premises: who would annotate/formally describe semantic descriptions of preconditions and effects of services?)

p.6

In section 6, when you call for prioritizing, as mentioned, I would appreciate it if you came up with such a prioritization, even a subjective one.

Likewise here:
"It is impossible
to tell whether the remainder is trivial or not; and many
of the experiences above reveal that some of the most
complex research problems appear exactly there"

which are these complex problems exactly?
Again, I would find it valuable to have named them (in the opinions of the authors): which do you think are the hard nuts to crack? Open questions? What is the kind of research needed and viable?

"Such endeavours have not been attempted at the research level, let alone they
would be ready for implementation by skilled engineers."

FWIW, I think there *were* at least attempts in this direction, e.g. ActiveRDF
http://www2007.wwwconference.org/htmlpapers/paper272/index.html
and I think this was not the only one, i.e., to easily wrap RDF into dynamically typed programming languages for developers.

"We have been wrong before" --> concrete examples where?

"This brought us as a community into a disconnect with
the place where we can make a difference: the Web.
There, new technologies still emerge every day—just
not ours." --> this is something I am not convinced of
(we may disagree, which is ok). Web industry and big players have hired some of the brightest minds from this community, and the shift away from strings to entities in Web search demonstrates that something has changed/made an impact. However, this happened behind closed doors, maybe to a smaller extent, with the technologies being implemented differently/proprietarily and in a different role than we had envisioned.

"To this end,
positioning semantic technologies as compliment to machine
learning is a necessity."
-->
To this end,
positioning semantic technologies as *complement* to machine
learning is a necessity.

Plus, as mentioned above, in the end, I'd be happy about a more conciliatory, hopeful summary/end.
I think the paper, as it ends now, too much gives the impression of a pure rant (which it isn't, but the abrupt end might be read as such).

HTH, Axel Polleres