A map without a legend: The semantic web and knowledge evolution

Tracking #: 2192-3405

Jérôme Euzenat

Responsible editor: 
Guest Editor 10-years SWJ

Submission type: 
The current state of the semantic web is focused on data. This is a worthwhile advance in web content processing and interoperability. However, this contributes only marginally to knowledge improvement and evolution. Understanding the world, and interpreting data, requires knowledge. Not knowledge cast in stone for ever, but knowledge that can seamlessly evolve; not knowledge from one single authority, but diverse knowledge sources which stimulate confrontation and robustness; not consistent knowledge at web scale, but local theories that can be combined. We discuss two ways in which semantic web technologies can greatly contribute to the advancement of knowledge: semantic eScience and cultural knowledge evolution.

Minor Revision

Solicited Reviews:
Review #1
By Armin Haller submitted on 31/Jul/2019
Major Revision
Review Comment:

The paper is a thoughtful opinion piece on the state of the semantic Web and its current focus on data rather than on knowledge. The author rightly argues that semantics on the Web are currently used mostly to help machines parse data, but rarely to make knowledge explicit; where knowledge is made explicit, it is not shared. Although the paper is well motivated and I couldn't agree more with its insight, the second part in particular reads a bit like a laundry list of examples and state of the art, and lacks clarity and a formal structure. Two example domains/aspects of knowledge sharing are discussed, i.e. how it contributes to improving scientific practices, and how knowledge seamlessly evolves and how this can be studied in an effective way. For both aspects current research and applications are presented, but what I really would have liked to see is a systematic analysis of what is currently missing on the semantic Web to build the 'legend' to our map. There are some discussions of issues, but they are not made very explicit; in the second aspect in particular, they are not discussed at all until the very last paragraph. In the first aspect, there is some mention of the need for identifiers, the need for a search engine, the need to express evaluation methods, and the need to express datasets, among other things. But it would be beneficial to discuss what is explicitly missing in current semantic Web research and solutions to address these and maybe other challenges for a web of knowledge. Missing, for example, is a discussion of how Wikidata fits into this picture. The author argues that Wikipedia is one of the wonderful and precious successes of the Web, but then argues that "knowledge does not have to be centralised: diversity is source of disputation and robustness". Wikipedia is certainly centralised (although it lives off distributed contributions) and there is no other significant encyclopaedia left on the Web. So why can't the same happen with Wikidata?
Also, there is other knowledge that is already made explicit on the Web, at least in the eCommerce domain, where knowledge is made explicit on more than 40% of all webpages through schema.org annotations (largely for the purpose of increasing their ranking in Google). Google picks up this knowledge and at least partially republishes it through its knowledge panels and rich snippets, not in the form we would like, i.e. as RDF triples, but that could be discussed as one of the issues (it is briefly mentioned in the introduction that companies do such things).

In the second aspect, as mentioned, I feel the issues and challenges are a bit lost in the discussion. I, for example, fail to see which aspects of evolutionary computation and genetic programming can be applied to knowledge evolution. More detail here would certainly benefit the reader, e.g. what does the author mean by "Knowledge evolution can indeed be implemented as a mechanism which makes knowledge evolve seamlessly while it is used"? What would that look like? What do we need for that in terms of ontologies, databases and software artefacts?

In the conclusion the author raises the challenge that we need to "complement [the semantic web] by explicit knowledge expression and sharing". What the challenges in doing so are is not entirely clear from the paper, and it would be immensely useful if they were made more explicit.

Review #2
By Agnieszka Lawrynowicz submitted on 08/Aug/2019
Minor Revision
Review Comment:

The paper argues that the Semantic Web should be based on knowledge more than it is in its current state, where it is based mostly on data (after a wave of interest in Big Data in research and industry).
The paper provides a very nice cross-disciplinary motivation for why a more knowledge-oriented approach is needed, discussing humanity as a species that is able to transfer and share knowledge between individual entities, using explicit symbols.
The paper then points out that nowadays machines mostly process data to learn new knowledge, often from scratch, over and over; this knowledge is not made available to humans in an understandable way but is actioned immediately, and so it does not allow the body of human knowledge to grow.
This seems like a regression from what were the goals of the Semantic Web.
The paper also gives two cases of how sharing explicit knowledge helps to increase the common body of human knowledge: in eScience and in knowledge evolution inspired by experimental cultural evolution.

The paper reads well and I agree with the general theses of the paper.

Below, I mention some less clear or arguable issues that could be clarified in the final version of the paper:

*** Definition of "knowledge" ***
The central notion of the paper is "knowledge" and it would be very helpful to have some definition in the paper. Is it understood as "justified true belief"?
I am raising this issue since the paper discusses "knowledge" in many aspects, including conflicting knowledge that can co-exist in the form of micro-theories.
The paper generally criticizes approaches for learning knowledge from data from scratch. But what kind of knowledge is it that is being learnt? I would guess that it might be "local knowledge" that is learnt by some machine learning algorithm for a limited scenario, concerning "dynamic" knowledge such as whether to recommend some product to a particular customer or what the energy demand of some household will be at a particular time.
When it comes to "global knowledge", this is increasingly re-used by such machine-learning-based approaches via feature engineering, drawing on resources such as Wikipedia or knowledge graphs. I do hope that this trend continues and that knowledge is transferred.
Then, if knowledge evolves while being used for a particular task in a "local" scenario, learnt from scratch, can it become obsolete for other scenarios on a more global scale?

*** Knowledge representation for machines ***
"The semantic web could be characterised by one of its early slogans: a web for machines."
I think it should be processable by both machines and humans.
The case where this understanding is given by human programmers who share common vocabularies, developed for common tasks, and where machines are then expected to "understand" those and provide back human-understandable knowledge and results, may not be the only way of making the Web machine-understandable.
I can imagine a case in which some knowledge representation is understandable to machines in a machine-to-machine scenario; if the machines were autonomous, then as they evolve they could even develop their own language, understandable to them but not really to humans.
It is not necessarily the case that a language that is easily interpretable by humans is also better for a machine. Maybe just numbers would be easier for a machine to grasp?
Therefore, I think it needs to be stressed more that Semantic Web languages should provide a common platform to exchange and evolve knowledge between humans, between machines, and between humans and machines.

*** Evolutionary computation ***
The second case discussed in the paper (experimental cultural evolution) is very interesting and inspiring.
It is also less developed and clear than the first one (eScience).
The paper mentions evolutionary computing and genetic programming approaches here; such approaches usually produce offspring from parents through artificial selection and mutation, which enables a population to evolve.
This seems not to be the case for cultural evolution. Though the author gives some hints about how it differs, and notes that no inheritance is used in cultural evolution, it would be very helpful to have more explanation on this topic and on how it should work in the Semantic Web scenario.


A very nice piece of writing:

1. articulating very clearly the value of explicitly expressed knowledge, both in humans and machines, that can be communicated (as opposed to actionable but implicit knowledge that has to be relearned all the time).

2. unashamedly expressing the ambition that such explicit knowledge, in formats that are interpretable by machines, can contribute to the next step in the knowledge ecosystem (storytelling, teaching, book writing, monasteries, universities, semantic webs)

3. using eScience as a good illustrator for what could be achieved (a good choice, because eScience is a field where more progress in "real semantics" has been made than elsewhere)

A minor complaint would be that the final section on knowledge dynamics (and the role of evolutionary mechanisms in knowledge dynamics) is rather disconnected from the main thesis of the rest of the paper. The whole "in defense of explicit knowledge" argument of the paper could have been done without that final section.

Finally, I'd like to point out that Euzenat's whole argument about the value of explicit knowledge in a form processable by machines is also very relevant to the major debate currently raging in Artificial Intelligence: should we not just rely fully on statistical techniques that learn actionable patterns from data? This paper is a clear articulation of the viewpoint that the answer to this question is 'no':

"Nowadays, web users are not expected to provide knowledge, nor to access it. It seems that they are mere data provider, mostly through their actions, e.g. click, buy, like. These data are machine processable, but not open. They are kept secret, in silos, to the exclusive exploitation of a single organisation. They are processed by corporations which eventually learn knowledge from that data. But this knowledge, in turn, is not shared nor even prone to be communicated because not necessarily expressed in an articulated language. Instead, it is directly actioned. Hence, knowledge does not improve."

Amen to that.