TheyBuyForYou Platform and Knowledge Graph: Expanding Horizons in Public Procurement with Open Linked Data

Tracking #: 2618-3832

Authors: 
Ahmet Soylu
Oscar Corcho
Brian Elvesæter
Carlos Badenes-Olmedo
Tom Blount
Francisco Yedro Martínez
Matej Kovacic
Matej Posinkovic
Ian Makgill
Chris Taggart
Elena Simperl
Till C. Lech
Dumitru Roman

Responsible editor: 
Jens Lehmann

Submission type: 
Tool/System Report
Abstract: 
Public procurement is a large market affecting almost every organisation and individual; therefore, governments need to ensure its efficiency, transparency, and accountability, while creating healthy, competitive, and vibrant economies. In this context, open data initiatives and integration of data from multiple sources across national borders could transform the procurement market, for example by lowering the barriers of entry for smaller suppliers and encouraging healthier competition, in particular by enabling cross-border bids. More and more open data is published in the public sector; however, these datasets are created and maintained in silos and are not straightforward to reuse or maintain because of technical heterogeneity, lack of quality, insufficient metadata, or missing links to related domains. To this end, we developed an open linked data platform, called TheyBuyForYou, consisting of a set of modular APIs and ontologies to publish, curate, integrate, analyse, and visualise an EU-wide, cross-border, and cross-lingual procurement knowledge graph. We developed advanced tools and services on top of the knowledge graph for anomaly detection, cross-lingual document search, and data storytelling. This article describes the TheyBuyForYou platform and knowledge graph, reports their adoption by different stakeholders and the challenges and experiences we encountered while creating them, and demonstrates the usefulness of Semantic Web and Linked Data technologies for enhancing public procurement.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Fathoni A. Musyaffa submitted on 21/Dec/2020
Suggestion:
Minor Revision
Review Comment:

Quality/importance/impact:
The paper describes the TheyBuyForYou platform and knowledge graph. It provides a specific use case of how knowledge graphs can be utilized for the public procurement domain. The use case can subsequently enhance the procurement data processing, enrichment and analysis pipeline. This can potentially improve public procurement transparency and accountability. In addition, a large knowledge graph is produced by the TheyBuyForYou project, in which some of the popular ontologies are reused to model the procurement data and related corporate information. The authors also work with both public institutions and private companies that adopt and evaluate their platform. Some evaluations for other aspects (e.g., data storytelling) are also provided along with the platform’s current limitation.

Readability:
The paper is rather easy to understand; however, some parts need to be reformulated, usually because too much content is packed into a single lengthy sentence. For example:
- “These led to the emergence of national public procurement...” (Page 2)
- “Automatic storytelling technology available so far...” and subsequent sentences. (Page 14)
- “Results, see Table 2, were quite promising...” (Page 20)
- “During the data upload process, …” (Page 20, due to the relative clauses, brackets, and commas following the sentence), and likewise
- “As the tool is unable…” (Page 21)
Such sentences should be reformulated.

Additional comments/feedback:
- On page 6, it is stated that external vocabularies and ontologies are reused where appropriate. Are there any parameters to determine whether certain vocabularies and ontologies are deemed appropriate in this case?
- Figure captions could be made more explanatory to improve the readability. For example, the captions of Fig. 1 and Fig. 2 can be accompanied by short sentences regarding what the OCDS/euBusinessGraph ontology is for. The extra explanation would also be helpful for Fig. 7, explaining how the statements below each chart can be associated with the visuals of respective charts (the association is not yet intuitive).
- To improve readability for a wider audience (e.g., government entities, many of which lack a background in data analysis), the charts and graph in Fig. 10 should be linked to the explanation provided on page 18 (e.g., it is not yet clear which part of Fig. 10a corresponds to the “large transaction” mentioned in Case 1 on page 18). It would also be interesting to show the result of the D-Tree algorithm for the run shown in the screenshot in Fig. 5.
- Is it correct that the reconciliation API in Fig. 3 is not connected to another component (e.g., triple store, OC API, and OO API) in the architecture? This seems to contradict point 2 (reconcile suppliers) on page 9.
- (Minor) It would be helpful to briefly explain the technical tools mentioned, so that the paper is self-contained and readers do not have to follow the links in the footnotes. One example is the mention of Velocity templates on page 9. Also, the current wording leaves it ambiguous whether “allows specifying how the REST API will look like” refers to Velocity templates or to the R4R tool.
- The description of the actor who performs the data ingestion process is missing (except for the data storytelling section). To what extent are the buyers/companies publishing the data involved in the data ingestion pipeline? If they are involved, how well can they cope with the learning curve of semantic technologies?
- An explanation regarding the JSON-XML-RDF pipeline might be missing. If the initial JSON data does not contain the hierarchy (as implied in point 4 of page 9), how was the hierarchy obtained by transforming the JSON data into XML?
- (Minor) On page 18, a pipeline for document processing is mentioned (col. 2 line 25). A graphical representation of this process would be helpful. Also, lemmatization is mentioned in the process. The explanation of how the lemmatization of different languages (non-English) is performed would also be interesting.
- (Minor) On page 21 (section 9), the terms semantic web, knowledge graph, and linked data are mentioned. For a wider audience (i.e., from a non-semantic web community who might be interested in learning semantic-based procurement tools), a brief explanation regarding those three different terms would also be helpful.

Review #2
Anonymous submitted on 09/Feb/2021
Suggestion:
Accept
Review Comment:

This paper presents an open linked data platform, called TheyBuyForYou, consisting of a set of modular APIs and ontologies to publish, curate, integrate, analyze, and visualize an EU-wide, cross-border, and cross-lingual procurement knowledge graph. A data dump of the Knowledge Graph (KG) in RDF format has been published monthly since January 2019 (see https://zenodo.org/record/4498267 for the February 2021 dump).

1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).
It is a well structured and high-quality paper about the TheyBuyForYou platform.
The authors provide evidence for adoption (the Spanish company OESIA, the City of Zaragoza, the Italian company CERVED, the Ministry of Public Administration in Slovenia). Wider adoption, with more governments publishing open data about procurement, is expected in the coming years. The paper lacks a discussion of new business models and the benefits for Europe as a whole.

Besides semantic interoperability, the paper showcases the use of knowledge graphs for advanced analytics. The consortium is committed to maintaining the KG in the context of already funded innovation projects.

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.
The paper is well structured, with all elements of a tool/system paper.
The authors point to different initiatives that have been established to harmonize and increase the interoperability of procurement, corporate, and financial data. The integration of KGs with other data sources (i.e. ingesting new data including OpenOpps and OpenCorporates) is already agreed upon.
The only concern is the large-scale adoption of the data, e.g. only 76 downloads of the knowledge graph dump for the whole period.

One typo - please put 'the' in front of Ministry of Public Administration (page 15, line 12; page 17, line 46).

Review #3
By Giuseppe Futia submitted on 06/May/2021
Suggestion:
Minor Revision
Review Comment:

The manuscript reports details on TheyBuyForYou, a linked data platform for constructing and publishing a KG in the public procurement domain. The platform consists of a modular architecture to provide advanced services built on top of the KG, such as anomaly detection, cross-lingual document search, and data storytelling. The KG is available using traditional REST APIs and SPARQL endpoints.

The manuscript is well written, the documentation available on GitHub is rich, and the impact is relevant for two main reasons: (i) on the one hand, current implementations of procurement platforms give very little consideration to specific aspects such as software and data integration; (ii) on the other hand, in many cases they ignore government needs, including transparency towards citizens. In my humble opinion, TheyBuyForYou is able to address both limitations.

I appreciated the heterogeneous nature of the stakeholders who have adopted TheyBuyForYou (two public administrations and two companies), emphasizing the platform's capability to enable and support business intelligence services.

The evaluation of the advanced services is well conducted, with interesting results. As a minor comment, I suggest that the authors further discuss the motivation behind the selected approaches. An interesting line of research focuses on adopting graph structures for anomaly detection (https://arxiv.org/pdf/1404.4679.pdf). In the case of cross-lingual document search, the recent development of Transformer architectures seems to be very promising (https://huggingface.co/transformers/multilingual.html).

To summarize, I believe that the authors' contribution is relevant and suitable for publication as a 'Tools and Systems Report' in SWJ.