LinkedSpending: OpenSpending becomes Linked Open Data

Tracking #: 679-1889

Authors: 
Konrad Höffner
Michael Martin
Jens Lehmann

Responsible editor: 
Natasha Noy

Submission type: 
Dataset Description
Abstract: 
There is a high public demand to increase transparency in government spending. Open spending data has the power to reduce corruption by increasing accountability and strengthens democracy because voters can make better informed decisions. An informed and trusting public also strengthens the government itself because it is more likely to commit to large projects. OpenSpending.org is an open platform that provides public finance data from governments around the world. In this article, we present its RDF conversion LinkedSpending which provides more than five million planned and carried out financial transactions in 627 datasets from all over the world from 2005 to 2035 as Linked Open Data. This data is represented in the RDF Data Cube vocabulary and is freely available and openly licensed.
Tags: 
Reviewed

Solicited Reviews:
Review #1
By Juergen Umbrich submitted on 03/Jul/2014
Suggestion:
Accept
Review Comment:

Thanks for addressing the comments. I'm happy with the latest revision.

Just some minor issues which can be easily fixed for the camera ready version:

Write "Linked Data" consistently (Linked data vs. linked data vs. Linked Data)
"Datasets can can submitted" -> duplicate can
Fig 2: the syntax colouring does not really work here (numbers in URLs are coloured)
"1. New datasets are frequently added" -> add the average number of new datasets per day from the later section
Same paragraph (p3, 3.2: "1. [...] 2. [...]") -> I would format it as a list
The quality of Figs. 3, 6 and 7 is low on a printout and should be improved
Fig 4. remove vspace from caption

Question: p4, last paragraph: Why do the authors not perform change detection on existing datasets, given that minor changes can occur in them?

p7 from 0^22 up to 32 ? -> from 0 to 32 ?

Fig 8: the y scale of the figure only goes up to 100, but there are 627 datasets. Please fix the scale or explain.

Review #2
By Andreas Hotho submitted on 25/Jul/2014
Suggestion:
Accept
Review Comment:

The authors present a way to convert the OpenSpending data into
RDF and make this data freely available as Linked Open Data.

Given the new version and the answers to the reviews, I think that
the work has made considerable progress. As far as I can see, all the
criteria for such a paper are fulfilled. I'm happy with the
current version and I have only minor comments.

I suggest explaining the difference between the 627 datasets in
the abstract and the 732 in sec. 3. Although this is explained later,
I suggest adding a remark in sec. 3, as the difference is not
immediately clear.

sec. 2: ... datasets Such ... --> a dot is missing

The motivation in sec. 2 is new and quite nice, but it does not fit
well into the structure of the paper. I suggest integrating sec. 2
into the introduction.

Review #3
By Oscar Corcho submitted on 27/Sep/2014
Suggestion:
Minor Revision
Review Comment:

I acknowledge the fact that the authors have considered most of my previous comments and improved the quality of the dataset.

I have a major concern about the sustainability of the dataset, which I consider extremely important to be able to accept this paper as a Linked Dataset description: although the authors claim that they do a weekly update, the dumps are only made available for two specific points in time (September 2013 and March 2014). Why is that happening? What is going to be the sustainability of the dataset in the future? With the query at [1] (link to the results at [2]), I get that last updates were done in April, which is not so good.
[1] select distinct ?x ?y where {?x a ; ?y}
[2] http://bit.ly/1rkXsfQ

Before going further, I would really like to have an explanation of what will happen in this respect. This will be fundamental in order to know whether the paper can be accepted or not as a Linked Dataset description, IMO.

Now I will move on to my review according to the three sets of topics that are considered for Linked Dataset descriptions: (1) Quality of the dataset. (2) Usefulness (or potential usefulness) of the dataset. (3) Clarity and completeness of the descriptions.

(1) Quality of the dataset:
The quality of the dataset has improved since the last review that was provided, taking into account most of my comments. I enumerate some of them:
- The threshold for deciding whether to select a dataset has been updated, and it now looks much more sensible and less ad-hoc.
- There is still the issue of considering whether properties with the same name in different datasets really refer to the same property. Some reconciliation has been done though in terms of time and geographical information, which is good.
- I am happy with how the aspect related to slices has been dealt with in the paper.
- I am also happy with the initial interlinking that has been done with LinkedGeoData. I understand that more work could be done in this respect, but I also understand that this would require a huge effort.

(2) Usefulness:
This is clearly a useful dataset. It has a strong dependency on OpenSpending, but given that this is a well-maintained project, it seems that the data will become more and more useful over time.

(3) Clarity and completeness
The descriptions are of good quality, and it is acknowledged that the source code is made available and documented.

Final set of comments to improve readability:
- In the second paragraph, the links to CORDIS or Greece public spending look very ad-hoc and weird. I would even suggest removing them as they are such a small set of data in the context of the whole dataset that is presented here, but I leave this decision up to the authors.
- I would suggest adding the namespaces in table 1 to services like prefix.cc, since not all of them are there. I would also recommend renaming the caption of table 1 to Namespaces and prefixes used in the paper.
- In section 3.1 you make a statement that I do not understand: "Apart from the fixed data cube meta model, the structure of each dataset is completely up to the creator". I cannot follow this point and I suggest removing it, since it does not add anything to the description. In fact, I do not agree with it unless it is explained differently. Or are you referring to a generic data cube model instead of RDF Data Cube? If so, it may be good to merge the description of this data cube model with that of the RDF Data Cube model in section 3.1, so that everything is easier to understand.
- Be careful in general with cross-references to sections. For instance, in section 4 you refer to the RDF Data Cube vocabulary as being described in Section 3, but it is actually described in section 4 in the end.

Typos
- "datasets Such" --> "datasets. Such"
- "allows to serve" --> "serves"
- "can can" --> "can be"
- "adressed" --> "addressed"
- "skos:ConceptSchema" --> "skos:ConceptScheme"