Public spending as LOD: the case of Greece

Michalis Vafopoulos
Marios Meimaris
Ioannis Anagnostopoulos
Agis Papantoniou
Ioannis Xidias
Giorgos Vafeiadis
Michalis Klonaras
Vasilis Loumos

Jens Lehmann

Dataset Description
The PSGR project is the first attempt to generate, curate, interlink and distribute daily updated public spending data in LOD formats that can be useful to both expert (i.e. scientists and professionals) and naïve users. The PSGR ontology is based on the UK payments ontology and reuses, among others, the W3C Registered Organization Vocabulary and the Core Business Vocabulary. RDFized data are linked to product classifications, Geonames and DBpedia resources. Online services contain advanced search features and domain level information (e.g. local government), simple and complex visualizations based on network analysis, linked information about payment entities and SPARQL endpoints. As of February 2013, the growing dataset consists of approximately 2 million payment decisions valued at 44.5 billion euros, forming 65 million triples.
Solicited Reviews:
Review #1
By Hugh Glaser submitted on 27/May/2013
The paper gives a comprehensive description of a comprehensive project to make Greek spending data available and useable as Linked Data.
It is well-written and readable, and extremely well-structured following the paper guidelines, making it easy to follow.
There is much of interest here, and the text raises few questions it does not also answer.
I have no negative comments, and note that I found many of the discussions, such as that concerning ontology choices, interesting.

I was left with a couple of minor questions, such as how effective the crowd-sourcing is, and what in the architecture requires the load-balancing.
One question I would like to see an answer for is about the errors detailed on page 8. Can the authors make some statement about how likely it is that there remain significant errors of this type after their cleaning?

Typos and smaller comments:
This is quite a long list, but not to be critical - simply to help the paper read more fluently.
You might want to say that PSGR is "Public Spending GReece", if it is?
Page 1
Para 2: has been -> was, in such -> to such an
Para 3: 5 states -> 5 describes, the section -> section, the last -> the last section
Para 4: in daily scale -> on a daily basis,
common English usage is to use "." for the decimal point, not "," (even though it is not standard)
mil. is not a common abbreviation for million: either use 2M etc. or 2 million
similarly bil. is 44.5B or 44.5 billion
In the table, again, "," is used for the decimal point, but in this case is also used to group larger numbers; I suggest leaving "," for the grouping, and using "." for the decimal point throughout the paper.
Page 2
Para 5: it … payments -> the ability to discover and annotate individual payments is also supported.
Para 6: semestrial? I assume this means every six months, although it is not in my dictionary - you might like to make it clearer
Para 7: assesment -> assessment
Para 8: argue -> argues
Figure 1: clearly not readable at normal zoom levels, but readable by zooming in a long way - is this a problem?
Para 9: few -> a few
Page 4
Para 0: of an -> of: or could be an but not both, the most -> most
Para 3: submisison -> submission
Page 5
Figure 2: There is no reference to it from the text (as is conventional)
Again, Figure 2 is a little difficult to read at a normal zoom level, but may be OK
Para 1: deployed -> been deployed
Para 5: class in -> class in the
Page 6
Para 5: States follow -> States follows, way to -> way for
Table 2: until -> in
Para 6: associated to -> associated with, fullfilled -> populated or established (fullfilled -> fulfilled, by the way), to form -> so that the formation of, become -> becomes
Page 7
Para 0: they have been -> we have, lemma? - I'm not certain I can interpret this use as you intend
Para 7: pointed -> identified?
Page 8
Para 0: For being -> To be, the for -> for, irrelevant for -> irrelevant to

Review #2
By Prateek Jain submitted on 02/Jun/2013
Minor Revision
The work 'Public Spending as LOD: The case of Greece' provides information about a dataset which captures various aspects of money spent by various organizations in Greece. The data come from a publicly available endpoint which provides information under a CC-BY 3.0 license.

The work fits the CFP of the special issue quite well. More specifically, the work provides information on the different aspects of the dataset outlined in the CFP. I have a few questions related to the work, and hopefully the authors can answer them in future revisions.

1. Were there any interesting discoveries/discrepancies which were identified as a result of the project or the use of Semantic Web technologies? This would be very interesting and relevant for the community in general. It would make the case more concrete with respect to identifying the potential uses of the dataset.

2. The authors should give more details about maintaining the provenance of the data, especially as it is financial data. What kinds of changes were made, why, and by whom? These are all interesting questions and are very relevant for financial data.

3. Can the authors justify the use of CC-BY-3.0 for the purpose of this dataset? What was the motivation behind that?

Minor Comments:

1. Minor typos

Section 3.1, decisions can not -> cannot

Review #3
By Danh Le Phuoc submitted on 19/Jun/2013
Major Revision
The PSGR dataset is generated, curated and interlinked to distribute daily updated public spending data in LOD formats. It is used to build advanced search features and domain level information (e.g. local government), simple and complex visualizations based on network analysis, linked information about payment entities and SPARQL endpoints. The dataset is interesting and useful for several potential applications. However, the paper provides too much administrative/descriptive information rather than the technical details expected for a Linked dataset. Therefore, the reviewer suggests that the paper be revised as follows.

Sections 3 and 7 need to be rewritten to describe technical aspects of the dataset, such as properties, links, data schema, ontology diagrams, and how the data is mapped or created. A modular description with diagrams and examples would make the dataset description easier to read. Please consult the papers of the previous volume.

Section 5 does not make clear how the linked dataset is used to build the visualizations. It would be more convincing and readable if it came with screenshots showing examples of how the links and properties are used to create the advanced

Section 6 should be revised to show how it is done, rather than how it looks and what functionalities are supported. More technical details would be highly recommended.

Section 7 should include more figures on the number of links, classes, outgoing links and links to external data sources. It should detail how the data/resource nodes are linked together, rather than only what types of links are created.