Review Comment:
The article proposes an approach that makes use of a geographic reference dataset for matching (data linking) and visualizing thematic data described by heterogeneous spatial references.
The article first describes related work in instance matching and thematic mapping, then describes the proposed approach and defines it in set theory notation. The proposed approach is illustrated with two open thematic datasets of historical monuments in the city of Paris (France): the first, from DBPedia, has direct spatial references in the form of coordinates (latitude, longitude), while the second, Merimee, has indirect spatial references in the form of addresses (literal values). The geographic reference dataset consists of a set of polygons representing individual buildings in Paris (BD_PARCELLAIRE), and a set of structured addresses, each georeferenced to a geometric point (BD_ADRESSE). All the datasets were converted into RDF and stored in local triple stores.
The article presents an interesting case study of georeferencing and geovisualization in the Web of Data. In my opinion, the strength of the contribution lies in the fact that this case study was done in the Web of Data (RDF), which brings a different set of challenges than a similar case study with a desktop GIS or online mapping tools; there are also unique (promising) opportunities for the future. These challenges and opportunities should be elaborated in the discussion. At the same time, it should be acknowledged that the georeferencing and geovisualization techniques in the case study are not novel, but draw on existing best practices in cartography, geoinformatics and geographic information science.
Detailed comments:
The title refers to ‘geographic reference data’, while the article in some places refers to ‘background reference geodataset’. Terminology should be used consistently, e.g. use ‘background geographic reference dataset’ only.
Geocoding is an important part of the approach presented in the article, but literature on this topic is missing from the Related Works section. This should be added. See, for example, Goldberg et al. (2007), ‘From text to geographic coordinates: the current state of geocoding’, URISA Journal.
According to section 3, instance matching is usually based on measures that compare spatial references of the same type. This needs to be better qualified, as there are other ways of instance matching (some discussed in section 2), such as comparing property/attribute values or comparing descriptions of instances, which could involve multiple attributes.
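To illustrate what is meant by matching on multiple attributes rather than on spatial references, here is a minimal sketch. It is not taken from the article: the attribute names (`label`, `address`), the weights and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_similarity(rec_a, rec_b, weights):
    """Weighted mean of per-attribute similarities.

    Only attributes present in both records are compared; the attribute
    names and weights are hypothetical, not those of the article.
    """
    shared = [k for k in weights if k in rec_a and k in rec_b]
    if not shared:
        return 0.0
    total = sum(weights[k] for k in shared)
    score = sum(weights[k] * text_similarity(rec_a[k], rec_b[k]) for k in shared)
    return score / total


# Hypothetical records standing in for two thematic datasets.
monument_a = {"label": "Tour Eiffel", "address": "5 avenue Anatole France"}
monument_b = {"label": "tour eiffel", "address": "5 Avenue Anatole France"}
monument_c = {"label": "Panthéon", "address": "Place du Panthéon"}

weights = {"label": 2.0, "address": 1.0}
same = record_similarity(monument_a, monument_b, weights)
different = record_similarity(monument_a, monument_c, weights)
```

In this sketch the pair that differs only in capitalisation scores higher than the unrelated pair; a real matcher would also need a decision threshold, which is deliberately left out here.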
The proposed approach references DBPedia locations (coordinates) to buildings in the BD_PARCELLAIRE dataset (polygons) based on shortest distance. Is there a reason why a ‘within building’ or ‘within buffer around the building’ test was not used? The same question arises for matching BD_ADRESSE to BD_PARCELLAIRE. Since these are two official reference datasets, one would expect their quality to be such that there is a known relationship between a building and an address, e.g. the address is within a building or at a specific distance/location from the building. Using the shortest distance for matching needs to be explained and justified.
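The difference between the three options can be made concrete with a small sketch. This is my own illustration, not the authors' implementation: it assumes planar coordinates in metres (e.g. a projected coordinate system), simple polygons without holes, and hypothetical building identifiers. The function names and the `buffer_m` parameter are invented for this example.

```python
from math import hypot


def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies inside the simple polygon poly."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dist_to_segment(pt, a, b):
    """Euclidean distance from pt to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_to_polygon(pt, poly):
    """0 if pt is inside poly, else the distance to the nearest edge."""
    if point_in_polygon(pt, poly):
        return 0.0
    n = len(poly)
    return min(dist_to_segment(pt, poly[i], poly[(i + 1) % n]) for i in range(n))


def match_building(pt, buildings, buffer_m=None):
    """Return the id of the closest building, or None if it is too far.

    buffer_m=None : unconstrained shortest distance
    buffer_m=0    : 'within building' only
    buffer_m>0    : 'within buffer around the building'
    """
    best_id, best_d = min(
        ((bid, dist_to_polygon(pt, poly)) for bid, poly in buildings.items()),
        key=lambda pair: pair[1],
    )
    if buffer_m is not None and best_d > buffer_m:
        return None
    return best_id


# Hypothetical building footprints (10 m squares).
buildings = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
```

With unconstrained shortest distance, a point 70 m away from any footprint is still matched to a building, whereas the ‘within’ and ‘within buffer’ variants return no match. This is the behaviour I would like the authors to discuss.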
Section 5 describes visualization of the results to illustrate the usability of the proposed approach. A large part of this section explains current knowledge, e.g. descriptions of grouping and amalgamation, and the algorithm for feature amalgamation. While this article presents an interesting new case of amalgamation in the Web of Data, amalgamation itself is not new. The focus should shift to the challenges and opportunities of doing amalgamation in the Web of Data, rather than an explanation of amalgamation itself. Also, can anything be said about the performance? Was the visualization produced in an acceptable time period?
A major shortcoming of the article is that there is no discussion of the results. For example, would this approach be generally applicable to all kinds of datasets and all kinds of spatial references? The approach is based on the assumption that points (very specific) can be generalized to polygons (larger area). This works for locations and buildings, but what about other datasets? Or if a polygon dataset is not available?
The results of the proposed approach depend on the quality of the datasets that are used. For example, if more (or fewer) of the addresses in the Merimee dataset were incomplete or invalid, this would have resulted in poorer (or better) matching. The same applies to the coordinates in the DBPedia dataset. This needs to be acknowledged when the results are discussed. It should be recommended that future work test the approach against larger datasets of varying quality, to evaluate both the time performance and the quality of the results.
The approach is tailored to specific datasets. For example, in other countries/regions, datasets of building polygons exist that include the building address as an attribute. In such cases the approach could be simplified:
Merimee → BD_PARCELLAIRE
DBPedia → BD_PARCELLAIRE
Such assumptions and limitations of the approach (pointed out above) should be thoroughly discussed in a separate discussion section. The discussion of the results should also emphasize the contribution of the article to data matching and geovisualization in the Web of Data specifically.
Language
The language in the article is generally acceptable; however, I found it difficult to follow the flow in some parts of the article. For example, the references to different matching tasks and sub-tasks in section 4.4 are difficult to follow. Consider adding an overview of tasks or assigning unique names to different tasks to make it easier to follow (e.g. by adding matching task numbers or unique names to Figure 3). This should also be applied to other parts of the article to improve its readability.
There are some typos and grammatical errors that need to be corrected.