Review Comment:
This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.
The article deals with an important yet difficult issue, namely the use of direct spatial references for finding links between geospatial resources. It proposes a survey of 10 point set distance measures found in the scientific literature. The effectiveness of these measures for link discovery between geospatial resources is eventually assessed against a benchmark geographic dataset specifically created for this work. This systematic analysis aims at determining what measure should be preferably used for link discovery between geospatial resources.
The methodology applied for the systematic survey of the literature is clearly presented. The authors give the list of search engines, digital libraries and journals used for finding material on point set distance measures, the keywords applied on them for searching articles as well as the inclusion and exclusion criteria used for filtering the retrieved articles. The resulting list of distance measures is quite significant but seems incomplete: the article only focuses on distances between locations, although other geometrical properties like shape, size or orientation may be used to match geographical features (See for example (Arkin et al., 1991)). Besides, the retrieved distance measures consider geometries as ordered sets of discrete points and exclude distance measures handling geometries as continuous shapes (for example, the minimal distance between a given point and its projection on the exterior ring of a polygon). This is probably due to the use of a too restrictive list of keywords for articles search: why not using “geographical data matching”? The fact that geographical information science journals such as IJGIS, Transactions in GIS or Geoinformatica are not included in the list of surveyed journals is also surprising since this issue of geographical data matching based on vector geometries comparison has been addressed for many years by the geographical information science community. Unfortunately, this leads to a partial overview of the domain.
The article first explains very clearly the need for a direct spatial reference criterion for geospatial resources linking and describes how these direct spatial references are represented by the mean of vector geometries. The vector representation of Malta in 3 different datasets (Nuts, DBPedia and LinkedGeoData) is used to illustrate two common types of discrepancies between geographical datasets, here named measurement discrepancy and discrepancy in granularity by the authors. This shows well how heterogeneities between vector geometries may make link discovery difficult. However, this example provides an incomplete overview of geometrical heterogeneities that commonly occur between geographical datasets. First, it is assumed, although not explicitly stated, that all geometries are defined in the same coordinate reference system, namely WGS84. Besides, there are many types of geometrical heterogeneities caused by different levels of detail between geographic datasets that have been discussed more thoroughly by (Lemarié and Raynal, 1996), (Devogele et al., 1996) or (Devogele et al., 1998). As the list of research questions addressed in this article is based on this unique example of discrepancies, it remains too limited with regards to the announced objective of discussing what measure should be preferably used for geometry-based link discovery in any cases. Indeed, it is most regrettable that the use of different distance measures depending on the type, on the density or on the level of detail of the vector geometries to be compared is neither investigated nor discussed. Moreover, the combined use of several geometrical or topological properties is not covered, although it has been acknowledged as an efficient approach especially for link discovery between linestrings in network matching applications (Walter and Fritsch, 1999) (Lüscher et al., 2007) (Mustière and Devogele, 1998).
The systematic description of the retrieved distance measures in section 4 is quite clear, although the definition given for Hausdorff distance is in fact the oriented Hausdorff distance from S to T. Giving the distance value obtained from each measure on the Malta dataset example is also a good way to illustrate the behavior of each measure. However the value computed for the Fréchet distance seems unlikely with regards to the input geometries. As a matter of fact, when computing the discrete Fréchet distance with the open source GIS platform GeOxygene on WKT geometries described by points ordered in the same way and with the same coordinates as those presented in Fig. 1 for DBpedia and Nuts, I get approximately 34.6 km.
The experiments designed to answer the research questions stated initially are rather clearly presented. Although I agree with the authors on the need for benchmark geographic datasets, I am concerned that the modifiers proposed in this article may generate unrealistic datasets with invalid geometries which could lead to biased conclusions. As I understand how these modifiers work, the points used to describe polygons are discarded or displaced randomly, without any attempt to preserve neither the overall shape nor the topological relations of each original polygon. When applied on datasets representing a partition of space like the Nuts dataset, this may generate a topologically invalid dataset with self-intersecting polygons and sliver polygons. In such conditions, it is not surprising that the Fréchet distance, which is shape sensitive, may appear less robust than the mean distance against granularity and measurement discrepancies. Why not reusing simplification or displacement algorithms proposed in the field of cartographic generalisation for generating datasets with different levels of detail? (See (Regnauld and McMaster, 2007) for an overview of generalisation operators). Or more simply, why not using real datasets with different levels of detail, since Nuts codes could be used as keys for generating a reference link set? In addition, the proposed benchmark dataset, consisting only of polygons and points representing statistical units, seems too limited to draw valid conclusions for all types of geometrical link discovery use cases. For example, how would the presented distance measures behave on line-shaped road networks or on buildings modelled as polygons and points in dense urban areas? How effective would they be on heterogeneous land use classifications represented by polygons and requiring 1:n or n:m links discovery?
Finally, it would have been interesting to discuss the choice of the great-circle distance measure as a basic distance for all the evaluated measures. As a matter of fact, this distance assumes a spherical Earth and generates approximated distance values. Why not using a formula for distance on the ellipsoid as recommended in (Chrisman and Girres, 2013)? Or, assuming that distance errors would not affect link discovery results on a limited area, why not projecting first all data on a plane in order to work on Cartesian coordinates?
Arkin, E.M., Chew, L.P., Huttenlocher, D.P., Kedem, K., Kedem, K. et Mitchel, J.S.B. (1991). An efficient computable metric for comparing polygonal shapes. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991, vol. 13, num. 3, pp. 209-216.
Chrisman, N. and Girres J.-F. (2013). First, do no harm : Eliminating systematic error in analytical results of gis applications. In Proceedings of 8th International Symposium on Spatial Data Quality (ISSDQ'13), 30 May - 2 June 2013, Hong-Kong (China).
Devogele T., Trévisan J. and Raynal, L. (1996). Building a Multi-Scale Database with Scale-Transition Relationships. In Proceedings of 7th International Symposium on Spatial Data Handling, Delft, The Netherlands, M. Moleenar and M-J. Kraak (Eds.), Taylor & Francis, pp. 337-351.
Devogele T., Parent C. and Spaccapietra S. (1998). On spatial Database Integration. International Journal of Geographic Information Science - Special Issue: Interoperability in GIS, Taylor & Francis, vol. 12, num. 4, pp. 335-352.
Lemarié C., and Raynal L. (1996). Geographic data matching : First investigations for a generic tool. In Proceedings of GIS/LIS 96, ACSM Annual conference and Exposition, Denver, pp. 405-420, November 1996.
Lüscher, P., Burghardt, D. and Weibel R. (2007). Matching road data of scales with an order of magnitude difference. In Proceedings of the XXIII International Cartographic Conference, Moscow, August 2007.
Mustière, S. and Devogele, T. (2008). Matching networks with different levels of detail. In GeoInformatica, vol. 12, num. 4, pp. 435-453.
Regnauld N. and McMaster R. B. (2007). A Synoptic View of Generalisation Operators. In Mackaness, W., Ruas, A., Sarjakoski, T. (Eds.), Generalisation of Geographic Information: Cartographic Modelling and Applications, Chapter 3. Published on behalf of the International Cartographic Association by Elsevier.
Walter, V. et Fritsch, D. (1999). Matching Spatial Data Sets: Statistical Approach. In International Journal of Geographical Information Science, 1999, 13(5), p.445-473.
|