Abstract:
With the growing use of graph-based data on the web and concerns about the quality of published data, validating knowledge graphs has become increasingly important. The Shapes Constraint Language (SHACL) is a World Wide Web Consortium (W3C) recommendation for validating RDF graphs against predefined constraints.
Multiple SHACL engines have been developed that offer overlapping functionality but differ in several respects, including the data formats they can handle, their support for constraints and inference, how they report constraint violations, and whether they detect invalid entities early.
Some of these engines have been evaluated using performance benchmarks that rely entirely on partial or synthetic datasets and place little to no emphasis on conformance, which limits how well their results carry over to full-scale, real-world scenarios. Moreover, as applications demand faster and higher-quality validation, balancing efficiency against the correctness, completeness, and comprehensiveness of validation reports has become critical. In this paper, we present the ERA-SHACL-Benchmark, a comprehensive benchmark for evaluating SHACL engines based on real data and shapes used by the Register of Infrastructure (RINF) system of the European Union Agency for Railways (ERA). Our benchmark includes a suite of tests designed to assess engine correctness by comparing generated validation reports to expected outcomes, to measure performance in terms of load time, validation time, and memory usage, and to evaluate the completeness and comprehensiveness of the generated reports.
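To make the two kinds of measurement concrete, the following is a minimal sketch, not the benchmark's actual implementation. It assumes (hypothetically) that each reported violation can be reduced to a (focus node, result path, source shape) triple, so a generated report can be scored against an expected one with set-based precision and recall. A helper times an arbitrary step and records its peak Python memory. The names `score_report`, `measure`, and the `ex:` identifiers are illustrative only.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical simplification: a violation is identified by
# (focus node, result path, source shape).
Violation = Tuple[str, str, str]


@dataclass
class ReportScore:
    precision: float               # share of reported violations that were expected
    recall: float                  # share of expected violations that were reported
    missing: FrozenSet[Violation]  # expected but not reported
    spurious: FrozenSet[Violation] # reported but not expected


def score_report(generated: Iterable[Violation],
                 expected: Iterable[Violation]) -> ReportScore:
    """Compare a generated report with the expected one as sets of triples."""
    gen, exp = frozenset(generated), frozenset(expected)
    hits = gen & exp
    # An empty report (or empty expectation) is treated as trivially precise/complete.
    precision = len(hits) / len(gen) if gen else 1.0
    recall = len(hits) / len(exp) if exp else 1.0
    return ReportScore(precision, recall, exp - gen, gen - exp)


def measure(step: Callable[[], object]) -> Tuple[object, float, int]:
    """Run one step (e.g. loading or validating) and return
    (result, wall-clock seconds, peak traced Python memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = step()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    expected = {("ex:op1", "ex:length", "ex:LengthShape"),
                ("ex:op2", "ex:name", "ex:NameShape")}
    generated = {("ex:op1", "ex:length", "ex:LengthShape"),
                 ("ex:op3", "ex:length", "ex:LengthShape")}
    score, seconds, peak_bytes = measure(lambda: score_report(generated, expected))
    print(score.precision, score.recall)  # 0.5 0.5
    print(sorted(score.missing), sorted(score.spurious))
```

Reducing violations to triples deliberately ignores fields such as the message or severity. A real comparison of SHACL reports would probably need to decide which result properties count toward correctness. Note that `tracemalloc` only sees allocations made through Python, so an engine running in a separate JVM or native process would need OS-level memory measurement instead.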