S-Match: an open source framework for matching lightweight ontologies

Paper Title: 
S-Match: an open source framework for matching lightweight ontologies
Authors: 
Fausto Giunchiglia, Aliaksandr Autayeu and Juan Pane
Abstract: 
Achieving automatic interoperability among systems with diverse data structures and languages expressing different viewpoints is a goal that has been difficult to accomplish. This paper describes S-Match, an open source semantic matching framework that tackles the semantic interoperability problem by transforming several data structures, such as business catalogs, web directories, conceptual models and web service descriptions, into lightweight ontologies and establishing semantic correspondences between them. The framework is the first open source semantic matching project that includes three different algorithms tailored for specific domains and provides an extensible API for developing new algorithms, including the possibility to plug in specific background knowledge according to the characteristics of each application domain.
Submission type: 
Tool/System Report
Responsible editor: 
Jie Tang
Decision/Status: 
Accept
Reviews: 

This is a revised manuscript, which has now been accepted for publication. The reviews below are for the original submission.

Review 1 by Ming Mao:

This paper describes an open source semantic matching framework, called S-Match, which tackles the semantic interoperability problem by transforming tree-like data structures into lightweight ontologies and establishing semantic correspondences between them. The framework includes 3 algorithms for basic semantic matching, minimal semantic matching and structure preserving semantic matching. The S-Match architecture also provides an extensible API for developing new algorithms and plugging in specific background knowledge, which brings great flexibility for exploiting different matching algorithms. As an open source ontology matching framework, S-Match will definitely lower the barriers for people to take advantage of semantic technologies.

The paper is well-written, and the logic is clear and thus easy to follow. However, it would be better if the authors described in more detail how the classifier and decider packages work, and explained whether two Oracles are needed in the architecture because of performance issues.

Review 2 by Wei Hu:

In this paper, the authors give an overview of S-Match, an ontology matching tool that has been under continuous development for several years. It is currently an open source tool and has contributed many solid ideas to the ontology matching community.

In general, this paper is clearly written, easy to understand and has enough details. The reason why I gave a minor revision is that I expect the authors to add some comparison and citation to existing works.

(1) Please add a (brief) description of the performance (precision, recall, run-time, etc.) on published benchmarks, such as OAEI. The purpose is not to compare with others, but at least to give users an intuition of the strengths and weaknesses of the tool.

(2) Falcon-AO is also an open source tool under the Apache license. For the GUI, COMA++ provides a similar display panel. Although the intention of the paper is to introduce the architecture of S-Match, some important related references should be highlighted.
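For reference, the measures requested in (1) are usually computed by comparing a produced alignment with a reference alignment. The sketch below models each correspondence as a hashable tuple such as (source_path, relation, target_path); that representation, and the convention of returning 0.0 for an empty alignment, are assumptions for illustration, not how OAEI tooling stores alignments.

```python
def evaluate(found, reference):
    """Precision, recall and F-measure of a found alignment vs. a reference.

    Both arguments are collections of hashable correspondences, e.g.
    (source_path, relation, target_path) tuples (illustrative representation).
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)  # correspondences present in both
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With three found correspondences of which two appear in a four-element reference alignment, this yields precision 2/3, recall 1/2 and F-measure 4/7.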

My comments on each section are as follows.

In Sect. 1, since S-Match has been improved over the years, I would suggest that the authors add a very short description of its development history, for example, the year the project started.

In Sect. 2, regarding the tree structure: can S-Match deal with directed acyclic graphs? Besides, could you provide some examples of (tree-structured) ontologies here, for example, UNSPSC or the Yahoo directory?

In Sect. 3, I think citing the book "Ontology Matching" instead of [8] is better.

In Sect. 3.1, a space is missing between "of" and "humanistic discipline".

In Sect. 3.2, since the "more general", "less general", etc. relations are considered, computing the minimal semantic matching is very important, as it can prevent many trivial mappings in practice.
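The redundancy mentioned here can be illustrated with a naive post-filter over a complete mapping. In a lightweight ontology a node's concept is subsumed by the concepts of its ancestors, so some correspondences follow from others. This is a simplified sketch, not the minimal matching algorithm described in the paper (which computes the minimal mapping directly); it assumes nodes are root-to-node label tuples, handles only the "<", ">" and "!" relations, and leaves "=" untouched.

```python
def is_ancestor_or_self(a, b):
    """True if path a equals path b or is a proper prefix of it."""
    return len(a) <= len(b) and b[: len(a)] == a


def is_redundant(m, mappings):
    """Can correspondence m be derived from another one in `mappings`?"""
    src, rel, tgt = m
    for other in mappings:
        if other == m or other[1] != rel:
            continue
        src2, _, tgt2 = other
        if rel == "<" and is_ancestor_or_self(src2, src) and is_ancestor_or_self(tgt, tgt2):
            return True  # src ⊑ src2 ⊑ tgt2 ⊑ tgt
        if rel == ">" and is_ancestor_or_self(src, src2) and is_ancestor_or_self(tgt2, tgt):
            return True  # src ⊒ src2 ⊒ tgt2 ⊒ tgt
        if rel == "!" and is_ancestor_or_self(src2, src) and is_ancestor_or_self(tgt2, tgt):
            return True  # src ⊑ src2, tgt ⊑ tgt2, and src2, tgt2 are disjoint
    return False


def minimize(mappings):
    """Keep only the correspondences that no other one makes redundant."""
    return [m for m in mappings if not is_redundant(m, mappings)]
```

For example, once Thing\Human\Female\Mother is known to be more general than Thing\Mother\GrandMother, the same statement for the ancestor Thing\Human\Female adds nothing and is dropped.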

For Sect. 3.3, it seems that SPSM violates the natural heterogeneity of ontologies. Can you give an explanation of this?

For Fig. 5, what do "offline" and "online" mean?

I think that Fig. 6 is straightforward to understand, so it is not necessary to give such a detailed explanation.

Review 3 by Shenghui Wang:

This paper describes the S-Match framework for matching lightweight ontologies, which are transformed from taxonomies, catalogues, web directories, etc. S-Match is one of the mature ontology matching systems and is openly available. The extensible API, command-line and GUI interfaces provide access to users with different needs. It is highly valuable for the ontology matching community, as well as for the journal. As a system paper, it is clearly written, with relatively detailed descriptions of the matching algorithms, the framework and the different interfaces.

Comments:

1. One thing not clear to me is how well S-Match performs in the standard OAEI campaign. The authors only mention that they have been providing datasets for the OAEI in the past 5 years, which is indeed rather valuable for the community, but does not strengthen the paper itself.

2. The input formats are tab-indented text and XML, while more and more data are available as RDF triples (or even OWL). It would be more flexible to have extra components transforming standard Semantic Web formats into the format the system can work with.

3. I presume the matching is currently done only at the schema level. How about using the instance data in the matching procedure?

4. The literature review is very poor. With half of the references being self-citations, the authors only point to the general book by Euzenat and Shvaiko. It would be much more convincing if some similar systems were also introduced, preferably with a performance comparison.

5. The example given in Section 3.3 (Structure Preserving Semantic Matching) does not show the advantage of the SPSM algorithm. Instead, the resulting mappings seem problematic to me. The SPSM algorithm may be more suitable for matching functions such as web service descriptions or APIs, as the authors claim. It would be better to include corresponding examples to support this claim.

6. As listed on the project website, quite a few external projects have used S-Match, although it is not clear to what extent the listed projects used or are still using the system. The paper would be considerably strengthened if examples of usage were given.

Comments

Summary: The work presents a lightweight ontology matching system (S-Match). The work claims that the system can be used to match various kinds of data structures, such as directory structures and catalogs, by converting them into lightweight ontologies, which are then matched using S-Match. Three different matching algorithms are available via S-Match, namely the basic matcher, the minimal matcher and the structure preserving matcher. The S-Match tool provides a GUI as well as an extensible API for using the system. The tool is definitely NOT a prototype, as it scales fairly well with the size of the ontologies. Further, it is open source, fairly easy to use and available for download.

Comments: The paper is very well written and describes the various components of the system very nicely. However, there are a few comments that can help make the current draft stronger.

1. For a systems paper, in my personal opinion, it is helpful if the authors present an example showing the various steps of processing the input. It helps in understanding the tasks performed by the various components of the system.
2. I have personally used S-Match and I think it is a fairly sturdy and robust system to use. However, the one issue I faced with S-Match is the form of input expected by the system. The reliance on the tab-indented format makes it difficult to use without doing some preprocessing on the input data. Besides Protégé, it is hard to identify any other tools that can be used to convert from standard ontology serialization formats (RDF/XML, Turtle) to the tab-indented format.
3. Similarly, the output produced by the system is not the easiest to process, as it is written in a line-by-line format, using something like a directory structure representation. For example:
Thing\Human\Female\Mother> Thing\Mother\GrandMother

While I agree with the authors that this is fairly intuitive for a human being to read and understand, it requires quite an effort if applications have to consume this form of mapping. In certain scenarios, such as large ontologies, it is quite a task to process them manually. The future work mentioned by the authors, converting the format of the results to the Alignment API, is a step in the right direction.
4. Is there any plan to extend S-Match to use sources besides WordNet? Though WordNet is a standard oracle for ontology matching tools, it is probably not best suited for domain-specific ontologies such as life science ontologies. Further, it does not really capture all the intended senses.
5. Minor point: I personally do not agree with the example presented in the paper of car being semantically equivalent to automobile. It should probably be a subclass relation.
6. Are there any thoughts about extending the GUI client (and/or the CLI) to specify a cut-off threshold for finding the matches? This could help improve the precision of the results retrieved.
7. The authors have cited themselves quite heavily: 8 of 16 citations are self-citations. Apart from one paper on WordNet, the remaining 7 are Ontology Matching workshop reports. While I understand they are relevant, there is also work by other authors in the same spirit that could be cited, for example: http://semmf.ag-nbi.de/doc/index.html . I personally do not have a paper in this field yet, and hence I am not seeking citation of my own paper.
8. The authors could show via an example how the information related to the path of a node plays a role in the matching process. This is an important component, which requires more detail.
9. The authors probably need to include a discussion of the possible usages of the tool. On the surface, yes, ontology matching is an application. Another usage, which the authors mention in passing, is Web service composition. But are there any real use cases for it? Are there any scenarios where the tool has been successfully used and has had a widespread impact?

Dear Prateek,

Thank you for such a detailed comment. I would like to address some points of your comment below.

2. [cut]However, the one issue I faced with S-Match is the form of input which is expected by the system. The reliance on tab intended format, makes it difficult to use it without doing some pre processing on the input data. [cut]
This format is useful because it contains only the things that are actually used for matching, and therefore it leaves no room for confusion. Richer formats may contain other things (such as properties or data types) that are not used, but their presence in the format may mislead some users into believing they are used.

In addition, this is arguably the simplest format possible, which makes it a good exchange format. It is the lowest common denominator among all the things S-Match is able to match (ontologies are not the only things out there). It is also very easy to edit: a simple text editor is available everywhere.
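To show how little is needed to consume this format, here is a minimal reader that turns each line into the root-to-node path of labels. It assumes one node label per line, one leading tab per level of depth, and that a child is indented exactly one tab deeper than its parent; S-Match's own loader may apply different rules, and the function name is hypothetical.

```python
def parse_tab_indented(text):
    """Return the root-to-node label path of every node, in file order."""
    paths = []
    stack = []  # labels of the current node's ancestors, then the node itself
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines
        depth = len(raw) - len(raw.lstrip("\t"))  # number of leading tabs
        if depth > len(stack):
            raise ValueError(f"line {lineno}: indented more than one level deeper")
        del stack[depth:]  # leave only this node's ancestors on the stack
        stack.append(raw.strip())
        paths.append(tuple(stack))
    return paths
```

For a file whose lines are Thing, then a tab and Mother, then two tabs and GrandMother, the last path returned is ("Thing", "Mother", "GrandMother").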

Finally, in a recent release we have added OWL and SKOS readers to S-Match, see here:
http://semanticmatching.org/javadocs/it/unitn/disi/smatch/loaders/contex...
and here
http://semanticmatching.org/javadocs/it/unitn/disi/smatch/loaders/contex...
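For sources without a dedicated reader, producing the tab-indented format from an is-a hierarchy is also short. This sketch assumes the (child, parent) label pairs have already been extracted, for example from rdfs:subClassOf or skos:broader statements (that extraction is not shown, and the function name is hypothetical). A label with several parents, as in a directed acyclic graph, is written once under each parent, which is one simple way of unfolding a DAG into a tree.

```python
from collections import defaultdict


def to_tab_indented(pairs):
    """Render (child, parent) label pairs as tab-indented text.

    Roots are labels that never appear as a child. A label with several
    parents is written under each of them. A cycle reachable from a root
    raises ValueError; labels reachable only through a cycle are not written.
    """
    children = defaultdict(set)
    labels, has_parent = set(), set()
    for child, parent in pairs:
        children[parent].add(child)
        labels.update((child, parent))
        has_parent.add(child)
    lines = []

    def walk(label, depth, ancestors):
        if label in ancestors:
            raise ValueError(f"cycle through {label!r}")
        lines.append("\t" * depth + label)
        for child in sorted(children[label]):
            walk(child, depth + 1, ancestors | {label})

    for root in sorted(labels - has_parent):
        walk(root, 0, frozenset())
    return "\n".join(lines)
```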

3. [cut]Similarly, the output produced by the system is not the easiest to process as it produces the output in a line by line format[cut]
The reasoning from 2. applies here as well.

[cut]The future work mentioned by authors to convert the format of results to Alignment API is a step in the right direction.[cut]
This is already implemented; see the demos/alignapi folder in the distribution.
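For applications that read the plain-text output directly, a line such as the one quoted in the comment can be split with a small regular expression. This sketch assumes each line holds a source path, one of the relation symbols <, >, = or !, and a target path, with labels separated by backslashes and not containing those symbols; the actual renderer's separators and symbol set may differ, and the names below are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class Correspondence(NamedTuple):
    source: Tuple[str, ...]
    relation: str
    target: Tuple[str, ...]


# The lazy source group stops at the first relation symbol; whitespace
# around the symbol (spaces or tabs) is optional.
_LINE = re.compile(r"^(?P<src>.+?)\s*(?P<rel>[<>=!])\s*(?P<tgt>.+?)\s*$")


def _path(text: str) -> Tuple[str, ...]:
    """Split a backslash-separated node path into its labels."""
    return tuple(text.strip().split("\\"))


def parse_mapping_line(line: str) -> Correspondence:
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a mapping line: {line!r}")
    return Correspondence(_path(match["src"]), match["rel"], _path(match["tgt"]))
```

Each parsed record can then be serialized into whatever structure the consuming application expects.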

4. [cut]Is there any plan to extend S-Match to use sources besides WordNet?[cut]
S-Match already allows you to use other sources; more information on that is here: http://sourceforge.net/apps/trac/s-match/wiki/HowToUseOtherKnowledgeBases For example, there are other WordNet-like databases, such as The Stanford Wordnet Project (although it contains some bugs in its data files and the complete version is inaccessible). Moreover, there is GeoWordNet, which we are planning to make available in other formats.

In addition, S-Match contains easy-to-implement interfaces that allow you to plug in your own knowledge base:
http://semanticmatching.org/javadocs/it/unitn/disi/smatch/oracles/ILingu...
http://semanticmatching.org/javadocs/it/unitn/disi/smatch/oracles/ISense...

5. [cut]Minor point: I personally do not agree with the example presented in paper of car being semantically equivalent to an automobile. Probably it should be subclass relation.[cut]
We are working to make it easy to personalize a knowledge base.
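The idea of pluggable and personalized background knowledge can be sketched in Python as follows: a user-supplied source is consulted first and a general one serves as fallback, so a user could declare car less general than automobile. The class and method names are hypothetical and do not mirror the Java interfaces linked in the answer to point 4.

```python
from typing import Dict, Optional, Protocol, Tuple


class BackgroundKnowledge(Protocol):
    """Hypothetical interface: relation between two words, or None if unknown."""

    def relation(self, word1: str, word2: str) -> Optional[str]: ...


class DictKnowledge:
    """Background knowledge backed by a dictionary of word-pair relations."""

    _INVERSE = {"<": ">", ">": "<", "=": "=", "!": "!"}

    def __init__(self, facts: Dict[Tuple[str, str], str]):
        self.facts = dict(facts)

    def relation(self, word1: str, word2: str) -> Optional[str]:
        if (word1, word2) in self.facts:
            return self.facts[(word1, word2)]
        reverse = self.facts.get((word2, word1))
        # A fact stated in the other direction is answered with its inverse.
        return self._INVERSE[reverse] if reverse is not None else None


class LayeredKnowledge:
    """Ask each source in order; the first one that knows the answer wins."""

    def __init__(self, *layers: BackgroundKnowledge):
        self.layers = layers

    def relation(self, word1: str, word2: str) -> Optional[str]:
        for layer in self.layers:
            rel = layer.relation(word1, word2)
            if rel is not None:
                return rel
        return None
```

Ordering the layers, rather than merging them, keeps the general source untouched while letting each user override individual facts.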

6. [cut]Are there any thoughts about extending the GUI client (and or the CLI) to specify a cut off threshold for finding the matches? This can probably help in improving the precision of the results retrieved.[cut]
S-Match already contains a configuration system, which allows you to tweak every parameter of the matching; peek into s-match.properties, which contains usage comments. It is also described in the S-Match Manual: http://sourceforge.net/apps/trac/s-match/wiki/Manual

8. [cut]The authors can probably show via an example as to how the information related to path of node, plays a role in the matching process. This is an important component, which requires more details.[cut]
This is described in the theoretical papers; see, for example, the Lightweight Ontologies section: http://semanticmatching.org/publications.html
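As a toy illustration of why the path matters: in lightweight ontologies the concept of a node is the conjunction of the concepts of the labels on its root-to-node path, and the relation between two nodes is decided by logical reasoning over those concepts plus background axioms. The sketch below is a deliberate simplification, not S-Match's implementation: lowercased labels stand in for word senses as propositional atoms, axioms are given by hand, entailment is checked by enumerating truth assignments rather than with a SAT solver, and the "?" symbol for "no relation derived" is an assumption.

```python
from itertools import product


def node_concept(path):
    """Concept of a node: conjunction of its path labels, as a set of atoms.
    Lowercasing stands in for real linguistic preprocessing (simplification)."""
    return frozenset(label.lower() for label in path)


def _models(atoms, implies, disjoint):
    """Yield every truth assignment over `atoms` that satisfies the axioms."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(v[a] <= v[b] for a, b in implies) and not any(
            v[a] and v[b] for a, b in disjoint
        ):
            yield v


def _holds(concept, v):
    return all(v[a] for a in concept)


def relation(src_path, tgt_path, implies=(), disjoint=()):
    """Return '=', '<', '>', '!' or '?' for two nodes given background axioms.

    implies: pairs (a, b) meaning a ⊑ b; disjoint: pairs (a, b) meaning a ⊓ b = ⊥.
    Exponential in the number of atoms, so only suitable for toy examples.
    """
    c1, c2 = node_concept(src_path), node_concept(tgt_path)
    axiom_atoms = {x for pair in (*implies, *disjoint) for x in pair}
    atoms = sorted(c1 | c2 | axiom_atoms)
    models = list(_models(atoms, implies, disjoint))
    less = all(_holds(c2, v) for v in models if _holds(c1, v))  # c1 ⊑ c2
    more = all(_holds(c1, v) for v in models if _holds(c2, v))  # c2 ⊑ c1
    if less and more:
        return "="
    if less:
        return "<"
    if more:
        return ">"
    if not any(_holds(c1, v) and _holds(c2, v) for v in models):
        return "!"
    return "?"
```

With the axioms grandmother ⊑ mother, mother ⊑ female and female ⊑ human, the source node Thing\Human\Female\Mother comes out more general than Thing\Mother\GrandMother, consistent with the ">" in the output line quoted in the comment; without any axioms, no relation is derived.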

9. [cut]The authors probably need to include a discussion about the possible usage of the tool.[cut]
The list of projects where S-Match is used can be found here: http://semanticmatching.org/projects.html . In a way similar to the one you mentioned, S-Match is used in the OpenKnowledge project.

best regards,
Aliaksandr Autayeu