ML-Schema: An interchangeable format for description of machine learning experiments

Tracking #: 2134-3347

Gustavo Publio
Agnieszka Lawrynowicz
Larisa Soldatova
Pance Panov
Diego Esteves
Joaquin Vanschoren
Tommaso Soru

Responsible editor: 
Guest Editors Semantic E-Science 2018

Submission type: 
Ontology Description
In this paper, we present the ML-Schema, proposed by the W3C Machine Learning Schema Community Group. ML-Schema is a top-level ontology that provides a set of classes, properties, and restrictions for representing and interchanging information on machine learning algorithms, datasets, and experiments. ML-Schema, a canonical format, resulted of more than seven years of experience of different research institutions. We discuss the main challenge in the development of ML-Schema, which have been to align existing machine learning ontologies and other relevant representations designed for a range of particular purposes following sometimes incompatible design principles, resulting in different not easily interoperable structures. The resulting ML-Schema can now be easily extended and specialized allowing to map other more domain-specific ontologies developed in the area of machine learning and data mining.

Major Revision

Solicited Reviews:
Review #1
By Stefan Dietze submitted on 01/Mar/2019
Major Revision
Review Comment:

This ("ontology") paper describes "ML Schema", an upper-level vocabulary for representing machine learning experiments. The schema (a specification from 2016 is available online) is the outcome of a community effort (W3C ML Schema Community Group) and is presented in this paper.

Positive points:
- Mostly easy to read and follow paper.
- Timely topic of relevance for the SWJ community.
- Some initial tool support available (e.g. export from OpenML)

Negative points:
- Schema appears to still lack maturity in parts (e.g. when it comes to a clear definition of concepts and a sound distinction between classes/subclasses and instances, i.e. TBox/ABox).
- It seems unclear if the schema is generic enough to capture all kinds of ML equally well (details below).
- Adoption is unclear: given that the schema has been out there for three years and has been a long time in the making, I was disappointed not to see any proof of actual adoption, e.g. data about real-world ML experiments being publicly shared using ML Schema.
- Presentation is not ideal and raises questions (details below).
- No actual instances are provided (or KBs based on the schema) and it is unclear what kind of inference is meant to be supported by the schema (if any) and how it actually supports interoperability. Here, some real-world use cases/data and examples of how it helps answer certain competency questions would be useful.

Detailed comments:
The topic is timely and the presented approach of offering a top-level ontology able to link/bridge between different related vocabularies seems reasonable. The paper is easy to read and some tooling is mentioned, e.g. able to export OpenML data following the ML schema.

While the paper is reasonably well-written, the structure and the lack of a general overview make certain parts hard to follow. For instance, the set of properties in Section 2.2 is hard to understand and assess without a more general overview of the entire model and its classes first. Some questions arise already here about the distinction between the schema and the instance level: e.g. hasHyperParameters is defined as a relation between an "implementation" (of an ML "algorithm") and its hyperparameter. First of all, the relation should be named "hasHyperParameter" (as it is instantiated for a single hyperparameter). Second, the distinction between "model" and "implementation" at schema and instance level seems blurred. A model supposedly is the instantiation of a particular implementation (e.g. a trained random forest "model" for a particular task, using the Weka implementation). Here, an "implementation" (and/or an "algorithm") has hyperparameters ("number of trees") but the model itself would be associated with the instances of the hyperparameters (e.g. "number of trees=20"). How is this intended and what is the modeling approach behind it?
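To make the question concrete, here is a minimal sketch of the separation I would expect, with plain Python tuples standing in for RDF triples. Everything in it is an assumption for illustration: the `ex:` identifiers are hypothetical, `mls:hasHyperParameter` is the singular name suggested above, and `ex:hasSetting`, `ex:settingOf` and `ex:hasValue` are placeholders rather than terms taken from the specification.

```python
# Triples as (subject, predicate, object) tuples; a stand-in for RDF.
# All "ex:" names are hypothetical. "ex:hasSetting", "ex:settingOf" and
# "ex:hasValue" are placeholders, not confirmed ML-Schema terms.
triples = [
    # Schema-like statement: the implementation declares a hyperparameter.
    ("ex:wekaRandomForest", "rdf:type", "mls:Implementation"),
    ("ex:wekaRandomForest", "mls:hasHyperParameter", "ex:numTrees"),
    # Instance-level statement: a trained model fixes a concrete value.
    ("ex:irisForestModel", "rdf:type", "mls:Model"),
    ("ex:irisForestModel", "ex:hasSetting", "ex:numTrees20"),
    ("ex:numTrees20", "ex:settingOf", "ex:numTrees"),
    ("ex:numTrees20", "ex:hasValue", 20),
]


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def hyperparameter_values(model):
    """Map each hyperparameter to the value fixed for the given model."""
    values = {}
    for setting in objects(model, "ex:hasSetting"):
        for param in objects(setting, "ex:settingOf"):
            for value in objects(setting, "ex:hasValue"):
                values[param] = value
    return values


declared = objects("ex:wekaRandomForest", "mls:hasHyperParameter")
configured = hyperparameter_values("ex:irisForestModel")
```

In this reading, `declared` lists what the implementation exposes, while `configured` records what a particular model was trained with; the paper should state whether this, or something else, is the intended pattern.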

Typically, different "configurations" of models are tested (e.g. the same SVM model trained on the same data but with different hyperparameters, or a model for the same task but using different variations of training data, such as balanced/unbalanced). Wouldn't it make sense to introduce the notion of a configuration here? Also, such matters are very different depending on the task type (supervised/unsupervised) and I don't see how these differences are accounted for by the schema.

A minor comment in this regard: "hasOutput" may also be confusing, given that here "output" refers to the model itself, but in traditional neural network settings, one would use "output" to refer to the prediction output of a model.

Also, what do you mean by "entities" in the description of the "hasQuality" property and by "information content entity" in the "specifiedBy" property? That also seems rather unclear.

A similar problem arises with some of the class descriptions in the paper. Looking at Table 1 ("Task"), it describes as "example classes" different ML task types (classification, clustering etc.). These would be *subclasses*, I suppose? Then you describe an example instance ("Classification on Dataset Iris"), but at the same time you refer to the OpenML "TaskType" where, I suppose, instances of task types are the actual types (e.g. "classification", "clustering") and not the actual tasks ("classification of X"). In general, it is unclear in the tables what is meant by "relation with aligned ontologies" (what kind of "relation", equivalence?).

The same problem is apparent with the "EvaluationMeasureClass" (Table 8): the "subclass" examples are "ClassificationMeasure", "RuntimeMeasure" etc. and the individuals are described as "RMSE" etc. Wouldn't RMSE just be another subclass (a specific type of measure) and the instance would be "RMSE = 0.6"? The authors should be clearer here, better define the schema and also illustrate its use with examples. At the moment, one can only assume that it is not entirely thought through and that instantiating the model will raise a number of questions.
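The alternative reading I have in mind can be sketched as follows, again with Python tuples in place of RDF triples. The names `mls:EvaluationMeasure`, `ex:RegressionMeasure`, `ex:RMSE`, `ex:rmseOfRun1` and `ex:hasValue` are assumptions made for this illustration and are not taken from the specification.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# RMSE modelled as a subclass; the measured value is the individual.
# All identifiers are hypothetical and used for illustration only.
measure_triples = [
    ("ex:RegressionMeasure", SUBCLASS_OF, "mls:EvaluationMeasure"),
    ("ex:RMSE", SUBCLASS_OF, "ex:RegressionMeasure"),
    ("ex:rmseOfRun1", TYPE, "ex:RMSE"),
    ("ex:rmseOfRun1", "ex:hasValue", 0.6),
]


def superclasses(cls, triples):
    """Collect all classes reachable from cls via rdfs:subClassOf."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def is_instance_of(individual, cls, triples):
    """True if the individual is typed with cls or one of its subclasses."""
    direct = {o for s, p, o in triples if s == individual and p == TYPE}
    return any(cls == d or cls in superclasses(d, triples) for d in direct)
```

Under this modelling, `ex:rmseOfRun1` is an instance of the general measure class by subclass inference, whereas `ex:RMSE` itself is not an instance of anything. If the authors instead intend RMSE to be an individual, the paper should say where the measured value is attached.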

When describing the "data" class, you define it as "a data item composed of data examples". What do you mean by data examples? Instances? What would be properties here? Aren't there vocabularies such as DCAT which could be used here in addition? IMO describing the dataset in a reproducible way is a huge challenge, but one could refer to VoID or DCAT and the like and make sure that a URL/identifier is provided from which data could be obtained. This would be one of the most crucial properties and contributions for ensuring that the ML models actually are reproducible and understandable.

The "model" class states: "we define Model as a generalisation of a set of training data able to predict values for unseen instances". This (a) is unclear (what do you mean by a "generalisation of a set of training data"?) and (b) would cover certain task types (e.g. classification/regression) but not others (clustering) which are unsupervised. This has reinforced my overall impression that the schema is not as generic as it intends to be and may not cover ML in all its diversity. With respect to classification/regression, the model should be the output of a particular "run", which in turn is a "run" of a "data"/"implementation" combination.

Similar doubts apply to the "run" class. A "run" of a clustering "implementation" (say k-means) is a very different case from a "run" of a classification "implementation". One produces clusters (i.e. the direct outputs), the other produces a model from which to generate outputs (e.g. labels/classes). The schema does not seem to cater for this kind of diversity. Also, and this seems even more crucial, it is not clear whether the "run" here reflects the training stage (of supervised models) or the test/classification stage.

In Table 9 ("Study") it is also not clear what is meant here ("a collection of runs").

In summary, while I believe this is a worthwhile effort, the paper (and schema) requires more clarity, important questions regarding the instantiation of the model should be addressed, and examples of use should be provided to illustrate its impact and demonstrate that the schema actually adds value to the problem of understanding, finding and interpreting ML experiments. At the moment, this is not supported by the paper, even though the latest schema spec has been out there since 2016.

I do hope you'll find this feedback useful.


More minor comments:
- Figure 1 is not very clear and not very well described.
- beginning of section 3.4 "a a prior...."

Review #2
Anonymous submitted on 22/Apr/2019
Review Comment:

This manuscript was submitted as 'Ontology Description' and should be reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided). (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.

Concerning relevance, this topic is very relevant because a common framework to represent machine learning experiments would be very useful for the community. However, concerning quality I see several drawbacks:
- The use cases are not really use cases. I see theoretical use cases, not specific cases of usage. I miss real RDF examples. Three use cases are presented in the paper: the first is focused on provenance, but no real use cases are provided. The second use case is focused on OpenML, where it is said that the OpenML site uses the proposed MLS ontology. I have double-checked this without positive results. As far as I can see, there is no such ontology, nor does the site use MLS. For instance, I can get the RDF of the page (a flow) from my browser with

And this is the result:




The third use case is focused on deep learning, but there are no real cases, only a theoretical explanation and the mention of a possible extension of MLS.
Also related to the quality of the ontology, I have tested the proposed MLS ontology with Oops! and there are 4 important pitfalls.

Concerning "Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology", I would recommend a careful check of the paper to fix typos and confusing sentences. See the typos section below.

As it seems there are no real usages, my recommendation is to provide "converters". The paper mentions an OpenML converter. Perhaps, with some usage examples, we could see real usage cases. Other converters for MyExperiment or other e-Science platforms would be valuable.

Concerning the structure of the paper, I consider that 5 pages out of 11 for Section 3 (alignment) is excessive. There is too much detail in Section 3 compared to the lack of detail in the remaining sections. Additionally, I miss in Section 3 an explanation of the reasons for choosing the alignments in MLS; all the effort is focused on describing the alignment.

Abstract: "main challenge" --> "main challenges"
Intro: "for mapping of" --> "for mapping"
Intro: "For example, MLS ontology...the generated results". Rephrase. The references to Figure 1 are unclear. The figure is also unclear to me. The concepts of "vertical/horizontal interoperability" are not described or referred to.
Section 2.3.2: "used in OpenML". I do not see any reference to Exposé in OpenML.
Section 3: Tables 1 to 9 have a header with "property" and "value". At first sight I confused "property" with the properties of the MLS ontology. I recommend changing it to "attribute" or even changing the layout of the table.
Section 3.2.4: "Sharing the problem.... high levels of..." --> "Sharing the solution... uses levels of..." Or maybe I misunderstood the paragraph.
Section 3.6: "Table 3.6.2" --> "Table 6"??
Table 10: In the OpenML column there are several N/A entries that I do not agree with: Software (e.g. weka), Implementation (weka version, algorithm version), Study (there are studies in OpenML, but perhaps they are not referred to in the RDF generated by the webapp), Data (also like Study), Feature, FeatureCharacteristics (in OpenML there are many graphs showing details of data, flows and executions).
Section 4.2: "contains millions of machine learning experiments". As far as I can see, there are 9 million executions. For me, an experiment is a task. There are 89 thousand tasks.
Section conclusions: "easily extendable". I would relax this sentence. There is no evidence of this ease.
Section conclusions: "We demonstrated" --> "We show".

Review #3
Anonymous submitted on 23/May/2019
Review Comment:

The authors present ML-Schema, which was proposed by the W3C ML Schema Community Group, as a top-level ontology that provides classes, properties and restrictions for representing and interchanging information on ML algorithms, datasets, experiments etc. ML-Schema, a canonical format, resulted from more than 7 years of experience at various research institutions. In the paper itself, the authors discuss the main challenge in the development of ML-Schema, which has been to align existing ML ontologies and other relevant representations designed for a range of particular purposes and following sometimes incompatible design principles, which results in different, not easily interoperable structures. The authors claim that the resulting schema can now be easily extended and specialized, allowing other, more domain-specific ontologies developed in the areas of machine learning, data science and data mining to be mapped to it.

Overall, I think this is an important resource and it has credibility, being proposed by multiple people spanning various institutions and also having the backing of the W3C. I also like how the authors compare the various aspects of the resource to existing ontologies and schemas in this space. My only criticism really is that the paper can read a little like a specification document at times, and does not seem to add much conceptual value beyond the resource itself. That being said, I believe the goal of a good resource paper is to serve as a sound description of the underlying resource and explain the challenges, relevance and relation to existing work, all of which the authors seem to have covered. Hence, my decision is an accept.