Lifecycle Models of Data-centric Systems and Domains
This is a revised submission, now accepted for publication. The reviews for the revision are below, followed by those for the original submission, which was accepted with minor revisions.
Review 1 by Tomi Kauppinen
The new version sufficiently takes into account the suggestions I made in my review.
Review 2 by Todd Pehle
The clarifications from the first review appear to be sufficiently addressed, so I accept as is. I'm still unsure about the image that was clipping some of the text in section 2.2. The author couldn't reproduce it and it wasn't mentioned as an issue by other reviewers, so perhaps I have an outdated .pdf reader. I'll have to check on that!
Reviews for the original submission:
Review 1 by Tomi Kauppinen
As the submission is a survey article, the review is organized according to the criteria for survey articles listed at http://www.semantic-web-journal.net/reviewers.
(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.
The article introduces the topic by giving an overview of existing lifecycle models from the literature and by comparing them using what the author calls the Abstract Data Lifecycle Model. This seems to be a good approach, as it allows similarities and differences between existing models to be discussed. As a result, the phases and features of the examined models are summarized in Tables 1 and 2 and further explained in the text. Taking all this together, the text serves well as an introduction to the topic.
(2) How comprehensive and how balanced is the presentation and coverage.
Quite a few lifecycle models are presented in Section 2, and a reader gets the impression that the relevant models are fully covered. However, proposals related to handling versioning, provenance and trust also deal with the lifecycle of data and thus could have been included in the discussion, or the article could at least have mentioned why they were left out. Otherwise a reader might be misled into believing that every possible aspect related to the lifecycle of data is covered by this survey article.
(3) Readability and clarity of the presentation.
The text was mostly easy to read. One minor comment, though: the author sometimes uses "we" and sometimes "I" to refer to the author. I suggest that the author harmonize this to make the text more readable.
(4) Importance of the covered material to the broader Semantic Web community.
Taking into account the points related to (1) and (2), the article serves as a nice survey article on the topic and hence is important as such. What could perhaps be added is a discussion of what is missing in the current lifecycle models (e.g. issues related to provenance, versioning and trust) and how, or whether, lifecycle models should be enhanced in the future to better serve the creation of linked data.
Review 2 by Todd Pehle
"It is therefore crucial to have a common understanding of where and what fixed points are in which discussions can be anchored":
Since this is the primary purpose of the research, "fixed points" could use a bit of clarification. For example, fixed points with respect to what? Or perhaps a somewhat more concrete description would be good.
2.2 Lifecycles in eLearning
A minor issue, at least in the .pdf that I downloaded: the image in Figure 4 on page 4 clips some of the text, making it difficult to read on this particular page.
2.3 Lifecycles in Digital Libraries
"...it becomes obvious that it isn't really a lifecycle model for data, but rather a lifecycle model for ontology...":
Some in the Semantic Web community may consider ontology schema, along with instance data, to all be data. It may be worth pointing out how data-centric system lifecycles may differ among instance data-based lifecycles, metadata-based lifecycles and schema-based lifecycles.
2.5 Lifecycles in Databases
Just as a potential suggestion, this section may benefit from expanding on other lifecycles in the database realm in addition to CRUD, due to the relevance of database data to the Semantic Web realm. For instance, the paper cites examples of ontology lifecycles. It may be good to also cite database logical model lifecycles and compare them to ontology lifecycles. This may serve as a way to study lifecycle differences between data built for a single, closed-world domain (DB) vs. multiple, open-world domains (ontology). That said, the comparison itself may be outside the intended scope of the paper.
One other point I noticed as a reader was that at the beginning of the paper I presumed I understood what the term "lifecycle" itself means. As I came across the CRUD lifecycle example I realized I hadn't previously thought of CRUD as a data lifecycle, but instead more as operations on data. Hence, from my view at least, it may be good to cite the definition (if there is one) of a lifecycle. Perhaps this is best done in the introduction of the paper. For example, do they exhibit: temporal flow or state, specify an ordered set of tasks, mandate a beginning and an end, etc.?
3 The Abstract Data Lifecycle Model
"The alternative approach...would be to begin with the abstraction and then use it to classify a selection of instances...":
Just from a reader standpoint, I'd be curious to understand if this alternative top-down approach is not applicable, not correct or just simply wasn't selected as an abstract data lifecycle design methodology for this paper.
3.1 Lifecycle Phases
I like the identification of phases. I also wonder if phase state, inputs/outputs or relationships between phases should also be made explicit? Perhaps this is left as part of the definition of the phases themselves?
Where would data exploitation fit into the lifecycle phases? Is it in creation phase or refinement phase or other? It seems like a distinction should be drawn between knowledge acquisition of raw data and acquisition of knowledge based on data exploitation.
3.2 Distinction Data vs. Metadata
Referencing ontology "models" as data, is there or should there be a distinction made between instance data, metadata AND models?
3.4 Actor Features
Actor Humaness: Would a "Human and Machine" Actor also be needed? Or perhaps ADLMs can exhibit both "human" and "machine" roles via multi-inheritance? I only mention this because many tools that produce data are classified as manual, fully automated OR semi-automated. However, perhaps this is not the intended granularity the author is seeking.
4.1 Semantic Web Lifecycle Phases
There's a small grammatical error (I think) in the phrase "Planning must precede any creation of refinement of data". I think it was intended to read "creation OR refinement" of data.
Does reasoning on the Semantic Web fit under Refinement? Since reasoning is discussed frequently in Semantic Web, it may be worth dedicating a few sentences to the discussion.
Under the Termination section, the sentence referencing "it is not possible...to completely terminate any piece of data..." brings up a good point and a question that could be elaborated. Namely, does "data" in ADLM represent an instance of a single statement or multiple distributed serializations of the same piece of data?
Curious if there are any "Web Data" lifecycles that could be cited? Since the Web of Documents and Web of Data will co-exist in the same information space of the Web, it would be interesting to see similarities and differences between lifecycles for unstructured data vs. structured data. Perhaps this could also be reserved for future work.
I think the paper does a good job of crossing both a wide spectrum of specific data application domains as well as generalized data domains (instance, conceptual, metadata).
Curious if real-time data lifecycles exist or if there are differences in lifecycle models for these types of domains?
The paper has good clarity and is well written!