SemanticTafsir: Building a Cultural Heritage Ontology and Knowledge Graph from the Quranic Exegesis of al-Tabari

Tracking #: 3884-5098

Authors: 
Amna Binte Kamran
Amna Basharat
Misbahur Rehman

Responsible editor: 
Guest Editors 2025 OD+CH

Submission type: 
Full Paper
Abstract: 
Tafsir, the classical exegesis of the Quran, represents a cornerstone of Islamic intellectual and literary tradition. Rooted in the teachings of the Prophet Muhammad and elaborated by early scholars, tafsir provides interpretive insights into Quranic verses through historical, linguistic, theological, and jurisprudential lenses. Among the most authoritative and influential works in this tradition is Tafsir al-Tabari, a comprehensive commentary compiled by Muhammad Ibn Jarir al-Tabari in the 9th century CE. Despite the foundational role of such works in the Islamic heritage, they remain largely underrepresented in structured, semantically annotated digital forms. This paper introduces SemanticTafsir, an OWL ontology and an RDF-based knowledge graph designed to semantically model Tafsir al-Tabari and support its exploration as a rich cultural and intellectual resource. The ontology captures the structural, thematic, and referential components of the text, including Quranic verses, layered commentary, embedded hadith, narrator chains, and interpretive themes. Developed using established ontology engineering methodologies, SemanticTafsir reuses and aligns with external vocabularies including SemanticHadith, Schema.org, and DBpedia to ensure semantic coherence and interoperability within the broader Linked Data ecosystem. Our core contribution lies in automating the semantic transformation of TEI-encoded tafsir manuscripts into a knowledge graph that preserves both the literary structure and scholarly nuance of the original work. The pipeline produces RDF representations that support advanced querying, cross-referencing, and thematic exploration, enabling users to navigate complex exegetical relationships at scale. We evaluate the ontology in terms of logical consistency, ability to resolve competency questions, and representational fidelity. 
The resulting knowledge graph is accessible via a SPARQL endpoint and supports multilingual and semantically rich querying for scholars in Islamic studies, cultural heritage research, and digital humanities. By bridging classical Islamic exegesis with Semantic Web technologies, SemanticTafsir contributes to the digital preservation and accessibility of, and scholarly engagement with, a core component of global cultural heritage. The ontology and knowledge graph are openly available at: https://github.com/A-Kamran/SemanticTafsir
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Nicolò Pratelli submitted on 02/Jul/2025
Suggestion:
Minor Revision
Review Comment:

The article "SemanticTafsir: Building a Cultural Heritage Ontology and Knowledge Graph from the Quranic Exegesis of al-Tabari" introduces a significant contribution to the digital representation of Islamic intellectual heritage. It details the creation of an ontology and knowledge graph based on the tafsir of al-Tabari. The project transforms TEI-encoded manuscript data into an OWL-based semantic framework, making the rich exegetical tradition not only machine-readable but also interoperable within the broader LOD ecosystem.
This contribution is highly original. While previous efforts have focused on representing Quranic text or hadith individually, this initiative semantically models a tafsir corpus, specifically Tafsir al-Tabari. The ontology developed here successfully captures verse-level commentary, narrator chains, hadith citations, and thematic structures. Moreover, the resulting knowledge graph is accessible through a SPARQL endpoint, supporting complex queries and enabling new scholarly interactions with classical Islamic sources. The open-access nature of the ontology, the accompanying source code, and query examples on GitHub reflect a strong commitment to transparency and reusability.
The technical aspects of the ontology are rigorous. Built in OWL 2 using Protégé and tested with reasoners, the ontology exhibits logical consistency and sound modeling principles. It applies recognized ontology design patterns such as part-whole relations, n-ary structures, and enumerated value sets. Particularly notable is the reuse of the SemanticHadith ontology for modeling hadith, along with alignments to Schema.org, Dublin Core, DBpedia, and Wikidata. These design choices ensure interoperability with existing cultural heritage datasets. Nevertheless, the article would benefit from a more explicit discussion of the specific OWL 2 species or profile used (e.g., OWL 2 DL, OWL 2 EL, or OWL 2 Full), as this impacts reasoning capabilities and scalability.
Although the article is well written and structured, there are areas where greater clarity or additional detail would strengthen the work. First, the potential role of CIDOC CRM is notably absent. Given that CIDOC CRM is an ISO standard widely used in cultural heritage informatics to support interoperability, it could provide a valuable bridge between SemanticTafsir and other heritage knowledge graphs. Its mention would also allow readers to better situate this ontology among other domain-spanning efforts such as the Hypermedia Dante Network (HDN), which semantically models literary sources including Dante’s Divine Comedy. Given that the article references digital work on Dante, it seems particularly appropriate to include HDN as a more recent and relevant comparison than the brief reference to Dante in the introduction currently allows.
The article's discussion of hadith is also somewhat underdeveloped. While Figure 1 provides a motivational scenario involving hadith, the actual meaning, role, and relationship of hadith within tafsir are not well explained in the main text. A more robust explanation of how tafsir utilizes hadith, and how this is reflected in the ontology, would help readers unfamiliar with Islamic exegetical traditions. Similarly, Figure 2, which outlines the conceptual model, is not adequately explained. The color coding of classes and properties is not described either in the caption or in the accompanying text. Moreover, while the authors claim to use owl:equivalentClass and owl:equivalentProperty to align new ontology terms with external vocabularies like Schema.org, DBpedia, and SemanticHadith, the actual mappings are not explicitly illustrated within the article itself. Although these alignments are available in the supplementary material, the lack of a summary or mapping table in the main body of the text limits transparency. Including such a table, even in abbreviated form, would significantly improve clarity and support reusability. Listing the most critical classes and their external equivalents would not only showcase the ontology’s interoperability but also help readers understand how SemanticTafsir fits within the broader semantic web landscape.
Another area for enhancement is the modeling of narrator types. In Section 3.7, the ontology defines narrator types as individuals in a value set, such as sahabi and rawi. However, the article does not clarify whether these are linked to corresponding entities in Wikidata, such as Q17638669 for rawi. Including these links would enrich the ontology’s semantic network and allow for better integration with global knowledge graphs. It would also help address the typological ambiguity of modeling narrator types as individuals rather than subclasses.
The reasoning capability described in the article appears to be limited to ontology validation, not inference at the knowledge graph level. It would be worth exploring whether and how reasoning could be extended to the populated graph itself, particularly in support of inference-driven queries or consistency checks. The article states that multiple reasoners were used (HermiT, Pellet, FaCT++) but does not indicate which was adopted for production use or what reasoning profiles were tested. Greater clarity on this front would enhance the methodological robustness of the work.
Regarding evaluation, the article references a set of competency questions as part of the ontology design and testing process. While it is commendable that SPARQL queries were used to validate the graph’s expressiveness, the criteria for assessing the results, particularly in terms of accuracy and completeness, are not defined. Were human experts consulted? Were any benchmarks used? Providing even a brief explanation of the evaluation methodology would significantly enhance confidence in the conclusions drawn.
Figure 4, which outlines the knowledge graph construction framework, is a further example of a figure that is not adequately explained: it employs various types of arrows and visual markers to represent different processes or relationships, but these visual distinctions are not explained either through a legend or in the figure caption. As a result, readers are left to infer the meaning of directional flows, stages of transformation, or distinctions between components. Including a legend or providing a more detailed caption that explicitly clarifies the function of each visual element, especially the different kinds of arrows, would significantly improve interpretability.
A minor yet noticeable typographical issue appears in Section 3.7, where Figure 3 is erroneously referenced as “??” instead of by its proper number.
Looking ahead, the project’s future directions are promising and aligned with contemporary trends in digital humanities and knowledge representation. Extending the ontology to include other tafsir texts, developing a natural language interface for query construction, and aligning with additional Islamic knowledge domains (e.g., fiqh, theology) are all logical next steps. In particular, a focus on CIDOC CRM compatibility would significantly enhance the graph’s ability to interoperate with museum and manuscript data across disciplines. The potential for machine learning applications trained on the annotated graph is another exciting avenue, especially for automated tagging and content analysis of unstructured tafsir texts.
In conclusion, SemanticTafsir represents a well-constructed, original, and technically sophisticated effort to model Islamic interpretive literature using semantic web technologies. It bridges classical scholarship and modern informatics, providing scholars, educators, and technologists with a robust tool for exploring and preserving tafsir literature. With improvements in visual clarity, ontology mapping transparency, and methodological reporting, this project could become a foundational infrastructure for the semantic representation of Islamic knowledge.

Review #2
Anonymous submitted on 06/Oct/2025
Suggestion:
Major Revision
Review Comment:

The paper describes SemanticTafsir, an ontology designed to represent the structural, thematic, and interpretive features of classical tafsir literature. The authors detail the scope of the ontology, provide a set of competency questions, and describe how they have reused existing vocabularies and ontologies. The ontology is then used for the semantic transformation of TEI-encoded manuscripts into a knowledge graph, which is publicly available through a SPARQL endpoint.

The paper is very well written and tackles an interesting problem. The ontology is well motivated, well documented, and accessible through a persistent/stable URL (PURL), while a GitHub repository provides open access to the full source code and related files of the knowledge graph construction part. I found the section describing the analytical capabilities of the knowledge graph very interesting. It adds value to the paper and helps readers better understand the significance of this work.

In my view, the major issue of this work concerns the choice of vocabularies and ontologies, as well as the related claims about semantic interoperability. Schema.org, DBpedia, and Wikidata are not formal ontology standards. They are community-driven schemas with a strong focus on modeling Web information. Why not consider CIDOC CRM, which is an ISO standard ontology for cultural heritage documentation and is widely used for semantic interoperability in this particular domain? Given that the manuscript is about a "Cultural Heritage Ontology", it is surprising that the only standard ontology (ISO 21127) for the cultural heritage domain is not even mentioned in the paper. At a minimum, the authors should explain why this formal ontology was not considered.
This issue is closely connected with semantic interoperability: CIDOC CRM has been widely used for modeling cultural heritage and humanities data. Extending CIDOC CRM for your case would facilitate integration with other cultural heritage datasets and knowledge bases that also make use of CIDOC CRM.

Also, it is not very clear to me why previous works and existing models cannot be used to model tafsir literature. Which ontologies are used for modeling similar forms of data? What do they cover? Which of your requirements cannot be satisfied by these previous works? For example, in the introduction, the authors mention "...these efforts have not fully addressed the complexity of tafsir literature", but no further details are provided. A clear comparative discussion would help make clear the originality of your work.

Other comments:
- It would help to provide a table containing the classes and properties used from other vocabularies/models (and a comment on how exactly they are used in your case).
- Section 3.7: the figure reference is broken in "...is illustrated in Figure ??, as rendered in Protege".

Review #3
By Alessia Bardi submitted on 08/Apr/2026
Suggestion:
Accept
Review Comment:

This full paper describes an ontology for the semantic representation of tafsirs, which are
interpretations of and commentaries on the Quran. The ontology is used to generate a knowledge
graph for the Tafsir al-Tabari.
The ontology fills a gap in semantic modelling in Islamic studies and heritage, as it is
the first ontology designed to model the tafsirs and their three interpretation layers:
legal, linguistic, and theological.

The methodologies for the definition of the ontology and the construction of the knowledge graph are strong and sound.
They are clearly described and include details on design choices.
The authors followed best practices of Semantic Web design and Open Science.

The quality of writing is excellent: the text reads fluently, the paper is well structured, and the authors included information about the Quran and al-Tabari, which is very helpful for those who are not experts in Islamic studies.

The ontology is available at a PURL, which is supposed to be stable and available in
the long term.
The authors put all resources (ontology, data and code) in a GitHub repository.
The README file is present and links to the different resources and entry points.
The knowledge graph is available via a SPARQL endpoint on GraphDB and stored as data in the
GitHub repository.
The provided resources appear to be almost complete for replication and reuse: only the license
file is missing.

I suggest that the authors also publish the data and the software on Zenodo, to ensure long-term persistence and obtain DOIs, as recommended by the principles of Open Science.

The supplementary material is a PDF with a description of the classes and properties of
the ontology. It looks redundant, considering that the actual ontology is available online and
in the GitHub repository. It can be removed, unless there is an explicit request from the editor.

Some minor corrections:
- page 2 row 9: I would not say they are “recent”, but rather “established”
- page 5 row 51: the URL in the note is already given in the text; the GitHub URL should probably be added here instead
- page 6 row 12: "is" instead of "s"
- page 9 row 46: fix reference to figure
- page 15 rows 41-45: the major themes are listed twice in two adjacent sentences.
- page 20 reference [33]: use the proper citation to the journal article instead of semanticscholar https://www.jatit.org/volumes/Vol96No3/3Vol96No3.pdf
- page 21 reference [68]: use the suggested citation given at https://oops.linkeddata.es/