A new retrieval-augmented generation (RAG) approach for querying and constructing large-scale knowledge graphs

Tracking #: 3854-5068

This paper is currently under review
Authors: 
Nilay Tufek Ozkaya
Burak Yigit Uslu
Valentin Philipp Just
Tathagata Bandyopadhyay
Aparna Saisree Thuluva
Marta Sabou
Allan Hanbury

Responsible editor: 
Guest Editors 2025 LLM GenAI KGs

Submission type: 
Full Paper
Abstract: 
Large Language Models (LLMs) have demonstrated remarkable capabilities in extracting knowledge from, and generating new content based on, various types of resources, particularly text-based ones. Beyond unstructured data, LLMs have also shown promising results when leveraging structured but semantically complex resources such as ontologies, schemas, and knowledge graphs. However, the practical use of large-scale semantic artifacts as direct input to LLMs is constrained by prompt size and token limits. To address this issue, Retrieval-Augmented Generation (RAG) systems are needed to preprocess and segment these large resources effectively. In this paper, we propose a novel RAG-based architecture, including LLM-based Named Entity Recognition and Disambiguation (NERD) and Entity Linking (EL) components, tailored for large-scale semantic artifacts, using OPC UA information models, an industrial standard, as a foundation. Within this framework, we implement and evaluate three use cases that combine LLMs with the proposed RAG system: (i) semantic artifact validation, (ii) information retrieval, and (iii) information model generation. Each use case achieves strong performance, with F1-scores of up to 100%, validating the effectiveness of the approach. Furthermore, we evaluate the generalizability of the system across two different domains, confirming its robustness and applicability in diverse industrial contexts.
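To make the retrieval step of such a RAG pipeline concrete, the following is a minimal, illustrative sketch: knowledge-graph node descriptions are scored against a query and the top-k are assembled into an LLM prompt context. This is not the paper's actual architecture (the LLM-based NERD/EL components are not reproduced); the bag-of-words similarity stands in for a real embedding model, and all node identifiers and descriptions below are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a neural encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, graph_nodes, k=2):
    """Return the k node descriptions most similar to the query."""
    q = embed(query)
    ranked = sorted(graph_nodes,
                    key=lambda n: cosine(q, embed(n["text"])),
                    reverse=True)
    return ranked[:k]

# Hypothetical node descriptions extracted from an information model.
nodes = [
    {"id": "ns=2;i=1001", "text": "temperature sensor measures process temperature"},
    {"id": "ns=2;i=1002", "text": "conveyor belt motor speed control"},
    {"id": "ns=2;i=1003", "text": "temperature alarm threshold configuration"},
]

top = retrieve("what is the temperature threshold", nodes, k=2)
# Only the retrieved segments, not the whole model, go into the prompt,
# keeping it within the LLM's token limit.
prompt = "Context:\n" + "\n".join(n["text"] for n in top) + "\nQuestion: ..."
```

The key design point this sketch mirrors is that the large semantic artifact is segmented once, and each query sees only its most relevant segments, which is what keeps prompts within token limits.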
Tags: 
Under Review