Context-Aware Composition of Agent Policies by Markov Decision Process Entity Embeddings and Agent Ensembles

Tracking #: 3531-4745

Nicole Merkle
Ralf Mikut

Responsible editor: 
Agnieszka Lawrynowicz

Submission type: 
Full Paper
Computational agents support humans in many areas of life and are therefore found in heterogeneous contexts. This means that agents operate in rapidly changing environments and can be confronted with very large state and action spaces. In order to perform services and carry out activities satisfactorily, i.e. in a goal-oriented manner, agents require prior knowledge and therefore have to develop and pursue context-dependent policies. The problem is that prescribing policies in advance is limited and inflexible, especially in dynamically changing environments. Moreover, the context (i.e. the external and internal state) of an agent determines its choice of actions. Since the environments in which agents operate can be stochastic and complex in terms of the number of states and feasible actions, activities are usually modelled in a simplified way by Markov decision processes, so that agents with reinforcement learning, for example, are able to learn policies, i.e. state-action pairs, that help them to capture the context and act accordingly in order to perform activities optimally. However, training policies for all possible contexts using reinforcement learning is time-consuming. A requirement and challenge for agents is to learn strategies quickly and respond immediately in cross-context environments and applications, e.g. the Internet, service robotics, and cyber-physical systems. In this work, we propose a novel simulation-based approach that enables a) the representation of heterogeneous contexts through knowledge graphs and entity embeddings and b) the context-aware composition of policies on demand by ensembles of agents running in parallel. The evaluation we conducted with the "Virtual Home" dataset indicates that agents that need to switch seamlessly between different contexts, e.g. in a home environment, can request on-demand composed policies that lead to the successful completion of context-appropriate activities, without having to learn these policies in lengthy training steps and episodes, in contrast to agents that use reinforcement learning. The presented approach enables both context-aware and cross-context applicability of untrained computational agents. Furthermore, the source code of the approach as well as the generated data, i.e. the trained embeddings and the semantic representation of domestic activities, are open source and openly accessible on GitHub and Figshare.
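To make the core idea of the abstract concrete, the sketch below illustrates in plain Python what "context-aware composition of policies on demand" could look like: each ensemble agent holds a policy (a state-to-action mapping) and a context embedding, and a composed policy picks, per state, the action of the agent whose embedding best matches the query context. All names, data, and the selection rule are hypothetical illustrations, not the paper's actual implementation.

```python
import math

# A policy maps state ids to actions (the "state-action pairs" of the abstract).
# These example states and actions are invented for illustration only.
policy_a = {"kitchen/dirty_dishes": "wash_dishes", "kitchen/hungry": "cook_meal"}
policy_b = {"livingroom/dusty": "vacuum", "kitchen/hungry": "order_food"}

# Each ensemble agent carries a context embedding, e.g. derived from a
# knowledge graph of the environment in which its policy was trained.
agents = [
    {"embedding": [1.0, 0.0], "policy": policy_a},
    {"embedding": [0.2, 1.0], "policy": policy_b},
]

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def compose_policy(context_embedding, agents):
    """Compose a policy on demand: for every known state, take the action of
    the agent whose context embedding is most similar to the query context."""
    composed = {}
    all_states = {s for a in agents for s in a["policy"]}
    for s in all_states:
        candidates = [a for a in agents if s in a["policy"]]
        best = max(candidates,
                   key=lambda a: cosine(context_embedding, a["embedding"]))
        composed[s] = best["policy"][s]
    return composed

# A query context close to agent A's embedding prefers A's actions where
# both agents cover the same state.
ctx = [0.9, 0.1]
print(compose_policy(ctx, agents))
```

The point of the sketch is the contrast drawn in the abstract: the composed policy is assembled from already-trained ensemble members at request time, rather than being learned from scratch through reinforcement-learning episodes for the new context.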


Solicited Reviews:
Review #1
Anonymous submitted on 10/Oct/2023
Review Comment:

This paper is undergoing a second round of review. The authors have properly addressed my comments and, in my opinion, also the comments of the other reviewer. As the original submission was considered to have potential and to be a good fit for the journal, the paper can be accepted after implementing the requested improvements in the presentation and clarifying a few issues.

Review #2
Anonymous submitted on 02/Nov/2023
Review Comment:

The paper presents an approach leveraging explicit knowledge of a domain-specific knowledge graph to train and simulate agents operating in the context of an arbitrary Markov Decision Process. A compelling experimental evaluation is provided in the domain of household agents.

The paper is a conglomerate of different techniques, from knowledge graphs to deep learning, and reinforcement learning with no small amount of engineering work on top. While many parts are similar to what was published previously, I believe the combination is novel and thus represents an example of standing on the shoulders of giants.

Significance of the results:
The proposed approach clearly outperforms the used baselines. Since it leverages formal, and possibly distributed, knowledge to achieve day-to-day goals, I believe it brings us closer to fulfilling the original goal of the Semantic Web. I thus think the paper has the potential to be highly cited and influential.

Quality of writing:
The paper is reasonably well-written and does a good job of guiding the reader through the complexities of the proposed approach. There are minor problems here and there that I list at the bottom of the review.

The provided resources seem to be well-documented, complete, and sufficient to reproduce the results presented in the paper. Since they are published to GitHub, they should remain available sufficiently long.

Minor remarks:
* Eq. (2): Replace 'x' with \times
* Eq. (5): HighBloodpressure -> HighBloodPressure
* Page 11, lines 21-22: it is unclear what "lambda" is in this context. I'd recommend removing it since it is just a confusing example.
* Page 15, lines 31-32: I think there should be a closing bracket after "subsequent state id"
* Fig. 4 has too low a resolution and thus looks bad when scaled up this much.