Thesis topic

Control of Smart Agents Expressions for Human-Agent Interactions and Extended Reality Applications

One of the major pillars of the current industrial revolution is eXtended Reality (XR). The topic has attracted strong interest in academia as well as in industry (Meta, Apple, Google, Nvidia, etc.), owing to its potential socio-cultural and economic impact.


In the context of the collaborative Wal4XR project, which brings together five universities in Wallonia and Brussels around XR, this PhD thesis will explore systems capable of controlling the expressed emotion/mood of virtual agents in an interactive setup. The goal is to enable an agent to interact autonomously with a human user while adapting its behavior to match the mood/affective state imposed by another human monitor.


Interactive virtual agents are important tools for human-machine interaction, particularly in extended reality media, and have been a research topic for several decades. An autonomous interactive agent must correctly perceive user input and automatically respond to it in a way appropriate to the agent's purpose. The challenge is to handle not only verbal responses that are semantically well adapted to the input, but also non-verbal responses such as laughter, nodding, etc. Including non-verbal expressions in a human-agent interaction improves the experience for the user [1]. Works such as [2,3] explore initial solutions to persistent problems: the controllability of the generated expressions, the accuracy of the response provided (in different modalities), and the match between the non-verbal expressions and the generated semantic text.


Concretely, this work will focus on AI systems that autonomously generate verbal and non-verbal expressions in several modalities (audio, facial expressions, gestures, etc.) during an interaction with a user, following pre-defined affective states. The developed solution will be tested in a use-case experiment. The contributions of this thesis are 1) the joint use of verbal and non-verbal expressions in the communication repertoire of an autonomous interactive agent in an XR scenario, and 2) the control of the expressed emotion or mood of the autonomous interactive agent. Indeed, although work exists on reactive agents, control over the emotional dimensions of the interactive agent remains a challenge.
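As a purely illustrative sketch of the intended behavior (the topic prescribes no particular architecture; all names and the valence/arousal representation below are hypothetical choices for the example), the interaction loop can be pictured as a response pipeline conditioned on a monitor-imposed affective state:

```python
from dataclasses import dataclass


@dataclass
class AffectiveState:
    """Affective state imposed by the human monitor (hypothetical
    representation on a simple valence/arousal plane)."""
    valence: float  # -1 (negative) .. 1 (positive)
    arousal: float  # 0 (calm) .. 1 (excited)


def select_nonverbal_cue(state: AffectiveState) -> str:
    """Map the imposed affective state to a non-verbal expression."""
    if state.valence > 0.3:
        return "laughter" if state.arousal > 0.5 else "smile"
    if state.valence < -0.3:
        return "frown"
    return "nod"


def respond(user_input: str, state: AffectiveState) -> dict:
    """Produce a multimodal response: verbal text plus a non-verbal cue.
    In the thesis, the verbal channel would come from a generative model
    conditioned on `state`; here it is only a placeholder echo."""
    return {
        "verbal": f"(response to: {user_input!r})",
        "nonverbal": select_nonverbal_cue(state),
    }


# The monitor sets a cheerful, high-arousal target state:
print(respond("Tell me a joke!", AffectiveState(valence=0.8, arousal=0.9)))
```

The key design point the sketch illustrates is that the affective state is an external control input, separate from the user's utterance, so the monitor can steer the agent's expressed mood without changing the semantic content of the exchange.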


[1] Deepali Aneja, Rens Hoegen, Daniel McDuff, and Mary Czerwinski. 2021. Understanding Conversational and Expressive Style in a Multimodal Embodied Conversational Agent. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 102, 1–10.
[2] Scott Geng, Revant Teotia, Purva Tendulkar, Sachit Menon, and Carl Vondrick. 2023. Affective Faces for Goal-Driven Dyadic Communication. arXiv preprint arXiv:2301.10939.
[3] Wei Zhao, Peng Xiao, Rongju Zhang, Yijun Wang, and Jianxin Lin. 2022. Semantic-aware Responsive Listener Head Synthesis. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7065–7069.

About this topic

Related to
Thierry Dutoit
Kevin El Haddad
