Thesis topic

Interaction With Smart Virtual Agents

  • Type
    Doctorate

Description

The work will focus on systems capable of controlling the emotion or mood expressed by the virtual agent and perceived by the user. To this end, it will explore end-to-end methods for reactive agents that exploit multimodal data and produce verbal and non-verbal expressions in parallel. The contributions of this thesis are 1) the joint use of verbal and non-verbal expressions in an XR scenario, and 2) the control of the emotion or mood generated. Although prior work exists on reactive agents, control over the emotional dimensions of an interactive agent remains a challenge.

Virtual humans are an important tool for human-machine interaction in extended reality media and have been the subject of research for several decades. An autonomous agent expected to carry out the task for which it was designed must correctly perceive user input and respond to it automatically and appropriately, in a way that naturally depends on the situation and the application. The challenge is to produce verbal responses that are semantically well adapted to the input, as LLMs such as ChatGPT do for text only, alongside non-verbal responses (laughter, nodding, etc.) that improve the perception of the interaction and make it more realistic for the user [1]. Work such as [2,3] explores initial solutions to persistent problems such as the controllability of the generated expressions, the accuracy of the response provided in the different modalities, and the match between the non-verbal expressions and the generated text.
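
Purely by way of illustration, the sketch below shows one possible shape of such a pipeline: a single target emotion conditions both the verbal channel (a placeholder standing in for an LLM call) and the non-verbal channel, so that the two modalities are produced in parallel from the same control signal. All names, the stub functions, and the rule-based behaviour selection are hypothetical placeholders for illustration only, not the approach the thesis would necessarily take.

    from dataclasses import dataclass

    # Hypothetical target-emotion specification used to condition both channels.
    @dataclass
    class EmotionTarget:
        label: str        # e.g. "joy", "empathy", "neutral"
        intensity: float  # 0.0 (barely expressed) to 1.0 (strongly expressed)

    def generate_verbal_response(user_utterance: str, target: EmotionTarget) -> str:
        """Stub standing in for an LLM call (ChatGPT-style, text only) that would
        be prompted or fine-tuned to produce text matching the target emotion."""
        return f"[{target.label} reply to: {user_utterance!r}]"

    def select_nonverbal_behaviours(text: str, target: EmotionTarget) -> list[str]:
        """Stub standing in for a model mapping the generated text and the target
        emotion to non-verbal behaviours (nods, smiles, laughter) to be rendered."""
        behaviours = ["head_tilt"] if "?" in text else ["nod"]
        if target.label == "joy" and target.intensity > 0.5:
            behaviours.append("smile")
        return behaviours

    def agent_turn(user_utterance: str, target: EmotionTarget) -> dict:
        """One reactive turn: perceive the input, then produce the verbal and
        non-verbal channels, both conditioned on the same emotion target."""
        text = generate_verbal_response(user_utterance, target)
        behaviours = select_nonverbal_behaviours(text, target)
        return {"text": text, "nonverbal": behaviours}

    if __name__ == "__main__":
        print(agent_turn("That demo went really well!", EmotionTarget("joy", 0.8)))

The point of the sketch is only the control flow: keeping the emotion/mood target as an explicit, shared input to both channels is what makes the generated expressions controllable and mutually consistent, which is precisely where the thesis identifies the open challenge.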

[1] Deepali Aneja, Rens Hoegen, Daniel McDuff, and Mary Czerwinski. 2021. Understanding Conversational and Expressive Style in a Multimodal Embodied Conversational Agent. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 102, 1–10. https://doi.org/10.1145/3411764.3445708
[2] Scott Geng, Revant Teotia, Purva Tendulkar, Sachit Menon, and Carl Vondrick. 2023. Affective Faces for Goal-Driven Dyadic Communication. arXiv preprint arXiv:2301.10939.
[3] Wei Zhao, Peng Xiao, Rongju Zhang, Yijun Wang, and Jianxin Lin. 2022. Semantic-aware Responsive Listener Head Synthesis. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7065–7069. https://doi.org/10.1145/3503161.3551580

About this topic

Related to

  • Service
    ISIA
  • Promoters
    Thierry Dutoit
    Kevin El Haddad

Contact us for more info