HUMBO: Bridging Response Generation and Facial Expression Synthesis


Abstract in English

Spoken dialogue systems that assist users to solve complex tasks such as movie ticket booking have become an emerging research topic in artificial intelligence and natural language processing areas. With a well-designed dialogue system as an intelligent personal assistant, people can accomplish certain tasks more easily via natural language interactions. Today there are several virtual intelligent assistants in the market; however, most systems only focus on textual or vocal interaction. In this paper, we present HUMBO, a system aiming at generating dialogue responses and simultaneously synthesize corresponding visual expressions on faces for better multimodal interaction. HUMBO can (1) let users determine the appearances of virtual assistants by a single image, and (2) generate coherent emotional utterances and facial expressions on the user-provided image. This is not only a brand new research direction but more importantly, an ultimate step toward more human-like virtual assistants.

Download