Today we use information technology, tomorrow we’ll use robots… machines of all kinds are an integral part of our daily life. Yet exchanges between machines and us, the users, can sometimes be challenging. How can one communicate naturally with a robot, a computer or a software program? Herein lies the purpose of the embodied conversational agent, an interface allowing for a genuine dialogue between man and machine. IMT has been carrying out research in this field, which straddles the domains of cognitive science and artificial intelligence, observing human-human communication with a view to modeling human-agent communication. At Télécom ParisTech, Catherine Pelachaud has been working for several years to humanize machines…
When we speak to another person, we use a whole range of codes, vocal intonations and facial expressions, often unconsciously. We may nod or smile gently… subtle details which may seem insignificant but which are, in fact, key to genuine interaction between people. “The development of an embodied conversational agent is a way of modeling this communication. We are trying to create an autonomous virtual agent, capable of communicating both verbally and non-verbally,” explains Research Director Catherine Pelachaud.
In concrete terms, the aim is to develop a humanoid-style virtual character that can talk to the user as naturally as possible, using the codes of human communication. As virtual agents become increasingly realistic, their scope of application is also expanding. In addition to making robots more convincing in their communication with users, and enhancing web sites with autonomous virtual agents capable of holding a dialogue, these systems can also be used to provide other services. “As an example, we are currently working on a European project named Verve, centered around older people who are scared to leave the house or are scared of crowded places,” explains Catherine Pelachaud. “What we are trying to do is develop agents who can communicate with the patient in a virtual environment.” The aim is to overcome the patient’s anxiety through initial confrontation in a controlled environment, which is more reassuring than the real world. Another project applying this technology is the European Tardis project, launched in 2011 and scheduled to conclude in 2014. This research project aims to help young people with social difficulties in their job search. The idea is to create an embodied conversational agent capable of acting as a virtual recruiter, with whom users can learn to communicate and present themselves more effectively, preparing for situations such as job interviews. Catherine Pelachaud’s team is working on a dozen similar projects, the majority of which were launched within the EU’s Seventh Framework Programme for Research (FP7) and cover fields ranging from ‘serious games’ to robotics. … These projects bring together a large variety of research fields (cognitive science, computer science and even psychology). Such a multi-disciplinary approach often makes the projects difficult to categorize, even for those involved. But, if pushed to identify a common denominator for the many projects currently underway, its name would be “Greta”.
Greta, the first steps
In 1999, Catherine Pelachaud developed the first embodied conversational agent (known as Greta), as part of the MagiCster project. “The aim of this EU project was to create a character capable of speaking in a believable way,” explains the researcher. “We had to make sure that gestures, facial expressions and speech were coherent. The notion of believability was essential, and is still a crucial topic today.” The first stage was therefore to analyze human communication as closely as possible. Meticulous observational work was required before programming could begin. “We base our model on existing literature in human and social sciences, theoretical models and video analysis,” continues Catherine Pelachaud. “In Italy, I also worked a lot with Isabella Poggi, a psycholinguist at Roma Tre University, who has a keen interest in human communicative movement and facial expressions.” Once the database was in place, the researchers could begin to design their virtual agent with the ability to mimic human expressions. And so, in 1999, Greta was born. At this stage this female avatar was unable to communicate with a real user, but was able to adapt her facial expressions to match her speech, which was produced using a synthesized voice. Greta’s development can be broken down into three clear phases: what to say, how to say it and how to behave while speaking and listening.
When the body communicates
This virtual agent soon became a stepping stone for further research, with constant improvements to the technology. Ten years after this original breakthrough, the EU Semaine project, in which Télécom ParisTech was a partner and which was completed at the end of 2010, produced four unique virtual agents based on the Greta system. Each virtual character was given a carefully defined emotional trait (aggressive, cheerful, pragmatic etc.) and was able to communicate autonomously with a human user in real time. “In reality, these agents were not capable of understanding what the users said,” explains Catherine Pelachaud. “In order to do that, it would have been necessary to understand and interpret vocabulary: this is an extremely difficult problem and we have still not cracked it.” And yet, how can we talk about communication when the virtual speaker doesn’t understand a word that is spoken? For this reason, the Télécom ParisTech team opted to focus primarily on non-verbal behavior. In the case of the Semaine project, a webcam and microphone were used to extract visual and acoustic cues from the human user in real time. This data was then transferred directly into the Greta system, enabling the virtual agent to react accordingly. The agent then selected what to say from a repertoire of sample phrases. But, more importantly, the embodied conversational agents are able to react visibly to the speech of their interlocutor, using head nods, frowns or smiles. “This non-verbal communication is essential in a conversation,” Catherine Pelachaud explains. “If you say something and the person in front of you gives no reaction whatsoever, it’s highly unnerving!” By using this feedback, the virtual agents project the illusion of real communication between two interactants – an illusion which is constantly being improved as these projects evolve.
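For readers who like to see the idea in code, the listening loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Greta or Semaine implementation: the cue fields, the rules, the 800 ms pause threshold and the phrase lists are all invented for the example.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserCues:
    """Hypothetical cues extracted from webcam and microphone for one time window."""
    is_speaking: bool   # acoustic cue: the user is currently talking
    is_smiling: bool    # visual cue: a smile was detected
    sounds_upset: bool  # acoustic cue: e.g. tense or agitated voice
    pause_ms: int       # silence since the user last spoke


# Hypothetical repertoires of sample phrases, one per emotional trait.
PHRASES = {
    "cheerful": ["That sounds wonderful!", "Tell me more!"],
    "pragmatic": ["I see.", "What happened next?"],
}

# Assumed pause length after which the agent takes the turn.
TURN_TAKING_PAUSE_MS = 800


def select_backchannel(cues: UserCues) -> Optional[str]:
    """Pick a non-verbal listener reaction while the user is speaking."""
    if not cues.is_speaking:
        return None
    if cues.is_smiling:
        return "smile"      # mirror the user's smile
    if cues.sounds_upset:
        return "frown"      # show concern
    return "head_nod"       # default sign of attention


def select_utterance(cues: UserCues, trait: str, rng: random.Random) -> Optional[str]:
    """Once the user has paused long enough, pick a sample phrase for the trait."""
    if cues.is_speaking or cues.pause_ms < TURN_TAKING_PAUSE_MS:
        return None
    return rng.choice(PHRASES[trait])
```

Note that nothing here depends on what the user actually said: the agent reacts only to how the user looks and sounds, which is the compromise the Semaine agents made.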
A growing field
Catherine Pelachaud and her team of fifteen researchers at Télécom ParisTech are currently focusing on fields as varied as laughter, emotions, social interactions and even synchronization with the speaker. In the highly specialized field of embodied conversational agent research, IMT specializes in developing computational models of non-verbal communication, building on the foundations laid by the Greta system to make the reactions and behavior of virtual agents increasingly believable. Whilst tangible applications are still rare in everyday life, progress is being made at a rapid pace. Greta is now ‘open source’, enabling teams from anywhere in the world to make their own improvements to the embodied conversational agents. With each research unit adding a building block to the concept, could we one day envisage natural communication with a virtual character or robot? “It’s still very complicated: this would involve understanding the content of speech, interpreting behavior and remembering everything that has been said. I’m sure we’ll get there one day… But I’m not going to attempt to set a date!” Catherine Pelachaud concludes. In other words, the loquacious androids of science fiction are still some way off: for now, machines remain a long way from matching the complexity of human communication.
During her first computer graphics lessons at university, Catherine Pelachaud discovered a true passion for this unique field of research – a passion which has not dimmed over the ensuing twenty years. In 1991, she gained a PhD in Computer Science from the University of Pennsylvania in Philadelphia, completing a thesis entitled “Communication and Coarticulation in Facial Animation” under the supervision of a human body animation specialist and a vocal intonation expert. She pursued her work in Philadelphia until 1993, subsequently moving to the University of Rome where she spent a decade pursuing her research into embodied conversational agents. After several years of teaching in Paris, in 2008 she was appointed Director of Research at the LTCI (Laboratoire Traitement et Communication de l’Information – Information Processing and Communication Laboratory), a CNRS and Télécom ParisTech joint research unit.