A very large proportion of human-human communication is non-verbal (Mehrabian & Ferris 1967), carried by facial expression, glance, gesture and posture. It is well established that, despite the biological universals underpinning some nonverbal behaviour, cultures differ in their expressive behaviour (e.g., Elfenbein and Ambady 2002). For example, the extent to which people look directly at each other, the meaning of specific gestures and facial expressions, and the contexts in which they should be used all vary across cultures. Thus, without attention to the cultural context it is not obvious whether a smile is a sign of joy or of embarrassment. Appropriate responses to expressive behaviour are, however, a vital part of successful interaction (Kappas and Descôteaux 2003).
These behaviours are used to track the intentions and goals of the interaction partner and to support Theory of Mind, the human ability to attribute mental states such as intentions, beliefs, and values not only to oneself but also to others.
Because the processing of expressive behaviour takes place largely below the level of consciousness, culturally specific assumptions about what it means can be particularly difficult to deal with in intercultural interactions (Singelis 1994). They may be heavily implicated in the generation of Negative Red Flags (Seelye 1996), in which human interaction suddenly breaks down as the expectations of one or both interaction partners are confounded. Raising the processing of expressive behaviour into consciousness is therefore an important issue in education for intercultural interaction since, as with the negative affective states already referred to, it allows people to understand and modify their behaviour. While this has typically been attempted through descriptive case studies (Singelis 1994), responsive, interacting virtual agents offer a much more dramatic and engaging way of addressing the issue.
While the eCUTE showcases could easily have supported the use of avatars controlled by users as well as autonomous characters, the latter have strong advantages.
1 Consistent and repeatable expressive behaviour
Asking users to explicitly control expressive behaviour imposes a high cognitive load, and unless such users are experts and/or briefed actors, inconsistent or erroneous behaviour may result. Whole-body and full facial motion tracking could serve as an alternative control mode, but it is expensive and intrusive, and still does not guarantee consistent and repeatable behaviour.
2 Availability to the end user
Autonomous characters can easily be multiplied to deal with multiple users, whereas one actor could not interact one-to-one with a whole class of children. Characters are also available at any time of day or night, which is far from the case with human interaction partners, who may also live in widely different time zones. Finally, organising remote human-human interaction costs time and potentially money, while characters, once developed, can be used freely.
3 Flexibility and scope
Once a parametrisable character has been developed, it can be used in an indefinite number of scenarios and across an indefinite number of cultural variables. In particular, it can be used with a controlled degree of exaggeration to portray synthetic cultures. A substantial part of the research so far carried out on culture in virtual agents involves adapting user interface characters to a specific user's culture.
This line of research is strongly motivated by the study of Lee and Nass (2000), which showed that users tend to prefer interacting with a virtual agent that has a similar cultural background. Three of the eCUTE partners carried out CUBE-G, a project that uses Hofstede's dimensions: virtual agents model nonverbal aspects of communication in two national cultures, German and Japanese. During a conversation with the virtual agents, the cultural background of a user is inferred by sensing their nonverbal behaviour with a Nintendo Wii remote controller, and the agents' nonverbal behaviour is then dynamically adapted to the inferred culture.
The table below compares the work carried out in the CUBE-G project with that to be carried out in the eCUTE project.
To a lesser degree, virtual agents have also been adapted to specific cultures that are intentionally different from the user's culture. For example, in the Tactical Language Training System (Johnson et al. 2004), users interact with autonomous characters from a foreign culture in order to practise that culture's spoken language and gestures.
The goal is to teach communicative skills in languages that are less commonly taught in the USA, such as Arabic, Chinese or Russian. Learning such languages through traditional courses can be very time-consuming, owing to their unfamiliar writing systems and cultural norms. However, this system addresses only the overt communicative aspects of a culture, namely spoken language and gestures. Iacobelli and Cassell (2007) examined the influence of ethnicity on the interaction behaviour of children and found that ethnicity was conveyed not only by the outward appearance of the character but also by specific verbal and nonverbal behaviour patterns. From a technical point of view, the problem arises of how to ensure consistency across multiple channels of communication. To make characters adaptable to cultural differences in interaction behaviour, a set of parameters or rules is needed that allows the system processes to be influenced in a consistent manner.
Jan et al. (2007) describe an approach to modifying the behaviour of characters through cultural variables based on Hofstede's dimensions (Hofstede, 2001); the variables are set manually in their system to simulate the behaviour of a group of characters. Starting from an empirical study of Japanese and German communicative behaviours, partners of the eCUTE consortium investigated how to interpolate behaviour along different cultural dimensions, resulting in a parameterised computational model for culture-specific generation of verbal and non-verbal behaviour. They recorded and annotated about 20 hours of culture-specific interactions from Germany and Japan in three standardised scenarios (first meeting, negotiation, status difference) in order to ground empirically the behaviour of characters representing the German and the Japanese cultures; see Rehm and colleagues (2007).
A number of approaches in this area concentrate on learning environments or interactive role-plays with virtual characters. Khaled et al. (2006) focus on cultural differences in persuasion strategies and present an approach to incorporating these insights into a persuasive game for a collectivist society. Johnson et al. (2004) describe a language tutoring system that also takes cultural differences in gesture usage into account: users are confronted with prototypical settings and, apart from speech input, have to select gestures for their avatars; moreover, they have to interpret the gestures made by the tutor agents in order to solve their tasks. Warren et al. (2005) as well as Rehm et al. (2007) aim at cross-cultural training scenarios and describe how these can be realised with virtual characters.
A different approach is described by Isbister and colleagues (2000), who do not focus on training culture-specific behaviours but rather aim to encourage discussion among members of different cultures, in their case American and Japanese students. They realised a cross-cultural video conferencing system featuring a so-called Helper agent, which intervenes when communication between participants is disrupted. The agent then tries to find safe topics that allow the interlocutors to resume the conversation. To make the agent acceptable to participants from both cultures, and to prevent it from corresponding to either an American or a Japanese stereotype, it was designed to be equally (un)familiar to both cultures by taking on the appearance of a dog.