Workshop ISIAA 2017 – Author Index

Bechade, Lucile

Lucile Bechade, Kevin El Haddad, Juliette Bourquin, Stéphane Dupont, and Laurence Devillers (University of Paris-Saclay, France; University of Mons, Belgium) This paper presents a data collection carried out in the framework of the Joker Project. Interaction scenarios have been designed in order to study the effects of affect bursts in a human-robot interaction and to build a system capable of using multilevel affect bursts in a human-robot interaction. We use two main audio expression cues: verbal (synthesised sentences) and nonverbal (affect bursts). The nonverbal cues used are sounds expressing disgust, amusement, fear, misunderstanding and surprise. Three different intensity levels have been generated for each emotion's sound.

Biancardi, Beatrice

Beatrice Biancardi, Angelo Cafaro, and Catherine Pelachaud (CNRS, France; UPMC, France) In this abstract we introduce the design of an experiment aimed at investigating how users' impressions of an embodied conversational agent are influenced by the agent's non-verbal behaviour. We focus on impressions of warmth and competence, the two fundamental dimensions of social perception. The agent's gestures, arm rest poses and smile frequency are manipulated, as well as users' expectations about the agent's competence. We hypothesize that users' judgments will differ according to their expectations, following the Expectancy Violation Theory proposed by Burgoon and colleagues. We also expect to replicate the results found in our previous study on human-human interaction, for example that a high frequency of smiles will elicit higher warmth and lower competence impressions compared to a low frequency of smiles, while crossed arms will elicit low competence and low warmth impressions.

Blache, Philippe

Philippe Blache (Aix-Marseille University, France) This paper presents a new framework for implementing a dialogue manager, making it possible to infer new information in the course of the interaction as well as to generate responses from the virtual agent. The approach relies on a specific organization of knowledge bases, including the creation of a common ground and a belief base. Moreover, the same type of rules implements both inference and control of the dialogue. This approach is implemented within a dialogue system for training doctors to break bad news (ACORFORMed).

Bouallegue, Ammar

Rim Trabelsi, Jagannadan Varadarajan, Yong Pei, Le Zhang, Issam Jabri, Ammar Bouallegue, and Pierre Moulin (Advanced Digital Sciences Center, Singapore; SAP, Singapore; Al Yamamah University, Saudi Arabia; National Engineering School of Tunis, Tunisia; University of Illinois at Urbana-Champaign, USA) This paper addresses the issue of analyzing social interactions between humans in videos. We focus on recognizing dyadic human interactions through multi-modal data, specifically depth, color and skeleton sequences. Firstly, we introduce a new person-centric proxemic descriptor, named PROF, extracted from skeleton data and able to incorporate intrinsic and extrinsic distances between two interacting persons in a view-variant scheme. Then, a novel key frame selection approach is introduced to identify salient instants of the interaction sequence based on the joint energy. From RGBD videos, more holistic CNN features are extracted by applying adaptive pre-trained CNNs to optical flow frames. Features from the three modalities are combined and then classified using a linear SVM. Finally, extensive experiments carried out on two multi-modal and multi-view interaction datasets prove the robustness of the introduced approach compared to state-of-the-art methods.

Bourquin, Juliette

Lucile Bechade, Kevin El Haddad, Juliette Bourquin, Stéphane Dupont, and Laurence Devillers (University of Paris-Saclay, France; University of Mons, Belgium) This paper presents a data collection carried out in the framework of the Joker Project. Interaction scenarios have been designed in order to study the effects of affect bursts in a human-robot interaction and to build a system capable of using multilevel affect bursts in a human-robot interaction. We use two main audio expression cues: verbal (synthesised sentences) and nonverbal (affect bursts). The nonverbal cues used are sounds expressing disgust, amusement, fear, misunderstanding and surprise. Three different intensity levels have been generated for each emotion's sound.

Cafaro, Angelo

Beatrice Biancardi, Angelo Cafaro, and Catherine Pelachaud (CNRS, France; UPMC, France) In this abstract we introduce the design of an experiment aimed at investigating how users' impressions of an embodied conversational agent are influenced by the agent's non-verbal behaviour. We focus on impressions of warmth and competence, the two fundamental dimensions of social perception. The agent's gestures, arm rest poses and smile frequency are manipulated, as well as users' expectations about the agent's competence. We hypothesize that users' judgments will differ according to their expectations, following the Expectancy Violation Theory proposed by Burgoon and colleagues. We also expect to replicate the results found in our previous study on human-human interaction, for example that a high frequency of smiles will elicit higher warmth and lower competence impressions compared to a low frequency of smiles, while crossed arms will elicit low competence and low warmth impressions.

Campbell, Nick

Emer Gilmartin, Brendan Spillane, Maria O’Reilly, Ketong Su, Christian Saam, Benjamin R. Cowan, Nick Campbell, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) Conversation proceeds through dialogue moves or acts, and dialog act annotation can aid the design of artificial dialog. While many dialogs are task-based or instrumental, with clear goals, as in the case of a service encounter or business meeting, many are more interactional in nature, as in friendly chats or longer casual conversations. Early research on dialogue acts focussed on transactional or task-based dialogue, but work is now expanding to social aspects of interaction. We review how dialog annotation schemes treat non-task elements of dialog -- greeting and leave-taking sequences in particular. We describe the collection and annotation, using the ISO Standard 24617-2 semantic annotation framework (Part 2: Dialogue acts), of a corpus of 187 text dialogues and study the dialog acts used in greeting and leave-taking.

Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias, Benjamin R. Cowan, and Nick Campbell (Trinity College Dublin, Ireland; Grenoble INP, France; University College Dublin, Ireland) Social or interactive talk differs from task-based or instrumental interactions in many ways. Quantitative knowledge of these differences will aid the design of convincing human-machine interfaces for applications requiring machines to take on roles including social companions, healthcare providers, or tutors. We briefly review accounts of social talk from the literature. We outline a three-part data collection of human-human, human-WoZ and human-machine dialogs incorporating light social talk and a guessing game. We finally describe our ongoing experiments on the collected corpus.

Chaminade, Thierry

Thierry Chaminade (Aix-Marseille University, France) Anthropomorphic artificial agents, computed characters or humanoid robots, can be used to investigate human cognition. They are intrinsically ambivalent: they appear and act as humans, hence we should tend to consider them as human, yet we know they are machines designed by humans, and should not consider them as humans. Reviewing a number of behavioral and neurophysiological studies provides insights into social mechanisms that are primarily influenced by the appearance of the agent, in particular its resemblance to humans, and other mechanisms that are influenced by the knowledge we have about the artificial nature of the agent. A significant finding is that, as expected, humans don’t naturally adopt an intentional stance when interacting with artificial agents.

Matthieu Riou, Bassam Jabaian, Stéphane Huet, Thierry Chaminade, and Fabrice Lefèvre (University of Avignon, France; Aix-Marseille University, France) In this paper, we present a brief overview of our ongoing work on artificial interactive agents and their adaptation to users. Several possibilities to introduce humorous productions in a spoken dialog system are investigated in order to enhance naturalness during social interactions between the agent and the user. We finally describe our plan on how neuroscience will help to better evaluate the proposed systems, both objectively and subjectively.

Collery, Marine

Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias, Benjamin R. Cowan, and Nick Campbell (Trinity College Dublin, Ireland; Grenoble INP, France; University College Dublin, Ireland) Social or interactive talk differs from task-based or instrumental interactions in many ways. Quantitative knowledge of these differences will aid the design of convincing human-machine interfaces for applications requiring machines to take on roles including social companions, healthcare providers, or tutors. We briefly review accounts of social talk from the literature. We outline a three-part data collection of human-human, human-WoZ and human-machine dialogs incorporating light social talk and a guessing game. We finally describe our ongoing experiments on the collected corpus.

Cowan, Benjamin R.

Emer Gilmartin, Brendan Spillane, Maria O’Reilly, Ketong Su, Christian Saam, Benjamin R. Cowan, Nick Campbell, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) Conversation proceeds through dialogue moves or acts, and dialog act annotation can aid the design of artificial dialog. While many dialogs are task-based or instrumental, with clear goals, as in the case of a service encounter or business meeting, many are more interactional in nature, as in friendly chats or longer casual conversations. Early research on dialogue acts focussed on transactional or task-based dialogue, but work is now expanding to social aspects of interaction. We review how dialog annotation schemes treat non-task elements of dialog -- greeting and leave-taking sequences in particular. We describe the collection and annotation, using the ISO Standard 24617-2 semantic annotation framework (Part 2: Dialogue acts), of a corpus of 187 text dialogues and study the dialog acts used in greeting and leave-taking.

Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias, Benjamin R. Cowan, and Nick Campbell (Trinity College Dublin, Ireland; Grenoble INP, France; University College Dublin, Ireland) Social or interactive talk differs from task-based or instrumental interactions in many ways. Quantitative knowledge of these differences will aid the design of convincing human-machine interfaces for applications requiring machines to take on roles including social companions, healthcare providers, or tutors. We briefly review accounts of social talk from the literature. We outline a three-part data collection of human-human, human-WoZ and human-machine dialogs incorporating light social talk and a guessing game. We finally describe our ongoing experiments on the collected corpus.

Brendan Spillane, Emer Gilmartin, Christian Saam, Ketong Su, Benjamin R. Cowan, Séamus Lawless, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) This paper introduces ADELE, a Personalized Intelligent Companion designed to engage with users through spoken dialog to help them explore topics of interest. The system will maintain a user model of information consumption habits and preferences in order to (1) personalize the user’s experience for ongoing interactions, and (2) build the user-machine relationship to model that of a friendly companion. The paper details the overall research goal, existing progress, the current focus, and the long-term plan for the project.

Curry, Amanda Cercas

Amanda Cercas Curry, Helen Hastie, and Verena Rieser (Heriot-Watt University, UK) In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions.

Devillers, Laurence

Lucile Bechade, Kevin El Haddad, Juliette Bourquin, Stéphane Dupont, and Laurence Devillers (University of Paris-Saclay, France; University of Mons, Belgium) This paper presents a data collection carried out in the framework of the Joker Project. Interaction scenarios have been designed in order to study the effects of affect bursts in a human-robot interaction and to build a system capable of using multilevel affect bursts in a human-robot interaction. We use two main audio expression cues: verbal (synthesised sentences) and nonverbal (affect bursts). The nonverbal cues used are sounds expressing disgust, amusement, fear, misunderstanding and surprise. Three different intensity levels have been generated for each emotion's sound.

Dondrup, Christian

Christian Dondrup, Ioannis Papaioannou, Jekaterina Novikova, and Oliver Lemon (Heriot-Watt University, UK) Working in human-populated environments requires fast and robust action selection and execution, especially when deliberately trying to interact with humans. This work presents the combination of a high-level planner (ROSPlan) for action sequencing and automatically generated finite state machines (PNP) for execution. Using this combined system we are able to exploit the speed and robustness of the execution and the flexibility of the sequence generation, combining the positive aspects of both approaches.

Dupont, Stéphane

Lucile Bechade, Kevin El Haddad, Juliette Bourquin, Stéphane Dupont, and Laurence Devillers (University of Paris-Saclay, France; University of Mons, Belgium) This paper presents a data collection carried out in the framework of the Joker Project. Interaction scenarios have been designed in order to study the effects of affect bursts in a human-robot interaction and to build a system capable of using multilevel affect bursts in a human-robot interaction. We use two main audio expression cues: verbal (synthesised sentences) and nonverbal (affect bursts). The nonverbal cues used are sounds expressing disgust, amusement, fear, misunderstanding and surprise. Three different intensity levels have been generated for each emotion's sound.

El Haddad, Kevin

Lucile Bechade, Kevin El Haddad, Juliette Bourquin, Stéphane Dupont, and Laurence Devillers (University of Paris-Saclay, France; University of Mons, Belgium) This paper presents a data collection carried out in the framework of the Joker Project. Interaction scenarios have been designed in order to study the effects of affect bursts in a human-robot interaction and to build a system capable of using multilevel affect bursts in a human-robot interaction. We use two main audio expression cues: verbal (synthesised sentences) and nonverbal (affect bursts). The nonverbal cues used are sounds expressing disgust, amusement, fear, misunderstanding and surprise. Three different intensity levels have been generated for each emotion's sound.

Catharine Oertel, Patrik Jonell, Kevin El Haddad, Eva Szekely, and Joakim Gustafson (KTH, Sweden; University of Mons, Belgium) In this paper we describe how audio-visual corpus recordings collected using crowd-sourcing techniques can be used for the audio-visual synthesis of attitudinal non-verbal feedback expressions for virtual agents. We discuss the limitations of this approach as well as where we see the opportunities for this technology.

Elias, Christy

Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias, Benjamin R. Cowan, and Nick Campbell (Trinity College Dublin, Ireland; Grenoble INP, France; University College Dublin, Ireland) Social or interactive talk differs from task-based or instrumental interactions in many ways. Quantitative knowledge of these differences will aid the design of convincing human-machine interfaces for applications requiring machines to take on roles including social companions, healthcare providers, or tutors. We briefly review accounts of social talk from the literature. We outline a three-part data collection of human-human, human-WoZ and human-machine dialogs incorporating light social talk and a guessing game. We finally describe our ongoing experiments on the collected corpus.

Gilmartin, Emer

Emer Gilmartin, Brendan Spillane, Maria O’Reilly, Ketong Su, Christian Saam, Benjamin R. Cowan, Nick Campbell, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) Conversation proceeds through dialogue moves or acts, and dialog act annotation can aid the design of artificial dialog. While many dialogs are task-based or instrumental, with clear goals, as in the case of a service encounter or business meeting, many are more interactional in nature, as in friendly chats or longer casual conversations. Early research on dialogue acts focussed on transactional or task-based dialogue, but work is now expanding to social aspects of interaction. We review how dialog annotation schemes treat non-task elements of dialog -- greeting and leave-taking sequences in particular. We describe the collection and annotation, using the ISO Standard 24617-2 semantic annotation framework (Part 2: Dialogue acts), of a corpus of 187 text dialogues and study the dialog acts used in greeting and leave-taking.

Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias, Benjamin R. Cowan, and Nick Campbell (Trinity College Dublin, Ireland; Grenoble INP, France; University College Dublin, Ireland) Social or interactive talk differs from task-based or instrumental interactions in many ways. Quantitative knowledge of these differences will aid the design of convincing human-machine interfaces for applications requiring machines to take on roles including social companions, healthcare providers, or tutors. We briefly review accounts of social talk from the literature. We outline a three-part data collection of human-human, human-WoZ and human-machine dialogs incorporating light social talk and a guessing game. We finally describe our ongoing experiments on the collected corpus.

Brendan Spillane, Emer Gilmartin, Christian Saam, Ketong Su, Benjamin R. Cowan, Séamus Lawless, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) This paper introduces ADELE, a Personalized Intelligent Companion designed to engage with users through spoken dialog to help them explore topics of interest. The system will maintain a user model of information consumption habits and preferences in order to (1) personalize the user’s experience for ongoing interactions, and (2) build the user-machine relationship to model that of a friendly companion. The paper details the overall research goal, existing progress, the current focus, and the long-term plan for the project.

Gross, Stephanie

Brigitte Krenn, Stephanie Gross, and Lisa Nussbaumer (Austrian Research Institute for Artificial Intelligence, Austria) In human communication, pronouns are an important means of perspective taking, and in task-oriented communication in particular, personal pronouns indicate who has to do what at a certain moment in a given task. The ability to handle task-related discourse is a factor for robots interacting with people in their homes in everyday life. Both learning and resolution of personal pronouns pose a challenge for robot architectures, as there has to be permanent adaptation to the human interlocutor’s use of personal pronouns. Especially the use of ich, du, wir (I, you, we) may be irritating for the robot’s natural language processing system.

Gustafson, Joakim

Catharine Oertel, Patrik Jonell, Kevin El Haddad, Eva Szekely, and Joakim Gustafson (KTH, Sweden; University of Mons, Belgium) In this paper we describe how audio-visual corpus recordings collected using crowd-sourcing techniques can be used for the audio-visual synthesis of attitudinal non-verbal feedback expressions for virtual agents. We discuss the limitations of this approach as well as where we see the opportunities for this technology.

Hastie, Helen

Amanda Cercas Curry, Helen Hastie, and Verena Rieser (Heriot-Watt University, UK) In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions.

Huang, Yuyun

Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias, Benjamin R. Cowan, and Nick Campbell (Trinity College Dublin, Ireland; Grenoble INP, France; University College Dublin, Ireland) Social or interactive talk differs from task-based or instrumental interactions in many ways. Quantitative knowledge of these differences will aid the design of convincing human-machine interfaces for applications requiring machines to take on roles including social companions, healthcare providers, or tutors. We briefly review accounts of social talk from the literature. We outline a three-part data collection of human-human, human-WoZ and human-machine dialogs incorporating light social talk and a guessing game. We finally describe our ongoing experiments on the collected corpus.

Huet, Stéphane

Matthieu Riou, Bassam Jabaian, Stéphane Huet, Thierry Chaminade, and Fabrice Lefèvre (University of Avignon, France; Aix-Marseille University, France) In this paper, we present a brief overview of our ongoing work on artificial interactive agents and their adaptation to users. Several possibilities to introduce humorous productions in a spoken dialog system are investigated in order to enhance naturalness during social interactions between the agent and the user. We finally describe our plan on how neuroscience will help to better evaluate the proposed systems, both objectively and subjectively.

Jabaian, Bassam

Matthieu Riou, Bassam Jabaian, Stéphane Huet, Thierry Chaminade, and Fabrice Lefèvre (University of Avignon, France; Aix-Marseille University, France) In this paper, we present a brief overview of our ongoing work on artificial interactive agents and their adaptation to users. Several possibilities to introduce humorous productions in a spoken dialog system are investigated in order to enhance naturalness during social interactions between the agent and the user. We finally describe our plan on how neuroscience will help to better evaluate the proposed systems, both objectively and subjectively.

Jabri, Issam

Rim Trabelsi, Jagannadan Varadarajan, Yong Pei, Le Zhang, Issam Jabri, Ammar Bouallegue, and Pierre Moulin (Advanced Digital Sciences Center, Singapore; SAP, Singapore; Al Yamamah University, Saudi Arabia; National Engineering School of Tunis, Tunisia; University of Illinois at Urbana-Champaign, USA) This paper addresses the issue of analyzing social interactions between humans in videos. We focus on recognizing dyadic human interactions through multi-modal data, specifically depth, color and skeleton sequences. Firstly, we introduce a new person-centric proxemic descriptor, named PROF, extracted from skeleton data and able to incorporate intrinsic and extrinsic distances between two interacting persons in a view-variant scheme. Then, a novel key frame selection approach is introduced to identify salient instants of the interaction sequence based on the joint energy. From RGBD videos, more holistic CNN features are extracted by applying adaptive pre-trained CNNs to optical flow frames. Features from the three modalities are combined and then classified using a linear SVM. Finally, extensive experiments carried out on two multi-modal and multi-view interaction datasets prove the robustness of the introduced approach compared to state-of-the-art methods.

Jonell, Patrik

Catharine Oertel, Patrik Jonell, Kevin El Haddad, Eva Szekely, and Joakim Gustafson (KTH, Sweden; University of Mons, Belgium) In this paper we describe how audio-visual corpus recordings collected using crowd-sourcing techniques can be used for the audio-visual synthesis of attitudinal non-verbal feedback expressions for virtual agents. We discuss the limitations of this approach as well as where we see the opportunities for this technology.

Koda, Tomoko

Tomoko Koda (Osaka Institute of Technology, Japan) This paper introduces the results of a series of experiments on the impression of agents that perform self-adaptors. Human-human interactions were video-taped and analyzed with respect to the usage of different types of self-adaptors (relaxed/stressful) and gender-specific self-adaptors (masculine/feminine). We then implemented virtual agents that performed these self-adaptors. Evaluation of the interactions between humans and agents suggested: 1) Relaxed self-adaptors were more likely to prevent any deterioration in the perceived friendliness of the agents than agents without self-adaptors. 2) People with higher social skills perceive agents that exhibit self-adaptors as friendlier than people with lower social skills do. 3) Impressions of interactions with agents are formed by mutual interactions between the self-adaptors and the conversational content. 4) There are cultural differences in sensitivity to another culture's self-adaptors. 5) Impressions of agents that perform gender-specific self-adaptors differ between participant genders.

Krenn, Brigitte

Brigitte Krenn, Stephanie Gross, and Lisa Nussbaumer (Austrian Research Institute for Artificial Intelligence, Austria) In human communication, pronouns are an important means of perspective taking, and in task-oriented communication in particular, personal pronouns indicate who has to do what at a certain moment in a given task. The ability to handle task-related discourse is a factor for robots interacting with people in their homes in everyday life. Both learning and resolution of personal pronouns pose a challenge for robot architectures, as there has to be permanent adaptation to the human interlocutor’s use of personal pronouns. Especially the use of ich, du, wir (I, you, we) may be irritating for the robot’s natural language processing system.

Lai, Catherine

Leimin Tian, Johanna D. Moore, and Catherine Lai (University of Edinburgh, UK) Emotions play a vital role in human communication. Therefore, it is desirable for virtual agent dialogue systems to recognize and react to the user's emotions. However, current automatic emotion recognizers have limited performance compared to humans. Our work attempts to improve the performance of recognizing emotions in spoken dialogue by identifying dialogue cues predictive of emotions, and by building multimodal recognition models with a knowledge-inspired hierarchy. We conduct experiments on both spontaneous and acted dialogue data to study the efficacy of the proposed approaches. Our results show that including prior knowledge about emotions in dialogue in either the feature representation or the model structure is beneficial for automatic emotion recognition.

Lawless, Séamus

Brendan Spillane, Emer Gilmartin, Christian Saam, Ketong Su, Benjamin R. Cowan, Séamus Lawless, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) This paper introduces ADELE, a Personalized Intelligent Companion designed to engage with users through spoken dialog to help them explore topics of interest. The system will maintain a user model of information consumption habits and preferences in order to (1) personalize the user’s experience for ongoing interactions, and (2) build the user-machine relationship to model that of a friendly companion. The paper details the overall research goal, existing progress, the current focus, and the long-term plan for the project.

Lefèvre, Fabrice

Fabrice Lefèvre (University of Avignon, France) In this talk, ongoing work on vocal artificial agents in the Vocal Interaction Group at LIA, University of Avignon, is presented. The focus is on the research line aiming at endowing such interactive agents with human-like social abilities. After a short overview of the state of the art in spoken dialogue systems, a summary of recent efforts to improve systems' development through online learning using social signals is proposed. Then two examples of skills favoring human-like social interactions are presented: firstly a new turn-taking management scheme based on incremental processing and reinforcement learning, then automatic generation and usage optimisation of humor traits. These studies converge towards developing interactive systems that could foster studies in the human sciences to better understand the specificities of human social communication.

Matthieu Riou, Bassam Jabaian, Stéphane Huet, Thierry Chaminade, and Fabrice Lefèvre (University of Avignon, France; Aix-Marseille University, France) In this paper, we present a brief overview of our ongoing work on artificial interactive agents and their adaptation to users. Several possibilities to introduce humorous productions in a spoken dialog system are investigated in order to enhance naturalness during social interactions between the agent and the user. We finally describe our plan on how neuroscience will help to better evaluate the proposed systems, both objectively and subjectively.

Lemon, Oliver

Christian Dondrup, Ioannis Papaioannou, Jekaterina Novikova, and Oliver Lemon (Heriot-Watt University, UK) Working in human-populated environments requires fast and robust action selection and execution, especially when deliberately trying to interact with humans. This work presents the combination of a high-level planner (ROSPlan) for action sequencing and automatically generated finite state machines (PNP) for execution. Using this combined system we are able to exploit the speed and robustness of the execution and the flexibility of the sequence generation, combining the positive aspects of both approaches.

Miehle, Juliana

Louisa Pragst, Juliana Miehle, Wolfgang Minker, and Stefan Ultes (University of Ulm, Germany; Cambridge University, UK) Access to health care related information can be vital and should be easily accessible. However, immigrants often have difficulties obtaining the relevant information due to language barriers and cultural differences. In the KRISTINA project, we address those difficulties by creating a socially competent multimodal dialogue system that can assist immigrants in getting information about health care related questions. Dialogue management, as the core component responsible for the system behaviour, has a significant impact on the successful reception of such a system. Hence, this work presents the specific challenges of the KRISTINA project for adaptive dialogue management, namely the handling of a large dialogue domain and the cultural adaptability required by the envisioned dialogue system, and our approach to handling them.

Minker, Wolfgang

Louisa Pragst, Juliana Miehle, Wolfgang Minker, and Stefan Ultes (University of Ulm, Germany; Cambridge University, UK) Access to health care related information can be vital and should be easily accessible. However, immigrants often have difficulties obtaining the relevant information due to language barriers and cultural differences. In the KRISTINA project, we address those difficulties by creating a socially competent multimodal dialogue system that can assist immigrants in getting information about health care related questions. Dialogue management, as the core component responsible for the system behaviour, has a significant impact on the successful reception of such a system. Hence, this work presents the specific challenges of the KRISTINA project for adaptive dialogue management, namely the handling of a large dialogue domain and the cultural adaptability required by the envisioned dialogue system, and our approach to handling them.

Moore, Johanna D.

Leimin Tian, Johanna D. Moore, and Catherine Lai (University of Edinburgh, UK) Emotions play a vital role in human communication. Therefore, it is desirable for virtual agent dialogue systems to recognize and react to the user's emotions. However, current automatic emotion recognizers have limited performance compared to humans. Our work attempts to improve the performance of recognizing emotions in spoken dialogue by identifying dialogue cues predictive of emotions, and by building multimodal recognition models with a knowledge-inspired hierarchy. We conduct experiments on both spontaneous and acted dialogue data to study the efficacy of the proposed approaches. Our results show that including prior knowledge about emotions in dialogue in either the feature representation or the model structure is beneficial for automatic emotion recognition.

Moulin, Pierre

Rim Trabelsi, Jagannadan Varadarajan, Yong Pei, Le Zhang, Issam Jabri, Ammar Bouallegue, and Pierre Moulin (Advanced Digital Sciences Center, Singapore; SAP, Singapore; Al Yamamah University, Saudi Arabia; National Engineering School of Tunis, Tunisia; University of Illinois at Urbana-Champaign, USA) This paper addresses the issue of analyzing social interactions between humans in videos. We focus on recognizing dyadic human interactions through multi-modal data, specifically depth, color and skeleton sequences. Firstly, we introduce a new person-centric proxemic descriptor, named PROF, extracted from skeleton data and able to incorporate intrinsic and extrinsic distances between two interacting persons in a view-variant scheme. Then, a novel key frame selection approach is introduced to identify salient instants of the interaction sequence based on the joint energy. From RGBD videos, more holistic CNN features are extracted by applying adaptive pre-trained CNNs to optical flow frames. Features from the three modalities are combined and then classified using a linear SVM. Finally, extensive experiments carried out on two multi-modal and multi-view interaction datasets prove the robustness of the introduced approach compared to state-of-the-art methods.

Novikova, Jekaterina

Christian Dondrup, Ioannis Papaioannou, Jekaterina Novikova, and Oliver Lemon (Heriot-Watt University, UK) Working in human-populated environments requires fast and robust action selection and execution, especially when deliberately trying to interact with humans. This work presents the combination of a high-level planner (ROSPlan) for action sequencing and automatically generated finite state machines (PNP) for execution. Using this combined system we are able to exploit the speed and robustness of the execution and the flexibility of the sequence generation, combining the positive aspects of both approaches.

Nussbaumer, Lisa

Brigitte Krenn, Stephanie Gross, and Lisa Nussbaumer (Austrian Research Institute for Artificial Intelligence, Austria) In human communication, pronouns are an important means of perspective taking, and in task-oriented communication in particular, personal pronouns indicate who has to do what at a certain moment in a given task. The ability to handle task-related discourse is a factor for robots interacting with people in their homes in everyday life. Both learning and resolution of personal pronouns pose a challenge for robot architectures, as there has to be permanent adaptation to the human interlocutor’s use of personal pronouns. Especially the use of ich, du, wir (I, you, we) may be irritating for the robot’s natural language processing system.

Oertel, Catharine

Catharine Oertel, Patrik Jonell, Kevin El Haddad, Eva Szekely, and Joakim Gustafson (KTH, Sweden; University of Mons, Belgium) In this paper we describe how audio-visual corpus recordings collected using crowd-sourcing techniques can be used for the audio-visual synthesis of attitudinal non-verbal feedback expressions for virtual agents. We discuss the limitations of this approach as well as where we see the opportunities for this technology.

O’Reilly, Maria

Emer Gilmartin, Brendan Spillane, Maria O’Reilly, Ketong Su, Christian Saam, Benjamin R. Cowan, Nick Campbell, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) Conversation proceeds through dialogue moves or acts, and dialog act annotation can aid the design of artificial dialog. While many dialogs are task-based or instrumental, with clear goals, as in the case of a service encounter or business meeting, many are more interactional in nature, as in friendly chats or longer casual conversations. Early research on dialogue acts focussed on transactional or task-based dialogue, but work is now expanding to social aspects of interaction. We review how dialog annotation schemes treat non-task elements of dialog -- greeting and leave-taking sequences in particular. We describe the collection and annotation, using the ISO Standard 24617-2 semantic annotation framework (Part 2: Dialogue acts), of a corpus of 187 text dialogues and study the dialog acts used in greeting and leave-taking.

Papaioannou, Ioannis

Christian Dondrup, Ioannis Papaioannou, Jekaterina Novikova, and Oliver Lemon (Heriot-Watt University, UK) Working in human-populated environments requires fast and robust action selection and execution, especially when deliberately trying to interact with humans. This work presents the combination of a high-level planner (ROSPlan) for action sequencing and automatically generated finite state machines (PNP) for execution. Using this combined system we are able to exploit the speed and robustness of the execution and the flexibility of the sequence generation, combining the positive aspects of both approaches.

Pei, Yong

Rim Trabelsi, Jagannadan Varadarajan, Yong Pei, Le Zhang, Issam Jabri, Ammar Bouallegue, and Pierre Moulin (Advanced Digital Sciences Center, Singapore; SAP, Singapore; Al Yamamah University, Saudi Arabia; National Engineering School of Tunis, Tunisia; University of Illinois at Urbana-Champaign, USA) This paper addresses the issue of analyzing social interactions between humans in videos. We focus on recognizing dyadic human interactions through multi-modal data, specifically depth, color and skeleton sequences. Firstly, we introduce a new person-centric proxemic descriptor, named PROF, extracted from skeleton data and able to incorporate intrinsic and extrinsic distances between two interacting persons in a view-variant scheme. Then, a novel key frame selection approach is introduced to identify salient instants of the interaction sequence based on the joint energy. From RGBD videos, more holistic CNN features are extracted by applying adaptive pre-trained CNNs to optical flow frames. Features from the three modalities are combined and then classified using a linear SVM. Finally, extensive experiments carried out on two multi-modal and multi-view interaction datasets prove the robustness of the introduced approach compared to state-of-the-art methods.

Pelachaud, Catherine

Beatrice Biancardi, Angelo Cafaro, and Catherine Pelachaud (CNRS, France; UPMC, France) In this abstract we introduce the design of an experiment aimed at investigating how users' impressions of an embodied conversational agent are influenced by the agent's non-verbal behaviour. We focus on impressions of warmth and competence, the two fundamental dimensions of social perception. The agent's gestures, arm rest poses and smile frequency are manipulated, as well as users' expectations about the agent's competence. We hypothesize that users' judgments will differ according to their expectations, following the Expectancy Violation Theory proposed by Burgoon and colleagues. We also expect to replicate the results found in our previous study on human-human interaction, for example that a high frequency of smiles will elicit higher warmth and lower competence impressions compared to a low frequency of smiles, while crossed arms will elicit low competence and low warmth impressions.

Catherine Pelachaud (CNRS, France; UPMC, France) To create socially aware virtual agents, we conduct research along two main directions: 1) developing richer models of multimodal behaviors for the agent; 2) making the agent a more socially competent interlocutor.

Pragst, Louisa

Louisa Pragst, Juliana Miehle, Wolfgang Minker, and Stefan Ultes (University of Ulm, Germany; Cambridge University, UK) Access to health care related information can be vital and should be easily accessible. However, immigrants often have difficulties obtaining the relevant information due to language barriers and cultural differences. In the KRISTINA project, we address those difficulties by creating a socially competent multimodal dialogue system that can assist immigrants in getting information about health care related questions. Dialogue management, as the core component responsible for the system behaviour, has a significant impact on the successful reception of such a system. Hence, this work presents the specific challenges of the KRISTINA project for adaptive dialogue management, namely the handling of a large dialogue domain and the cultural adaptability required by the envisioned dialogue system, and our approach to handling them.

Richards, Deborah

Deborah Richards (Macquarie University, Australia) Despite being in the era of Big Data, where our devices seem to anticipate and feed our every desire, intelligent virtual agents appear to lack intimate and important knowledge of their user. Current cognitive agent architectures usually include situation awareness that allows agents to sense their environment, including their human partner, and provide congruent empathic behaviours. Depending on the framework, agents may exhibit their own personality, culture, memories, goals and reasoning styles. However, tailored adaptive behaviours, based on the multi-dimensional and deep understanding of the human that is essential for enduring beneficial relationships in certain contexts, are lacking. In this paper, examples are provided of what an agent may need to know about the human in the application domains of education, health and cybersecurity, together with the challenges around agent adaptation and the acquisition of relevant data and knowledge.

Rieser, Verena

Amanda Cercas Curry, Helen Hastie, and Verena Rieser (Heriot-Watt University, UK) In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions.

Riou, Matthieu

Matthieu Riou, Bassam Jabaian, Stéphane Huet, Thierry Chaminade, and Fabrice Lefèvre (University of Avignon, France; Aix-Marseille University, France) In this paper, we present a brief overview of our ongoing work on artificial interactive agents and their adaptation to users. Several possibilities to introduce humorous productions in a spoken dialog system are investigated in order to enhance naturalness during social interactions between the agent and the user. We finally describe our plan on how neuroscience will help to better evaluate the proposed systems, both objectively and subjectively.

Saam, Christian

Emer Gilmartin, Brendan Spillane, Maria O’Reilly, Ketong Su, Christian Saam, Benjamin R. Cowan, Nick Campbell, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) Conversation proceeds through dialogue moves or acts, and dialog act annotation can aid the design of artificial dialog. While many dialogs are task-based or instrumental, with clear goals, as in the case of a service encounter or business meeting, many are more interactional in nature, as in friendly chats or longer casual conversations. Early research on dialogue acts focussed on transactional or task-based dialogue, but work is now expanding to social aspects of interaction. We review how dialog annotation schemes treat non-task elements of dialog -- greeting and leave-taking sequences in particular. We describe the collection and annotation, using the ISO Standard 24617-2 semantic annotation framework (Part 2: Dialogue acts), of a corpus of 187 text dialogues and study the dialog acts used in greeting and leave-taking.

Brendan Spillane, Emer Gilmartin, Christian Saam, Ketong Su, Benjamin R. Cowan, Séamus Lawless, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) This paper introduces ADELE, a Personalized Intelligent Companion designed to engage with users through spoken dialog to help them explore topics of interest. The system will maintain a user model of information consumption habits and preferences in order to (1) personalize the user’s experience for ongoing interactions, and (2) build the user-machine relationship to model that of a friendly companion. The paper details the overall research goal, existing progress, the current focus, and the long-term plan for the project.

Spillane, Brendan

Emer Gilmartin, Brendan Spillane, Maria O’Reilly, Ketong Su, Christian Saam, Benjamin R. Cowan, Nick Campbell, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) Conversation proceeds through dialogue moves or acts, and dialog act annotation can aid the design of artificial dialog. While many dialogs are task-based or instrumental, with clear goals, as in the case of a service encounter or business meeting, many are more interactional in nature, as in friendly chats or longer casual conversations. Early research on dialogue acts focussed on transactional or task-based dialogue, but work is now expanding to social aspects of interaction. We review how dialog annotation schemes treat non-task elements of dialog -- greeting and leave-taking sequences in particular. We describe the collection and annotation, using the ISO Standard 24617-2 semantic annotation framework (Part 2: Dialogue acts), of a corpus of 187 text dialogues and study the dialog acts used in greeting and leave-taking.

Brendan Spillane, Emer Gilmartin, Christian Saam, Ketong Su, Benjamin R. Cowan, Séamus Lawless, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) This paper introduces ADELE, a Personalized Intelligent Companion designed to engage with users through spoken dialog to help them explore topics of interest. The system will maintain a user model of information consumption habits and preferences in order to (1) personalize the user’s experience for ongoing interactions, and (2) build the user-machine relationship to model that of a friendly companion. The paper details the overall research goal, existing progress, the current focus, and the long-term plan for the project.

Su, Ketong

Emer Gilmartin, Brendan Spillane, Maria O’Reilly, Ketong Su, Christian Saam, Benjamin R. Cowan, Nick Campbell, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) Conversation proceeds through dialogue moves or acts, and dialog act annotation can aid the design of artificial dialog. While many dialogs are task-based or instrumental, with clear goals, as in the case of a service encounter or business meeting, many are more interactional in nature, as in friendly chats or longer casual conversations. Early research on dialogue acts focussed on transactional or task-based dialogue, but work is now expanding to social aspects of interaction. We review how dialog annotation schemes treat non-task elements of dialog -- greeting and leave-taking sequences in particular. We describe the collection and annotation, using the ISO Standard 24617-2 semantic annotation framework (Part 2: Dialogue acts), of a corpus of 187 text dialogues and study the dialog acts used in greeting and leave-taking.

Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias, Benjamin R. Cowan, and Nick Campbell (Trinity College Dublin, Ireland; Grenoble INP, France; University College Dublin, Ireland) Social or interactive talk differs from task-based or instrumental interactions in many ways. Quantitative knowledge of these differences will aid the design of convincing human-machine interfaces for applications requiring machines to take on roles including social companions, healthcare providers, or tutors. We briefly review accounts of social talk from the literature. We outline a three-part data collection of human-human, human-WoZ and human-machine dialogs incorporating light social talk and a guessing game. We finally describe our ongoing experiments on the collected corpus.

Brendan Spillane, Emer Gilmartin, Christian Saam, Ketong Su, Benjamin R. Cowan, Séamus Lawless, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) This paper introduces ADELE, a Personalized Intelligent Companion designed to engage with users through spoken dialog to help them explore topics of interest. The system will maintain a user model of information consumption habits and preferences in order to (1) personalize the user’s experience for ongoing interactions, and (2) build the user-machine relationship to model that of a friendly companion. The paper details the overall research goal, existing progress, the current focus, and the long-term plan for the project.

Szekely, Eva

Catharine Oertel, Patrik Jonell, Kevin El Haddad, Eva Szekely, and Joakim Gustafson (KTH, Sweden; University of Mons, Belgium) In this paper we describe how audio-visual corpus recordings collected using crowd-sourcing techniques can be used for the audio-visual synthesis of attitudinal non-verbal feedback expressions for virtual agents. We discuss the limitations of this approach as well as where we see the opportunities for this technology.

Tian, Leimin

Leimin Tian, Johanna D. Moore, and Catherine Lai (University of Edinburgh, UK) Emotions play a vital role in human communication. Therefore, it is desirable for virtual agent dialogue systems to recognize and react to the user's emotions. However, current automatic emotion recognizers have limited performance compared to humans. Our work attempts to improve the performance of recognizing emotions in spoken dialogue by identifying dialogue cues predictive of emotions, and by building multimodal recognition models with a knowledge-inspired hierarchy. We conduct experiments on both spontaneous and acted dialogue data to study the efficacy of the proposed approaches. Our results show that including prior knowledge about emotions in dialogue in either the feature representation or the model structure is beneficial for automatic emotion recognition.

Trabelsi, Rim

Rim Trabelsi, Jagannadan Varadarajan, Yong Pei, Le Zhang, Issam Jabri, Ammar Bouallegue, and Pierre Moulin (Advanced Digital Sciences Center, Singapore; SAP, Singapore; Al Yamamah University, Saudi Arabia; National Engineering School of Tunis, Tunisia; University of Illinois at Urbana-Champaign, USA) This paper addresses the issue of analyzing social interactions between humans in videos. We focus on recognizing dyadic human interactions through multi-modal data, specifically depth, color and skeleton sequences. Firstly, we introduce a new person-centric proxemic descriptor, named PROF, extracted from skeleton data and able to incorporate intrinsic and extrinsic distances between two interacting persons in a view-variant scheme. Then, a novel key frame selection approach is introduced to identify salient instants of the interaction sequence based on the joint energy. From RGBD videos, more holistic CNN features are extracted by applying adaptive pre-trained CNNs to optical flow frames. Features from the three modalities are combined and then classified using a linear SVM. Finally, extensive experiments carried out on two multi-modal and multi-view interaction datasets prove the robustness of the introduced approach compared to state-of-the-art methods.

Ultes, Stefan

Louisa Pragst, Juliana Miehle, Wolfgang Minker, and Stefan Ultes (University of Ulm, Germany; Cambridge University, UK) Access to health care related information can be vital and should be easily accessible. However, immigrants often have difficulties obtaining the relevant information due to language barriers and cultural differences. In the KRISTINA project, we address those difficulties by creating a socially competent multimodal dialogue system that can assist immigrants in getting information about health care related questions. Dialogue management, as the core component responsible for the system behaviour, has a significant impact on the successful reception of such a system. Hence, this work presents the specific challenges of the KRISTINA project for adaptive dialogue management, namely the handling of a large dialogue domain and the cultural adaptability required by the envisioned dialogue system, and our approach to handling them.

Varadarajan, Jagannadan

Rim Trabelsi, Jagannadan Varadarajan, Yong Pei, Le Zhang, Issam Jabri, Ammar Bouallegue, and Pierre Moulin (Advanced Digital Sciences Center, Singapore; SAP, Singapore; Al Yamamah University, Saudi Arabia; National Engineering School of Tunis, Tunisia; University of Illinois at Urbana-Champaign, USA) This paper addresses the issue of analyzing social interactions between humans in videos. We focus on recognizing dyadic human interactions through multi-modal data, specifically depth, color and skeleton sequences. Firstly, we introduce a new person-centric proxemic descriptor, named PROF, extracted from skeleton data and able to incorporate intrinsic and extrinsic distances between two interacting persons in a view-variant scheme. Then, a novel key frame selection approach is introduced to identify salient instants of the interaction sequence based on the joint energy. From RGBD videos, more holistic CNN features are extracted by applying adaptive pre-trained CNNs to optical flow frames. Features from the three modalities are combined and then classified using a linear SVM. Finally, extensive experiments carried out on two multi-modal and multi-view interaction datasets prove the robustness of the introduced approach compared to state-of-the-art methods.

Vinciarelli, Alessandro

Alessandro Vinciarelli (University of Glasgow, UK) Cognitive and psychological processes underlying social interaction are built around face-to-face interactions, the only possible and available communication setting during the long evolutionary process that resulted in Homo Sapiens. As the fraction of interactions that take place in technology-mediated settings keeps increasing, it is important to investigate how the cognitive and psychological processes mentioned above - ultimately grounded in neural structures - act in and react to the new interaction settings. In particular, it is important to investigate whether nonverbal communication - one of the main channels through which people convey socially and psychologically relevant information - still plays a role in settings where natural nonverbal cues (facial expressions, vocalisations, gestures, etc.) are no longer available. Addressing such an issue has important implications not only for the understanding of cognition and psychology, but also for the design of interaction technology and for the analysis of phenomena like cyberbullying and the viral diffusion of content, which play an important role in today's society.

Wade, Vincent

Emer Gilmartin, Brendan Spillane, Maria O’Reilly, Ketong Su, Christian Saam, Benjamin R. Cowan, Nick Campbell, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) Conversation proceeds through dialogue moves or acts, and dialog act annotation can aid the design of artificial dialog. While many dialogs are task-based or instrumental, with clear goals, as in the case of a service encounter or business meeting, many are more interactional in nature, as in friendly chats or longer casual conversations. Early research on dialogue acts focussed on transactional or task-based dialogue, but work is now expanding to social aspects of interaction. We review how dialog annotation schemes treat non-task elements of dialog -- greeting and leave-taking sequences in particular. We describe the collection and annotation, using the ISO Standard 24617-2 semantic annotation framework (Part 2: Dialogue acts), of a corpus of 187 text dialogues and study the dialog acts used in greeting and leave-taking.

Brendan Spillane, Emer Gilmartin, Christian Saam, Ketong Su, Benjamin R. Cowan, Séamus Lawless, and Vincent Wade (Trinity College Dublin, Ireland; University College Dublin, Ireland) This paper introduces ADELE, a Personalized Intelligent Companion designed to engage with users through spoken dialog to help them explore topics of interest. The system will maintain a user model of information consumption habits and preferences in order to (1) personalize the user’s experience for ongoing interactions, and (2) build the user-machine relationship to model that of a friendly companion. The paper details the overall research goal, existing progress, the current focus, and the long-term plan for the project.

Zhang, Le

Rim Trabelsi, Jagannadan Varadarajan, Yong Pei, Le Zhang, Issam Jabri, Ammar Bouallegue, and Pierre Moulin (Advanced Digital Sciences Center, Singapore; SAP, Singapore; Al Yamamah University, Saudi Arabia; National Engineering School of Tunis, Tunisia; University of Illinois at Urbana-Champaign, USA) This paper addresses the issue of analyzing social interactions between humans in videos. We focus on recognizing dyadic human interactions through multi-modal data, specifically depth, color and skeleton sequences. Firstly, we introduce a new person-centric proxemic descriptor, named PROF, extracted from skeleton data and able to incorporate intrinsic and extrinsic distances between two interacting persons in a view-variant scheme. Then, a novel key frame selection approach is introduced to identify salient instants of the interaction sequence based on the joint energy. From RGBD videos, more holistic CNN features are extracted by applying adaptive pre-trained CNNs to optical flow frames. Features from the three modalities are combined and then classified using a linear SVM. Finally, extensive experiments carried out on two multi-modal and multi-view interaction datasets prove the robustness of the introduced approach compared to state-of-the-art methods.

71 authors