18th ACM International Conference on Multimodal Interaction (ICMI 2016),
November 12–16, 2016,
Tokyo, Japan
Doctoral Consortium
Sat, Nov 12, 09:00 - 17:30, Time24: Room 182 (Chair: Dirk Heylen (University of Twente); Samer Al Moubayed (KTH))
The Influence of Appearance and Interaction Strategy of a Social Robot on the Feeling of Uncanniness in Humans
Maike Paetzel
(Uppsala University, Sweden)
Most research on the uncanny valley effect is concerned with the influence of unimodal visual cues, along the dimensions of human-likeness and realism, as triggers of an uncanny feeling in humans. This leaves a gap in the investigation of how multimodality affects the feeling of uncanniness. In our research project, we use the back-projected robot head Furhat to study the influence of multimodal cues in facial texture, expressions, voice, and behaviour, in order to broaden the understanding of the underlying cause of the uncanny valley effect. To date, we have mainly investigated the general perception of uncanniness in a back-projected head, with a special focus on multimodal gender cues. In the coming years, the focus will shift towards the interaction strategies of social robots and their interplay with the robot's appearance.
@InProceedings{ICMI16p522,
author = {Maike Paetzel},
title = {The Influence of Appearance and Interaction Strategy of a Social Robot on the Feeling of Uncanniness in Humans},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {522--526},
doi = {},
year = {2016},
}
Viewing Support System for Multi-view Videos
Xueting Wang
(Nagoya University, Japan)
Multi-view videos taken by multiple cameras from different angles are expected to be useful in a wide range of applications, such as web lecture broadcasting, concerts, and sports viewing. These videos can enhance the viewing experience according to users' personal preferences through virtual camera switching and controllable viewing interfaces. However, the increasing number of cameras makes selecting a suitable viewpoint burdensome even for experts. My doctoral research goal is therefore to construct a system that provides convenient, high-quality support for personal multi-view video viewing. We intend to include three parts: automatic viewpoint sequence recommendation, multimodal user feedback analysis, and online recommendation updating. Our prior work focused on automatic viewpoint sequence recommendation considering contextual information and user preference. We proposed a context-dependent recommendation model and improved it by incorporating spatio-temporal contextual information. Further work will concentrate on analyzing multimodal user feedback while users view recommendations, to detect moments of dissatisfaction and to model user preferences for viewpoint switching. The switching records and multimodal feedback can then be used for online recommendation updating to improve personal viewing support.
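To illustrate the kind of recommendation involved, here is a minimal sketch, not the author's model: each camera gets an assumed per-segment contextual suitability score, and a viewpoint sequence is chosen by dynamic programming that trades suitability against a penalty for switching views too often. The scoring and the penalty value are placeholders.

```python
# A minimal sketch (not the author's model): choosing a viewpoint sequence by
# dynamic programming over per-segment view scores with a switching penalty.
import numpy as np

def recommend_viewpoints(scores: np.ndarray, switch_penalty: float = 0.5):
    """scores[t, v]: assumed contextual suitability of camera v at segment t.
    Returns the viewpoint sequence maximizing total score minus switch costs."""
    T, V = scores.shape
    best = scores[0].copy()             # best cumulative score ending in view v
    back = np.zeros((T, V), dtype=int)  # backpointers for sequence recovery
    for t in range(1, T):
        # cost of arriving at view v from each previous view u
        trans = best[:, None] - switch_penalty * (1 - np.eye(V))
        back[t] = trans.argmax(axis=0)
        best = trans.max(axis=0) + scores[t]
    seq = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        seq.append(int(back[t, seq[-1]]))
    return seq[::-1]

# Toy usage: 5 segments, 3 cameras with made-up suitability scores.
print(recommend_viewpoints(np.random.rand(5, 3)))
```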
@InProceedings{ICMI16p527,
author = {Xueting Wang},
title = {Viewing Support System for Multi-view Videos},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {527--531},
doi = {},
year = {2016},
}
Engaging Children with Autism in a Shape Perception Task using a Haptic Force Feedback Interface
Alix Pérusseau-Lambert
(CEA LIST, France)
Atypical sensorimotor reactions and social interaction difficulties are commonly reported in Autism Spectrum Disorders (ASD). In our work, we consider the possible relationship between using our sense of touch to discover our environment and the development of interaction competencies, in the particular case of ASD. For this purpose, we will use a shape perception task and a haptic feedback device. The design of the haptic device will be based on the state of the art and on advice from experts in ASD. We plan to work on the needs and skills of children on the spectrum in the lower range of intelligence scores. This article details the development of our PhD project, which aims at designing efficient and reliable haptic force feedback interfaces and interactions for ASD users. Preliminary results on the development of our interface and the design of the interaction tasks are presented, as well as our specific contributions.
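For context, force feedback for virtual shape perception is commonly rendered with a penalty-based spring force. The sketch below shows that standard technique only, not the project's device code; the stiffness constant and sphere geometry are assumptions.

```python
# A minimal sketch of penalty-based haptic rendering for shape perception:
# a standard spring-force model (not the project's device code), where the
# probe feels a restoring force when it penetrates a virtual sphere.
import numpy as np

def sphere_feedback_force(probe_pos, center, radius, stiffness=800.0):
    """Return the force pushing the haptic probe back to the surface.
    stiffness is an assumed spring constant in N/m."""
    offset = np.asarray(probe_pos) - np.asarray(center)
    dist = np.linalg.norm(offset)
    penetration = radius - dist
    if penetration <= 0 or dist == 0:    # outside the shape: no force
        return np.zeros(3)
    normal = offset / dist               # surface normal at the contact point
    return stiffness * penetration * normal

# Toy usage: probe 5 mm inside a 5 cm sphere centered at the origin.
print(sphere_feedback_force([0.045, 0.0, 0.0], [0, 0, 0], 0.05))
```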
@InProceedings{ICMI16p532,
author = {Alix Pérusseau-Lambert},
title = {Engaging Children with Autism in a Shape Perception Task using a Haptic Force Feedback Interface},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {532--535},
doi = {},
year = {2016},
}
Modeling User's Decision Process through Gaze Behavior
Kei Shimonishi
(Kyoto University, Japan)
When we choose among alternative items, we sometimes face a mismatch between what we actually want and what we select. If an interactive system can probe our interests through several modalities (e.g., eye movements and speech) and reduce these mismatches, it can support satisfying decision making. To build such interactive decision support systems, a system needs to estimate both the user's interests (the selection criteria for that decision) and the user's knowledge about the content domain. Not only the user's knowledge but also the selection criteria can change; for example, selection criteria may converge in reaction to the system's recommendations. The system therefore needs to understand the dynamics of the user's selection criteria in order to choose appropriate actions. What makes this more difficult is that the dynamics of the user's internal states can themselves change depending on the phase of decision making. We therefore need to address (a) how to represent users' internal states, (b) how to estimate and trace temporal changes in those states, and (c) how to trace the user's decision phase so that the system can decide on actions for decision assistance. To tackle these problems, we propose a novel representation of the user's internal state, consisting of selection criteria and the structure of the user's knowledge about the content domain, together with a method for estimating these selection criteria from gaze behavior. In addition, we consider the multiscale dynamics of users' internal states so that the system can trace the decision phase as temporal changes in those dynamics.
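As a toy illustration of one of these ideas, and under assumptions that are ours rather than the author's, selection-criteria weights can be estimated from gaze dwell time on attribute regions, with exponential forgetting as a crude stand-in for tracking temporal change in the internal state.

```python
# An illustrative sketch (not the author's model): estimating a user's
# selection-criteria weights from gaze dwell time on attribute regions,
# with exponential forgetting to track temporal changes.
from collections import defaultdict

class GazeCriteriaEstimator:
    def __init__(self, decay: float = 0.9):
        self.decay = decay                  # assumed forgetting factor
        self.dwell = defaultdict(float)     # accumulated dwell per attribute

    def observe(self, attribute: str, dwell_ms: float):
        # Decay old evidence, then add the new fixation's dwell time.
        for k in self.dwell:
            self.dwell[k] *= self.decay
        self.dwell[attribute] += dwell_ms

    def criteria_weights(self):
        total = sum(self.dwell.values())
        if total == 0:
            return {}
        return {k: v / total for k, v in self.dwell.items()}

# Toy usage: fixations on "price" and "design" regions of candidate items.
est = GazeCriteriaEstimator()
for attr, ms in [("price", 400), ("design", 900), ("price", 300)]:
    est.observe(attr, ms)
print(est.criteria_weights())
```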
@InProceedings{ICMI16p536,
author = {Kei Shimonishi},
title = {Modeling User's Decision Process through Gaze Behavior},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {536--540},
doi = {},
year = {2016},
}
Multimodal Positive Computing System for Public Speaking with Real-Time Feedback
Fiona Dermody
(Dublin City University, Ireland)
A multimodal system with real-time feedback for public speaking has been developed. The system has been developed within the paradigm of positive computing, which focuses on designing for user wellbeing. To date, we have focused on the following determinants of wellbeing: autonomy, self-awareness, and stress reduction. Two system prototypes have been developed which differ in the way they display feedback to users: one displays peripheral feedback and the other displays line-of-vision feedback. Initial user evaluation of the system prototypes has yielded positive results. All users reported that they preferred the prototype with line-of-vision feedback. Users reported having autonomy in choosing what visual feedback to focus on when using the system. They also reported that they gained self-awareness as speakers from using the system.
@InProceedings{ICMI16p541,
author = {Fiona Dermody},
title = {Multimodal Positive Computing System for Public Speaking with Real-Time Feedback},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {541--545},
doi = {},
year = {2016},
}
Prediction/Assessment of Communication Skill using Multimodal Cues in Social Interactions
Sowmya Rasipuram
(IIIT Bangalore, India)
Understanding people’s behavior in social interactions is a very interesting problem in Social Computing. In this work, we automatically predict the communication skill of a person in various kinds of social interactions. In particular, we consider (1) interview-based interactions: asynchronous (web-based) interviews vs. synchronous (regular face-to-face) interviews, and (2) non-interview interactions: dyad and triad conversations (group discussions). We automatically extract multimodal cues related to the verbal and non-verbal behavioral content of the interaction. First, in interview-based interactions, we consider the previously uninvestigated scenario of comparing participants’ behavioral and perceptual changes across the two contexts. Second, we address different manifestations of communication skill in different settings (face-to-face interaction vs. group). Third, the non-interview interactions also allow us to answer research questions such as the relation between being a good communicator and other group variables like dominance or leadership. Finally, we look at several attributes (manually annotated) and features/feature groups (automatically extracted) that predict communication skill well in all settings.
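A minimal sketch of the general approach, with details assumed rather than taken from the paper: automatically extracted verbal and non-verbal features are fused at the feature level and regressed against annotated skill scores. Feature dimensions and labels below are placeholders.

```python
# An assumed fusion-and-regression sketch, not the authors' exact pipeline:
# concatenate verbal and non-verbal features, then predict annotated
# communication-skill scores with a regularized linear model.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 60
nonverbal = rng.normal(size=(n, 8))   # e.g. prosody, gaze, gesture statistics
verbal = rng.normal(size=(n, 5))      # e.g. word counts, fluency measures
X = np.hstack([nonverbal, verbal])    # early (feature-level) fusion
y = rng.normal(size=n)                # annotated skill scores (placeholder)

model = Ridge(alpha=1.0)
print(cross_val_score(model, X, y, cv=5).mean())  # generalization estimate
```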
@InProceedings{ICMI16p546,
author = {Sowmya Rasipuram},
title = {Prediction/Assessment of Communication Skill using Multimodal Cues in Social Interactions},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {546--549},
doi = {},
year = {2016},
}
Player/Avatar Body Relations in Multimodal Augmented Reality Games
Nina Rosa
(Utrecht University, Netherlands)
Augmented reality research is finally moving towards multimodal experiences: more and more applications include not only visuals but also audio and even haptics. The purpose of multimodality in these applications can be to increase realism or to increase the amount or quality of communicated information. One particularly interesting and increasingly important application area is AR gaming, where the player can experience the virtual game integrated into the real environment and interact with it in a multimodal fashion. Currently, many games are set up such that the interaction is local (direct); however, there are many cases in which remote (indirect) interaction will be useful or even necessary. In the latter case, actions can be expressed through a virtual avatar while the player's real body remains perceivably present. The player then controls the motions and actions of the avatar and receives multimodal feedback associated with the events occurring in the game. Can it be that players start to perceive the avatar as (a part of) themselves? Or does something even more intense take place? What are the benefits of this experience? The core of this research is to understand how multimodal perceptual configuration plays a role in the relation between players and their in-game avatar.
@InProceedings{ICMI16p550,
author = {Nina Rosa},
title = {Player/Avatar Body Relations in Multimodal Augmented Reality Games},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {550--553},
doi = {},
year = {2016},
}
Computational Model for Interpersonal Attitude Expression
Soumia Dermouche
(CNRS, France; Telecom ParisTech, France)
This paper presents a plan towards a computational model of interpersonal attitudes and its integration in an embodied conversational agent (ECA). The goal is to endow an ECA with the capacity to express different interpersonal attitudes depending on the interaction context. Interpersonal attitudes can be represented by sequences of non-verbal behaviors. In our work, we rely on temporal sequence mining algorithms to extract, from a multimodal corpus, a set of temporal patterns representing interpersonal attitudes. Specifically, we propose a new temporal sequence mining algorithm called HCApriori and we evaluate it against four state-of-the-art algorithms. Results show a significant improvement of HCApriori over the other algorithms in terms of both pattern extraction accuracy and running time. The next step is to implement the temporal patterns extracted with HCApriori on an ECA.
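HCApriori itself is the paper's contribution and is not reproduced here; the sketch below is only a plain Apriori-style baseline for mining frequent subsequences of non-verbal behaviors, to illustrate the kind of pattern extraction involved. The behavior labels are hypothetical.

```python
# A plain Apriori-style baseline (not HCApriori) for frequent subsequence
# mining over non-verbal behavior sequences.

def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(event in it for event in pattern)  # 'in' consumes the iterator

def frequent_patterns(sequences, min_support=2, max_len=3):
    events = {e for seq in sequences for e in seq}
    frequent, current = [], [(e,) for e in events]
    for _ in range(max_len):
        survivors = [p for p in current
                     if sum(is_subsequence(p, s) for s in sequences) >= min_support]
        frequent += survivors
        # Extend each surviving pattern by one event (candidate generation).
        current = [p + (e,) for p in survivors for e in events]
        if not current:
            break
    return frequent

# Toy corpus: behavior sequences per interaction (hypothetical labels).
corpus = [("smile", "nod", "gaze_away"),
          ("smile", "gaze_away", "lean_forward"),
          ("nod", "smile", "gaze_away")]
print(frequent_patterns(corpus))  # e.g. ('smile', 'gaze_away') is frequent
```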
@InProceedings{ICMI16p554,
author = {Soumia Dermouche},
title = {Computational Model for Interpersonal Attitude Expression},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {554--558},
doi = {},
year = {2016},
}
Assessing Symptoms of Excessive SNS Usage Based on User Behavior and Emotion
Ploypailin Intapong, Tipporn Laohakangvalvit, Tiranee Achalakul, and Michiko Ohkura
(Shibaura Institute of Technology, Japan; King Mongkut’s University of Technology Thonburi, Thailand)
The worldwide use of social networking sites (SNSs) continues to dramatically increase. People are spending unexpected and unprecedented amounts of time online. However, many studies have issued warnings about the negative consequences of excessive SNS usage, including the risk of addictive behavior. This research is conducted to detect the symptoms of excessive SNS use by studying user behaviors and emotions in SNSs. We employed questionnaires, SNS APIs, and biological signals as methods. The data obtained from the study will characterize SNS usage to detect excessive use. Finally, the analytic results will be applied for developing prevention strategies to increase the awareness of the risks of excessive SNS usage.
@InProceedings{ICMI16p559,
author = {Ploypailin Intapong and Tipporn Laohakangvalvit and Tiranee Achalakul and Michiko Ohkura},
title = {Assessing Symptoms of Excessive SNS Usage Based on User Behavior and Emotion},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {559--562},
doi = {},
year = {2016},
}
Kawaii Feeling Estimation by Product Attributes and Biological Signals
Tipporn Laohakangvalvit, Tiranee Achalakul, and Michiko Ohkura
(Shibaura Institute of Technology, Japan; King Mongkut’s University of Technology Thonburi, Thailand)
Kansei values are critical factors in manufacturing in Japan. One such kansei value, kawaii, a positive adjective carrying connotations such as cute, lovable, and charming, is becoming more important. Our research systematically studies kawaii feelings using eye tracking and biological signals. We will use our results to construct a mathematical model of kawaii feelings that can be applied to the future design and development of kawaii products.
@InProceedings{ICMI16p563,
author = {Tipporn Laohakangvalvit and Tiranee Achalakul and Michiko Ohkura},
title = {Kawaii Feeling Estimation by Product Attributes and Biological Signals},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {563--566},
doi = {},
year = {2016},
}
Multimodal Sensing of Affect Intensity
Shalini Bhatia
(University of Canberra, Australia)
Most research on affect intensity has relied on the Affect Intensity Measure (AIM), a self-report instrument that asks respondents to rate how often they react to situations with strong emotions. The AIM gives an indication of how strongly or weakly individuals tend to experience emotions in their everyday life. In this PhD project, I plan to quantify affect intensity on a continuous scale using the video and audio modalities of real-world, clinically validated depression datasets. Most of the work in this area treats the problem as binary classification, mainly due to the lack of dimensional data. As a subject's depression severity increases, as seen in the case of melancholia, facial movements become very subtle. In order to quantify depression in general, and subtypes such as melancholia in particular, we need to reveal these subtle changes. To do this, I propose to use video magnification approaches. Inspired by the success of deep learning in video classification, I plan to use deep architectures such as Convolutional Neural Networks and Long Short-Term Memory networks for information fusion over multiple modalities. Using the common approach to video classification, i.e., local feature extraction, a fixed-size video-level description, and a classifier trained on the resulting bag-of-words representation, I present preliminary results on the classification of melancholic and non-melancholic depressed subjects and healthy controls, which will serve as a baseline for future development in depression classification and analysis. I have also compared the sensitivity and specificity of classification across depression subtypes.
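A minimal sketch of the bag-of-words video pipeline named in the abstract, with all details (descriptor dimensions, vocabulary size, labels) assumed: local descriptors are quantized with k-means, each video becomes a fixed-size codeword histogram, and a classifier is trained on those histograms.

```python
# A sketch of the bag-of-words baseline described above (assumed details):
# local features -> k-means vocabulary -> per-video histogram -> classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder local descriptors: per video, a variable number of 16-D features.
videos = [rng.normal(size=(rng.integers(20, 40), 16)) for _ in range(12)]
labels = rng.integers(0, 2, size=12)     # e.g. melancholic vs. control (toy)

k = 8
vocab = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(videos))

def bow_histogram(descriptors):
    words = vocab.predict(descriptors)               # assign codewords
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()                         # fixed-size description

X = np.array([bow_histogram(v) for v in videos])
clf = SVC(kernel="linear").fit(X, labels)            # video-level classifier
print(clf.predict(X[:3]))
```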
@InProceedings{ICMI16p567,
author = {Shalini Bhatia},
title = {Multimodal Sensing of Affect Intensity},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {567--571},
doi = {},
year = {2016},
}
Enriching Student Learning Experience using Augmented Reality and Smart Learning Objects
Anmol Srivastava
(IIT Guwahati, India)
Physical laboratories in the Electronic Engineering curriculum play a crucial role in enabling students to gain "hands-on" learning experience and get a feel for problem-solving. However, students often feel frustrated in these laboratories due to procedural difficulties and the disconnect between theory and practice. This impedes their learning and causes them to lose interest in the practical experiment. This research takes a ubiquitous computing approach to the issue by embedding computational capabilities into commonly used physical objects in the electronics lab (e.g., the breadboard) and by using a mobile Augmented Reality application to assist students. Two working prototypes have been proposed as a proof of concept: (i) an AR-based lab manual and circuit-building application, and (ii) an Intelligent Breadboard capable of sensing errors made by students. It is posited that such systems can help reduce cognitive load and bridge the gap between theory and practical application that students face in laboratories.
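One hypothetical way such an Intelligent Breadboard might flag wiring errors, sketched here entirely on our own assumptions: compare the connectivity the board senses between pins against the expected netlist for the current exercise. All names and the sensing interface are made up for illustration.

```python
# A hypothetical error check (assumed, not the prototype's implementation):
# compare sensed pin-to-pin connections against the exercise's expected nets.
def find_wiring_errors(expected_nets, sensed_pairs):
    """expected_nets: list of sets of pins that should be connected.
    sensed_pairs: set of (pin, pin) connections detected by the board."""
    def connected(a, b):
        return (a, b) in sensed_pairs or (b, a) in sensed_pairs
    missing = [(a, b) for net in expected_nets
               for a in net for b in net
               if a < b and not connected(a, b)]
    extra = [p for p in sensed_pairs
             if not any(p[0] in net and p[1] in net for net in expected_nets)]
    return missing, extra

# Toy exercise: pins 'R1.2' and 'LED.+' should share a net.
nets = [{"R1.2", "LED.+"}, {"R1.1", "VCC"}]
sensed = {("R1.1", "VCC"), ("LED.+", "GND")}
print(find_wiring_errors(nets, sensed))  # reports the missing and stray wires
```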
@InProceedings{ICMI16p572,
author = {Anmol Srivastava},
title = {Enriching Student Learning Experience using Augmented Reality and Smart Learning Objects},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {572--576},
doi = {},
year = {2016},
}
Automated Recognition of Facial Expressions Authenticity
Krystian Radlak and Bogdan Smolka
(Silesian University of Technology, Poland)
Recognizing the authenticity of facial expressions is quite difficult for humans. It is therefore an interesting topic for the computer vision community, as algorithms for estimating the authenticity of facial expressions may be used as indicators of deception. This paper discusses the state-of-the-art methods developed for smile veracity estimation and proposes a plan for developing and validating a novel approach to automated discrimination between genuine and posed facial expressions. The proposed fully automated technique is based on an extension of high-dimensional Local Binary Patterns (LBP) to the spatio-temporal domain, combined with the dynamics of facial landmark movements. The technique will be validated on several existing smile databases and on a novel database created with a high-speed camera. Finally, the developed framework will be applied to the detection of deception in real-life scenarios.
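As a simplified stand-in for the descriptor described above (not the proposed high-dimensional extension), the sketch below pools per-frame LBP histograms over time and concatenates them with basic landmark-motion statistics; window sizes, LBP parameters, and the pooling scheme are assumptions.

```python
# A simplified sketch: per-frame LBP histograms pooled over time, joined with
# landmark-motion statistics (not the paper's proposed descriptor).
import numpy as np
from skimage.feature import local_binary_pattern

def video_descriptor(frames, landmarks, n_points=8, radius=1):
    """frames: list of 2-D grayscale arrays; landmarks: array (T, L, 2)."""
    n_bins = n_points + 2                       # 'uniform' LBP bin count
    hists = []
    for f in frames:
        lbp = local_binary_pattern(f, n_points, radius, method="uniform")
        h, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
        hists.append(h)
    texture = np.mean(hists, axis=0)            # temporal pooling of texture
    velocity = np.diff(landmarks, axis=0)       # frame-to-frame landmark motion
    dynamics = [velocity.mean(), velocity.std()]
    return np.concatenate([texture, dynamics])

# Toy usage with random frames and 68-point landmark tracks.
frames = [np.random.rand(64, 64) for _ in range(10)]
landmarks = np.random.rand(10, 68, 2)
print(video_descriptor(frames, landmarks).shape)
```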
@InProceedings{ICMI16p577,
author = {Krystian Radlak and Bogdan Smolka},
title = {Automated Recognition of Facial Expressions Authenticity},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {577--581},
doi = {},
year = {2016},
}
Improving the Generalizability of Emotion Recognition Systems: Towards Emotion Recognition in the Wild
Biqiao Zhang
(University of Michigan, USA)
Emotion recognition in the wild requires the ability to adapt to complex and changeable application scenarios, which necessitates the generalizability of automatic emotion recognition systems. My PhD thesis focuses on methods to address factors that negatively impact this generalizability, such as ambiguity in emotion labels, the effects of expression style (e.g., speech and music), variation in recording environments, and individual differences. In particular, I propose to tease apart the influence of these factors from emotion using multi-task learning for both feature learning and emotion inference. Results from my completed work demonstrate that classifiers that take into consideration the influence of corpus (a proxy for environmental differences), expression style, and speaker gender generalize better across corpora than those that do not.
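A minimal multi-task sketch under assumed architecture choices (layer sizes, tasks, loss weights are ours, not the thesis model): a shared encoder feeds an emotion head plus auxiliary heads for nuisance factors such as corpus and gender, so their influence is modeled jointly rather than ignored.

```python
# An assumed multi-task architecture sketch: shared encoder, one primary
# emotion head, and auxiliary heads for nuisance factors.
import torch
import torch.nn as nn

class MultiTaskEmotionNet(nn.Module):
    def __init__(self, n_features=40, n_emotions=4, n_corpora=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.emotion_head = nn.Linear(64, n_emotions)   # primary task
        self.corpus_head = nn.Linear(64, n_corpora)     # auxiliary task
        self.gender_head = nn.Linear(64, 2)             # auxiliary task

    def forward(self, x):
        z = self.encoder(x)
        return self.emotion_head(z), self.corpus_head(z), self.gender_head(z)

model = MultiTaskEmotionNet()
x = torch.randn(8, 40)                                  # toy acoustic features
emo, corpus, gender = model(x)
# Joint objective: primary loss plus down-weighted auxiliary losses.
loss = (nn.functional.cross_entropy(emo, torch.randint(0, 4, (8,)))
        + 0.3 * nn.functional.cross_entropy(corpus, torch.randint(0, 3, (8,)))
        + 0.3 * nn.functional.cross_entropy(gender, torch.randint(0, 2, (8,))))
loss.backward()
print(float(loss))
```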
@InProceedings{ICMI16p582,
author = {Biqiao Zhang},
title = {Improving the Generalizability of Emotion Recognition Systems: Towards Emotion Recognition in the Wild},
booktitle = {Proc.\ ICMI},
publisher = {ACM},
pages = {582--586},
doi = {},
year = {2016},
}