Multi-Person Multimodal Playful Interaction Analysis using Automated Approaches
| Field | Value |
| --- | --- |
| Author | Metehan Doyran |
| Promotor(s) | Prof. dr. A.A. Salah / Dr. R.W. Poppe |
| University | Utrecht University |
| Year of publication | 2025 |
| Link to repository | Link to thesis |
Abstract
- “Play and playful interactions carry rich non-verbal social and emotional signals, and they are essential tools for researchers and psychologists who evaluate children’s cognitive and emotional development. Play creates an escape from ordinary life while stimulating various affective states, providing a rich setting for studying non-verbal communication, emotional expression, and social behavior. At the same time, the need for objective, continuous, and scalable assessment of complex social and affective dynamics has driven the advancement of automated behavior analysis. This thesis explores how automated multimodal analysis methods can enhance the understanding and evaluation of social dynamics and emotional responses in multi-person playful interactions across different scenarios. We focus on three critical multi-person playful interaction scenarios: (i) multiplayer board games, (ii) play therapy for children, and (iii) parent-infant free play. These scenarios encompass a wide variety of playful interactions, each with its own unique challenges and social dynamics.

  The thesis is structured around four pivotal research questions that together help answer our main research question. First, we explore how multimodal assessment can evaluate the perceived affect of people during multi-party playful interactions. We use deep neural networks with automatically extracted and handcrafted facial and bodily features of each of the four players in cooperative board games. The proposed system for detecting expressive moments achieves notable performance, whereas predicting emotions proves more difficult, with lower test scores. Second, we propose an automated multimodal system that combines computer vision and natural language processing methods to predict children’s emotional states during play therapy interactions with therapists.
  The results show the potential of these systems to aid the assessment process of traditional play therapy interventions by providing continuous and objective output on children’s affective states. Third, we demonstrate the feasibility of using convolutional neural networks to detect physical contact between parents and infants during free-play interactions. Lastly, we develop and compare representations for fine-grained contact analysis in parent-infant playful interactions. Our results highlight the potential of such automated analysis while underscoring the challenges in capturing detailed contact dynamics.

  Overall, this thesis explores integrating automatic multimodal approaches into three different playful interaction scenarios. The discussion of these challenges and of the limitations of current approaches should benefit future researchers. All code, models, datasets, and annotations are made publicly available to accelerate research on automated systems for analyzing multi-person playful interactions. The proposed systems offer potential improvements in various scenarios, from expressive moment detection to fine-grained analysis of physical contact, by providing objective, continuous, and scalable assessments. The findings underscore the importance of combining multiple data sources, and the need for further refinement in capturing subjective emotional nuances and intricate physical dynamics between parents and infants, to improve the accuracy and reliability of these technologies in complex, real-world scenarios.”
