Multi-Person Multimodal Playful Interaction Analysis using Automated Approaches

Author : Metehan Doyran
Promotor(s) : Prof. dr. A.A. Salah / Dr. R.W. Poppe
University : Utrecht University
Year of publication : 2025
Link to repository : Link to thesis

Abstract

Play and playful interactions carry rich non-verbal social and emotional signals, and they are essential tools for researchers and psychologists to evaluate children’s cognitive and emotional development. Play creates an escape from ordinary life while stimulating various affective states, and it provides a rich setting for studying non-verbal communication, emotional expression, and social behavior. At the same time, the need for objective, continuous, and scalable assessment of complex social and affective dynamics has driven the advancement of automated behavior analysis. This thesis explores how automated multimodal analysis methods can enhance the understanding and evaluation of social dynamics and emotional responses in multi-person playful interactions across different scenarios. We focus on three critical multi-person playful interaction scenarios: (i) multiplayer board games, (ii) play therapy for children, and (iii) parent-infant free play. These scenarios encompass a wide variety of playful interactions, each with its own challenges and social dynamics.

The thesis is structured around four pivotal research questions that together address the main research question. First, we explore how multimodal assessment can evaluate the perceived affect of people during multi-party playful interactions. We utilize deep neural networks with automatically extracted and handcrafted facial and bodily features of each of the four players in cooperative board games. The proposed system for detecting expressive moments achieves notable performance, whereas predicting emotions proves more difficult, with lower test scores. Second, we propose an automated multimodal system that predicts children’s emotional states during play therapy interactions with therapists by combining computer vision and natural language processing methods. The results show the potential of such systems to aid the assessment process of traditional play therapy interventions by providing continuous and objective output for the children’s affective states. Third, we demonstrate the feasibility of using convolutional neural networks to detect physical contact between parents and infants during free-play interactions. Lastly, we develop and compare representations for fine-grained contact analysis in parent-infant playful interactions. Our results highlight the potential of such automated analysis while underscoring the challenges in capturing detailed contact dynamics.

Overall, this thesis explores the integration of automatic multimodal approaches into three different playful interaction scenarios. The discussion of these challenges and of the limitations of current approaches should benefit future researchers. All code, models, datasets, and annotations are made publicly available to accelerate research on automated analysis of multi-person playful interactions. The proposed systems offer potential improvements in various scenarios, from expressive moment detection to the fine-grained analysis of physical contact, by providing objective, continuous, and scalable assessments. The findings underscore the importance of combining multiple data sources and the need for further refinement in capturing subjective emotional nuances and intricate physical dynamics between parents and infants, in order to improve the accuracy and reliability of these technologies in complex, real-world scenarios.
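To give a concrete sense of the kind of multimodal pipeline the abstract describes for expressive moment detection, the sketch below shows a minimal late-fusion classifier that combines per-player facial and bodily feature vectors and pools them over a group of four players. This is an illustrative sketch only, not the thesis code: the feature dimensions, pooling choice, and model structure are assumptions made for the example.

# Illustrative sketch (not the thesis implementation): late fusion of facial and
# bodily features for detecting expressive moments in four-player board games.
# Feature dimensions and architecture are hypothetical placeholders.
import torch
import torch.nn as nn

class ExpressiveMomentClassifier(nn.Module):
    def __init__(self, face_dim=128, body_dim=64, hidden_dim=256):
        super().__init__()
        # One shared encoder per modality, applied to every player.
        self.face_encoder = nn.Sequential(nn.Linear(face_dim, hidden_dim), nn.ReLU())
        self.body_encoder = nn.Sequential(nn.Linear(body_dim, hidden_dim), nn.ReLU())
        # Fused player embeddings are pooled over the group and classified.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, face_feats, body_feats):
        # face_feats: (batch, n_players, face_dim); body_feats: (batch, n_players, body_dim)
        f = self.face_encoder(face_feats)
        b = self.body_encoder(body_feats)
        fused = torch.cat([f, b], dim=-1)   # per-player multimodal embedding
        pooled = fused.mean(dim=1)          # average over the players in the group
        return torch.sigmoid(self.classifier(pooled)).squeeze(-1)

# Example forward pass on random features for a batch of 8 video segments.
model = ExpressiveMomentClassifier()
scores = model(torch.randn(8, 4, 128), torch.randn(8, 4, 64))
print(scores.shape)  # torch.Size([8])

Mean pooling over players is only one of several plausible fusion choices here; per-player predictions or attention-based pooling would fit the same interface.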
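Similarly, the parent-infant contact detection described in the third research question can be framed as frame-level binary classification with an off-the-shelf convolutional backbone. The sketch below assumes a ResNet-18 backbone and a two-class contact/no-contact head; these choices are illustrative assumptions, not the thesis's actual model.

# Illustrative sketch (not the thesis implementation): frame-level parent-infant
# contact detection as binary classification with a CNN backbone.
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18()                          # randomly initialised backbone
backbone.fc = nn.Linear(backbone.fc.in_features, 2)   # contact / no-contact head

frames = torch.randn(4, 3, 224, 224)                  # a small batch of RGB frames
logits = backbone(frames)
probs = torch.softmax(logits, dim=1)                  # per-frame class probabilities
print(probs.shape)  # torch.Size([4, 2])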