Advanced School for Computing and Imaging (ASCI)


Machine Perception of Interactivity in Videos

Author : Shuo Chen
Promotor(s) : Prof.dr. C.G.M. Snoek / Dr. P.S.M. Mettes
University : University of Amsterdam
Year of publication : 2023
Link to repository : Link to thesis


This thesis explores machine perception and understanding of interactivity in video, which plays a significant role in numerous applications, including surveillance systems, elder care, wildlife protection, and robotics. The work begins by discussing the concept of interactivity. The study then examines how machines can perceive interactivity in videos by learning rich context information from them. The process of formalizing interactivity in videos is discussed, highlighting the use of semantic triplet structures. These structures consist of the subject performing the interactivity, the interactivity itself (the predicate), and the object being interacted with. The primary research question of this thesis revolves around automating the perception of interactivity in video, with several sub-questions focusing on defining interactivity, recognizing it, analyzing the challenges involved, and detecting rare interactivities: How can machines accurately define the temporal and spatial boundaries of interactivity? How can we improve the capabilities of machines to recognize and analyze the complexities of human interactivity? What aspects make recognizing interactivity in video challenging? How can rare interactivities, which occur infrequently but are nonetheless important, be effectively detected? The thesis tackles these challenging sub-problems in order to understand when, where, and what specific interactivities occur in a video, and strives to provide valuable insights into the machine perception and understanding of interactivity in videos.