Machine Perception of Interactivity in Videos
Author: Shuo Chen
Promotor(s): Prof.dr. C.G.M. Snoek / Dr. P.S.M. Mettes
University: University of Amsterdam
Year of publication: 2023
Link to repository: Link to thesis
Abstract
This thesis explores machine perception and understanding of interactivity in video, which plays a significant role in numerous applications, including surveillance systems, elder care, wildlife protection, and robotics. The work begins by discussing the concept of interactivity, then delves into how machines can perceive interactivity in videos by learning rich contextual information from them. The process of formalizing interactivity in videos is discussed, highlighting the use of semantic triplet structures. These structures consist of the subject performing the interactivity, the interactivity itself (the predicate), and the object being interacted upon. The primary research question revolves around automating the perception of interactivity in video, with several sub-questions focusing on defining, recognizing, analyzing the challenges of, and detecting rare interactivities:

- How can machines accurately define the temporal and spatial boundaries of interactivity?
- How can we improve the capabilities of machines to recognize and analyze the complexities of human interactivity?
- What aspects make recognizing interactivity in video challenging?
- How can rare interactivities, which occur infrequently but are nonetheless important, be effectively detected?

The thesis aims to tackle these challenging sub-problems to understand when, where, and what specific interactivities occur in a video. This comprehensive study strives to provide valuable insights into machine perception and understanding of interactivity in videos.
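The semantic triplet structure mentioned in the abstract can be sketched as a minimal data type. This is an illustrative example only: the class name, field names, and the sample triplet below are assumptions for exposition, not the thesis's actual representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InteractivityTriplet:
    """Hypothetical encoding of a <subject, predicate, object> triplet."""
    subject: str    # entity performing the interactivity, e.g. "person"
    predicate: str  # the interactivity itself, e.g. "ride"
    object: str     # entity being interacted upon, e.g. "bicycle"

# Example triplet for the interactivity "a person rides a bicycle"
triplet = InteractivityTriplet("person", "ride", "bicycle")
print(triplet)
```

Spatio-temporal detection would additionally localize such a triplet in time (start/end frames) and space (bounding boxes per entity), which the plain triplet above deliberately omits.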