
Advanced School for Computing and Imaging (ASCI)

ASCI office
Delft University of Technology
Building 28, room 04.E120
Van Mourik Broekmanweg 6
2628 XE – DELFT, The Netherlands

E: asci-office@tudelft.nl
P: +31 15 278 8032

Office visiting hours
Monday, Tuesday, Thursday: 10:00 – 15:00

Directions

The ASCI office is located on the Delft University of Technology campus. It is easily accessible by bicycle, public transport, and car. Building numbers can help you find your way around the campus, so make sure you know the name and building number of your destination.

Contact us at +31 15 278 8032 or send us an email at asci-office@tudelft.nl

Less machine (=) More vision

Author: Amogh Gudi
Promotor(s): Prof. dr. ir. M.J.T. Reinders / Dr. J.C. van Gemert
University: Delft University of Technology
Year of publication: 2022
Link to repository: TU Delft Research Repository

Abstract

Machines that interact with humans can do so better if they can also understand us visually, but they have limited resources with which to do so. The main topic of this dissertation is the trade-off between the resources that machine vision systems consume and the accuracy they achieve. In particular, it focuses on reducing the data, memory, and computation required by real-world machine vision systems for human observation and face analysis.

This dissertation tackles annotation effort by exploring how weakly-supervised object/person detectors can be improved. The findings show that prior knowledge about the bounds of objects in images helps a detector learn their spatial extent using only weak, image-level labels. The proposed implementation also enables single-shot detection, improving the computational efficiency of this data-efficient method.

The thesis also demonstrates how prior knowledge about eye locations can reduce the computational burden of gaze tracking: non-vital parts of the input image can be discarded without losing accuracy. Additionally, it shows how geometric relations known a priori can be exploited to project gaze onto a screen with little human annotation effort.
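
As a rough illustration of the kind of geometric relation involved, the sketch below intersects a 3D gaze ray with a known screen plane. It is a generic ray-plane intersection, not the method from the thesis. The function name `project_gaze_to_screen`, the camera-centred coordinate frame in centimetres, and the assumption that the eye position and gaze direction have already been estimated are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical 3D vector type in a camera-centred frame (units assumed: cm).
Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def project_gaze_to_screen(
    eye_pos: Vec3,
    gaze_dir: Vec3,
    plane_point: Vec3,
    plane_normal: Vec3,
) -> Optional[Vec3]:
    """Intersect the ray eye_pos + t * gaze_dir (t >= 0) with a plane.

    Returns the 3D point of regard on the screen plane, or None if the
    gaze is parallel to the screen or points away from it.
    """
    denom = _dot(gaze_dir, plane_normal)
    if abs(denom) < 1e-9:
        return None  # gaze ray is parallel to the screen plane
    t = _dot(_sub(plane_point, eye_pos), plane_normal) / denom
    if t < 0:
        return None  # intersection lies behind the eye
    return (
        eye_pos[0] + t * gaze_dir[0],
        eye_pos[1] + t * gaze_dir[1],
        eye_pos[2] + t * gaze_dir[2],
    )


if __name__ == "__main__":
    # Assumed set-up: screen lies in the plane z = 0, eye is 50 cm in front of it.
    point = project_gaze_to_screen((0.0, 10.0, 50.0), (0.0, 0.0, -1.0),
                                   (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
    print(point)  # (0.0, 10.0, 0.0)
```

Mapping the resulting point to pixel coordinates would additionally require the screen's physical size and its pose relative to the camera, which this sketch simply takes as given.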

The findings of this dissertation further suggest that spatial structures in images can be exploited to improve the efficiency of vision tasks. The proposed solution makes it possible to learn to detect facial occlusions and anomalies from only a few examples. The results also indicate that this solution can serve as a loss function for unsupervised pre-training of neural networks when resources are constrained.

Lastly, this thesis shows how prior knowledge of blood-flow physiology in faces can be applied in a camera-based vital-signs estimator. Even when training data is available, this hand-crafted method outperforms deep learning methods in both accuracy and efficiency. At the same time, the results also reveal the pitfalls of the assumptions built into such prior knowledge when it is exposed to more complex tasks, such as filtering video-compression noise.
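
For readers unfamiliar with camera-based vital-signs estimation, the sketch below shows one classic hand-crafted baseline: average the green channel over a face region in each frame, then pick the dominant frequency within a plausible heart-rate band. It is not the estimator developed in the thesis. The face-region averaging is assumed to happen upstream, and the function name `estimate_heart_rate_bpm` and its default band of 42 to 240 bpm are illustrative assumptions.

```python
import math
from typing import Sequence


def estimate_heart_rate_bpm(
    green_means: Sequence[float],
    fps: float,
    min_bpm: float = 42.0,
    max_bpm: float = 240.0,
    step_bpm: float = 0.5,
) -> float:
    """Return the candidate heart rate (bpm) with the largest spectral power.

    green_means: per-frame mean green value of a face region (assumed to be
    computed upstream); fps: video frame rate in Hz.
    """
    n = len(green_means)
    if n < 2 or fps <= 0:
        raise ValueError("need at least two samples and a positive fps")
    mean = sum(green_means) / n
    signal = [g - mean for g in green_means]  # remove the DC component

    best_bpm, best_power = min_bpm, -1.0
    steps = int(round((max_bpm - min_bpm) / step_bpm))
    for i in range(steps + 1):
        bpm = min_bpm + i * step_bpm
        omega = 2.0 * math.pi * (bpm / 60.0) / fps  # radians per frame
        re = sum(s * math.cos(omega * k) for k, s in enumerate(signal))
        im = sum(s * math.sin(omega * k) for k, s in enumerate(signal))
        power = re * re + im * im  # DFT power at this candidate frequency
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


if __name__ == "__main__":
    fps = 30.0
    # Synthetic 10 s trace: a 72 bpm (1.2 Hz) pulse on top of a constant skin tone.
    trace = [120.0 + 0.5 * math.sin(2 * math.pi * 1.2 * k / fps) for k in range(300)]
    print(estimate_heart_rate_bpm(trace, fps))
```

Practical camera-based pipelines typically add detrending, band-pass filtering, and colour projections across channels to cope with motion and illumination changes; the sketch leaves these out to keep the core idea visible.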

Through its common theme of incorporating prior knowledge, this dissertation draws attention to the costs that machine vision systems incur to achieve high accuracy.