Advanced School for Computing and Imaging (ASCI)

Distributed DNN Inference at the Edge

Author : Xiaotian Guo
Promotor(s) : Prof.dr. A.D. Pimentel / Dr. T.P. Stefanov
University : LIACS
Year of publication : 2025
Link to repository : Link to thesis

Abstract

As deep neural networks (DNNs) grow increasingly complex, their computational demands often surpass the capacity of edge devices, which typically have limited resources. This thesis explores strategies for efficient and robust deployment of large DNNs in resource-constrained edge environments, where “edge” refers to devices located along the data path between data sources and the cloud. Deploying DNNs at the edge offers advantages such as enhanced privacy, efficiency, and reliability, but the limited resources of edge devices make such deployment challenging.

The thesis is divided into two parts. The first part addresses the challenge of optimal partitioning and deployment of DNNs over multiple resource-constrained edge devices. The AutoDiCE framework automates model partitioning, code generation, and communication optimization across devices, while a Design Space Exploration (DSE) technique identifies optimal distribution strategies to minimize energy and memory usage while maximizing system inference throughput.
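
To make the idea of design space exploration over partitionings concrete, the following is a minimal sketch and not the AutoDiCE implementation. It assumes a simple chain of layers split into contiguous stages, one stage per device, and it searches exhaustively for the split with the fastest bottleneck stage that still fits each device's memory. The function name `best_partition` and all latency and memory figures are hypothetical; communication cost and energy, which the thesis also considers, are left out.

```python
from itertools import combinations


def best_partition(layer_ms, layer_mb, device_mb):
    """Exhaustively search contiguous splits of a layer chain over devices.

    Returns (cut points, bottleneck stage time in ms), or None if no split fits.
    Illustrative only: ignores communication cost and energy.
    """
    n, k = len(layer_ms), len(device_mb)
    best = None
    # Choose k-1 cut positions between layers; stage i runs on device i.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        stages = [(bounds[i], bounds[i + 1]) for i in range(k)]
        # Discard mappings where a stage exceeds its device's memory budget.
        if any(sum(layer_mb[a:b]) > device_mb[i]
               for i, (a, b) in enumerate(stages)):
            continue
        # In a pipeline, throughput is limited by the slowest stage.
        bottleneck = max(sum(layer_ms[a:b]) for a, b in stages)
        if best is None or bottleneck < best[1]:
            best = (cuts, bottleneck)
    return best


layer_ms = [4.0, 6.0, 5.0, 3.0, 2.0]  # hypothetical per-layer latency (ms)
layer_mb = [10, 30, 25, 15, 5]        # hypothetical per-layer memory (MB)
device_mb = [45, 50]                  # hypothetical per-device memory budget (MB)

result = best_partition(layer_ms, layer_mb, device_mb)
print(result)
```

With these made-up numbers, only the split after the second layer satisfies both memory budgets, giving a 10 ms bottleneck stage. Exhaustive enumeration is only practical for very small models and device counts; a realistic exploration would need a heuristic or evolutionary search over a much larger space.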

The second part focuses on enhancing system robustness against potential device failures or connectivity issues. RobustDiCE ensures distributed inference accuracy by prioritizing critical neurons and partially replicating them across devices, maintaining functionality even in failure scenarios. Additionally, EASTER, a similar partitioning method for large language models, balances resource utilization and robustness.

Overall, this thesis presents innovative solutions for efficient and fault-tolerant DNN deployment at the edge, optimizing resource utilization and ensuring reliable operation. The proposed methods advance the adoption of distributed edge AI in resource-constrained environments.