Computation Offloading of Machine Learning Workloads at the Edge

Description

The number of IoT (Internet of Things) devices deployed by the year 2030 is expected to exceed 125 billion. Such large volumes will be possible only if design and maintenance costs are kept to an absolute minimum. For this reason, IoT devices will have to be built largely from commercial off-the-shelf (COTS) processors, whose performance and energy efficiency are far lower than those of equivalent custom hardware. Compounding this situation is the growing demand to use these devices for real-time data analysis and decision making, which requires algorithms of substantial computational complexity. Offloading such computations to cloud servers incurs increasing delays and raises privacy and security concerns.

Recognizing these issues, a new computing paradigm, called edge computing, is emerging. It involves a user-end device, which is the first recipient of the data, and a collection of local or physically nearby systems with significantly more computational and storage capacity, referred to as cloudlets. In operation, the user-end device shares the computation with the cloudlet in a way that minimizes the energy consumption of the user-end device and/or the latency of the computation.

This dissertation provides an optimization framework for partitioning machine learning workloads between the user-end device and the cloudlet with the goal of improving energy consumption and performance. First, a method is presented to partition the layers of a deep neural network (DNN) between the user-end device and the cloudlet so as to minimize the energy consumption of the user-end device under stochastic communication delays. Second, an energy minimization method is described that partitions a network of DNNs between the devices while accounting for the parallel execution of DNNs. Third, a delay-constrained energy minimization technique is presented to partition the network of DNNs between the devices.
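To make the first contribution concrete, the sketch below illustrates the general idea of layer-wise DNN partitioning under stochastic link delays: run the first few layers on the device, transmit the activation at the cut point, and pick the cut that minimizes the device's expected energy. This is a minimal illustration, not the dissertation's actual formulation; the function names, the exhaustive search over cut points, the linear transmit-energy model, and all numbers are assumptions made for the example.

```python
import random
import statistics

def expected_device_energy(split, layer_energy, cut_bits, tx_power, delay_samples):
    """Expected energy drawn from the user-end device when the first `split`
    layers run locally and the remaining layers are offloaded to the cloudlet.

    layer_energy[i] : energy (J) to run layer i on the device (assumed profiled)
    cut_bits[k]     : bits transmitted if we cut after layer k
                      (cut_bits[0] is the raw input size)
    tx_power        : radio transmit power of the device (W)
    delay_samples   : sampled per-bit link delays (s/bit), modeling the
                      stochastic wireless channel
    """
    compute = sum(layer_energy[:split])
    if split == len(layer_energy):        # fully local: nothing to transmit
        return compute
    # Transmit energy is linear in delay, so its expectation uses the mean delay.
    mean_delay = statistics.mean(delay_samples)
    return compute + tx_power * cut_bits[split] * mean_delay

def best_split(layer_energy, cut_bits, tx_power, delay_samples):
    """Pick the layer boundary minimizing expected device energy.

    split = 0 offloads everything; split = L runs the whole DNN locally.
    """
    candidates = range(len(layer_energy) + 1)
    return min(candidates,
               key=lambda s: expected_device_energy(s, layer_energy, cut_bits,
                                                    tx_power, delay_samples))

# Toy example with made-up numbers: a 4-layer DNN and lognormal link delays.
random.seed(0)
layer_energy = [0.8, 1.5, 2.2, 0.6]       # J per layer on the device
cut_bits = [6e6, 4e6, 1e6, 2e5, 1e4]      # bits at each possible cut point
delays = [random.lognormvariate(-13, 0.5) for _ in range(1000)]  # s/bit
print(best_split(layer_energy, cut_bits, tx_power=0.5, delay_samples=delays))
```

With these illustrative numbers the minimizer lands at an intermediate cut: early layers with large activations are cheaper to compute locally than to transmit, while later layers are cheaper to offload. The second and third contributions generalize this single-cut search to assigning a whole network of DNNs across devices, with parallel execution and, in the third case, an added latency constraint.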