Full metadata
Title
A Study on Resources Utilization of Deep Learning Workloads
Description
Deep learning and AI have grabbed tremendous attention in the last decade. The substantial accuracy improvement by neural networks in common tasks such as image classification and speech recognition has made deep learning as a replacement for many conventional machine learning techniques. Training Deep Neural networks require a lot of data, and therefore vast of amounts of computing resources to process the data and train the model for the neural network. The most obvious solution to solving this problem is to speed up the time it takes to train Deep Neural networks.
AI and deep learning workloads are different from the conventional cloud and mobile workloads, with respect to: (1) Computational Intensity, (2) I/O characteristics, and (3) communication pattern. While there is a considerable amount of research activity on the theoretical aspects of AI and Deep Learning algorithms that run with greater efficiency, there are only a few studies on the infrastructural impact of Deep Learning workloads on computing and storage resources in distributed systems.
It is typical to utilize a heterogeneous mixture of CPU and GPU devices to perform training on a neural network. Google Brain has a developed a reinforcement model that can place training operations across a heterogeneous cluster. Though it has only been tested with local devices in a single cluster. This study will explore the method’s capabilities and attempt to apply this method on a cluster with nodes across a network.
AI and deep learning workloads are different from the conventional cloud and mobile workloads, with respect to: (1) Computational Intensity, (2) I/O characteristics, and (3) communication pattern. While there is a considerable amount of research activity on the theoretical aspects of AI and Deep Learning algorithms that run with greater efficiency, there are only a few studies on the infrastructural impact of Deep Learning workloads on computing and storage resources in distributed systems.
It is typical to utilize a heterogeneous mixture of CPU and GPU devices to perform training on a neural network. Google Brain has a developed a reinforcement model that can place training operations across a heterogeneous cluster. Though it has only been tested with local devices in a single cluster. This study will explore the method’s capabilities and attempt to apply this method on a cluster with nodes across a network.
Date Created
2019-05
Contributors
- Nguyen, Andrew Hoang (Author)
- Zhao, Ming (Thesis director)
- Biookaghazadeh, Saman (Committee member)
- Computer Science and Engineering Program (Contributor)
- Barrett, The Honors College (Contributor)
Topical Subject
Resource Type
Extent
7 pages
Language
eng
Copyright Statement
In Copyright
Primary Member of
Series
Academic Year 2018-2019
Handle
https://hdl.handle.net/2286/R.I.52597
Level of coding
minimal
Cataloging Standards
System Created
- 2019-04-18 12:00:03
System Modified
- 2021-08-11 04:09:57
- 3 years 5 months ago
Additional Formats