Processing-in-Memory for Data-Intensive Applications, From Device to Algorithm

Angizi, Shaahin

Over the past decades, the amount of data that computing systems must process and analyze has grown dramatically, reaching exascale (10^18 bytes/s or ops). However, modern computing platforms cannot deliver solutions that are both energy-efficient and high-performance, which leaves a gap between what is delivered and what is needed, especially in resource-constrained Internet of Things (IoT) devices. Unfortunately, this gap will keep widening, mainly due to limitations in both devices and architectures. With this motivation, this dissertation focuses on cross-layer (device/circuit/architecture/application) co-design of energy-efficient and high-performance Processing-in-Memory (PIM) platforms for implementing complex big data applications, namely deep learning, bioinformatics, graph processing tasks, and data encryption. The dissertation shows how to leverage innovations from device, circuit, and architecture to integrate memory and logic, break the existing memory and power walls, and dramatically increase the computing efficiency of today's non-von-Neumann computing systems. The proposed PIM platforms transform current volatile and non-volatile random access memory arrays into computational units capable of working as both memory and low-area-overhead, massively parallel, fast, reconfigurable in-memory logic. Instead of integrating complex logic units in cost-sensitive memory, the explored designs exploit hardware-friendly bit-line computing methods to implement complete Boolean logic functions between operands within a memory array in a reduced clock cycle, overcoming the multi-cycle logic issue in modern PIM platforms. In addition, new customized in-memory algorithms and mapping methods are developed to convert the crucial, iteratively used functions of big data applications into bit-wise PIM-supported logic. To quantitatively analyze the performance of various PIM platforms running big data applications, a generic and comprehensive evaluation framework is presented.
The overall system computing performance (throughput, latency, energy efficiency) for each application is explored through the developed framework. The device-to-algorithm co-simulation results on neural network acceleration demonstrate that the proposed platforms can obtain 36.8× higher energy efficiency and a 22× speed-up compared to a state-of-the-art Graphics Processing Unit (GPU). In accelerating bioinformatics tasks such as biological sequence alignment, the presented PIM designs achieve ~2×, 43.8×, and 458× more throughput per Watt than state-of-the-art Application-Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA), and GPU platforms, respectively.
