Memory Characterization Testing System

132872-Thumbnail Image.png
Description
This thesis outlines the hand-held memory characterization testing system that is to be created into a PCB (printed circuit board). The circuit is designed to apply voltages diagonally through a RRAM cell (32x32 memory array). The purpose of this swee

This thesis outlines the hand-held memory characterization testing system that is to be created into a PCB (printed circuit board). The circuit is designed to apply voltages diagonally through a RRAM cell (32x32 memory array). The purpose of this sweep across the RRAM is to measure and calculate the high and low resistance state value over a specified amount of testing cycles. With each cell having a unique output of high and low resistance states a unique characterization of each RRAM cell is able to be developed. Once the memory is characterized, the specific RRAM cell that was tested is then able to be used in a varying amount of applications for different things based on its uniqueness. Due to an inability to procure a packaged RRAM cell, a Mock-RRAM was instead designed in order to emulate the same behavior found in a RRAM cell.
The final testing circuit and Mock-RRAM are varied and complex but come together to be able to produce a measured value of the high resistance and low resistance state. This is done by the Arduino autonomously digitizing the anode voltage, cathode voltage, and output voltage. A ramp voltage that sweeps from 1V to -1V is applied to the Mock-RRAM acting as an input. This ramp voltage is then later defined as the anode voltage which is just one of the two nodes connected to the Mock-RRAM. The cathode voltage is defined as the other node at which the voltage drops across the Mock-RRAM. Using these three voltages as input to the Arduino, the Mock-RRAM path resistance is able to be calculated at any given point in time. Conducting many test cycles and calculating the high and low resistance values allows for a graph to be developed of the chaotic variation of resistance state values over time. This chaotic variation can then be analyzed further in the future in order to better predict trends and characterize the RRAM cell that was tested.
Furthermore, the interchangeability of many devices on the PCB allows for the testing system to do more in the future. Ports have been added to the final PCB in order to connect a packaged RRAM cell. This will allow for the characterization of a real RRAM memory cell later down the line rather than a Mock-RRAM as emulation. Due to the autonomous testing, very few human intervention is needed which makes this board a great baseline for others in the future looking to add to it and collect larger pools of data.
Date Created
2019-05
Agent

Hardware Acceleration of Deep Convolutional Neural Networks on FPGA

156845-Thumbnail Image.png
Description
The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years on many computer vision tasks with significantly improved accuracy. During the inference phase, many applications demand low latency processing of one

The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years on many computer vision tasks with significantly improved accuracy. During the inference phase, many applications demand low latency processing of one image with strict power consumption requirement, which reduces the efficiency of GPU and other general-purpose platform, bringing opportunities for specific acceleration hardware, e.g. FPGA, by customizing the digital circuit specific for the deep learning algorithm inference. However, deploying CNNs on portable and embedded systems is still challenging due to large data volume, intensive computation, varying algorithm structures, and frequent memory accesses. This dissertation proposes a complete design methodology and framework to accelerate the inference process of various CNN algorithms on FPGA hardware with high performance, efficiency and flexibility.

As convolution contributes most operations in CNNs, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply and accumulate (MAC) operations with four levels of loops. Without fully studying the convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit the data reuse and manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g. memory access) of the CNN accelerator based on multiple design variables. An efficient dataflow and hardware architecture of CNN acceleration are proposed to minimize the data communication while maximizing the resource utilization to achieve high performance.

Although great performance and efficiency can be achieved by customizing the FPGA hardware for each CNN model, significant efforts and expertise are required leading to long development time, which makes it difficult to catch up with the rapid development of CNN algorithms. In this work, we present an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, in order to enable high-level fast prototyping of CNNs from software to FPGA and still keep the benefits of low-level hardware optimization. First, a general-purpose library of RTL modules is developed to model different operations at each layer. The integration and dataflow of physical modules are predefined in the top-level system template and reconfigured during compilation for a given CNN algorithm. The runtime control of layer-by-layer sequential computation is managed by the proposed execution schedule so that even highly irregular and complex network topology, e.g. GoogLeNet and ResNet, can be compiled. The proposed methodology is demonstrated with various CNN algorithms, e.g. NiN, VGG, GoogLeNet and ResNet, on two different standalone FPGAs achieving state-of-the art performance.

Based on the optimized acceleration strategy, there are still a lot of design options, e.g. the degree and dimension of computation parallelism, the size of on-chip buffers, and the external memory bandwidth, which impact the utilization of computation resources and data communication efficiency, and finally affect the performance and energy consumption of the accelerator. The large design space of the accelerator makes it impractical to explore the optimal design choice during the real implementation phase. Therefore, a performance model is proposed in this work to quantitatively estimate the accelerator performance and resource utilization. By this means, the performance bottleneck and design bound can be identified and the optimal design option can be explored early in the design phase.
Date Created
2018
Agent

Semiconductor Memory Applications in Radiation Environment, Hardware Security and Machine Learning System

156804-Thumbnail Image.png
Description
Semiconductor memory is a key component of the computing systems. Beyond the conventional memory and data storage applications, in this dissertation, both mainstream and eNVM memory technologies are explored for radiation environment, hardware security system and machine learning applications.

In

Semiconductor memory is a key component of the computing systems. Beyond the conventional memory and data storage applications, in this dissertation, both mainstream and eNVM memory technologies are explored for radiation environment, hardware security system and machine learning applications.

In the radiation environment, e.g. aerospace, the memory devices face different energetic particles. The strike of these energetic particles can generate electron-hole pairs (directly or indirectly) as they pass through the semiconductor device, resulting in photo-induced current, and may change the memory state. First, the trend of radiation effects of the mainstream memory technologies with technology node scaling is reviewed. Then, single event effects of the oxide based resistive switching random memory (RRAM), one of eNVM technologies, is investigated from the circuit-level to the system level.

Physical Unclonable Function (PUF) has been widely investigated as a promising hardware security primitive, which employs the inherent randomness in a physical system (e.g. the intrinsic semiconductor manufacturing variability). In the dissertation, two RRAM-based PUF implementations are proposed for cryptographic key generation (weak PUF) and device authentication (strong PUF), respectively. The performance of the RRAM PUFs are evaluated with experiment and simulation. The impact of non-ideal circuit effects on the performance of the PUFs is also investigated and optimization strategies are proposed to solve the non-ideal effects. Besides, the security resistance against modeling and machine learning attacks is analyzed as well.

Deep neural networks (DNNs) have shown remarkable improvements in various intelligent applications such as image classification, speech classification and object localization and detection. Increasing efforts have been devoted to develop hardware accelerators. In this dissertation, two types of compute-in-memory (CIM) based hardware accelerator designs with SRAM and eNVM technologies are proposed for two binary neural networks, i.e. hybrid BNN (HBNN) and XNOR-BNN, respectively, which are explored for the hardware resource-limited platforms, e.g. edge devices.. These designs feature with high the throughput, scalability, low latency and high energy efficiency. Finally, we have successfully taped-out and validated the proposed designs with SRAM technology in TSMC 65 nm.

Overall, this dissertation paves the paths for memory technologies’ new applications towards the secure and energy-efficient artificial intelligence system.
Date Created
2018
Agent

Coarse-Fine VCO Design with a New Supply Noise Suppression Method

156598-Thumbnail Image.png
Description
VCO as a ubiquitous circuit in many systems is highly demanding for the phase noises. Lowering the noise migrated from the power supply has been the trending topics for many years. Considering the Ring Oscillator(RO) based VCO is more sensitive

VCO as a ubiquitous circuit in many systems is highly demanding for the phase noises. Lowering the noise migrated from the power supply has been the trending topics for many years. Considering the Ring Oscillator(RO) based VCO is more sensitive to the supply noise, it is more significant to find out a useful technique to reduce the supply noise. Among the conventional supply noise reduction techniques such as filtering, channel length adjusting for the transistors, and the current noise mutual canceling, the new feature of the 28nm UTBB-FD-SOI process launched by the ST semiconductor offered a new method to reduce the noise, which is realized by allowing the circuit designer to dynamically control the threshold voltage. In this thesis, a new structure of the linear coarse-fine VCO with 1V supply voltage is designed for the ring typed VCO. The structure is also designed to be flexible to tune the frequency coverage by the fine and coarse tunable on-board resistors. The thesis has given the model of the phase noise reduction method. The model has also been proved to be meaningful with the newly designed VCO circuit. For instances, given 1μV/√Hz white noise coupled on the supply, the 3GHz VCO can have a more than 7dBc/Hz phase noise lowering at the 10MHz frequency offset.
Date Created
2018
Agent

Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency

156189-Thumbnail Image.png
Description
Static CMOS logic has remained the dominant design style of digital systems for

more than four decades due to its robustness and near zero standby current. Static

CMOS logic circuits consist of a network of combinational logic cells and clocked sequential

elements, such

Static CMOS logic has remained the dominant design style of digital systems for

more than four decades due to its robustness and near zero standby current. Static

CMOS logic circuits consist of a network of combinational logic cells and clocked sequential

elements, such as latches and flip-flops that are used for sequencing computations

over time. The majority of the digital design techniques to reduce power, area, and

leakage over the past four decades have focused almost entirely on optimizing the

combinational logic. This work explores alternate architectures for the flip-flops for

improving the overall circuit performance, power and area. It consists of three main

sections.

First, is the design of a multi-input configurable flip-flop structure with embedded

logic. A conventional D-type flip-flop may be viewed as realizing an identity function,

in which the output is simply the value of the input sampled at the clock edge. In

contrast, the proposed multi-input flip-flop, named PNAND, can be configured to

realize one of a family of Boolean functions called threshold functions. In essence,

the PNAND is a circuit implementation of the well-known binary perceptron. Unlike

other reconfigurable circuits, a PNAND can be configured by simply changing the

assignment of signals to its inputs. Using a standard cell library of such gates, a technology

mapping algorithm can be applied to transform a given netlist into one with

an optimal mixture of conventional logic gates and threshold gates. This approach

was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier

in 65nm LP technology. Simulation and chip measurements show more than 30%

improvement in dynamic power and more than 20% reduction in core area.

The functional yield of the PNAND reduces with geometry and voltage scaling.

The second part of this research investigates the use of two mechanisms to improve

the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM

devices for low voltage operation.

The third part of this research focused on the design of flip-flops with non-volatile

storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated

with both conventional D-flipflop and the PNAND circuits to implement non-volatile

logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of

system locally when a power interruption occurs. However, manufacturing variations

in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading

to an overly pessimistic design and consequently, higher energy consumption. A

detailed analysis of the design trade-offs in the driver circuitry for performing backup

and restore, and a novel method to design the energy optimal driver for a given yield is

presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented,

in which the backup time is determined on a per-chip basis, resulting in minimizing

the energy wastage and satisfying the yield constraint. To achieve a yield of 98%,

the conventional approach would have to expend nearly 5X more energy than the

minimum required, whereas the proposed tunable approach expends only 26% more

energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are

designed with the same backup and restore circuitry in 65nm technology. The embedded

logic in NV-TLFF compensates performance overhead of NVL. This leads to the

possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and-

accumulate (MAC) unit is designed to demonstrate the performance benefits of the

proposed architecture. Based on the results of HSPICE simulations, the MAC circuit

with the proposed NV-TLFF cells is shown to consume at least 20% less power and

area as compared to the circuit designed with conventional DFFs, without sacrificing

any performance.
Date Created
2018
Agent

Accelerated Aging in Devices and Circuits

155918-Thumbnail Image.png
Description
The aging mechanism in devices is prone to uncertainties due to dynamic stress conditions. In AMS circuits these can lead to momentary fluctuations in circuit voltage that may be missed by a compact model and hence cause unpredictable failure. Firstly,

The aging mechanism in devices is prone to uncertainties due to dynamic stress conditions. In AMS circuits these can lead to momentary fluctuations in circuit voltage that may be missed by a compact model and hence cause unpredictable failure. Firstly, multiple aging effects in the devices may have underlying correlations. The generation of new traps during TDDB may significantly accelerate BTI, since these traps are close to the dielectric-Si interface in scaled technology. Secondly, the prevalent reliability analysis lacks a direct validation of the lifetime of devices and circuits. The aging mechanism of BTI causes gradual degradation of the device leading to threshold voltage shift and increasing the failure rate. In the 28nm HKMG technology, contribution of BTI to NMOS degradation has become significant at high temperature as compared to Channel Hot Carrier (CHC). This requires revising the End of Lifetime (EOL) calculation based on contribution from induvial aging effects especially in feedback loops. Conventionally, aging in devices is extrapolated from a short-term measurement, but this practice results in unreliable prediction of EOL caused by variability in initial parameters and stress conditions. To mitigate the extrapolation issues and improve predictability, this work aims at providing a new approach to test the device to EOL in a fast and controllable manner. The contributions of this thesis include: (1) based on stochastic trapping/de-trapping mechanism, new compact BTI models are developed and verified with 14nm FinFET and 28nm HKMG data. Moreover, these models are implemented into circuit simulation, illustrating a significant increase in failure rate due to accelerated BTI, (2) developing a model to predict accelerated aging under special conditions like feedback loops and stacked inverters, (3) introducing a feedback loop based test methodology called Adaptive Accelerated Aging (AAA) that can generate accurate aging data till EOL, (4) presenting simulation and experimental data for the models and providing test setup for multiple stress conditions, including those for achieving EOL in 1 hour device as well as ring oscillator (RO) circuit for validation of the proposed methodology, and (5) scaling these models for finding a guard band for VLSI design circuits that can provide realistic aging impact.
Date Created
2017
Agent

Radiation effects measurement test structure using GF 32-nm SOI process

155707-Thumbnail Image.png
Description
This thesis describes the design of a Single Event Transient (SET) duration measurement test-structure on the Global Foundries (previously IBM) 32-nm silicon-on insulator (SOI) process. The test structure is designed for portability and allows quick design and implementation on a

This thesis describes the design of a Single Event Transient (SET) duration measurement test-structure on the Global Foundries (previously IBM) 32-nm silicon-on insulator (SOI) process. The test structure is designed for portability and allows quick design and implementation on a new process node. Such a test structure is critical in analyzing the effects of radiation on complementary metal oxide semi-conductor (CMOS) circuits. The focus of this thesis is the change in pulse width during propagation of SET pulse and build a test structure to measure the duration of a SET pulse generated in real time. This test structure can estimate the SET pulse duration with 10ps resolution. It receives the input SET propagated through a SET capture structure made using a chain of combinational gates. The impact of propagation of the SET in a >200 deep collection structure is studied. A novel methodology of deploying Thick Gate TID structure is proposed and analyzed to build multi-stage chain of combinational gates. Upon using long chain of combinational gates, the most critical issue of pulse width broadening and shortening is analyzed across critical process corners. The impact of using regular standard cells on pulse width modification is compared with NMOS and/or PMOS skewed gates for the chain of combinational gates. A possible resolution to pulse width change is demonstrated using circuit and layout design of chain of inverters, two and three inputs NOR gates. The SET capture circuit is also tested in simulation by introducing a glitch signal that mimics an individual ion strike that could lead to perturbation in SET propagation. Design techniques and skewed gates are deployed to dampen the glitch that occurs under the effect of radiation. Simulation results, layout structures of SET capture circuit and chain of combinational gates are presented.
Date Created
2017
Agent

Surface Potential Modelling of Hot Carrier Degradation in CMOS Technology

155615-Thumbnail Image.png
Description
The scaling of transistors has numerous advantages such as increased memory density, less power consumption and better performance; but on the other hand, they also give rise to many reliability issues. One of the major reliability issue is the hot

The scaling of transistors has numerous advantages such as increased memory density, less power consumption and better performance; but on the other hand, they also give rise to many reliability issues. One of the major reliability issue is the hot carrier injection and the effect it has on device degradation over time which causes serious circuit malfunctions.

Hot carrier injection has been studied from early 1980's and a lot of research has been done on the various hot carrier injection mechanisms and how the devices get damaged due to this effect. However, most of the existing hot carrier degradation models do not consider the physics involved in the degradation process and they just calculate the change in threshold voltage for different stress voltages and time. Based on this, an analytical expression is formulated that predicts the device lifetime.

This thesis starts by discussing various hot carrier injection mechanisms and the effects it has on the device. Studies have shown charges getting trapped in gate oxide and interface trap generation are two mechanisms for device degradation. How various device parameters get affected due to these traps is discussed here. The physics based models such as lucky hot electron model and substrate current model are presented and gives an idea how the gate current and substrate current can be related to hot carrier injection and density of traps created.

Devices are stressed under various voltages and from the experimental data obtained, the density of trapped charges and interface traps are calculated using mid-gap technique. In this thesis, a simple analytical model based on substrate current is used to calculate the density of trapped charges in oxide and interface traps generated and it is a function of stress voltage and stress time. The model is verified against the data and the TCAD simulations. Finally, the analytical model is incorporated in a Verilog-A model and based on the surface potential method, the threshold voltage shift due to hot carrier stress is calculated.
Date Created
2017
Agent

Front End Electronics for Neutron- Gamma Spectrometer Device

155532-Thumbnail Image.png
Description
With the natural resources of earth depleting very fast, the natural resources of other celestial bodies are considered a potential replacement. Thus, there has been rise of space missions constantly and with it the need of more sophisticated spectrometer devices

With the natural resources of earth depleting very fast, the natural resources of other celestial bodies are considered a potential replacement. Thus, there has been rise of space missions constantly and with it the need of more sophisticated spectrometer devices has increased. The most important requirement in such an application is low area and power consumption.

To save area, some scintillators have been developed that can resolve both neutrons and gamma events rather than traditional scintillators which can do only one of these and thus, the spacecraft needs two such devices. But with this development, the requirements out of the readout electronics has also increased which now need to discriminate between neutron and gamma events.

This work presents a novel architecture for discriminating such events and compares the results with another approach developed by a partner company. The results show excellent potential in this approach for the neutron-gamma discrimination and the team at ASU is going to expand on this design and build up a working prototype for the complete spectrometer device.
Date Created
2017
Agent

Lateral Ag Electrodeposits in Chalcogenide Glass for Physical Unclonable Function Application

155321-Thumbnail Image.png
Description
Counterfeiting of goods is a widespread epidemic that is affecting the world economy. The conventional labeling techniques are proving inadequate to thwart determined counterfeiters equipped with sophisticated technologies. There is a growing need of a secure labeling that is easy

Counterfeiting of goods is a widespread epidemic that is affecting the world economy. The conventional labeling techniques are proving inadequate to thwart determined counterfeiters equipped with sophisticated technologies. There is a growing need of a secure labeling that is easy to manufacture and analyze but extremely difficult to copy. Programmable metallization cell technology operates on a principle of controllable reduction of a metal ions to an electrodeposit in a solid electrolyte by application of bias. The nature of metallic electrodeposit is unique for each instance of growth, moreover it has a treelike, bifurcating fractal structure with high information capacity. These qualities of the electrodeposit can be exploited to use it as a physical unclonable function. The secure labels made from the electrodeposits grown in radial structure can provide enhanced authentication and protection from counterfeiting and tampering.

So far only microscale radial structures and electrodeposits have been fabricated which limits their use to labeling only high value items due to high cost associated with their fabrication and analysis. Therefore, there is a need for a simple recipe for fabrication of macroscale structure that does not need sophisticated lithography tools and cleanroom environment. Moreover, the growth kinetics and material characteristics of such macroscale electrodeposits need to be investigated. In this thesis, a recipe for fabrication of centimeter scale radial structure for growing Ag electrodeposits using simple fabrication techniques was proposed. Fractal analysis of an electrodeposit suggested information capacity of 1.27 x 1019. The kinetics of growth were investigated by electrical characterization of the full cell and only solid electrolyte at different temperatures. It was found that mass transport of ions is the rate limiting process in the growth. Materials and optical characterization techniques revealed that the subtle relief like structure and consequently distinct optical response of the electrodeposit provides an added layer of security. Thus, the enormous information capacity, ease of fabrication and simplicity of analysis make macroscale fractal electrodeposits grown in radial programmable metallization cells excellent candidates for application as physical unclonable functions.
Date Created
2017
Agent