Berisha, Visar

Characterizing Dysarthric Speech with Transfer Learning

Description

Speech is known to serve as an early indicator of neurological decline, particularly in motor diseases. There is significant interest in developing automated, objective signal analytics that detect clinically relevant changes, and in evaluating these algorithms against the existing gold standard: perceptual evaluation by trained speech-language pathologists. Hypernasality, which results from poor control of the velopharyngeal flap (the soft palate that regulates airflow between the oral and nasal cavities), is one such speech symptom of interest, as precise velopharyngeal control is difficult to achieve under neuromuscular disorders. However, a host of co-modulating variables give hypernasal speech a complex and highly variable acoustic signature, making it difficult for skilled clinicians to assess and for automated systems to evaluate. Previous work in rating hypernasality from speech relies either on engineered features based on statistical signal processing or on machine learning models trained end-to-end on clinical ratings of disordered speech examples. Engineered features often fail to capture the complex acoustic patterns associated with hypernasality, while end-to-end methods tend to overfit to the small datasets on which they are trained. In this thesis, I present a set of acoustic features, models, and strategies for characterizing hypernasality in dysarthric speech that split the difference between these two approaches, with the aim of capturing the complex perceptual character of hypernasality without overfitting to the small datasets available. The features are based on acoustic models trained on a large corpus of healthy speech and integrate expert knowledge to capture known perceptual characteristics of hypernasal speech. They are then used in relatively simple linear models to predict clinician hypernasality scores. These simple models are robust, generalizing across diseases and outperforming a comprehensive set of baselines in accuracy and correlation.
This novel approach represents a new state of the art in objective hypernasality assessment.
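The final stage described above, a simple linear model mapping per-utterance features to clinician scores and evaluated by correlation, can be sketched as follows. This is a minimal illustration under stated assumptions: the linear model is taken to be ridge regression (the abstract does not name the exact model), the data are synthetic stand-ins rather than clinical ratings, and `fit_ridge`, `predict`, and `pearson_r` are hypothetical helper names.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    X: (n_utterances, n_features) acoustic feature matrix (hypothetical).
    y: (n_utterances,) clinician hypernasality ratings.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # center so the intercept is not penalized
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def predict(X, w, b):
    return X @ w + b

def pearson_r(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in data (not clinical ratings): 5 features, 80 train / 20 test.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -0.5, 0.0, 0.3, 0.0])
X_train, X_test = rng.normal(size=(80, 5)), rng.normal(size=(20, 5))
y_train = X_train @ true_w + 0.1 * rng.normal(size=80)
y_test = X_test @ true_w + 0.1 * rng.normal(size=20)

w, b = fit_ridge(X_train, y_train, alpha=1.0)
r = pearson_r(predict(X_test, w, b), y_test)
print(f"held-out correlation: {r:.3f}")
```

The ridge penalty is one common way to keep a linear model from overfitting a small dataset; with few parameters relative to utterances, the model has little capacity to memorize individual speakers.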

Date Created

2020

Agent

Author (aut): Saxon, Michael Stephen
Thesis advisor (ths): Berisha, Visar
Thesis advisor (ths): Panchanathan, Sethuraman
Committee member: Venkateswara, Hemanth
Publisher (pbl): Arizona State University

"I'm Having Trouble Understanding You Right Now": A Multi-Dimensional Evaluation of the Intelligibility of Dysphonic Speech

Description

Individuals with voice disorders experience challenges communicating daily, and these challenges significantly decrease the quality of life of individuals with dysphonia. While voice amplification systems are often employed as a voice-assistive technology, individuals with voice disorders generally still have difficulty being understood when using them. With the goal of developing systems that help improve the quality of life of individuals with dysphonia, this work outlines the landscape of voice-assistive technology, the inaccessibility of state-of-the-art voice-based technology, and the need for intelligibility-improving voice-assistive technologies designed both with and for individuals with voice disorders. As voice-based technologies become more prevalent in society, individuals with voice disorders must be included both in the data used to train these systems and in the design process if everyone is to be able to use them. An important and necessary step toward better voice-assistive technology, as well as more inclusive voice-based systems, is the creation of a large, publicly available dataset of dysphonic speech. To this end, a web-based platform was developed to crowdsource recordings of dysphonic speech for such a dataset. The dataset will be released freely and publicly to stimulate research in the field of voice-assistive technologies. Future work includes building a robust intelligibility estimation model and employing that model to measure, and therefore enhance, the intelligibility of a given utterance. The hope is that this model will lead to voice-assistive technology that uses state-of-the-art machine learning models to help individuals with voice disorders be better understood.

Date Created

2020

Agent

Author (aut): Moore, Meredith Kay
Thesis advisor (ths): Panchanathan, Sethuraman
Committee member: Berisha, Visar
Committee member: McDaniel, Troy
Committee member: Venkateswara, Hemanth
Publisher (pbl): Arizona State University

Anticipating Postoperative Delirium During Cardiac Surgeries Involving Deep Hypothermia Circulatory Arrest

Description

Aortic aneurysms and dissections are life-threatening conditions addressed by replacing damaged sections of the aorta. Blood circulation must be halted to facilitate repairs, and the resulting ischemia places the body, especially the brain, at risk of damage. Deep hypothermia circulatory arrest (DHCA) is employed to protect patients and give surgeons time to complete repairs, on the basis that reducing body temperature suppresses the metabolic rate. Supplementary surgical techniques can be employed to reinforce the brain's protection and extend the time for which circulation can be suspended. Even so, protection is not guaranteed. One condition that can arise early in recovery is postoperative delirium, which is correlated with poor long-term outcomes. This study develops a methodology to monitor neurophysiology intraoperatively through electroencephalography (EEG) and anticipate postoperative delirium. The earliest opportunity to detect complications through EEG is during warming immediately following DHCA. The first electrophysiological activity observable after complete suppression is a phenomenon known as burst suppression, which is related to the brain's metabolic state and the recovery of nominal neurological function. A metric termed burst suppression duty cycle (BSDC) is developed to characterize these changing electrophysiological dynamics. Postoperative delirium is predicted by identifying deviations in the way these dynamics evolve. Sixteen cases are examined in this study. Accurate predictions can be made: on average, 89.74% of cases are correctly classified when burst suppression concludes and 78.10% when burst suppression begins. The best-case receiver operating characteristic (ROC) curve has an area under its convex hull of 0.8988, whereas the worst-case area under the hull is 0.7889.
These results demonstrate the feasibility of monitoring BSDC to anticipate postoperative delirium during burst suppression. They also motivate further analysis to identify signatures of the causal mechanisms of neural injury within BSDC. Raising early warning signs of postoperative delirium provides an opportunity to intervene and potentially avert neurological complications, which would improve the success rate of surgery and patients' quality of life afterward.
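The abstract does not spell out how BSDC is computed. One plausible reading, assumed here and not confirmed by the source, is the fraction of each sliding analysis window that an upstream burst detector labels as burst; the window and hop lengths below are illustrative. The same sketch shows how an area under the ROC convex hull, the summary statistic reported above, can be computed from a set of (false positive rate, true positive rate) operating points. All function names are hypothetical.

```python
import numpy as np

def burst_suppression_duty_cycle(burst_mask, fs, window_s=60.0, step_s=10.0):
    """Fraction of each sliding window labeled as burst (assumed BSDC definition).

    burst_mask: 1-D array of 0/1 labels per EEG sample (1 = burst), assumed to
    come from a separate burst detector. fs: sampling rate in Hz.
    Returns (window start times in seconds, duty cycle per window).
    """
    mask = np.asarray(burst_mask, dtype=float)
    win, hop = int(round(window_s * fs)), int(round(step_s * fs))
    if win <= 0 or hop <= 0 or mask.size < win:
        return np.array([]), np.array([])
    starts = np.arange(0, mask.size - win + 1, hop)
    duty = np.array([mask[s:s + win].mean() for s in starts])
    return starts / fs, duty

def roc_convex_hull_area(points):
    """Area under the upper convex hull of ROC operating points (FPR, TPR).

    The trivial operating points (0, 0) and (1, 1) are always included.
    Returns (hull vertices sorted by FPR, area).
    """
    pts = sorted({(float(x), float(y)) for x, y in points} | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Monotone-chain upper hull: drop the last vertex while it is not a strict right turn.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    # Trapezoidal rule along the hull.
    area = sum((x2 - x1) * (y1 + y2) / 2.0 for (x1, y1), (x2, y2) in zip(hull, hull[1:]))
    return hull, area

# Toy usage: 60 s of suppression then 60 s of bursting, sampled at 10 Hz.
mask = np.concatenate([np.zeros(600), np.ones(600)])
times, bsdc = burst_suppression_duty_cycle(mask, fs=10.0, window_s=60.0, step_s=30.0)
hull, area = roc_convex_hull_area([(0.2, 0.6), (0.4, 0.5), (0.5, 0.9)])
```

The convex hull is used because any operating point on a segment between two achievable points can be reached by randomizing between the two corresponding decision thresholds, so dominated points such as (0.4, 0.5) above drop out.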

Date Created

2020

Agent

Author (aut): Ma, Owen
Thesis advisor (ths): Bliss, Daniel W
Committee member: Berisha, Visar
Committee member: Kosut, Oliver
Committee member: Brewer, Gene
Publisher (pbl): Arizona State University

Robust Networks: Neural Networks Robust to Quantization Noise and Analog Computation Noise Based on Natural Gradient

Description

Deep neural networks (DNNs) have had tremendous success in a variety of statistical learning applications due to their vast expressive power. Most applications run DNNs in the cloud on parallelized architectures, but there is a need for efficient DNN inference at the edge on low-precision hardware and analog accelerators. To make trained models more robust in this setting, quantization and analog compute noise are modeled as weight-space perturbations to DNNs, and an information-theoretic regularization scheme is used to penalize the KL-divergence between the perturbed and unperturbed models. This regularizer has similarities to both natural gradient descent and knowledge distillation, but has the advantage of explicitly encouraging the network to find a broader minimum that is robust to weight-space perturbations. In addition to the proposed regularization, the KL-divergence is directly minimized using knowledge distillation. Initial validation on FashionMNIST and CIFAR10 shows that the information-theoretic regularizer and knowledge distillation outperform existing quantization schemes based on the straight-through estimator or L2-constrained quantization.
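A minimal NumPy sketch of the kind of penalty described above, under assumptions the abstract does not state: a single-layer softmax classifier stands in for a DNN, analog noise is modeled as additive Gaussian weight noise, quantization as a round-to-nearest uniform quantizer, and the divergence is taken as KL(clean || perturbed), estimated by Monte Carlo. Only the loss value is computed; gradients and the natural-gradient connection are not shown. All names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def quantize(w, step):
    """Round-to-nearest uniform quantizer (illustrative model of low-precision weights)."""
    return step * np.round(w / step)

def kl_divergence(p, q, eps=1e-12):
    """Mean over rows of KL(p || q) for row-stochastic matrices p and q."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def perturbation_kl(W, X, noise_std, n_samples=16, rng=None):
    """Monte Carlo estimate of E[KL(p(y|x, W) || p(y|x, W + N))], N ~ Gaussian(0, noise_std^2)."""
    rng = np.random.default_rng() if rng is None else rng
    p_clean = softmax(X @ W)
    total = 0.0
    for _ in range(n_samples):
        W_noisy = W + noise_std * rng.normal(size=W.shape)  # assumed analog-noise model
        total += kl_divergence(p_clean, softmax(X @ W_noisy))
    return total / n_samples

def regularized_loss(W, X, y, noise_std, lam=0.1, rng=None):
    """Cross-entropy on clean weights plus lam times the perturbation KL penalty."""
    p = softmax(X @ W)
    cross_entropy = -float(np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)))
    return cross_entropy + lam * perturbation_kl(W, X, noise_std, rng=rng)

# Toy usage with random data: 32 inputs, 4 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
W = rng.normal(size=(4, 3))
y = rng.integers(0, 3, size=32)
kl_quant = kl_divergence(softmax(X @ W), softmax(X @ quantize(W, 0.25)))
loss = regularized_loss(W, X, y, noise_std=0.05, rng=rng)
```

`kl_quant` treats the quantization error as one deterministic weight perturbation, while `perturbation_kl` averages over random perturbations; a model whose outputs barely change under either sits in a broad minimum in the sense used above.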

Date Created

2019

Agent

Author (aut): Kadambi, Pradyumna
Thesis advisor (ths): Berisha, Visar
Committee member: Dasarathy, Gautam
Committee member: Seo, Jae-Sun
Committee member: Cao, Yu
Publisher (pbl): Arizona State University

Numerical computation of Wishart eigenvalue distributions for multistatic radar detection

Description

Eigenvalues of the Gram matrix formed from received data frequently appear in sufficient statistics for multi-channel detection with Generalized Likelihood Ratio Tests (GLRTs) and Bayesian tests. In a frequently presented model for passive radar, in which the null hypothesis is that the channels are independent and contain only complex white Gaussian noise and the alternative hypothesis is that the channels contain a common rank-one signal in the mean, the GLRT statistic is the largest eigenvalue $\lambda_1$ of the Gram matrix formed from the data. This Gram matrix has a Wishart distribution. Although exact expressions for the distribution of $\lambda_1$ are known under both hypotheses, numerically evaluating these distribution functions is difficult when the dimension of the data vectors is large. This dissertation presents tractable methods for computing the distribution of $\lambda_1$ under both the null and alternative hypotheses by expanding the known expressions for the distribution of $\lambda_1$ as inner products of orthogonal polynomials. These new expressions allow detection thresholds and receiver operating characteristic curves to be computed to arbitrary precision in floating-point arithmetic. This represents a significant advance over the state of the art for a problem that previously could be addressed only by Monte Carlo methods.
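For context, the Monte Carlo baseline mentioned above can be sketched as follows: draw data under the null hypothesis (independent channels of circularly symmetric complex white Gaussian noise), compute $\lambda_1$ of the Gram matrix, and read off an empirical threshold for a target false-alarm probability. This is not the dissertation's orthogonal-polynomial method; the channel count, sample count, noise variance, and function names are illustrative assumptions.

```python
import numpy as np

def largest_gram_eigenvalue(X):
    """Largest eigenvalue of the Gram matrix G = X X^H (X: channels x samples)."""
    G = X @ X.conj().T
    return float(np.linalg.eigvalsh(G)[-1])  # eigvalsh returns eigenvalues in ascending order

def simulate_null_lambda1(n_channels, n_samples, n_trials, noise_var=1.0, rng=None):
    """Draws of lambda_1 under H0: independent circular complex white Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.sqrt(noise_var / 2.0)  # variance split between real and imaginary parts
    draws = np.empty(n_trials)
    for t in range(n_trials):
        X = scale * (rng.normal(size=(n_channels, n_samples))
                     + 1j * rng.normal(size=(n_channels, n_samples)))
        draws[t] = largest_gram_eigenvalue(X)
    return draws

def monte_carlo_threshold(null_draws, p_fa):
    """Empirical threshold eta with P(lambda_1 > eta | H0) approximately p_fa."""
    return float(np.quantile(null_draws, 1.0 - p_fa))

# Illustrative sizes: 4 channels, 64 samples per channel, 2000 trials.
draws = simulate_null_lambda1(4, 64, 2000, rng=np.random.default_rng(1))
eta = monte_carlo_threshold(draws, p_fa=0.01)
```

Estimating a tail probability p from n independent trials has a relative standard error of roughly sqrt((1 - p) / (n p)), so thresholds at very small false-alarm rates require very many trials; this is the practical limitation that exact numerical expressions avoid.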

Date Created

2019

Agent

Author (aut): Jones, Scott, Ph.D
Thesis advisor (ths): Cochran, Douglas
Committee member: Berisha, Visar
Committee member: Bliss, Daniel
Committee member: Kosut, Oliver
Committee member: Richmond, Christ
Publisher (pbl): Arizona State University

A Systematic Survey of Cognitive-Communicative Evaluations

Description

Dementia is a syndrome resulting from an acquired brain disease that causes impairment across many cognitive domains. The progressive disorder generally affects memory, attention, executive functions, communication, and other cognitive domains, significantly altering everyday function (Quinn, 2014). The purpose of this research was to conduct a systematic review of the cognitive-communication assessments and screeners used in assessing dementia, to assist in early prognosis. This review suggests the potential for developing a new test that addresses the areas in which people with dementia often show deficits: 1) Memory, 2) Attention, 3) Executive Functions, 4) Language, and 5) Visuospatial Skills. In speech-language pathology, and in medicine in general, no single assessment can diagnose dementia. Additionally, this review explores how speech analytics could identify speech and language characteristics of dementia, theoretically helping clinicians identify early signs of the disease.

Date Created

2019

Agent

Author (aut): Miller, Marissa
Thesis advisor (ths): Liss, Julie M
Thesis advisor (ths): Berisha, Visar
Committee member: Azuma, Tamiko
Publisher (pbl): Arizona State University

A computational model of the relationship between speech intelligibility and speech acoustics

Description

Speech intelligibility measures how well a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal along three perceptual dimensions: articulation, prosody, and vocal quality. The measures have been validated on a dysarthric speech dataset spanning a range of severities. Multiple regression analysis shows that the measures can reliably predict perceptual ratings. The relationship between the acoustic measures and the listening errors is investigated to show the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are mainly caused by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test this hypothesis, within-speaker variations are simulated in different speaking modes. Significant changes have been detected in both the acoustic signals and the listening errors. Results of the regression analysis support the hypothesis by showing that changes in articulation-related acoustic features are important in predicting changes in listening phoneme errors, while changes in both articulation- and prosody-related features are important in predicting changes in lexical boundary errors.
Moreover, a significant correlation was achieved in the cross-validation experiment, indicating that it is possible to predict intelligibility variations from the acoustic signal.
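Segmental phoneme errors are commonly counted by aligning the phonemes a speaker intended with those a listener transcribed. The sketch below is a hypothetical illustration using a standard Levenshtein alignment; the dissertation's actual scoring procedure and phoneme inventory are not given in the abstract, the ARPAbet-style symbols are illustrative, and `align_phonemes` is an invented name. Lexical boundary errors are not covered.

```python
def align_phonemes(target, response):
    """Count phoneme substitutions, deletions, and insertions via Levenshtein alignment.

    target: phonemes the speaker intended; response: phonemes the listener transcribed.
    Returns a dict of error counts, the edit distance, and the phoneme error rate.
    """
    n, m = len(target), len(response)
    # dp[i][j] = (cost, subs, dels, ins) for aligning target[:i] with response[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)            # delete every target phoneme
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)            # insert every response phoneme
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            miss = 0 if target[i - 1] == response[j - 1] else 1
            c, s, d, k = dp[i - 1][j - 1]
            diag = (c + miss, s + miss, d, k)
            c, s, d, k = dp[i - 1][j]
            up = (c + 1, s, d + 1, k)      # target phoneme deleted
            c, s, d, k = dp[i][j - 1]
            left = (c + 1, s, d, k + 1)    # extra phoneme inserted
            # Lowest cost wins; ties go to fewer substitutions, then fewer deletions.
            dp[i][j] = min(diag, up, left)
    cost, subs, dels, ins = dp[n][m]
    return {"substitutions": subs, "deletions": dels, "insertions": ins,
            "distance": cost, "per": cost / n if n else float("nan")}

# "cat" heard as "cot": one vowel substitution.
result = align_phonemes(["K", "AE", "T"], ["K", "AA", "T"])
```

Keeping the error types separate, rather than reporting only the total distance, is what allows substitution patterns to be related to articulation-related acoustic measures.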

Date Created

2019

Agent

Author (aut): Jiao, Yishan
Thesis advisor (ths): Berisha, Visar
Thesis advisor (ths): Liss, Julie
Committee member: Zhou, Yi
Publisher (pbl): Arizona State University

Let's Talk Monkey- Quantitative Analysis of Marmoset Monkey Calls

Description

The marmoset monkey (Callithrix jacchus) is a New World primate species native to South American rainforests. Because they rely on vocal communication to navigate and survive, marmosets have emerged as a promising primate model for studying vocal production, perception, cognition, and social interactions. The purpose of this project is to provide an initial assessment of the vocal repertoire of a marmoset colony raised at Arizona State University and of the call types its members use in different social conditions. The vocalizations of a colony of 16 marmoset monkeys were recorded in three conditions, with three repeats of each condition. The positive condition involves a caretaker distributing food, the negative condition involves an experimenter taking a marmoset out of its cage to a different room, and the control condition is the normal state of the colony with no human interference. A total of 5,396 call samples were collected across 256 minutes of audio recordings. Call types were analyzed using semi-automated computer programs developed in the Laboratory of Auditory Computation and Neurophysiology. Five major call types were identified, and their variants in different social conditions were analyzed. The results showed that the total number of calls and the types of calls made differed across the three social conditions, suggesting that marmoset vocalizations signal, and depend on, the social context.
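The abstract does not name the statistical test used to compare call types across conditions. One standard option, assumed here purely for illustration, is Pearson's chi-square test of independence on a conditions-by-call-types contingency table. The helper below is hypothetical and the toy counts are made up, not taken from the study. `scipy.stats.chi2_contingency` is an existing implementation that also returns a p-value; note that it applies Yates' continuity correction to 2 x 2 tables by default.

```python
import numpy as np

def chi_square_independence(counts):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    counts: 2-D array with rows = social conditions and columns = call types.
    Assumes every row and column total is positive.
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)     # shape (n_conditions, 1)
    col_totals = counts.sum(axis=0, keepdims=True)     # shape (1, n_call_types)
    expected = row_totals @ col_totals / counts.sum()  # counts expected under independence
    stat = float(((counts - expected) ** 2 / expected).sum())
    dof = (counts.shape[0] - 1) * (counts.shape[1] - 1)
    return stat, dof

# Toy 2 x 2 table (made-up counts, not study data): call type fully tied to condition.
stat, dof = chi_square_independence([[10, 0], [0, 10]])
```

For a table of three conditions by five call types, the statistic would have (3 - 1) x (5 - 1) = 8 degrees of freedom.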

Date Created

2019-05

Agent

Author (aut): Fernandez, Jessmin Natalie
Thesis director: Zhou, Yi
Committee member: Berisha, Visar
Contributor (ctb): School of International Letters and Cultures
Contributor (ctb): Department of Psychology
Contributor (ctb): School of Life Sciences
Contributor (ctb): Barrett, The Honors College

Using Capsule Networks for Image and Speech Recognition Problems

Description

In recent years, conventional convolutional neural networks (CNNs) have achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNNs discards spatial information, which is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves on the capabilities of traditional CNNs. It uses capsules to describe features in multiple dimensions and dynamic routing to increase the statistical stability of the network.
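A minimal NumPy sketch of the squash nonlinearity and routing-by-agreement from the original capsule network formulation (Sabour, Frosst, and Hinton, 2017). It is not the thesis's architecture: the layer sizes, the three routing iterations, and the function names are illustrative assumptions, and the prediction vectors are taken as given rather than computed from learned transformation matrices.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule nonlinearity: keeps each vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement from lower to upper capsules.

    u_hat: (n_in, n_out, dim) prediction vectors; u_hat[i, j] is lower capsule i's
    prediction for upper capsule j. Returns v: (n_out, dim) output capsules.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                      # routing logits
    for _ in range(n_iters):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)            # coupling coefficients (softmax over j)
        s = np.einsum("ij,ijd->jd", c, u_hat)        # weighted sum of predictions
        v = squash(s)                                # output capsule vectors
        b = b + np.einsum("ijd,jd->ij", u_hat, v)    # raise logits where predictions agree with v
    return v

# Toy usage: 4 lower capsules all agree on upper capsule 0 and disagree on capsule 1.
u_hat = np.zeros((4, 2, 2))
u_hat[:, 0] = [2.0, 0.0]
u_hat[:, 1] = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
v = dynamic_routing(u_hat)
lengths = np.linalg.norm(v, axis=1)
```

Each routing iteration costs on the order of n_in * n_out * dim multiply-adds, so the number and dimension of capsules directly affect routing time, which is one reason the capsule structure matters for training time per epoch.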

In this work, we first apply a capsule network to the overlapping digit recognition problem. We evaluate the performance of the network with respect to recognition accuracy, convergence, and training time per epoch. We show that the capsule network achieves higher accuracy when the training set is small. When the training set is larger, the capsule network and a conventional CNN have comparable recognition accuracy. The training time per epoch for the capsule network is longer than for a conventional CNN because of the dynamic routing algorithm. An analysis of the GPU timing shows that adjusting the capsule structure can significantly decrease the time complexity of the dynamic routing algorithm.

Next, we design a capsule network for speech recognition, specifically overlapping word recognition. We use both a capsule network and a conventional CNN to recognize 2 overlapping words in speech files created from 5 word classes. We show that the capsule network achieves a considerably higher recognition accuracy (96.92%) than the conventional CNN (85.19%). Our results show that the capsule network recognizes overlapping words by recognizing each individual word in the speech. We also verify the scalability of the capsule network by increasing the number of word classes from 5 to 10. The capsule network still achieves a high recognition accuracy of 95.42% with 10 words, while the accuracy of the conventional CNN decreases sharply to 73.18%.

Date Created

2018

Agent

Author (aut): Xiong, Yan
Thesis advisor (ths): Chakrabarti, Chaitali
Thesis advisor (ths): Berisha, Visar
Committee member: Weng, Yang
Publisher (pbl): Arizona State University

Advances in Motion Estimators for Applications in Computer Vision

Description

Motion estimation is a core task in computer vision, and many applications use optical flow methods as fundamental tools to analyze motion in images and videos. Optical flow is the apparent motion of objects in image sequences that results from relative motion between the objects and the imaging perspective. Today, optical flow fields are used to solve problems in areas such as object detection and tracking, interpolation, and visual odometry. This dissertation addresses three problems from different areas of computer vision and presents solutions that make use of modified optical flow methods.
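As background only (the abstract does not say which formulation the modified methods build on), many optical flow methods start from brightness constancy: a pixel's intensity $I(x, y, t)$ is assumed to be preserved as it moves by $(u, v)$ between frames. A first-order Taylor expansion, with $I_x$, $I_y$, $I_t$ denoting partial derivatives of $I$, gives the optical flow constraint. Because it is one equation in two unknowns per pixel, methods such as Horn and Schunck's add a smoothness term weighted by $\alpha$:

$$
I_x u + I_y v + I_t = 0
$$

$$
E(u, v) = \iint \Big[ (I_x u + I_y v + I_t)^2 + \alpha^2 \big( \|\nabla u\|^2 + \|\nabla v\|^2 \big) \Big] \, dx \, dy
$$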

The contributions of this dissertation are approaches and frameworks that introduce i) a new optical flow-based interpolation method to achieve minimally divergent velocimetry data, ii) a framework that improves the accuracy of change detection algorithms in synthetic aperture radar (SAR) images, and iii) a set of new methods to integrate Proton Magnetic Resonance Spectroscopic Imaging (1H-MRSI) data into three-dimensional (3D) neuronavigation systems for tumor biopsies.

In the first application, an optical flow-based approach for the interpolation of minimally divergent velocimetry data is proposed. Velocimetry data of incompressible fluids contain signals that describe the flow velocity, and the velocity field of an incompressible flow is divergence-free. The approach uses this additional flow velocity information to guide the interpolation process toward reduced divergence in the interpolated data.
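A minimal sketch of how divergence can be measured on gridded velocimetry data so that candidate interpolations can be compared: since an incompressible flow has zero divergence, a lower root-mean-square divergence indicates a more physically consistent field. This is not the dissertation's optical flow-based interpolation method; the grid layout, unit spacing, and function names are assumptions.

```python
import numpy as np

def divergence_2d(u, v, dx=1.0, dy=1.0):
    """Finite-difference divergence du/dx + dv/dy of a gridded 2-D velocity field.

    u, v: arrays of shape (ny, nx) with the x- and y-velocity components;
    axis 0 runs along y and axis 1 along x.
    """
    du_dx = np.gradient(u, dx, axis=1)  # central differences inside, one-sided at edges
    dv_dy = np.gradient(v, dy, axis=0)
    return du_dx + dv_dy

def rms_divergence(u, v, dx=1.0, dy=1.0):
    """Root-mean-square divergence: a simple score for comparing interpolated fields."""
    return float(np.sqrt(np.mean(divergence_2d(u, v, dx, dy) ** 2)))

# Toy fields on a 5 x 6 grid: a solid-body rotation (divergence-free) and a source.
y, x = np.mgrid[0:5, 0:6].astype(float)
rotation = rms_divergence(-y, x)
source = rms_divergence(x, y)
```

The rotation field scores zero and the source field scores 2, so an interpolation scheme that keeps the score close to zero is respecting the incompressibility constraint.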

In the second application, a framework consisting mainly of optical flow methods and other image processing and computer vision techniques is proposed to improve object extraction from synthetic aperture radar images. The framework distinguishes actual motion from apparent motion caused by misregistration in SAR image sets, which can lead to more accurate and meaningful change detection and improve object extraction from SAR datasets.

In the third application, a set of new methods is proposed that aims to improve on the current state of the art in neuronavigation through the use of detailed three-dimensional (3D) 1H-MRSI data. The result is a progressive form of online MRSI-guided neuronavigation that is demonstrated through phantom validation and clinical application.

Date Created

2018

Agent

Author (aut): Kanberoglu, Berkay
Thesis advisor (ths): Frakes, David
Thesis advisor (ths): Turaga, Pavan
Committee member: Spanias, Andreas
Committee member: Berisha, Visar
Publisher (pbl): Arizona State University
