Addressing the Challenges of Automated Speech and Language Analysis for the Assessment of Mental Health and Functional Competency

Description
Severe forms of mental illness, such as schizophrenia and bipolar disorder, are debilitating conditions that negatively impact an individual's quality of life. Additionally, they are often difficult and expensive to diagnose and manage, placing a large burden on society. Mental illness is typically diagnosed through clinical interviews and a set of neuropsychiatric batteries; a key component of nearly all of these evaluations is some spoken language task. Clinicians have long used speech and language production as a proxy for neurological health, but most of these assessments are subjective in nature. Meanwhile, speech and natural language processing technology has advanced rapidly over the past decade, increasing the capacity of computer models to assess particular aspects of speech and language. For this reason, many have seen an opportunity to leverage signal processing and machine learning to analyze clinical speech samples and automatically compute objective measures of neurological health. This document summarizes several contributions that expand upon this body of research. Chiefly, there is still a large gap between the theoretical power of computational language models and their actual use in clinical applications. One of the largest concerns is the limited and inconsistent reliability of the speech and language features used in models for assessing specific aspects of mental health; numerous methods may exist to measure the same or similar constructs, leading researchers to different conclusions in different studies. To address this, a novel measurement model based on a theoretical framework of speech production is used to motivate feature selection, while also performing a smoothing operation on features across several domains of interest. These composite features are then used to perform a much wider range of analyses than is typical of previous studies, ranging from diagnosis to functional competency assessments. Lastly, potential improvements are investigated to address the practical challenges of implementing speech and language technology in a real-world environment. The goal of this work is to demonstrate the ability of speech and language technology to aid clinical practitioners in improving quality of life outcomes for their patients.
Date Created
2022

A Systematic Survey of Cognitive-Communicative Evaluations

Description
Dementia is a syndrome resulting from an acquired brain disease that impairs many cognitive domains. The progressive disorder generally affects memory, attention, executive functions, communication, and other cognitive domains in ways that significantly alter everyday function (Quinn, 2014). The purpose of this research was to conduct a systematic review of the cognitive-communication assessments and screeners used in assessing dementia, in order to assist in early prognosis. This review suggests there is potential to develop a new test addressing the areas in which people with dementia often have deficits: 1) Memory, 2) Attention, 3) Executive Functions, 4) Language, and 5) Visuospatial Skills. In the field of speech-language pathology, and in medicine in general, there is no single assessment that can diagnose dementia. Additionally, this review explores the identification of speech and language characteristics of dementia through speech analytics, which could in theory help clinicians identify early signs of dementia.
Date Created
2019

A computational model for studying L1’s effect on L2 speech learning

Description
Much evidence has shown that the first language (L1) plays an important role in the formation of the L2 phonological system during the second language (L2) learning process. Combined with the fact that different L1s have distinct phonological patterns, this points to diverse L2 speech learning outcomes for speakers from different L1 backgrounds. This dissertation hypothesizes that phonological distances between accented speech and speakers' L1 speech are also correlated with perceived accentedness, and that the correlations are negative for some phonological properties. Moreover, contrastive phonological distinctions between the L1s and the L2 will manifest themselves in the accented speech produced by speakers from these L1s. To test the hypotheses, this study develops a computational model to analyze accented speech properties in both the segmental feature space (short-term speech measurements at the short-segment or phoneme level) and the suprasegmental feature space (long-term speech measurements at the word, long-segment, or sentence level). The benefit of using a computational model is that it enables quantitative analysis of L1's effect on accent in terms of different phonological properties. The core parts of this computational model are feature extraction schemes that extract pronunciation and prosody representations of accented speech, based on existing techniques in the speech processing field. Correlation analysis on both the segmental and suprasegmental feature spaces is conducted to examine the relationship between acoustic measurements related to L1s and perceived accentedness across several L1s. Multiple regression analysis is employed to investigate how the L1's effect impacts the perception of foreign accent, and how accented speech produced by speakers from different L1s behaves distinctly in the segmental and suprasegmental feature spaces. The results demonstrate that the methodology in this study can provide quantitative analysis of accented speech and can extend current studies in L2 speech learning theory to a large scale. Practically, this study further shows that the proposed computational model can benefit automatic accentedness evaluation systems by adding features related to speakers' L1s.
Date Created
2018

Model-driven time-varying signal analysis and its application to speech processing

Description
This work examines two main areas in model-based time-varying signal processing, with emphasis on speech processing applications. The first area concentrates on improving speech intelligibility and on increasing the applicability of the proposed methodologies to clinical practice in speech-language pathology. The second area concentrates on signal expansions matched to physical-based models but without requiring independent basis functions; the significance of this work is demonstrated with speech vowels.

A fully automated Vowel Space Area (VSA) computation method is proposed that can be applied to any type of speech. It is shown that the VSA provides an efficient and reliable measure and is correlated with speech intelligibility. A clinical tool that incorporates the automated VSA is proposed for use by speech-language pathologists in evaluation and treatment. Two exploratory studies are performed using two databases by analyzing mean formant trajectories in healthy speech for a wide range of speakers, dialects, and coarticulation contexts. It is shown that phonemes crowded in formant space can often have distinct trajectories, possibly due to accurate perception.
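
For readers unfamiliar with the measure, the sketch below shows one conventional way a vowel space area can be computed: as the area of the polygon whose vertices are the mean (F1, F2) formant values of the corner vowels, using the shoelace formula. This is an illustrative assumption, not the automated method proposed in the work, which the abstract does not describe. The function name `vowel_space_area` and the formant values are hypothetical.

```python
from typing import List, Tuple


def vowel_space_area(vertices: List[Tuple[float, float]]) -> float:
    """Area of the polygon traced by (F1, F2) points, via the shoelace formula.

    The vertices must be listed in order around the polygon
    (clockwise or counter-clockwise); the result is in Hz^2.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = vertices[i]
        f1_b, f2_b = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2.0


# Hypothetical mean formants in Hz for four corner vowels, in polygon order.
# These numbers are made up for illustration only.
corner_vowels = [
    (300.0, 2300.0),  # /i/
    (700.0, 1800.0),  # /ae/
    (750.0, 1100.0),  # /a/
    (350.0, 900.0),   # /u/
]

area = vowel_space_area(corner_vowels)
print(f"Vowel space area: {area:.0f} Hz^2")
```

A larger area indicates that the corner vowels are more widely separated in formant space; the formant estimation step that would feed such a computation is outside the scope of this sketch.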

A theory for analyzing time-varying signal models with amplitude modulation and frequency modulation is developed. Examples are provided that demonstrate other possible signal model decompositions with independent basis functions and corresponding physical interpretations. The Hilbert transform (HT) and the use of the analytic form of a signal are motivated, and a proof is provided to show that a signal can still preserve desirable mathematical properties without the use of the HT. A visualization of the Hilbert spectrum is proposed to aid in the interpretation. A signal demodulation is proposed and used to develop a modified Empirical Mode Decomposition (EMD) algorithm.
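
As background for the Hilbert-transform approach mentioned above, the following minimal sketch builds the analytic form of a real signal with the standard frequency-domain construction (zero the negative-frequency bins, double the positive ones) and reads the amplitude envelope from its magnitude. It is a textbook illustration under stated assumptions, not the demodulation or modified EMD algorithm developed in the work. The test signal, its parameters, and the function name `analytic_signal` are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math
from typing import List


def analytic_signal(x: List[float]) -> List[complex]:
    """Analytic form of a real sequence via the DFT (naive O(n^2) transform)."""
    n = len(x)
    # Forward DFT.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist, if n is even), double positive frequencies,
    # and zero negative frequencies.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            weight = 1.0
        elif k < (n + 1) // 2:
            weight = 2.0
        else:
            weight = 0.0
        spectrum[k] *= weight
    # Inverse DFT.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# Hypothetical AM test signal: a carrier with 8 cycles per frame whose
# amplitude is modulated by a slow cosine with 1 cycle per frame.
n = 64
true_envelope = [1.0 + 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
x = [true_envelope[t] * math.cos(2 * math.pi * 8 * t / n) for t in range(n)]

analytic = analytic_signal(x)
envelope = [abs(z) for z in analytic]  # instantaneous amplitude estimate
max_error = max(abs(e - a) for e, a in zip(envelope, true_envelope))
print(f"max envelope error: {max_error:.2e}")
```

The envelope is recovered cleanly here because the modulator is much slower than the carrier and both complete a whole number of cycles in the frame; signals that violate these conditions are where alternative decompositions become relevant.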
Date Created
2016

Executive function and language control in bilinguals with a history of mild traumatic brain injury

Description
Adults with a history of traumatic brain injury (TBI) often show deficits in executive functioning, which includes the ability to inhibit, switch, and attend to task-relevant information. These abilities are also essential for language processing in bilinguals, who constantly inhibit and switch between languages. Currently, there are no data regarding the effect of TBI on executive function and language processing in bilinguals. This study used behavioral and eye-tracking measures to examine the effect of mild traumatic brain injury (mTBI) on executive function and language processing in Spanish-English bilinguals. In Experiment 1, thirty-nine healthy bilinguals completed a variety of executive function and language processing tasks. The primary executive function and language processing tasks were paired with a cognitive load task intended to simulate mTBI. In Experiment 2, twenty-two bilinguals with a history of mTBI and twenty healthy control bilinguals completed the same executive function measures and language processing tasks. The results revealed that bilinguals with a history of mTBI show deficits in specific executive functions and have higher rates of language processing deficits than healthy control bilinguals. Additionally, behavioral and eye-tracking data suggest that these language processing deficits are related to underlying executive function abilities. This study also identified a subset of bilinguals who may be at greater risk of language processing deficits following mTBI. The findings of this study have a direct impact on the identification of executive function deficits and language processing deficits in bilinguals with a history of mTBI.
Date Created
2015

Glottal fry in college aged females: an entrainment phenomenon?

Description
Glottal fry is a vocal register characterized by low frequency and increased signal perturbation, and is perceptually identified by its popping, creaky quality. Recently, the use of the glottal fry vocal register has received growing awareness and attention in popular culture and media in the United States. The creaky quality that was originally associated with vocal pathologies is indeed becoming “trendy,” particularly among young women across the United States. But while existing studies have defined, quantified, and attempted to explain the use of glottal fry in conversational speech, there is currently no explanation for the increasing prevalence of glottal fry among American women. This thesis proposes that conversational entrainment, a communication phenomenon which describes the propensity to modify one’s behavior to align more closely with one’s communication partner, may provide a theoretical framework to explain the growing use of glottal fry among college-aged women in the United States. Female participants (n = 30) between the ages of 18 and 29 years (M = 20.6, SD = 2.95) had conversations with two conversation partners, one of whom used quantifiably more glottal fry than the other. The study used perceptual and quantifiable acoustic information to address the following key question: Does the amount of habitual glottal fry in a conversational partner influence one’s use of glottal fry in one’s own speech? Results yielded the following two findings: (1) according to perceptual annotations, the participants used a greater amount of glottal fry when speaking with the Fry conversation partner than with the Non Fry partner, and (2) statistically significant differences were found in the acoustics of the participants’ vocal qualities based on conversation partner. While the current study demonstrates that young women are indeed speaking in glottal fry in everyday conversations, and that its use can be attributed in part to conversational entrainment, we still lack a clear explanation of the deeper motivations for women to speak in a lower vocal register. The current study opens avenues for continued analysis of the sociolinguistic functions of the glottal fry register.
Date Created
2015

Understanding the processing of degraded speech: electroencephalographic measures as a surrogate for recovery from concussion

Description
The recent spotlight on concussion has illuminated deficits in the current standard of care with regard to addressing acute and persistent cognitive signs and symptoms of mild brain injury. This stems, in part, from the diffuse nature of the injury, which tends not to produce focal cognitive or behavioral deficits that are easily identified or tracked. Indeed, it has been shown that patients with enduring symptoms have difficulty describing their problems; therefore, there is an urgent need for a sensitive measure of brain activity that corresponds with higher-order cognitive processing. The development of a neurophysiological metric that maps to clinical resolution would inform decisions about diagnosis and prognosis, including the need for clinical intervention to address cognitive deficits. The literature suggests the need for assessment of concussion under cognitively demanding tasks. Here, a joint behavioral and high-density electroencephalography (EEG) paradigm was employed. This allows for the examination of cortical activity patterns during speech comprehension at various levels of degradation in a sentence verification task, imposing the need for higher-order cognitive processes. Eight participants with concussion listened to true-false sentences produced with either moderately or highly intelligible noise-vocoders. Behavioral data were simultaneously collected. The analysis of cortical activation patterns included 1) the examination of event-related potentials, including latency and source localization, and 2) measures of frequency spectra and associated power. Individual performance patterns were assessed during acute injury and at a return visit several months following injury. Results demonstrate that a combination of task-related electrophysiology measures corresponds to changes in task performance during the course of recovery. Further, a discriminant function analysis suggests that EEG measures are more sensitive than behavioral measures in distinguishing between individuals with concussion and healthy controls at both injury and recovery, suggesting that neurophysiological measures taken during a cognitively demanding task are robust indicators of both injury and persisting pathophysiology.
Date Created
2014

Investigating the influence of top-down mechanisms on hemispheric asymmetries in verbal memory

Description
It is commonly known that the left hemisphere (LH) of the brain is more efficient in the processing of verbal information than the right hemisphere (RH). One proposal suggests that hemispheric asymmetries in verbal processing are due in part to the efficient use of top-down mechanisms by the left hemisphere. Most evidence for this comes from hemispheric semantic priming; fewer studies have investigated verbal memory in the cerebral hemispheres. The goal of the current investigations is to examine how top-down mechanisms influence hemispheric asymmetries in verbal memory, and to determine the specific nature of the hypothesized top-down mechanisms. Five experiments were conducted to explore the influence of top-down mechanisms on hemispheric asymmetries in verbal memory. Experiments 1 and 2 used item-method directed forgetting to examine maintenance and inhibition mechanisms. In Experiment 1, participants were cued to remember or forget certain words, and cues were presented simultaneously with or after the presentation of target words. In Experiment 2, participants were again cued to remember or forget words, but each word was repeated once or four times. Experiments 3 and 4 examined the influence of cognitive load on hemispheric asymmetries in true and false memory. In Experiment 3, cognitive load was imposed during memory encoding, while in Experiment 4, cognitive load was imposed during memory retrieval. Finally, Experiment 5 investigated the association between controlled processing in hemispheric semantic priming and the top-down mechanisms used for hemispheric verbal memory. Across all experiments, divided visual field presentation was used to probe verbal memory in the cerebral hemispheres. Results from all experiments revealed several important findings. First, top-down mechanisms used by the LH are primarily used to facilitate verbal processing, but also operate in a domain-general manner in the face of increasing processing demands. Second, evidence indicates that the RH uses top-down mechanisms minimally, and processes verbal information in a more bottom-up manner. These data help clarify the nature of top-down mechanisms used in hemispheric memory and language processing, and build upon current theories that attempt to explain hemispheric asymmetries in language processing.
Date Created
2013

Manifestation of higher-order cognitive processing deficits resulting from concussion

Description
Concussion, a subset of mild traumatic brain injury (mTBI), has recently been brought to the forefront of the media due to a large lawsuit filed against the National Football League. Concussion resulting from injury varies in severity, duration, and type, based on many characteristics of the individual that research does not presently understand. Chronic fatigue, poor working memory, impaired self-awareness, and lack of attention to task are symptoms commonly present post-concussion. Currently, there is no standard method of assessing concussion, nor is there a way to track an individual's recovery, which can result in misguided treatment. The aim of the following study was to determine patient-specific higher-order cognitive processing deficits for clinical diagnosis and prognosis of concussion. Six individuals (N=6) were seen during the acute phase of concussion, two of whom were seen subsequently when their symptoms were deemed clinically resolved. Subjective information was collected from both the patient and from neurology testing. Each individual completed a task in which they were presented with degraded speech, taxing their higher-order cognitive processing. Patient-specific behavioral patterns are noted, creating a unique paradigm for mapping subjective and objective data onto each patient's strategy for compensating for deficits and understanding speech in a difficult listening situation. Keywords: concussion, cognitive processing
Date Created
2013

Degraded vowel acoustics and the perceptual consequences in dysarthria

Description
Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been paid to quantifying the vowel production deficits associated with the specific dysarthrias. Attempts to characterize the relationship between naturally degraded vowel production in dysarthria and overall intelligibility have met with mixed results, leading some to question the nature of this relationship. It has been suggested that aberrant vowel acoustics may be an index of overall severity of the impairment and not an "integral component" of the intelligibility deficit. A limitation of previous work detailing perceptual consequences of disordered vowel acoustics is that overall intelligibility, not vowel identification accuracy, has been the perceptual measure of interest. A series of three experiments was conducted to address the problems outlined herein. The goals of the first experiment were to identify subsets of vowel metrics that reliably distinguish speakers with dysarthria from non-disordered speakers and differentiate the dysarthria subtypes. Vowel metrics that capture vowel centralization and reduced spectral distinctiveness among vowels differentiated dysarthric from non-disordered speakers. Vowel metrics generally failed to differentiate speakers according to their dysarthria diagnosis. The second and third experiments were conducted to evaluate the relationship between degraded vowel acoustics and the resulting percept. In the second experiment, correlation and regression analyses revealed that vowel metrics capturing vowel centralization and distinctiveness, and movement of the second formant frequency, were most predictive of vowel identification accuracy and overall intelligibility. The third experiment was conducted to evaluate the extent to which the nature of the acoustic degradation predicts the resulting percept. Results suggest that distinctive vowel tokens are better identified and, likewise, that better-identified tokens are more distinctive. Further, an above-chance level of agreement between the nature of vowel misclassification and misidentification errors was demonstrated for all vowels, suggesting that degraded vowel acoustics are not merely an index of severity in dysarthria, but rather are an integral component of the resultant intelligibility disorder.
Date Created
2012