Communications Biology Mar 2024
When engaged in a conversation, one receives auditory information not only from the other's speech but also from one's own. However, this information is processed differently, an effect known as Speech-Induced Suppression (SIS). Here, we studied the brain's representation of the acoustic properties of speech in natural, unscripted dialogues, using electroencephalography (EEG) and high-quality speech recordings from both participants. Using encoding techniques, we reproduced a broad range of previous findings on listening to another's speech, achieving even better performance when predicting the EEG signal in this complex scenario. Furthermore, we found no such response when participants listened to their own speech, across different acoustic features (spectrogram, envelope, etc.) and frequency bands, evidencing a strong SIS effect. The present work shows that this mechanism is present, and even stronger, during natural dialogues. Moreover, the methodology presented here opens the possibility of a deeper understanding of the related mechanisms in a wider range of contexts.
Topics: Humans; Speech; Acoustic Stimulation; Electroencephalography; Brain; Brain Mapping
PubMed: 38459110
DOI: 10.1038/s42003-024-05945-9
Scientific Reports Dec 2023
While speech biomarkers of disease have attracted increased interest in recent years, a challenge is that features derived from signal processing or machine learning approaches may lack clinical interpretability. As an example, Mel-frequency cepstral coefficients (MFCCs) have been identified in several studies as a useful marker of disease but are regarded as uninterpretable. Here we explore correlations between individual MFCCs and more interpretable speech biomarkers. In particular, we quantify the MFCC2 endpoint, which can be interpreted as a weighted ratio of low- to high-frequency energy, a concept that has previously been linked to disease-induced voice changes. By exploring MFCC2 in several datasets, we show how its sensitivity to disease can be increased by adjusting computation parameters.
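The interpretation of MFCC2 as a weighted low-to-high energy ratio follows from how MFCCs are computed: they are a discrete cosine transform of log mel-band energies, and the second basis function weights low bands positively and high bands negatively. The sketch below is illustrative, not the paper's exact pipeline; the band energies are invented.

```python
import numpy as np

# MFCC2 as a weighted contrast of low- vs high-frequency log energy:
# the second DCT-II basis function over the mel bands.
n_bands = 20
k = 1  # second DCT-II basis (0-indexed), i.e. "MFCC2"
basis = np.cos(np.pi * k * (np.arange(n_bands) + 0.5) / n_bands)

def mfcc2(log_mel_energies):
    """Second cepstral coefficient of a log mel-energy vector."""
    return log_mel_energies @ basis

# A voice with energy concentrated in low bands yields a higher MFCC2
# than one with energy concentrated in high bands.
low_heavy  = np.linspace(1.0, 0.1, n_bands)   # illustrative low-heavy spectrum
high_heavy = np.linspace(0.1, 1.0, n_bands)   # illustrative high-heavy spectrum
print(mfcc2(np.log(low_heavy)) > mfcc2(np.log(high_heavy)))  # True
```

Because `basis` is antisymmetric about the middle band, MFCC2 is positive when low bands dominate and negative when high bands dominate, which is the sense in which it acts as a (log-domain) low-to-high energy ratio.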
Topics: Speech; Speech Acoustics; Signal Processing, Computer-Assisted
PubMed: 38123603
DOI: 10.1038/s41598-023-49352-2
American Journal of Speech-language... Aug 2023
PURPOSE
Few studies have reported on the vowel space area (VSA) in both acoustic and kinematic domains. This study examined acoustic and kinematic VSAs for speakers with and without dysarthria and evaluated effects of normalization on acoustic and kinematic VSAs and the relationship between these measures.
METHOD
Vowel data from 12 speakers with and without dysarthria, presenting with a range of speech abilities, were examined. The speakers included four speakers with Parkinson's disease (PD), four speakers with brain injury (BI), and four neurotypical (NT) speakers. Speech acoustic and kinematic data were acquired simultaneously using electromagnetic articulography during a passage reading task. Raw and normalized VSAs calculated from the corner vowels /i/, /æ/, /ɑ/, and /u/ were evaluated. Normalization was achieved through score transformations applied to the acoustic and kinematic data. The effect of normalization on variability within and across groups was evaluated. Regression analysis was used across speakers to assess the association between acoustic and kinematic VSAs for both raw and normalized data.
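The acoustic VSA described above is commonly computed as the area of the quadrilateral spanned by the corner vowels in the F1-F2 plane, which the shoelace formula gives directly. The sketch below is a hypothetical illustration of that computation plus a z-score normalization of the kind mentioned; the formant values are invented, not the study's data.

```python
import numpy as np

def shoelace_area(points):
    """Polygon area from ordered (x, y) vertices (shoelace formula)."""
    x, y = np.asarray(points).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def z_normalize(points):
    """Per-speaker z-score transform of each dimension."""
    p = np.asarray(points, dtype=float)
    return (p - p.mean(axis=0)) / p.std(axis=0)

# Illustrative (F1, F2) in Hz for /i/, /ae/, /a/, /u/, ordered around
# the vowel quadrilateral
corner_vowels = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]

raw_vsa = shoelace_area(corner_vowels)                 # in Hz^2
norm_vsa = shoelace_area(z_normalize(corner_vowels))   # unitless
print(raw_vsa, round(norm_vsa, 3))
```

The same function applies unchanged to kinematic corner-vowel coordinates (e.g., tongue-marker positions), which is what makes raw and normalized acoustic/kinematic VSAs directly comparable.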
RESULTS
When evaluating the speakers as three different groups (i.e., PD, BI, and NT), normalization reduced the standard deviations within each group and changed the relative differences in average magnitude between groups. Regression analysis revealed a significant relationship between normalized, but not raw, acoustic and kinematic VSAs, after the exclusion of an outlier speaker.
CONCLUSIONS
Normalization reduces variability across speakers within groups and changes average magnitudes, which affects comparisons between speaker groups. Normalization also influences the correlation between acoustic and kinematic measures. Further investigation of the impact of normalization techniques on acoustic and kinematic measures is warranted.
SUPPLEMENTAL MATERIAL
https://doi.org/10.23641/asha.22669747.
Topics: Humans; Speech Intelligibility; Speech Production Measurement; Speech Acoustics; Dysarthria; Biomechanical Phenomena; Acoustics; Parkinson Disease; Phonetics
PubMed: 37105919
DOI: 10.1044/2023_AJSLP-22-00158
Journal of Speech, Language, and... Aug 2023
PURPOSE
The purpose of this study was to describe, compare, and understand speech modulation capabilities of patients with varying motor speech disorders (MSDs) in a paradigm in which patients made highly cued attempts to speak faster or slower.
METHOD
Twenty-nine patients, 12 with apraxia of speech (AOS; four phonetic and eight prosodic subtype), eight with dysarthria (six hypokinetic and two spastic subtype), and nine patients without any neurogenic MSD completed a standard motor speech evaluation where they were asked to repeat words and sentences, which served as their "natural" speaking rate. They were then asked to repeat lower complexity (counting 1-5; repeating "cat" and "catnip" 3 times each) and higher complexity stimuli (repeating "catastrophe" and "stethoscope" 3 times each and "My physician wrote out a prescription" once) as fast/slow as possible. Word durations and interword intervals were measured. Linear mixed-effects models were used to assess differences related to MSD subtype and stimuli complexity on bidirectional rate modulation capacity as indexed by word duration and interword interval. Articulatory accuracy was also judged and compared.
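The two dependent measures above, word duration and interword interval, can be derived directly from time-aligned word annotations. The sketch below is illustrative; the token times are invented, not the study's measurements.

```python
# Word durations and interword intervals from (word, onset_s, offset_s)
# annotations, e.g. for three repetitions of "cat". Times are made up.
tokens = [("cat", 0.50, 0.82), ("cat", 1.10, 1.44), ("cat", 1.71, 2.06)]

# Duration of each word: offset minus onset
word_durations = [off - on for _, on, off in tokens]

# Interword interval: silence between one word's offset and the next onset
interword_intervals = [tokens[i + 1][1] - tokens[i][2]
                       for i in range(len(tokens) - 1)]

print([round(d, 2) for d in word_durations])       # [0.32, 0.34, 0.35]
print([round(g, 2) for g in interword_intervals])  # [0.28, 0.27]
```

A faster speaking rate shortens both quantities, a slower rate lengthens them, which is why the mixed-effects models above use both as indices of rate modulation.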
RESULTS
Patients with prosodic AOS demonstrated a reduced ability to speak faster. While they performed similarly to patients with spastic dysarthria when counting, patients with spastic dysarthria were able to increase their rate during sentence repetition much as controls did, whereas patients with prosodic AOS could not and made more articulatory errors when attempting to increase rate. Patients with AOS made more articulatory errors than the other groups regardless of condition; however, their percentage of errors decreased with an intentionally slowed speaking rate.
CONCLUSIONS
The findings suggest that comparative rate modulation abilities, in conjunction with their impact on articulatory accuracy, may support differential diagnosis between healthy and disordered speech and among subtypes of MSDs (i.e., type of dysarthria or AOS). The findings need to be validated in a larger, more representative cohort encompassing several types of MSDs.
SUPPLEMENTAL MATERIAL
https://doi.org/10.23641/asha.22044632.
Topics: Humans; Dysarthria; Speech; Apraxias; Phonetics; Speech Production Measurement; Speech Disorders
PubMed: 36780318
DOI: 10.1044/2022_JSLHR-22-00286
Sensors (Basel, Switzerland) Aug 2023
In recent years, deep learning-based speech synthesis has attracted considerable attention from the machine learning and speech communities. In this paper, we propose Mixture-TTS, a non-autoregressive speech synthesis model based on a mixture alignment mechanism. Mixture-TTS aims to optimize the alignment between text sequences and the mel-spectrogram. Mixture-TTS uses a linguistic encoder based on soft phoneme-level and hard word-level alignment, which explicitly extracts word-level semantic information, and introduces pitch and energy predictors to better predict the rhythmic information of the audio. Specifically, Mixture-TTS introduces a post-net based on a five-layer 1D convolution network to improve the reconstruction of the mel-spectrogram; the output of the decoder is connected to the post-net through a residual connection. The mel-spectrogram is converted into the final audio by the HiFi-GAN vocoder. We evaluate Mixture-TTS on the AISHELL3 and LJSpeech datasets. Experimental results show that Mixture-TTS achieves somewhat better alignment between the text sequences and the mel-spectrogram and is able to produce high-quality audio. Ablation studies demonstrate that the structure of Mixture-TTS is effective.
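The post-net idea above, a stack of five 1D convolution layers whose output is added back to the decoder's mel-spectrogram through a residual connection, can be sketched schematically. The code below is not the paper's implementation: it uses random per-channel kernels in numpy purely to show the data flow, where a real model would use learned multi-channel convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, kernel):
    """Per-channel 1D convolution with 'same' padding (length preserved)."""
    return np.stack([np.convolve(row, kernel, mode="same") for row in x])

def postnet(mel, n_layers=5, kernel_size=5):
    """Five conv layers with tanh activations; the final layer is linear."""
    h = mel
    for layer in range(n_layers):
        kernel = rng.standard_normal(kernel_size) * 0.1  # stand-in weights
        h = conv1d_same(h, kernel)
        if layer < n_layers - 1:
            h = np.tanh(h)
    return h

decoder_mel = rng.standard_normal((80, 100))      # (mel bins, frames)
refined_mel = decoder_mel + postnet(decoder_mel)  # residual connection
print(refined_mel.shape)  # (80, 100)
```

The residual connection means the post-net only has to model a correction to the decoder output rather than the full mel-spectrogram, which is the usual motivation for this design.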
Topics: Speech; Linguistics; Machine Learning; Semantics
PubMed: 37631819
DOI: 10.3390/s23167283
Psychiatry Research Nov 2023
Review
Dementia is a progressive neurodegenerative disease that burdens the person living with it, their families, and medical and social services. Timely diagnosis of dementia could be followed by interventions that may slow its progression or reduce its burdens. However, the diagnostic process for dementia is often complex and resource intensive, and access to diagnostic services is also an issue in low- and middle-income countries. The abundance and easy accessibility of speech and language data have created new possibilities for utilizing Deep Learning (DL) technologies as part of the dementia diagnostic process. This systematic review included studies published between 2012 and 2022 that utilized such technologies to aid in diagnosing dementia. We identified 72 studies using the PRISMA 2020 protocol, extracted and analyzed data from them, and reported the related DL technologies. We found these technologies effectively differentiated between healthy individuals and those with a dementia diagnosis, highlighting their potential in the diagnosis of dementia. This systematic review provides insights into the contributions of DL-based speech and language techniques to the dementia diagnostic process, offers an understanding of the advancements made in this field thus far, and highlights challenges that still need to be addressed.
Topics: Humans; Speech; Deep Learning; Neurodegenerative Diseases; Language; Dementia
PubMed: 37864994
DOI: 10.1016/j.psychres.2023.115538
Perspectives on Psychological Science:... Nov 2023
Infants master the temporal patterns of their native language along a developmental trajectory from slow to fast: shortly after birth, they recognize the slow acoustic modulations specific to their native language, tuning into faster language-specific patterns only between 6 and 12 months of age. We propose here that this trajectory is constrained by neuronal maturation, in particular the gradual emergence of high-frequency neural oscillations in the infant electroencephalogram. Infants' initial focus on slow prosodic modulations is consistent with the prenatal availability of slow electrophysiological activity (i.e., theta- and delta-band oscillations). Our proposal is consistent with the temporal patterns of infant-directed speech, which initially amplifies slow modulations and approaches the faster modulation range of adult-directed speech only as infants' language has advanced sufficiently. It also agrees with evidence from premature infants showing that maturational age is a stronger predictor of language development than ex utero exposure to speech, indicating that premature infants cannot exploit their earlier exposure to speech because of electrophysiological constraints. In sum, we provide a new perspective on language acquisition that emphasizes neuronal development as a critical driving force of infants' language development.
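The "slow acoustic modulations" referred to above are fluctuations of the speech amplitude envelope at a few Hz, i.e., the delta/theta range. The sketch below is a synthetic illustration, not from the article: it builds a 4 Hz amplitude-modulated tone, extracts its envelope by rectification and smoothing, and recovers the modulation rate from the envelope spectrum.

```python
import numpy as np

fs = 1000                        # sample rate (Hz), illustrative
t = np.arange(0, 4, 1 / fs)      # 4 s of signal
carrier = np.sin(2 * np.pi * 200 * t)               # 200 Hz carrier
signal = (1 + np.sin(2 * np.pi * 4 * t)) * carrier  # 4 Hz modulation

# Amplitude envelope: rectify, then moving-average over ~50 ms
env = np.convolve(np.abs(signal), np.ones(50) / 50, mode="same")

# Dominant modulation frequency from the envelope spectrum (DC removed)
spec = np.abs(np.fft.rfft(env - env.mean()))
freqs = np.fft.rfftfreq(len(env), 1 / fs)
print(freqs[np.argmax(spec)])  # 4.0 -> within the delta/theta range
```

Applied to real speech, the same envelope analysis shows a modulation spectrum peaking around 4-5 Hz, which is the property the proposal links to slow neural oscillations.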
Topics: Infant; Adult; Female; Pregnancy; Humans; Language Development; Language; Speech; Speech Perception
PubMed: 36753616
DOI: 10.1177/17456916231151584
CoDAS 2023
PURPOSE
To develop an assessment protocol for speech motor planning with phonologically balanced stimuli for Brazilian Portuguese, including all necessary variables for this diagnosis.
METHODS
Three stages were carried out. In the first, word lists were built with syllabic and stress patterns as the main criteria. From the survey conducted in Stage 1, the words that composed the first version of the protocol lists were selected in Stage 2 and grouped into two tasks fundamental for diagnosing acquired apraxia of speech (AOS): repetition and reading aloud (RA). In Stage 3, word occurrence was investigated using the Brazilian Corpus (PUC-SP) - Linguateca database, and a statistical analysis was performed to verify whether the repetition and RA lists were balanced in terms of occurrences. The lists were distributed into quartiles and submitted to both descriptive and bivariate analyses. A significance level of 5% (p < 0.05) was adopted.
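The quartile-based balancing check in Stage 3 can be sketched as follows: assign each word's corpus occurrence count to a frequency quartile, then compare how the two task lists are distributed across quartiles. The counts below are invented for illustration, not from the Linguateca corpus.

```python
import numpy as np

# Illustrative occurrence counts for the repetition and reading-aloud lists
rep_counts = np.array([12, 85, 430, 1900, 33, 260, 950, 5100])
ra_counts  = np.array([15, 70, 510, 2200, 40, 300, 880, 4800])

# Quartile boundaries from the pooled counts
all_counts = np.concatenate([rep_counts, ra_counts])
edges = np.quantile(all_counts, [0.25, 0.5, 0.75])

def quartile(counts):
    """Quartile label (1-4) for each occurrence count."""
    return np.searchsorted(edges, counts, side="right") + 1

# Count how many words of each list fall in each quartile
rep_q = np.bincount(quartile(rep_counts), minlength=5)[1:]
ra_q  = np.bincount(quartile(ra_counts),  minlength=5)[1:]
print(rep_q, ra_q)  # similar distributions suggest balanced lists
```

A bivariate test (e.g., chi-square) on these quartile counts is one way to formalize the "balanced in terms of occurrences" criterion at the stated 5% significance level.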
RESULTS
After completion of all stages, the words composing the lists for the repetition and RA tasks were obtained. Finally, other tasks considered essential for the assessment of AOS, such as diadochokinetic rates and a board for eliciting spontaneous oral production, were added to the protocol.
CONCLUSION
The developed protocol contains the tasks considered standard for the assessment of AOS according to the international literature, which makes this instrument important for diagnosing this disorder in speakers of Brazilian Portuguese.
Topics: Humans; Speech; Speech Disorders; Speech Production Measurement; Apraxias; Language
PubMed: 37851756
DOI: 10.1590/2317-1782/20232022251pt
Scientific Reports Jul 2023
Bilateral subthalamic nucleus deep brain stimulation (STN-DBS) is an effective treatment in advanced Parkinson's Disease (PD). However, the effects of STN-DBS on speech are still debated, particularly in long-term follow-up. The objective of this study was to evaluate the long-term effects of bilateral STN-DBS on speech in a cohort of advanced PD patients treated with bilateral STN-DBS. Each patient was assessed before surgery through a neurological evaluation and a perceptual-acoustic analysis of speech and re-assessed in the long term under different stimulation and drug conditions. The primary outcome was the percentage change in speech intelligibility obtained by comparing the postoperative on-stimulation/off-medication condition with the preoperative off-medication condition. Twenty-five PD patients treated with bilateral STN-DBS with a 5-year follow-up were included. In the long term, speech intelligibility remained at preoperative levels. STN-DBS induced a significant acute improvement in speech intelligibility (p < 0.005) in the postoperative assessment when the on-stimulation/off-medication and off-stimulation/off-medication conditions were compared. These results suggest that STN-DBS may preserve speech intelligibility even in the long term.
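The primary outcome above is a simple relative difference between two intelligibility scores. The snippet below works through it with invented scores, purely to make the formula concrete.

```python
# Percentage change in speech intelligibility, as the primary outcome
# describes: postoperative on-stimulation/off-medication vs. preoperative
# off-medication. Scores are illustrative, not the study's data.
pre_off = 82.0       # preoperative off-medication intelligibility (%)
post_on_off = 85.5   # postoperative on-stim/off-med intelligibility (%)

pct_change = 100 * (post_on_off - pre_off) / pre_off
print(round(pct_change, 1))  # 4.3
```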
Topics: Humans; Parkinson Disease; Subthalamic Nucleus; Deep Brain Stimulation; Treatment Outcome; Speech Intelligibility
PubMed: 37454168
DOI: 10.1038/s41598-023-38555-2
Medicina (Kaunas, Lithuania) Jul 2023
Review
Learning to speak properly requires a fully formed brain, good eyesight, and a functioning auditory system. Defective phonation is the outcome of a failure in the development of any of the systems or components involved in speech production. Dentures that support good phonetics can be fabricated with the help of a dentist who has a firm grasp of speech production and phonetic characteristics. Every dentist strives to perfect their craft by balancing the technical, cosmetic, and acoustic aspects of dentistry, or "phonetics". The ideal prosthesis is one that not only sounds good but also functions well mechanically and aesthetically. Words are spoken using articulators that alter their size and form; therefore, a prosthesis should be made in such a way that it does not interfere with the ability to communicate. As a result, a prosthodontist must have a solid grasp of how speech is produced and the numerous parts that go into it.
Topics: Humans; Speech; Phonetics; Phonation; Learning; Brain
PubMed: 37512133
DOI: 10.3390/medicina59071322