Communications Biology Mar 2024
When engaged in a conversation, one receives auditory information not only from the other's speech but also from one's own. However, this information is processed differently, an effect known as Speech-Induced Suppression (SIS). Here, we studied the brain's representation of the acoustic properties of speech in natural, unscripted dialogues, using electroencephalography (EEG) and high-quality speech recordings from both participants. Using encoding techniques, we reproduced a broad range of previous findings on listening to another's speech, achieving even better performance when predicting the EEG signal in this complex scenario. Furthermore, we found no such response when participants listened to their own speech, across different acoustic features (spectrogram, envelope, etc.) and frequency bands, evidencing a strong SIS effect. The present work shows that this mechanism is present, and even stronger, during natural dialogues. Moreover, the methodology presented here opens the possibility of a deeper understanding of the related mechanisms in a wider range of contexts.
Topics: Humans; Speech; Acoustic Stimulation; Electroencephalography; Brain; Brain Mapping
PubMed: 38459110
DOI: 10.1038/s42003-024-05945-9
Scientific Reports Dec 2023
While speech biomarkers of disease have attracted increased interest in recent years, a challenge is that features derived from signal processing or machine learning approaches may lack clinical interpretability. As an example, Mel-frequency cepstral coefficients (MFCCs) have been identified in several studies as a useful marker of disease but are regarded as uninterpretable. Here we explore correlations between individual MFCCs and more interpretable speech biomarkers. In particular, we quantify the MFCC2 endpoint, which can be interpreted as a weighted ratio of low- to high-frequency energy, a concept that has previously been linked to disease-induced voice changes. By exploring MFCC2 in several datasets, we show how its sensitivity to disease can be increased by adjusting computation parameters.
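The interpretation of MFCC2 as a weighted low-to-high energy ratio follows from how MFCCs are computed: they are a discrete cosine transform of log mel-band energies, and the second basis function weights low bands positively and high bands negatively. The sketch below is illustrative, not the paper's exact pipeline; the band energies are invented.

```python
import numpy as np

# MFCC2 as a weighted contrast of low- vs high-frequency log energy:
# the second DCT-II basis function over the mel bands.
n_bands = 20
k = 1  # second DCT-II basis (0-indexed), i.e. "MFCC2"
basis = np.cos(np.pi * k * (np.arange(n_bands) + 0.5) / n_bands)

def mfcc2(log_mel_energies):
    """Second cepstral coefficient of a log mel-energy vector."""
    return log_mel_energies @ basis

# A voice with energy concentrated in low bands yields a higher MFCC2
# than one with energy concentrated in high bands.
low_heavy  = np.linspace(1.0, 0.1, n_bands)   # illustrative low-heavy spectrum
high_heavy = np.linspace(0.1, 1.0, n_bands)   # illustrative high-heavy spectrum
print(mfcc2(np.log(low_heavy)) > mfcc2(np.log(high_heavy)))  # True
```

Because `basis` is antisymmetric about the middle band, MFCC2 is positive when low bands dominate and negative when high bands dominate, which is the sense in which it acts as a (log-domain) low-to-high energy ratio.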
Topics: Speech; Speech Acoustics; Signal Processing, Computer-Assisted
PubMed: 38123603
DOI: 10.1038/s41598-023-49352-2
American Journal of Speech-language... Aug 2023
PURPOSE
Few studies have reported on the vowel space area (VSA) in both acoustic and kinematic domains. This study examined acoustic and kinematic VSAs for speakers with and without dysarthria and evaluated effects of normalization on acoustic and kinematic VSAs and the relationship between these measures.
METHOD
Vowel data from 12 speakers with and without dysarthria, presenting with a range of speech abilities, were examined. The speakers included four speakers with Parkinson's disease (PD), four speakers with brain injury (BI), and four neurotypical (NT) speakers. Speech acoustic and kinematic data were acquired simultaneously using electromagnetic articulography during a passage reading task. Raw and normalized VSAs calculated from the corner vowels /i/, /æ/, /ɑ/, and /u/ were evaluated. Normalization was achieved through score transformations applied to the acoustic and kinematic data. The effect of normalization on variability within and across groups was evaluated. Regression analysis was used across speakers to assess the association between acoustic and kinematic VSAs for both raw and normalized data.
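The acoustic VSA described above is commonly computed as the area of the quadrilateral spanned by the corner vowels in the F1-F2 plane, which the shoelace formula gives directly. The sketch below is a hypothetical illustration of that computation plus a z-score normalization of the kind mentioned; the formant values are invented, not the study's data.

```python
import numpy as np

def shoelace_area(points):
    """Polygon area from ordered (x, y) vertices (shoelace formula)."""
    x, y = np.asarray(points).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def z_normalize(points):
    """Per-speaker z-score transform of each dimension."""
    p = np.asarray(points, dtype=float)
    return (p - p.mean(axis=0)) / p.std(axis=0)

# Illustrative (F1, F2) in Hz for /i/, /ae/, /a/, /u/, ordered around
# the vowel quadrilateral
corner_vowels = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]

raw_vsa = shoelace_area(corner_vowels)                 # in Hz^2
norm_vsa = shoelace_area(z_normalize(corner_vowels))   # unitless
print(raw_vsa, round(norm_vsa, 3))
```

The same function applies unchanged to kinematic corner-vowel coordinates (e.g., tongue-marker positions), which is what makes raw and normalized acoustic/kinematic VSAs directly comparable.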
RESULTS
When evaluating the speakers as three different groups (i.e., PD, BI, and NT), normalization reduced the standard deviations within each group and changed the relative differences in average magnitude between groups. Regression analysis revealed a significant relationship between normalized, but not raw, acoustic and kinematic VSAs, after the exclusion of an outlier speaker.
CONCLUSIONS
Normalization reduces variability across speakers within groups and changes average magnitudes, which affects comparisons between speaker groups. Normalization also influences the correlation between acoustic and kinematic measures. Further investigation of the impact of normalization techniques on acoustic and kinematic measures is warranted.
SUPPLEMENTAL MATERIAL
https://doi.org/10.23641/asha.22669747.
Topics: Humans; Speech Intelligibility; Speech Production Measurement; Speech Acoustics; Dysarthria; Biomechanical Phenomena; Acoustics; Parkinson Disease; Phonetics
PubMed: 37105919
DOI: 10.1044/2023_AJSLP-22-00158
Journal of Speech, Language, and... Aug 2023
PURPOSE
The purpose of this study was to describe, compare, and understand speech modulation capabilities of patients with varying motor speech disorders (MSDs) in a paradigm in which patients made highly cued attempts to speak faster or slower.
METHOD
Twenty-nine patients, 12 with apraxia of speech (AOS; four phonetic and eight prosodic subtype), eight with dysarthria (six hypokinetic and two spastic subtype), and nine patients without any neurogenic MSD completed a standard motor speech evaluation where they were asked to repeat words and sentences, which served as their "natural" speaking rate. They were then asked to repeat lower complexity (counting 1-5; repeating "cat" and "catnip" 3 times each) and higher complexity stimuli (repeating "catastrophe" and "stethoscope" 3 times each and "My physician wrote out a prescription" once) as fast/slow as possible. Word durations and interword intervals were measured. Linear mixed-effects models were used to assess differences related to MSD subtype and stimuli complexity on bidirectional rate modulation capacity as indexed by word duration and interword interval. Articulatory accuracy was also judged and compared.
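The two dependent measures above, word duration and interword interval, can be derived directly from time-aligned word annotations. The sketch below is illustrative; the token times are invented, not the study's measurements.

```python
# Word durations and interword intervals from (word, onset_s, offset_s)
# annotations, e.g. for three repetitions of "cat". Times are made up.
tokens = [("cat", 0.50, 0.82), ("cat", 1.10, 1.44), ("cat", 1.71, 2.06)]

# Duration of each word: offset minus onset
word_durations = [off - on for _, on, off in tokens]

# Interword interval: silence between one word's offset and the next onset
interword_intervals = [tokens[i + 1][1] - tokens[i][2]
                       for i in range(len(tokens) - 1)]

print([round(d, 2) for d in word_durations])       # [0.32, 0.34, 0.35]
print([round(g, 2) for g in interword_intervals])  # [0.28, 0.27]
```

A faster speaking rate shortens both quantities, a slower rate lengthens them, which is why the mixed-effects models above use both as indices of rate modulation.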
RESULTS
Patients with prosodic AOS demonstrated a reduced ability to speak faster. While they performed similarly to patients with spastic dysarthria when counting, patients with spastic dysarthria were able to increase their rate during sentence repetition much as controls did, whereas patients with prosodic AOS could not and made more articulatory errors when attempting to increase rate. Patients with AOS made more articulatory errors than the other groups regardless of condition; however, their percentage of errors decreased with an intentionally slowed speaking rate.
CONCLUSIONS
The findings suggest that comparative rate modulation abilities, in conjunction with their impact on articulatory accuracy, may support differential diagnosis between healthy and disordered speech and among subtypes of MSDs (i.e., type of dysarthria or AOS). The findings need to be validated in a larger, more representative cohort encompassing several types of MSDs.
SUPPLEMENTAL MATERIAL
https://doi.org/10.23641/asha.22044632.
Topics: Humans; Dysarthria; Speech; Apraxias; Phonetics; Speech Production Measurement; Speech Disorders
PubMed: 36780318
DOI: 10.1044/2022_JSLHR-22-00286
Sensors (Basel, Switzerland) Aug 2023
In recent years, deep learning-based speech synthesis has attracted considerable attention from the machine learning and speech communities. In this paper, we propose Mixture-TTS, a non-autoregressive speech synthesis model based on a mixture alignment mechanism. Mixture-TTS aims to optimize the alignment between text sequences and the mel-spectrogram. Mixture-TTS uses a linguistic encoder based on soft phoneme-level and hard word-level alignment, which explicitly extracts word-level semantic information, and introduces pitch and energy predictors to better predict the rhythmic information of the audio. Specifically, Mixture-TTS introduces a post-net based on a five-layer 1D convolution network to improve the reconstruction of the mel-spectrogram; the output of the decoder is connected to the post-net through a residual connection. The mel-spectrogram is converted into the final audio by the HiFi-GAN vocoder. We evaluate Mixture-TTS on the AISHELL3 and LJSpeech datasets. Experimental results show that Mixture-TTS achieves somewhat better alignment between the text sequences and the mel-spectrogram and is able to produce high-quality audio. Ablation studies demonstrate that the structure of Mixture-TTS is effective.
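The post-net idea above, a stack of five 1D convolution layers whose output is added back to the decoder's mel-spectrogram through a residual connection, can be sketched schematically. The code below is not the paper's implementation: it uses random per-channel kernels in numpy purely to show the data flow, where a real model would use learned multi-channel convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, kernel):
    """Per-channel 1D convolution with 'same' padding (length preserved)."""
    return np.stack([np.convolve(row, kernel, mode="same") for row in x])

def postnet(mel, n_layers=5, kernel_size=5):
    """Five conv layers with tanh activations; the final layer is linear."""
    h = mel
    for layer in range(n_layers):
        kernel = rng.standard_normal(kernel_size) * 0.1  # stand-in weights
        h = conv1d_same(h, kernel)
        if layer < n_layers - 1:
            h = np.tanh(h)
    return h

decoder_mel = rng.standard_normal((80, 100))      # (mel bins, frames)
refined_mel = decoder_mel + postnet(decoder_mel)  # residual connection
print(refined_mel.shape)  # (80, 100)
```

The residual connection means the post-net only has to model a correction to the decoder output rather than the full mel-spectrogram, which is the usual motivation for this design.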
Topics: Speech; Linguistics; Machine Learning; Semantics
PubMed: 37631819
DOI: 10.3390/s23167283
Psychiatry Research Nov 2023
Review
Dementia is a progressive neurodegenerative disease that burdens the person living with it, their families, and medical and social services. Timely diagnosis of dementia could be followed by interventions that may slow its progression or reduce its burdens. However, the diagnostic process for dementia is often complex and resource intensive, and access to diagnostic services is also an issue in low- and middle-income countries. The abundance and easy accessibility of speech and language data have created new possibilities for utilizing Deep Learning (DL) technologies as part of the dementia diagnostic process. This systematic review included studies published between 2012 and 2022 that utilized such technologies to aid in diagnosing dementia. We identified 72 studies using the PRISMA 2020 protocol, extracted and analyzed data from them, and reported the related DL technologies. We found these technologies effectively differentiated between healthy individuals and those with a dementia diagnosis, highlighting their potential in the diagnosis of dementia. This systematic review provides insights into the contributions of DL-based speech and language techniques to the dementia diagnostic process, offers an understanding of the advancements made in this field thus far, and highlights challenges that still need to be addressed.
Topics: Humans; Speech; Deep Learning; Neurodegenerative Diseases; Language; Dementia
PubMed: 37864994
DOI: 10.1016/j.psychres.2023.115538
Perspectives on Psychological Science:... Nov 2023
Infants master the temporal patterns of their native language along a developmental trajectory from slow to fast: shortly after birth, they recognize the slow acoustic modulations specific to their native language, tuning into faster language-specific patterns only between 6 and 12 months of age. We propose here that this trajectory is constrained by neuronal maturation, in particular the gradual emergence of high-frequency neural oscillations in the infant electroencephalogram. Infants' initial focus on slow prosodic modulations is consistent with the prenatal availability of slow electrophysiological activity (i.e., theta- and delta-band oscillations). Our proposal is consistent with the temporal patterns of infant-directed speech, which initially amplifies slow modulations and approaches the faster modulation range of adult-directed speech only as infants' language has advanced sufficiently. It also agrees with evidence from premature infants showing that maturational age is a stronger predictor of language development than ex utero exposure to speech, indicating that premature infants cannot exploit their earlier exposure to speech because of electrophysiological constraints. In sum, we provide a new perspective on language acquisition that emphasizes neuronal development as a critical driving force of infants' language development.
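The "slow acoustic modulations" referred to above are fluctuations of the speech amplitude envelope at a few Hz, i.e., the delta/theta range. The sketch below is a synthetic illustration, not from the article: it builds a 4 Hz amplitude-modulated tone, extracts its envelope by rectification and smoothing, and recovers the modulation rate from the envelope spectrum.

```python
import numpy as np

fs = 1000                        # sample rate (Hz), illustrative
t = np.arange(0, 4, 1 / fs)      # 4 s of signal
carrier = np.sin(2 * np.pi * 200 * t)               # 200 Hz carrier
signal = (1 + np.sin(2 * np.pi * 4 * t)) * carrier  # 4 Hz modulation

# Amplitude envelope: rectify, then moving-average over ~50 ms
env = np.convolve(np.abs(signal), np.ones(50) / 50, mode="same")

# Dominant modulation frequency from the envelope spectrum (DC removed)
spec = np.abs(np.fft.rfft(env - env.mean()))
freqs = np.fft.rfftfreq(len(env), 1 / fs)
print(freqs[np.argmax(spec)])  # 4.0 -> within the delta/theta range
```

Applied to real speech, the same envelope analysis shows a modulation spectrum peaking around 4-5 Hz, which is the property the proposal links to slow neural oscillations.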
Topics: Infant; Adult; Female; Pregnancy; Humans; Language Development; Language; Speech; Speech Perception
PubMed: 36753616
DOI: 10.1177/17456916231151584
CoDAS 2023
PURPOSE
To develop an assessment protocol for speech motor planning with phonologically balanced stimuli for Brazilian Portuguese, including all necessary variables for this diagnosis.
METHODS
Three stages were carried out. In the first, word lists were built with syllabic and stress patterns as the main criteria. From the survey conducted in Stage 1, the words that composed the first version of the protocol lists were selected in Stage 2 and grouped into two tasks fundamental for diagnosing acquired apraxia of speech (AOS): repetition and reading aloud (RA). In Stage 3, word occurrence was investigated using the Brazilian Corpus (PUC-SP) - Linguateca database, and a statistical analysis was performed to verify whether the repetition and RA lists were balanced in terms of occurrences. The lists were distributed into quartiles and submitted to both descriptive and bivariate analyses. A significance level of 5% (p < 0.05) was adopted.
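The quartile-based balancing check in Stage 3 can be sketched as follows: assign each word's corpus occurrence count to a frequency quartile, then compare how the two task lists are distributed across quartiles. The counts below are invented for illustration, not from the Linguateca corpus.

```python
import numpy as np

# Illustrative occurrence counts for the repetition and reading-aloud lists
rep_counts = np.array([12, 85, 430, 1900, 33, 260, 950, 5100])
ra_counts  = np.array([15, 70, 510, 2200, 40, 300, 880, 4800])

# Quartile boundaries from the pooled counts
all_counts = np.concatenate([rep_counts, ra_counts])
edges = np.quantile(all_counts, [0.25, 0.5, 0.75])

def quartile(counts):
    """Quartile label (1-4) for each occurrence count."""
    return np.searchsorted(edges, counts, side="right") + 1

# Count how many words of each list fall in each quartile
rep_q = np.bincount(quartile(rep_counts), minlength=5)[1:]
ra_q  = np.bincount(quartile(ra_counts),  minlength=5)[1:]
print(rep_q, ra_q)  # similar distributions suggest balanced lists
```

A bivariate test (e.g., chi-square) on these quartile counts is one way to formalize the "balanced in terms of occurrences" criterion at the stated 5% significance level.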
RESULTS
After completion of all stages, the words composing the lists for the repetition and RA tasks were obtained. Finally, other tasks considered essential for the assessment of AOS, such as diadochokinetic rates and a board for eliciting spontaneous oral production, were added to the protocol.
CONCLUSION
The developed protocol contains the tasks considered standard for the assessment of AOS according to the international literature, which makes this instrument important for diagnosing this disorder in speakers of Brazilian Portuguese.
Topics: Humans; Speech; Speech Disorders; Speech Production Measurement; Apraxias; Language
PubMed: 37851756
DOI: 10.1590/2317-1782/20232022251pt
Scientific Reports Jul 2023
Bilateral subthalamic nucleus deep brain stimulation (STN-DBS) is an effective treatment in advanced Parkinson's Disease (PD). However, the effects of STN-DBS on speech are still debated, particularly in long-term follow-up. The objective of this study was to evaluate the long-term effects of bilateral STN-DBS on speech in a cohort of advanced PD patients treated with bilateral STN-DBS. Each patient was assessed before surgery through a neurological evaluation and a perceptual-acoustic analysis of speech and re-assessed in the long term under different stimulation and drug conditions. The primary outcome was the percentage change in speech intelligibility obtained by comparing the postoperative on-stimulation/off-medication condition with the preoperative off-medication condition. Twenty-five PD patients treated with bilateral STN-DBS with a 5-year follow-up were included. In the long term, speech intelligibility remained at preoperative levels. STN-DBS induced a significant acute improvement in speech intelligibility (p < 0.005) in the postoperative assessment when the on-stimulation/off-medication and off-stimulation/off-medication conditions were compared. These results suggest that STN-DBS may preserve speech intelligibility even in the long term.
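The primary outcome above is a simple relative difference between two intelligibility scores. The snippet below works through it with invented scores, purely to make the formula concrete.

```python
# Percentage change in speech intelligibility, as the primary outcome
# describes: postoperative on-stimulation/off-medication vs. preoperative
# off-medication. Scores are illustrative, not the study's data.
pre_off = 82.0       # preoperative off-medication intelligibility (%)
post_on_off = 85.5   # postoperative on-stim/off-med intelligibility (%)

pct_change = 100 * (post_on_off - pre_off) / pre_off
print(round(pct_change, 1))  # 4.3
```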
Topics: Humans; Parkinson Disease; Subthalamic Nucleus; Deep Brain Stimulation; Treatment Outcome; Speech Intelligibility
PubMed: 37454168
DOI: 10.1038/s41598-023-38555-2
Medicina (Kaunas, Lithuania) Jul 2023
Review
Learning to speak properly requires a fully formed brain, good eyesight, and a functioning auditory system. Defective phonation is the outcome of a failure in the development of any of the systems or components involved in speech production. Dentures that support good phonetics can be fabricated with the help of a dentist who has a firm grasp of speech production and phonetic characteristics. Every dentist strives to perfect their craft by balancing the technical, cosmetic, and acoustic aspects of dentistry, or "phonetics". The ideal prosthesis is one that not only sounds good but also functions well mechanically and aesthetically. Words are spoken using articulators that alter their size and form; therefore, a prosthesis should be made in such a way that it does not interfere with the ability to communicate. As a result, a prosthodontist must have a solid grasp of how speech is produced and the numerous parts that go into it.
Topics: Humans; Speech; Phonetics; Phonation; Learning; Brain
PubMed: 37512133
DOI: 10.3390/medicina59071322