Brain Research Bulletin, Sep 2023
Review
Decoding brain activity is conducive to breakthroughs in brain-computer interface (BCI) technology. The development of artificial intelligence (AI) continually promotes the progress of brain language decoding technology. Existing research has mainly focused on a single modality and has paid insufficient attention to AI methods. Therefore, our objective is to provide an overview of relevant decoding research from the perspective of different modalities and methodologies. The modalities involve text, speech, image, and video, and the core method is using AI-built decoders to translate brain signals induced by multimodal stimuli into text or vocal language. The semantic information of brain activity can be successfully decoded into language at various levels, ranging from words through sentences to discourses. However, the decoding effect is affected by various factors, such as the decoding model, the vector representation model, and the brain regions involved. Challenges and future directions are also discussed. Advances in brain language decoding and BCI technology could help patients with clinical aphasia regain the ability to communicate.
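To make the "AI-built decoder" idea concrete, here is a minimal sketch, assuming a linear (ridge-regression) mapping from simulated brain features to a word-embedding space with nearest-neighbor word lookup. Every shape, signal, and vocabulary entry is a hypothetical placeholder; the surveyed studies use real fMRI/EEG/MEG features and pretrained embeddings.

```python
# Hypothetical sketch of a linear brain-to-language decoder of the kind the
# review surveys: ridge regression maps brain features to word-embedding
# vectors, and decoding picks the vocabulary word nearest the prediction.
# Data shapes and the toy vocabulary are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_trials, n_voxels, emb_dim = 200, 500, 50

# Simulated training data: brain responses X and embeddings Y of the
# words that evoked them (in a real study these come from fMRI/EEG/MEG).
X_train = rng.standard_normal((n_trials, n_voxels))
Y_train = rng.standard_normal((n_trials, emb_dim))

decoder = Ridge(alpha=1.0).fit(X_train, Y_train)

# Decode a new brain response: predict an embedding, then return the
# closest word in a (toy) vocabulary by cosine similarity.
vocab = ["house", "river", "music", "dog"]
vocab_vecs = rng.standard_normal((len(vocab), emb_dim))

x_new = rng.standard_normal((1, n_voxels))
y_hat = decoder.predict(x_new)  # shape (1, emb_dim)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine(y_hat[0], v) for v in vocab_vecs]
print("decoded word:", vocab[int(np.argmax(scores))])
```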
Topics: Humans; Artificial Intelligence; Brain; Language; Speech
PubMed: 37487829
DOI: 10.1016/j.brainresbull.2023.110713

Translational Psychiatry, Sep 2023
Speech is a promising biomarker for schizophrenia spectrum disorder (SSD) and major depressive disorder (MDD). This proof-of-principle study investigates previously studied speech acoustics in combination with a novel application of voice pathology features as objective and reproducible classifiers for depression, schizophrenia, and healthy controls (HC). Speech and voice features for classification were calculated from recordings of picture descriptions in 240 speech samples (20 participants with SSD, 20 with MDD, and 20 HC, each providing 4 samples). Binary support vector machine (SVM) models classified the disorder groups and HC. For each feature, the permutation feature importance was calculated, and the top 25% most important features were used to compare differences between the disorder groups and HC, including correlations between the important features and symptom severity scores. Multiple SVM kernels were tested, and the pairwise models with the best-performing kernel (third-degree polynomial) were highly accurate for each classification: 0.947 for HC vs. SSD, 0.920 for HC vs. MDD, and 0.932 for SSD vs. MDD. The most important features were measures of articulation coordination, number of pauses per minute, and speech variability. There were moderate correlations between important features and positive symptoms for SSD. The important features suggest that speech characteristics relating to psychomotor slowing, alogia, and flat affect differ between HC, SSD, and MDD.
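A minimal sketch of the classification pipeline the abstract describes, assuming scikit-learn: a third-degree polynomial SVM for one binary contrast plus permutation feature importance to rank features. The synthetic data and the 40-feature count are placeholders, not the study's dataset.

```python
# Sketch of the described analysis: a third-degree polynomial SVM separating
# two groups, with permutation feature importance to rank acoustic features.
# The synthetic data and feature count are illustrative stand-ins.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.standard_normal((160, 40))   # 160 samples x 40 speech/voice features
y = rng.integers(0, 2, size=160)     # 0 = HC, 1 = SSD (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
model.fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))

# Permutation importance: the accuracy drop when one feature is shuffled.
imp = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][:10]  # top 25% of 40 features
print("most important feature indices:", top)
```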
Topics: Humans; Speech; Depressive Disorder, Major; Depression; Schizophrenia; Support Vector Machine
PubMed: 37726285
DOI: 10.1038/s41398-023-02594-0

Systematic Reviews, Jul 2023
Meta-Analysis; Review
BACKGROUND
We systematically reviewed the literature and performed a meta-analysis of the effects of speech therapy and phonosurgery on the fundamental frequency gain of the voice in transgender women, with respect to the type of vocal sample collected, and compared the effectiveness of the two treatments. In addition, the study design, year, country, types of techniques used, total therapy time, and vocal assessment protocols were analyzed.
METHODS
We searched the PubMed, Lilacs, and SciELO databases for observational studies and clinical trials published in English, Portuguese, or Spanish between January 2010 and January 2023. Study selection was carried out according to PRISMA 2020. The quality of the selected studies was assessed using the Newcastle-Ottawa Scale.
RESULTS
Of 493 studies, 31 were deemed potentially eligible and retrieved for full-text review, and 16 were included in the systematic review and meta-analysis. Six studies evaluated speech therapy and ten evaluated phonosurgery. Speech therapy time did not influence the post-treatment gain in voice fundamental frequency (p = 0.6254). The type of sample collected significantly influenced the post-treatment frequency gain (p < 0.01). When the vocal sample was collected through vowel production (p < 0.01) or reading (p < 0.01), the gain was significantly more heterogeneous between the different types of treatment. Phonosurgery was significantly more effective in terms of fundamental frequency gain than speech therapy alone, regardless of the type of sample collected (p < 0.01). The average fundamental frequency gain after speech therapy was 27 Hz for the /a/ vowel sample, 39.05 Hz for reading, and 25.42 Hz for spontaneous speech; after phonosurgery, the gains were 71.68 Hz for the vowel /a/, 41.07 Hz for reading, and 39.09 Hz for spontaneous speech. The study with the highest gain (110 Hz) collected vowel samples, and the study with the lowest gain (15 Hz) collected spontaneous speech. The majority of the included studies scored between 4 and 8 on the Newcastle-Ottawa Scale (a sketch of random-effects pooling for gains of this kind appears after this abstract).
CONCLUSION
The type of vocal sample collected influences the gain result of the fundamental frequency after treatment. Speech therapy and phonosurgery increased the fundamental frequency and improved female voice perception and vocal satisfaction. However, phonosurgery yielded a greater fundamental frequency gain in the different samples collected. The study protocol was registered at Prospero (CRD42017078446).
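As a hedged sketch of how per-study fundamental frequency gains can be pooled, the snippet below implements DerSimonian-Laird random-effects meta-analysis. The effect sizes and variances are invented placeholders, and the review does not state which pooling estimator was used; this is one common choice, not the authors' method.

```python
# One common random-effects pooling estimator (DerSimonian-Laird), applied
# to hypothetical per-study mean F0 gains (Hz) and their variances.
import numpy as np

def dersimonian_laird(effects, variances):
    """Pool per-study effects under a random-effects model."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances                            # fixed-effect weights
    mu_fe = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - mu_fe) ** 2)         # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_re = 1.0 / (variances + tau2)                # random-effects weights
    mu_re = np.sum(w_re * effects) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return mu_re, se, tau2

# Toy example: five hypothetical studies (not the review's data).
gains = [27.0, 39.0, 55.0, 71.0, 41.0]
variances = [16.0, 25.0, 36.0, 20.0, 30.0]
mu, se, tau2 = dersimonian_laird(gains, variances)
print(f"pooled gain: {mu:.1f} Hz (SE {se:.1f}), tau^2 = {tau2:.1f}")
```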
Topics: Female; Humans; Speech Therapy; Transgender Persons; Speech; Voice; Databases, Factual
PubMed: 37481572
DOI: 10.1186/s13643-023-02267-5

Nature, Feb 2024
Humans are capable of generating extraordinarily diverse articulatory movement combinations to produce meaningful speech. This ability to orchestrate specific phonetic sequences, and their syllabification and inflection over subsecond timescales, allows us to produce thousands of word sounds and is a core component of language. The fundamental cellular units and constructs by which we plan and produce words during speech, however, remain largely unknown. Here, using acute ultrahigh-density Neuropixels recordings capable of sampling across the cortical column in humans, we discovered neurons in the language-dominant prefrontal cortex that encoded detailed information about the phonetic arrangement and composition of planned words during the production of natural speech. These neurons represented the specific order and structure of articulatory events before utterance and reflected the segmentation of phonetic sequences into distinct syllables. They also accurately predicted the phonetic, syllabic, and morphological components of upcoming words and showed a temporally ordered dynamic. Collectively, we show how these mixtures of cells are broadly organized along the cortical column and how their activity patterns transition from articulation planning to production. We also demonstrate how these cells reliably track the detailed composition of consonant and vowel sounds during perception and how they distinguish processes specifically related to speaking from those related to listening. Together, these findings reveal a remarkably structured organization and encoding cascade of phonetic representations by prefrontal neurons in humans and demonstrate a cellular process that can support the production of speech.
Topics: Humans; Movement; Neurons; Phonetics; Speech; Speech Perception; Prefrontal Cortex
PubMed: 38297120
DOI: 10.1038/s41586-023-06982-w

Scientific Reports, Feb 2024
Speech emotion recognition (SER) has gained increasing interest over recent decades as part of enriched affective computing. As a consequence, a variety of engineering approaches have been developed to address the SER problem, exploiting different features, learning algorithms, and datasets. In this paper, we propose the application of graph theory for classifying emotionally colored speech signals. Graph theory provides tools for extracting statistical as well as structural information from any time series, and we propose using this information as a novel feature set. Furthermore, we suggest setting a unique feature-based identity for each emotion belonging to each speaker. Emotion classification is performed by a Random Forest classifier in a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches involving well-known hand-crafted features as well as deep learning architectures operating on mel-spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), reveal that our proposed method outperforms the comparative methods on these datasets. Specifically, we observe an average UAR increase of almost [Formula: see text], [Formula: see text], and [Formula: see text], respectively.
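The evaluation protocol translates directly into scikit-learn. The sketch below runs a Random Forest under Leave-One-Speaker-Out Cross-Validation and scores unweighted average recall (UAR, equal to balanced accuracy); the random features are stand-ins for the paper's graph-derived descriptors.

```python
# Leave-One-Speaker-Out Cross-Validation (LOSO-CV) with a Random Forest,
# scored by unweighted average recall (UAR). Features and labels are
# random placeholders for the paper's graph-based feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n_utt, n_feat, n_speakers, n_emotions = 300, 30, 10, 4

X = rng.standard_normal((n_utt, n_feat))      # per-utterance feature vectors
y = rng.integers(0, n_emotions, size=n_utt)   # emotion labels
speakers = rng.integers(0, n_speakers, size=n_utt)

logo = LeaveOneGroupOut()
uars = []
for train_idx, test_idx in logo.split(X, y, groups=speakers):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    # Balanced accuracy equals unweighted average recall (UAR).
    uars.append(balanced_accuracy_score(y[test_idx], pred))

print(f"mean UAR over {len(uars)} held-out speakers: {np.mean(uars):.3f}")
```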
Topics: Speech; Emotions; Algorithms
PubMed: 38396002
DOI: 10.1038/s41598-024-52989-2

Proceedings of the National Academy of Sciences, Oct 2023
Human cognition is underpinned by structured internal representations that encode relationships between entities in the world (cognitive maps). Clinical features of schizophrenia, from thought disorder to delusions, are proposed to reflect disorganization in such conceptual representations. Schizophrenia is also linked to abnormalities in neural processes that support cognitive map representations, including hippocampal replay and high-frequency ripple oscillations. Here, we report a computational assay of semantically guided conceptual sampling and exploit this to test the hypothesis that people with schizophrenia (PScz) exhibit abnormalities in semantically guided cognition that relate to hippocampal replay and ripples. Fifty-two participants [26 PScz (13 unmedicated) and 26 age-, gender-, and intelligence quotient (IQ)-matched nonclinical controls] completed a category- and letter-based verbal fluency task, followed by a magnetoencephalography (MEG) scan involving a separate sequence-learning task. We used a pretrained word-embedding model of semantic similarity, coupled to a computational model of word selection, to quantify the degree to which each participant's verbal behavior was guided by semantic similarity. Using MEG, we indexed neural replay and ripple power in a post-task rest session. Across all participants, word selection was strongly influenced by semantic similarity. The strength of this influence showed sensitivity to task demands (category > letter fluency) and predicted performance. In line with our hypothesis, the influence of semantic similarity on behavior was reduced in schizophrenia relative to controls, predicted negative psychotic symptoms, and correlated with an MEG signature of hippocampal ripple power (but not replay). The findings bridge a gap between phenomenological and neurocomputational accounts of schizophrenia.
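A minimal sketch of the semantic-similarity measure at the core of the assay: average cosine similarity between embeddings of consecutive words in a fluency sequence. The embedding table here is a random stand-in; the study used a pretrained word-embedding model, and its full word-selection model is not reproduced.

```python
# Mean transition similarity of a verbal-fluency sequence: cosine similarity
# between embeddings of each word and the next. The embedding table is a
# random placeholder; in practice, load pretrained vectors (e.g., word2vec
# or GloVe style).
import numpy as np

rng = np.random.default_rng(0)
embeddings = {w: rng.standard_normal(100) for w in
              ["dog", "cat", "horse", "piano", "violin"]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_transition_similarity(words):
    """Average cosine similarity between consecutive words, a simple
    index of how semantically guided the sequence is."""
    sims = [cosine(embeddings[w1], embeddings[w2])
            for w1, w2 in zip(words, words[1:])]
    return float(np.mean(sims))

fluency_sequence = ["dog", "cat", "horse", "piano", "violin"]
print("mean transition similarity:",
      mean_transition_similarity(fluency_sequence))
```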
Topics: Humans; Schizophrenia; Semantics; Psychotic Disorders; Verbal Behavior; Learning
PubMed: 37816054
DOI: 10.1073/pnas.2305290120

CoDAS, 2023
PURPOSE
To present evidence of intra- and inter-rater reliability and internal consistency of the Phonological Assessment Instrument scores, so that the instrument can be considered reliable and valid for use in clinical practice.
METHODS
A total of 179 audio recordings of the instrument's speech samples were analyzed. The recordings were collected over a 5-month period of the instrument's application with children aged from five years to eight years and 11 months. Three expert judges transcribed each child's speech production into the software, which generated performance reports. Each child's speech data were compared across these evaluators, who were trained and experienced in phonetic transcription, to verify the agreement of the instrument's scores. For the reliability analysis, internal consistency was verified using Cronbach's alpha, and intra- and inter-rater reliability using the intraclass correlation coefficient (ICC); both statistics are sketched in code after this abstract.
RESULTS
The Phonological Assessment Instrument showed evidence of high internal consistency, with scores indicating excellent reliability for the assessment of Brazilian Portuguese phonemes, as well as adequate agreement among the judges regarding the instrument scores.
CONCLUSION
The instrument presented robust evidence of reliability and is therefore a sound option for use in Brazilian research and clinical practice to evaluate the phonological system of Brazilian children.
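The two reliability statistics named in the methods are easy to make concrete. Below is a minimal sketch computing Cronbach's alpha and a two-way random-effects, absolute-agreement, single-rater ICC(2,1) from a subjects-by-raters score matrix; the toy matrix is illustrative, and the study's exact ICC variant is an assumption.

```python
# Cronbach's alpha (internal consistency) and ICC(2,1) (inter-rater
# agreement) computed from a subjects x raters score matrix. The toy
# matrix is illustrative, not the study's data.
import numpy as np

def cronbach_alpha(scores):
    """scores: (n_subjects, k_items) array."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    scores: (n_subjects, k_raters) array."""
    n, k = scores.shape
    grand = scores.mean()
    ms_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    ss_err = ((scores - scores.mean(axis=1, keepdims=True)
               - scores.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

ratings = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]], dtype=float)
print("Cronbach's alpha:", round(cronbach_alpha(ratings), 3))
print("ICC(2,1):", round(icc_2_1(ratings), 3))
```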
Topics: Child; Humans; Reproducibility of Results; Language; Speech; Phonetics; Brazil
PubMed: 37970894
DOI: 10.1590/2317-1782/20232022303pt

Medicina (Kaunas, Lithuania), Jul 2023
Review
More and more children with severe-to-profound hearing loss are receiving cochlear implants (CIs) at an early age to improve their hearing and listening abilities, speech recognition, speech intelligibility, and other aspects of spoken language development. Despite this, rehabilitation outcomes are highly heterogeneous in this population, not only because of issues related to surgery and fitting or the specific characteristics of the child, including any additional disabilities, but also because of large differences in the quality of the support and rehabilitation offered by the therapist and the family. These quality standards for the rehabilitation of young deaf children receiving CIs were developed within the European KA202 Erasmus+ project "VOICE" (vocational education and training for speech and language therapists and parents for the rehabilitation of children with CIs, Ref. No. 2020-1-RO01-KA202-080059). To develop the standards, we used input from face-to-face interviews with 11 local CI rehabilitation experts from the four partner countries of the project and the outcomes of a bibliographic analysis of 848 publications retrieved from six databases: PubMed, PsycINFO, CINAHL, Scopus, ERIC, and Cochrane. Based on this information, we created a first set of 32 quality standards across four domains: general, fitting, rehabilitation, and professional. The Delphi method was then used by 18 international rehabilitation experts to discuss and agree on these standards. The results of the literature analysis and the interviews show that more than 90% of the consulted international experts agreed on 29 quality standards. They address different aspects of rehabilitation: the multidisciplinary team, its expertise and knowledge, important rehabilitation topics to focus on, and programming issues related to rehabilitation. These quality standards aim to optimize the work of speech rehabilitation specialists so that they reach an optimal level of expertise. Also specified is the equipment the CI team needs to carry out rehabilitation sessions in good conditions. This set of quality standards can help ensure appropriate postoperative care for these children. As a result, the rehabilitation process will be smoother, and therapists will be able to focus more on the specific needs of each child, providing quality services that lead to better outcomes. This theme is particularly complex and depends on multifactorial aspects of medicine, education, speech therapy, social work, and psychology that are intricate and interdependent.
Topics: Humans; Child; Female; Male; Cochlear Implants; Cochlear Implantation; Hearing Loss, Sensorineural; Speech Intelligibility; Treatment Outcome
PubMed: 37512167
DOI: 10.3390/medicina59071354

Journal of Speech, Language, and Hearing Research, Aug 2023
PURPOSE
Nonnative consonant cluster learning has become a useful experimental approach for studying speech motor learning, and we sought both to enhance understanding of this area and to establish best practices for this type of research.
METHOD
One hundred twenty individuals completed a nonnative consonant cluster learning task within a speech motor learning paradigm. Following a brief prepractice phase, participants practiced the production of eight word-initial nonnative consonant clusters embedded in bisyllabic nonwords (e.g., GD in /gdivu/). The clusters ranged in difficulty according to linguistic typology and sonority sequencing. Acquisition was operationalized as the change across the practice session, and learning was assessed in two retention sessions (R1: 30 min after practice; R2: 2 days after practice). We evaluated changes in accuracy as well as in the acoustic details of cluster production at each time point.
RESULTS
Overall, participants improved in their production of the consonant clusters: accuracy increased, and durations decreased in specific measures associated with cluster production. Coordination, as measured acoustically, changed both for clusters that were produced incorrectly and for those produced correctly, indicating continued motor learning even in accurate tokens.
CONCLUSIONS
These results aid our understanding of the complexity of nonnative consonant cluster learning. In particular, factors related to both phonological properties and speech motor control affect the learning of novel speech sequences.
SUPPLEMENTAL MATERIAL
https://doi.org/10.23641/asha.21844185.
Topics: Humans; Phonetics; Speech; Learning; Acoustics
PubMed: 36634242
DOI: 10.1044/2022_JSLHR-22-00322

Nature Communications, Sep 2023
Imagine being in a crowded room with a cacophony of speakers and having the ability to focus on or remove speech from a specific 2D region. This would require understanding and manipulating an acoustic scene, isolating each speaker, and associating a 2D spatial context with each constituent speech stream. However, separating speech from a large number of concurrent speakers in a room into individual streams and identifying their precise 2D locations is challenging, even for the human brain. Here, we present the first acoustic swarm that demonstrates cooperative navigation with centimeter resolution using sound, eliminating the need for cameras or external infrastructure. Our acoustic swarm forms a self-distributing wireless microphone array, which, along with our attention-based neural network framework, lets us separate and localize concurrent human speakers in 2D space, enabling speech zones. Our evaluations showed that the acoustic swarm could localize and separate 3-5 concurrent speech sources in real-world unseen reverberant environments with median and 90th-percentile 2D errors of 15 cm and 50 cm, respectively. Our system enables applications such as mute zones (parts of the room where sounds are muted), active zones (regions where sounds are captured), multi-conversation separation, and location-aware interaction.
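The paper's separation and localization are learned end-to-end with an attention-based network, which cannot be reproduced here. As a primer on the classical building block of microphone-array localization, the sketch below estimates the time difference of arrival between two microphones with GCC-PHAT; it is a generic technique, not the authors' method.

```python
# GCC-PHAT time-delay estimation between two microphone signals: the
# cross-power spectrum is whitened (phase transform) so only phase, i.e.,
# relative delay, survives, and the peak of its inverse FFT gives the lag.
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12               # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    # Reorder so lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Toy check: mic2 hears the same noise burst 40 samples later than mic1.
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
mic1 = s
mic2 = np.roll(s, 40)
print("estimated delay (ms):", 1000 * gcc_phat(mic2, mic1, fs))
```

Pairwise delays like this, combined across several microphones at known positions, are what classical multilateration uses to recover a 2D source position.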
Topics: Humans; Speech; Acoustics; Sound; Communication; Awareness
PubMed: 37735445
DOI: 10.1038/s41467-023-40869-8