Scientific Reports Jun 2024
Cochlear implants (CIs) do not offer the same level of effectiveness in noisy environments as in quiet settings. Current single-microphone noise reduction algorithms in hearing aids and CIs only remove predictable, stationary noise, and are ineffective against realistic, non-stationary noise such as multi-talker interference. Recent developments in deep neural network (DNN) algorithms have achieved noteworthy performance in speech enhancement and separation, especially in removing speech noise. However, more work is needed to investigate the potential of DNN algorithms for removing speech noise when tested with listeners fitted with CIs. Here, we implemented two DNN algorithms well suited to speech audio processing: (1) a recurrent neural network (RNN) and (2) SepFormer. The algorithms were trained with a customized dataset (30 h) and then tested with thirteen CI listeners. Both the RNN and SepFormer algorithms significantly improved CI listeners' speech intelligibility in noise without compromising the overall perceived quality of speech. These algorithms not only increased intelligibility in stationary non-speech noise, but also introduced a substantial improvement in non-stationary noise, where conventional signal processing strategies fall short and offer little benefit. These results show the promise of DNN algorithms as a solution to listening challenges in multi-talker noise interference.
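The abstract does not detail the enhancement pipeline, but a common formulation that RNN-based enhancers typically approximate is time-frequency masking: a network predicts a mask over the noisy spectrogram that suppresses noise-dominated bins. Below is a minimal numpy sketch using the oracle ideal ratio mask, the training target such a network would commonly regress toward; the signal, STFT parameters, and mask choice are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Frame a signal and take the FFT of each frame (rectangular window for brevity)."""
    frames = [x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def ideal_ratio_mask(clean_spec, noise_spec):
    """Oracle mask a DNN is often trained to approximate: |S| / (|S| + |N|)."""
    s, n = np.abs(clean_spec), np.abs(noise_spec)
    return s / (s + n + 1e-12)

rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)          # stand-in "speech"
noise = 0.5 * rng.standard_normal(len(t))    # stationary noise
noisy = clean + noise

S, N, Y = stft(clean), stft(noise), stft(noisy)
mask = ideal_ratio_mask(S, N)                # in practice: predicted by the RNN
enhanced = mask * Y                          # masked spectrogram

# The masked spectrogram is closer to the clean one than the noisy input is.
err_noisy = np.abs(np.abs(Y) - np.abs(S)).mean()
err_enh = np.abs(np.abs(enhanced) - np.abs(S)).mean()
```

In a deployed system the mask would be predicted by the trained network from the noisy input alone; the oracle mask here only illustrates why masking suppresses noise-dominated bins.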
Topics: Humans; Cochlear Implants; Deep Learning; Speech Intelligibility; Noise; Female; Middle Aged; Male; Algorithms; Speech Perception; Aged; Adult; Neural Networks, Computer
PubMed: 38853168
DOI: 10.1038/s41598-024-63675-8
Scientific Reports Jun 2024
In human-computer interaction systems, speech emotion recognition (SER) plays a crucial role because it enables computers to understand and react to users' emotions. In the past, SER has emphasised acoustic properties extracted from speech signals. Recent developments in deep learning and computer vision, however, have made it possible to use visual representations to enhance SER performance. This work proposes a novel method for improving speech emotion recognition using a lightweight Vision Transformer (ViT) model. We leverage the ViT model's ability to capture spatial dependencies and high-level features in the mel spectrograms fed into the model, which are adequate indicators of emotional states. To determine the efficiency of the proposed approach, we conduct comprehensive experiments on two benchmark speech emotion datasets, the Toronto Emotional Speech Set (TESS) and the Berlin Emotional Database (EMODB). The results demonstrate a considerable improvement in speech emotion recognition accuracy, attesting to the method's generalizability: it achieved 98% accuracy on TESS, 91% on EMODB, and 93% on the combined TESS-EMODB set. A comparative experiment shows that the non-overlapping patch-based feature extraction method substantially improves recognition accuracy. Our research indicates the potential of integrating vision transformer models into SER systems, opening up fresh opportunities for real-world applications requiring accurate emotion recognition from speech, with performance competitive with other state-of-the-art techniques.
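The non-overlapping patch-based feature extraction that the comparative experiment credits can be sketched independently of any particular ViT implementation. The numpy sketch below splits a spectrogram into non-overlapping patches and flattens each into a token, the step that turns an image-like input into the token sequence a ViT embeds; the 128-mel, 100-frame shape and 16x16 patch size are illustrative assumptions, not values from the paper.

```python
import numpy as np

def patchify(spec, patch=16):
    """Split a (mels, frames) spectrogram into non-overlapping patch tokens,
    the tokenization step a Vision Transformer applies before embedding."""
    h, w = spec.shape
    h, w = h - h % patch, w - w % patch        # crop to a multiple of the patch size
    spec = spec[:h, :w]
    tokens = (spec.reshape(h // patch, patch, w // patch, patch)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, patch * patch))  # (num_patches, patch_dim)
    return tokens

mel = np.random.rand(128, 100)                 # stand-in 128-mel, 100-frame spectrogram
tokens = patchify(mel)
# 128//16 = 8 rows of patches, 96//16 = 6 columns -> 48 tokens of length 256
```

Each token would then be linearly projected and combined with a position embedding before entering the transformer encoder.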
Topics: Humans; Emotions; Speech; Deep Learning; Speech Recognition Software; Databases, Factual; Algorithms
PubMed: 38849422
DOI: 10.1038/s41598-024-63776-4
Nature Communications Jun 2024
Humans produce two forms of cognitively complex vocalizations: speech and song. It is debated whether these differ based primarily on culturally specific, learned features, or if acoustical features can reliably distinguish them. We study the spectro-temporal modulation patterns of vocalizations produced by 369 people living in 21 urban, rural, and small-scale societies across six continents. Specific ranges of spectral and temporal modulations, overlapping within categories and across societies, significantly differentiate speech from song. Machine-learning classification shows that this effect is cross-culturally robust, vocalizations being reliably classified solely from their spectro-temporal features across all 21 societies. Listeners unfamiliar with the cultures classify these vocalizations using similar spectro-temporal cues as the machine learning algorithm. Finally, spectro-temporal features are better able to discriminate song from speech than a broad range of other acoustical variables, suggesting that spectro-temporal modulation-a key feature of auditory neuronal tuning-accounts for a fundamental difference between these categories.
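Spectro-temporal modulation content is commonly measured via the modulation power spectrum, the 2D Fourier transform of a spectrogram, whose axes become spectral and temporal modulation frequencies. The paper's exact analysis pipeline may differ; the numpy sketch below shows the idea on a synthetic spectrogram whose 4 Hz temporal modulation (a speech-like syllable rate) is recovered as the peak. The frame and channel spacings are illustrative assumptions.

```python
import numpy as np

def modulation_power_spectrum(spec, dt, df):
    """2D FFT of a (freq, time) spectrogram; axes become spectral and
    temporal modulation frequencies."""
    mps = np.abs(np.fft.fftshift(np.fft.fft2(spec - spec.mean())))
    wf = np.fft.fftshift(np.fft.fftfreq(spec.shape[0], d=df))  # spectral modulation axis
    wt = np.fft.fftshift(np.fft.fftfreq(spec.shape[1], d=dt))  # temporal modulation axis (Hz)
    return mps, wf, wt

# Synthetic spectrogram: amplitude modulated at 4 Hz, constant across channels.
dt, df = 0.01, 100.0                 # 10 ms frames, 100 Hz channel spacing (assumed)
time = np.arange(200) * dt
spec = np.tile(np.sin(2 * np.pi * 4 * time), (64, 1))

mps, wf, wt = modulation_power_spectrum(spec, dt, df)
# Energy should concentrate at zero spectral modulation, +/- 4 Hz temporal modulation.
fi, ti = np.unravel_index(np.argmax(mps), mps.shape)
peak_temporal = abs(wt[ti])
peak_spectral = abs(wf[fi])
```

Song and speech would occupy different regions of this modulation plane, which is what the machine-learning classifier in the study exploits.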
Topics: Humans; Speech; Male; Female; Machine Learning; Adult; Acoustics; Cross-Cultural Comparison; Auditory Perception; Sound Spectrography; Singing; Music; Middle Aged; Young Adult
PubMed: 38844457
DOI: 10.1038/s41467-024-49040-3
PloS One 2024
Dementia can disrupt how people experience and describe events as well as their own role in them. Alzheimer's disease (AD) compromises the processing of entities expressed by nouns, while behavioral variant frontotemporal dementia (bvFTD) entails a depersonalized perspective with increased third-person references. Yet, no study has examined whether these patterns can be captured in connected speech via natural language processing tools. To tackle such gaps, we asked 96 participants (32 AD patients, 32 bvFTD patients, 32 healthy controls [HCs]) to narrate a typical day of their lives and calculated the proportion of nouns, verbs, and first- or third-person markers (via part-of-speech and morphological tagging). We also extracted objective properties (frequency, phonological neighborhood, length, semantic variability) from each content word. In our main study (with 21 AD patients, 21 bvFTD patients, and 21 HCs), we used inferential statistics and machine learning for group-level and subject-level discrimination. These linguistic features were also correlated with patients' scores on tests of general cognitive status and executive functions. We found that, compared with HCs, (i) AD (but not bvFTD) patients produced significantly fewer nouns, (ii) bvFTD (but not AD) patients used significantly more third-person markers, and (iii) both patient groups produced more frequent words. Machine learning analyses showed that these features identified individuals with AD and bvFTD (AUC = 0.71). A generalizability test, with a model trained on the entire main study sample and tested on hold-out samples (11 AD patients, 11 bvFTD patients, 11 HCs), showed even better performance, with AUCs of 0.76 and 0.83 for AD and bvFTD, respectively. No linguistic feature was significantly correlated with cognitive test scores in either patient group.
These results suggest that specific cognitive traits of each disorder can be captured automatically in connected speech, favoring interpretability for enhanced syndrome characterization, diagnosis, and monitoring.
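The morphosyntactic features the study derives (proportions of nouns, verbs, and person markers) reduce to simple counts once tokens carry part-of-speech tags. A pure-Python sketch follows; in practice the tags would come from an NLP tagger, and the tag names and pronoun lists here are simplified assumptions rather than the study's exact inventory.

```python
from collections import Counter

def morphosyntactic_profile(tagged_tokens):
    """Proportions of nouns, verbs, and first-/third-person pronouns from
    (token, POS) pairs, the kind of connected-speech features derived via
    part-of-speech and morphological tagging."""
    counts = Counter(pos for _, pos in tagged_tokens)
    n = len(tagged_tokens)
    first = {"i", "me", "my", "we", "us", "our"}
    third = {"he", "she", "they", "him", "her", "them", "his", "their"}
    toks = [tok.lower() for tok, _ in tagged_tokens]
    return {
        "noun_ratio": counts["NOUN"] / n,
        "verb_ratio": counts["VERB"] / n,
        "first_person_ratio": sum(t in first for t in toks) / n,
        "third_person_ratio": sum(t in third for t in toks) / n,
    }

# Toy narration; tags would come from an automatic tagger in a real pipeline.
sample = [("They", "PRON"), ("walk", "VERB"), ("the", "DET"),
          ("dog", "NOUN"), ("every", "DET"), ("morning", "NOUN")]
profile = morphosyntactic_profile(sample)
```

Feature vectors like this, computed per narrative, are what the inferential statistics and classifiers in the study operate on.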
Topics: Humans; Frontotemporal Dementia; Alzheimer Disease; Female; Male; Aged; Speech; Middle Aged; Case-Control Studies; Biomarkers; Natural Language Processing; Machine Learning; Neuropsychological Tests; Executive Function
PubMed: 38843210
DOI: 10.1371/journal.pone.0304272
Low testosterone levels relate to poorer cognitive function in women in an APOE-ε4-dependent manner.
Biology of Sex Differences Jun 2024
BACKGROUND
Past research suggests that low testosterone levels relate to poorer cognitive function and higher Alzheimer's disease (AD) risk; however, these findings are inconsistent and are mostly derived from male samples, despite similar age-related testosterone decline in females. Both animal and human studies demonstrate that testosterone's effects on brain health may be moderated by apolipoprotein E ε4 allele (APOE-ε4) carrier status, which may explain some previous inconsistencies. We examined how testosterone relates to cognitive function in older women versus men across healthy aging and the AD continuum and the moderating role of APOE-ε4 genotype.
METHODS
Five hundred and sixty-one participants aged 55-90 (155 cognitively normal (CN), 294 with mild cognitive impairment (MCI), 112 with AD dementia) from the Alzheimer's Disease Neuroimaging Initiative (ADNI), who had baseline cognitive and plasma testosterone data (measured with the Rules Based Medicine Human DiscoveryMAP Panel), were included. There were 213 females and 348 males (self-reported sex assigned at birth), and 52% of the overall sample were APOE-ε4 carriers. We tested the relationship of plasma testosterone levels and its interaction with APOE-ε4 status with clinical diagnostic group (CN vs. MCI vs. AD) and with global and domain-specific cognitive performance using ANOVAs and linear regression models in sex-stratified samples. Cognitive domains included verbal memory, executive function, processing speed, and language.
RESULTS
We did not observe a significant difference in testosterone levels between clinical diagnostic groups in either sex, regardless of APOE-ε4 status. Across clinical diagnostic groups, we found a significant testosterone by APOE-ε4 interaction in females, such that lower testosterone levels related to worse global cognition, processing speed, and verbal memory in APOE-ε4 carriers only. We did not find that either testosterone or its interaction with APOE-ε4 related to cognitive outcomes in males.
CONCLUSIONS
Findings suggest that low testosterone levels in older female APOE-ε4 carriers across the aging-MCI-AD continuum may have deleterious, domain-specific effects on cognitive performance. Although future studies including additional sex hormones and longitudinal cognitive trajectories are needed, our results highlight the importance of including both sexes and considering APOE-ε4 carrier status when examining testosterone's role in cognitive health.
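The sex-stratified interaction models described in the Methods reduce to ordinary least squares with a testosterone x APOE-ε4 product term; the interaction coefficient captures the carrier-only pattern reported in the Results. Below is a hedged numpy sketch on synthetic data (not ADNI data) that simulates such a pattern and recovers the interaction effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
testosterone = rng.normal(0, 1, n)    # standardized plasma testosterone (synthetic)
apoe4 = rng.integers(0, 2, n)         # APOE-ε4 carrier status: 1 = carrier (synthetic)

# Simulate the reported pattern: testosterone relates to cognition only in carriers.
cognition = 0.5 * testosterone * apoe4 + rng.normal(0, 0.1, n)

# OLS with an interaction term: cognition ~ testosterone + apoe4 + testosterone:apoe4
X = np.column_stack([np.ones(n), testosterone, apoe4, testosterone * apoe4])
beta, *_ = np.linalg.lstsq(X, cognition, rcond=None)
interaction_effect = beta[3]          # slope difference between carriers and non-carriers
```

A significant positive interaction coefficient here corresponds to the study's finding that lower testosterone relates to worse cognition specifically among ε4 carriers.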
Topics: Aged; Aged, 80 and over; Female; Humans; Male; Middle Aged; Alzheimer Disease; Apolipoprotein E4; Cognition; Cognitive Dysfunction; Sex Characteristics; Testosterone
PubMed: 38835072
DOI: 10.1186/s13293-024-00620-4
Frontiers in Psychiatry 2024
Transgressive incidents directed at staff by forensic patients occur frequently, leading to detrimental psychological and physical harm and underscoring the urgency of preventive measures. These incidents, emerging within therapeutic relationships, involve complex interactions between patient and staff behavior. This study aims to identify clusters of transgressive incidents based on incident characteristics such as impact, severity, (presumed) cause, type of aggression, and consequences, using latent class analysis (LCA). Additionally, variations in incident clusters based on staff, patient, and context characteristics were investigated. A total of 1,184 transgressive incidents targeted at staff by patients, reported by staff between 2018 and 2022, were extracted from a digital incident reporting system at Fivoor, a Dutch forensic psychiatric healthcare organisation. The LCA revealed six incident classes. Significant differences in age and gender of both staff and patients, staff function, and patient diagnoses were observed among these classes. Incidents with higher impact were more prevalent in high-security clinics, while lower-impact incidents were more common in clinics for patients with intellectual disabilities. Despite limitations such as missing information, tailored prevention approaches are needed given the varying types of transgressive incidents across patients, staff, and units.
PubMed: 38832326
DOI: 10.3389/fpsyt.2024.1394535
Trends in Hearing 2024
The extent to which active noise cancelation (ANC), when combined with hearing assistance, can improve speech intelligibility in noise is not well understood. One possible source of benefit is ANC's ability to reduce the sound level of the direct (i.e., vent-transmitted) path. This reduction lowers the "floor" imposed by the direct path, thereby allowing any increases to the signal-to-noise ratio (SNR) created in the amplified path to be "realized" at the eardrum. Here we used a modeling approach to estimate this benefit. We compared pairs of simulated hearing aids that differ only in terms of their ability to provide ANC and computed intelligibility metrics on their outputs. The difference in metric scores between simulated devices is termed the "ANC Benefit." These simulations show that ANC Benefit increases as (1) the environmental sound level increases, (2) the ability of the hearing aid to improve SNR increases, (3) the strength of the ANC increases, and (4) the hearing loss severity decreases. The predicted size of the ANC Benefit can be substantial. For a moderate hearing loss, the model predicts improvement in intelligibility metrics of >30% when environments are moderately loud (>70 dB SPL) and devices are moderately capable of increasing SNR (by >4 dB). It appears that ANC can be a critical ingredient in hearing devices that attempt to improve SNR in loud environments. ANC will become increasingly important as advanced SNR-improving algorithms (e.g., artificial intelligence speech enhancement) are included in hearing devices.
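The direct-path "floor" mechanism can be illustrated with a toy power-domain model: the eardrum receives the sum of the (ANC-attenuated) vent-transmitted path and the amplified path, whose noise has been reduced by the aid's SNR improvement. All levels, gains, and attenuations below are illustrative assumptions, not values from the modeling study.

```python
import math

def db_to_pow(db):
    return 10 ** (db / 10)

def pow_to_db(p):
    return 10 * math.log10(p)

def eardrum_snr(env_speech_db, env_noise_db, aid_snr_gain_db, aid_gain_db, anc_atten_db):
    """Toy power-domain model: eardrum signal = ANC-attenuated direct path
    plus amplified path with an SNR improvement applied to its noise."""
    # Direct (vent-transmitted) path, attenuated by ANC
    d_speech = db_to_pow(env_speech_db - anc_atten_db)
    d_noise = db_to_pow(env_noise_db - anc_atten_db)
    # Amplified path: overall gain, with noise reduced by the SNR improvement
    a_speech = db_to_pow(env_speech_db + aid_gain_db)
    a_noise = db_to_pow(env_noise_db + aid_gain_db - aid_snr_gain_db)
    return pow_to_db((d_speech + a_speech) / (d_noise + a_noise))

env = dict(env_speech_db=70, env_noise_db=70, aid_snr_gain_db=8, aid_gain_db=0)
snr_no_anc = eardrum_snr(**env, anc_atten_db=0)
snr_anc = eardrum_snr(**env, anc_atten_db=15)
anc_benefit = snr_anc - snr_no_anc
```

With these made-up numbers, 15 dB of ANC attenuation lets roughly 5 dB of the amplified path's 8 dB SNR improvement reach the eardrum instead of being masked by the direct-path floor, mirroring the paper's qualitative conclusion.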
Topics: Humans; Hearing Aids; Signal-To-Noise Ratio; Speech Intelligibility; Noise; Perceptual Masking; Speech Perception; Computer Simulation; Acoustic Stimulation; Correction of Hearing Impairment; Persons With Hearing Impairments; Hearing Loss; Equipment Design; Signal Processing, Computer-Assisted
PubMed: 38831646
DOI: 10.1177/23312165241260029
BMC Medical Education Jun 2024
Factors leading to lapses in professional behaviour of Gynae residents in Pakistan: a study reflecting through the lenses of patients and family, consultants and residents.
INTRODUCTION
Professional behaviour is the first manifestation of professionalism. In teaching hospitals, residents can be considered vulnerable to lapses in professional behaviour when they fail to meet the set standards of professionalism. Residents of some specialties are more at risk of such lapses due to the demanding nature of their work. Research focusing on the behaviour of residents in the field of Gynae, and the underlying factors contributing to such behaviour, is notably lacking in the literature. Additionally, there is a gap in understanding the perspectives of patients from Pakistan on this matter, which remains unexplored thus far and constitutes the central focus of this study. An increase in complaints lodged against Gynae residents' professional behaviour in the Pakistan Citizen Portal (PCP) was observed. Therefore, an exploratory qualitative study was conducted to investigate the factors and rationales contributing to lapses in residents' professional behaviour. The study collected the viewpoints of three stakeholder groups: patients and their families, consultants, and residents. The study was conducted in three phases. First, a document analysis of written complaints was conducted, followed by face-to-face interviews (11 per group) conducted by trained researchers from an independent third party. Finally, the interview data were transcribed, coded, and analysed. In total, 15 themes were identified from the interviews with the three stakeholder groups, which were then categorized into 6 overlapping themes. The most prevalent lapse reported by all three stakeholder groups was poor verbal behaviour of residents.
CONCLUSION
The most highly ranked contributing factors were associated with workplace challenges, the well-being of residents, limited resources, patient and family characteristics, patients' expectations, lack of administrative and paramedic support, cultural factors, and challenges specific to the Gynae specialty. Another intriguing and emerging theme related to the characteristics of patients and attendants, which helped in understanding the causes and implications of conflict-prone environments. The value of competency was also emphasized, which can be developed through training and mentoring systems. The thorough examination of these factors by key stakeholders aided in accurately analysing the issue, its causes, and possible solutions. The study's findings will assist higher authorities in implementing corrective actions and offering evidence-based guidance to policymakers to improve the healthcare system.
Topics: Humans; Pakistan; Internship and Residency; Female; Professionalism; Qualitative Research; Male; Adult; Consultants; Family; Professional Misconduct
PubMed: 38831320
DOI: 10.1186/s12909-024-05509-9
Journal of Affective Disorders May 2024
The impact of antidepressant treatment on the network structure of neurocognition and core emotional depressive symptoms among depressed individuals with a history of suicide attempt: An 8-week clinical study.
BACKGROUND
A more in-depth understanding of the relationship between depressive symptoms, neurocognition, and suicidal behavior could provide insights into the prognosis and treatment of major depressive disorder (MDD) and suicide. We conducted a network analysis among depressed patients examining associations between history of suicide attempt (HSA), core emotional MDD symptoms, and key neurocognitive domains.
METHOD
Depressed patients (n = 120) aged 18-65 years were recruited from a larger randomized clinical trial conducted at the Douglas Institute in Montreal, Canada. They were randomly assigned to receive one of two antidepressant treatments (i.e., escitalopram or desvenlafaxine) for 8 weeks. Core emotional MDD and key neurocognitive domains were assessed pre-post treatment.
RESULTS
At baseline, an association between history of suicide attempt (HSA) and phonemic verbal fluency (PVF) indicated that patients with an HSA had lower PVF scores. After 8 weeks of antidepressant treatment, HSA became conditionally independent of PVF. Similar results were found for both the HAM-D and the QIDS-SR core emotional MDD/neurocognitive networks.
CONCLUSION
Network analysis revealed a pre-treatment relationship between an HSA and decreased phonemic verbal fluency among depressed patients, which was no longer present after 8 weeks of antidepressant treatment.
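Edges in symptom networks of this kind typically correspond to partial correlations, where an absent edge indicates conditional independence, the relation reported here to disappear after treatment. The numpy sketch below computes partial correlations from the precision matrix on a synthetic mediation chain; the paper's actual estimator may differ (e.g., regularized network models), so this is only the underlying idea.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix from the inverse covariance (precision)
    matrix; a near-zero entry means conditional independence under Gaussian
    assumptions, the notion behind absent edges in a symptom network."""
    prec = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Synthetic chain X -> Y -> Z: X and Z correlate marginally but are
# conditionally independent given the mediator Y.
rng = np.random.default_rng(7)
x = rng.normal(size=5000)
y = x + 0.5 * rng.normal(size=5000)
z = y + 0.5 * rng.normal(size=5000)
data = np.column_stack([x, y, z])

pcor = partial_correlations(data)
marginal_xz = np.corrcoef(x, z)[0, 1]   # strong marginal association
partial_xz = pcor[0, 2]                 # near zero once Y is conditioned on
```

"HSA became conditionally independent from PVF" corresponds to the analogue of `partial_xz` shrinking toward zero in the post-treatment network.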
PubMed: 38823590
DOI: 10.1016/j.jad.2024.05.111
Scientific Reports May 2024
Speech is produced by a nonlinear, dynamical Vocal Tract (VT) system, and is transmitted through multiple modes (air, bone and skin conduction), as captured by air, bone and throat microphones respectively. Speaker-specific characteristics that capture this nonlinearity are rarely used as stand-alone features for speaker modeling, and at best have been used in tandem with well-known linear spectral features to produce tangible results. This paper proposes Recurrent Plot (RP) embeddings as stand-alone, nonlinear speaker-discriminating features. Two datasets, the continuous multimodal TIMIT speech corpus and a consonant-vowel unimodal syllable dataset, are used in this study for conducting closed-set speaker identification experiments. Experiments with unimodal speaker recognition systems show that RP embeddings capture the nonlinear dynamics of the VT system, which are unique to every speaker, in all modes of speech. The air (A), bone (B) and throat (T) microphone systems, trained purely on RP embeddings, perform with accuracies of 95.81%, 98.18% and 99.74%, respectively. Experiments using the joint feature space of combined RP embeddings for bimodal (A-T, A-B, B-T) and trimodal (A-B-T) systems show that the best trimodal system (99.84% accuracy) performs on par with trimodal systems using spectrograms (99.45%) and MFCCs (99.98%). The 98.84% performance of the B-T bimodal system shows the efficacy of a speaker recognition system based entirely on alternate (bone and throat) speech, in the absence of standard (air) speech. The results underscore the significance of the RP embedding as a nonlinear feature representation of the dynamical VT system that can act independently for speaker recognition. It is envisaged that speech recognition too will benefit from this nonlinear feature.
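The RP features build on the recurrence plot of nonlinear dynamics (the standard construction behind such features): a binary image marking when the delay-embedded trajectory of a signal revisits the neighborhood of an earlier state. A minimal numpy sketch follows; the embedding dimension, delay, and distance threshold are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def recurrence_plot(x, dim=3, delay=2, eps=0.2):
    """Binary recurrence matrix R[i, j] = 1 when the delay-embedded states at
    times i and j lie within eps of each other -- the image from which
    RP-based embeddings can be learned."""
    n = len(x) - (dim - 1) * delay
    # Time-delay embedding: each row is (x[t], x[t+delay], ..., x[t+(dim-1)*delay])
    states = np.column_stack([x[i * delay: i * delay + n] for i in range(dim)])
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    return (dists < eps).astype(np.uint8)

t = np.linspace(0, 4 * np.pi, 200)
rp = recurrence_plot(np.sin(t))
# A periodic signal revisits its states, so the plot shows diagonal line
# structure; the main diagonal is always recurrent (zero self-distance).
```

For real speech, the texture of this image (dense regions, diagonal lines) reflects the nonlinear dynamics of the vocal tract, which is what makes it usable as a speaker-discriminating representation.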
Topics: Humans; Pharynx; Speech; Nonlinear Dynamics; Male; Female; Speech Acoustics; Bone and Bones; Adult
PubMed: 38822054
DOI: 10.1038/s41598-024-62406-3