PLoS ONE, 2023
Speech deepfakes are artificial voices generated by machine learning models. Previous literature has highlighted deepfakes as one of the biggest security threats arising from progress in artificial intelligence due to their potential for misuse. However, studies investigating human detection capabilities are limited. We presented genuine and deepfake audio to n = 529 individuals and asked them to identify the deepfakes. We ran our experiments in English and Mandarin to understand if language affects detection performance and decision-making rationale. We found that detection capability is unreliable. Listeners only correctly spotted the deepfakes 73% of the time, and there was no difference in detectability between the two languages. Increasing listener awareness by providing examples of speech deepfakes only improves results slightly. As speech synthesis algorithms improve and become more realistic, we can expect the detection task to become harder. The difficulty of detecting speech deepfakes confirms their potential for misuse and signals that defenses against this threat are needed.
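The headline 73% figure invites a quick sanity check against chance performance. Below is a minimal sketch of an exact binomial test; the trial counts are hypothetical, not the study's raw data.

```python
from math import comb

def binomial_sf(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): probability of getting at
    least k correct answers if the listener were guessing at random."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical example: 73 correct out of 100 deepfake-detection trials.
n_trials, n_correct = 100, 73
accuracy = n_correct / n_trials
p_value = binomial_sf(n_correct, n_trials)

print(f"accuracy = {accuracy:.2f}, p vs. chance = {p_value:.2e}")
```

Better-than-chance performance and "unreliable" detection are compatible: 73% accuracy is far from guessing, but it still means roughly one in four deepfakes goes unnoticed.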
Topics: Humans; Speech; Artificial Intelligence; Phonetics; Speech Perception; Language
PubMed: 37531336
DOI: 10.1371/journal.pone.0285333
Modifications of auditory feedback and its effects on the voice of adult subjects: a scoping review. CoDAS, 2023
Review
INTRODUCTION
The auditory perception of voice and its production involve auditory feedback, kinesthetic cues, and the feedforward system, which produce different effects on the voice. The Lombard, sidetone, and pitch-shift-reflex effects are the most studied. Mapping scientific experiments on altered auditory feedback for voice motor control makes it possible to examine the existing literature on the phenomenon and may contribute to voice training or therapy.
PURPOSE
To map experiments and research results with manipulation of auditory feedback for voice motor control in adults.
METHOD
Scoping review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist, to answer the question: "What are the investigation methods and main research findings on the manipulation of auditory feedback in voice self-monitoring of adults?". The search protocol was based on the Population, Concept, and Context (PCC) mnemonic strategy, in which the population is adult individuals, the concept is the manipulation of auditory feedback, and the context is voice motor control. Articles were searched in the following databases: BVS/Virtual Health Library, MEDLINE/Medical Literature Analysis and Retrieval System Online, COCHRANE, CINAHL/Cumulative Index to Nursing and Allied Health Literature, SCOPUS, and WEB OF SCIENCE.
RESULTS
Sixty articles were found: 19 on the Lombard effect, 25 on the pitch-shift reflex, 12 on the sidetone effect, and 4 on the combined sidetone/Lombard effect. The studies agree that inserting a noise that masks auditory feedback causes an increase in the individual's speech intensity, and that amplification of auditory feedback promotes a reduction of the sound pressure level in voice production. A reflex response to pitch shifts in auditory feedback is consistently observed, though with characteristics particular to each study.
CONCLUSION
The materials and methods of the experiments differ, tasks are not standardized, and samples are varied and often small. This methodological diversity makes it difficult to generalize the results. The main findings of research on auditory feedback in voice motor control confirm that under suppression of auditory feedback, individuals tend to increase vocal intensity. Under amplification of auditory feedback, individuals decrease intensity and show greater control over fundamental frequency, and under frequency manipulations, individuals tend to compensate for the manipulation. The few studies with dysphonic individuals show that they behave differently from non-dysphonic individuals.
Topics: Adult; Humans; Feedback; Pitch Perception; Voice; Speech; Auditory Perception
PubMed: 38126424
DOI: 10.1590/2317-1782/20232022202pt
The Journal of Neuroscience, Aug 2023
Hearing impairment affects many older adults but is often diagnosed decades after speech comprehension in noisy situations has become effortful. Accurate assessment of listening effort may thus help diagnose hearing impairment earlier. However, pupillometry, the most used approach to assessing listening effort, has limitations that hinder its use in practice. The current study explores a novel way to assess listening effort through eye movements. Building on cognitive and neurophysiological work, we examine the hypothesis that eye movements decrease when speech listening becomes challenging. In three experiments with human participants from both sexes, we demonstrate, consistent with this hypothesis, that fixation duration increases and spatial gaze dispersion decreases with increasing speech masking. Eye movements decreased during effortful speech listening across different visual scenes (free viewing, object tracking) and speech materials (simple sentences, naturalistic stories). In contrast, pupillometry was less sensitive to speech masking during story listening, suggesting that pupillometric measures may not be as effective for the assessment of listening effort in naturalistic speech-listening paradigms. Our results reveal a critical link between eye movements and cognitive load, suggesting that neural activity in the brain regions that support the regulation of eye movements, such as the frontal eye field and superior colliculus, is modulated when listening is effortful.
SIGNIFICANCE STATEMENT
Assessment of listening effort is critical for early diagnosis of age-related hearing loss. Pupillometry is the most used approach but has several disadvantages. The current study explores a novel way to assess listening effort through eye movements. We examine the hypothesis that eye movements decrease when speech listening becomes effortful. We demonstrate, consistent with this hypothesis, that fixation duration increases and gaze dispersion decreases with increasing speech masking. Eye movements decreased during effortful speech listening for different visual scenes (free viewing, object tracking) and speech materials (sentences, naturalistic stories). Our results reveal a critical link between eye movements and cognitive load, suggesting that neural activity in brain regions that support the regulation of eye movements is modulated when listening is effortful.
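Spatial gaze dispersion is straightforward to compute from raw gaze samples. Below is a toy sketch, measuring dispersion as the root-mean-square distance from the gaze centroid; the coordinates are invented illustration data, not from the study.

```python
from math import dist

def gaze_dispersion(samples):
    """Spatial gaze dispersion: root-mean-square distance of gaze
    samples (x, y) from their centroid, in screen units."""
    cx = sum(x for x, _ in samples) / len(samples)
    cy = sum(y for _, y in samples) / len(samples)
    return (sum(dist((x, y), (cx, cy)) ** 2 for x, y in samples) / len(samples)) ** 0.5

# Hypothetical data: gaze roams widely in easy listening, stays put under masking.
easy = [(100, 100), (400, 120), (250, 300), (500, 80), (150, 350)]
hard = [(300, 200), (305, 198), (298, 205), (302, 199), (299, 202)]

print(gaze_dispersion(easy) > gaze_dispersion(hard))  # wider scatter when listening is easy
```

The paper's finding corresponds to the `hard` pattern: under heavy speech masking, gaze clusters tightly and fixations lengthen.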
Topics: Male; Female; Humans; Aged; Speech; Eye Movements; Speech Perception; Auditory Perception; Noise; Speech Intelligibility
PubMed: 37491313
DOI: 10.1523/JNEUROSCI.0240-23.2023
European Journal of Medical Research, Jul 2023
Review
Multiple sclerosis (MS) is a chronic inflammatory and demyelinating autoimmune disease. MS patients deal with motor and sensory impairments, visual disabilities, cognitive disorders, and speech and language deficits. The study aimed to record, update, and deepen our present comprehension of the speech deficits observed in patients with MS and the methodologies (assessment tools) the studies followed. The method was a literature search of databases covering May 2015 to June 2022. The reviewed studies offer insight into the speech impairments most commonly exhibited by MS patients. Patients with MS face numerous communication changes concerning the phonation system (changes in speech rate, longer pause durations, and lower volume). Moreover, the articulation system was affected by a lack of muscle synchronization and inaccurate pronunciation, mainly of vowels. Finally, there are changes in prosody (MS patients exhibited monotonous speech). Findings indicated that MS patients experience communication changes across various domains. Based on the reviewed studies, we concluded that the speech system of MS patients is impaired to some extent and that patients face many changes that impact their conversational ability, producing slower and less accurate speech. These changes can affect MS patients' quality of life.
Topics: Humans; Multiple Sclerosis; Speech; Quality of Life; Autoimmune Diseases; Cognition Disorders
PubMed: 37488623
DOI: 10.1186/s40001-023-01230-3
Journal of Psychiatry & Neuroscience, 2023
BACKGROUND
Delirium is a critically underdiagnosed syndrome of altered mental status affecting more than 50% of older adults admitted to hospital. Few studies have incorporated speech and language disturbance in delirium detection. We sought to describe speech and language disturbances in delirium, and provide a proof of concept for detecting delirium using computational speech and language features.
METHODS
Participants underwent delirium assessment and completed language tasks. Speech and language disturbances were rated using standardized clinical scales. Recordings and transcripts were processed using an automated pipeline to extract acoustic and textual features. We used binomial elastic net machine learning models to predict delirium status.
RESULTS
We included 33 older adults admitted to hospital, of whom 10 met criteria for delirium. The group with delirium scored higher on total language disturbances and incoherence, and lower on category fluency. Both groups scored lower on category fluency than the normative population. Cognitive dysfunction as a continuous measure was correlated with higher total language disturbance, incoherence, loss of goal and lower category fluency. Including computational language features in the model predicting delirium status increased accuracy to 78%.
LIMITATIONS
This was a proof-of-concept study with limited sample size, without a set-aside cross-validation sample. Subsequent studies are needed before establishing a generalizable model for detecting delirium.
CONCLUSION
Language impairments were elevated among patients with delirium and may also be used to identify subthreshold cognitive disturbances. Computational speech and language features are promising as accurate, noninvasive and efficient biomarkers of delirium.
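The study's actual acoustic and textual pipeline is not specified here, but two features in the same spirit are easy to illustrate. Below is a toy sketch of hypothetical transcript measures, a lexical-diversity proxy and a category-fluency count; neither is the study's implementation.

```python
import re

def type_token_ratio(transcript: str) -> float:
    """Lexical diversity: unique words / total words. Low values can
    accompany perseverative or incoherent speech (a toy proxy only)."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return len(set(words)) / len(words) if words else 0.0

def category_fluency_score(responses, category_members) -> int:
    """Count distinct valid exemplars named in a category-fluency task."""
    return len({r.lower() for r in responses} & category_members)

# Hypothetical category set and responses, for illustration only.
animals = {"dog", "cat", "horse", "lion", "zebra", "cow", "sheep"}
print(type_token_ratio("the the the patient patient was was here"))
print(category_fluency_score(["Dog", "cat", "dog", "chair"], animals))
```

Features like these would be one column each in the matrix fed to the elastic net classifier described in the methods.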
Topics: Humans; Aged; Speech; Language; Cognitive Dysfunction; Delirium
PubMed: 37402579
DOI: 10.1503/jpn.230026
CoDAS, 2023
Dynamic vocal analysis (DVA) is an auditory-perceptual and acoustic vocal assessment strategy that provides estimates of the biomechanics and aerodynamics of vocal production through frequency- and intensity-variation tasks and acoustic voice spectrography. The objective of this experience report is to demonstrate the use of DVA in assessing the vocal functionality of dysphonic and non-dysphonic individuals, with a special focus on the laryngeal musculature. Phonatory tasks consisted of a sustained vowel, "a" or "é", and/or connected speech, at three intensities (habitual, soft, and loud) and three frequencies (habitual, high, and low), as well as ascending and descending glissandi. The adjustments of the laryngeal and paralaryngeal muscles can be inferred from the different DVA tasks. The main characteristics of the laryngeal muscles analyzed are control of glottic adduction and the stretching and shortening of the vocal folds; the main characteristics of the paralaryngeal musculature relate chiefly to the vertical position of the larynx in the neck. While the sustained vowel evaluates vocal functionality with a focus on the larynx, connected speech allows evaluation of the articulatory adjustments employed. Acoustic spectrography software can be used to visualize the performance of these tasks. The clinical application of DVA is exemplified using spectrograms of normal and dysphonic voices taken from a voice bank. Individuals who perform the DVA tasks in a balanced way, with adequate vocal quality and without phonatory effort, demonstrate good vocal functionality. Conversely, difficulty performing these tasks, with worsening vocal quality and/or increased muscle tension, may indicate altered vocal functionality.
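The frequency tasks in DVA track the fundamental frequency (f0) over time, which spectrography software does in practice. As a rough illustration of the underlying measurement, below is a toy autocorrelation-based f0 estimate applied to a synthesized pure tone; the signal and parameters are invented, not real voice data.

```python
from math import sin, pi

def estimate_f0(signal, sample_rate):
    """Toy fundamental-frequency estimate via autocorrelation: find the
    lag (above a minimum, here corresponding to f0 <= 500 Hz) at which
    the signal is most similar to a shifted copy of itself."""
    n = len(signal)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(sample_rate // 500, n // 2):
        corr = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag

# Synthesized 220 Hz stand-in for a sustained vowel (a pure tone, not speech).
sr = 8000
tone = [sin(2 * pi * 220 * t / sr) for t in range(800)]
print(round(estimate_f0(tone, sr)))
```

Running the same estimate over successive windows of a glissando recording would trace the ascending or descending f0 contour that the DVA tasks elicit.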
Topics: Humans; Voice; Speech; Phonation; Voice Quality; Larynx
PubMed: 37729254
DOI: 10.1590/2317-1782/20232021083pt
Internal Medicine (Tokyo, Japan) Dec 2023
Topics: Humans; Speech; Tachycardia; Syncope; Tilt-Table Test
PubMed: 37062741
DOI: 10.2169/internalmedicine.1737-23
Journal of Medical Internet Research, Sep 2023
Review
BACKGROUND
Conversational agents (CAs), also known as chatbots, are digital dialog systems that enable people to have a text-based, speech-based, or nonverbal conversation with a computer or another machine based on natural language via an interface. The use of CAs offers new opportunities and various benefits for health care. However, they are not yet ubiquitous in daily practice. Nevertheless, research regarding the implementation of CAs in health care has grown tremendously in recent years.
OBJECTIVE
This review aims to present a synthesis of the factors that facilitate or hinder the implementation of CAs from the perspectives of patients and health care professionals. Specifically, it focuses on the early implementation outcomes of acceptability, acceptance, and adoption as cornerstones of later implementation success.
METHODS
We performed an integrative review. To identify relevant literature, a broad literature search was conducted in June 2021 with no date limits and using all fields in PubMed, Cochrane Library, Web of Science, LIVIVO, and PsycINFO. To keep the review current, another search was conducted in March 2022. To identify as many eligible primary sources as possible, we used a snowballing approach by searching reference lists and conducted a hand search. Factors influencing the acceptability, acceptance, and adoption of CAs in health care were coded through parallel deductive and inductive approaches, which were informed by current technology acceptance and adoption models. Finally, the factors were synthesized in a thematic map.
RESULTS
Overall, 76 studies were included in this review. We identified influencing factors related to 4 core Unified Theory of Acceptance and Use of Technology (UTAUT) and Unified Theory of Acceptance and Use of Technology 2 (UTAUT2) factors (performance expectancy, effort expectancy, facilitating conditions, and hedonic motivation), with most studies underlining the relevance of performance and effort expectancy. To meet the particularities of the health care context, we redefined the UTAUT2 factors social influence, habit, and price value. We identified 6 other influencing factors: perceived risk, trust, anthropomorphism, health issue, working alliance, and user characteristics. Overall, we identified 10 factors influencing acceptability, acceptance, and adoption among health care professionals (performance expectancy, effort expectancy, facilitating conditions, social influence, price value, perceived risk, trust, anthropomorphism, working alliance, and user characteristics) and 13 factors influencing acceptability, acceptance, and adoption among patients (additionally hedonic motivation, habit, and health issue).
CONCLUSIONS
This review shows manifold factors influencing the acceptability, acceptance, and adoption of CAs in health care. Knowledge of these factors is fundamental for implementation planning. Therefore, the findings of this review can serve as a basis for future studies to develop appropriate implementation strategies. Furthermore, this review provides an empirical test of current technology acceptance and adoption models and identifies areas where additional research is necessary.
TRIAL REGISTRATION
PROSPERO CRD42022343690; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=343690.
Topics: Humans; Communication; Language; Habits; Speech; Delivery of Health Care
PubMed: 37751279
DOI: 10.2196/46548
Hearing Research, Sep 2023
Review
Direct neural recordings from human auditory cortex have demonstrated encoding for acoustic-phonetic features of consonants and vowels. Neural responses also encode distinct acoustic amplitude cues related to timing, such as those that occur at the onset of a sentence after a silent period or the onset of the vowel in each syllable. Here, we used a group reduced rank regression model to show that distributed cortical responses support a low-dimensional latent state representation of temporal context in speech. The timing cues each capture more unique variance than all other phonetic features and exhibit rotational or cyclical dynamics in latent space from activity that is widespread over the superior temporal gyrus. We propose that these spatially distributed timing signals could serve to provide temporal context for, and possibly bind across time, the concurrent processing of individual phonetic features, to compose higher-order phonological (e.g. word-level) representations.
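The group reduced rank regression model in the paper is more elaborate than can be shown here, but the basic reduced-rank idea can be sketched: fit ordinary least squares, then project the coefficient matrix onto a low-dimensional latent space. The feature and response matrices below are random stand-ins for stimulus/timing features and electrode responses, for illustration only.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Basic reduced rank regression: fit OLS, then constrain the
    coefficient matrix to the top-`rank` components of the fitted values."""
    B_ols = np.linalg.pinv(X) @ Y           # p x q full-rank solution
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]             # q x q projection onto top components
    return B_ols @ P                        # p x q matrix of rank <= `rank`

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))          # stand-in stimulus/timing features
Y = X @ rng.standard_normal((10, 50)) + 0.1 * rng.standard_normal((200, 50))  # stand-in responses
B = reduced_rank_regression(X, Y, rank=3)
print(B.shape, np.linalg.matrix_rank(B))
```

The low-rank constraint is what yields the "low-dimensional latent state" interpretation: all electrodes' responses are explained through a handful of shared components, whose trajectories over time can then exhibit the rotational dynamics described above.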
Topics: Humans; Speech; Speech Perception; Temporal Lobe; Auditory Cortex; Phonetics; Acoustic Stimulation
PubMed: 37441880
DOI: 10.1016/j.heares.2023.108838
Internal Medicine (Tokyo, Japan), Sep 2023
Speech-induced atrial tachycardia (AT) with presyncope is extremely rare. A 52-year-old woman employed at a supermarket reported recurrent presyncope while speaking out loud at her job. Holter electrocardiography revealed AT while swallowing without presyncope. The patient's blood pressure decreased during AT, and she experienced presyncope while saying "IRASSHAIMASE" loudly during a tilt table test. Accordingly, bisoprolol 1.25 mg was prescribed, and the patient did not experience episodes of presyncope with recurrence of AT for 2 years. This case suggests that provocation of arrhythmia in the tilting position may be useful for demonstrating a relationship between arrhythmia and presyncope and/or syncope.
Topics: Female; Humans; Middle Aged; Speech; Syncope; Tachycardia, Supraventricular; Arrhythmias, Cardiac; Tilt-Table Test
PubMed: 36575016
DOI: 10.2169/internalmedicine.1028-22