International Journal of... Aug 2017
Review
Understanding of the behavioural, cognitive and neural underpinnings of speech production is of interest theoretically, and is important for understanding disorders of speech production and how to assess and treat such disorders in the clinic. This paper addresses two claims about the neuromotor control of speech production: (1) speech is subserved by a distinct, specialised motor control system and (2) speech is holistic and cannot be decomposed into smaller primitives. Both claims have gained traction in recent literature, and are central to a task-dependent model of speech motor control. The purpose of this paper is to stimulate thinking about speech production, its disorders and the clinical implications of these claims. The paper poses several conceptual and empirical challenges for these claims - including the critical importance of defining speech. The emerging conclusion is that a task-dependent model is called into question as its two central claims are founded on ill-defined and inconsistently applied concepts. The paper concludes with discussion of methodological and clinical implications, including the potential utility of diadochokinetic (DDK) tasks in assessment of motor speech disorders and the contraindication of nonspeech oral motor exercises to improve speech function.
Topics: Humans; Speech
PubMed: 27701907
DOI: 10.1080/17549507.2016.1221995
The Journal of the Acoustical Society... Feb 2020
The current study investigated how partial speech and text information, distributed at various interruption rates, is combined to support sentence recognition in quiet. Speech and text stimuli were interrupted by silence and presented unimodally or combined in multimodal conditions. Across all conditions, performance was best at the highest interruption rates. Listeners were able to gain benefit from most multimodal presentations, even when the rate of interruption was mismatched between modalities. Supplementing partial speech with incomplete visual cues can improve sentence intelligibility and compensate for degraded speech in adverse listening conditions. However, individual variability in benefit depends on unimodal performance.
Topics: Acoustic Stimulation; Cues; Recognition, Psychology; Speech; Speech Intelligibility; Speech Perception
PubMed: 32113272
DOI: 10.1121/10.0000748
Annals of the New York Academy of... May 2021
It is commonly understood that hand gesture and speech coordination in humans is culturally and cognitively acquired, rather than having a biological basis. Recently, however, the biomechanical physical coupling of arm movements to speech vocalization has been studied in steady-state vocalization and monosyllabic utterances, where forces produced during gesturing are transferred onto the tensioned body, leading to changes in respiratory-related activity and thereby affecting vocalization F0 and intensity. In the current experiment (n = 37), we extend this previous line of work to show that gesture-speech physics also impacts fluent speech. Compared with nonmovement, participants who are producing fluent self-formulated speech while rhythmically moving their limbs demonstrate heightened F0 and amplitude envelope, and such effects are more pronounced for higher-impulse arm versus lower-impulse wrist movement. We replicate that acoustic peaks arise especially during moments of peak impulse (i.e., the beat) of the movement, namely around deceleration phases of the movement. Finally, higher deceleration rates of higher-mass arm movements were related to higher peaks in acoustics. These results confirm a role for physical impulses of gesture affecting the speech system. We discuss the implications of gesture-speech physics for understanding of the emergence of communicative gesture, both ontogenetically and phylogenetically.
Topics: Adolescent; Biomechanical Phenomena; Female; Gestures; Humans; Male; Motion Perception; Movement; Speech; Speech Acoustics; Young Adult
PubMed: 33336809
DOI: 10.1111/nyas.14532
Annals of the New York Academy of... Oct 2019
Review
Why does human speech have rhythm? As we cannot travel back in time to witness how speech developed its rhythmic properties and why humans have the cognitive skills to process them, we rely on alternative methods to find out. One powerful tool is the comparative approach: studying the presence or absence of cognitive/behavioral traits in other species to determine which traits are shared between species and which are recent human inventions. Vocalizations of many species exhibit temporal structure, but little is known about how these rhythmic structures evolved, are perceived and produced, their biological and developmental bases, and communicative functions. We review the literature on rhythm in speech and animal vocalizations as a first step toward understanding similarities and differences across species. We extend this review to quantitative techniques that are useful for computing rhythmic structure in acoustic sequences and hence facilitate cross-species research. We report links between vocal perception and motor coordination and the differentiation of rhythm based on hierarchical temporal structure. While still far from a complete cross-species perspective of speech rhythm, our review puts some pieces of the puzzle together.
Topics: Animals; Biological Evolution; Humans; Language; Periodicity; Speech; Vocalization, Animal
PubMed: 31237365
DOI: 10.1111/nyas.14166
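The review above mentions quantitative techniques for computing rhythmic structure in acoustic sequences. One widely used measure of durational variability is the normalized Pairwise Variability Index (nPVI), sketched below. This is an illustrative sketch of the general technique, not code from the paper; the input is assumed to be a list of successive interval durations (e.g., vocalic intervals or call durations) in seconds.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index (nPVI) of a sequence of
    successive interval durations: the mean absolute difference between
    adjacent intervals, normalized by their local mean, times 100."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    diffs = [
        abs(d1 - d2) / ((d1 + d2) / 2.0)
        for d1, d2 in zip(durations, durations[1:])
    ]
    return 100.0 * sum(diffs) / len(diffs)

# A perfectly isochronous sequence scores 0; strict long-short
# alternation scores high.
print(npvi([0.2, 0.2, 0.2, 0.2]))            # -> 0.0
print(round(npvi([0.1, 0.3, 0.1, 0.3]), 1))  # -> 100.0
```

Higher nPVI values indicate greater contrast between adjacent durations, which is one way to compare rhythmic structure across speech samples or species' vocalizations.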
Neurotherapeutics: the Journal of the... Jan 2022
Review
Damage or degeneration of motor pathways necessary for speech and other movements, as in brainstem strokes or amyotrophic lateral sclerosis (ALS), can interfere with efficient communication without affecting brain structures responsible for language or cognition. In the worst-case scenario, this can result in locked-in syndrome (LIS), a condition in which individuals cannot initiate communication and can only express themselves by answering yes/no questions with eye blinks or other rudimentary movements. Existing augmentative and alternative communication (AAC) devices that rely on eye tracking can improve the quality of life for people with this condition, but brain-computer interfaces (BCIs) are also increasingly being investigated as AAC devices, particularly when eye tracking is too slow or unreliable. Moreover, with recent and ongoing advances in machine learning and neural recording technologies, BCIs may offer the only means to go beyond cursor control and text generation on a computer, to allow real-time synthesis of speech, which would arguably offer the most efficient and expressive channel for communication. The potential for BCI speech synthesis has only recently been realized because of seminal studies of the neuroanatomical and neurophysiological underpinnings of speech production using intracranial electrocorticographic (ECoG) recordings in patients undergoing epilepsy surgery. These studies have shown that cortical areas responsible for vocalization and articulation are distributed over a large area of ventral sensorimotor cortex, and that it is possible to decode speech and reconstruct its acoustics from ECoG if these areas are recorded with sufficiently dense and comprehensive electrode arrays. In this article, we review these advances, including the latest neural decoding strategies that range from deep learning models to the direct concatenation of speech units. We also discuss state-of-the-art vocoders that are integral in constructing natural-sounding audio waveforms for speech BCIs. Finally, this review outlines some of the challenges ahead in directly synthesizing speech for patients with LIS.
Topics: Brain-Computer Interfaces; Communication; Electrocorticography; Humans; Quality of Life; Speech
PubMed: 35099768
DOI: 10.1007/s13311-022-01190-2
Hearing Research Dec 2022
Review
Speech intelligibility models can provide insights regarding the auditory processes involved in human speech perception and communication. One successful approach to modelling speech intelligibility has been based on the analysis of the amplitude modulations present in speech as well as competing interferers. This review covers speech intelligibility models that include a modulation-frequency selective processing stage, i.e., a modulation filterbank, as part of their front end. The speech-based envelope power spectrum model [sEPSM, Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130(3), 1475-1487], several variants of the sEPSM including modifications with respect to temporal resolution, spectro-temporal processing and binaural processing, as well as the speech-based computational auditory signal processing and perception model [sCASP; Relaño-Iborra et al. (2019). J. Acoust. Soc. Am. 146(5), 3306-3317], which is based on an established auditory signal detection and masking model, are discussed. The key processing stages of these models for the prediction of speech intelligibility across a variety of acoustic conditions are addressed in relation to competing modeling approaches. The strengths and weaknesses of the modulation-based analysis are outlined and perspectives presented, particularly in connection with the challenge of predicting the consequences of individual hearing loss on speech intelligibility.
Topics: Humans; Speech Intelligibility; Perceptual Masking; Speech Acoustics; Auditory Threshold; Speech Perception; Acoustic Stimulation
PubMed: 36163219
DOI: 10.1016/j.heares.2022.108610
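To illustrate the modulation-analysis idea behind the models reviewed above, the toy sketch below computes the power of a temporal envelope in octave-spaced modulation bands via a direct single-sided DFT. This is a deliberately simplified illustration of envelope power analysis, not the published sEPSM; the 100 Hz envelope sample rate, 4 Hz sinusoidal modulation, and band edges are assumed for the example.

```python
import math

def envelope_power_in_band(env, fs, f_lo, f_hi):
    """Envelope power in the modulation band [f_lo, f_hi) Hz, normalized
    by DC power, via a direct single-sided DFT of the mean-removed
    envelope (so a modulation of depth m yields (m/2)**2)."""
    n = len(env)
    mean = sum(env) / n
    ac = [e - mean for e in env]
    power = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if f_lo <= f < f_hi:
            re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(ac))
            im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(ac))
            power += re * re + im * im
    return power / (n * n * mean * mean)

fs = 100.0                        # envelope sample rate, Hz (assumed)
t = [i / fs for i in range(200)]  # 2 s of envelope samples
env = [1.0 + 0.5 * math.cos(2 * math.pi * 4.0 * ti) for ti in t]  # 4 Hz modulation
bands = [(1, 2), (2, 4), (4, 8), (8, 16)]  # octave-spaced modulation bands, Hz
powers = [envelope_power_in_band(env, fs, lo, hi) for lo, hi in bands]
print([round(p, 4) for p in powers])  # -> [0.0, 0.0, 0.0625, 0.0]
```

The energy lands in the 4-8 Hz band, the range where speech envelopes carry most of their modulation energy; a full model would apply such an analysis per audio-frequency channel and compare target and interferer envelope power.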
Scientific Data Jul 2022
Speech production is an intricate process involving a large number of muscles and cognitive processes. The neural processes underlying speech production are not completely understood. As speech is a uniquely human ability, it cannot be investigated in animal models. High-fidelity human data can only be obtained in clinical settings and is therefore not easily available to all researchers. Here, we provide a dataset of 10 participants reading out individual words while we measured intracranial EEG from a total of 1103 electrodes. The data, with its high temporal resolution and coverage of a large variety of cortical and sub-cortical brain regions, can help to improve understanding of the speech production process. Simultaneously, the data can be used to test speech decoding and synthesis approaches from neural data to develop speech Brain-Computer Interfaces and speech neuroprostheses.
Topics: Electrocorticography; Electroencephalography; Humans; Reading; Speech
PubMed: 35869138
DOI: 10.1038/s41597-022-01542-9
Journal of Speech, Language, and... Mar 2020
Purpose
This study examined differences in selected acoustic measures of speech and voice according to age and sex and across families.
Method
Participants included 169 individuals, 79 men and 90 women, from 18 families, ranging in age from 17 to 87 years. Participants reported no history of articulation disorders, stroke or active neurologic disease, or severe-to-profound hearing loss. They read aloud two passages to facilitate examination of the following speech and voice acoustic parameters: fricative spectral moments (center of gravity, standard deviation, skewness, and kurtosis), the proportion of time spent speaking, mean speaking fundamental frequency, semitone standard deviation (STSD), and cepstral peak prominence smoothed.
Results
The results indicated a significant age effect for fricative spectral center of gravity, spectral skewness, and speaking STSD. There was a significant sex effect for spectral center of gravity, spectral kurtosis, and mean fundamental frequency. Familial relationship was significant for spectral skewness, STSD, and cepstral peak prominence smoothed.
Conclusions
These findings revealed that certain speech and voice features change with age and some change differently for men and women. Additionally, speakers from the same family units may demonstrate similar patterns for prosody, voicing, and articulatory behavior. The results also demonstrated normal differences in speech and voice variation across age, sex, and family unit. Understanding patterns and differences across these demographic variables in healthy speakers is important to distinguishing more confidently between normal and disordered speech and voice patterns clinically.
Topics: Acoustics; Adolescent; Adult; Aged; Aged, 80 and over; Female; Humans; Male; Middle Aged; Speech; Speech Acoustics; Speech Production Measurement; Voice; Young Adult
PubMed: 32097060
DOI: 10.1044/2019_JSLHR-19-00028
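The fricative spectral moments measured in the study above (center of gravity, standard deviation, skewness, and kurtosis) are the first four statistical moments of a power spectrum treated as a probability distribution over frequency. The sketch below shows that computation on an assumed toy spectrum; it is an illustration of the standard definitions, not the study's analysis pipeline.

```python
import math

def spectral_moments(freqs_hz, power):
    """Center of gravity, standard deviation, skewness, and excess
    kurtosis of a power spectrum, treating the normalized power values
    as a probability distribution over frequency."""
    total = sum(power)
    p = [x / total for x in power]                          # normalize to sum to 1
    cog = sum(f * w for f, w in zip(freqs_hz, p))           # center of gravity (mean)
    var = sum(w * (f - cog) ** 2 for f, w in zip(freqs_hz, p))
    sd = math.sqrt(var)
    skew = sum(w * (f - cog) ** 3 for f, w in zip(freqs_hz, p)) / var ** 1.5
    kurt = sum(w * (f - cog) ** 4 for f, w in zip(freqs_hz, p)) / var ** 2 - 3.0
    return cog, sd, skew, kurt

# Toy symmetric spectrum: center of gravity at 5000 Hz, zero skewness.
cog, sd, skew, kurt = spectral_moments([4000, 5000, 6000], [1.0, 2.0, 1.0])
print(cog, round(sd, 1), skew, kurt)  # -> 5000.0 707.1 0.0 -1.0
```

On real fricative tokens the frequency axis and power values would come from an FFT of a windowed segment; a high center of gravity and negative skewness, for instance, are typical of /s/ relative to /ʃ/.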
American Journal of Speech-language... Aug 2020
Review
Purpose
Literature was reviewed on the development of vowels in children's speech and on vowel disorders in children and adults, with an emphasis on studies using acoustic methods.
Method
Searches were conducted with PubMed/MEDLINE, Google Scholar, CINAHL, HighWire Press, and legacy sources in retrieved articles. The primary search items included, but were not limited to, vowels, vowel development, vowel disorders, vowel formants, vowel therapy, vowel inherent spectral change, speech rhythm, and prosody.
Results/Discussion
The main conclusions reached in this review are that vowels are (a) important to speech intelligibility; (b) intrinsically dynamic; (c) refined in both perceptual and productive aspects beyond the age typically given for their phonetic mastery; (d) produced to compensate for articulatory and auditory perturbations; (e) influenced by language and dialect even in early childhood; (f) affected by a variety of speech, language, and hearing disorders in children and adults; (g) inadequately assessed by standardized articulation tests; and (h) characterized by at least three factors: articulatory configuration, extrinsic and intrinsic regulation of duration, and role in speech rhythm and prosody. Also discussed are stages in typical vowel ontogeny, acoustic characterization of rhotic vowels, a sensory-motor perspective on vowel production, and implications for clinical assessment of vowels.
Topics: Acoustics; Adult; Child; Child, Preschool; Humans; Language; Phonetics; Speech; Speech Acoustics; Speech Intelligibility; Speech Perception
PubMed: 32631070
DOI: 10.1044/2020_AJSLP-19-00178
Journal of Integrative Neuroscience Aug 2022
BACKGROUND
Motor speech treatment approaches have been applied in both adults with aphasia and apraxia of speech and children with speech-sound disorders. Identifying links between motor speech intervention techniques and the modes of action (MoA) targeted would improve our understanding of how and why motor speech interventions achieve their effects, along with identifying its effective components. The current study focuses on identifying potential MoAs for a specific motor speech intervention technique.
OBJECTIVES
We aim to demonstrate that somatosensory inputs can influence lexical processing, thus providing further evidence that linguistic information stored in the brain and accessed as part of speech perception processes encodes information related to speech production.
METHODS
In a cross-modal repetition priming paradigm, we examined whether the processing of external somatosensory priming cues was modulated by both word-level (lexical frequency, low- or high-frequency) and speech sound articulatory features. The study participants were divided into two groups. The first group consisted of twenty-three native English speakers who received somatosensory priming stimulation to their oro-facial structures (either to labial corners or under the jaw). The second group consisted of ten native English speakers who participated in a control study where somatosensory priming stimulation was applied to their right or left forehead as a control condition.
RESULTS
The results showed significant somatosensory priming effects for the low-frequency words, where the congruent somatosensory condition yielded significantly shorter reaction times and numerically higher phoneme accuracy scores when compared to the incongruent somatosensory condition. Data from the control study did not reveal any systematic priming effects from forehead stimulation (non-speech related site), other than a general (and expected) tendency for longer reaction times with low-frequency words.
CONCLUSIONS
These findings provide further support for the notion that speech production information is represented in the mental lexicon and can be accessed through exogenous, Speech-Language Pathologist-driven somatosensory inputs related to place of articulation.
Topics: Adult; Child; Humans; Language; Repetition Priming; Speech; Speech Perception
PubMed: 36137962
DOI: 10.31083/j.jin2105146