Hearing Research, Dec 2022 (Review)
Speech perception is strongly affected by noise and reverberation in the listening room, and binaural processing can substantially facilitate speech perception in conditions in which target speech and maskers originate from different directions. Most studies and proposed models for predicting spatial unmasking have focused on speech intelligibility. The present study introduces a model framework that predicts both speech intelligibility and perceived listening effort from the same output measure. The framework is based on a combination of a blind binaural processing frontend employing a blind equalization cancelation (EC) mechanism and a blind backend based on phoneme probability classification. Neither the frontend nor the backend requires any additional information, such as the source directions, the signal-to-noise ratio (SNR), or the number of sources, allowing for a fully blind perceptual assessment of binaural input signals consisting of target speech mixed with noise. The model is validated against a recent data set in which speech intelligibility and perceived listening effort were measured for a range of acoustic conditions differing in reverberation and binaural cues [Rennies and Kidd (2018), J. Acoust. Soc. Am. 144, 2147-2159]. Predictions of the proposed model are compared with those of a non-blind binaural model consisting of a non-blind EC stage and a backend based on the speech intelligibility index. The analyses indicated that all main trends observed in the experiments were correctly predicted by the blind model. The overall proportion of variance explained by the blind model for speech intelligibility (R² = 0.94) was slightly lower than that of the non-blind model (R² = 0.98). For listening effort predictions, both models showed lower prediction accuracy, but still explained significant proportions of the observed variance (R² = 0.88 and R² = 0.71 for the non-blind and blind model, respectively).
Closer inspection showed that the differences between data and predictions were largest for binaural conditions at high SNRs, where the perceived listening effort of human listeners tended to be underestimated by the models, specifically by the blind version.
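The equalization-cancellation (EC) idea at the core of both model frontends can be sketched numerically: delay and scale one ear's signal so that the masker cancels on subtraction, which boosts the SNR whenever the target carries different binaural cues than the noise. The sketch below is a hedged, non-blind toy version (it is given the clean target and noise separately, operates broadband, and omits the internal noise of real EC models); the function name and parameters are illustrative, not taken from the papers.

```python
import numpy as np

def ec_snr_gain(target_l, target_r, noise_l, noise_r, fs, max_delay_ms=1.0):
    """Toy equalization-cancellation (EC): find the interaural delay and
    gain that best cancel the noise, apply the same operation to the
    target, and report the resulting SNR improvement in dB."""
    max_shift = int(fs * max_delay_ms / 1000)
    best_p, best = np.inf, (0, 1.0)
    for shift in range(-max_shift, max_shift + 1):
        nr = np.roll(noise_r, shift)
        # least-squares gain that equalizes the noise across the two ears
        g = np.dot(noise_l, nr) / (np.dot(nr, nr) + 1e-12)
        p = np.mean((noise_l - g * nr) ** 2)
        if p < best_p:
            best_p, best = p, (shift, g)
    shift, g = best
    target_res = target_l - g * np.roll(target_r, shift)
    snr_in = np.mean(target_l ** 2) / np.mean(noise_l ** 2)
    snr_out = np.mean(target_res ** 2) / (best_p + 1e-12)
    return 10 * np.log10(snr_out / snr_in)
```

For an N0Sπ stimulus (identical noise in both ears, antiphasic target) the noise cancels almost completely and the predicted gain is large; for fully diotic stimuli the target would cancel too, which is one reason real EC models include internal noise that limits cancellation.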
Topics: Humans; Speech Intelligibility; Listening Effort; Noise; Speech Perception; Signal-To-Noise Ratio; Perceptual Masking
PubMed: 35995688
DOI: 10.1016/j.heares.2022.108598
IEEE Transactions on Haptics, 2021
Masking has been used to study human perception of tactile stimuli, including those created by electrovibration on touch screens. Earlier studies have investigated the effect of on-site masking on tactile perception of electrovibration. In this article, we investigated whether it is possible to change the absolute detection threshold and intensity difference threshold of electrovibration at the fingertip of the index finger via remote masking, i.e., by applying a (mechanical) vibrotactile stimulus on the proximal phalanx of the same finger. The masking stimuli were generated by a voice coil (the Haptuator). For 16 participants, we first measured the detection thresholds for electrovibration at the fingertip and for vibrotactile stimuli at the proximal phalanx. Then, the vibrations on the skin were measured at four different locations on the subjects' index fingers to investigate how the mechanical masking stimulus propagated as the masking level was varied. Later, masked absolute thresholds of eight participants were measured. Finally, for another group of eight participants, intensity difference thresholds were measured in the presence and absence of vibrotactile masking stimuli. Our results show that the vibrotactile masking stimuli generated sub-threshold vibrations around the fingertip and hence probably did not mechanically interfere with the electrovibration stimulus. However, there was a clear psychophysical masking effect due to central neural processes. We measured the effect of masking stimuli of up to 40 dB SL on the difference threshold at four different intensity standards of electrovibration. We proposed two models, an amplitude model and an energy model, based on hypothetical neural signals, for predicting the masking effect on intensity difference thresholds for electrovibration. The energy model predicted the effect of masking more accurately, especially at high masking levels.
Topics: Differential Threshold; Fingers; Humans; Perceptual Masking; Sensory Thresholds; Touch Perception; Vibration
PubMed: 32960768
DOI: 10.1109/TOH.2020.3025772
The Journal of the Acoustical Society..., Feb 2024
This study examined the role of visual speech in providing release from perceptual masking in children by comparing visual speech benefit across conditions with and without a spatial separation cue. Auditory-only and audiovisual speech recognition thresholds in a two-talker speech masker were obtained from 21 children with typical hearing (7-9 years of age) using a color-number identification task. The target was presented from a loudspeaker at 0° azimuth. Masker source location varied across conditions. In the spatially collocated condition, the masker was also presented from the loudspeaker at 0° azimuth. In the spatially separated condition, the masker was presented from the loudspeaker at 0° azimuth and a loudspeaker at -90° azimuth, with the signal from the -90° loudspeaker leading the signal from the 0° loudspeaker by 4 ms. The visual stimulus (static image or video of the target talker) was presented at 0° azimuth. Children achieved better thresholds when the spatial cue was provided and when the visual cue was provided. Visual and spatial cue benefit did not differ significantly depending on the presence of the other cue. Additional studies are needed to characterize how children's preferential use of visual and spatial cues varies depending on the strength of each cue.
Topics: Child; Humans; Perceptual Masking; Cues; Noise; Speech Perception; Hearing
PubMed: 38393738
DOI: 10.1121/10.0024766
The Journal of the Acoustical Society..., Oct 2023
Modern hearing research has identified the ability of listeners to segregate simultaneous speech streams by relying on three major voice cues: fundamental frequency, level, and location. Few of these studies evaluated reliance on these cues when presented simultaneously, as occurs in nature, and fewer still considered the listeners' relative reliance on these cues, owing to the cues' different units of measure. In the present study, trial-by-trial analyses were used to isolate the listeners' simultaneous reliance on the three voice cues, with the behavior of an ideal observer [Green and Swets (1966). (Wiley, New York), pp. 151-178] serving as a comparison standard for evaluating relative reliance. On each trial, listeners heard a pair of randomly selected, simultaneous recordings of naturally spoken sentences. One of the recordings was always from the same talker, a distracter, and the other, with equal probability, was from one of two target talkers differing in the three voice cues. The listener's task was to identify the target talker. Among 33 clinically normal-hearing adults, only one relied predominantly on voice level; the remainder were split between voice fundamental frequency and/or location. The results are discussed with regard to their implications for the common practice of using the target-distracter level difference as a dependent measure in studies of speech-on-speech masking.
Topics: Speech; Cues; Speech Perception; Perceptual Masking; Hearing
PubMed: 37870932
DOI: 10.1121/10.0021874
Hearing Research, Sep 2023
The relative contributions of superior temporal vs. inferior frontal and parietal networks to recognition of speech in a background of competing speech remain unclear, although the contributions themselves are well established. Here, we use fMRI with spectrotemporal modulation transfer function (ST-MTF) modeling to examine the speech information represented in temporal vs. frontoparietal networks for two speech recognition tasks with and without a competing talker. Specifically, 31 listeners completed two versions of a three-alternative forced choice competing speech task: "Unison" and "Competing", in which a female (target) and a male (competing) talker uttered identical or different phrases, respectively. Spectrotemporal modulation filtering (i.e., acoustic distortion) was applied to the two-talker mixtures and ST-MTF models were generated to predict brain activation from differences in spectrotemporal-modulation distortion on each trial. Three cortical networks were identified based on differential patterns of ST-MTF predictions and the resultant ST-MTF weights across conditions (Unison, Competing): a bilateral superior temporal (S-T) network, a frontoparietal (F-P) network, and a network distributed across cortical midline regions and the angular gyrus (M-AG). The S-T network and the M-AG network responded primarily to spectrotemporal cues associated with speech intelligibility, regardless of condition, but the S-T network responded to a greater range of temporal modulations suggesting a more acoustically driven response. The F-P network responded to the absence of intelligibility-related cues in both conditions, but also to the absence (presence) of target-talker (competing-talker) vocal pitch in the Competing condition, suggesting a generalized response to signal degradation. Task performance was best predicted by activation in the S-T and F-P networks, but in opposite directions (S-T: more activation = better performance; F-P: vice versa). 
Moreover, S-T network predictions were entirely ST-MTF mediated while F-P network predictions were ST-MTF mediated only in the Unison condition, suggesting an influence from non-acoustic sources (e.g., informational masking) in the Competing condition. Activation in the M-AG network was weakly positively correlated with performance and this relation was entirely superseded by those in the S-T and F-P networks. Regarding contributions to speech recognition, we conclude: (a) superior temporal regions play a bottom-up, perceptual role that is not qualitatively dependent on the presence of competing speech; (b) frontoparietal regions play a top-down role that is modulated by competing speech and scales with listening effort; and (c) performance ultimately relies on dynamic interactions between these networks, with ancillary contributions from networks not involved in speech processing per se (e.g., the M-AG network).
Topics: Male; Humans; Female; Speech; Speech Perception; Cognition; Cues; Acoustics; Speech Intelligibility; Perceptual Masking
PubMed: 37531847
DOI: 10.1016/j.heares.2023.108856
Vision Research, Jun 2024
Recent studies suggest that binocular adding (S+) and differencing (S-) channels play an important role in binocular vision. To test for such a role in the context of binocular contrast detection and binocular summation, we employed a surround masking paradigm consisting of a central target disk surrounded by a mask annulus. All stimuli were horizontally oriented 0.5 c/deg sinusoidal gratings. Correlated stimuli were identical in interocular spatial phase, while anticorrelated stimuli were opposite in interocular spatial phase. There were four target conditions (monocular left eye, monocular right eye, binocular correlated, and binocular anticorrelated) and three surround mask conditions (no surround, binocularly correlated, and binocularly anticorrelated). We observed consistent elevation of detection thresholds for monocular and binocular targets across the two binocular surround mask conditions. In addition, we found an interaction between the type of surround and the type of binocular target: both detection and summation were relatively enhanced by surround masks and targets with opposite interocular phase relationships, and reduced by surround masks and targets with the same interocular phase relationships. The data were reasonably well accounted for by a model of binocular combination termed MAX (S+S-), in which the decision variable is the probability summation of modeled S+ and S- channel responses, with a free parameter determining the relative gains of the two channels. Our results support the existence of two channels involved in binocular combination, S+ and S-, whose relative gains are adjustable by surround context.
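The MAX (S+S-) combination rule described above can be sketched as probability summation across the summing (S+) and differencing (S-) channel responses. The sketch below assumes a Weibull-style psychometric function; the gain, slope, and sensitivity parameters are illustrative placeholders, not the paper's fitted values.

```python
import numpy as np

def detect_prob(cL, cR, g_plus=1.0, g_minus=1.0, alpha=1.0, beta=4.0):
    """Probability summation of binocular summing (S+) and differencing
    (S-) channel responses to signed contrasts cL, cR in the two eyes."""
    r_plus = g_plus * abs(cL + cR) / 2.0    # S+ channel response
    r_minus = g_minus * abs(cL - cR) / 2.0  # S- channel response
    p = lambda r: 1.0 - np.exp(-((alpha * r) ** beta))  # Weibull-style link
    # target is detected if either channel detects it ("MAX" behavior
    # implemented as probability summation over the two channels)
    return 1.0 - (1.0 - p(r_plus)) * (1.0 - p(r_minus))
```

With equal gains, the model predicts identical sensitivity to correlated and anticorrelated binocular targets and binocular summation relative to monocular presentation; the surround-mask effects in the study then correspond to the surround shifting the relative values of g_plus and g_minus.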
Topics: Humans; Vision, Binocular; Perceptual Masking; Contrast Sensitivity; Sensory Thresholds; Photic Stimulation; Psychophysics; Vision, Monocular; Adult
PubMed: 38640684
DOI: 10.1016/j.visres.2024.108396
Communications Biology, Oct 2022
Sound in noise is better detected or understood if target and masking sources originate from different locations. Mammalian physiology suggests that the neurocomputational process that underlies this binaural unmasking is based on two hemispheric channels that encode interaural differences in their relative neuronal activity. Here, we introduce a mathematical formulation of the two-channel model - the complex-valued correlation coefficient. We show that this formulation quantifies the amount of temporal fluctuations in interaural differences, which we suggest underlie binaural unmasking. We applied this model to an extensive library of psychoacoustic experiments, accounting for 98% of the variance across eight studies. Combining physiological plausibility with its success in explaining behavioral data, the proposed mechanism is a significant step towards a unified understanding of binaural unmasking and the encoding of interaural differences in general.
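One plausible reading of a complex-valued interaural correlation coefficient is the normalized complex inner product of the analytic (Hilbert) signals at the two ears: its magnitude falls below one when interaural time or level differences fluctuate over time, the quantity the abstract identifies as the basis of binaural unmasking. The sketch below follows that reading and is illustrative only; the paper defines its coefficient within peripheral frequency channels, and this broadband function is an assumption, not the authors' implementation.

```python
import numpy as np
from scipy.signal import hilbert

def complex_correlation(left, right):
    """Normalized complex inner product of the analytic (Hilbert) signals
    at the two ears. |rho| = 1 for static interaural differences and
    drops toward 0 as interaural differences fluctuate over time."""
    al, ar = hilbert(left), hilbert(right)
    num = np.mean(al * np.conj(ar))
    den = np.sqrt(np.mean(np.abs(al) ** 2) * np.mean(np.abs(ar) ** 2))
    return num / den
```

Identical ear signals give |rho| = 1, while statistically independent noises give |rho| near 0, the two extremes between which binaural unmasking operates.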
Topics: Humans; Perceptual Masking; Auditory Threshold; Noise; Sound
PubMed: 36273085
DOI: 10.1038/s42003-022-04098-x
Journal of Psycholinguistic Research, Jun 2022
The "recycling hypothesis" posits that the word recognition system is built upon minimal modifications to the neural architecture used in object recognition. In two masked priming lexical decision studies, we examined whether "mirror generalization," a phenomenon in object recognition, occurs in word recognition. In Study 1, we found that mirrored repetition and mirrored transposed letter primes elicited significant and equivalent priming effects for mirrored targets. In Study 2, we found that mirrored and non-mirrored repetition primes both significantly facilitated processing of mirrored targets, but the priming effect was much larger for non-mirrored primes. In both studies, we also found evidence of gender differences as females showed faster response times and a larger mirror priming effect compared to males. Taken together, we conclude that mirror generalization occurs in the early orthographic stage of word recognition, but not in the later stage of lexical access, and there is a gender difference when reading mirror words.
Topics: Female; Humans; Male; Motor Activity; Pattern Recognition, Visual; Perceptual Masking; Reaction Time; Reading; Visual Perception
PubMed: 35267127
DOI: 10.1007/s10936-022-09857-9
The Journal of the Acoustical Society..., Mar 2021
Frequency selectivity in the amplitude modulation (AM) domain has been demonstrated using both simultaneous AM masking and forward AM masking. This has been explained using the concept of a modulation filter bank (MFB). Here, we assessed whether the MFB occurs before or after the point of binaural interaction in the auditory pathway by using forward masking in the AM domain in an ipsilateral condition (masker AM and signal AM applied to the left ear with an unmodulated carrier in the right ear) and a contralateral condition (masker AM applied to the right ear and signal AM applied to the left ear). The carrier frequency was 8 kHz, the signal AM frequency, f, was 40 or 80 Hz, and the masker AM frequency ranged from 0.25 to 4 times f. Contralateral forward AM masking did occur, but it was smaller than ipsilateral AM masking. Tuning in the AM domain was slightly sharper for ipsilateral than for contralateral masking, perhaps reflecting confusion of the signal and masker AM in the ipsilateral condition when their AM frequencies were the same. The results suggest that there might be an MFB both before and after the point in the auditory pathway where binaural interaction occurs.
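A single channel of the modulation filter bank (MFB) invoked above can be sketched as envelope extraction followed by a band-pass filter centered on the modulation frequency. The constant-Q bandwidth, filter order, and RMS readout below are common modeling assumptions, not specifics of this study.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def modulation_channel_rms(x, fs, fm, q=1.0):
    """One MFB channel: extract the Hilbert envelope of the carrier, then
    band-pass it around modulation frequency fm with a constant-Q
    bandwidth fm/q. Returns the RMS of the filtered, DC-removed envelope."""
    env = np.abs(hilbert(x))
    bw = fm / q
    lo, hi = max(fm - bw / 2.0, 1e-3), fm + bw / 2.0
    sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return np.sqrt(np.mean(sosfiltfilt(sos, env - env.mean()) ** 2))
```

For an 8 kHz carrier modulated at 40 Hz, the channel tuned to 40 Hz responds strongly while a channel tuned well away (e.g., 160 Hz) responds weakly, which is the frequency selectivity in the AM domain that the masking paradigm probes.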
Topics: Auditory Threshold; Perceptual Masking
PubMed: 33765781
DOI: 10.1121/10.0003598
Trends in Hearing, 2022
Identification of speech from a "target" talker was measured in a speech-on-speech masking task with two simultaneous "masker" talkers. The overall level of each talker was either fixed or randomized throughout each stimulus presentation to investigate the effectiveness of level as a cue for segregating competing talkers and attending to the target. Experimental manipulations included varying the level difference between talkers and imposing three types of target level uncertainty: 1) a fixed target level across trials, 2) a random target level across trials, or 3) random target levels on a word-by-word basis within a trial. When the target level was predictable, performance was better than in corresponding conditions in which the target level was uncertain. Masker confusions were consistent with a high degree of informational masking (IM). Furthermore, evidence was found for "tuning" in level and a level "release" from IM. These findings suggest that conforming to listener expectations about relative level, in addition to cues signaling talker identity, facilitates the segregation of, and sustained attention to, a specific talker in multiple-talker communication situations.
Topics: Cues; Humans; Perceptual Masking; Speech; Speech Perception; Uncertainty
PubMed: 35238259
DOI: 10.1177/23312165221077555