Human Brain Mapping Apr 2022
Outliers in neuroimaging represent spurious data or the data of unusual phenotypes that deserve special attention such as clinical follow-up. Outliers have usually been detected in a supervised or semi-supervised manner for labeled neuroimaging cohorts. There has been much less work using unsupervised outlier detection on large unlabeled cohorts like the UK Biobank brain imaging dataset. Given its large sample size, rare imaging phenotypes within this unique cohort are of interest, as they are often clinically relevant and could be informative for discovering new processes. Here, we developed a two-level outlier detection and screening methodology to characterize individual outliers from the multimodal MRI dataset of more than 15,000 UK Biobank subjects. In primary screening, using brain ventricles, white matter, cortical thickness, and functional connectivity-based imaging phenotypes, every subject was parameterized with an outlier score per imaging phenotype. Outlier scores of these imaging phenotypes had good-to-excellent test-retest reliability, with the exception of resting-state functional connectivity (RSFC). Due to the low reliability of RSFC outlier scores, RSFC outliers were excluded from further individual-level outlier screening. In secondary screening, the extreme outliers (1,026 subjects) were examined individually, and those arising from data collection/processing errors were eliminated. A representative subgroup of 120 subjects from the remaining non-artifactual outliers were radiologically reviewed, and radiological findings were identified in 97.5% of them. This study establishes an unsupervised framework for investigating rare individual imaging phenotypes within a large neuroimaging cohort.
Topics: Brain; Humans; Magnetic Resonance Imaging; Neuroimaging; Phenotype; Reproducibility of Results
PubMed: 34957633
DOI: 10.1002/hbm.25756
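A minimal sketch of the per-phenotype outlier scoring idea described above, using a robust z-score (median and MAD). This is a generic illustration; the study's actual scoring procedure and the threshold for "extreme" outliers are assumptions here.

```python
import statistics

def robust_outlier_scores(values):
    """Score each subject's value for one imaging phenotype by its
    distance from the cohort median, scaled by the median absolute
    deviation (MAD) - a robust analogue of a z-score."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # consistency factor for normally distributed data
    if scale == 0:
        return [0.0 for _ in values]
    return [(v - med) / scale for v in values]

def flag_extreme(scores, threshold=6.0):
    """Indices whose absolute outlier score exceeds a (hypothetical) cutoff."""
    return [i for i, s in enumerate(scores) if abs(s) > threshold]
```

In a two-level screen, subjects flagged this way would then go to individual (e.g., radiological) review rather than being discarded automatically.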
Scientific Reports Sep 2023
Subspace outlier detection has emerged as a practical approach to outlier detection. Classical full-space outlier detection methods become ineffective in high-dimensional data due to the "curse of dimensionality," and subspace methods have great potential to overcome this problem. The challenge, however, is determining which subspaces, among the enormous number of possible subspaces, should be used for outlier detection. In this paper, we first propose an intuitive definition of outliers in subspaces. We study the desirable properties of subspaces for outlier detection and investigate metrics for those properties. We then propose a novel subspace outlier detection algorithm with a statistical foundation, which selectively leverages a limited set of the most interesting subspaces. Experiments demonstrate that identifying outliers within this reduced set of highly interesting subspaces yields significantly higher accuracy than analyzing the entire feature space, and that the proposed method outperforms competing subspace outlier detection approaches on real-world data sets.
PubMed: 37714878
DOI: 10.1038/s41598-023-42261-4
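To make the subspace idea concrete, here is a toy sketch: score each point by its mean distance to its nearest neighbours within each axis-parallel subspace, and keep the maximum over subspaces, so an outlier hidden in the full space can still stand out in some subspace. The helper names and the brute-force enumeration are illustrative; the paper's statistically grounded subspace selection is not reproduced here.

```python
import itertools
import math

def knn_score(points, dims, k=2):
    """Mean distance to the k nearest neighbours, computed only in the
    coordinates listed in `dims` (one candidate subspace)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist([p[d] for d in dims], [q[d] for d in dims])
            for j, q in enumerate(points) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores

def best_subspace_scores(points, subspace_size=2, k=2):
    """Per point, the maximum kNN score over all axis-parallel subspaces
    of the given size; a point with a high maximum is an outlier in at
    least one subspace."""
    n_dims = len(points[0])
    best = [0.0] * len(points)
    for dims in itertools.combinations(range(n_dims), subspace_size):
        for i, s in enumerate(knn_score(points, dims, k)):
            best[i] = max(best[i], s)
    return best
```

Enumerating all subspaces is exponential in the dimensionality, which is exactly why the paper selects only the most interesting subspaces rather than scoring all of them.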
Journal of Experimental Psychology.... Jan 2023
According to a growing body of research, human adults are remarkably accurate at extracting intuitive statistics from graphs, such as finding the best-fitting regression line through a scatterplot. Here, we ask whether humans can also perform outlier rejection, a nontrivial statistical problem. In three experiments, we investigated human adults' capacity to evaluate the linear trend of a flashed scatterplot comprising 0-4 outlier datapoints. Experiment 1 showed that participants did not spontaneously reject outliers: when outliers were not mentioned, their presence biased the participants' trend judgments and regression line estimates. In Experiment 2, where participants were explicitly asked to exclude outliers, the outlier-induced bias was reduced but remained significant. In Experiment 3, where participants were asked to explicitly detect any outlier before adjusting their regression line, outlier detection was satisfactory, but the detected outliers continued to bias the regression responses, unless they were quite distant from the main regression line. We propose a simple model for outlier detection, based on the computation of a z-score that estimates how far a given datapoint is from the distribution of distances to the regression line, and we show that this model closely approximates human performance. Detection is not rejection, however, and our results suggest that humans can remain biased by outliers that they have detected.
Topics: Humans; Statistics as Topic
PubMed: 36395054
DOI: 10.1037/xhp0001065
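The z-score model described above is easy to sketch: fit a least-squares line, then z-score each point's vertical distance to the line against the distribution of all such distances. The fitting and scoring details here are assumptions, not the authors' exact model.

```python
import statistics

def residual_z_scores(xs, ys):
    """Fit y = a*x + b by least squares, then z-score each point's
    absolute residual against the distribution of all residuals.
    High z = candidate outlier."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    resid = [abs(y - (a * x + b)) for x, y in zip(xs, ys)]
    mu = statistics.mean(resid)
    sd = statistics.pstdev(resid)
    return [(r - mu) / sd if sd else 0.0 for r in resid]
```

Note the circularity the paper's results hint at: the outlier itself pulls the fitted line toward it, which shrinks its own z-score - one reason detected outliers can still bias the final estimate.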
Pharmaceutical Statistics May 2020
Potency bioassays are used to measure biological activity. Consequently, potency is considered a critical quality attribute in manufacturing. Relative potency is measured by comparing the concentration-response curves of a manufactured test batch with those of a reference standard. If the curve shapes are deemed similar, the test batch is said to exhibit constant relative potency with the reference standard, a critical requirement for calibrating the potency of the final drug product. Outliers in bioassay potency data may result in the false acceptance/rejection of a bad/good sample and, if accepted, may yield a biased relative potency estimate. To avoid these issues, the USP<1032> recommends screening bioassay data for outliers prior to performing a relative potency analysis. In a recently published work, the effects of one or more outliers, outlier size, and outlier type on similarity testing and estimation of relative potency were thoroughly examined, confirming the USP<1032> outlier guidance. As a follow-up, several outlier detection methods, including those proposed by the USP<1010>, are evaluated and compared in this work through computer simulation. Two novel outlier detection methods are also proposed. The effects of outlier removal on similarity testing and estimation of relative potency were evaluated, resulting in recommendations for best practice.
Topics: Biological Assay; Data Interpretation, Statistical; Dose-Response Relationship, Drug; Models, Statistical; Reference Standards; Research Design
PubMed: 31762118
DOI: 10.1002/pst.1984
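One of the classic small-sample screens discussed in USP <1010> is Dixon's Q test, sketched below for illustration (this is not one of the paper's novel methods). The critical values are approximate two-sided 95% values for n = 3 to 7; for other sample sizes or confidence levels a fuller table would be needed.

```python
# Approximate two-sided 95% critical values for Dixon's Q (r10), n = 3..7.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568}

def dixon_q_test(values, crit=Q_CRIT_95):
    """Screen the most extreme point of a small sample (n in 3..7) with
    Dixon's Q test: Q = gap to nearest neighbour / total range.
    Returns (suspect_value, q_statistic, is_outlier)."""
    data = sorted(values)
    n = len(data)
    spread = data[-1] - data[0]
    gap_low = data[1] - data[0]       # gap below the minimum's neighbour
    gap_high = data[-1] - data[-2]    # gap above the maximum's neighbour
    if gap_high >= gap_low:
        suspect, q = data[-1], gap_high / spread
    else:
        suspect, q = data[0], gap_low / spread
    return suspect, q, q > crit[n]
```

In a relative-potency workflow such a test would be applied to replicate responses before curve fitting, as the screening step the USP guidance recommends.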
Journal of Biopharmaceutical Statistics 2014
The statistical analysis of data can be heavily influenced by extreme measurements. If such measurements fall in the remote tails of the true population distribution from which they are drawn, they are referred to as outliers. Neglecting to filter outliers from a sample can distort statistical computations and lead to faulty conclusions. Conventional techniques identify as outliers those measurements whose distances from the mean exceed a selected multiple of the sample standard deviation. Such approaches, however, can fail to classify measurements with large normalized distances as outliers. The truncated outlier filtering method first excludes the sample minimum and maximum when computing the exclusion criterion. This mitigates the influence of abnormally large (or small) measurements on the normalized distance and hence yields a more compact criterion for outlier determination. Moreover, the method generalizes to two or more dimensions. Simulated one-dimensional and multidimensional data are analyzed, and a discussion of the results is presented.
Topics: Algorithms; Clinical Trials as Topic; Data Interpretation, Statistical; Humans; Models, Statistical; Multivariate Analysis; Normal Distribution; Sample Size; Signal-To-Noise Ratio
PubMed: 24915513
DOI: 10.1080/10543406.2014.926366
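The truncated filtering idea above can be sketched in a few lines: set aside the sample minimum and maximum, estimate the mean and standard deviation from the remaining points, then apply the usual k-sigma rule to all observations. Details beyond the abstract (exactly how the extremes are handled, the choice of k) are assumptions here.

```python
import statistics

def truncated_outlier_filter(values, k=3.0):
    """Flag points more than k standard deviations from a 'truncated'
    mean: the min and max are excluded before estimating mean and sd,
    so a single extreme value cannot inflate the criterion and mask
    itself. Returns the values that pass the filter."""
    trimmed = sorted(values)[1:-1]   # drop one minimum and one maximum
    mu = statistics.mean(trimmed)
    sd = statistics.stdev(trimmed)
    return [v for v in values if abs(v - mu) <= k * sd]
```

Compare with the untruncated rule: including the value 100 below in the mean/sd estimate would inflate sd enough that 100 survives a 3-sigma screen; the truncated criterion removes it.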
Entropy (Basel, Switzerland) Apr 2023
Data for complex plasma-wall interactions require long-running and expensive computer simulations. Furthermore, the number of input parameters is large, which results in low coverage of the (physical) parameter space. The unpredictable occurrence of outliers creates a need to explore this multi-dimensional space with robust analysis tools. We restate the Gaussian process (GP) method as a Bayesian adaptive exploration method for establishing surrogate surfaces in the variables of interest. On this basis, we extend the analysis with the Student-t process (TP) method to improve the robustness of the result with respect to outliers. The most obvious difference between the two methods shows up in the marginal likelihood for the hyperparameters of the covariance function, where the TP method features a broader marginal probability distribution in the presence of outliers. Finally, we present first investigations of a mixture likelihood of two Gaussians within a Gaussian process ansatz, describing outlier and non-outlier behavior respectively; the parameters of the two Gaussians are set such that the mixture likelihood resembles the shape of a Student-t likelihood.
PubMed: 37190472
DOI: 10.3390/e25040685
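The intuition behind the TP's robustness is that a Student-t likelihood penalizes a far-out observation much less than a Gaussian does, so an outlier pulls the fit less. The one-dimensional log-densities below illustrate this (this is the likelihood shape only, not the full GP/TP machinery):

```python
import math

def gauss_logpdf(x, mu=0.0, sigma=1.0):
    """Log density of a Gaussian: the penalty grows quadratically in x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))

def student_t_logpdf(x, nu=3.0, mu=0.0, sigma=1.0):
    """Log density of a scaled Student-t with nu degrees of freedom:
    heavier tails, so the penalty grows only logarithmically."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))
```

At five standard deviations the Gaussian log-likelihood is far more negative than the Student-t one, which is why a TP fit is dragged less by such a point; near the center the Gaussian density is slightly higher.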
International Journal of Computer... Dec 2018
Review
PURPOSE
Matching points that are derived from features or landmarks in image data is a key step in some medical imaging applications. Since most robust point matching algorithms claim to be able to deal with outliers, users may place high confidence in the matching result and use it without further examination. However, for tasks such as feature-based registration in image-guided neurosurgery, even a few mismatches, in the form of invalid displacement vectors, could cause serious consequences. As a result, having an effective tool by which operators can manually screen all matches for outliers could substantially benefit the outcome of those applications.
METHODS
We introduce a novel variogram-based outlier screening method for vectors. The variogram is a powerful geostatistical tool for characterizing the spatial dependence of stochastic processes. Since the spatial correlation of invalid displacement vectors, which we treat as vector outliers, tends to behave differently from that of valid displacement vectors, they can be efficiently identified on the variogram.
RESULTS
We validate the proposed method on 9 sets of clinically acquired ultrasound data. In the experiment, potential outliers were flagged on the variogram by one operator and further evaluated by 8 experienced medical imaging researchers. The matching quality of those potential outliers was rated approximately 1.5 points lower, on a scale from 1 (bad) to 5 (good), than that of valid displacement vectors.
CONCLUSION
The variogram is a simple yet informative tool. While used extensively in geostatistical analysis, it has not received much attention in the medical imaging field. We believe there is considerable potential for clinically applying the proposed outlier screening method, and we expect researchers to find the variogram useful in other medical applications that involve motion vector analysis.
Topics: Algorithms; Humans; Image Interpretation, Computer-Assisted; Neurosurgical Procedures; Surgery, Computer-Assisted
PubMed: 30097956
DOI: 10.1007/s11548-018-1840-5
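An empirical semivariogram for a displacement-vector field can be computed as below: for each lag bin, half the mean squared difference between vectors whose anchor points lie that far apart. This is a minimal sketch of the tool the abstract describes; the binning scheme and how outliers are flagged on the resulting curve are assumptions.

```python
import math

def empirical_variogram(positions, vectors, bin_width=1.0, n_bins=5):
    """Empirical semivariogram of a vector field: gamma(h) is half the
    mean squared vector difference over all point pairs whose spatial
    separation falls in lag bin h. Bins with no pairs return None.
    An invalid (outlier) vector inflates every bin its pairs land in."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            lag = math.dist(positions[i], positions[j])
            b = int(lag / bin_width)
            if b < n_bins:
                diff = sum((a - c) ** 2 for a, c in zip(vectors[i], vectors[j]))
                sums[b] += diff
                counts[b] += 1
    return [0.5 * s / c if c else None for s, c in zip(sums, counts)]
```

For a spatially smooth field the semivariance rises gently with lag; pairs involving an invalid displacement vector sit well above that trend, which is what an operator screens for.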
Proceedings of SPIE--the International... 2020
Abdominal multi-organ segmentation of computed tomography (CT) images has been the subject of extensive research interest. It presents a substantial challenge in medical image processing, as the shape and distribution of abdominal organs can vary greatly among the population and within an individual over time. While continuous integration of novel datasets into the training set offers potential for better segmentation performance, collecting data at scale is not only costly but also impractical in some contexts. Moreover, it remains unclear what marginal value additional data offer. Herein, we propose a single-pass active learning method through human quality assurance (QA). We built on a pre-trained 3D U-Net model for abdominal multi-organ segmentation and augmented the dataset with either outlier data (e.g., exemplars for which the baseline algorithm failed) or inliers (e.g., exemplars for which the baseline algorithm worked). The new models were trained on the augmented datasets with 5-fold cross-validation (for outlier data) and withheld outlier samples (for inlier data). Manual labeling of outliers increased Dice scores by 0.130, compared to an increase of 0.067 with inliers (p < 0.001, two-tailed paired t-test). By adding 5 to 37 inliers or outliers to training, we find that the marginal value of adding outliers is higher than that of adding inliers. In summary, improvement in single-organ performance was obtained without diminishing multi-organ performance or significantly increasing training time. Hence, identification and correction of baseline failure cases present an effective and efficient method of selecting training data to improve algorithm performance.
PubMed: 33907347
DOI: 10.1117/12.2549365
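The Dice score improvements quoted above are on the standard Dice similarity coefficient between a predicted and a reference segmentation mask, sketched here for flattened binary masks (the study's masks are 3D volumes, but the formula is the same):

```python
def dice(a, b):
    """Dice similarity coefficient between two binary masks:
    2 * |A intersect B| / (|A| + |B|). 1.0 means perfect overlap;
    two empty masks are defined here as a perfect match."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(map(bool, a)) + sum(map(bool, b))
    return 2 * inter / size if size else 1.0
```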
Biometrics Dec 2020
The aim of plant breeding trials is often to identify crop varieties that are well adapted to target environments. These varieties are identified through genomic prediction from the analysis of multi-environment field trials (METs) using linear mixed models. The occurrence of outliers in METs is common and known to adversely impact the accuracy of genomic prediction, yet the detection of outliers is often neglected. There are a number of reasons for this. First, complex data such as METs give rise to distinct levels of residuals (e.g., at the trial level or the individual-observation level), and this complexity poses additional challenges for an outlier detection method. Second, many linear mixed model software packages that cater for the complex variance structures needed in the analysis of METs are not well streamlined for diagnostics by practitioners. We demonstrate outlier detection methods that are simple to implement in any linear mixed model software package and computationally fast. Although these are not optimal outlier detection methods, they offer practical value through their ease of application in the analysis pipeline of regularly collected data. They are demonstrated using simulations based on two real bread wheat yield METs. In particular, we consider models that analyze yield trials either independently or jointly (thus borrowing strength across trials). Case studies are presented to highlight the benefit of joint analysis for outlier detection.
Topics: Genomics; Linear Models
PubMed: 31950486
DOI: 10.1111/biom.13216
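As a toy version of the "simple and fast" residual screen, the sketch below standardizes yields within each trial and flags large residuals. It covers only the independent-analysis case; the joint mixed-model analysis the paper favours requires dedicated LMM software, and the data layout (`trials` mapping trial name to a list of yields) is hypothetical.

```python
import statistics

def flag_residual_outliers(trials, k=3.0):
    """Per-trial residual screen: centre each trial's yields on the
    trial mean and flag observations whose standardized residual
    exceeds k. Returns (trial, index) pairs. Note that with few plots
    per trial a large outlier inflates the trial's own sd and can
    partly mask itself - one motivation for borrowing strength
    across trials."""
    flagged = []
    for trial, yields in trials.items():
        mu = statistics.mean(yields)
        sd = statistics.stdev(yields)
        for i, y in enumerate(yields):
            if sd and abs(y - mu) / sd > k:
                flagged.append((trial, i))
    return flagged
```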
Attention, Perception & Psychophysics Feb 2024
Ensemble perception allows our visual system to process large amounts of information efficiently by summarizing its statistical properties. A key aspect of ensemble perception is the devaluation of outlying elements, which leads to more informative summary statistics with reduced variance and a more representative mean. However, the mechanisms underlying this outlier rejection process are not well understood. One possibility is that outliers are selectively excluded before summarization. To test this, we investigated whether only weaker items were excluded from averaging. We manipulated the encoding strength of items in a display by changing the emotional intensities of faces, the spatial location of emotional outliers, and the spatial distribution of emotional faces. We found that the response to outliers varied depending on their location. Specifically, outliers were more likely to be excluded from averaging when presented in more peripheral regions, while their exclusion was only partial in parafoveal regions. In other words, outlier rejection in ensemble processing is more flexible than a rigid, fixed scheme of down-weighting outliers. Alternatively, the results fit well with hierarchically structured pooling, during which outliers are discounted more dynamically without positing any separate selective mechanism before summarization. We propose an explanation for outlier rejection in light of a recently proposed population response model of ensemble processing.
Topics: Humans; Emotions
PubMed: 38191757
DOI: 10.3758/s13414-023-02842-x
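The "devaluation of outlying elements" during averaging has a simple computational analogue: an iteratively reweighted mean in which items far from the current estimate receive smaller weights. This is purely an illustration of graded down-weighting, not the authors' population-response model, and the weighting function and constants are arbitrary choices.

```python
def downweighted_mean(values, c=2.0, iters=3):
    """Iteratively reweighted mean: starting from the plain mean,
    repeatedly down-weight items in proportion to their distance from
    the current estimate, then re-average. Outliers are discounted
    gradually rather than excluded outright."""
    est = sum(values) / len(values)
    for _ in range(iters):
        spread = sum(abs(v - est) for v in values) / len(values) or 1.0
        weights = [1.0 / (1.0 + (abs(v - est) / (c * spread)) ** 2)
                   for v in values]
        est = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return est
```

On a set like [1, 2, 3, 2, 50] the reweighted estimate lands much closer to the bulk of the items than the plain mean does, mirroring the partial (rather than all-or-none) outlier discounting reported above.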