outlier - OpenMD.com Journal Search

Outlier Detection for Mass Spectrometric Data.

Methods in Molecular Biology (Clifton,... 2016

Authors: HyungJun Cho, Soo-Heang Eo

Mass spectrometry data are often generated from various biological or chemical experiments. However, due to technical reasons, outlying observations are often obtained, some of which may be extreme. Identifying the causes of outlying observations is important in the analysis of replicated MS data because elaborate pre-processing is essential in order to obtain successful analyses with reliable results, and because manual outlier detection is a time-consuming pre-processing step. It is natural to measure the variability of observations using standard deviation or interquartile range calculations, and in this work, these criteria for identifying outliers are presented. However, the low replicability and the heterogeneity of variability are often obstacles to outlier detection. Therefore, quantile regression methods for identifying outliers with low replication are also presented. The procedures are illustrated with artificial and real examples, while a software program is introduced to demonstrate how to apply these procedures in the R environment system.

Topics: Mass Spectrometry; Regression Analysis; Reproducibility of Results

PubMed: 26519171
DOI: 10.1007/978-1-4939-3106-4_5
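
The interquartile range criterion mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' R software: the function name `iqr_outliers`, the fence multiplier `k=1.5` (Tukey's usual convention), and the sample values are assumptions chosen for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional Tukey fence, assumed here; the abstract
    does not state which multiplier the authors use.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical replicated intensities for one peak, with one extreme value.
replicates = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0]
print(iqr_outliers(replicates))
```

With only a handful of replicates the quartiles themselves are unstable, which is the low-replication problem the abstract says motivates the quantile regression approach.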

Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier.

Genes Feb 2023

Authors: Magdalena Kircher, Josefin Säurich, Michael Selle...

Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. As a result, the reported accuracy is either too pessimistic or too optimistic, and the estimated model performance cannot be reproduced on independent data. It is then also doubtful whether a classifier qualifies for clinical usage. We estimate classifier performances in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we use two outlier detection methods within a bootstrap procedure to estimate the outlier probability for each sample and evaluate classifiers before and after outlier removal by means of cross-validation. We found that the removal of outliers changed the classification performance notably. For the most part, removing outliers improved the classification results. Because there are various, sometimes unclear reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier with and without outliers in training and test data. This provides a more diverse picture of a classifier's performance and prevents reporting models that later turn out not to be applicable for clinical diagnoses.

Topics: Transcriptome; Gene Expression Profiling; Probability; Research Design

PubMed: 36833313
DOI: 10.3390/genes14020387
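
The idea of estimating a per-sample outlier probability with a bootstrap, as described in the abstract above, can be sketched as follows. The abstract does not name the two detection methods used, so this sketch substitutes a single, simple detector: a modified z-score based on the median and MAD, with the commonly used cutoff of 3.5. The function name, the one-dimensional `scores` input, and all parameter values are assumptions.

```python
import numpy as np

def bootstrap_outlier_probability(scores, n_boot=500, threshold=3.5, seed=0):
    """Fraction of bootstrap resamples in which each sample is flagged.

    scores: one summary value per sample (hypothetical input).
    The detector is a modified z-score (median/MAD), assumed here as a
    stand-in for the detection methods used in the paper.
    """
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = scores.size
    flagged = np.zeros(n)  # times each sample was flagged
    drawn = np.zeros(n)    # times each sample appeared in a resample
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        sample = scores[idx]
        med = np.median(sample)
        mad = np.median(np.abs(sample - med))
        if mad == 0:
            continue  # degenerate resample, skip it
        robust_z = 0.6745 * np.abs(sample - med) / mad
        drawn[np.unique(idx)] += 1
        is_out = np.zeros(n, dtype=bool)
        is_out[idx[robust_z > threshold]] = True
        flagged += is_out
    # Probability is conditional on the sample being in the resample.
    return np.divide(flagged, drawn, out=np.zeros(n), where=drawn > 0)

values = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.3, 0.15, 8.0]
print(bootstrap_outlier_probability(values).round(2))
```

A classifier could then be cross-validated once on all samples and once after dropping samples whose estimated probability exceeds a chosen cutoff, so that both results can be reported side by side.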

Compressed Submanifold Multifactor Analysis.

IEEE Transactions on Pattern Analysis... Mar 2017

Authors: Khoa Luu, Marios Savvides, Tien Bui...

Although widely used, Multilinear PCA (MPCA), one of the leading multilinear analysis methods, still suffers from four major drawbacks. First, it is very sensitive to outliers and noise. Second, it is unable to cope with missing values. Third, it is computationally expensive since MPCA deals with large multi-dimensional datasets. Finally, it is unable to maintain the local geometrical structures due to the averaging process. This paper proposes a novel approach named Compressed Submanifold Multifactor Analysis (CSMA) to solve the four problems mentioned above. Our approach can deal with the problem of missing values and outliers via SVD-L1. The Random Projection method is used to obtain a fast low-rank approximation of a given multifactor dataset. In addition, it is able to preserve the geometry of the original data. Our CSMA method can be used efficiently for multiple purposes, e.g. noise and outlier removal, estimation of missing values, and biometric applications. We show that the CSMA method can achieve good results and is very efficient in the inpainting problem as compared to [1], [2]. Our method also achieves higher face recognition rates compared to LRTC, SPMA, MPCA and some other methods, i.e. PCA, LDA and LPP, on three challenging face databases, i.e. CMU-MPIE, CMU-PIE and Extended YALE-B.

PubMed: 27101597
DOI: 10.1109/TPAMI.2016.2554107

Outliers in diffusion-weighted MRI: Exploring detection models and mitigation strategies.

NeuroImage Dec 2023

Authors: Viljami Sairanen, Jesper Andersson

Diffusion-weighted MRI (dMRI) is a medical imaging method that can be used to investigate the brain microstructure and structural connections between different brain regions. The method, however, requires relatively complex data processing frameworks and analysis pipelines. Many of these approaches are vulnerable to signal dropout artefacts that can originate from subjects moving their head during the scan. To combat these artefacts and eliminate such outliers, researchers have proposed two approaches: to replace outliers or to downweight outliers during modelling and analysis. With the rising interest in dMRI for clinical research, these types of corrections are increasingly important. Therefore, we set out to investigate the differences between outlier replacement and weighting approaches to help the dMRI community to select the best tool for their data processing pipelines. We evaluated dMRI motion correction registration and single tensor model fit pipelines using Gaussian Process and Spherical Harmonic based replacement approaches and outlier downweighting using highly realistic whole-brain simulations. As a proof of concept, we applied these approaches to dMRI infant data sets that contained varying numbers of dropout artefacts. Based on our results, we concluded that the Gaussian Process based outlier replacement provided similar tensor fit results to Gaussian Process based outlier detection and downweighting. Therefore, if only the least-squares estimate of the single tensor model is of interest, our recommendation is to use outlier replacement. However, outlier downweighting can potentially provide a more accurate estimate of the model precision which could be relevant for applications such as probabilistic tractography.

Topics: Humans; Algorithms; Diffusion Magnetic Resonance Imaging; Brain; Artifacts; Least-Squares Analysis

PubMed: 37820862
DOI: 10.1016/j.neuroimage.2023.120397

Multi-site, multi-pollutant atmospheric data analysis using Riemannian geometry.

The Science of the Total Environment Sep 2023

Authors: Alexander Smith, Jinxi Hua, Benjamin de Foy...

We demonstrate the benefits of using Riemannian geometry in the analysis of multi-site, multi-pollutant atmospheric monitoring data. Our approach uses covariance matrices to encode spatio-temporal variability and correlations of multiple pollutants at different sites and times. A key property of covariance matrices is that they lie on a Riemannian manifold and one can exploit this property to facilitate dimensionality reduction, outlier detection, and spatial interpolation. Specifically, the transformation of data using Riemannian geometry provides a better data surface for interpolation and assessment of outliers compared to traditional data analysis tools that assume Euclidean geometry. We demonstrate the utility of using Riemannian geometry by analyzing a full year of atmospheric monitoring data collected from 34 monitoring stations in Beijing, China.

Topics: Algorithms; Environmental Pollutants; Data Analysis; Beijing; China

PubMed: 37230339
DOI: 10.1016/j.scitotenv.2023.164064
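
To make the geometric idea in the abstract above concrete, the sketch below compares two covariance matrices with a manifold-aware distance instead of an element-wise Euclidean one. The abstract does not say which Riemannian metric the authors use; the log-Euclidean distance shown here is one common choice and is an assumption, as are the function names and the example matrices.

```python
import numpy as np

def spd_log(matrix):
    """Matrix logarithm of a symmetric positive-definite matrix,
    computed from its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_distance(cov_a, cov_b):
    """Frobenius norm of the difference of the matrix logarithms.

    Assumed metric for illustration; other Riemannian metrics on
    covariance matrices exist and may be what the paper uses.
    """
    return np.linalg.norm(spd_log(cov_a) - spd_log(cov_b), ord="fro")

# Hypothetical 2x2 pollutant covariance matrices from two sites.
site_a = np.array([[2.0, 0.5], [0.5, 1.0]])
site_b = np.array([[2.2, 0.4], [0.4, 1.1]])
print(log_euclidean_distance(site_a, site_b))
```

A site or time window whose covariance matrix is far from all the others under such a distance would be a candidate outlier.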

Anomalous values and missing data in clinical and experimental studies.

Jornal Vascular Brasileiro May 2019

Review

Authors: Hélio Amante Miot

During analysis of scientific research data, it is customary to encounter anomalous values or missing data. Anomalous values can be the result of errors of recording, typing, measurement by instruments, or may be true outliers. This review discusses concepts, examples and methods for identifying and dealing with such contingencies. In the case of missing data, techniques for imputation of the values are discussed in order to avoid exclusion of the research subject, if it is not possible to retrieve information from registration forms or to re-address the participant.

PubMed: 31320882
DOI: 10.1590/1677-5449.190004

Outlier-response pattern checks to improve measurement with the Montgomery-Asberg depression rating scale (MADRS).

Journal of Affective Disorders Feb 2022

Authors: Jonathan Rabinowitz, Alon A Rabinowitz

Symptom manifestations in affective disorders can be subtle. Small imprecisions in measurement can lead to incorrect estimation of change. Previously, expert-derived scoring inconsistency flags were developed for MADRS. Currently, we derive empirically based outlier-pattern flags, to further detect imprecisions in ratings. NEWMEDS data repository of almost 25,000 MADRS administrations from 11 registration trials of antidepressants was used to identify outlier response patterns reflecting potentially careless responses. Coverage of these flags was compared to previously published expert derived flags. Both sets of flags were also further tested in Monte Carlo simulated data as a proxy to applying flags under conditions of known inconsistency. The outlier flags derived provide cutting points to identify: (1) under and overuse of values (e.g., scoring "1" on 6 or more items), (2) disproportionate use of even or odd response choices (e.g., 8 or more odd values), (3) longest consecutive use of value (e.g., more than 5 items in a row scored with same value), (4) high variability within administration (standard deviation greater than 1.8), (5) outlier responses on multiple items (i.e., multivariate outliers), and (6) outlier scoring (e.g., scoring 4, 5 or 6 on item 1). Outlier response flags were raised in 26% of the MADRS administrations and in 97% of the Monte Carlo data. Of administrations with no expert flag, 21.7% had an outlier flag and of administrations with at least one expert flag, 27.7% also had an outlier flag. Outlier-pattern flags appear to be a useful adjunct to expert derived flags in the quest to improve measurement in clinical trials.

Topics: Antidepressive Agents; Depression; Humans; Mood Disorders; Psychiatric Status Rating Scales; Reproducibility of Results

PubMed: 34952105
DOI: 10.1016/j.jad.2021.12.076
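
Four of the flag types listed in the abstract above come with example cutoffs, and a minimal sketch can show how they would be checked for one administration. The cutoffs are the examples quoted in the abstract; the function name, the dictionary keys, and the use of the sample standard deviation are assumptions, and flags (5) and (6) are omitted because the abstract does not give enough detail to reproduce them.

```python
import statistics
from itertools import groupby

def madrs_outlier_flags(items):
    """Check one MADRS administration (10 item scores, each 0-6)
    against the example cutoffs quoted in the abstract."""
    longest_run = max(len(list(group)) for _, group in groupby(items))
    return {
        # (1) overuse of a value: "1" scored on 6 or more items
        "overuse_of_1": items.count(1) >= 6,
        # (2) disproportionate use of odd values: 8 or more
        "odd_values": sum(v % 2 == 1 for v in items) >= 8,
        # (3) more than 5 items in a row with the same value
        "long_run": longest_run > 5,
        # (4) high within-administration variability; the sample SD
        # is assumed, the abstract does not say which estimator
        "high_variability": statistics.stdev(items) > 1.8,
    }

# Hypothetical administration: six consecutive items scored 1.
print(madrs_outlier_flags([1, 1, 1, 1, 1, 1, 3, 2, 4, 0]))
```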

Deep Outlier Handling for Image Deblurring.

IEEE Transactions on Image Processing :... 2021

Authors: Jiangxin Dong, Jinshan Pan

Outlier handling has attracted considerable attention recently but remains challenging for image deblurring. Existing approaches mainly depend on iterative outlier detection steps to explicitly or implicitly reduce the influence of outliers on image deblurring. However, these outlier detection steps usually involve heuristic operations and iterative optimization processes, which are complex and time-consuming. In contrast, we propose to learn a deep convolutional neural network to directly estimate the confidence map, which can identify reliable inliers and outliers from the blurred image and thus facilitates the subsequent deblurring process. Our analysis shows that the proposed algorithm, combined with the learned confidence map, is effective in handling outliers and does not require the ad-hoc outlier detection steps that are critical to existing outlier handling methods. Compared to existing approaches, the proposed algorithm is more efficient and can be applied to both non-blind and blind image deblurring. Extensive experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy and efficiency.

PubMed: 33417555
DOI: 10.1109/TIP.2020.3048679

Outlier Guided Optimization of Abdominal Segmentation.

Proceedings of SPIE--the International... 2020

Authors: Yuchen Xu, Olivia Tang, Yucheng Tang...

Abdominal multi-organ segmentation of computed tomography (CT) images has been the subject of extensive research interest. It presents a substantial challenge in medical image processing, as the shape and distribution of abdominal organs can vary greatly among the population and within an individual over time. While continuous integration of novel datasets into the training set provides potential for better segmentation performance, collection of data at scale is not only costly, but also impractical in some contexts. Moreover, it remains unclear what marginal value additional data have to offer. Herein, we propose a single-pass active learning method through human quality assurance (QA). We built on a pre-trained 3D U-Net model for abdominal multi-organ segmentation and augmented the dataset either with outlier data (e.g., exemplars for which the baseline algorithm failed) or inliers (e.g., exemplars for which the baseline algorithm worked). The new models were trained using the augmented datasets with 5-fold cross-validation (for outlier data) and withheld outlier samples (for inlier data). Manual labeling of outliers increased Dice scores with outliers by 0.130, compared to an increase of 0.067 with inliers (p<0.001, two-tailed paired t-test). By adding 5 to 37 inliers or outliers to training, we find that the marginal value of adding outliers is higher than that of adding inliers. In summary, improvement on single-organ performance was obtained without diminishing multi-organ performance or significantly increasing training time. Hence, identification and correction of baseline failure cases present an effective and efficient method of selecting training data to improve algorithm performance.

PubMed: 33907347
DOI: 10.1117/12.2549365
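
The Dice score used to report segmentation gains in the abstract above is defined as twice the overlap of two masks divided by the sum of their sizes. The sketch below computes it for binary masks; the function name, the convention of returning 1.0 when both masks are empty, and the toy masks are assumptions for illustration.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy 1-D masks standing in for flattened organ segmentations.
predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
print(dice_score(predicted, reference))  # 0.5
```

On this scale, the reported increase of 0.130 from labeling outliers is roughly twice the 0.067 gained from adding inliers.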