- IEEE Transactions on Image Processing :... 2022
Outlier detection aims to separate anomalous data from inliers in a dataset. Recently, most deep learning methods for outlier detection leverage an auxiliary reconstruction task, assuming that outliers are more difficult to recover than normal samples (inliers). However, this assumption does not always hold in deep auto-encoder (AE) based models. Auto-encoder based detectors may recover certain outliers even if those outliers are not in the training data, because they do not constrain the feature learning. Instead, we think outlier detection can be done in the feature space by measuring the distance between outliers' features and the consistency feature of inliers. To achieve this, we propose an unsupervised outlier detection method using a memory module and a contrastive learning module (MCOD). The memory module constrains the consistency of features, which represent only the normal data. The contrastive learning module learns more discriminative features, which sharpens the distinction between outliers and inliers. Extensive experiments on four benchmark datasets show that our proposed MCOD performs well and outperforms eleven state-of-the-art methods.
Topics: Algorithms; Learning
PubMed: 36215361
DOI: 10.1109/TIP.2022.3211476
- IEEE Transactions on Pattern Analysis... Sep 2019
Multi-dimensional scaling (MDS) plays a central role in data exploration, dimensionality reduction and visualization. State-of-the-art MDS algorithms are not robust to outliers, yielding significant errors in the embedding even when only a handful of outliers are present. In this paper, we introduce a technique to detect and filter outliers based on geometric reasoning. We test the validity of triangles formed by three points, and mark a triangle as broken if its triangle inequality does not hold. The premise of our work is that, unlike inlier distances, outlier distances tend to break many triangles. Our method is tested and its performance is evaluated on various datasets and distributions of outliers. We demonstrate that for a reasonable proportion of outliers, e.g., under 20 percent, our method is effective and leads to a high embedding quality.
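The broken-triangle test described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `broken_triangle_counts`, the tolerance `tol`, and the toy distance matrix are assumptions made here for illustration, and the final step of picking the distance with the highest count stands in for whatever filtering rule the paper actually uses.

```python
from itertools import combinations

def broken_triangle_counts(dist, tol=1e-9):
    """For every pairwise distance, count the broken triangles it takes part in.

    `dist` is a symmetric matrix (list of lists). A triangle is broken when its
    longest side exceeds the sum of the other two sides.
    """
    n = len(dist)
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for i, j, k in combinations(range(n), 3):
        sides = {(i, j): dist[i][j], (j, k): dist[j][k], (i, k): dist[i][k]}
        longest = max(sides.values())
        if longest > sum(sides.values()) - longest + tol:
            for pair in sides:
                counts[pair] += 1
    return counts

# Toy example: four collinear points, with one distance corrupted on purpose.
xs = [0, 1, 2, 3]
dist = [[abs(a - b) for b in xs] for a in xs]
dist[0][3] = dist[3][0] = 10  # hypothetical outlier distance

counts = broken_triangle_counts(dist)
suspect = max(counts, key=counts.get)  # distance involved in the most broken triangles
```

In this toy case the corrupted distance takes part in both broken triangles, while each inlier distance takes part in at most one, which is the asymmetry the abstract relies on.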
PubMed: 29994700
DOI: 10.1109/TPAMI.2018.2851513
- Sensors (Basel, Switzerland) May 2020
Geometric model fitting is a fundamental issue in computer vision, and the fitting accuracy is affected by outliers. In order to eliminate the impact of the outliers, an inlier threshold or scale estimator is usually adopted. However, a single inlier threshold cannot satisfy multiple models in the data, and scale estimators that assume a certain noise distribution model work poorly in geometric model fitting. It can be observed that the residuals of outliers are large for all true models in the data, which gives the outliers a consensus of their own. Based on this observation, we propose a preference analysis method based on residual histograms to study the outlier consensus for outlier detection in this paper. We have found that the outlier consensus makes the outliers gather away from the inliers in the designed residual histogram preference space, which makes it convenient to separate outliers from inliers through linkage clustering. After the outliers are detected and removed, a linkage clustering with permutation preference is introduced to segment the inliers. In addition, in order to make the linkage clustering process stable and robust, an alternative sampling and clustering framework is proposed in both the outlier detection and inlier segmentation processes. The experimental results also show that the outlier detection scheme based on residual histogram preference can detect most of the outliers in the data sets, and the fitting results are better than most of the state-of-the-art methods in geometric multi-model fitting.
PubMed: 32471177
DOI: 10.3390/s20113037
- Journal of Experimental Psychology.... Jan 2022
When researchers choose to identify and exclude outliers from their data, should they do so across all the data, or within experimental conditions? A survey of recent papers published in the shows that both methods are widely used, and common data visualization techniques suggest that outliers should be excluded at the condition-level. However, I highlight in the present paper that removing outliers by condition runs against the logic of hypothesis testing, and that this practice leads to unacceptable increases in false-positive rates. I demonstrate that this conclusion holds true across a variety of statistical tests, exclusion criteria and cutoffs, sample sizes, and data types, and show in simulated experiments and in a reanalysis of existing data that by-condition exclusions can result in false-positive rates as high as 43%. I finally demonstrate that by-condition exclusions are a specific case of a more general issue: Any outlier exclusion procedure that is not blind to the hypothesis that researchers want to test may result in inflated Type I errors. I conclude by offering best practices and recommendations for excluding outliers. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Topics: Data Interpretation, Statistical; Data Visualization; Humans; Research Design
PubMed: 34060886
DOI: 10.1037/xge0001069
- PeerJ. Computer Science 2022
Outliers are data points that significantly deviate from other data points in a data set because of different mechanisms or unusual processes. Outlier detection is one of the intensively studied research topics for identification of novelties, frauds, anomalies, deviations or exceptions in addition to its use for data cleansing in data science. In this study, we propose two novel outlier detection approaches using the typicality degrees which are the partitioning result of unsupervised possibilistic clustering algorithms. The proposed approaches are based on finding the atypical data points below a predefined threshold value, a possibilistic level for evaluating a point as an outlier. The experiments on the synthetic and real data sets showed that the proposed approaches can be successfully used to detect outliers without considering the structure and distribution of the features in multidimensional data sets.
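The thresholding idea in this abstract can be sketched as follows. The abstract does not give the formulas, so everything specific here is an assumption: the typicality expression is the standard possibilistic c-means form, the cluster centers and scale parameters `etas` are taken as already fitted rather than learned, and flagging a point when its best typicality across clusters falls below `threshold` is only one plausible reading of "atypical data points below a predefined threshold value".

```python
def typicality(point, center, eta, m=2.0):
    """Possibilistic c-means style typicality of `point` for one cluster.

    Assumed form: t = 1 / (1 + (d^2 / eta)^(1 / (m - 1))), where d is the
    Euclidean distance to the cluster center and eta is the cluster's scale.
    """
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

def atypical_points(points, centers, etas, threshold=0.1, m=2.0):
    """Return the points whose highest typicality over all clusters is below threshold."""
    flagged = []
    for point in points:
        best = max(typicality(point, c, e, m) for c, e in zip(centers, etas))
        if best < threshold:
            flagged.append(point)
    return flagged

# Hypothetical data: two tight groups plus one far-away point.
points = [(0.0, 0.1), (0.2, -0.1), (5.0, 5.1), (4.9, 5.0), (10.0, -4.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]  # assumed to come from a prior clustering run
etas = [1.0, 1.0]                   # assumed cluster scale parameters

outliers = atypical_points(points, centers, etas, threshold=0.1)
```

Because typicalities are not forced to sum to one across clusters, a point far from every center receives a low value for all of them, which is what makes a single threshold usable.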
PubMed: 36262121
DOI: 10.7717/peerj-cs.1060
- ... IEEE International Conference on... Dec 2022
Outlier detection is a fundamental data analytics technique often used for many security applications. Numerous outlier detection techniques exist, and in most cases they are used to directly identify outliers without any interaction. Typically, the underlying data is high dimensional and complex. Even though outliers may be identified, humans can easily grasp only low dimensional spaces, so it is difficult for a security expert to understand or visualize why a particular event or record has been identified as an outlier. In this paper we study the extent to which outlier detection techniques work in smaller dimensions and how well dimensional reduction techniques still enable accurate detection of outliers. This can help us to understand the extent to which data can be visualized while still retaining the intrinsic outlyingness of the outliers.
PubMed: 38094985
DOI: 10.1109/tps-isa56441.2022.00028
- Methods in Molecular Biology (Clifton,... 2016
Mass spectrometry data are often generated from various biological or chemical experiments. However, due to technical reasons, outlying observations are often obtained, some of which may be extreme. Identifying the causes of outlying observations is important in the analysis of replicated MS data because elaborate pre-processing is essential in order to obtain successful analyses with reliable results, and because manual outlier detection is a time-consuming pre-processing step. It is natural to measure the variability of observations using standard deviation or interquartile range calculations, and in this work, these criteria for identifying outliers are presented. However, the low replicability and the heterogeneity of variability are often obstacles to outlier detection. Therefore, quantile regression methods for identifying outliers with low replication are also presented. The procedures are illustrated with artificial and real examples, while a software program is introduced to demonstrate how to apply these procedures in the R environment system.
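The two variability criteria named in this abstract, standard deviation and interquartile range, can be sketched in a few lines. The chapter itself works in R; this Python version, the function names, the cutoff multipliers `k`, and the replicate values are illustrative assumptions, not the chapter's software.

```python
import statistics

def sd_outliers(values, k=2.0):
    """Flag values more than k sample standard deviations away from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]

def iqr_outliers(values, k=1.5):
    """Flag values beyond k interquartile ranges outside the first/third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

# Hypothetical replicated intensities for one peak, with one extreme reading.
replicates = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]

flagged_by_sd = sd_outliers(replicates)
flagged_by_iqr = iqr_outliers(replicates)
```

With only a handful of replicates, the extreme value inflates the standard deviation itself, so the SD rule is sensitive to the choice of `k`; the quartile-based rule is less affected, which is consistent with the abstract's remark that low replication is an obstacle to these criteria.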
Topics: Mass Spectrometry; Regression Analysis; Reproducibility of Results
PubMed: 26519171
DOI: 10.1007/978-1-4939-3106-4_5
- IEEE ... International Conference on... Jul 2022
When it comes to observing and measuring human gait data for further analysis, determining whether the observed behavior is within the normal range of variability, or should be considered abnormal, is very challenging. Moreover, usually gait data are multivariate including motion capture, electromyography, force measurements, etc., each source having its own unique causes of irregularities and anomalies. This paper introduces a unique algorithm for outlier detection in periodic gait data using multiple sources and multiple procedures to improve the overall accuracy. The proposed algorithm's performance is evaluated using realistic synthetic gait data to gauge its accuracy to a truly objective known solution. It is shown that the proposed method is able to detect 91.2% of the true outliers in an extensive synthetic dataset, while only producing false positives at a rate of 0.1%, outperforming other procedures usually utilized in gait data outlier detection. The proposed method is a systematic way of removing outliers from gait data, with direct applications to human biomechanics, rehabilitation and robotics, and can be applied to other scientific fields dealing with periodic data.
Topics: Algorithms; Biomechanical Phenomena; Electromyography; Gait; Humans
PubMed: 36176090
DOI: 10.1109/ICORR55369.2022.9896411
- Attention, Perception & Psychophysics Apr 2021
It is known that the visual system can efficiently extract mean and variance information, facilitating the detection of outliers. However, no research to date has directly investigated whether ensemble perception mechanisms contribute to outlier representation precision. We specifically were interested in how the distinctiveness of outliers impacts their precision. Across two experiments, we compared how accurately viewers represented the orientation of spatial outliers that varied in distinctiveness and found that increased outlier distinctiveness resulted in greater precision. Based on comparisons of our data to simulations reflecting particular selective strategies, we eliminated the possibility that participants were selectively processing the outlier, at the expense of the ensemble. Thus, we argued that participants separately represented distinct outliers along with ensemble summaries of the remaining items in a display. We also found that outlier distinctiveness moderated the precision of how the remaining items were summarized. We discuss these findings in relation to computational capacity and constraints of ensemble perception mechanisms.
Topics: Humans; Orientation; Orientation, Spatial; Perception
PubMed: 33728510
DOI: 10.3758/s13414-021-02270-9
- Sensors (Basel, Switzerland) Apr 2020
With the advent of unmanned aerial vehicles (UAVs), a major area of interest in the research field of UAVs has been vision-aided inertial navigation systems (V-INS). In the front-end of V-INS, image processing extracts information about the surrounding environment and determines features or points of interest. With the extracted vision data and inertial measurement unit (IMU) dead reckoning, the most widely used algorithm for estimating vehicle and feature states in the back-end of V-INS is an extended Kalman filter (EKF). An important assumption of the EKF is Gaussian white noise. In fact, measurement outliers that arise in various realistic conditions are often non-Gaussian. A lack of compensation for unknown noise parameters often has a serious impact on the reliability and robustness of these navigation systems. To compensate for uncertainties of the outliers, we require modified versions of the estimator or the incorporation of other techniques into the filter. The main purpose of this paper is to develop accurate and robust V-INS for UAVs, in particular for situations involving such unknown outliers. Feature correspondence in the image processing front-end rejects vision outliers, and then a statistical test in the filtering back-end detects the remaining outliers of the vision data. For frequently occurring outliers, a variational approximation for Bayesian inference provides a way to compute the optimal noise precision matrices of the measurement outliers. The overall process of outlier removal and adaptation is referred to here as "outlier-adaptive filtering". Even though almost all approaches of V-INS remove outliers by some method, few researchers have treated outlier adaptation in V-INS in much detail. Here, results from flight datasets validate the improved accuracy of V-INS employing the proposed outlier-adaptive filtering framework.
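The abstract does not say which statistical test the back-end applies. One common choice in Kalman filtering, assumed here purely for illustration, is a chi-square gate on the squared Mahalanobis distance of the innovation (measured minus predicted feature position). The function names, the 2D pixel measurements, and the covariance values below are all hypothetical.

```python
# 95% quantile of the chi-square distribution with 2 degrees of freedom.
CHI2_95_2DOF = 5.991

def mahalanobis_sq_2d(innovation, cov):
    """Squared Mahalanobis distance of a 2D innovation under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("innovation covariance must be positive definite")
    x, y = innovation
    # Uses the closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
    return (x * (d * x - b * y) + y * (-c * x + a * y)) / det

def is_measurement_outlier(measured, predicted, cov, gate=CHI2_95_2DOF):
    """Return True when the innovation falls outside the chi-square gate."""
    innovation = (measured[0] - predicted[0], measured[1] - predicted[1])
    return mahalanobis_sq_2d(innovation, cov) > gate

# Hypothetical feature prediction in pixels, with 2 px standard deviation per axis.
predicted = (100.0, 50.0)
innovation_cov = ((4.0, 0.0), (0.0, 4.0))

near = is_measurement_outlier((101.0, 51.0), predicted, innovation_cov)
far = is_measurement_outlier((110.0, 50.0), predicted, innovation_cov)
```

A gate like this only rejects individual measurements; the adaptation step the abstract describes, which re-estimates noise precision matrices when outliers are frequent, is a separate mechanism not covered by this sketch.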
PubMed: 32260451
DOI: 10.3390/s20072036