PLoS One, 2023
This study proposes a robust outlier detection method based on the circular median for non-parametric linear-circular regression, for the case where the response variable includes outliers and the residuals are wrapped Cauchy distributed. Nadaraya-Watson and local linear regression methods were employed to obtain non-parametric regression fits. The proposed method's performance was investigated using a real dataset and a comprehensive simulation study with different sample sizes, contamination degrees, and heterogeneity degrees. The method performs well at medium and high contamination degrees, and its performance increases as the sample size and the homogeneity of the data increase. In addition, when the response variable of a linear-circular regression contains outliers, the local linear estimation method fits the data set better than the Nadaraya-Watson method.
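As a rough sketch of the kernel machinery underlying such linear-circular fits (not the authors' robust method; the Gaussian kernel, bandwidth, and toy data below are illustrative assumptions), a Nadaraya-Watson estimate of a circular response can be formed by kernel-weighting the sine and cosine of the angles and recombining them with atan2:

```python
import numpy as np

def nw_circular_fit(x, theta, x0, h=0.1):
    """Nadaraya-Watson estimate of a circular response at x0.

    Kernel-weights the sin/cos components of the angles and recombines
    them with atan2, i.e. a kernel-weighted circular mean.
    """
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # Gaussian kernel weights
    s = np.sum(w * np.sin(theta))
    c = np.sum(w * np.cos(theta))
    return np.arctan2(s, c)                 # fitted angle in (-pi, pi]

# toy data: angular response drifts linearly with x, plus small noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
theta = (0.5 + 2.0 * x + 0.1 * rng.standard_normal(200)) % (2 * np.pi)
fit = nw_circular_fit(x, theta, x0=0.5)     # expect roughly 0.5 + 2*0.5 = 1.5
```

Robustness to response outliers would enter by replacing this weighted circular mean with a circular-median-style estimator, which is the direction the study takes.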
Topics: Humans; Linear Models; Computer Simulation; Drug Contamination; Sample Size; Seizures
PubMed: 37307265
DOI: 10.1371/journal.pone.0286448

IEEE Transactions on Image Processing, 2022
In this paper, we propose a novel multi-scale attention based network (called MSA-Net) for feature matching problems. Current deep-network-based feature matching methods suffer from limited effectiveness and robustness when applied to different scenarios, due to random distributions of outliers and insufficient information learning. To address this issue, we propose a multi-scale attention block to enhance robustness to outliers and improve the representational ability of the feature map. In addition, we design a novel context channel refine block and a context spatial refine block to mine the information context with fewer parameters along the channel and spatial dimensions, respectively. The proposed MSA-Net is able to effectively infer the probability of correspondences being inliers with fewer parameters. Extensive experiments on outlier removal and relative pose estimation have shown the performance improvements of our network over current state-of-the-art methods, with fewer parameters, on both outdoor and indoor datasets. Notably, our proposed network achieves an 11.7% improvement over the state-of-the-art method at an error threshold of 5° without RANSAC on the relative pose estimation task when trained on the YFCC100M dataset.
PubMed: 35776808
DOI: 10.1109/TIP.2022.3186535

Analytical Methods, Aug 2023
Raman spectroscopy is a promising diagnostic tool for brain gliomas, owing to its non-invasive and high information density properties. However, distinguishing patterns between glioma cancer tissue and healthy brain tissue is challenging, and outlier spectra resulting from operator error or changes in external conditions can compromise the model's robustness and generalizability to new data. Given the heterogeneity of glioma tissue, the within-group variance of data obtained by a portable Raman spectrometer is relatively high, and inconsistencies in instrument repeatability and experimental conditions can lead to an incompact distribution of non-outlier points, complicating outlier detection. Strict outlier criteria may result in the deletion of non-outlier points, leading to reduced sample utilization. To address these issues, we propose the SPCN outlier detection algorithm, which segments and prunes a competitive network to extract global outlier features, identifies topological errors, and divides initial outlier domains using the α-β region segmentation method. The algorithm also incorporates a two-stage pruning method based on the characteristics of the manifold map and visualizes the outlier measure using a normalized histogram. Compared to traditional methods, SPCN is label-free and does not require an estimate of the outlier distance threshold or the data distribution density. We compared the accuracy of six outlier detection algorithms using Raman spectra collected from brain glioma tissues of 113 patients and examined changes in pattern recognition accuracy after removing the outliers, confirming the precision and robustness of SPCN. This method has the potential to enhance the accuracy and reliability of glioma diagnosis by Raman spectroscopy and can also be applied to outlier detection in other spectra, such as near-infrared and mid-infrared.
Topics: Humans; Spectrum Analysis, Raman; Reproducibility of Results; Glioma; Algorithms; Brain
PubMed: 37489762
DOI: 10.1039/d3ay00748k

BioData Mining, Sep 2023
There are not currently any univariate outlier detection algorithms that transform and model arbitrarily shaped distributions to remove univariate outliers. Some algorithms model skew, even fewer model kurtosis, and none of them model bimodality and monotonicity. To overcome these challenges, we have implemented an algorithm for Skew and Tail-heaviness Adjusted Removal of Outliers (STAR_outliers) that robustly removes univariate outliers from distributions with many different shape profiles, including extreme skew, extreme kurtosis, bimodality, and monotonicity. We show that STAR_outliers removes simulated outliers with greater recall and precision than several general algorithms, and it also models the outlier bounds of real data distributions with greater accuracy.
BACKGROUND
Reliably removing univariate outliers from arbitrarily shaped distributions is a difficult task. Incorrectly assuming unimodality or overestimating tail heaviness fails to remove outliers, while underestimating tail heaviness incorrectly removes regular data from the tails. Skew often produces one heavy tail and one light tail, and we show that several sophisticated outlier removal algorithms often fail to remove outliers from the light tail. Multivariate outlier detection algorithms have recently become popular, but having tested PyOD's multivariate outlier removal algorithms, we found them to be inadequate for univariate outlier removal. They usually do not allow for univariate input, and they do not fit their distributions of outliership scores with a model on which an outlier threshold can be accurately established. Thus, there is a need for a flexible outlier removal algorithm that can model arbitrarily shaped univariate distributions.
RESULTS
In order to effectively model arbitrarily shaped univariate distributions, we have combined several well-established algorithms into a new algorithm called STAR_outliers. STAR_outliers removes more simulated true outliers and fewer non-outliers than several other univariate algorithms. These include several normality-assuming outlier removal methods, PyOD's isolation forest (IF) outlier removal algorithm (ACM Transactions on Knowledge Discovery from Data (TKDD) 6:3, 2012) with default settings, and an IQR-based algorithm by Verardi and Vermandele that removes outliers while accounting for skew and kurtosis (Verardi and Vermandele, Journal de la Société Française de Statistique 157:90-114, 2016). Since the IF algorithm's default model poorly fit the outliership scores, we also compared the isolation forest algorithm with a model that entails removing as many datapoints as STAR_outliers does in order of decreasing outliership scores. We also compared these algorithms on the publicly available 2018 National Health and Nutrition Examination Survey (NHANES) data by setting the outlier threshold to keep values falling within the main 99.3 percent of the fitted model's domain. We show that our STAR_outliers algorithm removes significantly closer to 0.7 percent of values from these features than other outlier removal methods on average.
CONCLUSIONS
STAR_outliers is an easily implemented Python package for removing outliers that outperforms multiple commonly used methods of univariate outlier removal.
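For reference, the plain symmetric IQR rule that skew- and kurtosis-adjusted methods such as Verardi-Vermandele refine can be sketched as follows (the fence multiplier k = 1.5 and the toy data are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Classic Tukey fences: flag points beyond k*IQR from the quartiles.

    These fences are symmetric, which is exactly the limitation that
    skew-adjusted variants try to overcome for asymmetric distributions.
    """
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return (values < lo) | (values > hi)

data = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 8.0])  # 8.0 is an obvious outlier
mask = iqr_outlier_mask(data)                    # True only for 8.0
```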
PubMed: 37667378
DOI: 10.1186/s13040-023-00342-0

Journal of Nutrition Education and Behavior, Jan 2021
OBJECTIVE
The goal of this study was to explore the impact of 5 decision rules for removing outliers from adolescent food frequency questionnaire (FFQ) data.
DESIGN
This secondary analysis used baseline and 3-month data from a weight loss intervention clinical trial.
PARTICIPANTS
African American adolescents (n = 181) were recruited from outpatient clinics and community health fairs.
VARIABLES MEASURED
Data collected included self-reported FFQ and mediators of weight (food addiction, depressive symptoms, and relative reinforcing value of food), caregiver-reported executive functioning, and objectively measured weight status (percentage overweight).
ANALYSIS
Descriptive statistics examined patterns in study variables at baseline and follow-up. Correlational analyses explored the relationships between FFQ data and key study variables at baseline and follow-up.
RESULTS
Compared with not removing outliers, using decision rules reduced the number of cases and restricted the range of data. The magnitude of baseline FFQ-mediator relationships was attenuated under all decision rules but varied (increasing, decreasing, and reversing direction) at follow-up. Decision rule use increased the magnitude of change in FFQ estimated energy intake and significantly strengthened its relationship with weight change under 2 fixed range decision rules.
CONCLUSIONS AND IMPLICATIONS
Results suggest that outliers should be evaluated carefully and that the effects of different outlier decision rules should be tested and reported through sensitivity analyses.
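A fixed-range decision rule of the kind compared in the study can be sketched as a simple plausibility window on estimated energy intake; the bounds below are illustrative placeholders, not the cutoffs used by the authors:

```python
def apply_fixed_range_rule(kcal_values, lo=500, hi=5000):
    """Keep only energy intake estimates inside a fixed plausibility window.

    lo/hi are illustrative bounds; a sensitivity analysis would rerun the
    downstream correlations under several such windows and compare.
    """
    kept = [v for v in kcal_values if lo <= v <= hi]
    removed = len(kcal_values) - len(kept)
    return kept, removed

# one implausibly high and one implausibly low FFQ estimate get dropped
kept, removed = apply_fixed_range_rule([1800, 2400, 12000, 300, 2100])
```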
Topics: Adolescent; Diet; Diet Records; Diet Surveys; Energy Intake; Female; Humans; Male; Motivation; Reproducibility of Results; Surveys and Questionnaires
PubMed: 33012663
DOI: 10.1016/j.jneb.2020.08.002

BMC Health Services Research, Sep 2021
BACKGROUND
As healthcare systems strive for efficiency, hospital "length of stay outliers" have the potential to significantly impact a hospital's overall utilization. There is a tendency to exclude such "outlier" stays in local quality improvement and data reporting due to their assumed rare occurrence and disproportionate ability to skew mean and other summary data. This study sought to assess the influence of length of stay (LOS) outliers on inpatient length of stay and hospital capacity over a 5-year period at a large urban academic medical center.
METHODS
From January 2014 through December 2019, 169,645 consecutive inpatient cases were analyzed and assigned an expected LOS based on national academic center benchmarks. Cases in the top 1% of national sample LOS by diagnosis were flagged as length of stay outliers.
RESULTS
From 2014 to 2019, mean outlier LOS increased (40.98 to 45.11 days), as did inpatient LOS with outliers excluded (5.63 to 6.19 days). Outlier cases increased both in number (from 297 to 412) and as a percent of total discharges (0.98% to 1.56%), and outlier patient days increased from 6.7% to 9.8% of total inpatient plus observation days over the study period.
CONCLUSIONS
Outlier cases utilize a disproportionate and increasing share of hospital resources and available beds. The current tendency to exclude such outlier stays in data reporting due to assumed rare occurrence may need to be revisited. Outlier stays require distinct and targeted interventions to appropriately reduce length of stay to both improve patient care and maintain hospital capacity.
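The flagging step can be sketched as a percentile cutoff on length of stay. The study benchmarks against national academic center LOS percentiles per diagnosis; this illustration (an assumption) uses a single within-sample percentile instead:

```python
import numpy as np

def flag_los_outliers(los_days, pct=99.0):
    """Flag stays at or above the pct-th percentile of LOS.

    The study flags the top 1% of national sample LOS per diagnosis;
    here one within-sample percentile stands in for that benchmark.
    """
    cutoff = np.percentile(los_days, pct)
    return np.asarray(los_days) >= cutoff, cutoff

# one 60-day stay among ninety-nine 5-day stays
los = np.concatenate([np.full(99, 5.0), [60.0]])
mask, cutoff = flag_los_outliers(los)  # only the 60-day stay is flagged
```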
Topics: Hospitals, Urban; Humans; Length of Stay; Quality Improvement; Retrospective Studies
PubMed: 34503494
DOI: 10.1186/s12913-021-06972-6

Sensors (Basel, Switzerland), Apr 2020
With the advent of unmanned aerial vehicles (UAVs), a major area of interest in the research field of UAVs has been vision-aided inertial navigation systems (V-INS). In the front-end of V-INS, image processing extracts information about the surrounding environment and determines features or points of interest. With the extracted vision data and inertial measurement unit (IMU) dead reckoning, the most widely used algorithm for estimating vehicle and feature states in the back-end of V-INS is an extended Kalman filter (EKF). An important assumption of the EKF is Gaussian white noise. In fact, measurement outliers that arise in various realistic conditions are often non-Gaussian. A lack of compensation for unknown noise parameters often leads to a serious impact on the reliability and robustness of these navigation systems. To compensate for the uncertainties of the outliers, we require modified versions of the estimator or the incorporation of other techniques into the filter. The main purpose of this paper is to develop accurate and robust V-INS for UAVs, in particular for situations involving such unknown outliers. Feature correspondence in the image processing front-end rejects vision outliers, and then a statistical test in the filtering back-end detects the remaining outliers of the vision data. For frequent outlier occurrences, a variational approximation for Bayesian inference derives a way to compute the optimal noise precision matrices of the measurement outliers. The overall process of outlier removal and adaptation is referred to here as "outlier-adaptive filtering". Even though almost all V-INS approaches remove outliers by some method, few researchers have treated outlier adaptation in V-INS in much detail. Here, results from flight datasets validate the improved accuracy of V-INS employing the proposed outlier-adaptive filtering framework.
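The back-end statistical test mentioned above is commonly a chi-square gate on the normalized innovation; a minimal sketch follows (the 2-DOF 95% threshold and the toy covariance are illustrative assumptions, not the paper's parameters):

```python
import numpy as np

def innovation_gate(z, z_pred, S, thresh=5.991):
    """Chi-square gate on an EKF innovation (5.991 = 95% for 2 DOF).

    Computes d^2 = r^T S^{-1} r for residual r = z - z_pred; measurements
    whose normalized innovation exceeds the threshold are rejected as
    outliers before the filter update.
    """
    r = z - z_pred
    d2 = float(r @ np.linalg.solve(S, r))
    return d2 <= thresh, d2

S = np.eye(2) * 0.01  # assumed innovation covariance for two image coordinates
ok, _ = innovation_gate(np.array([0.02, -0.01]), np.zeros(2), S)   # small residual: kept
bad, _ = innovation_gate(np.array([1.0, 1.0]), np.zeros(2), S)     # gross outlier: rejected
```

The paper's contribution goes a step further: rather than only rejecting gated measurements, it adapts the noise precision of frequent outliers via variational Bayes.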
PubMed: 32260451
DOI: 10.3390/s20072036

Big Data, Oct 2023
Anomaly detection is crucial in a variety of domains, such as fraud detection, disease diagnosis, and equipment defect detection. With the development of deep learning, anomaly detection with Bayesian neural networks (BNNs) has become a novel research topic in recent years. This article aims to propose a widely applicable method of outlier detection (a category of anomaly detection) using BNNs based on uncertainty measurement. There are three kinds of uncertainties generated in the prediction of BNNs: epistemic uncertainty, aleatoric uncertainty, and (model) misspecification uncertainty. Although approaches from previous studies are adopted to measure epistemic and aleatoric uncertainty, a new method of utilizing loss functions to quantify misspecification uncertainty is proposed in this article. These three uncertainty sources are then merged by specific combination models to construct the total prediction uncertainty. The key idea of this study is that observations with high total prediction uncertainty should correspond to outliers in the data. The method is applied to experiments on the Modified National Institute of Standards and Technology (MNIST) dataset and the Taxi dataset, respectively. The results show that, if the network is appropriately constructed and well-trained and the model parameters are carefully tuned, most anomalous images in the MNIST dataset and all the abnormal traffic periods in the Taxi dataset can be detected. In addition, the performance of this method is compared with previously proposed BNN anomaly detection methods and with the classical Local Outlier Factor and Density-Based Spatial Clustering of Applications with Noise methods. This study links the classification of uncertainties in essence with anomaly detection and is the first to consider combining different uncertainty sources to refine detection outcomes instead of using only a single uncertainty each time.
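A minimal sketch of combining epistemic and aleatoric uncertainty from stochastic forward passes follows (the article additionally adds a loss-based misspecification term and uses trained BNNs; the toy ensemble below is an assumption):

```python
import numpy as np

def total_uncertainty(mc_means, mc_vars):
    """Combine MC samples of a BNN-style predictive distribution.

    epistemic = variance of the sampled means across passes;
    aleatoric = mean of the sampled predictive variances.
    Their sum stands in for 'total prediction uncertainty'.
    """
    epistemic = np.var(mc_means, axis=0)
    aleatoric = np.mean(mc_vars, axis=0)
    return epistemic + aleatoric

# 50 stochastic forward passes over 4 inputs; the last input gets
# strongly disagreeing means, mimicking an out-of-distribution point
rng = np.random.default_rng(1)
mc_means = rng.normal(0.0, 0.05, size=(50, 4))
mc_means[:, 3] += rng.normal(0.0, 1.0, size=50)
mc_vars = np.full((50, 4), 0.01)
u = total_uncertainty(mc_means, mc_vars)  # highest for the last input
```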
Topics: Bayes Theorem; Fraud; Neural Networks, Computer; Spatial Analysis; Deep Learning
PubMed: 36706252
DOI: 10.1089/big.2021.0343

IEEE Transactions on Image Processing, 2022
When neural networks are employed for high-stakes decision-making, it is desirable that they provide explanations for their predictions, so that we can understand the features that contributed to the decision. At the same time, it is important to flag potential outliers for in-depth verification by domain experts. In this work, we propose to unify two differing aspects of explainability with outlier detection. We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction while also identifying regions of similarity between the predicted sample and the examples. The examples are real prototypical cases sampled from the training set via a novel iterative prototype replacement algorithm. Furthermore, we propose to use the prototype similarity scores for identifying outliers. We compare the classification performance, explanation quality, and outlier detection of our proposed network with baselines. We show that our prototype-based networks, extending beyond similarity kernels, deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
PubMed: 34793299
DOI: 10.1109/TIP.2021.3127847

Knowledge-Based Systems, Feb 2022
The presence of outliers can severely degrade the learned representations and performance of deep learning methods and hence disproportionately affect the training process, leading to incorrect conclusions about the data. For example, anomaly detection using deep generative models is typically only possible when similar anomalies (or outliers) are not present in the training data. Here we focus on variational autoencoders (VAEs). While the VAE is a popular framework for anomaly detection tasks, we observe that the VAE is unable to detect outliers when the training data contains anomalies that have the same distribution as those in the test data. In this paper we focus on robustness to outliers in training data in VAE settings using concepts from robust statistics. We propose a variational lower bound that leads to a robust VAE model that has the same computational complexity as the standard VAE and contains a single automatically-adjusted tuning parameter to control the degree of robustness. We present mathematical formulations of robust variational autoencoders (RVAEs) for Bernoulli, Gaussian, and categorical variables. The RVAE model is based on beta-divergence rather than the standard Kullback-Leibler (KL) divergence. We demonstrate the performance of our proposed β-divergence-based autoencoder on a variety of image and categorical datasets, showing improved robustness to outliers both qualitatively and quantitatively. We also illustrate the use of our robust VAE for detection of lesions in brain images, formulated as an anomaly detection task. Finally, we suggest a method to tune the hyperparameter of the RVAE, which makes our model completely unsupervised.
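The robustness mechanism can be sketched through the per-point influence weights that a β-divergence Gaussian objective induces; this is an illustrative simplification under assumed parameters, not the paper's RVAE bound:

```python
import numpy as np

def beta_weights(x, mu=0.0, sigma=1.0, beta=0.5):
    """Per-point influence weights under a beta-divergence Gaussian loss.

    As beta -> 0 this approaches the constant weights of the usual
    log-likelihood (KL) objective; for beta > 0, points far from mu are
    exponentially downweighted, which is the source of robustness.
    """
    z2 = ((x - mu) / sigma) ** 2
    return np.exp(-0.5 * beta * z2)

x = np.array([0.1, -0.2, 0.0, 8.0])  # last point is a gross outlier
w = beta_weights(x)                  # inliers keep weight ~1, outlier ~0
```

Under the standard KL objective the gradient contribution of the point at 8.0 would dominate the fit; here its weight is driven toward zero, so the learned representation is barely perturbed.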
PubMed: 36714396
DOI: 10.1016/j.knosys.2021.107886