-
Scientific Reports Feb 2023
Outlier detection is an important topic in machine learning and has been used in a wide range of applications. Outliers are objects that are few in number and deviate from the majority of objects. As a result of these two properties, we show that outliers are susceptible to a mechanism called fluctuation. This article proposes a method called fluctuation-based outlier detection (FBOD) that achieves a low linear time complexity and detects outliers purely based on the concept of fluctuation without employing any distance, density or isolation measure. Fundamentally different from all existing methods, FBOD first converts the Euclidean-structured datasets into graphs by using random links, then propagates the feature values according to the connections of the graph. Finally, by comparing the difference between the fluctuation of an object and that of its neighbors, FBOD determines the object with a larger difference to be an outlier. The results of experiments comparing FBOD with eight state-of-the-art algorithms on eight real-world tabular datasets and three video datasets show that FBOD outperforms its competitors in the majority of cases and that FBOD requires only 5% of the execution time of the fastest algorithm. The experiment codes are available at: https://github.com/FluctuationOD/Fluctuation-based-Outlier-Detection .
PubMed: 36765095
DOI: 10.1038/s41598-023-29549-1 -
The VLDB Journal : Very Large Data... 2022
While many techniques for outlier detection have been proposed in the literature, the interpretation of detected outliers is often left to users. As a result, it is difficult for users to promptly take appropriate actions concerning the detected outliers. To lessen this difficulty, when outliers are identified, they should be presented together with their explanations. There are survey papers on outlier detection, but none exists for outlier explanations. To fill this gap, in this paper, we present a survey on outlier explanations in which meaningful knowledge is mined from anomalous data to explain them. We define different types of outlier explanations and discuss the challenges in generating each type. We review the existing outlier explanation techniques and discuss how they address the challenges. We also discuss the applications of outlier explanations and review the existing methods used to evaluate outlier explanations. Furthermore, we discuss possible future research directions.
PubMed: 35095253
DOI: 10.1007/s00778-021-00721-1 -
Genomics Jan 2021
The ΔΔct method estimates fold change in gene expression data from RT-PCR assays. The ΔΔct estimate aggregates replicates using the mean and standard deviation (sd) and is not robust to outliers, which in practice are often removed before the non-outlying replicates are aggregated. The alternative of using robust statistics such as the median and median absolute deviation (MAD) to aggregate the replicates is not used in practice, perhaps because the distribution of a robust ΔΔct estimate based on the median and MAD is not straightforward to deduce. We introduce a robust ΔΔct estimate and deduce an approximate distribution for it. Simulations show that when data have outliers, the robust ΔΔct estimate, compared to the non-robust ΔΔct estimate, leads to a significantly reduced confidence interval length and a coverage close to the nominal coverage. The analysis of RT-PCR data from a Novartis clinical trial demonstrates the benefit of a robust ΔΔct estimate.
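As a rough illustration of the aggregation choice discussed in this abstract, the sketch below contrasts mean/sd with median/MAD on one set of replicate measurements. The replicate values and the function name `robust_summary` are hypothetical, and the code does not reproduce the authors' robust ΔΔct estimate or its approximate distribution. The factor 1.4826 is the usual consistency constant that makes the MAD comparable to the sd under normality.

```python
import statistics

# Consistency constant: scaled MAD estimates the sd for normally distributed data.
MAD_SCALE = 1.4826

def robust_summary(values):
    """Return (median, scaled MAD) for a list of replicate measurements."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    return med, MAD_SCALE * statistics.median(abs_dev)

# Hypothetical replicates: three consistent readings and one outlier.
replicates = [24.1, 24.3, 24.2, 30.0]

classical = (statistics.mean(replicates), statistics.stdev(replicates))
robust = robust_summary(replicates)

print("mean, sd:     ", classical)
print("median, MAD*: ", robust)
```

In this toy case the single outlying replicate pulls the mean upward and inflates the sd, while the median and scaled MAD stay close to the three consistent readings.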
Topics: Algorithms; Biomarkers, Tumor; Clinical Trials as Topic; Gene Expression Profiling; Humans; Real-Time Polymerase Chain Reaction; Reference Standards
PubMed: 33309766
DOI: 10.1016/j.ygeno.2020.12.009 -
Entropy (Basel, Switzerland) Jun 2022
In this article, we evaluate the efficiency and performance of two clustering algorithms: AHC (Agglomerative Hierarchical Clustering) and K-Means. We are aware that there are various linkage options and distance measures that influence the clustering results. We assess the quality of clustering using the Davies-Bouldin and Dunn cluster validity indexes. The main contribution of this research is to verify whether the quality of clusters without outliers is higher than that of clusters with outliers in the data. To do this, we compare and analyze outlier detection algorithms depending on the applied clustering algorithm. In our research, we use and compare the LOF (Local Outlier Factor) and COF (Connectivity-based Outlier Factor) algorithms for detecting outliers before and after removing 1%, 5%, and 10% of outliers. Next, we analyze how the quality of clustering has improved. In the experiments, three real data sets with different numbers of instances were used.
PubMed: 35885141
DOI: 10.3390/e24070917 -
Magnetic Resonance in Medicine Sep 2024
PURPOSE
To present and assess an outlier mitigation method that makes free-running volumetric cardiovascular MRI (CMR) more robust to motion.
METHODS
The proposed method, called compressive recovery with outlier rejection (CORe), models outliers in the measured data as an additive auxiliary variable. We enforce MR physics-guided group sparsity on the auxiliary variable, and jointly estimate it along with the image using an iterative algorithm. For evaluation, CORe is first compared to traditional compressed sensing (CS), robust regression (RR), and an existing outlier rejection method using two simulation studies. Then, CORe is compared to CS using seven three-dimensional (3D) cine, 12 rest four-dimensional (4D) flow, and eight stress 4D flow imaging datasets.
RESULTS
Our simulation studies show that CORe outperforms CS, RR, and the existing outlier rejection method in terms of normalized mean square error and structural similarity index across 55 different realizations. The expert reader evaluation of 3D cine images demonstrates that CORe is more effective in suppressing artifacts while maintaining or improving image sharpness. Finally, 4D flow images show that CORe yields more reliable and consistent flow measurements, especially in the presence of involuntary subject motion or exercise stress.
CONCLUSION
An outlier rejection method is presented and tested using simulated and measured data. This method can help suppress motion artifacts in a wide range of free-running CMR applications.
Topics: Humans; Algorithms; Imaging, Three-Dimensional; Magnetic Resonance Imaging, Cine; Artifacts; Computer Simulation; Motion; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Image Interpretation, Computer-Assisted; Reproducibility of Results; Heart
PubMed: 38733066
DOI: 10.1002/mrm.30123 -
Journal of the Royal Statistical... Apr 2022
We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class and to detect outliers as often as possible. BCOPS returns no prediction (corresponding to C(x) equal to the empty set) if it infers x to be an outlier. The proposed method combines supervised learning algorithms with conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given procedure. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
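The idea of a prediction set that may be empty can be sketched with generic class-wise conformal p-values. This is only a minimal illustration, not the BCOPS procedure: the conformity scores (higher meaning "looks more like this class"), the calibration values, the threshold `alpha`, and the function names are all assumptions made for the example, and the optimized score that BCOPS learns is not reproduced here.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration scores that are no larger than the test score,
    with the usual +1 correction for the test point itself."""
    n_le = sum(1 for s in calibration_scores if s <= test_score)
    return (n_le + 1) / (len(calibration_scores) + 1)

def prediction_set(calibration_by_class, test_scores, alpha=0.2):
    """Keep every class whose conformal p-value exceeds alpha.
    An empty result means the point is treated as an outlier."""
    return {
        label
        for label, cal in calibration_by_class.items()
        if conformal_p_value(cal, test_scores[label]) > alpha
    }

# Hypothetical conformity scores of held-out samples from each class.
calibration = {
    "a": [0.70, 0.75, 0.78, 0.81, 0.83, 0.85, 0.88, 0.90, 0.95],
    "b": [0.60, 0.66, 0.71, 0.74, 0.79, 0.82, 0.86, 0.91, 0.93],
}

typical_point = {"a": 0.84, "b": 0.10}   # conforms to class "a" only
strange_point = {"a": 0.05, "b": 0.10}   # conforms to neither class

print(prediction_set(calibration, typical_point))
print(prediction_set(calibration, strange_point))
```

The second point receives a low p-value for every class, so its prediction set is empty, which corresponds to returning no prediction.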
PubMed: 35910400
DOI: 10.1111/rssb.12443 -
Entropy (Basel, Switzerland) Oct 2022
One of the consequences of the big data revolution is that data are more heterogeneous than ever. A new challenge appears when mixed-type data sets evolve over time and we are interested in the comparison among individuals. In this work, we propose a new protocol that integrates robust distances and visualization techniques for dynamic mixed data. In particular, given a time t∈T={1,2,…,N}, we start by measuring the proximity of individuals in heterogeneous data by means of a robustified version of Gower's metric (proposed by the authors in a previous work), yielding a collection of distance matrices {D(t),∀t∈T}. To monitor the evolution of distances and outlier detection over time, we propose several graphical tools: first, we track the evolution of pairwise distances via line graphs; second, a dynamic box plot is obtained to identify individuals that showed minimum or maximum disparities; third, to visualize individuals that are systematically far from the others and detect potential outliers, we use the proximity plots, which are line graphs based on a proximity function computed on {D(t),∀t∈T}; fourth, the evolution of the inter-distances between individuals is analyzed via dynamic multiple multidimensional scaling maps. These visualization tools were implemented in a Shiny application in R, and the methodology is illustrated on a real data set related to COVID-19 healthcare, policy and restriction measures during the 2020-2021 COVID-19 pandemic across EU Member States.
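For readers unfamiliar with Gower's metric, the sketch below computes the classical (non-robust) Gower distance between two mixed-type records: numeric variables contribute an absolute difference scaled by the variable's range, and categorical variables contribute 0 or 1. The authors' robustified version is not reproduced here, and the records, ranges, and the function name `gower_distance` are hypothetical.

```python
def gower_distance(a, b, ranges):
    """Classical Gower distance between two records.

    ranges[i] is the observed range (max - min) of numeric variable i,
    or None when variable i is categorical.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical variable: simple mismatch indicator.
            total += 0.0 if x == y else 1.0
        else:
            # Numeric variable: absolute difference scaled to [0, 1].
            total += abs(x - y) / r
    return total / len(ranges)

# Hypothetical records: (numeric, categorical, numeric).
record_1 = (10.0, "yes", 3.0)
record_2 = (20.0, "no", 3.0)
ranges = (40.0, None, 5.0)

print(gower_distance(record_1, record_2, ranges))
```

Evaluating such a distance for every pair of individuals at each time t would give one distance matrix per time point, which is the kind of collection the abstract denotes {D(t),∀t∈T}.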
PubMed: 37420419
DOI: 10.3390/e24101399 -
Frontiers in Nutrition 2023
INTRODUCTION
Observational studies suggest that vitamin D supplementation may be effective in preventing myasthenia gravis (MG). However, the causal relationship between circulating vitamin D levels and MG remains unclear. This study aimed to examine the genetic causality of circulating vitamin D and MG using data from large population-based genome-wide association studies (GWAS).
METHODS
SNPs (single nucleotide polymorphisms) strongly associated with exposure were selected. Two-sample Mendelian Randomization (MR) was performed with inverse variance weighting (IVW), MR-Egger (Mendelian randomization-Egger), weighted median and MR-PRESSO (Mendelian randomization pleiotropy residual sum and outlier) methods. Heterogeneity was tested via IVW and MR-Egger. Pleiotropy was tested using the MR-Egger intercept test and the MR-PRESSO method. MR-PRESSO was also used to detect outliers. Leave-one-out analysis was used to identify SNPs with potential effect. Reverse MR analysis was also performed.
RESULT
In IVW, circulating vitamin D levels had no causal effect on MG [OR = 0.91 (0.67-1.22), P = 0.532] and MG had no causal effect on circulating vitamin D [OR = 1.01 (0.99-1.02), P = 0.663]. No heterogeneity or pleiotropy was observed (P > 0.05). Other MR methods also agreed with IVW results.
CONCLUSION
This study provides the causal relationship between genetically predicted circulating vitamin D levels and MG and provides new insights into the genetics of MG.
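The inverse variance weighting step named in the methods can be sketched as a fixed-effect combination of per-SNP ratio estimates. This is a minimal illustration under standard textbook assumptions (first-order standard errors, no random-effects adjustment); the SNP effect sizes are invented, the function name `ivw_estimate` is hypothetical, and it is not the authors' analysis pipeline.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    Each SNP gives a ratio estimate beta_outcome / beta_exposure with
    weight beta_exposure**2 / se_outcome**2 (inverse of its variance).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = bx ** 2 / se ** 2
        weighted_sum += weight * (by / bx)
        weight_total += weight
    estimate = weighted_sum / weight_total
    std_error = math.sqrt(1.0 / weight_total)
    return estimate, std_error

# Hypothetical SNP-exposure effects, SNP-outcome effects (log odds), and SEs.
bx = [0.1, 0.2, 0.4]
by = [0.05, 0.1, 0.2]
se = [0.01, 0.02, 0.02]

beta, beta_se = ivw_estimate(bx, by, se)
# For a binary outcome the estimate is on the log-odds scale,
# so exponentiating gives an odds ratio with a 95% interval.
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * beta_se), math.exp(beta + 1.96 * beta_se))
print(odds_ratio, ci)
```

An interval that contains 1, as in the odds ratios reported in the results, is what "no causal effect" refers to on this scale.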
PubMed: 37538922
DOI: 10.3389/fnut.2023.1171830 -
Frontiers in Digital Health 2023
Review
This paper compares three finite element-based methods used in a physics-based non-rigid registration approach and reports on the progress made over the last 15 years. Large brain shifts caused by brain tumor removal affect registration accuracy by creating point and element outliers. A combination of approximation- and geometry-based point and element outlier rejection improves the rigid registration error by 2.5 mm and meets the real-time constraints (4 min). In addition, the paper raises several questions and presents two open problems for the robust estimation and improvement of registration error in the presence of outliers due to sparse, noisy, and incomplete data. It concludes with preliminary results on leveraging Quantum Computing, a promising new technology for computationally intensive problems like Feature Detection and Block Matching in addition to finite element solver; all three account for 75% of computing time in deformable registration.
PubMed: 38144260
DOI: 10.3389/fdgth.2023.1283726 -
BMC Medical Research Methodology Oct 2023
BACKGROUND
Growth studies rely on longitudinal measurements, typically represented as trajectories. However, anthropometry is prone to errors that can generate outliers. While various methods are available for detecting outlier measurements, a gold standard has yet to be identified, and there is no established method for outlying trajectories. Thus, outlier types and their effects on growth pattern detection still need to be investigated. This work aimed to assess the performance of six methods at detecting different types of outliers, propose two novel methods for outlier trajectory detection and evaluate how outliers affect growth pattern detection.
METHODS
We included 393 healthy infants from The Applied Research Group for Kids (TARGet Kids!) cohort and 1651 children with severe malnutrition from the co-trimoxazole prophylaxis clinical trial. We injected outliers of three types and six intensities and applied four outlier detection methods for measurements (model-based and World Health Organization cut-offs-based) and two for trajectories. We also assessed growth pattern detection before and after outlier injection using time series clustering and latent class mixed models. Error type, intensity, and population affected method performance.
RESULTS
Model-based outlier detection methods performed best for measurements, with precision between 5.72% and 99.89%, especially for low and moderate error intensities. The clustering-based outlier trajectory method had a high precision of 14.93-99.12%. Combining methods improved the detection rate to 21.82% in outlier measurements. Finally, when comparing growth groups with and without outliers, the outliers were shown to alter group membership by 57.9-79.04%.
CONCLUSIONS
World Health Organization cut-off-based techniques were shown to perform well in a few very particular cases (extreme errors of high intensity), while model-based techniques performed well, especially for moderate errors of low intensity. Clustering-based outlier trajectory detection performed exceptionally well across all types and intensities of errors, indicating a potential strategic change in how outliers in growth data are viewed. Finally, the importance of detecting outliers was shown, given its impact on child growth studies, as demonstrated by comparing results of growth group detection.
Topics: Child; Humans; Cluster Analysis; Research Design; Infant; Child Development
PubMed: 37833647
DOI: 10.1186/s12874-023-02045-w