outlier - OpenMD.com Journal Search

Fluctuation-based outlier detection.

Scientific Reports Feb 2023

Authors: Xusheng Du, Enguang Zuo, Zheng Chu...

Outlier detection is an important topic in machine learning and has been used in a wide range of applications. Outliers are objects that are few in number and deviate from the majority of objects. As a result of these two properties, we show that outliers are susceptible to a mechanism called fluctuation. This article proposes a method called fluctuation-based outlier detection (FBOD) that achieves a low linear time complexity and detects outliers purely based on the concept of fluctuation, without employing any distance, density or isolation measure, which makes it fundamentally different from all existing methods. FBOD first converts Euclidean-structured datasets into graphs by using random links, then propagates the feature values according to the connections of the graph. Finally, by comparing the difference between the fluctuation of an object and that of its neighbors, FBOD determines the object with a larger difference to be an outlier. The results of experiments comparing FBOD with eight state-of-the-art algorithms on eight real-world tabular datasets and three video datasets show that FBOD outperforms its competitors in the majority of cases and that FBOD requires only 5% of the execution time of the fastest competing algorithm. The experiment code is available at: https://github.com/FluctuationOD/Fluctuation-based-Outlier-Detection .

PubMed: 36765095
DOI: 10.1038/s41598-023-29549-1

A survey on outlier explanations.

The VLDB Journal : Very Large Data... 2022

Authors: Egawati Panjei, Le Gruenwald, Eleazar Leal...

While many techniques for outlier detection have been proposed in the literature, the interpretation of detected outliers is often left to users. As a result, it is difficult for users to promptly take appropriate actions concerning the detected outliers. To lessen this difficulty, when outliers are identified, they should be presented together with their explanations. There are survey papers on outlier detection, but none exists for outlier explanations. To fill this gap, in this paper, we present a survey on outlier explanations in which meaningful knowledge is mined from anomalous data to explain them. We define different types of outlier explanations and discuss the challenges in generating each type. We review the existing outlier explanation techniques and discuss how they address the challenges. We also discuss the applications of outlier explanations and review the existing methods used to evaluate outlier explanations. Furthermore, we discuss possible future research directions.

PubMed: 35095253
DOI: 10.1007/s00778-021-00721-1

Robust ΔΔct estimate.

Genomics Jan 2021

Authors: Arun Kumar, Daniel Lorand

The ΔΔct method estimates fold change in gene expression data from RT-PCR assays. The ΔΔct estimate aggregates replicates using the mean and standard deviation (sd) and is not robust to outliers, which in practice are often removed before the non-outlying replicates are aggregated. The alternative of aggregating the replicates with robust statistics such as the median and median absolute deviation (MAD) is not used in practice, perhaps because the distribution of a robust ΔΔct estimate based on median and MAD is not straightforward to deduce. We introduce a robust ΔΔct estimate and deduce an approximate distribution for it. Simulations show that when the data have outliers, the robust ΔΔct estimate, compared to the non-robust ΔΔct estimate, leads to a significantly shorter confidence interval and a coverage close to the nominal coverage. The analysis of RT-PCR data from a Novartis clinical trial demonstrates the benefit of a robust ΔΔct estimate.

Topics: Algorithms; Biomarkers, Tumor; Clinical Trials as Topic; Gene Expression Profiling; Humans; Real-Time Polymerase Chain Reaction; Reference Standards

PubMed: 33309766
DOI: 10.1016/j.ygeno.2020.12.009
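
As a rough illustration of the contrast drawn in the abstract above, the following sketch aggregates a set of replicates with mean/sd and with median/MAD. It is not the authors' robust ΔΔct estimator or its approximate distribution; the replicate values, the helper names `scaled_mad` and `summarize`, and the choice to rescale the MAD by 1.4826 are assumptions made only for this example.

```python
import statistics

# 1.4826 rescales the MAD so it estimates the sd for normally distributed data.
MAD_SCALE = 1.4826


def scaled_mad(values):
    """Median absolute deviation around the median, rescaled by MAD_SCALE."""
    center = statistics.median(values)
    deviations = [abs(v - center) for v in values]
    return MAD_SCALE * statistics.median(deviations)


def summarize(values):
    """Return the classical and the robust location/scale summaries side by side."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "mad": scaled_mad(values),
    }


# Hypothetical Ct replicates (made up for illustration); 29.5 plays the outlier.
ct_replicates = [24.1, 24.3, 24.2, 24.0, 29.5]
summary = summarize(ct_replicates)
print(summary)
```

In this toy input the single extreme replicate pulls the mean and inflates the sd, while the median and MAD stay close to the bulk of the replicates, which is the motivation the abstract gives for a robust aggregate.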

How the Outliers Influence the Quality of Clustering?

Entropy (Basel, Switzerland) Jun 2022

Authors: Agnieszka Nowak-Brzezińska, Igor Gaibei

In this article, we evaluate the efficiency and performance of two clustering algorithms: AHC (Agglomerative Hierarchical Clustering) and K-Means. We are aware that there are various linkage options and distance measures that influence the clustering results. We assess the quality of clustering using the Davies-Bouldin and Dunn cluster validity indexes. The main contribution of this research is to verify whether the quality of clusters without outliers is higher than that of clusters with outliers in the data. To do this, we compare and analyze outlier detection algorithms depending on the applied clustering algorithm. In our research, we use and compare the LOF (Local Outlier Factor) and COF (Connectivity-based Outlier Factor) algorithms for detecting outliers before and after removing 1%, 5%, and 10% of outliers. Next, we analyze how the quality of clustering has improved. In the experiments, three real data sets were used with a different number of instances.

PubMed: 35885141
DOI: 10.3390/e24070917
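
To make the idea in the abstract above concrete, here is a minimal sketch of one common form of the Dunn index (smallest gap between clusters divided by the largest cluster diameter), evaluated with and without a single stray point. The points, the cluster assignment, and the hand-picked outlier are hypothetical; the paper detects outliers with LOF and COF, which this sketch does not implement.

```python
import math
from itertools import combinations


def diameter(cluster):
    """Largest pairwise distance inside one cluster (0.0 for a single point)."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def separation(cluster_a, cluster_b):
    """Smallest distance between a point of cluster_a and a point of cluster_b."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def dunn_index(clusters):
    """Smallest inter-cluster separation divided by the largest diameter."""
    smallest_gap = min(separation(a, b) for a, b in combinations(clusters, 2))
    largest_diameter = max(diameter(c) for c in clusters)
    return smallest_gap / largest_diameter


# Hypothetical 2-D data: two tight clusters plus one stray point at (5, 5).
cluster_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cluster_b = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
outlier = (5.0, 5.0)

dunn_with_outlier = dunn_index([cluster_a + [outlier], cluster_b])
dunn_without_outlier = dunn_index([cluster_a, cluster_b])
print(dunn_with_outlier, dunn_without_outlier)
```

A higher Dunn index indicates more compact, better separated clusters, so in this toy case removing the stray point raises the index.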

Motion-robust free-running volumetric cardiovascular MRI.

Magnetic Resonance in Medicine Sep 2024

Authors: Syed M Arshad, Lee C Potter, Chong Chen...

PURPOSE

To present and assess an outlier mitigation method that makes free-running volumetric cardiovascular MRI (CMR) more robust to motion.

METHODS

The proposed method, called compressive recovery with outlier rejection (CORe), models outliers in the measured data as an additive auxiliary variable. We enforce MR physics-guided group sparsity on the auxiliary variable, and jointly estimate it along with the image using an iterative algorithm. For evaluation, CORe is first compared to traditional compressed sensing (CS), robust regression (RR), and an existing outlier rejection method using two simulation studies. Then, CORe is compared to CS using seven three-dimensional (3D) cine, 12 rest four-dimensional (4D) flow, and eight stress 4D flow imaging datasets.

RESULTS

Our simulation studies show that CORe outperforms CS, RR, and the existing outlier rejection method in terms of normalized mean square error and structural similarity index across 55 different realizations. The expert reader evaluation of 3D cine images demonstrates that CORe is more effective in suppressing artifacts while maintaining or improving image sharpness. Finally, 4D flow images show that CORe yields more reliable and consistent flow measurements, especially in the presence of involuntary subject motion or exercise stress.

CONCLUSION

An outlier rejection method is presented and tested using simulated and measured data. This method can help suppress motion artifacts in a wide range of free-running CMR applications.

Topics: Humans; Algorithms; Imaging, Three-Dimensional; Magnetic Resonance Imaging, Cine; Artifacts; Computer Simulation; Motion; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Image Interpretation, Computer-Assisted; Reproducibility of Results; Heart

PubMed: 38733066
DOI: 10.1002/mrm.30123

Prediction and outlier detection in classification problems.

Journal of the Royal Statistical... Apr 2022

Authors: Leying Guan, Robert Tibshirani

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). For a test observation x, BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class and to detect outliers as often as possible. BCOPS returns no prediction (corresponding to C(x) equal to the empty set) if it infers x to be an outlier. The proposed method combines supervised learning algorithms with conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given procedure. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.

PubMed: 35910400
DOI: 10.1111/rssb.12443
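
The abstract above builds on conformal prediction sets that may be empty. The sketch below shows a generic class-conditional conformal set on one-dimensional toy data, where an empty set is read as "outlier". It is not BCOPS: the conformity score (negative distance to an assumed class center), the calibration points, the class centers, and the level `alpha` are all hypothetical choices for illustration, whereas BCOPS learns its score with supervised learning.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration scores no larger than the test score (with +1 correction)."""
    count = sum(1 for s in calibration_scores if s <= test_score)
    return (count + 1) / (len(calibration_scores) + 1)


def prediction_set(x, calibration, score_fn, alpha):
    """Keep every label whose conformal p-value for x exceeds alpha."""
    labels = set()
    for label, points in calibration.items():
        calibration_scores = [score_fn(p, label) for p in points]
        if conformal_p_value(calibration_scores, score_fn(x, label)) > alpha:
            labels.add(label)
    return labels


# Hypothetical class centers and held-out calibration points for two classes.
CENTERS = {"a": 0.0, "b": 10.0}
CALIBRATION = {
    "a": [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0],
    "b": [8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0],
}


def conformity(x, label):
    """Higher score means x looks more typical of the class (assumed score)."""
    return -abs(x - CENTERS[label])


typical_set = prediction_set(0.2, CALIBRATION, conformity, alpha=0.2)
outlier_set = prediction_set(5.0, CALIBRATION, conformity, alpha=0.2)
print(typical_set, outlier_set)
```

A point near one class center receives that label, while a point far from both centers receives the empty set, which mirrors the "no prediction" outcome described in the abstract.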

Dynamic Mixed Data Analysis and Visualization.

Entropy (Basel, Switzerland) Oct 2022

Authors: Aurea Grané, Giancarlo Manzi, Silvia Salini...

One of the consequences of the big data revolution is that data are more heterogeneous than ever. A new challenge appears when mixed-type data sets evolve over time and we are interested in the comparison among individuals. In this work, we propose a new protocol that integrates robust distances and visualization techniques for dynamic mixed data. In particular, given a time t∈T={1,2,…,N}, we start by measuring the proximity of individuals in heterogeneous data by means of a robustified version of Gower's metric (proposed by the authors in a previous work), yielding a collection of distance matrices {D(t),∀t∈T}. To monitor the evolution of distances and outlier detection over time, we propose several graphical tools: first, we track the evolution of pairwise distances via line graphs; second, a dynamic box plot is obtained to identify individuals that showed minimum or maximum disparities; third, to visualize individuals that are systematically far from the others and detect potential outliers, we use the proximity plots, which are line graphs based on a proximity function computed on {D(t),∀t∈T}; fourth, the evolution of the inter-distances between individuals is analyzed via dynamic multiple multidimensional scaling maps. These visualization tools were implemented in a Shiny application in R, and the methodology is illustrated on a real data set related to COVID-19 healthcare, policy and restriction measures about the 2020-2021 COVID-19 pandemic across EU Member States.

PubMed: 37420419
DOI: 10.3390/e24101399
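
For readers unfamiliar with Gower's metric mentioned in the abstract above, the following sketch computes the classical (non-robust) Gower dissimilarity for mixed numeric and categorical records and assembles one distance matrix D(t) for a single time point. The robustified version proposed by the authors is not reproduced here; the records, the `ranges` argument, and the function names are assumptions for illustration.

```python
def gower_distance(a, b, ranges):
    """Classical Gower dissimilarity between two mixed-type records.

    ranges[k] is the observed range of numeric variable k, or None when
    variable k is categorical (simple matching: 0 if equal, 1 otherwise).
    """
    total = 0.0
    for x, y, value_range in zip(a, b, ranges):
        if value_range is None:
            total += 0.0 if x == y else 1.0
        elif value_range > 0:
            total += abs(x - y) / value_range
    return total / len(ranges)


def distance_matrix(rows, ranges):
    """Symmetric matrix of pairwise Gower dissimilarities for one time point."""
    return [[gower_distance(a, b, ranges) for b in rows] for a in rows]


# Hypothetical records at one time t: (numeric indicator, categorical policy).
rows_at_t = [(10.0, "A"), (20.0, "A"), (30.0, "B")]
ranges = [20.0, None]  # numeric range 30 - 10 = 20; second variable categorical

D_t = distance_matrix(rows_at_t, ranges)
for row in D_t:
    print(row)
```

Repeating this for each t in T would give the collection {D(t)} that the graphical tools in the abstract operate on.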

Causal effect of vitamin D on myasthenia gravis: a two-sample Mendelian randomization study.

Frontiers in Nutrition 2023

Authors: Yidan Fan, Huaiying Huang, Xiangda Chen...

INTRODUCTION

Observational studies suggest that vitamin D supplementation may be effective in preventing myasthenia gravis (MG). However, the causal relationship between circulating vitamin D levels and MG remains unclear. This study aimed to examine the genetic causality of circulating vitamin D and MG using data from large population-based genome-wide association studies (GWAS).

METHODS

SNPs (single nucleotide polymorphisms) strongly associated with exposure were selected. Two-sample Mendelian Randomization (MR) was performed with inverse variance weighting (IVW), MR-Egger (Mendelian randomization-Egger), weighted median and MR-PRESSO (Mendelian randomization pleiotropy residual sum and outlier) methods. Heterogeneity was tested via IVW and MR-Egger. Pleiotropy was tested using the MR-Egger intercept test and the MR-PRESSO method. MR-PRESSO was also used to detect outliers. Leave-one-out analysis was used to identify SNPs with potential effect. Reverse MR analysis was also performed.

RESULT

In IVW, circulating vitamin D levels had no causal effect on MG [OR = 0.91 (0.67-1.22), P = 0.532] and MG had no causal effect on circulating vitamin D [OR = 1.01 (0.99-1.02), P = 0.663]. No heterogeneity or pleiotropy was observed (P > 0.05). Other MR methods also agreed with IVW results.

CONCLUSION

This study provides the causal relationship between genetically predicted circulating vitamin D levels and MG and provides new insights into the genetics of MG.

PubMed: 37538922
DOI: 10.3389/fnut.2023.1171830
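
The inverse variance weighting step named in the abstract above can be sketched with the standard fixed-effect IVW formula, which combines per-SNP exposure and outcome effects into one causal estimate and converts it to an odds ratio. The SNP effect sizes and standard errors below are invented for illustration and are not taken from the study; the function names are likewise hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    beta = sum(bx * by / se^2) / sum(bx^2 / se^2), se = sqrt(1 / sum(bx^2 / se^2)).
    """
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    total_weight = sum(weights)
    return numerator / total_weight, math.sqrt(1.0 / total_weight)


def odds_ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a log-odds estimate and its approximate 95% interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical per-SNP summary statistics (not from the study).
beta_exposure = [0.10, 0.20, 0.15]   # SNP effect on the exposure
beta_outcome = [0.010, 0.020, 0.015]  # SNP effect on the outcome (log odds)
se_outcome = [0.05, 0.05, 0.05]       # standard error of the outcome effect

beta_ivw, se_ivw = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta_ivw, se_ivw)
print(beta_ivw, se_ivw, odds_ratio, ci_low, ci_high)
```

When the interval for the odds ratio contains 1, as it does for these made-up inputs, the IVW analysis gives no evidence of a causal effect, which is how the bracketed intervals in the RESULT paragraph are read.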

Comparison of physics-based deformable registration methods for image-guided neurosurgery.

Frontiers in Digital Health 2023

Review

Authors: Nikos Chrisochoides, Yixun Liu, Fotis Drakopoulos...

This paper compares three finite element-based methods used in a physics-based non-rigid registration approach and reports on the progress made over the last 15 years. Large brain shifts caused by brain tumor removal affect registration accuracy by creating point and element outliers. A combination of approximation- and geometry-based point and element outlier rejection improves the rigid registration error by 2.5 mm and meets the real-time constraints (4 min). In addition, the paper raises several questions and presents two open problems for the robust estimation and improvement of registration error in the presence of outliers due to sparse, noisy, and incomplete data. It concludes with preliminary results on leveraging Quantum Computing, a promising new technology for computationally intensive problems like Feature Detection and Block Matching, in addition to the finite element solver; all three account for 75% of computing time in deformable registration.

PubMed: 38144260
DOI: 10.3389/fdgth.2023.1283726

New approaches and technical considerations in detecting outlier measurements and trajectories in longitudinal children growth data.

BMC Medical Research Methodology Oct 2023

Authors: Paraskevi Massara, Arooj Asrar, Celine Bourdon...

BACKGROUND

Growth studies rely on longitudinal measurements, typically represented as trajectories. However, anthropometry is prone to errors that can generate outliers. While various methods are available for detecting outlier measurements, a gold standard has yet to be identified, and there is no established method for outlying trajectories. Thus, outlier types and their effects on growth pattern detection still need to be investigated. This work aimed to assess the performance of six methods at detecting different types of outliers, propose two novel methods for outlier trajectory detection and evaluate how outliers affect growth pattern detection.

METHODS

We included 393 healthy infants from The Applied Research Group for Kids (TARGet Kids!) cohort and 1651 children with severe malnutrition from the co-trimoxazole prophylaxis clinical trial. We injected outliers of three types and six intensities and applied four outlier detection methods for measurements (model-based and World Health Organization cut-offs-based) and two for trajectories. We also assessed growth pattern detection before and after outlier injection using time series clustering and latent class mixed models. Error type, intensity, and population affected method performance.

RESULTS

Model-based outlier detection methods performed best for measurements, with precision between 5.72% and 99.89%, especially for low and moderate error intensities. The clustering-based outlier trajectory method had high precision of 14.93-99.12%. Combining methods improved the detection rate to 21.82% in outlier measurements. Finally, when comparing growth groups with and without outliers, the outliers were shown to alter group membership by 57.9-79.04%.

CONCLUSIONS

World Health Organization cut-off-based techniques were shown to perform well in only a few very particular cases (extreme errors of high intensity), while model-based techniques performed well, especially for moderate errors of low intensity. Clustering-based outlier trajectory detection performed exceptionally well across all types and intensities of errors, indicating a potential strategic change in how outliers in growth data are viewed. Finally, the importance of detecting outliers was shown, given its impact on child growth studies, as demonstrated by comparing results of growth group detection.

Topics: Child; Humans; Cluster Analysis; Research Design; Infant; Child Development

PubMed: 37833647
DOI: 10.1186/s12874-023-02045-w