-
Scientific Reports Feb 2023
Outlier detection is an important topic in machine learning and has been used in a wide range of applications. Outliers are objects that are few in number and deviate from the majority of objects. As a result of these two properties, we show that outliers are susceptible to a mechanism called fluctuation. This article proposes a method called fluctuation-based outlier detection (FBOD) that achieves a low linear time complexity and detects outliers purely based on the concept of fluctuation without employing any distance, density or isolation measure. Fundamentally different from all existing methods, FBOD first converts Euclidean-structured datasets into graphs by using random links, then propagates the feature values according to the connections of the graph. Finally, by comparing the difference between the fluctuation of an object and that of its neighbors, FBOD identifies the objects with larger differences as outliers. The results of experiments comparing FBOD with eight state-of-the-art algorithms on eight real-world tabular datasets and three video datasets show that FBOD outperforms its competitors in the majority of cases and that FBOD has only 5% of the execution time of the fastest algorithm. The experiment codes are available at: https://github.com/FluctuationOD/Fluctuation-based-Outlier-Detection .
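The three steps named in the abstract (random links, feature propagation, comparison of fluctuations) can be illustrated with a loose toy sketch. This is not the authors' implementation, which is in the repository linked above. The function name `fluctuation_scores`, the parameter `k`, the averaging rule used for propagation, and the exact form of the score are all assumptions made only for illustration.

```python
import random


def fluctuation_scores(data, k=5, seed=0):
    """Toy fluctuation-style outlier scores; higher means more outlying.

    Hypothetical sketch: the aggregation rule and score are assumptions,
    not the definitions used in the FBOD paper.
    """
    rng = random.Random(seed)
    n = len(data)
    dims = len(data[0])
    # Step 1: build a graph by linking every object to k random other objects.
    neighbors = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]

    # Step 2: propagate feature values along the links and measure how far
    # the aggregated value moves away from the object's own value.
    def fluctuation(i):
        total = 0.0
        for d in range(dims):
            aggregated = (data[i][d] + sum(data[j][d] for j in neighbors[i])) / (k + 1)
            total += abs(data[i][d] - aggregated)
        return total / dims

    fluct = [fluctuation(i) for i in range(n)]
    # Step 3: score each object by the mean gap between its own fluctuation
    # and the fluctuations of its linked neighbors.
    return [sum(abs(fluct[i] - fluct[j]) for j in neighbors[i]) / k for i in range(n)]


# Thirty tightly grouped points plus one far-away point (index 30).
points = [[i * 0.01, i * 0.01] for i in range(30)] + [[10.0, 10.0]]
scores = fluctuation_scores(points)
most_outlying = max(range(len(points)), key=scores.__getitem__)
```

In this toy data the isolated point receives the largest score because its value changes much more under aggregation than the values of the objects it is linked to.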
PubMed: 36765095
DOI: 10.1038/s41598-023-29549-1 -
The VLDB Journal : Very Large Data... 2022
While many techniques for outlier detection have been proposed in the literature, the interpretation of detected outliers is often left to users. As a result, it is difficult for users to promptly take appropriate actions concerning the detected outliers. To lessen this difficulty, when outliers are identified, they should be presented together with their explanations. There are survey papers on outlier detection, but none exists for outlier explanations. To fill this gap, in this paper, we present a survey on outlier explanations in which meaningful knowledge is mined from anomalous data to explain them. We define different types of outlier explanations and discuss the challenges in generating each type. We review the existing outlier explanation techniques and discuss how they address the challenges. We also discuss the applications of outlier explanations and review the existing methods used to evaluate outlier explanations. Furthermore, we discuss possible future research directions.
PubMed: 35095253
DOI: 10.1007/s00778-021-00721-1 -
Entropy (Basel, Switzerland) May 2023
Outliers are often present in data and many algorithms exist to find these outliers. Often we can verify these outliers to determine whether they are data errors or not. Unfortunately, checking such points is time-consuming and the underlying issues leading to the data error can change over time. An outlier detection approach should therefore be able to optimally use the knowledge gained from the verification of the ground truth and adjust accordingly. With advances in machine learning, this can be achieved by applying reinforcement learning on a statistical outlier detection approach. The approach uses an ensemble of proven outlier detection methods in combination with a reinforcement learning approach to tune the coefficients of the ensemble with every additional bit of data. The performance and the applicability of the reinforcement learning outlier detection approach are illustrated using granular data reported by Dutch insurers and pension funds under the Solvency II and FTK frameworks. The application shows that outliers can be identified by the ensemble learner. Moreover, applying the reinforcement learner on top of the ensemble model can further improve the results by optimising the coefficients of the ensemble learner.
PubMed: 37372186
DOI: 10.3390/e25060842 -
Journal of Affective Disorders Feb 2022
Symptom manifestations in affective disorders can be subtle. Small imprecisions in measurement can lead to incorrect estimation of change. Previously, expert-derived scoring inconsistency flags were developed for MADRS. Here, we derive empirically based outlier-pattern flags to further detect imprecisions in ratings. The NEWMEDS data repository of almost 25,000 MADRS administrations from 11 registration trials of antidepressants was used to identify outlier response patterns reflecting potentially careless responses. Coverage of these flags was compared to previously published expert-derived flags. Both sets of flags were also further tested in Monte Carlo simulated data as a proxy to applying flags under conditions of known inconsistency. The outlier flags derived provide cutting points to identify: (1) under- and overuse of values (e.g., scoring "1" on 6 or more items), (2) disproportionate use of even or odd response choices (e.g., 8 or more odd values), (3) longest consecutive use of a value (e.g., more than 5 items in a row scored with the same value), (4) high variability within an administration (standard deviation greater than 1.8), (5) outlier responses on multiple items (i.e., multivariate outliers), and (6) outlier scoring (e.g., scoring 4, 5, or 6 on item 1). Outlier response flags were raised in 26% of the MADRS administrations and in 97% of the Monte Carlo data. Of administrations with no expert flag, 21.7% had an outlier flag, and of administrations with at least one expert flag, 27.7% also had an outlier flag. Outlier-pattern flags appear to be a useful adjunct to expert-derived flags in the quest to improve measurement in clinical trials.
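As a minimal sketch, flags (1) to (4) can be computed for a single administration using the example cutting points quoted in the abstract. The function name `outlier_pattern_flags` and the flag names are hypothetical, the use of the sample standard deviation is an assumption (the abstract does not say which estimator was used), and flags (5) and (6) are omitted because they need reference data or item-specific rules not given here. The sketch assumes the standard 10-item MADRS with each item scored 0 to 6.

```python
from statistics import stdev


def outlier_pattern_flags(scores):
    """Return flags (1)-(4) for one MADRS administration (10 items, 0-6 each).

    Thresholds are the example cutting points from the abstract; the choice
    of sample standard deviation is an assumption.
    """
    if len(scores) != 10 or any(s not in range(7) for s in scores):
        raise ValueError("expected 10 integer item scores between 0 and 6")

    # Length of the longest run of identical consecutive item scores.
    longest = run = 1
    for prev, cur in zip(scores, scores[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)

    return {
        "overuse_of_1": scores.count(1) >= 6,              # flag (1) example
        "odd_overuse": sum(s % 2 for s in scores) >= 8,    # flag (2) example
        "long_run": longest > 5,                           # flag (3) example
        "high_variability": stdev(scores) > 1.8,           # flag (4)
    }


flags_a = outlier_pattern_flags([1, 1, 1, 1, 1, 1, 2, 3, 0, 4])
flags_b = outlier_pattern_flags([0, 6, 0, 6, 0, 6, 0, 6, 0, 6])
```

The first administration trips the overuse and long-run flags, while the second, which alternates between the extremes of the scale, trips only the variability flag.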
Topics: Antidepressive Agents; Depression; Humans; Mood Disorders; Psychiatric Status Rating Scales; Reproducibility of Results
PubMed: 34952105
DOI: 10.1016/j.jad.2021.12.076 -
Genes Feb 2023
Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. Hence, an accuracy that is either too weak or too optimistic is then reported, and the estimated model performance cannot be reproduced on independent data. It is then also doubtful whether a classifier qualifies for clinical usage. We estimate classifier performances in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we use two outlier detection methods within a bootstrap procedure to estimate the outlier probability for each sample and evaluate classifiers before and after outlier removal by means of cross-validation. We found that the removal of outliers changed the classification performance notably. For the most part, removing outliers improved the classification results. Taking into account the fact that there are various, sometimes unclear reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier with and without outliers in training and test data. This provides a more diverse picture of a classifier's performance and prevents reporting models that later turn out to be not applicable for clinical diagnoses.
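The idea of estimating a per-sample outlier probability inside a bootstrap procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: it uses one-dimensional values instead of gene expression profiles, and a single stand-in detector (the modified z-score based on the median absolute deviation, with the conventional cutoff of 3.5) in place of the two detection methods used by the authors. The names `mad_flags` and `bootstrap_outlier_probability` are hypothetical.

```python
import random
from statistics import median


def mad_flags(values, threshold=3.5):
    """Stand-in detector: flag values whose modified z-score exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def bootstrap_outlier_probability(values, n_boot=200, seed=0):
    """Fraction of bootstrap resamples containing a sample in which it is flagged."""
    rng = random.Random(seed)
    n = len(values)
    drawn = [0] * n    # resamples in which sample i appears
    flagged = [0] * n  # resamples in which sample i appears and is flagged
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        flags = mad_flags([values[i] for i in idx])
        # Copies of the same sample get the same flag, so count each once.
        for i, is_outlier in dict(zip(idx, flags)).items():
            drawn[i] += 1
            flagged[i] += is_outlier
    return [flagged[i] / drawn[i] if drawn[i] else None for i in range(n)]


values = [10.0, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 10.1, 9.9, 25.0]
probs = bootstrap_outlier_probability(values)
```

Samples with a high estimated probability could then be removed before cross-validation, so that classifier performance can be reported both with and without them, as the abstract recommends.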
Topics: Transcriptome; Gene Expression Profiling; Probability; Research Design
PubMed: 36833313
DOI: 10.3390/genes14020387 -
IEEE Transactions on Cybernetics Aug 2022
Outlier detection is one of the most important research directions in data mining. However, most of the current research focuses on outlier detection for categorical or numerical attribute data. There are few studies on the outlier detection of mixed attribute data. In this article, we introduce fuzzy rough sets (FRSs) to deal with the problem of outlier detection in mixed attribute data. Since the outlier detection model of the classical rough set is only applicable to the categorical attribute data, we use FRS to generalize the outlier detection model and construct a generalized outlier detection model based on fuzzy rough granules. First, the granule outlier degree (GOD) is defined to characterize the outlier degree of fuzzy rough granules by employing the fuzzy approximation accuracy. Then, the outlier factor based on fuzzy rough granules is constructed by integrating the GOD and the corresponding weights to characterize the outlier degree of objects. Furthermore, the corresponding fuzzy rough granules-based outlier detection (FRGOD) algorithm is designed. The effectiveness of the FRGOD algorithm is evaluated through experiments on 16 real-world datasets. The experimental results show that the algorithm is more flexible for detecting outliers and is suitable for numerical, categorical, and mixed attribute data.
Topics: Algorithms; Data Mining; Fuzzy Logic
PubMed: 33750721
DOI: 10.1109/TCYB.2021.3058780 -
Journal of Medical Internet Research May 2021
BACKGROUND
Perioperative quantitative monitoring of neuromuscular function in patients receiving neuromuscular blockers has become internationally recognized as an absolute and core necessity in modern anesthesia care. Because of their kinetic nature, artifactual recordings of acceleromyography-based neuromuscular monitoring devices are not unusual. These generate a great deal of cynicism among anesthesiologists, constituting an obstacle toward their widespread adoption. Through outlier analysis techniques, monitoring devices can learn to detect and flag signal abnormalities. Outlier analysis (or anomaly detection) refers to the problem of finding patterns in data that do not conform to expected behavior.
OBJECTIVE
This study was motivated by the development of a smartphone app intended for neuromuscular monitoring based on combined accelerometric and angular hand movement data. During the paired comparison stage of this app against existing acceleromyography monitoring devices, it was noted that the results from both devices did not always concur. This study aims to engineer a set of features that enable the detection of outliers in the form of erroneous train-of-four (TOF) measurements from an acceleromyographic-based device. These features are tested for their potential in the detection of erroneous TOF measurements by developing an outlier detection algorithm.
METHODS
A data set encompassing 533 high-sensitivity TOF measurements from 35 patients was created based on a multicentric open label trial of a purpose-built accelero- and gyroscopic-based neuromuscular monitoring app. A basic set of features was extracted based on raw data while a second set of features was purpose engineered based on TOF pattern characteristics. Two cost-sensitive logistic regression (CSLR) models were deployed to evaluate the performance of these features. The final output of the developed models was a binary classification, indicating if a TOF measurement was an outlier or not.
RESULTS
A total of 7 basic features were extracted based on raw data, while another 8 features were engineered based on TOF pattern characteristics. The model training and testing were based on separate data sets: one with 319 measurements (18 outliers) and a second with 214 measurements (12 outliers). The F1 score (95% CI) was 0.86 (0.48-0.97) for the CSLR model with engineered features, significantly larger than the CSLR model with the basic features (0.29 [0.17-0.53]; P<.001).
CONCLUSIONS
The set of engineered features and their corresponding incorporation in an outlier detection algorithm have the potential to increase overall neuromuscular monitoring data consistency. Integrating outlier flagging algorithms within neuromuscular monitors could potentially reduce overall acceleromyography-based reliability issues.
TRIAL REGISTRATION
ClinicalTrials.gov NCT03605225; https://clinicaltrials.gov/ct2/show/NCT03605225.
Topics: Accelerometry; Humans; Machine Learning; Neuromuscular Blockade; Neuromuscular Monitoring; Reproducibility of Results
PubMed: 34152273
DOI: 10.2196/25913 -
Heliyon Jan 2024 (Review)
OBJECTIVES
This review aimed to harmoniously summarize and compare outlier rates for various cardiac troponin (cTn) assays, including high-sensitivity-cTn (hs-cTn) assays and contemporary cTn (generation of assays prior to hs-cTn ones) assays, from the published studies.
METHODS
The PRISMA guidelines were utilized to perform this systematic review. Five databases, including PubMed, Scopus, Embase, Cochrane Library, and Web of Science, were searched using specific keywords up to June 30th, 2023. Studies reporting specifically calculated outlier rates for cTn assays when conducting in-vitro diagnosis in human samples were included. Selected studies were then further assessed using the GRADE tool.
RESULTS
Thirteen studies were included. The data from the studies were summarized statistically in this review. The results showed substantial evidence of improved analytical robustness, that is, lower mean rates of outliers, critical outliers, and analytical outliers, for hs-cTn assays (0.14 %, 0.18 %, and 0.18 %, respectively) compared to contemporary cTn assays (0.63 %, 0.71 %, and 0.50 %, respectively).
CONCLUSION
The findings provide a promising, comprehensive reference for laboratory scientists and clinical staff in choosing the most suitable cTn assay for patient care with regard to outlier rates. In addition, this review shows that hs-cTn assays have advanced, with lower outlier rates than contemporary cTn assays. The emerging challenges for continuously improving the analytical robustness of cTn assays are also elaborated.
PubMed: 38205298
DOI: 10.1016/j.heliyon.2023.e23788 -
IEEE Transactions on Image Processing :... 2021
Outlier handling has attracted considerable attention recently but remains challenging for image deblurring. Existing approaches mainly depend on iterative outlier detection steps to explicitly or implicitly reduce the influence of outliers on image deblurring. However, these outlier detection steps usually involve heuristic operations and iterative optimization processes, which are complex and time-consuming. In contrast, we propose to learn a deep convolutional neural network to directly estimate the confidence map, which can identify reliable inliers and outliers from the blurred image and thus facilitates the subsequent deblurring process. Our analysis shows that the proposed algorithm, combined with the learned confidence map, is effective in handling outliers and does not require the ad-hoc outlier detection steps that are critical to existing outlier handling methods. Compared to existing approaches, the proposed algorithm is more efficient and can be applied to both non-blind and blind image deblurring. Extensive experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy and efficiency.
PubMed: 33417555
DOI: 10.1109/TIP.2020.3048679 -
Frontiers in Physiology 2023
As an important technique for data pre-processing, outlier detection plays a crucial role in various real applications and has gained substantial attention, especially in medical fields. Despite the importance of outlier detection, many existing methods are vulnerable to the distribution of outliers and require prior knowledge, such as the outlier proportion. To address this problem to some extent, this article proposes an adaptive mini-minimum spanning tree-based outlier detection (MMOD) method, which utilizes a novel distance measure by scaling the Euclidean distance. For datasets containing different densities and taking on different shapes, our method can identify outliers without prior knowledge of outlier percentages. The results on both real-world medical data corpora and intuitive synthetic datasets demonstrate the effectiveness of the proposed method compared to state-of-the-art methods.
PubMed: 37900945
DOI: 10.3389/fphys.2023.1233341