Lancet (London, England) Dec 2001
Topics: Hospital Mortality; Humans; Models, Statistical; Outliers, DRG; Records; United Kingdom
PubMed: 11755647
DOI: 10.1016/s0140-6736(01)07118-5 -
Sensors (Basel, Switzerland) Sep 2022
The Internet of Things (IoT) refers to a system of interconnected, internet-connected devices and sensors that allows the collection and dissemination of data. The data provided by these sensors may include outliers or exhibit anomalous behavior as a result of, for example, attack activities or device failure. However, the majority of existing outlier detection algorithms rely on labeled data, which is frequently hard to obtain in the IoT domain. More crucially, the IoT's data volume is continually increasing, creating a need to predict and identify the classes of future data. In this study, we propose an unsupervised technique based on a deep Variational Auto-Encoder (VAE) to detect outliers in IoT data by leveraging the VAE's reconstruction ability and the low-dimensional latent representation it learns of the input data. First, the input data are standardized. Then, we employ the VAE to produce a reconstructed output from the low-dimensional representation of the input data's latent variables. Finally, the reconstruction error between the original observation and the reconstructed one is used as the outlier score. Our model was trained only on normal data, with no labels, in an unsupervised manner, and evaluated on the Statlog (Landsat Satellite) dataset. The unsupervised model achieved promising results, comparable with state-of-the-art outlier detection schemes, with a precision of ≈90% and an F1 score of 79%.
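The pipeline described here (standardize, encode to a low-dimensional latent space, decode, score by reconstruction error) can be sketched as follows. This is a minimal illustration that substitutes a linear autoencoder (truncated SVD) for the paper's deep VAE and synthetic data for the Statlog set; all function and variable names are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    # Stand-in for the paper's VAE: project standardized data onto a
    # k-dimensional latent basis (truncated SVD) and decode back.
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                      # step 1: standardize
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:k].T                           # latent basis (p x k)
    return mu, sd, W

def reconstruction_score(X, mu, sd, W):
    Z = (X - mu) / sd
    Z_hat = Z @ W @ W.T                    # encode, then decode
    return np.square(Z - Z_hat).sum(axis=1)  # reconstruction-error score

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 5))
train[:, 3] = train[:, 0] + 0.1 * rng.normal(size=500)  # correlated channels
mu, sd, W = fit_linear_autoencoder(train, k=3)

train_scores = reconstruction_score(train, mu, sd, W)
anomaly = np.array([[3.0, 0.0, 0.0, -3.0, 0.0]])  # breaks the learned correlation
anomaly_score = reconstruction_score(anomaly, mu, sd, W)[0]
```

A point that violates the correlations learned from normal data reconstructs poorly, so its score stands out even though each coordinate is individually unremarkable.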
PubMed: 36081083
DOI: 10.3390/s22176617 -
Journal of Applied Statistics 2020
Outlier detection can be seen as a pre-processing step for locating data points in a sample that do not conform to the majority of observations. Various techniques and methods for outlier detection, dealing with different types of data, can be found in the literature. However, many data sets are inflated by true zeros and, in addition, some components/variables may be compositional in nature. Important examples of such data sets are the Structural Earnings Survey, the Structural Business Statistics, the European Statistics on Income and Living Conditions, tax data, or - as in this contribution - household expenditure data, which are used, for example, to estimate the Purchasing Power Parity of a country. In this work, robust univariate and multivariate outlier detection methods are compared in a complex simulation study that considers various challenges present in such data sets, namely structural (true) zeros, missing values, and compositional variables. These circumstances make it difficult or impossible to flag true outliers and influential observations with well-known outlier detection methods. Our aim is to assess the effectiveness of outlier detection methods in identifying outliers when applied to challenging data sets such as the household expenditure data surveyed all over the world. The methods are evaluated through a close-to-reality simulation study. Differences in the performance of univariate and multivariate robust techniques for outlier detection, and their shortcomings, are reported. We found that robust multivariate methods outperform robust univariate methods. The best-performing methods, both in finding the outliers and in providing a low false discovery rate, were the generalized S estimators (GSE), the BACON-EEM algorithm and a compositional method (CoDa-Cov). These methods also performed best when the outliers are imputed based on the corresponding outlier detection method and indicators are estimated from the data sets.
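Robust multivariate flagging of the kind compared in this study can be sketched with the Minimum Covariance Determinant estimator, which is readily available in scikit-learn (the paper's GSE, BACON-EEM and CoDa-Cov methods need specialised packages); the data and cutoff below are illustrative only.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
cov = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
X = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=300)
X[:5] = [4.0, -4.0, 0.0]          # planted outliers that break the correlation

mcd = MinCovDet(random_state=0).fit(X)   # robust location/scatter estimate
d2 = mcd.mahalanobis(X)                  # robust squared Mahalanobis distances
cutoff = chi2.ppf(0.975, df=X.shape[1])  # classical chi-square flagging cutoff
flags = d2 > cutoff
```

Because the scatter matrix is estimated robustly, the planted outliers cannot mask themselves by inflating the covariance, which is exactly the failure mode of non-robust Mahalanobis flagging.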
PubMed: 35707025
DOI: 10.1080/02664763.2019.1671961 -
BMC Health Services Research Jan 2023
BACKGROUND
Institutions or clinicians (units) are often compared according to a performance indicator such as in-hospital mortality. Several approaches have been proposed for the detection of outlying units, whose performance deviates from the overall performance.
METHODS
We provide an overview of three approaches commonly used to monitor institutional performances for outlier detection. These are the common-mean model, the 'Normal-Poisson' random effects model and the 'Logistic' random effects model. For the latter we also propose a visualisation technique. The common-mean model assumes that the underlying true performance of all units is equal and that any observed variation between units is due to chance. Even after applying case-mix adjustment, this assumption is often violated due to overdispersion and a post-hoc correction may need to be applied. The random effects models relax this assumption and explicitly allow the true performance to differ between units, thus offering a more flexible approach. We discuss the strengths and weaknesses of each approach and illustrate their application using audit data from England and Wales on Adult Cardiac Surgery (ACS) and Percutaneous Coronary Intervention (PCI).
RESULTS
In general, the overdispersion-corrected common-mean model and the random effects approaches produced similar p-values for the detection of outliers. For the ACS dataset (41 hospitals) three outliers were identified in total but only one was identified by all methods above. For the PCI dataset (88 hospitals), seven outliers were identified in total but only two were identified by all methods. The common-mean model uncorrected for overdispersion produced several more outliers. The reason for observing similar p-values for all three approaches could be attributed to the fact that the between-hospital variance was relatively small in both datasets, resulting only in a mild violation of the common-mean assumption; in this situation, the overdispersion correction worked well.
CONCLUSION
If the common-mean assumption is likely to hold, all three methods are appropriate to use for outlier detection and their results should be similar. Random effect methods may be the preferred approach when the common-mean assumption is likely to be violated.
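The common-mean model with a post-hoc overdispersion correction can be sketched as follows: a simplified binomial version with synthetic mortality counts, using a multiplicative variance-inflation factor as one common form of the correction (the paper's exact correction may differ, e.g. by winsorising the z-scores first).

```python
import numpy as np
from scipy.stats import norm

def common_mean_pvalues(deaths, cases, correct_overdispersion=True):
    # Common-mean model: all units share the pooled mortality rate p0;
    # each unit's deviation from p0 is tested with a two-sided z-test.
    p0 = deaths.sum() / cases.sum()
    se = np.sqrt(p0 * (1 - p0) / cases)
    z = (deaths / cases - p0) / se
    if correct_overdispersion:
        # Post-hoc multiplicative correction: deflate z by sqrt(phi).
        # (Real implementations typically winsorise z before estimating
        # phi so outliers do not inflate it; omitted here for brevity.)
        phi = max(1.0, np.mean(z ** 2))
        z = z / np.sqrt(phi)
    return 2.0 * norm.sf(np.abs(z))

deaths = np.array([30, 25, 28, 60, 27])   # unit 4 shows excess mortality
cases = np.array([1000, 1000, 1000, 1000, 1000])
p = common_mean_pvalues(deaths, cases)
```

With the correction switched off, the aberrant unit's p-value shrinks dramatically, mirroring the paper's finding that the uncorrected common-mean model flags more outliers.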
Topics: Humans; Percutaneous Coronary Intervention; Hospitals; Risk Adjustment; Logistic Models; England
PubMed: 36627627
DOI: 10.1186/s12913-022-08995-z -
ISA Transactions Oct 2022
This paper proposes an adaptive dual control scheme with outlier detection that is robust to the occurrence of outliers in uncertain systems. Outliers occasionally appear in system process noise and observation noise, and they can cause poor parameter estimation and degraded control performance in uncertain systems. For this reason, we devise an online outlier detection mechanism that filters outliers so as to improve the parameter estimation of uncertain systems. The mechanism decides whether a newly arriving data point is an outlier by checking it against a predicted region in which the point is expected to lie, and the predicted regions are updated in real time from the historical data. The detection mechanism is integrated into the design of the adaptive dual control, which is derived using the bicriterial method. Compared with classical dual control, which considers uncertainty only in the input and output data streams, we are the first to include uncontrollable excitations in the structure of dual control to fit practical scenarios, and this inclusion also provides broader coverage of the outliers to be detected. The improved performance of the proposed approach is verified on a mathematical model through single-run and Monte Carlo simulations under different conditions, and we also evaluate the method on the control of a fermentation sterilization process for more convincing results.
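The online detection mechanism (a predicted region, updated in real time from historical data, outside which a new point is rejected) can be sketched generically. The simple mean ± k·sigma region below is an assumption for illustration, not the paper's exact construction, and all names are hypothetical.

```python
import numpy as np

def filter_outliers(stream, k=3.0, warmup=10):
    # Predicted region = mean +/- k*std of the accepted history, updated
    # in real time; points outside it are rejected and excluded from
    # subsequent updates (so one outlier cannot corrupt the region).
    history = []
    for x in stream:
        if len(history) >= warmup:
            m, s = np.mean(history), np.std(history)
            if abs(x - m) > k * s:
                continue              # flagged as outlier: dropped
        history.append(x)
    return np.array(history)

rng = np.random.default_rng(2)
stream = rng.normal(0.0, 1.0, size=200)
stream[50] = 15.0                     # injected observation outlier
clean = filter_outliers(stream)
```

Keeping rejected points out of the history is the key design choice: it prevents the outlier itself from widening the predicted region used for later decisions.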
Topics: Computer Simulation; Models, Theoretical; Monte Carlo Method; Uncertainty
PubMed: 35131093
DOI: 10.1016/j.isatra.2022.01.021 -
Spectrochimica Acta. Part A, Molecular... Nov 2022
The high dimensionality and non-linearity of near-infrared (NIR) spectral data make outlier detection difficult. This paper proposes a probability-based outlier detection method that uses the distribution probability of the spectral data to identify outliers at each wavelength by means of a copula function. A negative logarithmic function is used to quantify the overall variation of the joint distribution for the outliers. The method not only enlarges the difference between typical samples and outliers in the spectra, but can also be adapted to multiple types of outliers. Moreover, the jump degree from statistics is introduced for automated determination of the outlier threshold, which avoids the problems of setting the threshold empirically and the resulting misjudgment of outliers. To investigate the effectiveness of the algorithm, it was applied to the recognition of different cases and types of outliers and compared with the commonly used PCA-Mahalanobis distance, spectral residual (SR) and leverage methods. The experimental results showed that the probability-based outlier detection method effectively improved the performance of outlier identification and calibration for NIR analysis.
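The scoring idea (transform each wavelength to its distribution scale, couple the wavelengths with a copula, and use the negative log joint density as the outlier score) can be sketched with a Gaussian copula on synthetic "spectra". This is only a stand-in: the paper's copula family and its jump-degree thresholding are not reproduced, and all names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_scores(X):
    # Map each wavelength to normal scores via the empirical CDF, fit
    # the copula correlation R, and score each sample by the negative
    # log copula density (up to an additive constant):
    #   -log c(u) = 0.5 * log|R| + 0.5 * z^T (R^-1 - I) z
    n, p = X.shape
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    Z = norm.ppf(ranks / (n + 1))          # empirical-CDF normal scores
    R = np.corrcoef(Z, rowvar=False)
    M = np.linalg.inv(R) - np.eye(p)
    return 0.5 * np.einsum('ij,jk,ik->i', Z, M, Z)

rng = np.random.default_rng(3)
base = rng.normal(size=(200, 1))
base[0] = 0.0                              # a mid-range sample we will corrupt
X = np.hstack([base + 0.05 * rng.normal(size=(200, 1)) for _ in range(4)])
X[0, 0] += 3.0                             # spike breaking cross-channel dependence
scores = gaussian_copula_scores(X)
```

Because the score is driven by the joint distribution, a sample that is extreme in one channel but not in the others (violating the learned dependence) scores far higher than any marginally extreme but concordant sample.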
Topics: Algorithms; Calibration; Probability; Spectroscopy, Near-Infrared
PubMed: 35717926
DOI: 10.1016/j.saa.2022.121473 -
Research Synthesis Methods Apr 2010
The presence of outliers and influential cases may affect the validity and robustness of the conclusions from a meta-analysis. While researchers generally agree that it is necessary to examine outlier and influential case diagnostics when conducting a meta-analysis, limited studies have addressed how to obtain such diagnostic measures in the context of a meta-analysis. The present paper extends standard diagnostic procedures developed for linear regression analyses to the meta-analytic fixed- and random/mixed-effects models. Three examples are used to illustrate the usefulness of these procedures in various research settings. Issues related to these diagnostic procedures in meta-analysis are also discussed.
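One of the regression diagnostics this paper extends, the externally studentized (deleted) residual, has a direct fixed-effect meta-analysis analogue that can be sketched as follows; the effect sizes and variances below are hypothetical.

```python
import numpy as np

def studentized_deleted_residuals(y, v):
    # Fixed-effect analogue of regression's externally studentized
    # residual: compare each study's estimate to the pooled mean
    # computed without it (leave-one-out), scaled by the combined
    # variance of the study and the deleted pooled mean.
    w = 1.0 / v
    z = np.empty_like(y)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        mu_i = np.sum(w[mask] * y[mask]) / np.sum(w[mask])
        z[i] = (y[i] - mu_i) / np.sqrt(v[i] + 1.0 / np.sum(w[mask]))
    return z

y = np.array([0.10, 0.12, 0.08, 0.11, 0.60])       # effect sizes; study 5 aberrant
v = np.array([0.010, 0.020, 0.015, 0.010, 0.020])  # within-study variances
z = studentized_deleted_residuals(y, v)
```

Deleting the study under scrutiny before pooling is what keeps the diagnostic "external": an influential study cannot drag the benchmark toward itself.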
PubMed: 26061377
DOI: 10.1002/jrsm.11 -
IEEE Transactions on Pattern Analysis... Sep 2023
With the development of 3D matching technology, correspondence-based point cloud registration has gained increasing attention. Unfortunately, 3D keypoint techniques inevitably produce a large number of outliers; the outlier rate is often larger than 95%. Guaranteed outlier removal (GORE), due to Bustos and Chin, has shown very good robustness to extreme outliers. However, its high computational cost (exponential in the worst case) largely limits its use in practice. In this paper, we propose the first quadratic-time GORE method, called QGORE, which preserves the globally optimal solution while greatly improving efficiency. QGORE leverages a simple but effective voting idea based on geometric consistency for upper-bound estimation, achieving almost the same tightness as the bound in GORE. We also present a one-point RANSAC that exploits "rotation correspondence" for lower-bound estimation, which greatly reduces the number of iterations compared with traditional 3-point RANSAC. Further, we propose an l-like adaptive estimator for optimization. Extensive experiments show that QGORE achieves the same robustness and optimality as GORE while being one to two orders of magnitude faster. The source code will be made publicly available.
PubMed: 37030708
DOI: 10.1109/TPAMI.2023.3262780 -
Statistics in Medicine May 1990
This paper concerns techniques for the detection of a potential outlier or extreme observation in a bioavailability/bioequivalence study. A bioavailability analysis that includes possible outlying values may affect the decision on bioequivalence. We consider a general crossover model that takes into account period and formulation effects. We derive two test procedures, the likelihood distance and the estimates distance, to detect potential outliers, and show that both relate to a chi-square distribution with three degrees of freedom. The main purpose of this paper is to exhibit and discuss these two general approaches to outlier detection in the context of a bioavailability/bioequivalence study. To illustrate the approaches, we use data from a three-way crossover experiment in the pharmaceutical industry comparing the bioavailability of two test formulations and a standard (reference) formulation of a drug. This example demonstrates the influence of an outlying value on the study of bioequivalence.
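The likelihood-distance construction can be sketched in a toy setting: a simple i.i.d. normal model stands in for the paper's crossover model (where the statistic is referred to a chi-square with three degrees of freedom), and the data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def likelihood_distances(y):
    # LD_i = 2 * [logL(theta_hat) - logL(theta_hat_(i))]: the drop in the
    # maximised full-data log-likelihood when the MLE is replaced by the
    # estimate computed with observation i deleted.
    n = len(y)
    def loglik(mu, s2):
        return -0.5 * n * np.log(2 * np.pi * s2) - np.sum((y - mu) ** 2) / (2 * s2)
    full = loglik(y.mean(), y.var())          # log-likelihood at the MLE
    ld = np.empty(n)
    for i in range(n):
        yi = np.delete(y, i)
        ld[i] = 2.0 * (full - loglik(yi.mean(), yi.var()))
    return ld

y = np.array([1.1, 0.9, 1.0, 1.2, 0.8, 3.5])  # last observation is suspect
ld = likelihood_distances(y)
cutoff = chi2.ppf(0.95, df=2)  # two free parameters here (paper's model: df=3)
```

Deleting an ordinary observation barely moves the parameter estimates, so its likelihood distance is small; deleting the aberrant value shifts both the mean and variance estimates, producing a distance far beyond the chi-square cutoff.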
Topics: Analysis of Variance; Biological Availability; Chemistry, Pharmaceutical; Chi-Square Distribution; Drug Evaluation; Humans; Likelihood Functions; Linear Models; Models, Statistical; Pharmacokinetics; Reference Standards
PubMed: 2135947
DOI: 10.1002/sim.4780090508 -
Annual International Conference of the... Aug 2016
The covariance-based outlier detection algorithm presented here selects, from a candidate set, the feature vectors that are best at identifying outliers. Features extracted from biomedical and health informatics data can be highly informative for disease assessment, and there are no restrictions on the nature or number of features that can be tested. An important challenge for an algorithm operating on a set of candidate features, however, is to winnow the effective features from the ineffective ones. The algorithm described in this paper leverages covariance information from the time series data to identify the features with the highest sensitivity for outlier identification. Empirical results demonstrate the efficacy of the method.
Topics: Algorithms; Data Interpretation, Statistical; Humans; Medical Informatics; Models, Theoretical; ROC Curve; Reproducibility of Results; Statistics as Topic
PubMed: 28268856
DOI: 10.1109/EMBC.2016.7591264