Scientific Reports (Mar 2024)
The OLS model is built on the assumption of normality in the distribution of error terms. However, this assumption can easily be violated, especially when there are outliers in the data. A single outlier can disrupt the normality of the error terms, making the OLS model less effective. In such situations, M-estimators (MEs) can be used to obtain reliable estimates. We introduce a redescending M-estimator (RME) for robust regression on datasets with outliers. The proposed RME produces more robust estimates by effectively limiting the influence of outliers, even at lower values of the tuning constant. We compared the performance of this estimator with existing RMEs using real-life data examples and an extensive simulation study. The results show that the proposed RME is more efficient than the compared MEs in various situations.
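For readers unfamiliar with redescending M-estimators, the sketch below fits Tukey's biweight, a classical RME of the kind the proposed estimator is compared against (it is not the new estimator from the paper), by iteratively reweighted least squares (IRLS). The tuning constant c = 4.685, the median-based residual scale, and the function names are illustrative assumptions. Because the biweight weight function falls to exactly zero, residuals beyond c times the scale stop influencing the fit.

```python
import numpy as np


def biweight_weights(u, c=4.685):
    """Tukey biweight weights: w(u) = (1 - (u/c)^2)^2 for |u| < c, else 0."""
    w = np.zeros_like(u)
    inside = np.abs(u) < c
    w[inside] = (1.0 - (u[inside] / c) ** 2) ** 2
    return w


def biweight_regression(x, y, c=4.685, max_iter=100, tol=1e-8):
    """Fit y = b0 + b1*x with Tukey's biweight via IRLS (illustrative sketch only)."""
    X = np.column_stack([np.ones(len(y)), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    for _ in range(max_iter):
        r = y - X @ beta
        # Robust residual scale: median absolute residual, rescaled for normal errors
        scale = np.median(np.abs(r)) / 0.6745
        if scale == 0:
            break
        w = biweight_weights(r / scale, c)
        sw = np.sqrt(w)
        # Weighted least-squares step: scale each row by sqrt(weight)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, size=20)
y[10] = 200.0  # a single gross outlier
print(biweight_regression(x, y))  # should lie near the true coefficients (2, 3)
```

Starting from OLS is a simplification; redescending estimators can converge to poor local solutions when the start is far off, so practical implementations usually begin from a high-breakdown fit.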
PubMed: 38532107
DOI: 10.1038/s41598-024-57906-1

Clinical Biochemistry (Feb 2007)
Review
OBJECTIVES
This paper examines the pitfalls that arise when an outlier is assessed using a criterion based on a fixed multiple of the standard deviation rather than an established statistical test. Although the former approach is statistically invalid, it is the favored method for identifying outliers in Ontario laboratory quality control protocols.
DESIGN AND METHODS
Computer simulations are used to calculate the probability of a false positive result (classifying a valid observation as an outlier) when outlier criteria based on fixed multiples of the standard deviation are applied to samples containing no outliers.
RESULTS
The estimated probability of a false positive result is tabulated over various sample sizes. Outlier criteria based on fixed multiples of the standard deviation are shown to be highly inefficient.
CONCLUSIONS
This work presents arguments for discontinuing the widespread practice of using outlier criteria based on fixed multiples of the standard deviation to identify outliers in univariate samples.
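The simulation described under DESIGN AND METHODS can be approximated with a short Monte Carlo run. This sketch assumes normally distributed samples containing no outliers and a rule that flags any value more than k sample standard deviations from the sample mean; the function name, k, and the number of trials are illustrative choices, not the paper's settings.

```python
import numpy as np


def fixed_sd_false_positive_rate(n, k=2.0, trials=20_000, seed=0):
    """Estimate P(at least one value flagged) for outlier-free normal samples of size n.

    A value is flagged if |x_i - mean| > k * SD, using the sample mean and sample SD.
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((trials, n))  # no outliers by construction
    mean = samples.mean(axis=1, keepdims=True)
    sd = samples.std(axis=1, ddof=1, keepdims=True)
    flagged = np.abs(samples - mean) > k * sd
    return flagged.any(axis=1).mean()


for n in (5, 10, 30, 100):
    print(n, fixed_sd_false_positive_rate(n))
```

With k = 2 the estimated false-positive rate climbs steeply with n. For n of 5 or fewer it is exactly zero, because no value can lie more than (n - 1)/sqrt(n) sample standard deviations from the sample mean, so a fixed 2-SD rule can never fire in such small samples.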
Topics: Clinical Chemistry Tests; Computer Simulation; False Positive Reactions; Humans; Laboratories; Ontario; Quality Control; Reproducibility of Results; Statistics as Topic
PubMed: 17222814
DOI: 10.1016/j.clinbiochem.2006.08.019

Statistical Applications in Genetics... (2009)
In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are uncommon, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. Identifying outliers is important both for exploring underlying experimental or biological problems and for removing erroneous data. We propose a fully automatic outlier detection method based on principal component analysis (PCA) and robust estimation of Mahalanobis distances. We demonstrate that our method identifies biologically significant outliers with high accuracy and that removing outliers improves the prediction accuracy of classifiers. Because our method is closely related to existing robust PCA methods, we compare it with a prominent robust PCA method.
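A simplified sketch of the general idea, projecting samples onto the leading principal components and scoring each sample with a robust distance, is shown below. It is not the authors' algorithm: it assumes classical PCA on median-centred data and a diagonal distance built from per-component medians and MADs, in place of a full robust covariance estimate such as MCD. Keeping two components lets the chi-square cutoff be computed in closed form, since the 2-degree-of-freedom quantile is -2 ln(1 - p). The function name and p = 0.975 are illustrative.

```python
import math

import numpy as np


def pca_robust_distance_outliers(X, p=0.975):
    """Flag rows of X (samples x genes) with a large robust distance in 2-D PC space."""
    Xc = X - np.median(X, axis=0)  # median-centre each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T  # scores on the first two principal components
    med = np.median(scores, axis=0)
    mad = 1.4826 * np.median(np.abs(scores - med), axis=0)  # normal-consistent MAD
    d2 = (((scores - med) / mad) ** 2).sum(axis=1)  # squared robust distance
    cutoff = -2.0 * math.log(1.0 - p)  # chi-square quantile, 2 degrees of freedom
    return d2, d2 > cutoff


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 50))
X[0] += 3.0  # sample 0 has a shifted expression profile
d2, is_outlier = pca_robust_distance_outliers(X)
print(np.flatnonzero(is_outlier))
```

With roughly 2.5% of clean samples expected beyond the cutoff, an occasional inlier may also be flagged; a full robust covariance estimate would account for correlation between components that this diagonal version ignores.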
Topics: Colonic Neoplasms; Databases, Genetic; Humans; Oligonucleotide Array Sequence Analysis; Outliers, DRG; Principal Component Analysis
PubMed: 19222380
DOI: 10.2202/1544-6115.1426

Journal of Experimental Psychology.... (Nov 2020)
Ensemble perception, the encoding of objects by their group properties, is known to be resistant to outlier noise. However, this resistance is somewhat paradoxical: how can the visual system determine which stimuli are outliers without already having derived the statistical properties of the ensemble? A simple solution would be that ensemble perception is not a one-step process; instead, outliers are detected through iterative computations that identify items with high deviance from the mean and reduce their weight in the representation over time. Here we tested this hypothesis. In Experiment 1, we found evidence that outliers are discounted from mean orientation judgments, extending previous results from ensemble face perception. In Experiment 2, we tested the timing of outlier rejection by having participants perform speeded judgments of sets with or without outliers. We observed significant increases in reaction time (RT) when outliers were present, but a decrease compared with no-outlier sets of matched range, suggesting that range alone did not drive RTs. In Experiment 3, we tested how outlier noise is reduced over time: we presented sets for variable exposure durations and found that noise decreases linearly over time. Altogether, these results suggest that ensemble representations are optimized through iterative computations aimed at reducing noise. The finding that ensemble perception is an iterative process provides a useful framework for understanding contextual effects on ensemble perception. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
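The hypothesis that outliers are discounted through repeated estimate-and-reweight steps can be illustrated with a toy model. This sketch is an assumption for illustration, not the authors' model: each iteration computes a weighted mean, then down-weights items with a Gaussian function of their deviance from that mean, scaled by a robust spread (the MAD around the current estimate). The width multiplier `tau` is hypothetical, and orientations are treated as ordinary numbers with no circular wrapping.

```python
import numpy as np


def iterative_ensemble_mean(values, n_iter=10, tau=2.0):
    """Return the ensemble-mean estimate after each reweighting step (toy model)."""
    x = np.asarray(values, dtype=float)
    w = np.ones_like(x)  # start by weighting every item equally
    estimates = []
    for _ in range(n_iter):
        m = np.sum(w * x) / np.sum(w)  # current ensemble estimate
        estimates.append(m)
        # Robust spread of the items around the current estimate
        spread = 1.4826 * np.median(np.abs(x - m))
        if spread == 0:
            break
        # Items far from the current estimate (relative to the spread) lose weight
        w = np.exp(-0.5 * ((x - m) / (tau * spread)) ** 2)
    return estimates


orientations = [10, 12, 11, 9, 10, 40]  # degrees; the last item is an outlier
est = iterative_ensemble_mean(orientations)
print(round(est[0], 2), round(est[-1], 2))
```

The first estimate is the plain mean (about 15.3); within a few iterations the estimate settles near the mean of the five inliers (10.4), because the outlier's weight falls to nearly zero.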
Topics: Adolescent; Adult; Female; Humans; Male; Pattern Recognition, Visual; Perceptual Masking; Reaction Time; Space Perception; Young Adult
PubMed: 32757592
DOI: 10.1037/xhp0000857

The Journal of Trauma and Acute Care... (Aug 2019)
BACKGROUND
Expected performance rates for various outcome metrics are a hallmark of the hospital quality indicators used by the Agency for Healthcare Research and Quality, the Centers for Medicare & Medicaid Services, and the National Quality Forum. Identifying outlier hospitals with above- and below-expected mortality after emergency general surgery (EGS) operations is therefore of great value for EGS quality improvement initiatives. The aim of this study was to determine hospital variation in mortality after EGS operations and to compare characteristics between outlier hospitals.
METHODS
Using data from the California State Inpatient Database (2010-2011), we identified patients who underwent one of eight common EGS operations. Expected mortality was obtained from a Bayesian model adjusting for both patient- and hospital-level variables. A hospital-level standardized mortality ratio (SMR) was constructed as the ratio of observed to expected deaths. Only hospitals performing three or more of each operation were included. An "outlier" hospital was defined as one whose SMR had an 80% confidence interval that did not cross 1.0. High- and low-mortality SMR outliers were compared.
RESULTS
There were 140,333 patients included from 220 hospitals. Standardized mortality ratio varied from a high of 2.6 (mortality, 160% higher than expected) to a low of 0.2 (mortality, 80% lower than expected); 12 hospitals were high SMR outliers, and 28 were low SMR outliers. Standardized mortality was over three times worse in the high SMR outliers compared with the low SMR outliers (1.7 vs. 0.5; p < 0.001). Hospital-, patient-, and operative-level characteristics were equivalent in each outlier group.
CONCLUSION
There exists significant hospital variation in standardized mortality after EGS operations. High SMR outliers have significant excess mortality, while low SMR outliers have superior EGS survival. Common hospital-level characteristics do not explain the wide gap between underperforming and overperforming outlier institutions. These findings suggest that SMR can help guide assessment of EGS performance across hospitals; further research is essential to identify and define the hospital processes of care which translate into optimal EGS outcomes.
LEVEL OF EVIDENCE
Epidemiologic Study, level III.
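The SMR and the 80% confidence-interval outlier rule described under METHODS reduce to a short calculation. The study derived expected deaths from a Bayesian model and the abstract does not state how its intervals were computed; this sketch instead assumes a common log-normal approximation, SE[ln(SMR)] of about 1/sqrt(O) for O observed deaths, and the death counts in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def smr_outlier_status(observed, expected, level=0.80):
    """Return (SMR, CI, status) using a log-normal approximation to the SMR interval."""
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected deaths must be positive")
    smr = observed / expected
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.2816 for an 80% interval
    half_width = z / math.sqrt(observed)  # z times the approximate SE of ln(SMR)
    lower, upper = smr * math.exp(-half_width), smr * math.exp(half_width)
    if lower > 1.0:
        status = "high-mortality outlier"
    elif upper < 1.0:
        status = "low-mortality outlier"
    else:
        status = "not an outlier"  # the interval crosses 1.0
    return smr, (lower, upper), status


print(smr_outlier_status(26, 10.0))  # SMR 2.6
print(smr_outlier_status(2, 10.0))   # SMR 0.2
print(smr_outlier_status(11, 10.0))  # SMR 1.1
```

Because the interval width shrinks with the number of observed deaths, a hospital with few cases needs a more extreme SMR before its interval excludes 1.0.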
Topics: California; Emergencies; Female; Hospital Mortality; Hospitals; Humans; Male; Middle Aged; Quality Improvement; Quality Indicators, Health Care; Surgical Procedures, Operative
PubMed: 30908450
DOI: 10.1097/TA.0000000000002271

Bioanalysis (Oct 2020)
Guidelines such as United States Pharmacopeia <1032> [1] and the European Pharmacopoeia (Ph. Eur.) [2] acknowledge that cell-based bioassays are complex methods and thus prone to outliers. However, investigations into the root causes of outliers are often inconclusive. We have established a procedure (including quality control and documentation), implemented in a freely available software application, that draws not only on the experience of the analyst but also on information from historical data. To our knowledge, this action limit outlier test is unique. The action limit outlier test identifies outliers efficiently and, on our simulated data, led to a significant reduction in false positives compared with the traditional ROUT [3] or Rosner [4] outlier tests alone (58% and 44% fewer false positives than ROUT and Rosner, respectively).
Topics: Biological Assay; Dose-Response Relationship, Drug; Humans
PubMed: 33025795
DOI: 10.4155/bio-2020-0189

IEEE Transactions on Pattern Analysis... (Jul 2023)
Registration is a basic yet crucial task in point cloud processing. In correspondence-based point cloud registration, matching correspondences with point feature techniques may lead to an extremely high outlier (false correspondence) ratio. Current outlier removal methods still suffer from low efficiency, accuracy, and recall. We use an intuitive method to describe the 6-DOF (degrees of freedom) curtailment process in point cloud registration and propose an outlier removal strategy based on the reliability of the correspondence graph. The method constructs the correspondence graph from the given correspondences and defines a reliability degree for graph nodes, used for optimal candidate selection, and a reliability degree for graph edges, used to obtain the global maximum consensus set. The presented method achieves fast and accurate outlier removal together with gradual estimation of the alignment parameters. Extensive experiments on simulations and challenging real-world datasets demonstrate that the proposed method can still perform effective point cloud registration even when the correspondence outlier ratio exceeds 99%, with better efficiency than the state of the art. Code is available at https://github.com/WPC-WHU/GROR.
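Correspondence-graph methods rest on the fact that a rigid transformation preserves distances: if correspondences i and j are both inliers, then |p_i - p_j| equals |q_i - q_j|. The sketch below builds such a pairwise length-consistency graph and keeps nodes with high degree. It is a simplified illustration of the general idea, not the GROR algorithm from the linked repository; the tolerance `tau`, the degree threshold `min_ratio`, and the function name are assumptions.

```python
import numpy as np


def consistent_correspondences(P, Q, tau=0.02, min_ratio=0.5):
    """P[i] <-> Q[i] are putative matches (N x 3 each). Returns a boolean inlier mask."""
    dP = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    dQ = np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=-1)
    # Edge (i, j): the pair preserves its length, as any rigid motion must
    A = np.abs(dP - dQ) < tau
    np.fill_diagonal(A, False)
    degree = A.sum(axis=1)  # in this sketch, node "reliability" is simply its degree
    return degree >= min_ratio * degree.max()


rng = np.random.default_rng(2)
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])
P = rng.uniform(0.0, 1.0, size=(40, 3))
Q = P @ R.T + t                                         # 40 true matches
Q[30:] = rng.uniform(0.0, 1.0, size=(10, 3)) @ R.T + t  # last 10 become false matches
mask = consistent_correspondences(P, Q)
print(mask.sum(), mask[30:].any())
```

Thresholding degree is the crudest possible consensus step; it works here because inliers are mutually consistent while false matches agree only by chance. At outlier ratios near 99%, chance agreements dominate raw degree, which is why methods like the one above use finer node and edge reliability measures and a global maximum consensus search.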
PubMed: 37015572
DOI: 10.1109/TPAMI.2022.3226498

Pharmaceutical Statistics (Nov 2018)
The USP<1032> guidelines recommend screening bioassay data for outliers before performing a relative potency (RP) analysis. The guidelines, however, do not advise on the size or type of outlier that should be removed before model fitting and calculation of RP. Computer simulation was used to investigate the consequences of ignoring the USP<1032> guidance to remove outliers. For biotherapeutics and vaccines, outliers in potency data may result in falsely accepting a bad lot, or falsely rejecting a good lot, of drug product. Biological activity, measured through a potency bioassay, is considered a critical quality attribute in manufacturing. If the concentration-response potency curve of a test sample is deemed similar in shape to that of the reference standard, the curves are said to exhibit constant RP, an essential criterion for interpreting an RP. One or more outliers in the concentration-response data, however, may result in a failure to declare similarity or may yield a biased RP estimate. Concentration-response curves for test and reference were simulated from four-parameter logistic curves with constant RP. Single-outlier, multiple-outlier, and whole-curve outlier scenarios were explored for their effects on similarity testing and on RP estimation. Although the simulations point to situations in which outlier removal is unnecessary, the results generally support the USP<1032> recommendation and illustrate the impact on the RP calculation when outlier removal procedures are not applied.
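Constant relative potency has a compact meaning under the four-parameter logistic (4PL) model: the test curve is the reference curve shifted horizontally on the log-concentration axis, with all other parameters shared. The sketch uses one common 4PL parameterization (the abstract does not specify which form was used), and the parameter values and the true RP of 2 are hypothetical.

```python
import numpy as np


def four_pl(x, a, b, c, d):
    """4PL curve: a = response at zero concentration, d = response at infinite
    concentration, c = EC50, b = slope (Hill) factor."""
    return d + (a - d) / (1.0 + (x / c) ** b)


reference = {"a": 0.1, "b": 1.5, "c": 10.0, "d": 2.0}
true_rp = 2.0  # hypothetical: the test sample is twice as potent as the reference
test = {**reference, "c": reference["c"] / true_rp}  # only the EC50 changes

x = np.logspace(-1, 3, 9)  # concentrations from 0.1 to 1000
# Constant RP: the test response at x equals the reference response at true_rp * x
print(np.allclose(four_pl(x, **test), four_pl(true_rp * x, **reference)))  # True
print(reference["c"] / test["c"])  # 2.0, the RP as a ratio of EC50s
```

A single outlying response breaks this shared-shape relationship when the curves are fitted, which is how it can cause a failure to declare similarity or bias the RP estimate.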
Topics: Biological Assay; Computer Simulation; Data Interpretation, Statistical; Dose-Response Relationship, Drug; Guidelines as Topic; Humans
PubMed: 30112804
DOI: 10.1002/pst.1893

IEEE Transactions on Neural Networks... (Jan 2022)
Anomaly detection suffers from unbalanced data because anomalies are quite rare. Synthetically generated anomalies offer a solution for such ill-defined or incompletely defined data. However, synthesis requires an expressive representation to guarantee the quality of the generated data. In this article, we propose a two-level hierarchical latent space representation that distills inliers' feature descriptors [through autoencoders (AEs)] into more robust representations based on a variational family of distributions (through a variational AE) for zero-shot anomaly generation. From the learned latent distributions, we select those that lie on the outskirts of the training data as synthetic-outlier generators. We then sample from them, i.e., generate negative samples without having seen any before, to train binary classifiers. We found that the proposed hierarchical structure for feature distillation and fusion creates robust and general representations that allow us to synthesize pseudo-outlier samples and, in turn, to train robust binary classifiers for true outlier detection (without the need for actual outliers during training). We demonstrate the performance of our proposal on several benchmarks for anomaly detection.
PubMed: 33064654
DOI: 10.1109/TNNLS.2020.3027667

BMC Bioinformatics (Aug 2015)
BACKGROUND
Multiple sequence alignments (MSAs) are widely used in sequence analysis for a variety of tasks. Outlier sequences can make downstream analyses unreliable or make the alignments less accurate while they are being constructed. This paper describes a simple method for automatically detecting outliers, together with accompanying software called OD-seq. It is based on finding sequences whose average distance to the rest of the sequences in a dataset is anomalous.
RESULTS
The software can take an MSA, a distance matrix, or a set of unaligned sequences as input. Outlier sequences are found by examining the average distance of each sequence to the rest. Anomalous average distances are then identified using the interquartile range of the distribution of average distances or by bootstrapping. The complexity of any analysis of a distance matrix is normally at least O(N^2) for N sequences. This is prohibitive for large N, but it is reduced here by using the mBed algorithm from Clustal Omega, which brings the complexity down to O(N log N) and makes even very large alignments easy to analyse on a single core. We tested the ability of OD-seq to detect outliers using artificial test cases of sequences from Pfam families, seeded with sequences from other Pfam families. Using an MSA as input, OD-seq detects outliers with very high sensitivity and specificity.
CONCLUSION
OD-seq is a practical and simple method for detecting outliers in MSAs. It can also detect outliers in sets of unaligned sequences, but with reduced accuracy. For medium-sized alignments of a few thousand sequences, it can detect outliers in a few seconds. The software is available at http://www.bioinf.ucd.ie/download/od-seq.tar.gz.
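The core of the approach, an average distance per sequence followed by an interquartile-range cutoff, can be sketched in a few lines. This sketch assumes aligned sequences of equal length, uses the uncorrected p-distance (fraction of mismatched columns, with gaps treated as ordinary characters), and uses a hypothetical multiplier k = 3. It does not reproduce OD-seq's distance options, its bootstrap mode, or the mBed speed-up, so it remains O(N^2) in the number of sequences.

```python
import numpy as np


def average_distance_outliers(aligned, k=3.0):
    """Flag aligned sequences whose mean p-distance to the others is anomalously high."""
    arr = np.array([list(s) for s in aligned])  # N x L array of single characters
    n, length = arr.shape
    # Integer mismatch counts between every pair of sequences
    mismatches = (arr[:, None, :] != arr[None, :, :]).sum(axis=2)
    # Mean p-distance to the other n - 1 sequences (self-distance is zero)
    avg = mismatches.sum(axis=1) / ((n - 1) * length)
    q1, q3 = np.percentile(avg, [25, 75])
    return avg > q3 + k * (q3 - q1), avg


alignment = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIRL",
             "ACDEYGHIKL", "ACNEFGHIKL", "WWWWWWWWWW"]
flags, avg = average_distance_outliers(alignment)
print(np.flatnonzero(flags))  # indices of flagged sequences
```

Summing integer mismatch counts before dividing keeps the averages exact, so ties between equally distant sequences are not broken by floating-point rounding.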
Topics: ATP-Binding Cassette Transporters; Algorithms; Amino Acid Sequence; Humans; Molecular Sequence Data; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid; Software
PubMed: 26303676
DOI: 10.1186/s12859-015-0702-1