Genomics, Jan 2021
The ΔΔCt method estimates fold change in gene expression from RT-PCR assays. The ΔΔCt estimate aggregates replicates using the mean and standard deviation (sd) and is not robust to outliers, which in practice are often removed before the non-outlying replicates are aggregated. The alternative of aggregating replicates with robust statistics such as the median and median absolute deviation (MAD) is not used in practice, perhaps because the distribution of a robust ΔΔCt estimate based on the median and MAD is not straightforward to deduce. We introduce a robust ΔΔCt estimate and deduce an approximate distribution for it. Simulations show that when the data contain outliers, the robust ΔΔCt estimate, compared to the non-robust one, leads to significantly shorter confidence intervals with coverage close to the nominal level. Analysis of RT-PCR data from a Novartis clinical trial demonstrates the benefit of the robust ΔΔCt estimate.
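The mean/sd versus median/MAD aggregation contrast the abstract describes can be sketched in a few lines; the replicate values and the `fold_change` helper below are illustrative, not the paper's implementation:

```python
from statistics import mean, median

def fold_change(dct_treated, dct_control, robust=False):
    """Fold change 2**(-ddCt) from per-replicate dCt values.

    dCt = Ct(target) - Ct(reference) per replicate; robust=True aggregates
    replicates with the median instead of the mean.
    """
    agg = median if robust else mean
    ddct = agg(dct_treated) - agg(dct_control)
    return 2 ** (-ddct)

control = [5.1, 5.0, 4.9]
treated = [3.0, 3.1, 2.9, 9.0]   # 9.0 is an outlying replicate
print(round(fold_change(treated, control), 2))               # → 1.41 (mean, dragged by the outlier)
print(round(fold_change(treated, control, robust=True), 2))  # → 3.86 (median, unaffected)
```

A single outlying replicate roughly halves the mean-based fold-change estimate here, while the median-based estimate stays near the value computed from the clean replicates.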
Topics: Algorithms; Biomarkers, Tumor; Clinical Trials as Topic; Gene Expression Profiling; Humans; Real-Time Polymerase Chain Reaction; Reference Standards
PubMed: 33309766
DOI: 10.1016/j.ygeno.2020.12.009
Briefings in Bioinformatics, Mar 2019
Review
For the risk, progression, and response to treatment of many complex diseases, it has been increasingly recognized that genetic interactions (including gene-gene and gene-environment interactions) play important roles beyond the main genetic and environmental effects. In practical genetic interaction analyses, model mis-specification and outliers/contamination in response variables and covariates are not uncommon and demand robust analysis methods. Compared with their nonrobust counterparts, robust genetic interaction analysis methods are significantly less popular but are rapidly gaining attention. In this article, we provide a comprehensive review of the methodologies and applications of robust genetic interaction analysis methods, covering both marginal and joint analysis, and addressing model mis-specification as well as outliers/contamination in response variables and covariates.
Topics: Epistasis, Genetic; Gene-Environment Interaction; Humans; Models, Genetic
PubMed: 29897421
DOI: 10.1093/bib/bby033
The VLDB Journal: Very Large Data..., 2022
While many techniques for outlier detection have been proposed in the literature, the interpretation of detected outliers is often left to users. As a result, it is difficult for users to promptly take appropriate actions concerning the detected outliers. To lessen this difficulty, when outliers are identified, they should be presented together with their explanations. There are survey papers on outlier detection, but none exists for outlier explanations. To fill this gap, in this paper, we present a survey on outlier explanations in which meaningful knowledge is mined from anomalous data to explain them. We define different types of outlier explanations and discuss the challenges in generating each type. We review the existing outlier explanation techniques and discuss how they address the challenges. We also discuss the applications of outlier explanations and review the existing methods used to evaluate outlier explanations. Furthermore, we discuss possible future research directions.
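One minimal, hypothetical instance of such an explanation is per-feature attribution: rank the features by how far the flagged point deviates from the bulk on each one. The `explain_outlier` helper and the data below are illustrative only, not any surveyed technique:

```python
from statistics import mean, pstdev

def explain_outlier(data, point, top_k=2):
    """Rank features by |z-score| of `point` relative to `data`.

    Returns the indices of the top_k features on which the point deviates
    most from the bulk, as a crude outlier explanation.
    """
    scores = []
    for j in range(len(point)):
        col = [row[j] for row in data]
        mu, sigma = mean(col), pstdev(col)
        z = (point[j] - mu) / sigma if sigma > 0 else 0.0
        scores.append((j, abs(z)))
    scores.sort(key=lambda t: -t[1])
    return [j for j, _ in scores[:top_k]]

data = [[1.0, 10.0], [1.1, 10.2], [0.9, 9.8], [1.0, 10.1]]
outlier = [1.05, 25.0]   # anomalous only in feature 1
print(explain_outlier(data, outlier, top_k=1))  # → [1]
```

The output tells a user not just that the point is anomalous, but which feature drives the anomaly.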
PubMed: 35095253
DOI: 10.1007/s00778-021-00721-1
Scientific Reports, Feb 2023
Outlier detection is an important topic in machine learning and has been used in a wide range of applications. Outliers are objects that are few in number and deviate from the majority of objects. As a result of these two properties, we show that outliers are susceptible to a mechanism called fluctuation. This article proposes a method called fluctuation-based outlier detection (FBOD) that achieves low linear time complexity and detects outliers purely based on the concept of fluctuation, without employing any distance, density, or isolation measure, making it fundamentally different from all existing methods. FBOD first converts Euclidean datasets into graphs using random links, then propagates feature values along the connections of the graph. Finally, by comparing the difference between the fluctuation of an object and that of its neighbors, FBOD labels objects with larger differences as outliers. Experiments comparing FBOD with eight state-of-the-art algorithms on eight real-world tabular datasets and three video datasets show that FBOD outperforms its competitors in the majority of cases and requires only 5% of the execution time of the fastest competing algorithm. The experiment code is available at: https://github.com/FluctuationOD/Fluctuation-based-Outlier-Detection .
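The three steps the abstract names (random links, value propagation, fluctuation comparison) can be loosely sketched on 1-D data; this simplification is ours and not the authors' FBOD implementation:

```python
import random

def fluctuation_scores(values, k=3, seed=0):
    """Toy sketch of fluctuation-based scoring for 1-D data.

    Each object gets k random links; the propagated value is the mean over
    its linked neighbors, and the fluctuation is the gap between an object's
    own value and the propagated one. The score compares an object's
    fluctuation with the average fluctuation of its neighbors.
    """
    rng = random.Random(seed)
    n = len(values)
    links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    fluct = [abs(values[i] - sum(values[j] for j in links[i]) / k)
             for i in range(n)]
    return [abs(fluct[i] - sum(fluct[j] for j in links[i]) / k)
            for i in range(n)]

data = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 50.0]  # 50.0 is the planted outlier
scores = fluctuation_scores(data)
print(scores.index(max(scores)))  # → 7
```

Note that no pairwise distance, density, or isolation structure is computed; the score comes entirely from propagated values over random links.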
PubMed: 36765095
DOI: 10.1038/s41598-023-29549-1
Biometrika, Sep 2017
In high-dimensional multivariate regression problems, enforcing low rank in the coefficient matrix offers effective dimension reduction, which greatly facilitates parameter estimation and model interpretation. However, commonly used reduced-rank methods are sensitive to data corruption, as the low-rank dependence structure between response variables and predictors is easily distorted by outliers. We propose a robust reduced-rank regression approach for joint modelling and outlier detection. The problem is formulated as a regularized multivariate regression with a sparse mean-shift parameterization, which generalizes and unifies some popular robust multivariate methods. An efficient thresholding-based iterative procedure is developed for optimization. We show that the algorithm is guaranteed to converge and that the coordinatewise minimum point produced is statistically accurate under regularity conditions. Our theoretical investigations focus on non-asymptotic robust analysis, demonstrating that joint rank reduction and outlier detection leads to improved prediction accuracy. In particular, we show that redescending ψ-functions can essentially attain the minimax optimal error rate, and in some less challenging problems convex regularization guarantees the same low error rate. The performance of the proposed method is examined through simulation studies and real-data examples.
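The sparse mean-shift idea can be illustrated in the simplest univariate setting: model y_i = a + b·x_i + s_i + e_i, then alternate a least-squares fit on the shift-corrected responses with hard-thresholding of the residuals, so that nonzero shifts s_i flag outliers. This toy sketch uses our own notation and is not the paper's reduced-rank algorithm:

```python
def mean_shift_fit(x, y, lam=1.6, iters=20):
    """Univariate sketch of the sparse mean-shift idea: y_i = a + b*x_i + s_i + e_i.

    Alternates a least-squares fit on the shift-corrected responses with
    hard-thresholding of the residuals; observations with nonzero shift s_i
    are flagged as outliers.
    """
    n = len(x)
    s = [0.0] * n
    a = b = 0.0
    for _ in range(iters):
        yy = [y[i] - s[i] for i in range(n)]            # shift-corrected responses
        mx, my = sum(x) / n, sum(yy) / n
        b = sum((x[i] - mx) * (yy[i] - my) for i in range(n)) / \
            sum((xi - mx) ** 2 for xi in x)
        a = my - b * mx
        r = [y[i] - a - b * x[i] for i in range(n)]      # raw residuals
        s = [ri if abs(ri) > lam else 0.0 for ri in r]   # hard threshold
    return a, b, [i for i in range(n) if s[i] != 0.0]

x = [0, 1, 2, 3, 4, 5]
y = [0.1, 1.0, 2.1, 2.9, 4.0, 9.0]   # last response is corrupted
a, b, outliers = mean_shift_fit(x, y)
print(outliers)  # → [5]
```

The fitted slope settles near the slope of the clean points; note that such alternating schemes can stall at poor fixed points when the corruption is gross, which is one motivation for the paper's careful theoretical treatment.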
PubMed: 29430036
DOI: 10.1093/biomet/asx032
Entropy (Basel, Switzerland), Jun 2022
In this article, we evaluate the efficiency and performance of two clustering algorithms: AHC (Agglomerative Hierarchical Clustering) and K-Means. We are aware that various linkage options and distance measures influence the clustering results. We assess clustering quality using the Davies-Bouldin and Dunn cluster validity indexes. The main contribution of this research is to verify whether the quality of clusters computed on data without outliers is higher than on data with outliers. To do this, we compare and analyze outlier detection algorithms depending on the applied clustering algorithm. We use and compare the LOF (Local Outlier Factor) and COF (Connectivity-based Outlier Factor) algorithms for detecting outliers before and after removing 1%, 5%, and 10% of outliers, and then analyze how the quality of clustering improves. The experiments used three real data sets with different numbers of instances.
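The effect the study measures can be reproduced on toy data with the Davies-Bouldin index (lower is better): removing an outlier from a cluster shrinks its scatter and improves the index. The index itself is standard; the clusters below are made up:

```python
from math import dist  # Euclidean distance, Python 3.8+

def davies_bouldin(clusters):
    """Davies-Bouldin index for pre-assigned clusters (lists of points).

    For each cluster, take the worst ratio of summed intra-cluster scatter
    to inter-centroid distance, then average over clusters. Lower is better.
    """
    def centroid(c):
        return tuple(sum(p[d] for p in c) / len(c) for d in range(len(c[0])))
    def scatter(c, mu):
        return sum(dist(p, mu) for p in c) / len(c)
    mus = [centroid(c) for c in clusters]
    ss = [scatter(c, mu) for c, mu in zip(clusters, mus)]
    k = len(clusters)
    return sum(max((ss[i] + ss[j]) / dist(mus[i], mus[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k

a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2), (5.0, 5.0)]   # (5, 5) is an outlier
b = [(4.0, 0.0), (4.1, 0.2), (3.9, 0.1)]
with_outlier = davies_bouldin([a, b])
without = davies_bouldin([a[:3], b])
print(without < with_outlier)  # → True
```

The single far point inflates cluster a's scatter and drags its centroid toward cluster b, worsening the index on both counts.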
PubMed: 35885141
DOI: 10.3390/e24070917
Iranian Journal of Parasitology, 2016
Review
BACKGROUND
The aim of the study was to assess outlying and influential studies and to conduct a meta-analysis of the efficacy of single-dose oral albendazole against infection.
METHODS
We searched PubMed, ISI Web of Science, Science Direct, the Cochrane Central Register of Controlled Trials, and WHO library databases for the years 1983 to 2014. Data from 13 clinical trial articles were used. Each included article compared the effect of a single oral dose (400 mg) of albendazole with placebo in two groups of patients with infection. For both groups in each article, the sample size, the number infected, and the number who recovered after taking albendazole were recorded. The relative risk and its variance were computed. Funnel plots and Begg's and Egger's tests were used to assess publication bias. The random-effects variance shift outlier model and the likelihood ratio test were applied to detect outliers. To detect influential studies, DFFITS values, Cook's distances, and COVRATIO were used. Data were analyzed using STATA and R software.
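For illustration, the relative-risk pooling step can be sketched as an inverse-variance combination of per-study log relative risks. The study counts below are hypothetical, the variance formula is the standard log-RR approximation, and this fixed-effect sketch simplifies the paper's actual STATA/R random-effects analysis:

```python
from math import log, exp

def pooled_relative_risk(studies):
    """Inverse-variance pooled relative risk (fixed-effect sketch).

    Each study is (events_treated, n_treated, events_control, n_control);
    the log-RR variance uses the standard approximation
    var = 1/a - 1/n1 + 1/c - 1/n2.
    """
    num = den = 0.0
    for a, n1, c, n2 in studies:
        lrr = log((a / n1) / (c / n2))          # log relative risk
        var = 1 / a - 1 / n1 + 1 / c - 1 / n2   # variance of log RR
        w = 1 / var                             # inverse-variance weight
        num += w * lrr
        den += w
    return exp(num / den)

studies = [(30, 50, 15, 50), (42, 60, 20, 55), (25, 40, 14, 45)]
print(round(pooled_relative_risk(studies), 2))  # → 1.97
```

Outlier diagnostics such as the variance shift model then ask whether removing one study materially changes this pooled estimate, as happened with article 13 in the paper.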
RESULTS
Articles 13 and 9 were identified as the outlier and the influential study, respectively. The outlier was diagnosed by the variance shift of the target study in the inferential method and by its RR value in the graphical method. The funnel plot and Begg's test did not show publication bias (P = 0.272); however, Egger's test did (P = 0.034). Meta-analysis after removal of article 13 gave a relative risk of 1.99 (95% CI 1.71-2.31).
CONCLUSION
The estimated RR and our meta-analyses show that treatment of the infection with a single oral dose of albendazole is unsatisfactory. New anthelminthics are urgently needed.
PubMed: 28127355
DOI: No ID Found
Journal of the Royal Statistical..., Apr 2022
We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class and to detect outliers as often as possible. BCOPS returns no prediction (an empty prediction set) if it infers the test point to be an outlier. The proposed method combines supervised learning algorithms with conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given procedure. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
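The conformal core of such a method can be sketched with label-conditional conformal p-values on 1-D toy data: a label enters the prediction set when its p-value exceeds alpha, and an empty set flags an outlier. The nonconformity score and data here are illustrative; BCOPS itself additionally optimizes the sets with supervised learning:

```python
def conformal_pvalue(cal_scores, score):
    """Fraction of calibration scores at least as nonconforming as `score`."""
    return (1 + sum(1 for s in cal_scores if s >= score)) / (1 + len(cal_scores))

def prediction_set(cal_by_class, x, alpha=0.1):
    """Label-conditional conformal prediction set; an empty set flags an outlier.

    cal_by_class maps label -> calibration points (floats); the nonconformity
    score is the distance to the class mean. Labels whose conformal p-value
    exceeds alpha are kept.
    """
    kept = []
    for label, cal in cal_by_class.items():
        mu = sum(cal) / len(cal)
        scores = [abs(v - mu) for v in cal]
        if conformal_pvalue(scores, abs(x - mu)) > alpha:
            kept.append(label)
    return kept

cal = {"a": [0.0, 0.1, -0.1, 0.05, -0.05, 0.2, -0.2, 0.15, -0.15, 0.02],
       "b": [5.0, 5.1, 4.9, 5.05, 4.95, 5.2, 4.8, 5.15, 4.85, 5.02]}
print(prediction_set(cal, 0.03))   # → ['a']
print(prediction_set(cal, 100.0))  # → [] (no plausible label: flagged as outlier)
```

The coverage guarantee comes from the rank-based p-value, which is valid for exchangeable calibration data regardless of its distribution.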
PubMed: 35910400
DOI: 10.1111/rssb.12443
Advances in Experimental Medicine and..., 2016
Review
The statistical analysis of robust biomarker candidates is a complex process involving several key steps in the overall biomarker development pipeline (see Fig. 22.1, Chap. 19). Initially, data visualization (Sect. 22.1, below) is important for identifying outliers and getting a feel for the nature of the data and whether there appear to be any differences among the groups being examined. From there, the data must be pre-processed (Sect. 22.2) so that outliers are handled, missing values are dealt with, and normality is assessed. Once the processed data have been cleaned and are ready for downstream analysis, hypothesis tests (Sect. 22.3) are performed and differentially expressed proteins are identified. Since the number of differentially expressed proteins is usually larger than warrants further investigation (50+ proteins versus the handful that will be considered for a biomarker panel), some form of feature reduction (Sect. 22.4) should be performed to narrow the list of candidate biomarkers down to a more reasonable number. Once the list has been reduced to the proteins most likely to be useful for downstream classification, unsupervised or supervised learning is performed (Sects. 22.5 and 22.6, respectively).
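The hypothesis-testing and feature-reduction steps typically involve a multiple-testing correction such as Benjamini-Hochberg; a minimal sketch (the p-values below are hypothetical, and this is one common choice rather than the chapter's prescribed method):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg FDR step-up: indices of hypotheses kept at level q.

    Sort the p-values, find the largest rank k with p_(k) <= k*q/m, and keep
    every hypothesis whose p-value is at or below that cutoff.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = -1.0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            cutoff = pvals[i]
    return sorted(i for i in range(m) if pvals[i] <= cutoff)

# p-values for 8 hypothetical proteins from a differential-expression test
pvals = [0.001, 0.2, 0.012, 0.9, 0.03, 0.004, 0.5, 0.045]
print(benjamini_hochberg(pvals, q=0.05))  # → [0, 2, 5]
```

Controlling the false discovery rate rather than the family-wise error rate keeps more candidates while bounding the expected fraction of false leads passed to the panel-selection step.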
Topics: Algorithms; Biomarkers; Computational Biology; Data Interpretation, Statistical; Data Mining; Databases, Protein; High-Throughput Screening Assays; Humans; Mass Spectrometry; Models, Statistical; Proteins; Proteome; Proteomics; Software
PubMed: 27975231
DOI: 10.1007/978-3-319-41448-5_22
Frontiers in Digital Health, 2023
Review
This paper compares three finite element-based methods used in a physics-based non-rigid registration approach and reports on the progress made over the last 15 years. Large brain shifts caused by brain tumor removal affect registration accuracy by creating point and element outliers. A combination of approximation- and geometry-based point and element outlier rejection improves on the rigid registration error by 2.5 mm and meets the real-time constraint (4 min). In addition, the paper raises several questions and presents two open problems for the robust estimation and improvement of registration error in the presence of outliers due to sparse, noisy, and incomplete data. It concludes with preliminary results on leveraging quantum computing, a promising new technology for computationally intensive problems such as feature detection and block matching in addition to the finite element solver; these three together account for 75% of the computing time in deformable registration.
PubMed: 38144260
DOI: 10.3389/fdgth.2023.1283726