-
BMC Genomics Jun 2016
BACKGROUND
Genome-scale functional genomic screens across large cell line panels provide a rich resource for discovering tumor vulnerabilities that can lead to the next generation of targeted therapies. Analysis of these data has typically focused on identifying genes whose knockdown enhances response in various pre-defined genetic contexts, an approach limited by biological complexity and by the incompleteness of our knowledge. We thus introduce a complementary data mining strategy to identify genes with exceptional sensitivity in subsets, or outlier groups, of cell lines, allowing an unbiased analysis without any a priori assumption about the underlying biology of dependency.
RESULTS
Genes with outlier features are strongly and specifically enriched with those known to be associated with cancer and relevant biological processes, despite no a priori knowledge being used to drive the analysis. Identification of exceptional responders (outliers) may lead not only to new candidates for therapeutic intervention, but also to tumor indications and response biomarkers for companion precision medicine strategies. Several tumor suppressors have an outlier sensitivity pattern, supporting and generalizing the notion that tumor suppressors can play context-dependent oncogenic roles.
CONCLUSIONS
The novel application of outlier analysis described here demonstrates a systematic and data-driven analytical strategy to decipher large-scale functional genomic data for oncology target and precision medicine discoveries.
Topics: Biomarkers, Tumor; Cell Line, Tumor; Cell Transformation, Neoplastic; Computational Biology; Drug Discovery; Gene Expression Profiling; Genome, Human; Genomics; High-Throughput Nucleotide Sequencing; Humans; Molecular Targeted Therapy; Neoplasms; Precision Medicine; Signal Transduction
PubMed: 27296290
DOI: 10.1186/s12864-016-2807-y -
Frontiers in Psychology 2021
In response time (RT) research, RT outliers are typically excluded from statistical analysis to improve the signal-to-noise ratio. However, several methods for outlier exclusion exist, which raises the question of how these methods differ in recovering the uncontaminated RT distribution. In the present simulation study, two RT distributions with a given population difference were simulated in each iteration. RTs were replaced by outliers following two different approaches. The first approach generated outliers at the tails of the distribution; the second inserted outliers overlapping with the genuine RT distribution. We applied ten different outlier exclusion methods and tested how many pairs of distributions differed significantly. Outlier exclusion methods were compared in terms of bias. Bias was defined as the deviation of the proportion of significant differences after outlier exclusion from the proportion of significant differences in the uncontaminated samples (before introducing outliers). Our results showed large differences in bias between the exclusion methods. Some methods showed a high rate of Type-I errors and should therefore clearly not be used. Overall, our results showed that applying an exclusion method based on z-scores / standard deviations introduced only small biases, while the absence of outlier exclusion showed the largest absolute bias.
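As an illustration of the z-score / standard deviation family of exclusion rules mentioned above, the following is a minimal sketch rather than the procedure used in the study. The function name `exclude_by_zscore`, the cutoff of 2.5 standard deviations, and the sample RTs are all assumptions made for this example.

```python
from statistics import mean, stdev


def exclude_by_zscore(rts, threshold=2.5):
    """Keep only RTs whose absolute z-score is at most `threshold`.

    The 2.5 SD cutoff is an assumed default, not a value from the study.
    """
    m = mean(rts)
    s = stdev(rts)  # sample standard deviation
    if s == 0:
        return list(rts)  # all values identical, nothing to exclude
    return [rt for rt in rts if abs(rt - m) / s <= threshold]


# Hypothetical RTs in milliseconds with one slow outlier.
rts_ms = [512, 498, 530, 475, 505, 521, 489, 510, 495, 3000]
cleaned = exclude_by_zscore(rts_ms, threshold=2.5)
```

Note that in a sample of size n the largest attainable z-score (using the sample standard deviation) is (n - 1) / sqrt(n), so with only ten observations a cutoff of 3 SD could never exclude anything; this is one reason the cutoff has to be chosen with the sample size in mind.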
PubMed: 34194371
DOI: 10.3389/fpsyg.2021.675558 -
Ecology and Evolution Apr 2017
Investigating the extent (or the existence) of local adaptation is crucial to understanding how populations adapt. When experiments or fitness measurements are difficult or impossible to perform in natural populations, genomic techniques allow us to investigate local adaptation through the comparison of allele frequencies and outlier loci along environmental clines. The thick-billed murre (Uria lomvia) is a highly philopatric colonial arctic seabird that occupies a significant environmental gradient, shows marked phenotypic differences among colonies, and has large effective population sizes. To test whether thick-billed murres from five colonies along the eastern Canadian Arctic coast show genomic signatures of local adaptation to their breeding grounds, we analyzed geographic variation in genome-wide markers mapped to a newly assembled thick-billed murre reference genome. We used outlier analyses to detect loci putatively under selection, and clustering analyses to investigate patterns of differentiation based on 2220 genome-wide single nucleotide polymorphisms (SNPs) and 137 outlier SNPs. We found no evidence of population structure among colonies using all loci but found population structure based on outliers only, where birds from the two northernmost colonies (Minarets and Prince Leopold) grouped with birds from the southernmost colony (Gannet), and birds from Coats and Akpatok were distinct from all other colonies. Although results from our analyses did not support local adaptation along the latitudinal cline of breeding colonies, outlier loci grouped birds from different colonies according to their non-breeding distributions, suggesting that outliers may be informative about adaptation and/or demographic connectivity associated with their migration patterns or non-breeding grounds.
PubMed: 28405300
DOI: 10.1002/ece3.2819 -
BMC Bioinformatics Jun 2020
BACKGROUND
High-throughput RNA sequencing is a powerful approach to study gene expression. Because data acquisition involves complex multi-step protocols, a sample may deviate extremely from other samples of the same treatment group due to technical variation or true biological differences. The high dimensionality of the data, combined with few biological replicates, makes it challenging to accurately detect those samples, and this issue is currently not well studied in the literature. Robust statistics is a family of theories and techniques that aim to detect outliers by first fitting the majority of the data and then flagging data points that deviate from it. Robust statistics have been widely used in multivariate data analysis for outlier detection in chemometrics and engineering. Here we apply robust statistics to RNA-seq data analysis.
RESULTS
We report the use of two robust principal component analysis (rPCA) methods, PcaHubert and PcaGrid, to detect outlier samples in multiple simulated and real biological RNA-seq data sets with positive control outlier samples. PcaGrid achieved 100% sensitivity and 100% specificity in all the tests using positive control outliers with varying degrees of divergence. We applied rPCA methods and classical principal component analysis (cPCA) to an RNA-seq data set profiling gene expression of the external granule layer in the cerebellum of control and conditional SnoN knockout mice. Both rPCA methods detected the same two outlier samples, but cPCA failed to detect any. We performed differentially expressed gene detection before and after outlier removal as well as with and without batch effect modeling. We validated gene expression changes using quantitative reverse transcription PCR and used the result as a reference to compare the performance of eight different data analysis strategies. Removing outliers without batch effect modeling performed best in terms of detecting biologically relevant differentially expressed genes.
CONCLUSIONS
rPCA implemented in the PcaGrid function is an accurate and objective method to detect outlier samples. It is well suited for high-dimensional data with small sample sizes like RNA-seq data. Outlier removal can significantly improve the performance of differential gene detection and downstream functional analysis.
Topics: Animals; Cerebellum; Female; Male; Mice, Knockout; Principal Component Analysis; Proto-Oncogene Proteins; RNA-Seq; Reverse Transcriptase Polymerase Chain Reaction
PubMed: 32600248
DOI: 10.1186/s12859-020-03608-0 -
Mathematical Biosciences and... Nov 2020
Measurement outliers are easily caused by illumination, surface texture, human factors and so on during microscopic topography measurement. This abundant point cloud noise heavily affects instrument measurement accuracy and surface reconstruction quality. We propose a quick and accurate method for removing outliers based on a social circle algorithm. First, a Gaussian kernel function is used to calculate the voting value that determines the social circle's initial point; then an appropriate social circle radius and search window are selected based on the initial point; finally, the social circle is expanded iteratively. Points that are not in the social circle are considered outliers and filtered out. The experimental results show that the algorithm performs well in comparison with existing filtering methods. The developed method has great potential in microscopic topography reconstruction, fitting and other point cloud processing tasks.
Topics: Algorithms; Cloud Computing; Humans; Imaging, Three-Dimensional
PubMed: 33378937
DOI: 10.3934/mbe.2020413 -
IEEE Transactions on Cybernetics Oct 2021
In this article, a new outlier-resistant recursive filtering (RF) problem is studied for a class of multisensor multirate networked systems under the weighted try-once-discard (WTOD) protocol. The sensors are sampled with a period that is different from the state updating period of the system. In order to lighten the communication burden and alleviate network congestion, the WTOD protocol is implemented in the sensor-to-filter channel to schedule the order of data transmission from the sensors. To handle measurement outliers, a saturation function is employed in the filter structure to constrain the innovations contaminated by the outliers, thereby maintaining satisfactory filtering performance. By resorting to the solution to a matrix difference equation, an upper bound is first obtained on the covariance of the filtering error, and the gain matrix of the filter is then characterized to minimize the derived upper bound. Furthermore, the exponential boundedness of the filtering error dynamics is analyzed in the mean square sense. Finally, the usefulness of the proposed outlier-resistant RF scheme is verified by simulation examples.
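The idea of constraining outlier-contaminated innovations with a saturation function can be sketched in a scalar setting. This is only an illustration under strong simplifying assumptions: the gain is a fixed constant rather than the covariance-bound-minimizing gain matrix derived in the article, and neither the multirate sampling nor the WTOD scheduling is modeled. The names `saturate`, `filter_step`, `run_filter` and all numeric values are hypothetical.

```python
def saturate(value, bound):
    """Clamp `value` to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def filter_step(estimate, measurement, gain, bound):
    """One recursive update in which the innovation is saturated before use."""
    innovation = measurement - estimate
    return estimate + gain * saturate(innovation, bound)


def run_filter(measurements, gain=0.5, bound=1.0, initial=0.0):
    """Apply the saturated update over a measurement sequence."""
    estimate = initial
    estimates = []
    for y in measurements:
        estimate = filter_step(estimate, y, gain, bound)
        estimates.append(estimate)
    return estimates


# Hypothetical measurements of a roughly constant state; 50.0 is an outlier.
estimates = run_filter([1.0, 1.1, 50.0, 0.9])
```

With the saturation in place, the outlier can move the estimate by at most `gain * bound` in a single step, so one corrupted measurement cannot drag the estimate far from the state.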
PubMed: 33001816
DOI: 10.1109/TCYB.2020.3021194 -
Entropy (Basel, Switzerland) Jul 2023
The present article is devoted to outlier detection in phases of human movement. The aim was to find the most efficient machine learning method to detect abnormal segments inside physical activities in which there is a probability of origin from other activities. The problem was reduced to a classification task. A new method based on a nested binary classifier is proposed. Test experiments were then conducted using several of the most popular machine learning algorithms (linear regression, support vector machine, k-nearest neighbor, decision trees). Each method was separately tested on three datasets varying in characteristics and number of records. We evaluated the effectiveness of the models using basic measures of classifier evaluation and confusion matrices. The nested binary classifier was compared with deep neural networks. Our research shows that the method of nested binary classifiers can be considered an effective way of recognizing outlier patterns for human activity recognition (HAR) systems.
PubMed: 37628151
DOI: 10.3390/e25081121 -
International Journal of Clinical... Nov 2020
OBJECTIVE
This paper analyzes potential outliers in the bioanalytical and clinical parts of a bioequivalence study, their effect on bioequivalence decisions, and whether it is appropriate to eliminate them from the statistical evaluation of bioequivalence.
MATERIALS AND METHODS
The clinical part was a two-period, two-sequence crossover bioequivalence study of two piroxicam formulations in healthy subjects. A simulation study evaluated the influence of 10% errors on the percent bias of calculated concentrations from nominal ones.
RESULTS
In bioequivalence studies, it is not possible to distinguish between relevant types of outliers based only on statistical criteria. The problem is particularly acute when omitting outliers changes the bioequivalence decision from rejection to acceptance. In such cases, there is a suspicion of subjective analysis and torture of data. Analytical errors at high plasma levels were critical for the calculated concentrations near the lower limit of quantification, whereas errors at low concentrations had a less significant effect. In the pharmacokinetic analysis, several types of outliers were shown: single points, curves, pairs of curves corresponding to the same subject, and intrasubject ratios of areas under curves and maximum concentrations. These pharmacokinetic outliers could have had, at the same time, bioanalytical, physiological and physicochemical causes.
CONCLUSION
Based on these results, the following algorithm was proposed for analyzing outlier data and outlier subjects in bioequivalence studies: evaluation of how the decision to eliminate outliers affects the decision concerning bioequivalence; application of statistical tests for outlier detection; evaluation from the point of view of physiological pharmacokinetics; and a final decision concerning elimination of outliers.
Topics: Algorithms; Cross-Over Studies; Humans; Piroxicam; Therapeutic Equivalency
PubMed: 32870154
DOI: 10.5414/CP203794 -
TAG. Theoretical and Applied Genetics.... Apr 2016
We review and propose several methods for identifying possible outliers and evaluate their properties. The methods are applied to a genomic prediction program in hybrid rye. Many plant breeders use ANOVA-based software for routine analysis of field trials. These programs may offer specific in-built options for residual analysis that are lacking in current REML software. With the advance of molecular technologies, there is a need to switch to REML-based approaches, but without losing the good features of outlier detection methods that have proven useful in the past. Our aims were to compare the variance component estimates between ANOVA and REML approaches, to scrutinize the outlier detection method of the ANOVA-based package PlabStat and to propose and evaluate alternative procedures for outlier detection. We compared the outputs produced using ANOVA and REML approaches of four published datasets of generalized lattice designs. Five outlier detection methods are explained step by step. Their performance was evaluated by measuring the true positive rate and the false positive rate in a dataset with artificial outliers simulated in several scenarios. An implementation of genomic prediction using an empirical rye multi-environment trial was used to assess the outlier detection methods with respect to the predictive abilities of a mixed model for each method. We provide a detailed explanation of how the PlabStat outlier detection methodology can be translated to REML-based software together with the evaluation of alternative methods to identify outliers. The method combining the Bonferroni-Holm test to judge each residual and the residual standardization strategy of PlabStat exhibited good ability to detect outliers in small and large datasets and under a genomic prediction application. We recommend the use of outlier detection methods as a decision support in the routine data analyses of plant breeding experiments.
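The combination of a Bonferroni-Holm test with standardized residuals can be illustrated with a short sketch. It is not the PlabStat procedure: the standardization used here (median centering and a MAD-based robust scale) and the normal reference distribution are assumptions chosen for the example, and the function name `holm_outliers` and the sample residuals are hypothetical.

```python
from statistics import NormalDist, median


def holm_outliers(residuals, alpha=0.05):
    """Return indices of residuals flagged by a Holm step-down test.

    Residuals are standardized with the median and 1.4826 * MAD (an
    assumed robust scale, not PlabStat's), then converted to two-sided
    normal p-values.
    """
    m = len(residuals)
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad  # makes the MAD consistent with the SD under normality
    if scale == 0:
        return []
    normal = NormalDist()
    p_values = [2 * (1 - normal.cdf(abs((r - center) / scale))) for r in residuals]

    # Holm step-down: compare the i-th smallest p-value with alpha / (m - i)
    # and stop at the first non-rejection.
    order = sorted(range(m), key=lambda i: p_values[i])
    flagged = []
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            flagged.append(idx)
        else:
            break
    return sorted(flagged)


# Hypothetical plot residuals; the last one is grossly deviant.
residuals = [0.2, -0.1, 0.3, -0.4, 0.1, -0.2, 0.0, 0.5, -0.3, 6.0]
flagged = holm_outliers(residuals)
```

Judging every residual against a multiplicity-adjusted threshold keeps the family-wise false positive rate near `alpha`, which matters when hundreds of plot residuals are screened in a routine trial analysis.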
Topics: Analysis of Variance; Genomics; Likelihood Functions; Models, Genetic; Models, Statistical; Plant Breeding; Secale; Software
PubMed: 26883044
DOI: 10.1007/s00122-016-2666-6 -
Clinical Chemistry and Laboratory... Aug 2018
BACKGROUND
Definition and elimination of outliers is a key element for medical laboratories establishing or verifying reference intervals (RIs), especially as inclusion of just a few outlying observations may seriously affect the determination of the reference limits. Many methods have been developed for the definition of outliers. Several of these methods were developed for the normal distribution, and data often require transformation before outlier elimination.
METHODS
We have developed a non-parametric, transformation-independent outlier definition. The new method relies on drawing reproducible histograms. This is done by using defined bin sizes above and below the median. The method is compared to the method recommended by CLSI/IFCC, which uses Box-Cox transformation (BCT) and Tukey's fences for outlier definition. The comparison is done on eight simulated distributions and an indirect clinical dataset.
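For readers unfamiliar with Tukey's fences, the rule used by the comparison method, here is a minimal sketch. It omits the Box-Cox transformation that the recommended method applies first, and the quartile convention (`method="inclusive"`), the conventional multiplier k = 1.5, the function names and the sample values are assumptions made for this example.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Partition values into those inside the fences and those outside."""
    low, high = tukey_fences(values, k)
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged


# Hypothetical analyte results with one high value.
results = [4.1, 4.3, 4.5, 4.6, 4.8, 5.0, 5.1, 5.3, 9.9]
kept, flagged = split_outliers(results)
```

Because the fences are symmetric around the quartiles, they fit roughly symmetric data best; this is why the recommended method transforms skewed data before applying them.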
RESULTS
The comparison on simulated distributions shows that, when no outliers are added, the recommended method in general defines fewer outliers. However, when outliers are added on one side, the proposed method often produces better results. With outliers on both sides, the methods are equally good. Furthermore, it is found that the presence of outliers affects the BCT, and subsequently affects the limits determined by the currently recommended methods. This is especially seen in skewed distributions. The proposed outlier definition reproduced current RI limits on clinical data containing outliers.
CONCLUSIONS
We find our simple transformation-independent outlier detection method to be as good as or better than the currently recommended methods.
Topics: Adult; Blood Chemical Analysis; Female; Humans; Laboratories, Hospital; Male; Reference Values; Statistics, Nonparametric
PubMed: 29634477
DOI: 10.1515/cclm-2018-0025