The British Journal of Mathematical and... Nov 2016
Let r1 and r2 be two dependent estimates of Pearson's correlation. There is a substantial literature on testing H0: ρ1 = ρ2, the hypothesis that the population correlation coefficients are equal. However, it is well known that Pearson's correlation is not robust. Even a single outlier can have a substantial impact on Pearson's correlation, resulting in a misleading understanding of the strength of the association among the bulk of the points. A way of mitigating this concern is to use a correlation coefficient that guards against outliers; many such coefficients have been proposed. But apparently there are no results on how to compare dependent robust correlation coefficients when there is heteroscedasticity. Extant results suggest that a basic percentile bootstrap will perform reasonably well. This paper reports simulation results indicating the extent to which this is true when using Spearman's rho, a Winsorized correlation or a skipped correlation.
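A minimal sketch of the kind of percentile bootstrap the abstract refers to, comparing two dependent Spearman correlations that share the variable x. The data-generating model, sample size, and bootstrap settings below are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def spearman(a, b):
    # Spearman's rho = Pearson correlation of the ranks
    ra = a.argsort().argsort().astype(float)
    rb = b.argsort().argsort().astype(float)
    return np.corrcoef(ra, rb)[0, 1]

def boot_diff_ci(x, y1, y2, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap CI for rho(x, y1) - rho(x, y2).

    The two correlations are dependent because they share x, so whole
    cases (x_i, y1_i, y2_i) are resampled together."""
    n = len(x)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)            # resample cases with replacement
        diffs[b] = spearman(x[idx], y1[idx]) - spearman(x[idx], y2[idx])
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

# Heteroscedastic toy data: y1 strongly related to x, y2 unrelated
n = 80
x = rng.normal(size=n)
y1 = x + (1 + np.abs(x)) * rng.normal(size=n) * 0.5   # error variance grows with |x|
y2 = rng.normal(size=n)
lo, hi = boot_diff_ci(x, y1, y2)
reject = not (lo <= 0.0 <= hi)     # H0: rho1 = rho2 rejected if 0 is outside the CI
```

The same resampling scheme applies unchanged when `spearman` is swapped for a Winsorized or skipped correlation.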
Topics: Algorithms; Computer Simulation; Data Interpretation, Statistical; Linear Models; Regression Analysis; Statistical Distributions; Statistics as Topic; Statistics, Nonparametric
PubMed: 27114391
DOI: 10.1111/bmsp.12069
American Journal of Orthodontics and... Feb 2016
Topics: Dental Research; Factor Analysis, Statistical; Humans; Linear Models; Statistical Distributions; Statistics as Topic
PubMed: 27228579
DOI: 10.1016/j.ajodo.2015.11.010
IEEE Transactions on Medical Imaging Apr 2023
Anomaly detection in fundus images remains challenging because fundus images often contain diverse types of lesions varying in location, size, shape, and color. Current methods achieve anomaly detection mainly by reconstructing the fundus image background, or separating it from a fundus image, under the guidance of a set of normal fundus images. The reconstruction methods, however, ignore the constraint from lesions. The separation methods primarily model the diverse lesions with pixel-based independently and identically distributed (i.i.d.) properties, neglecting the individualized variations of different types of lesions and their structural properties. Hence, these methods may have difficulty distinguishing lesions from fundus image backgrounds, especially in the presence of normal personalized variations (NPV). To address these challenges, we propose a patch-based non-i.i.d. mixture of Gaussians (MoG) to model diverse lesions, adapting to their statistical distribution variations across fundus images and to their patch-like structural properties. Further, we introduce the weighted Schatten p-norm as the metric of low-rank decomposition, enhancing the accuracy of the learned fundus image backgrounds and reducing false positives caused by NPV. With the individualized modeling of the diverse lesions and the background learning, fundus image backgrounds and NPV are finely learned and subsequently distinguished from diverse lesions, ultimately improving anomaly detection. The proposed method is evaluated on two real-world databases and one artificial database, outperforming the state-of-the-art methods.
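The weighted Schatten p-norm decomposition itself is involved; as a rough illustration of the low-rank background idea only, here is a one-step weighted singular-value soft-thresholding (a common surrogate for weighted-nuclear-norm recovery, i.e. p = 1) on synthetic data. This is not the authors' algorithm, and all dimensions and weights are arbitrary:

```python
import numpy as np

def weighted_svt(M, scale=1.0):
    """One step of weighted singular-value soft-thresholding.

    Weights inversely proportional to the singular values shrink large
    components (shared background structure) less than small ones, which
    is the intuition behind weighted Schatten-norm low-rank recovery."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    w = scale / (s + 1e-8)                  # smaller threshold for larger singular values
    s_shrunk = np.maximum(s - w, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(1)
# Synthetic stand-in for vectorized image patches: a rank-2 "background"
# shared across 50 columns, plus a few large sparse "lesion" entries
B = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 50))
S = np.zeros_like(B)
S[rng.integers(0, 100, 20), rng.integers(0, 50, 20)] = 10.0
L = weighted_svt(B + S)
residual = (B + S) - L                      # large residual entries flag anomalies
```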
Topics: Fundus Oculi; Normal Distribution; Databases, Factual
PubMed: 36446017
DOI: 10.1109/TMI.2022.3225422
PloS One 2020
Birth defects are prenatal morphological or functional anomalies. Associations among them are studied to identify their etiopathogenesis. Graph theory methods allow the analysis of relationships among a complete set of anomalies. A graph consists of nodes, which represent the entities (birth defects in the present work), and edges that join nodes, indicating the relationships among them. The aim of the present study was to validate graph theory methods for studying birth defect associations. All birth-defect monitoring records of the Estudio Colaborativo Latino Americano de Malformaciones Congénitas gathered between 1967 and 2017 were used. From around 5 million live and stillborn infants, 170,430 had one or more birth defects. A volume-adjusted chi-square was used to determine the association strength between two birth defects and to weight the graph edges. The complete birth defect graph showed a log-normal degree distribution, and its characteristics differed from random, scale-free and small-world graphs. The graph comprised 118 nodes and 550 edges. The birth defects with the highest centrality values were nonspecific codes such as "Other upper limb anomalies". After partition, the graph yielded 12 groups; most of them were recognizable and included conditions such as the VATER and OEIS associations and Patau syndrome. Our findings validate graph theory methods for studying birth defect associations. This method may contribute to identifying underlying etiopathogeneses as well as to improving coding systems.
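The edge-weighting step can be sketched with a plain (not volume-adjusted) chi-square on 2x2 co-occurrence tables; the defect names and counts below are hypothetical, chosen only to show the construction:

```python
from itertools import combinations

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]:
    counts of records with both, only the first, only the second,
    and neither of two defects (all margins assumed non-zero)."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    expected = [row1*col1/n, row1*col2/n, row2*col1/n, row2*col2/n]
    return sum((o - e)**2 / e for o, e in zip([a, b, c, d], expected))

# Hypothetical monitoring records, one set of defects per affected infant
cases = [
    {"anal_atresia", "vertebral_defect"},
    {"anal_atresia", "vertebral_defect", "cardiac_defect"},
    {"cleft_lip"},
    {"cardiac_defect"},
    {"cleft_lip", "cleft_palate"},
    {"cleft_lip", "cleft_palate"},
]
defects = sorted(set().union(*cases))
graph = {}   # edge (node pair) -> chi-square weight
for u, v in combinations(defects, 2):
    a = sum((u in c) and (v in c) for c in cases)        # both present
    b = sum((u in c) and (v not in c) for c in cases)
    c_ = sum((u not in c) and (v in c) for c in cases)
    d = sum((u not in c) and (v not in c) for c in cases)
    if a:   # only connect defects that ever co-occur
        graph[(u, v)] = chi_square_2x2(a, b, c_, d)
```

Nodes with many strong edges then get high centrality, and graph partitioning over these weights yields the defect groups described in the abstract.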
Topics: Congenital Abnormalities; Data Science; Databases, Factual; Humans; Infant, Newborn; Statistical Distributions
PubMed: 32442191
DOI: 10.1371/journal.pone.0233529
NeuroImage Oct 2014
In neuroscience, as in many other fields of science and engineering, it is crucial to assess the causal interactions among multivariate time series. Granger causality has been increasingly used to identify causal influence between time series based on multivariate autoregressive models. Such an approach rests on a linear regression framework with the implicit assumption that the model's noise residuals are Gaussian with constant variance. As a consequence, this measure cannot detect cause-effect relationships in higher-order moments or nonlinear causality. Here, we propose an effective model-free, copula-based Granger causality measure that can reveal nonlinear and higher-order-moment causality. We first formulate Granger causality as a log-likelihood ratio in terms of conditional distributions, and then derive an efficient estimation procedure using conditional copulas. We use resampling techniques to build a baseline null-hypothesis distribution from which statistical significance can be derived. We perform a series of simulations to investigate the performance of our copula-based Granger causality and compare it against other state-of-the-art techniques. Our method is finally applied to neural field potential time series recorded from the visual cortex of a monkey performing a visual illusion task.
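The copula-based estimator is beyond a short sketch, but the two ingredients named in the abstract, Granger causality as a log-likelihood ratio plus a resampling null distribution, can be illustrated with the linear Gaussian special case (illustrative parameter values throughout):

```python
import numpy as np

rng = np.random.default_rng(2)

def granger_lr(x, y):
    """Order-1 linear Gaussian Granger causality x -> y as a
    log-likelihood ratio: log(restricted / full residual variance)."""
    yt, yp, xp = y[1:], y[:-1], x[:-1]
    A = np.column_stack([np.ones_like(yp), yp])          # restricted: y_t ~ y_{t-1}
    res_r = yt - A @ np.linalg.lstsq(A, yt, rcond=None)[0]
    B = np.column_stack([A, xp])                         # full: adds x_{t-1}
    res_f = yt - B @ np.linalg.lstsq(B, yt, rcond=None)[0]
    return np.log(res_r.var() / res_f.var())

def perm_pvalue(x, y, n_perm=500):
    """Resampling null: permuting x destroys its temporal relation to y,
    giving a baseline distribution for the causality statistic."""
    obs = granger_lr(x, y)
    null = [granger_lr(rng.permutation(x), y) for _ in range(n_perm)]
    return (1 + sum(g >= obs for g in null)) / (1 + n_perm)

# x drives y with a one-step delay
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.3 * rng.normal()
p = perm_pvalue(x, y)
```

The paper's contribution is to replace the linear conditional model above with conditional copulas, so that the same log-likelihood-ratio formulation captures nonlinear and higher-order-moment dependence.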
Topics: Animals; Computer Simulation; Data Interpretation, Statistical; Evoked Potentials, Visual; Illusions; Macaca; Models, Statistical; Neurosciences; Statistical Distributions; Visual Cortex
PubMed: 24945669
DOI: 10.1016/j.neuroimage.2014.06.013
Bioinformatics (Oxford, England) Oct 2019
MOTIVATION
Under two biologically different conditions, we are often interested in identifying differentially expressed genes. The assumption of equal variances in the two groups is usually violated for many of the genes that need to be filtered or ranked. In these cases, exact tests are unavailable and Welch's approximate test is the most reliable option. Welch's test involves two layers of approximation: the distribution of the statistic is approximated by a t-distribution, which in turn depends on approximate degrees of freedom. This study attempts to improve upon Welch's approximate test by avoiding one layer of approximation.
RESULTS
We introduce a new distribution that generalizes the t-distribution and propose a Monte Carlo based test that uses only one layer of approximation for statistical inference. Experimental results based on extensive simulation studies show that the Monte Carlo based test enhances statistical power and performs better than Welch's t-approximation, especially when the equal-variance assumption is not met and the sample with the larger variance is the smaller one. We analyzed two gene-expression datasets, namely the childhood acute lymphoblastic leukemia gene-expression dataset with 22,283 genes and the Golden Spike dataset produced by a controlled experiment with 13,966 genes. The new test identified additional genes of interest in both datasets. Some of these genes have been shown in the medical literature to play important roles.
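The single-approximation idea can be sketched as follows: instead of referring the Welch statistic to a t-distribution with approximate degrees of freedom, its null distribution is built by Monte Carlo simulation with the observed (unequal) variances plugged in. This is a generic illustration of the approach, not the mcBFtest implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

def welch_stat(a, b):
    """Welch's two-sample statistic (no pooling of variances)."""
    return (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

def mc_pvalue(a, b, n_sim=5000):
    """Two-sided Monte Carlo p-value: simulate the statistic under H0
    (equal means) using normal samples with the observed group variances,
    avoiding the approximate-degrees-of-freedom step."""
    t_obs = welch_stat(a, b)
    na, nb = len(a), len(b)
    sa, sb = a.std(ddof=1), b.std(ddof=1)
    t_null = np.array([
        welch_stat(rng.normal(0, sa, na), rng.normal(0, sb, nb))
        for _ in range(n_sim)
    ])
    return (1 + np.sum(np.abs(t_null) >= abs(t_obs))) / (1 + n_sim)

# Unequal variances; the smaller sample has the larger variance
a = rng.normal(0.0, 3.0, size=12)
b = rng.normal(1.5, 1.0, size=40)
p = mc_pvalue(a, b)
```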
AVAILABILITY AND IMPLEMENTATION
The R package mcBFtest is available on CRAN, and R scripts to reproduce all reported results are available at the GitHub repository, https://github.com/iullah1980/MCTcodes.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Biometry; Gene Expression; Monte Carlo Method; Sample Size; Statistical Distributions
PubMed: 30874796
DOI: 10.1093/bioinformatics/btz189
CPT: Pharmacometrics & Systems... May 2020
Pharmacometric models using lognormal distributions have become commonplace in pharmacokinetic-pharmacodynamic investigations. The extent to which the lognormal distribution can be interpreted through the traditional description of variability via the normal distribution remains elusive. In this tutorial, the comparison is made using formal approximation methods. The quality of the resulting approximation was assessed by the similarity of prediction intervals (PIs) to the true values, illustrated using 80% PIs. Approximated PIs were close to the true values when the lognormal standard deviation (omega) was smaller than about 0.25, depending mostly on the desired precision. With increasing omega values, the quality of the approximation declines and, at omega values of about 1, deteriorates sharply; at such high omega values there is no longer any resemblance between the lognormal and normal distributions. To support dissemination and interpretation of these nonlinear properties, some additional statistics are discussed in the context of the three regions of behavior of the lognormal distribution.
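The comparison can be reproduced in miniature: the exact 80% PI of a lognormal variable versus a moment-matched normal approximation, evaluated at a small and a large omega (illustrative code, not taken from the tutorial):

```python
import numpy as np

Z80 = 1.2816  # standard-normal quantile for a central 80% interval

def lognormal_pi(mu, omega):
    """Exact 80% PI of exp(N(mu, omega^2))."""
    return np.exp(mu - Z80 * omega), np.exp(mu + Z80 * omega)

def normal_approx_pi(mu, omega):
    """80% PI of a normal distribution matching the lognormal's
    mean and variance."""
    m = np.exp(mu + omega**2 / 2)
    s = m * np.sqrt(np.exp(omega**2) - 1)
    return m - Z80 * s, m + Z80 * s

exact = {w: lognormal_pi(0.0, w) for w in (0.1, 1.0)}
approx = {w: normal_approx_pi(0.0, w) for w in (0.1, 1.0)}
# At omega = 0.1 the two intervals nearly coincide; at omega = 1 the
# normal approximation even yields a negative lower bound, which is
# impossible for a lognormal variable.
```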
Topics: Humans; Models, Biological; Models, Statistical; Normal Distribution; Pharmacokinetics
PubMed: 32198841
DOI: 10.1002/psp4.12507
Transfusion Oct 2017
Topics: Benzylamines; Blood Cell Count; Cyclams; Hematopoietic Stem Cell Mobilization; Hematopoietic Stem Cells; Heterocyclic Compounds; Humans; Peripheral Blood Stem Cell Transplantation; Statistical Distributions
PubMed: 28815620
DOI: 10.1111/trf.14277
PloS One 2021
This paper studies the distribution of firm size for the Colombian economy, showing evidence against Gibrat's law, which assumes a stable lognormal distribution. Instead, we propose a lognormal expansion that captures deviations from the lognormal distribution with additional terms that allow a better fit at the upper tail, which the lognormal distribution overestimates. As a consequence, concentration indexes should be computed consistently with the lognormal expansion. Through a dynamic panel data approach, we also show that firm growth is persistent and highly dependent on firm characteristics, including size, age, and leverage; these results contradict Gibrat's law for the Colombian case.
Topics: Statistical Distributions
PubMed: 34242353
DOI: 10.1371/journal.pone.0254487
Biostatistics (Oxford, England) Jul 2020
We propose a novel model for hierarchical time-to-event data, for example, healthcare data in which patients are grouped by their healthcare provider. The most common model for this kind of data is the Cox proportional hazard model, with frailties that are common to patients in the same group and given a parametric distribution. We relax the parametric frailty assumption in this class of models by using a non-parametric discrete distribution. This improves the flexibility of the model by allowing very general frailty distributions and enables the data to be clustered into groups of healthcare providers with a similar frailty. A tailored Expectation-Maximization algorithm is proposed for estimating the model parameters, methods of model selection are compared, and the code is assessed in simulation studies. This model is particularly useful for administrative data in which there are a limited number of covariates available to explain the heterogeneity associated with the risk of the event. We apply the model to a clinical administrative database recording times to hospital readmission, and related covariates, for patients previously admitted once to hospital for heart failure, and we explore latent clustering structures among healthcare providers.
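A toy version can show the mechanics of a discrete frailty distribution fitted by EM: exponential event times, provider-level frailty with K = 2 support points, no censoring or covariates. The paper's tailored algorithm handles far more (censoring, covariates, model selection); everything below is a simplified illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated hierarchical survival data: 20 providers, 50 patients each,
# exponential event times with provider-specific hazard z_g (the frailty).
# Half the providers have frailty 0.5, half 2.0.
true_z = np.repeat([0.5, 2.0], 10)
n_pat = 50
n_events = np.full(20, n_pat)                              # every time is an event
T = np.array([rng.exponential(1 / z, n_pat).sum() for z in true_z])  # follow-up per provider

# EM for a K-point discrete frailty distribution
K = 2
z = np.array([0.8, 1.2])        # support points (initial guess)
pi = np.full(K, 1 / K)          # mixing probabilities
for _ in range(200):
    # E-step: posterior weight of each support point for each provider,
    # using the exponential log-likelihood n_g * log(z_k) - z_k * T_g
    logw = np.log(pi) + np.outer(n_events, np.log(z)) - np.outer(T, z)
    logw -= logw.max(axis=1, keepdims=True)                # stabilize before exp
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: closed-form updates for mixing weights and support points
    pi = w.mean(axis=0)
    z = (w * n_events[:, None]).sum(axis=0) / (w * T[:, None]).sum(axis=0)

cluster = w.argmax(axis=1)      # latent clustering of providers by frailty
```

The posterior weights `w` give exactly the kind of latent clustering of providers that the abstract describes, with the support points converging toward the two simulated frailty levels.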
Topics: Algorithms; Cluster Analysis; Computer Simulation; Health Personnel; Humans; Patient Admission; Proportional Hazards Models; Statistical Distributions; Statistics, Nonparametric; Time Factors; Time-to-Treatment
PubMed: 30590499
DOI: 10.1093/biostatistics/kxy071