-
Veterinary Clinical Pathology Sep 2021
BACKGROUND
Inaccuracy in estimating reference intervals (RIs) is a problem with small sample sizes.
OBJECTIVES
This study aimed to identify the most accurate statistical methods to estimate RIs based on sample size and population distribution shape. We also studied the accuracy of sample frequency distribution histograms to retrieve the original population distribution and compared strategies based on the histogram and goodness-of-fit test.
METHODS
The statistical methods that best enhanced accuracy for various sample sizes (n = 20-60) and population distributions (Gaussian, log-normal, and left-skewed) were determined by repeated-measures ANOVA and post hoc analyses. Frequency distribution histograms were built from 900 samples of five different sizes randomly extracted from six simulated populations. Three reviewers classified the population distributions from visual assessments of a sample histogram, and the classification error rate was calculated. RI accuracy was compared among the strategies based on the histograms and goodness-of-fit tests.
RESULTS
The parametric, nonparametric, and robust methods enhanced lower reference limit estimation accuracy for Gaussian, log-normal, and left-skewed distributions, respectively. The parametric, nonparametric bootstrap, and nonparametric methods enhanced the upper limit estimation accuracy for Gaussian, log-normal, and left-skewed distributions, respectively. Regardless of sample size, sample histogram assessments properly classified the original population distribution 71% to 93.9% of the time, depending on the reviewers. In this study, the strategy based on histograms assessed by the statistician was significantly more precise and accurate than the strategy based on the goodness-of-fit test (P < 0.001).
CONCLUSIONS
A strategy based on histograms might enhance the accuracy of RI estimations. However, relevant inter-reviewer variations in histogram interpretation were detected. Factors affecting inter-reviewer variations should be further explored.
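The difference between two of the estimation methods compared in this abstract can be illustrated with a minimal sketch. It is not the study's code: the simulated sample, the function names, and the choice of a 95% interval are assumptions made for illustration, and the robust and bootstrap methods are not covered.

```python
import random
import statistics

def parametric_ri(values):
    """Mean +/- 1.96 SD; only appropriate when the data look Gaussian."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return m - 1.96 * s, m + 1.96 * s

def nonparametric_ri(values):
    """2.5th and 97.5th percentiles taken from the sample itself."""
    # n=40 yields 39 cut points; the first is 2.5% and the last is 97.5%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]

# Hypothetical small sample (n = 40) drawn from a Gaussian population.
random.seed(1)
sample = [random.gauss(100.0, 10.0) for _ in range(40)]

print("parametric:   ", parametric_ri(sample))
print("nonparametric:", nonparametric_ri(sample))
```

With samples this small, the nonparametric limits depend heavily on the few most extreme observations, which is one reason the choice of method matters.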
Topics: Animals; Computer Simulation; Computers; Normal Distribution; Reference Values; Sample Size
PubMed: 34476826
DOI: 10.1111/vcp.13000 -
Psychonomic Bulletin & Review Jun 2017
For scatterplots with Gaussian distributions of dots, the perception of Pearson correlation r can be described by two simple laws: a linear one for discrimination, and a logarithmic one for perceived magnitude (Rensink & Baldridge, 2010). The underlying perceptual mechanisms, however, remain poorly understood. To cast light on these, four different distributions of datapoints were examined. The first had 100 points with equal variance in both dimensions. Consistent with earlier results, just noticeable difference (JND) was a linear function of the distance away from r = 1, and the magnitude of perceived correlation a logarithmic function of this quantity. In addition, these laws were linked, with the intercept of the JND line being the inverse of the bias in perceived magnitude. Three other conditions were also examined: a dot cloud with 25 points, a horizontal compression of the cloud, and a cloud with a uniform distribution of dots. Performance was found to be similar in all conditions. The generality and form of these laws suggest that what underlies correlation perception is not a geometric structure such as the shape of the dot cloud, but the shape of the probability distribution of the dots, likely inferred via a form of ensemble coding. It is suggested that this reflects the ability of observers to perceive the information entropy in an image, with this quantity used as a proxy for Pearson correlation.
Topics: Data Display; Form Perception; Humans; Mathematics; Perception; Statistical Distributions; Visual Perception
PubMed: 27785683
DOI: 10.3758/s13423-016-1174-7 -
PLoS Computational Biology Feb 2024
Outbreaks of emerging and zoonotic infections represent a substantial threat to human health and well-being. These outbreaks tend to be characterised by highly stochastic transmission dynamics with intense variation in transmission potential between cases. The negative binomial distribution is commonly used as a model for transmission in the early stages of an epidemic as it has a natural interpretation as the convolution of a Poisson contact process and a gamma-distributed infectivity. In this study we expand upon the negative binomial model by introducing a beta-Poisson mixture model in which infectious individuals make contacts at the points of a Poisson process and then transmit infection along these contacts with a beta-distributed probability. We show that the negative binomial distribution is a limit case of this model, as is the zero-inflated Poisson distribution obtained by combining a Poisson-distributed contact process with an additional failure probability. We assess the beta-Poisson model's applicability by fitting it to secondary case distributions (the distribution of the number of subsequent cases generated by a single case) estimated from outbreaks covering a range of pathogens and geographical settings. We find that while the beta-Poisson mixture can achieve a closer fit to data than the negative binomial distribution, it is consistently outperformed by the negative binomial in terms of Akaike Information Criterion, making it a suboptimal choice on grounds of parsimony. The beta-Poisson performs similarly to the negative binomial model in its ability to capture features of the secondary case distribution such as overdispersion, prevalence of superspreaders, and the probability of a case generating zero subsequent cases.
Despite this possible shortcoming, the beta-Poisson distribution may still be of interest in the context of intervention modelling since its structure allows for the simulation of measures which change contact structures while leaving individual-level infectivity unchanged, and vice-versa.
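The generative mechanism described in this abstract (Poisson contacts, then transmission along each contact with a beta-distributed probability) can be sketched as a simulation. This is an illustrative reading of the abstract, not the authors' implementation: the function names and the parameter values (`contact_rate=10`, `alpha=1`, `beta=4`) are assumptions.

```python
import math
import random

def poisson_draw(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def secondary_cases(contact_rate, alpha, beta, rng):
    """One case: draw an individual transmission probability, then contacts,
    then infect each contact independently with that probability."""
    p = rng.betavariate(alpha, beta)
    contacts = poisson_draw(contact_rate, rng)
    return sum(1 for _ in range(contacts) if rng.random() < p)

rng = random.Random(42)
draws = [secondary_cases(10.0, 1.0, 4.0, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean={mean:.2f} variance={variance:.2f} "
      f"P(zero)={draws.count(0) / len(draws):.2f}")
```

Because the contact rate and the transmission probability are separate inputs, an intervention on contacts can be simulated by changing `contact_rate` alone, which is the structural property the abstract highlights.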
Topics: Humans; Models, Statistical; Computer Simulation; Poisson Distribution; Binomial Distribution; Disease Outbreaks
PubMed: 38330050
DOI: 10.1371/journal.pcbi.1011856 -
BMC Medical Research Methodology Jan 2022
BACKGROUND
We consider cluster size data of SARS-CoV-2 transmissions for a number of different settings from recently published data. The statistical characteristics of superspreading events are commonly described by fitting a negative binomial distribution to secondary infection and cluster size data as an alternative to the Poisson distribution as it is a longer tailed distribution, with emphasis given to the value of the extra parameter which allows the variance to be greater than the mean. Here we investigate whether other long tailed distributions from more general extended Poisson process modelling can better describe the distribution of cluster sizes for SARS-CoV-2 transmissions.
METHODS
We use the extended Poisson process modelling (EPPM) approach with nested sets of models that include the Poisson and negative binomial distributions to assess the adequacy of models based on these standard distributions for the data considered.
RESULTS
We confirm the inadequacy of the Poisson distribution in most cases, and demonstrate the inadequacy of the negative binomial distribution in some cases.
CONCLUSIONS
The probability of a superspreading event may be underestimated by use of the negative binomial distribution, as EPPM distributions indicate much larger tail probabilities than negative binomial alternatives. We show that, of the settings considered, large shared accommodation, meal and work settings have the potential for more severe superspreading events than would be predicted by a negative binomial distribution. Therefore public health efforts to prevent transmission in such settings should be prioritised.
Topics: Binomial Distribution; COVID-19; Humans; Pandemics; Poisson Distribution; SARS-CoV-2
PubMed: 35094680
DOI: 10.1186/s12874-022-01517-9 -
Scientific Reports Dec 2017
A taxonomy is a standardized framework to classify and organize items into categories. Hierarchical taxonomies are ubiquitous, ranging from the classification of organisms to the file system on a computer. Characterizing the typical distribution of items within taxonomic categories is an important question with applications in many disciplines. Ecologists have long sought to account for the patterns observed in species-abundance distributions (the number of individuals per species found in some sample), and computer scientists study the distribution of files per directory. Is there a universal statistical distribution describing how many items are typically found in each category in large taxonomies? Here, we analyze a wide array of large, real-world datasets - including items lost and found on the New York City transit system, library books, and a bacterial microbiome - and discover such an underlying commonality. A simple, non-parametric branching model that randomly categorizes items and takes as input only the total number of items and the total number of categories is quite successful in reproducing the observed abundance distributions. This result may shed light on patterns in species-abundance distributions long observed in ecology. The model also predicts the number of taxonomic categories that remain unrepresented in a finite sample.
Topics: Databases, Factual; Models, Biological; Statistical Distributions
PubMed: 29213056
DOI: 10.1038/s41598-017-17168-6 -
PloS One 2023
In this work, a new flexible class, called the type-I extended-F family, is proposed. A special sub-model of the proposed class, called the type-I extended-Weibull (TIEx-W) distribution, is explored in detail. Basic properties of the TIEx-W distribution are provided. The parameters of the TIEx-W distribution are estimated by eight classical methods of estimation. The performance of these estimators is explored using Monte Carlo simulation results for small and large samples. In addition, Bayesian estimation of the model parameters under different loss functions is provided for the real data set. The importance and flexibility of the TIEx-W model are illustrated by analyzing an insurance data set. The real-life insurance data show that the TIEx-W distribution provides a better fit than competing models such as the Lindley-Weibull, exponentiated Weibull, Kumaraswamy-Weibull, α logarithmic transformed Weibull, and beta Weibull distributions, among others.
Topics: Likelihood Functions; Bayes Theorem; Computer Simulation; Statistical Distributions; Monte Carlo Method
PubMed: 36730300
DOI: 10.1371/journal.pone.0275430 -
Sensors (Basel, Switzerland) Dec 2020
Consistency of the Empirical Distributions of Navigation Positioning System Errors with Theoretical Distributions-Comparative Analysis of the DGPS and EGNOS Systems in the Years 2006 and 2014.
Positioning systems are used to determine position coordinates in navigation (air, land and marine). The accuracy of an object's position is described by the position error and a statistical analysis can determine its measures, which usually include: Root Mean Square (RMS), twice the Distance Root Mean Square (2DRMS), Circular Error Probable (CEP) and Spherical Probable Error (SEP). It is commonly assumed in navigation that position errors are random and that their distributions are consistent with the normal distribution. This assumption is based on the popularity of the Gauss distribution in science, the simplicity of calculating RMS values for 68% and 95% probabilities, as well as the intuitive perception of randomness in the statistics which this distribution reflects. It should be noted, however, that the necessary conditions for a random variable to be normally distributed include the independence of measurements and identical conditions of their realisation, which is not the case in the iterative method of determining successive positions, the filtration of coordinates or the dependence of the position error on meteorological conditions. In the preface to this publication, examples are provided which indicate that position errors in some navigation systems may not be consistent with the normal distribution. The subsequent section describes basic statistical tests for assessing the fit between the empirical and theoretical distributions (Anderson-Darling, chi-square and Kolmogorov-Smirnov). Next, statistical tests of the position error distributions of very long Differential Global Positioning System (DGPS) and European Geostationary Navigation Overlay Service (EGNOS) campaigns from different years (2006 and 2014) were performed with the number of measurements per session being 900'000 fixes. In addition, the paper discusses selected statistical distributions that fit the empirical measurement results better than the normal distribution.
Research has shown that normal distribution is not the optimal statistical distribution to describe position errors of navigation systems. The distributions that describe navigation positioning system errors more accurately include: beta, gamma, logistic and lognormal distributions.
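One of the tests named in this abstract, Kolmogorov-Smirnov, compares the empirical distribution function of the measurements with a theoretical one. The sketch below computes only the test statistic against a normal distribution fitted to the sample. It uses simulated data rather than DGPS or EGNOS measurements, and the function name is hypothetical. Because the mean and standard deviation are estimated from the same sample, the standard Kolmogorov-Smirnov critical values do not apply directly.

```python
import random
import statistics

def ks_statistic_vs_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    fitted = statistics.NormalDist(statistics.mean(values),
                                   statistics.stdev(values))
    ordered = sorted(values)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, cdf - i / n, (i + 1) / n - cdf)
    return gap

# Simulated stand-ins for position errors (assumption, not real data).
rng = random.Random(7)
normal_errors = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed_errors = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

d_normal = ks_statistic_vs_normal(normal_errors)
d_skewed = ks_statistic_vs_normal(skewed_errors)
print(f"D (normal sample): {d_normal:.3f}")
print(f"D (skewed sample): {d_skewed:.3f}")
```

A larger statistic indicates a poorer fit to the normal distribution, which is the pattern the study reports for the navigation position errors.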
PubMed: 33374776
DOI: 10.3390/s21010031 -
Mathematical Biosciences and... Jun 2023
The present study is based on the derivation of a new extension of the Poisson distribution using the Ramos-Louzada distribution. Several statistical properties of the new distribution are derived, including factorial moments, moment-generating function, probability moments, skewness, kurtosis, and dispersion index. Some reliability properties are also derived. The model parameter is estimated using different classical estimation techniques. A comprehensive simulation study was used to identify the best estimation method. Bayesian estimation with a gamma prior is also utilized to estimate the parameter. Three examples were used to demonstrate the utility of the proposed model. These applications revealed that the PRL-based model outperforms certain existing competing one-parameter discrete models such as the discrete Rayleigh, Poisson, discrete inverted Topp-Leone, discrete Pareto and discrete Burr-Hatke distributions.
Topics: Bayes Theorem; Poisson Distribution; COVID-19; Humans; Computer Simulation; Models, Theoretical
PubMed: 37679125
DOI: 10.3934/mbe.2023628 -
PeerJ 2022
The gamma distribution is commonly used to model environmental data. However, rainfall data often contain zero observations, which violates the assumption that all observations must be positive in a gamma distribution, and so a gamma model with excess zeros treated as a binary random variable is required. Rainfall dispersion is important, and confidence intervals for the variance of a gamma distribution with excess zeros help to examine rainfall intensity, which may be high or low risk. Herein, we propose confidence intervals for the variance of a gamma distribution with excess zeros by using fiducial quantities and parametric bootstrapping, as well as Bayesian credible intervals and highest posterior density intervals based on the Jeffreys', uniform, or normal-gamma-beta prior. The performance of the proposed confidence intervals was evaluated by establishing their coverage probabilities and average lengths via Monte Carlo simulations. The fiducial quantity confidence interval performed the best when the probability of the sample containing zero observations was small, whereas the Bayesian credible interval based on the normal-gamma-beta prior performed the best when this probability was large. Rainfall data from the Kiew Lom Dam in Lampang province, Thailand, are used to illustrate the efficacies of the proposed methods in practice.
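The quantity targeted by these intervals, the variance of a gamma distribution with excess zeros, follows from the mixture moments. The sketch below assumes a shape/scale parameterisation and uses hypothetical names (`p_zero`, `shape`, `scale`) that are not taken from the paper. It computes the point value only, not the confidence intervals.

```python
def zero_inflated_gamma_mean(p_zero, shape, scale):
    """Mean when X = 0 with probability p_zero, else X ~ Gamma(shape, scale)."""
    return (1.0 - p_zero) * shape * scale

def zero_inflated_gamma_variance(p_zero, shape, scale):
    """Variance of the same mixture.

    E[X^2] = (1 - p_zero) * shape * (shape + 1) * scale^2, and subtracting
    the squared mean gives (1 - p_zero) * shape * scale^2 * (1 + p_zero * shape).
    """
    nonzero = 1.0 - p_zero
    return nonzero * shape * scale ** 2 * (1.0 + p_zero * shape)

# Illustrative values only: half the observations are dry days.
print(zero_inflated_gamma_mean(0.5, 2.0, 1.0))
print(zero_inflated_gamma_variance(0.5, 2.0, 1.0))
```

With `p_zero = 0` the expression reduces to the ordinary gamma variance, `shape * scale**2`.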
Topics: Bayes Theorem; Thailand; Probability; Statistical Distributions; Risk
PubMed: 36132216
DOI: 10.7717/peerj.14023 -
Ecology Jul 2019
Reproduction by individuals is typically recorded as count data (e.g., number of fledglings from a nest or inflorescences on a plant) and commonly modeled using Poisson or negative binomial distributions, which assume that variance is greater than or equal to the mean. However, distributions of reproductive effort are often underdispersed (i.e., variance < mean). When used in hypothesis tests, models that ignore underdispersion will be overly conservative and may fail to detect significant patterns. Here we show that generalized Poisson (GP) and Conway-Maxwell-Poisson (CMP) distributions are better choices for modeling reproductive effort because they can handle both overdispersion and underdispersion; we provide examples of how ecologists can use GP and CMP distributions in generalized linear models (GLMs) and generalized linear mixed models (GLMMs) to quantify patterns in reproduction. Using a new R package, glmmTMB, we construct GLMMs to investigate how rainfall and population density influence the number of fledglings in the warbler Oreothlypis celata and how flowering rate of Heliconia acuminata differs between fragmented and continuous forest. We also demonstrate how to deal with zero-inflation, which occurs when there are more zeros than expected in the distribution, e.g., due to complete reproductive failure by some individuals.
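Underdispersion, as defined in this abstract (variance < mean), can be checked with a simple variance-to-mean ratio before choosing a count model. The sketch below is a rough diagnostic only and does not fit the GP or CMP models from the paper. The fledgling counts are made up for illustration.

```python
import statistics

def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Values near 1 are consistent with a Poisson model, values above 1
    suggest overdispersion, and values below 1 suggest underdispersion.
    """
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical fledgling counts from ten nests (not data from the study).
fledglings = [3, 4, 4, 5, 4, 3, 4, 4, 5, 4]

index = dispersion_index(fledglings)
print(f"dispersion index: {index:.3f}")
```

This raw ratio ignores covariates, so a model-based check remains the more reliable guide when predictors such as rainfall or density are involved.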
Topics: Animals; Linear Models; Longitudinal Studies; Models, Statistical; Poisson Distribution; Reproduction
PubMed: 30916779
DOI: 10.1002/ecy.2706