-
Psychonomic Bulletin & Review Aug 2021 (Review)
Researchers sometimes use informal judgment for statistical model diagnostics and assumption checking. Informal judgment might seem more desirable than formal judgment because of a paradox: Formal hypothesis tests of assumptions appear to become less useful as sample size increases. We suggest that this paradox can be resolved by evaluating both formal and informal statistical judgment via a simplified signal detection framework. In 4 studies, we used this approach to compare informal judgments of normality diagnostic graphs (histograms, Q-Q plots, and P-P plots) to the performance of several formal tests (Shapiro-Wilk test, Kolmogorov-Smirnov test, etc.). Participants judged whether or not graphs of sample data came from a normal population (Experiments 1-2) or whether or not they came from a population close enough to normal for a parametric test to be more powerful than a nonparametric one (Experiments 3-4). Across all experiments, participants' informal judgments showed lower discriminability than did formal hypothesis tests. This pattern occurred even after participants were given 400 training trials with feedback, a financial incentive, and ecologically valid distribution shapes. The discriminability advantage of formal normality tests led to slightly more powerful follow-up tests (parametric vs. nonparametric). Overall, the framework used here suggests that formal model diagnostics may be more desirable than informal ones.
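As a rough illustration of scoring a formal normality test in signal detection terms, the sketch below simulates samples from a normal and a non-normal population, records how often the Shapiro-Wilk test rejects normality in each case, and converts the two rates into the classic d' index. The sample size, number of trials, the exponential alternative, and the helper names `rejection_rate` and `d_prime` are assumptions chosen for illustration; the abstract does not specify the authors' exact framework.

```python
import numpy as np
from scipy import stats


def rejection_rate(sampler, n, trials, alpha, rng):
    """Fraction of simulated samples for which Shapiro-Wilk rejects normality."""
    rejections = 0
    for _ in range(trials):
        sample = sampler(rng, n)
        _, p_value = stats.shapiro(sample)
        rejections += int(p_value < alpha)
    return rejections / trials


def d_prime(hit_rate, false_alarm_rate, trials):
    """d' = z(hit rate) - z(false-alarm rate), with rates clipped away from 0 and 1."""
    low, high = 0.5 / trials, 1.0 - 0.5 / trials
    hit = np.clip(hit_rate, low, high)
    false_alarm = np.clip(false_alarm_rate, low, high)
    return stats.norm.ppf(hit) - stats.norm.ppf(false_alarm)


rng = np.random.default_rng(0)
n, trials, alpha = 50, 500, 0.05  # illustrative settings, not from the paper

# "Noise" trials: data really are normal, so a rejection is a false alarm.
false_alarms = rejection_rate(lambda r, k: r.normal(size=k), n, trials, alpha, rng)
# "Signal" trials: data are exponential, so a rejection is a hit.
hits = rejection_rate(lambda r, k: r.exponential(size=k), n, trials, alpha, rng)

sensitivity = d_prime(hits, false_alarms, trials)
print(f"false-alarm rate={false_alarms:.3f}, hit rate={hits:.3f}, d'={sensitivity:.2f}")
```

The same two rates could be computed for human judgments of graphs, which is what makes the formal and informal approaches comparable on one scale.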
Topics: Humans; Judgment; Models, Statistical; Normal Distribution; Sample Size
PubMed: 33660213
DOI: 10.3758/s13423-021-01879-z
-
General two-parameter distribution: Statistical properties, estimation, and application on COVID-19. PloS One 2023
In this paper, we introduced a novel general two-parameter statistical distribution that can be presented as a mixture of exponential and gamma distributions. Some statistical properties of the general model were derived mathematically. Several estimation methods were examined for estimating the parameters of the proposed model. A new statistical model was presented as a particular case of the general two-parameter model and was used to study the performance of the different estimation methods on randomly generated data sets. Finally, a COVID-19 data set was used to show the superiority of the particular case over other well-known models for fitting real-world data.
Topics: Humans; COVID-19; Models, Statistical; Statistical Distributions
PubMed: 36753497
DOI: 10.1371/journal.pone.0281474
-
American Journal of Epidemiology Mar 2020
Random-effects meta-analysis is one of the mainstream methods for research synthesis. The heterogeneity in meta-analyses is usually assumed to follow a normal distribution. This is actually a strong assumption, but one that often receives little attention and is used without justification. Although methods for assessing the normality assumption are readily available, they cannot be used directly because the included studies have different within-study standard errors. Here we present a standardization framework for evaluation of the normality assumption and examine its performance in random-effects meta-analyses with simulation studies and real examples. We use both a formal statistical test and a quantile-quantile plot for visualization. Simulation studies show that our normality test has well-controlled type I error rates and reasonable power. We also illustrate the real-world significance of examining the normality assumption with examples. Investigating the normality assumption can provide valuable information for further analysis or clinical application. We recommend routine examination of the normality assumption with the proposed framework in future meta-analyses.
Topics: Meta-Analysis as Topic; Normal Distribution
PubMed: 31781756
DOI: 10.1093/aje/kwz261
-
Journal of Neurophysiology Apr 2021
Extracellular recordings of brain voltage signals have many uses, including the identification of spikes and the characterization of brain states via analysis of local field potential (LFP) or EEG recordings. Though the factors underlying the generation of these signals are time varying and complex, their analysis may be facilitated by an understanding of their statistical properties. To this end, we analyzed the voltage distributions of high-pass extracellular recordings from a variety of structures, including cortex, thalamus, and hippocampus, in monkeys, cats, and rodents. We additionally investigated LFP signals in these recordings as well as human EEG signals obtained during different sleep stages. In all cases, the distributions were accurately described by a Gaussian within ±1.5 standard deviations from zero. Outside these limits, voltages tended to be distributed exponentially, that is, they fell off linearly on log-linear frequency plots, with variable heights and slopes. A possible explanation for this is that sporadically and independently occurring events with individual Gaussian size distributions can sum to produce approximately exponential distributions. For the high-pass recordings, a second explanation results from a model of the noisy behavior of ion channels that produce action potentials via Hodgkin-Huxley kinetics. The distributions produced by this model, relative to the averaged potential, were also Gaussian with approximately exponential flanks. The model also predicted time-varying noise distributions during action potentials, which were observed in the extracellular spike signals. These findings suggest a principled method for detecting spikes in high-pass recordings and transient events in LFP and EEG signals. 
We show that the voltage distributions in brain recordings, including high-pass extracellular recordings, the LFP, and human EEG, are accurately described by a Gaussian within ±1.5 standard deviations from zero, with heavy, exponential tails outside these limits. This offers a principled way of setting event detection thresholds in high-pass recordings. It also offers a means for identifying event-like, transient signals in LFP and EEG recordings which may correlate with other neural phenomena.
Topics: Adult; Animals; Cats; Cerebral Cortex; Electroencephalography; Electrophysiological Phenomena; Humans; Macaca; Mice; Models, Statistical; Normal Distribution; Rats
PubMed: 33689506
DOI: 10.1152/jn.00633.2020
-
Genetic Epidemiology Feb 2022
Count data with excessive zeros are increasingly ubiquitous in genetic association studies, such as neuritic plaques in brain pathology for Alzheimer's disease. Here, we developed gene-based association tests to model such data by a mixture of two distributions, one for the structural zeros contributed by the Binomial distribution, and the other for the counts from the Poisson distribution. We derived the score statistics of the corresponding parameter of the rare variants in the zero-inflated Poisson regression model, and then constructed burden (ZIP-b) and kernel (ZIP-k) tests for the association tests. We evaluated omnibus tests that combined both ZIP-b and ZIP-k tests. Through simulated sequence data, we illustrated the potential power gain of our proposed method over a two-stage method that analyzes binary and non-zero continuous data separately for both burden and kernel tests. The ZIP burden test outperformed the kernel test as expected in all scenarios except for the scenario of variants with a mixture of directions in the genetic effects. We further demonstrated its applications to analyses of the neuritic plaque data in the ROSMAP cohort. We expect our proposed test to be useful in practice as more powerful than or complementary to the two-stage method.
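The two-component mixture described above can be sketched as follows: with probability `pi_zero` an observation is a structural zero, and otherwise it is a Poisson count with rate `lam`. This gives P(Y = 0) = pi_zero + (1 - pi_zero)·exp(-lam) and P(Y = k) = (1 - pi_zero)·Poisson(k; lam) for k ≥ 1. The function names and parameter values below are illustrative assumptions; the sketch covers only the zero-inflated Poisson distribution itself, not the ZIP-b or ZIP-k association tests.

```python
import numpy as np
from scipy import stats


def zip_pmf(k, pi_zero, lam):
    """P(Y = k) for a zero-inflated Poisson with structural-zero probability pi_zero."""
    k = np.asarray(k)
    base = (1.0 - pi_zero) * stats.poisson.pmf(k, lam)
    # Zeros receive extra mass from the structural-zero component.
    return np.where(k == 0, pi_zero + base, base)


def zip_sample(rng, size, pi_zero, lam):
    """Draw counts: a Bernoulli draw marks structural zeros, the rest are Poisson."""
    structural = rng.random(size) < pi_zero
    counts = rng.poisson(lam, size)
    return np.where(structural, 0, counts)


pi_zero, lam = 0.4, 2.5  # illustrative values only
support = np.arange(0, 60)
total_mass = zip_pmf(support, pi_zero, lam).sum()

rng = np.random.default_rng(1)
draws = zip_sample(rng, 100_000, pi_zero, lam)
zero_prob = float(zip_pmf(0, pi_zero, lam))
zero_share = float((draws == 0).mean())
print(f"total mass={total_mass:.6f}, P(Y=0)={zero_prob:.4f}, simulated share of zeros={zero_share:.4f}")
```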
Topics: Binomial Distribution; Humans; Models, Genetic; Models, Statistical; Phenotype; Poisson Distribution
PubMed: 34779034
DOI: 10.1002/gepi.22438
-
Cytopathology : Official Journal of the... Nov 2022 (Review)
This article is the second in a series offering recommendations for optimal data reporting, focusing on the statistical methods most frequently reported by the Cytopathology audience. The inaugural article, "Recommendations for reporting statistical results when comparing proportions," dealt with the most common category of statistical tests reported over 2.5 years of Cytopathology articles: comparing proportions. Comparing samples using t tests, Mann-Whitney U, analysis of variance, and Kruskal-Wallis tests was another common category of statistical test reported among this audience. An important distinction between these tests is whether the samples follow a normal distribution. Therefore, "Parametric or nonparametric statistical tests: Choosing the most appropriate option for your data" is the second topic in the series. While this article reviews considerations for selecting parametric or nonparametric statistical tests, an extensive review of each method is beyond the scope of this summary. The author encourages the reader to consult with a trained statistician to map out a thorough analytical plan (including their recommendations for the appropriate statistical test[s] to use) prior to data collection.
Topics: Humans; Statistics, Nonparametric; Normal Distribution
PubMed: 36017662
DOI: 10.1111/cyt.13174
-
PloS One 2022
Compositional data, which is data consisting of fractions or probabilities, is common in many fields including ecology, economics, physical science and political science. If these data would otherwise be normally distributed, their spread can be conveniently represented by a multivariate normal distribution truncated to the non-negative space under a unit simplex. Here this distribution is called the simplex-truncated multivariate normal distribution. For calculations on truncated distributions, it is often useful to obtain rapid estimates of their integral, mean and covariance; these quantities characterising the truncated distribution will generally possess different values to the corresponding non-truncated distribution. In this paper, three different approaches that can estimate the integral, mean and covariance of any simplex-truncated multivariate normal distribution are described and compared. These three approaches are (1) naive rejection sampling, (2) a method described by Gessner et al. that unifies subset simulation and the Holmes-Diaconis-Ross algorithm with an analytical version of elliptical slice sampling, and (3) a semi-analytical method that expresses the integral, mean and covariance in terms of integrals of hyperrectangularly-truncated multivariate normal distributions, the latter of which are readily computed in modern mathematical and statistical packages. Strong agreement is demonstrated between all three approaches, but the most computationally efficient approach depends strongly both on implementation details and the dimension of the simplex-truncated multivariate normal distribution. For computations in low-dimensional distributions, the semi-analytical method is fast and thus should be considered. As the dimension increases, the Gessner et al. method becomes the only practically efficient approach of the methods tested here.
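Approach (1), naive rejection sampling, can be sketched as follows: draw from the untruncated multivariate normal, keep only draws that fall in the truncation region, and estimate the integral as the acceptance fraction and the mean and covariance from the accepted draws. The sketch assumes the region is {x : x_i ≥ 0, Σ x_i ≤ 1} and that "integral" means the probability mass of the untruncated normal inside that region; the function name and the example parameters are hypothetical.

```python
import numpy as np


def simplex_truncated_moments(mean, cov, n_draws, rng):
    """Naive rejection-sampling estimates for a simplex-truncated multivariate normal.

    Returns (integral, truncated_mean, truncated_cov), where integral is the
    fraction of untruncated draws that land inside the truncation region.
    """
    mean = np.asarray(mean, dtype=float)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    # Assumed region: non-negative coordinates whose sum does not exceed 1.
    inside = np.all(draws >= 0.0, axis=1) & (draws.sum(axis=1) <= 1.0)
    accepted = draws[inside]
    if accepted.shape[0] < 2:
        raise ValueError("too few accepted draws; increase n_draws")
    integral = inside.mean()
    truncated_mean = accepted.mean(axis=0)
    truncated_cov = np.cov(accepted, rowvar=False)
    return integral, truncated_mean, truncated_cov


rng = np.random.default_rng(2)
mu = [0.3, 0.3]               # illustrative two-dimensional example
sigma = 0.04 * np.eye(2)
integral, trunc_mean, trunc_cov = simplex_truncated_moments(mu, sigma, 200_000, rng)
print("integral:", round(float(integral), 3))
print("truncated mean:", np.round(trunc_mean, 3))
print("truncated covariance:\n", np.round(trunc_cov, 4))
```

The acceptance fraction shrinks quickly as the dimension grows or as the untruncated mean moves away from the region, which is consistent with the abstract's remark that the most efficient approach depends on the dimension.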
Topics: Algorithms; Computer Simulation; Normal Distribution
PubMed: 35867671
DOI: 10.1371/journal.pone.0272014
-
PloS One 2022
Resilience is a system's ability to withstand a disruption and return to a normal state quickly. It is a random variable due to the randomness of both the disruption and resilience behavior of a system. The distribution characteristics of resilience are the basis for resilience design and analysis, such as test sample size determination and assessment model selection. In this paper, we propose a systematic resilience distribution identification and analysis (RDIA) method based on a system's performance processes after disruptions. Typical performance degradation/recovery processes have linear, exponential, and trigonometric functions, and they have three key parameters: the maximum performance degradation, the degradation duration, and the recovery duration. Using the Monte Carlo method, these three key parameters are first sampled according to their corresponding probability density functions. Combining the sample results with the given performance function type, the system performance curves after disruptions can be obtained. Then the sample resilience is computed using a deterministic resilience measure and the resilience distribution can be determined through candidate distribution identification, parameter estimation, and a goodness-of-fit test. Finally, we apply our RDIA method to systems with typical performance processes, and both the orthogonal experiment method and the control variable method are used to investigate the resilience distribution laws. The results show that the resilience of these systems follows the Weibull distribution. An end-to-end communication system is also used to explain how to apply this method with simulation or test data in practice.
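The Monte Carlo steps described above can be sketched for the linear case. Everything specific here is an assumption made for illustration and is not taken from the paper: the uniform sampling ranges for the three key parameters, the fixed observation horizon, and the resilience measure (the ratio of the area under the actual performance curve to the area under nominal performance, with nominal performance normalized to 1). With linear degradation and recovery, the lost area is a triangle with base equal to the degradation plus recovery duration and height equal to the maximum degradation.

```python
import numpy as np
from scipy import stats


def linear_resilience(max_degradation, degradation_time, recovery_time, horizon):
    """Assumed resilience measure: retained share of nominal performance over the horizon.

    Performance starts at 1, falls linearly by max_degradation over degradation_time,
    then recovers linearly to 1 over recovery_time, so the lost area is a triangle.
    """
    lost_area = 0.5 * max_degradation * (degradation_time + recovery_time)
    return 1.0 - lost_area / horizon


rng = np.random.default_rng(42)
n_samples, horizon = 5000, 30.0  # horizon must cover degradation + recovery

# Step 1: sample the three key parameters (hypothetical distributions).
max_degradation = rng.uniform(0.2, 0.8, n_samples)
degradation_time = rng.uniform(1.0, 5.0, n_samples)
recovery_time = rng.uniform(5.0, 20.0, n_samples)

# Step 2: compute one resilience value per sampled performance curve.
resilience = linear_resilience(max_degradation, degradation_time, recovery_time, horizon)

# Step 3: fit a candidate distribution and run a goodness-of-fit test.
shape, loc, scale = stats.weibull_min.fit(resilience, floc=0)
ks_stat, p_value = stats.kstest(resilience, "weibull_min", args=(shape, loc, scale))
print(f"Weibull shape={shape:.2f}, scale={scale:.3f}, KS statistic={ks_stat:.3f}")
```

Because the Weibull parameters are estimated from the same data, the Kolmogorov-Smirnov p-value is optimistic and is best read as a rough screen. This sketch does not attempt to reproduce the paper's finding that resilience follows a Weibull distribution.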
Topics: Monte Carlo Method; Statistical Distributions; Sample Size; Likelihood Functions; Computer Simulation
PubMed: 36327292
DOI: 10.1371/journal.pone.0276908
-
The International Journal of... Jan 2021
In allometric studies, the joint distribution of the log-transformed morphometric variables is typically symmetric with heavy tails. Moreover, in the bivariate case, it is customary to explain the morphometric variation of these variables by fitting a convenient line, for example the first principal component (PC). To account for all these peculiarities, we propose the use of multiple scaled symmetric (MSS) distributions. These distributions have several advantages: they are defined directly in the PC space, the kind of symmetry involved is less restrictive than the commonly considered elliptical symmetry, the behavior of the tails can vary across PCs, and their first PC is less sensitive to outliers. In the family of MSS distributions, we also propose the multiple scaled shifted exponential normal distribution, the equivalent of the multivariate shifted exponential normal distribution in the MSS framework. For the sake of parsimony, we also allow the parameter governing the leptokurtosis on each PC, in the considered MSS distributions, to be tied across PCs. From an inferential point of view, we describe an EM algorithm to estimate the parameters by maximum likelihood, illustrate how to compute standard errors of the obtained estimates, and give statistical tests and confidence intervals for the parameters. We use artificial and real allometric data to illustrate the advantages of the MSS distributions over well-known elliptically symmetric distributions and to compare the robustness of the line from our models with that of lines fitted by well-established robust and non-robust methods available in the literature.
Topics: Algorithms; Likelihood Functions; Normal Distribution
PubMed: 33730771
DOI: 10.1515/ijb-2020-0059
-
Scientific Reports Nov 2021
The uniformity of the rice cluster distribution in the field affects population quality and the precise management of pesticides and fertilizers. However, there is currently no appropriate technical system for estimating and evaluating this uniformity. For that reason, a method based on unmanned aerial vehicle (UAV) images is proposed in this study to estimate and evaluate the uniformity. The method includes rice cluster recognition and location determination based on the RGB color characteristics of the seedlings in aerial images, region segmentation of the rice clusters based on a Voronoi diagram, and definition of a uniformity index, based on the variation coefficient, for evaluating the rice cluster distribution. The results indicate that the rice cluster recognition attains high precision, with the precision, accuracy, recall, and F1-score of rice cluster recognition reaching > 95%, 97%, 97%, 95%, and 96%, respectively. The rice cluster location error is small and obeys the gamma (3.00, 0.54) distribution (mean error, 1.62 cm). Simulation verifies that the uniformity index is reasonable for evaluating the rice cluster distribution. As a whole, the estimation method is sufficiently accurate, with a relative error of less than 0.01% compared with the manual labeling method. Therefore, this method based on UAV images is feasible, convenient, technologically advanced, inexpensive, and highly precise for estimating and evaluating the uniformity of the rice cluster distribution. However, the evaluation application indicates that there is much room for improvement in the uniformity of mechanized paddy field transplanting in South China.
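The reported mean location error of 1.62 cm is consistent with reading gamma (3.00, 0.54) as shape 3.00 and scale 0.54 cm, since the mean of a gamma distribution is shape × scale. That parameterization is an assumption inferred from the numbers; the abstract does not state it. A short check under that assumption:

```python
from scipy import stats

shape, scale_cm = 3.00, 0.54  # assumed to be (shape, scale in cm)
location_error = stats.gamma(a=shape, scale=scale_cm)

mean_cm = location_error.mean()  # equals shape * scale_cm
std_cm = location_error.std()    # equals sqrt(shape) * scale_cm
print(f"mean error = {mean_cm:.2f} cm, standard deviation = {std_cm:.2f} cm")
```

If the second parameter were instead a rate, the mean would be 3.00 / 0.54 ≈ 5.56 cm, which would not match the stated mean error.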
Topics: Image Processing, Computer-Assisted; Oryza; Remote Sensing Technology; Seedlings; Statistical Distributions; Unmanned Aerial Devices
PubMed: 34728745
DOI: 10.1038/s41598-021-01044-5