Cognition, Oct 2022
Humans can rapidly estimate the statistical properties of groups of stimuli, including their average and variability. But recent studies of so-called Feature Distribution Learning (FDL) have shown that observers can quickly learn even more complex aspects of feature distributions. In FDL, observers learn the full shape of a distribution of features in a set of distractor stimuli and use this information to improve visual search: response times (RTs) are slowed if the target feature lies inside the previous distractor distribution, and the RT patterns closely reflect the distribution shape. FDL requires only a few trials and is markedly sensitive to different distribution types. It is unknown, however, whether our perceptual system encodes feature distributions automatically and by passive exposure, or whether this learning requires active engagement with the stimuli. In two experiments, we sought to answer this question. During an initial exposure stage, participants passively viewed a display of 36 lines that included either one orientation singleton or no singletons. In the following search display, they had to find an oddly oriented target. The orientations of the lines were determined either by a Gaussian or a uniform distribution. We found evidence for FDL only when the passive trials contained an orientation singleton. Under these conditions, RTs decreased as a function of the orientation distance between the target and the mean of the exposed distractor distribution. These results suggest that passive exposure to a distribution of visual features can affect subsequent search performance, but only if a singleton appears during exposure to the distribution.
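The reported RT pattern lends itself to a toy simulation. The sketch below is not the authors' analysis; the linear density-to-RT mapping and every constant are hypothetical. It slows a baseline response time in proportion to the previous distractor distribution's density at the target's orientation, so simulated RTs fall as the target moves away from the distribution mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_rt(target_offset_deg, dist="gaussian",
                 base_rt=600.0, gain=400.0, sd_deg=15.0):
    """Toy model: RT is slowed in proportion to the previous distractor
    distribution's density at the target's orientation offset (degrees
    from the distribution mean). All constants are hypothetical."""
    if dist == "gaussian":
        density = stats.norm.pdf(target_offset_deg, scale=sd_deg)
        density /= stats.norm.pdf(0.0, scale=sd_deg)   # peak normalized to 1
    else:  # uniform distractors spanning +/- sd_deg * sqrt(3)
        density = float(abs(target_offset_deg) <= sd_deg * np.sqrt(3.0))
    return base_rt + gain * density + rng.normal(0.0, 30.0)  # trial noise

for offset in (0, 10, 20, 40):
    print(offset, round(simulated_rt(offset), 1))
```

Swapping the Gaussian density for the uniform branch reproduces the flat-then-drop RT profile that the abstract says distinguishes the two distribution shapes.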
Topics: Attention; Humans; Learning; Reaction Time; Statistical Distributions; Visual Perception
PubMed: 35785655
DOI: 10.1016/j.cognition.2022.105211
BMC Bioinformatics, May 2022
BACKGROUND
Cluster algorithms are gaining in popularity in biomedical research due to their compelling ability to identify discrete subgroups in data, and their increasing accessibility in mainstream software. While guidelines exist for algorithm selection and outcome evaluation, there are no firmly established ways of computing a priori statistical power for cluster analysis. Here, we estimated power and classification accuracy for common analysis pipelines through simulation. We systematically varied subgroup size, number, separation (effect size), and covariance structure. We then subjected generated datasets to dimensionality reduction approaches (none, multi-dimensional scaling, or uniform manifold approximation and projection) and cluster algorithms (k-means, agglomerative hierarchical clustering with Ward or average linkage and Euclidean or cosine distance, HDBSCAN). Finally, we directly compared the statistical power of discrete (k-means), "fuzzy" (c-means), and finite mixture modelling approaches (which include latent class analysis and latent profile analysis).
RESULTS
We found that clustering outcomes were driven by large effect sizes or the accumulation of many smaller effects across features, and were mostly unaffected by differences in covariance structure. Sufficient statistical power was achieved with relatively small samples (N = 20 per subgroup), provided cluster separation is large (Δ = 4). Finally, we demonstrated that fuzzy clustering can provide a more parsimonious and powerful alternative for identifying separable multivariate normal distributions, particularly those with slightly lower centroid separation (Δ = 3).
CONCLUSIONS
Traditional intuitions about statistical power only partially apply to cluster analysis: increasing the number of participants above a sufficient sample size did not improve power, but effect size was crucial. Notably, for the popular dimensionality reduction and clustering algorithms tested here, power was only satisfactory for relatively large effect sizes (clear separation between subgroups). Fuzzy clustering provided higher power in multivariate normal distributions. Overall, we recommend that researchers (1) only apply cluster analysis when large subgroup separation is expected, (2) aim for sample sizes of N = 20 to N = 30 per expected subgroup, (3) use multi-dimensional scaling to improve cluster separation, and (4) use fuzzy clustering or mixture modelling approaches that are more powerful and more parsimonious with partially overlapping multivariate normal distributions.
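A minimal sketch of this kind of simulation-based power analysis, assuming two multivariate normal subgroups whose centroids differ by delta on every feature and k-means as the clustering step. Defining "power" as the proportion of simulated datasets with an adjusted Rand index above 0.5 is an illustrative choice here, not necessarily the paper's criterion:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)

def cluster_power(n_per_group=20, n_features=5, delta=4.0,
                  n_sims=200, ari_threshold=0.5):
    """Estimate power as P(ARI > threshold) for k-means recovering two
    multivariate normal subgroups (unit variance) whose centroids differ
    by `delta` on every feature."""
    labels_true = np.repeat([0, 1], n_per_group)
    hits = 0
    for _ in range(n_sims):
        X = np.vstack([rng.normal(0.0, 1.0, (n_per_group, n_features)),
                       rng.normal(delta, 1.0, (n_per_group, n_features))])
        pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        if adjusted_rand_score(labels_true, pred) > ari_threshold:
            hits += 1
    return hits / n_sims

print(cluster_power(delta=4.0))  # large separation: power near 1
print(cluster_power(delta=0.5))  # weak separation: power collapses
```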
Topics: Algorithms; Cluster Analysis; Humans; Normal Distribution; Sample Size; Software
PubMed: 35641905
DOI: 10.1186/s12859-022-04675-1
Mathematical Biosciences and Engineering, Sep 2022
In this work, we suggest a reduced, two-parameter version of the modified Weibull distribution that avoids some estimation difficulties. The hazard rate function of the reduced distribution exhibits decreasing, increasing, or bathtub shapes. The suggested reduced distribution can be applied to many problems of modelling lifetime data. Some statistical properties of the proposed distribution are discussed. Maximum likelihood is employed to estimate the model parameters. The Fisher information matrix is derived and then applied to construct confidence intervals for the parameters. A simulation is conducted to illustrate the performance of maximum likelihood estimation. Four real data sets are used to demonstrate the advantages of the proposed distribution. According to the statistical criteria, the proposed distribution fits the tested data better than some well-known two- and three-parameter distributions.
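The abstract does not reproduce the reduced distribution's density, so the sketch below illustrates the same workflow — maximum likelihood fitting followed by information-matrix confidence intervals — using the ordinary two-parameter Weibull as a stand-in, with the observed information approximated by a finite-difference Hessian:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = stats.weibull_min.rvs(c=1.5, scale=2.0, size=200, random_state=rng)

def nll(params):
    c, scale = params
    if c <= 0 or scale <= 0:
        return np.inf
    return -np.sum(stats.weibull_min.logpdf(data, c=c, scale=scale))

# ML estimates (location fixed at 0, as in a two-parameter model)
c_hat, _, scale_hat = stats.weibull_min.fit(data, floc=0)
theta = np.array([c_hat, scale_hat])

# Observed information = Hessian of the negative log-likelihood,
# approximated here by central finite differences.
h = 1e-4
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        e_i, e_j = np.eye(2)[i] * h, np.eye(2)[j] * h
        H[i, j] = (nll(theta + e_i + e_j) - nll(theta + e_i - e_j)
                   - nll(theta - e_i + e_j) + nll(theta - e_i - e_j)) / (4 * h * h)

se = np.sqrt(np.diag(np.linalg.inv(H)))  # asymptotic standard errors
for name, est, s in zip(("shape", "scale"), theta, se):
    print(f"{name}: {est:.3f}  95% CI: [{est - 1.96*s:.3f}, {est + 1.96*s:.3f}]")
```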
Topics: Likelihood Functions; Computer Simulation; Statistical Distributions; Engineering
PubMed: 36654042
DOI: 10.3934/mbe.2022617
PloS One, 2022
Compositional data, which is data consisting of fractions or probabilities, is common in many fields including ecology, economics, physical science and political science. If these data would otherwise be normally distributed, their spread can be conveniently represented by a multivariate normal distribution truncated to the non-negative space under a unit simplex. Here this distribution is called the simplex-truncated multivariate normal distribution. For calculations on truncated distributions, it is often useful to obtain rapid estimates of their integral, mean and covariance; these quantities characterising the truncated distribution will generally possess different values to the corresponding non-truncated distribution. In this paper, three different approaches that can estimate the integral, mean and covariance of any simplex-truncated multivariate normal distribution are described and compared. These three approaches are (1) naive rejection sampling, (2) a method described by Gessner et al. that unifies subset simulation and the Holmes-Diaconis-Ross algorithm with an analytical version of elliptical slice sampling, and (3) a semi-analytical method that expresses the integral, mean and covariance in terms of integrals of hyperrectangularly-truncated multivariate normal distributions, the latter of which are readily computed in modern mathematical and statistical packages. Strong agreement is demonstrated between all three approaches, but the most computationally efficient approach depends strongly both on implementation details and the dimension of the simplex-truncated multivariate normal distribution. For computations in low-dimensional distributions, the semi-analytical method is fast and thus should be considered. As the dimension increases, the Gessner et al. method becomes the only practically efficient approach of the methods tested here.
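A minimal sketch of approach (1), naive rejection sampling: draw from the untruncated multivariate normal, keep the draws that land in the non-negative space under the unit simplex, and read off the integral (acceptance rate), mean, and covariance from the accepted draws. The example parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)

def simplex_truncated_mvn_moments(mu, Sigma, n_draws=1_000_000):
    """Naive rejection sampling for an MVN truncated to
    {x : x_i >= 0 for all i, sum(x) <= 1}."""
    draws = rng.multivariate_normal(mu, Sigma, size=n_draws)
    inside = (draws >= 0).all(axis=1) & (draws.sum(axis=1) <= 1)
    accepted = draws[inside]
    integral = inside.mean()  # probability mass of the truncation region
    return integral, accepted.mean(axis=0), np.cov(accepted, rowvar=False)

mu = np.array([0.3, 0.3])
Sigma = np.array([[0.05, 0.01],
                  [0.01, 0.05]])
Z, m, C = simplex_truncated_mvn_moments(mu, Sigma)
print(Z); print(m); print(C)
```

As the abstract notes, the acceptance rate collapses as the dimension grows, which is why the Gessner et al. method becomes the only practically efficient option in higher dimensions.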
Topics: Algorithms; Computer Simulation; Normal Distribution
PubMed: 35867671
DOI: 10.1371/journal.pone.0272014
PloS One, 2022
Resilience is a system's ability to withstand a disruption and return to a normal state quickly. It is a random variable due to the randomness of both the disruption and the system's resilience behavior. The distribution characteristics of resilience are the basis for resilience design and analysis, such as test sample size determination and assessment model selection. In this paper, we propose a systematic resilience distribution identification and analysis (RDIA) method based on a system's performance processes after disruptions. Typical performance degradation/recovery processes follow linear, exponential, or trigonometric functions and have three key parameters: the maximum performance degradation, the degradation duration, and the recovery duration. Using the Monte Carlo method, these three key parameters are first sampled according to their corresponding probability density functions. Combining the sampled results with the given performance function type, the system performance curves after disruptions can be obtained. Then the sample resilience is computed using a deterministic resilience measure, and the resilience distribution can be determined through candidate distribution identification, parameter estimation, and a goodness-of-fit test. Finally, we apply our RDIA method to systems with typical performance processes, and both the orthogonal experiment method and the control variable method are used to investigate the resilience distribution laws. The results show that the resilience of these systems follows the Weibull distribution. An end-to-end communication system is also used to explain how to apply this method with simulation or test data in practice.
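A hedged sketch of the Monte Carlo step for one case: a linear (triangular) degrade-then-recover profile, with the three key parameters drawn from assumed distributions and resilience scored as area under the performance curve relative to nominal performance. Both the parameter distributions and the measure are illustrative choices, not the paper's:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def sample_resilience(n=5000, horizon=100.0):
    """Monte Carlo sketch of the RDIA sampling step: draw the three key
    parameters, assume a linear degrade-then-recover (triangular) dip in
    performance, and score resilience as area under the performance curve
    relative to nominal performance over the horizon."""
    d_max = rng.uniform(0.2, 0.8, n)   # maximum performance degradation
    t_deg = rng.gamma(2.0, 2.0, n)     # degradation duration
    t_rec = rng.gamma(3.0, 3.0, n)     # recovery duration
    area_lost = 0.5 * d_max * (t_deg + t_rec)   # area of the triangular dip
    return np.clip(1.0 - area_lost / horizon, 1e-6, 1.0)

r = sample_resilience()
c, loc, scale = stats.weibull_min.fit(r, floc=0)   # candidate: Weibull
ks = stats.kstest(r, "weibull_min", args=(c, loc, scale))
print(f"shape={c:.2f}, scale={scale:.2f}, KS p-value={ks.pvalue:.3f}")
```

Candidate identification, estimation, and the goodness-of-fit test would then be repeated over a set of candidate families, keeping the best-fitting one.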
Topics: Monte Carlo Method; Statistical Distributions; Sample Size; Likelihood Functions; Computer Simulation
PubMed: 36327292
DOI: 10.1371/journal.pone.0276908
Anaesthesia, Jan 2017
Topics: Data Interpretation, Statistical; Humans; Hydrogen-Ion Concentration; Infant, Newborn; Reference Values; Statistical Distributions; Umbilical Arteries; Umbilical Veins
PubMed: 27858980
DOI: 10.1111/anae.13753
General two-parameter distribution: Statistical properties, estimation, and application on COVID-19.
PloS One, 2023
In this paper, we introduced a novel general two-parameter statistical distribution that can be presented as a mixture of the exponential and gamma distributions. Some statistical properties of the general model were derived mathematically. Several estimation methods were examined for estimating the parameters of the proposed model. A new statistical model was presented as a particular case of the general two-parameter model and was used to study the performance of the different estimation methods on randomly generated data sets. Finally, the COVID-19 data set was used to show the superiority of the particular case over other well-known models in fitting real-world data sets.
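Since the general model's density is not reproduced in the abstract, the sketch below uses a plain two-component exponential/gamma mixture as a hypothetical stand-in, with synthetic data in place of the COVID-19 set, to illustrate fitting such a model by direct maximum likelihood:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(5)
# Synthetic stand-in data (the COVID-19 data set is not reproduced here)
data = np.concatenate([stats.expon.rvs(scale=1.0, size=300, random_state=rng),
                       stats.gamma.rvs(a=3.0, scale=2.0, size=700, random_state=rng)])

def nll(params):
    """Negative log-likelihood of p*Expon(lam) + (1-p)*Gamma(a, scale)."""
    p, lam, a, scale = params
    if not (0 < p < 1) or min(lam, a, scale) <= 0:
        return np.inf
    pdf = p * stats.expon.pdf(data, scale=1/lam) + \
          (1 - p) * stats.gamma.pdf(data, a=a, scale=scale)
    return -np.sum(np.log(pdf + 1e-300))

res = optimize.minimize(nll, x0=[0.5, 1.0, 2.0, 1.0], method="Nelder-Mead")
print(res.x)  # fitted [p, lam, a, scale]
```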
Topics: Humans; COVID-19; Models, Statistical; Statistical Distributions
PubMed: 36753497
DOI: 10.1371/journal.pone.0281474
PloS One, 2018
Fame and celebrity play an ever-increasing role in our culture. However, despite the cultural and economic importance of fame and its gradations, there exists no consensus method for quantifying the fame of an individual, or of comparing that of two individuals. We argue that, even if fame is difficult to measure with precision, one may develop useful metrics for fame that correlate well with intuition and that remain reasonably stable over time. Using datasets of recently deceased individuals who were highly renowned, we have evaluated several internet-based methods for quantifying fame. We find that some widely used internet-derived metrics, such as search engine results, correlate poorly with human subject judgments of fame. However, other metrics exist that agree well with human judgments and appear to offer workable, easily accessible measures of fame. Using such a metric we perform a preliminary investigation of the statistical distribution of fame, which has some of the power law character seen in other natural and social phenomena such as landslides and market crashes. In order to demonstrate how such findings can generate quantitative insight into celebrity culture, we assess some folk ideas regarding the frequency distribution and apparent clustering of celebrity deaths.
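To illustrate what "power law character" means operationally, the sketch below applies a standard Hill estimator of the tail exponent to hypothetical fame scores; both the data and the metric are stand-ins, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical fame scores with a heavy (Pareto) tail, true alpha = 1.5
fame = (rng.pareto(a=1.5, size=10_000) + 1.0) * 100.0

def hill_estimator(x, k=500):
    """Hill estimate of the tail exponent alpha, using the k largest
    observations above the (k+1)-th largest as the tail cutoff."""
    xs = np.sort(x)
    tail, threshold = xs[-k:], xs[-k - 1]
    return k / np.sum(np.log(tail / threshold))

print(f"estimated tail exponent alpha ~ {hill_estimator(fame):.2f}")
```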
Topics: Famous Persons; Female; Humans; Internet; Judgment; Male; Probability; Statistical Distributions; Surveys and Questionnaires
PubMed: 29979792
DOI: 10.1371/journal.pone.0200196
Scientific Reports, Nov 2021
The uniformity of the rice cluster distribution in the field affects population quality and the precise management of pesticides and fertilizers. However, no appropriate technical system currently exists for estimating and evaluating this uniformity. For that reason, a method based on unmanned aerial vehicle (UAV) images is proposed in the present study to estimate and evaluate uniformity. The method comprises rice cluster recognition and location determination based on the RGB color characteristics of seedlings in aerial images, region segmentation around the rice clusters based on a Voronoi diagram, and the definition of a uniformity index, based on the coefficient of variation, for evaluating the rice cluster distribution. The results indicate that rice cluster recognition attains high performance, with precision, accuracy, recall, and F1-score all above 95% (97%, 97%, 95%, and 96%, respectively). The rice cluster location error is small and follows a gamma(3.00, 0.54) distribution (mean error, 1.62 cm). Simulation verified that the uniformity index is reasonable for evaluating the rice cluster distribution. As a whole, the estimation method is sufficiently accurate, with a relative error of less than 0.01% compared with the manual labeling method. Therefore, this UAV-image-based method is feasible, convenient, technologically advanced, inexpensive, and highly precise for estimating and evaluating the uniformity of the rice cluster distribution. However, the evaluation application indicates that there is much room for improvement in the uniformity of mechanized paddy field transplanting in South China.
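A sketch of the Voronoi-plus-coefficient-of-variation idea: segment the field around detected cluster positions with scipy.spatial.Voronoi and score uniformity as the coefficient of variation of the bounded cell areas. The paper's exact index definition is not reproduced in the abstract, so this form is an assumption:

```python
import numpy as np
from scipy.spatial import Voronoi, ConvexHull

rng = np.random.default_rng(7)
points = rng.uniform(0, 10, size=(200, 2))  # stand-in rice cluster locations (m)

vor = Voronoi(points)

# Keep only bounded cells (regions that do not touch the point at infinity).
# Voronoi cells are convex, so ConvexHull.volume gives the 2-D cell area.
areas = np.array([ConvexHull(vor.vertices[region]).volume
                  for region in vor.regions
                  if len(region) > 0 and -1 not in region])

uniformity_index = areas.std() / areas.mean()  # coefficient of variation
print(f"CV of bounded Voronoi cell areas: {uniformity_index:.3f}")
```

A lower index means cells of more equal size, i.e. a more uniform transplanting pattern; a regular grid of points drives the index toward zero.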
Topics: Image Processing, Computer-Assisted; Oryza; Remote Sensing Technology; Seedlings; Statistical Distributions; Unmanned Aerial Devices
PubMed: 34728745
DOI: 10.1038/s41598-021-01044-5
Computational Intelligence and Neuroscience, 2022
In this study, a new one-parameter count distribution is proposed by combining the Poisson and XLindley distributions. Some of its statistical and reliability properties, including order statistics, the hazard rate function, the reversed hazard rate function, the mode, factorial moments, the probability generating function, the moment generating function, the index of dispersion, Shannon entropy, the Mills ratio, the mean residual life function, and associated measures, are investigated. All these properties can be expressed in explicit forms. It is found that the new probability mass function can be utilized to model positively skewed data with a leptokurtic shape. Moreover, the new discrete distribution is a proper tool for modelling equi- and over-dispersed phenomena with an increasing hazard rate function. The distribution parameter is estimated by six different estimation approaches, and the behavior of these methods is explored using Monte Carlo simulation. Finally, two real-life applications are presented to illustrate the flexibility of the new model.
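The abstract gives no formulas, so the sketch below assumes the XLindley mixing density f(λ; θ) = θ²(2 + θ + λ)e^(−θλ)/(1 + θ)², as reported in the literature introducing XLindley, and obtains the count pmf by numerically mixing a Poisson pmf over λ. Treat both the density form and the parameter value as assumptions rather than the paper's explicit result:

```python
import numpy as np
from scipy import stats, integrate

def pxl_pmf(x, theta):
    """Poisson-XLindley pmf by numerical mixing:
    P(X = x) = integral_0^inf Poisson(x; lam) * f_XLindley(lam; theta) d lam.
    The XLindley density used here is an assumption from the literature."""
    def integrand(lam):
        mixing = theta**2 * (2 + theta + lam) * np.exp(-theta * lam) / (1 + theta)**2
        return stats.poisson.pmf(x, lam) * mixing
    val, _ = integrate.quad(integrand, 0, np.inf)
    return val

theta = 1.2
pmf = [pxl_pmf(x, theta) for x in range(10)]
print(np.round(pmf, 4), "sum of first 10 terms:", round(sum(pmf), 4))
```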
Topics: Computer Simulation; Likelihood Functions; Models, Statistical; Monte Carlo Method; Poisson Distribution; Reproducibility of Results; Statistical Distributions
PubMed: 35463286
DOI: 10.1155/2022/6503670