PLoS One, 2023
In this study, we propose a generalized Marshall-Olkin exponentiated exponential distribution as a submodel of the family of generalized Marshall-Olkin distributions. Some statistical properties of the proposed distribution are examined, such as the moments, the moment-generating function, the incomplete moments, and the Lorenz and Bonferroni curves. We give five estimators for the unknown parameters of the proposed distribution, based on the maximum likelihood, least squares, weighted least squares, Anderson-Darling, and Cramér-von Mises methods of estimation. To investigate the finite-sample properties of the estimators, a comprehensive Monte Carlo simulation study is conducted with three sets of randomly selected parameter values. Finally, four different real-data applications are presented to demonstrate the usefulness of the proposed distribution in practice.
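As a rough numerical sketch of how such a model might be fitted (this is not the paper's code): the snippet below assumes the standard Marshall-Olkin transform F(x) = G(x) / (1 - (1 - theta)(1 - G(x))) of the exponentiated exponential CDF G(x) = (1 - exp(-lambda*x))^alpha, whereas the paper's generalized family may involve additional parameters, and it compares maximum-likelihood and Cramér-von Mises estimation on placeholder data.

    import numpy as np
    from scipy.optimize import minimize

    def ee_cdf(x, alpha, lam):
        return (1.0 - np.exp(-lam * x)) ** alpha

    def ee_pdf(x, alpha, lam):
        return alpha * lam * np.exp(-lam * x) * (1.0 - np.exp(-lam * x)) ** (alpha - 1.0)

    def moee_cdf(x, alpha, lam, theta):
        g = ee_cdf(x, alpha, lam)
        return g / (1.0 - (1.0 - theta) * (1.0 - g))

    def moee_pdf(x, alpha, lam, theta):
        g = ee_cdf(x, alpha, lam)
        return theta * ee_pdf(x, alpha, lam) / (1.0 - (1.0 - theta) * (1.0 - g)) ** 2

    def neg_loglik(log_params, x):
        alpha, lam, theta = np.exp(log_params)   # log-parameterization keeps parameters positive
        return -np.sum(np.log(moee_pdf(x, alpha, lam, theta) + 1e-300))

    def cvm_objective(log_params, x):
        alpha, lam, theta = np.exp(log_params)
        u = moee_cdf(np.sort(x), alpha, lam, theta)
        i = np.arange(1, x.size + 1)
        return 1.0 / (12.0 * x.size) + np.sum((u - (2.0 * i - 1.0) / (2.0 * x.size)) ** 2)

    rng = np.random.default_rng(1)
    x = rng.gamma(shape=2.0, scale=1.5, size=200)   # placeholder positive data

    start = np.log([1.0, 1.0, 1.0])
    mle = minimize(neg_loglik, start, args=(x,), method="Nelder-Mead")
    cvm = minimize(cvm_objective, start, args=(x,), method="Nelder-Mead")
    print("MLE estimates (alpha, lambda, theta):", np.exp(mle.x))
    print("CvM estimates (alpha, lambda, theta):", np.exp(cvm.x))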
Topics: Computer Simulation; Statistical Distributions; Monte Carlo Method; Least-Squares Analysis
PubMed: 36652462
DOI: 10.1371/journal.pone.0280349
Cerebral Cortex (New York, N.Y.: 1991), Aug 2023
Numbers of neurons and their spatial variation are fundamental organizational features of the brain. Despite the large corpus of cytoarchitectonic data available in the literature, the statistical distributions of neuron densities within and across brain areas remain largely uncharacterized. Here, we show that neuron densities are compatible with a lognormal distribution across cortical areas in several mammalian species, and find that this also holds true within cortical areas. A minimal model of noisy cell division, in combination with distributed proliferation times, can account for the coexistence of lognormal distributions within and across cortical areas. Our findings uncover a new organizational principle of cortical cytoarchitecture: the ubiquitous lognormal distribution of neuron densities, which adds to a long list of lognormal variables in the brain.
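The mechanism behind the lognormal can be illustrated with a toy simulation (not the authors' model): if each proliferation round multiplies the cell count by an independent noisy factor, the log-density is a sum of independent terms and is therefore approximately Gaussian. All quantities below (number of locations, rounds, noise level) are made-up illustrations.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_locations = 5000   # e.g., sampled locations within a cortical area (assumption)
    n_rounds = 30        # proliferation rounds (assumption)

    # per-round growth factors around 2 (cell division) with multiplicative noise
    factors = rng.uniform(1.6, 2.4, size=(n_locations, n_rounds))
    density = 100.0 * factors.prod(axis=1)   # arbitrary starting density of 100

    # raw densities are strongly right-skewed, log-densities are close to symmetric
    print("skewness of raw densities:", stats.skew(density))
    print("skewness of log densities:", stats.skew(np.log(density)))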
Topics: Animals; Neurons; Brain; Mammals; Cerebral Cortex; Statistical Distributions
PubMed: 37409647
DOI: 10.1093/cercor/bhad160
Biochemia Medica, 2015 (Review)
Computer-intensive resampling/bootstrap methods are feasible when calculating reference intervals from non-Gaussian or small reference samples. Microsoft Excel® in version 2010 or later includes native functions that lend themselves well to this purpose, including the recommended interpolation procedures for estimating the 2.5th and 97.5th percentiles. The purpose of this paper is to introduce the reader to resampling estimation techniques in general, and to the use of Microsoft Excel® 2010 for estimating reference intervals in particular. Parametric methods are preferable to resampling methods when the distribution of observations in the reference sample is Gaussian or can be transformed to that distribution, even when the number of reference samples is less than 120. Resampling methods are appropriate when the distribution of data from the reference samples is non-Gaussian and the number of reference individuals and corresponding samples is on the order of 40. At least 500-1000 random samples with replacement should be drawn from the results of measurement of the reference samples.
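For readers who prefer a scripting language, the same bootstrap logic can be sketched in a few lines of Python (the paper itself works in Excel); the data, the 1000 resamples and the interpolation method below are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(42)
    reference = rng.lognormal(mean=1.0, sigma=0.4, size=40)   # placeholder: 40 skewed reference values

    n_boot = 1000
    lows = np.empty(n_boot)
    highs = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(reference, size=reference.size, replace=True)
        lows[b] = np.percentile(resample, 2.5, method="median_unbiased")
        highs[b] = np.percentile(resample, 97.5, method="median_unbiased")

    # point estimate of the reference interval and the spread of the resampled limits
    print(f"bootstrap reference interval: [{lows.mean():.2f}, {highs.mean():.2f}]")
    print(f"90% CI for the lower limit: [{np.percentile(lows, 5):.2f}, {np.percentile(lows, 95):.2f}]")
    print(f"90% CI for the upper limit: [{np.percentile(highs, 5):.2f}, {np.percentile(highs, 95):.2f}]")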
Topics: Humans; Models, Statistical; Normal Distribution; Reference Values; Software
PubMed: 26527366
DOI: 10.11613/BM.2015.031
Psychological Research, Sep 2022
Humans are surprisingly good at learning the statistical characteristics of their visual environment. Recent studies have revealed that the visual system can learn not only repeated features of visual search distractors but also their actual probability distributions: search times were determined by the frequency of distractor features over consecutive search trials. The search displays used in those studies involved many exemplars of distractors on each trial. While there is clear evidence that feature distributions can be learned from large distractor sets, it is less clear whether distributions are learned equally well for single targets presented on each trial. Here, we investigated potential learning of probability distributions of single targets during visual search. Over blocks of trials, observers searched for an oddly colored target that was drawn from either a Gaussian or a uniform distribution. Search times for the different target colors were clearly influenced by the probability of that feature within trial blocks. The same search targets, coming from the extremes of the two distributions, were found significantly more slowly during blocks where the targets were drawn from a Gaussian distribution than from a uniform distribution, indicating that observers were sensitive to the target probability determined by the distribution shape. In Experiment 2, we replicated the effect using binned distributions and revealed the limitations of encoding complex target distributions. Our results demonstrate detailed internal representations of target feature distributions and show that the visual system integrates probability distributions of target colors over surprisingly long trial sequences.
Topics: Attention; Humans; Learning; Normal Distribution; Probability; Reaction Time; Visual Perception
PubMed: 34997327
DOI: 10.1007/s00426-021-01621-3
Forensic Science International, Mar 2020
While most evidence types considered by forensic scientists result from the interactions between criminals, objects or victims at crime scenes, dust evidence arises from the mere presence of individuals and objects at locations of interest. Dust is ubiquitous. Yet, the use of dust evidence is anecdotal and is limited to cases where rare and characteristic particles are observed. The dust at any given location contains a large number of particles of different types, and the dust present on an object or individual traveling across locations may be indicative of the locations recently visited and, in particular, of the presence of an individual at a particular site of interest, e.g., the scene of a crime. In this paper, we propose to represent dust mixtures as vectors of counts of the individual particles, which can be characterised by any appropriate analytical technique. This strategy enables us to describe a dust mixture as a mixture of multinomial distributions over a fixed number of dust particle types. Using a latent Dirichlet allocation model, we make inference on (a) the contributions of sites of interest to a dust mixture, and (b) the particle profiles associated with these sites.
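As a rough off-the-shelf analogue of this idea (not the authors' implementation), each dust specimen can be treated as a "document" of particle-type counts and decomposed with scikit-learn's latent Dirichlet allocation; the counts, the 20 particle types and the 3 latent sites below are simulated for illustration.

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    rng = np.random.default_rng(7)
    n_specimens, n_particle_types, n_sites = 50, 20, 3

    # simulate site-specific particle profiles and specimens that mix them
    site_profiles = rng.dirichlet(alpha=np.full(n_particle_types, 0.3), size=n_sites)
    mix_weights = rng.dirichlet(alpha=np.ones(n_sites), size=n_specimens)
    counts = np.vstack([rng.multinomial(500, w @ site_profiles) for w in mix_weights])

    lda = LatentDirichletAllocation(n_components=n_sites, random_state=0)
    contributions = lda.fit_transform(counts)   # (a) per-specimen site contributions
    profiles = lda.components_ / lda.components_.sum(axis=1, keepdims=True)   # (b) particle profiles

    print("estimated site contributions for specimen 0:", np.round(contributions[0], 2))
    print("estimated particle profile of latent site 0:", np.round(profiles[0], 3))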
Topics: Algorithms; Bayes Theorem; Dust; Models, Theoretical; Statistical Distributions
PubMed: 32058271
DOI: 10.1016/j.forsciint.2020.110144
Briefings in Bioinformatics, Jan 2023
The progress of single-cell RNA sequencing (scRNA-seq) has led to a large amount of scRNA-seq data, which are widely used in biomedical research. The noise in the raw data and the tens of thousands of measured genes make it challenging to capture the real structure and effective information of scRNA-seq data. Most existing single-cell analysis methods assume that the low-dimensional embedding of the raw data belongs to a Gaussian distribution or to a low-dimensional nonlinear space without any prior information, which greatly limits the flexibility and controllability of the model. In addition, many existing methods have a high computational cost, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoder (scGMAAE), which assumes that the low-dimensional embeddings of different cell types follow different Gaussian distributions and integrates Bayesian variational inference with adversarial training, so as to give an interpretable latent representation of complex data and discover the statistical distributions of different cell types. scGMAAE offers good controllability, interpretability and scalability. Therefore, it can process large-scale datasets in a short time and gives competitive results. scGMAAE outperforms existing methods in several ways, including dimensionality reduction and visualization, cell clustering, differential expression analysis and batch-effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to generate the best results.
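A heavily simplified sketch of this architecture (not the authors' scGMAAE code) is given below: an adversarial autoencoder whose latent codes are pushed toward a mixture-of-Gaussians prior, with one mixture component per assumed cell type. The dimensions, network sizes, loss weights and the random "expression" batch are all placeholders.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_genes, latent_dim, n_cell_types = 2000, 10, 5

    encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, latent_dim))
    decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, n_genes))
    discriminator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def sample_prior(n):
        """Draw latent vectors from a mixture of Gaussians (one mode per assumed cell type)."""
        component = torch.randint(0, n_cell_types, (n,))
        means = 4.0 * F.one_hot(component, num_classes=latent_dim).float()
        return means + torch.randn(n, latent_dim)

    bce = nn.BCEWithLogitsLoss()
    opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

    x = torch.randn(128, n_genes)   # placeholder for a normalized expression batch

    for step in range(100):
        # 1) reconstruction term for the autoencoder
        z = encoder(x)
        recon_loss = ((decoder(z) - x) ** 2).mean()

        # 2) discriminator: prior samples are labeled "real", encoded cells "fake"
        real = discriminator(sample_prior(x.size(0)))
        fake = discriminator(z.detach())
        disc_loss = bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))
        opt_disc.zero_grad()
        disc_loss.backward()
        opt_disc.step()

        # 3) adversarial term: the encoder tries to make its codes look like prior samples
        adv = discriminator(encoder(x))
        adv_loss = bce(adv, torch.ones_like(adv))
        opt_ae.zero_grad()
        (recon_loss + 0.1 * adv_loss).backward()
        opt_ae.step()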
Topics: Gene Expression Profiling; Sequence Analysis, RNA; Normal Distribution; Bayes Theorem; Single-Cell Gene Expression Analysis; Single-Cell Analysis; Cluster Analysis
PubMed: 36592058
DOI: 10.1093/bib/bbac585
Biometrics, Jun 2023
Combining dependent tests of significance has broad applications, but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (e.g., Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially less than 0.05. The problem can lead to significant false discoveries in big data analyses. This paper provides two main contributions. First, it presents a general family of Fisher-type statistics, referred to as the GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, the weighted Z-score combination, and so forth. The GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts proper weights and the statistic-defining parameters to the given data. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under the multivariate Gaussian distribution, and more robust under the generalized linear model and the multivariate t-distribution. The applications of the GFisher and the new p-value calculation methods are demonstrated by a gene-based single nucleotide polymorphism (SNP)-set association study. The relevant computation has been implemented in the R package GFisher, available on the Comprehensive R Archive Network.
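The classical building blocks behind this family can be sketched in a few lines (the paper's contributions are the harder dependent case and the GFisher generalizations, which are not reproduced here); the p-values and weights below are made up for illustration.

    import numpy as np
    from scipy import stats

    p_values = np.array([0.03, 0.20, 0.004, 0.51, 0.08])   # illustrative inputs

    # Fisher's combination: T = -2 * sum(log p) ~ chi-squared with 2k df under independence
    T = -2.0 * np.sum(np.log(p_values))
    combined_p = stats.chi2.sf(T, df=2 * p_values.size)
    print(f"Fisher statistic = {T:.2f}, combined p-value = {combined_p:.4g}")

    # a weighted Z-score (Stouffer-type) combination, another member of the family above
    weights = np.array([1.0, 0.5, 2.0, 0.5, 1.0])
    z = stats.norm.isf(p_values)   # one-sided Z-scores
    z_comb = np.dot(weights, z) / np.sqrt(np.sum(weights ** 2))
    print(f"weighted Z combination p-value = {stats.norm.sf(z_comb):.4g}")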
Topics: Linear Models; Statistical Distributions; Genetic Association Studies; Normal Distribution
PubMed: 35178716
DOI: 10.1111/biom.13634
Biometrical Journal. Biometrische..., Feb 2023
Common count distributions, such as the Poisson (binomial) distribution for unbounded (bounded) counts considered here, can be characterized by appropriate Stein identities. These identities, in turn, can be utilized to define a corresponding goodness-of-fit (GoF) test, the test statistic of which involves the computation of weighted means for a user-selected weight function f. Here, the choice of f should be made with respect to the relevant alternative scenario, as it has a great impact on the GoF test's performance. We derive the asymptotics of both the Poisson and binomial Stein-type GoF statistics for general count distributions (we also briefly consider the negative-binomial case), such that the asymptotic power is easily computed for arbitrary alternatives. This allows for an efficient implementation of optimal Stein tests, that is, tests that are most powerful within a given class of weight functions. The performance and application of the optimal Stein-type GoF tests are investigated by simulations and several medical data examples.
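The Poisson case can be illustrated concretely: X ~ Poisson(lambda) satisfies the Stein identity E[lambda·f(X+1) − X·f(X)] = 0 for suitable f, so the empirical version of this quantity (with lambda replaced by the sample mean) measures departure from the Poisson model. The weight function, the parametric-bootstrap calibration and the data below are illustrative; the paper instead derives the asymptotic distribution of such statistics.

    import numpy as np

    def stein_statistic(x, f):
        lam_hat = x.mean()
        return np.mean(lam_hat * f(x + 1) - x * f(x))

    f = lambda k: np.exp(-k)   # one possible weight function (assumption)
    rng = np.random.default_rng(3)

    data = rng.negative_binomial(5, 0.5, size=200)   # overdispersed, hence non-Poisson
    t_obs = stein_statistic(data, f)

    # calibrate by parametric bootstrap under the fitted Poisson null
    t_null = np.array([
        stein_statistic(rng.poisson(data.mean(), size=data.size), f)
        for _ in range(2000)
    ])
    p_value = np.mean(np.abs(t_null) >= np.abs(t_obs))
    print(f"observed Stein statistic = {t_obs:.4f}, bootstrap p-value = {p_value:.3f}")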
Topics: Binomial Distribution; Models, Statistical
PubMed: 36166681
DOI: 10.1002/bimj.202200073
BMC Medical Research Methodology, Dec 2017
BACKGROUND
The statistical analysis of health care cost data is often problematic because these data are usually non-negative, right-skewed and have excess zeros for non-users. This prevents the use of linear models based on the Gaussian or Gamma distribution. A common way to counter this is the use of two-part or Tobit models, which makes interpretation of the results more difficult. In this study, I explore a statistical distribution from the Tweedie family of distributions that can simultaneously model the probability of a zero outcome, i.e. of being a non-user of health care, and the continuous costs for users.
METHODS
I assess the usefulness of the Tweedie model in a Monte Carlo simulation study that addresses two common situations of low and high correlation between users and non-users of health care. Furthermore, I compare the Tweedie model with several other models using a real data set from the RAND health insurance experiment.
RESULTS
I show that the Tweedie distribution fits cost data very well and provides a better fit than the competing models, especially when the number of non-users is low and the correlation between users and non-users is high.
CONCLUSION
The Tweedie distribution provides an interesting solution to many statistical problems in health economic analyses.
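A small sketch of the modeling idea (not the paper's analysis): a Tweedie GLM with variance power between 1 and 2 corresponds to a compound Poisson-gamma distribution, so it places positive probability on exactly-zero costs while modeling skewed positive costs for users. The simulated covariate, the cost-generating process and the var_power value of 1.5 below are illustrative.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    n = 1000
    age = rng.uniform(20, 80, size=n)

    # simulate compound Poisson-gamma costs: many exact zeros plus skewed positives
    expected_visits = np.exp(-2.0 + 0.03 * age)
    visits = rng.poisson(expected_visits)
    costs = np.array([rng.gamma(shape=2.0, scale=300.0, size=v).sum() for v in visits])
    print("share of exact zeros in simulated costs:", np.mean(costs == 0))

    # Tweedie GLM with variance power 1.5 fitted to the zero-inflated, skewed costs
    X = sm.add_constant(age)
    result = sm.GLM(costs, X, family=sm.families.Tweedie(var_power=1.5)).fit()
    print(result.summary())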
Topics: Algorithms; Computer Simulation; Health Care Costs; Health Services Research; Humans; Models, Economic; Monte Carlo Method; Patient Acceptance of Health Care; Statistical Distributions
PubMed: 29258428
DOI: 10.1186/s12874-017-0445-y
Behavior Therapy, Nov 2015 (Review)
When researchers are interested in the effect of certain interventions on certain individuals, single-subject studies are often performed. In their simplest form, such single-subject studies require that a subject be measured on relevant criterion variables several times before an intervention and several times during or after the intervention. Scores from the two phases are then compared in order to investigate the intervention effect. Since observed scores typically consist of a mixture of true scores and random measurement error, simply looking at the difference in scores can be misleading. Hence, de Vries and Morey (2013) developed models and hypothesis tests for single-subject data, quantifying the evidence in the data for the size and presence of an intervention effect. In this paper we give a non-technical overview of the models and hypothesis tests and show how they can be applied to real data using the BayesSingleSub R package, with the aid of an empirical data set.
Topics: Bayes Theorem; Data Interpretation, Statistical; Humans; Likelihood Functions; Models, Statistical; Psychology; Research Design; Sample Size; Statistical Distributions
PubMed: 26520223
DOI: 10.1016/j.beth.2014.09.013