PLoS One, 2023
In this study, we propose a generalized Marshall-Olkin exponentiated exponential distribution as a submodel of the family of generalized Marshall-Olkin distributions. Some statistical properties of the proposed distribution are examined, such as the moments, the moment-generating function, the incomplete moments, and the Lorenz and Bonferroni curves. We give five estimators for the unknown parameters of the proposed distribution, based on the maximum likelihood, least squares, weighted least squares, Anderson-Darling, and Cramér-von Mises methods of estimation. To investigate the finite-sample properties of the estimators, a comprehensive Monte Carlo simulation study is conducted with three sets of randomly selected parameter values. Finally, four different real-data applications are presented to demonstrate the usefulness of the proposed distribution in practice.
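As a rough numerical sketch of how such a model might be fitted (this is not the paper's code): the snippet below assumes the standard Marshall-Olkin transform F(x) = G(x) / (1 - (1 - theta)(1 - G(x))) of the exponentiated exponential CDF G(x) = (1 - exp(-lambda*x))^alpha, whereas the paper's generalized family may involve additional parameters, and it compares maximum-likelihood and Cramér-von Mises estimation on placeholder data.

    import numpy as np
    from scipy.optimize import minimize

    def ee_cdf(x, alpha, lam):
        return (1.0 - np.exp(-lam * x)) ** alpha

    def ee_pdf(x, alpha, lam):
        return alpha * lam * np.exp(-lam * x) * (1.0 - np.exp(-lam * x)) ** (alpha - 1.0)

    def moee_cdf(x, alpha, lam, theta):
        g = ee_cdf(x, alpha, lam)
        return g / (1.0 - (1.0 - theta) * (1.0 - g))

    def moee_pdf(x, alpha, lam, theta):
        g = ee_cdf(x, alpha, lam)
        return theta * ee_pdf(x, alpha, lam) / (1.0 - (1.0 - theta) * (1.0 - g)) ** 2

    def neg_loglik(log_params, x):
        alpha, lam, theta = np.exp(log_params)   # log-parameterization keeps parameters positive
        return -np.sum(np.log(moee_pdf(x, alpha, lam, theta) + 1e-300))

    def cvm_objective(log_params, x):
        alpha, lam, theta = np.exp(log_params)
        u = moee_cdf(np.sort(x), alpha, lam, theta)
        i = np.arange(1, x.size + 1)
        return 1.0 / (12.0 * x.size) + np.sum((u - (2.0 * i - 1.0) / (2.0 * x.size)) ** 2)

    rng = np.random.default_rng(1)
    x = rng.gamma(shape=2.0, scale=1.5, size=200)   # placeholder positive data

    start = np.log([1.0, 1.0, 1.0])
    mle = minimize(neg_loglik, start, args=(x,), method="Nelder-Mead")
    cvm = minimize(cvm_objective, start, args=(x,), method="Nelder-Mead")
    print("MLE estimates (alpha, lambda, theta):", np.exp(mle.x))
    print("CvM estimates (alpha, lambda, theta):", np.exp(cvm.x))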
Topics: Computer Simulation; Statistical Distributions; Monte Carlo Method; Least-Squares Analysis
PubMed: 36652462
DOI: 10.1371/journal.pone.0280349
Cerebral Cortex (New York, N.Y.: 1991), Aug 2023
Numbers of neurons and their spatial variation are fundamental organizational features of the brain. Despite the large corpus of cytoarchitectonic data available in the literature, the statistical distributions of neuron densities within and across brain areas remain largely uncharacterized. Here, we show that neuron densities are compatible with a lognormal distribution across cortical areas in several mammalian species, and find that this also holds true within cortical areas. A minimal model of noisy cell division, in combination with distributed proliferation times, can account for the coexistence of lognormal distributions within and across cortical areas. Our findings uncover a new organizational principle of cortical cytoarchitecture: the ubiquitous lognormal distribution of neuron densities, which adds to a long list of lognormal variables in the brain.
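The mechanism behind the lognormal can be illustrated with a toy simulation (not the authors' model): if each proliferation round multiplies the cell count by an independent noisy factor, the log-density is a sum of independent terms and is therefore approximately Gaussian. All quantities below (number of locations, rounds, noise level) are made-up illustrations.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_locations = 5000   # e.g., sampled locations within a cortical area (assumption)
    n_rounds = 30        # proliferation rounds (assumption)

    # per-round growth factors around 2 (cell division) with multiplicative noise
    factors = rng.uniform(1.6, 2.4, size=(n_locations, n_rounds))
    density = 100.0 * factors.prod(axis=1)   # arbitrary starting density of 100

    # raw densities are strongly right-skewed, log-densities are close to symmetric
    print("skewness of raw densities:", stats.skew(density))
    print("skewness of log densities:", stats.skew(np.log(density)))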
Topics: Animals; Neurons; Brain; Mammals; Cerebral Cortex; Statistical Distributions
PubMed: 37409647
DOI: 10.1093/cercor/bhad160
Biochemia Medica, 2015 (Review)
Computer-intensive resampling/bootstrap methods are feasible when calculating reference intervals from non-Gaussian or small reference samples. Microsoft Excel® in version 2010 or later includes native functions that lend themselves well to this purpose, including the recommended interpolation procedures for estimating the 2.5th and 97.5th percentiles. The purpose of this paper is to introduce the reader to resampling estimation techniques in general, and to the use of Microsoft Excel® 2010 for estimating reference intervals in particular. Parametric methods are preferable to resampling methods when the distribution of observations in the reference sample is Gaussian or can be transformed to that distribution, even when the number of reference samples is less than 120. Resampling methods are appropriate when the distribution of data from the reference samples is non-Gaussian and the number of reference individuals and corresponding samples is on the order of 40. At least 500-1000 random samples with replacement should be drawn from the results of measurement of the reference samples.
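For readers who prefer a scripting language, the same bootstrap logic can be sketched in a few lines of Python (the paper itself works in Excel); the data, the 1000 resamples and the interpolation method below are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(42)
    reference = rng.lognormal(mean=1.0, sigma=0.4, size=40)   # placeholder: 40 skewed reference values

    n_boot = 1000
    lows = np.empty(n_boot)
    highs = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(reference, size=reference.size, replace=True)
        lows[b] = np.percentile(resample, 2.5, method="median_unbiased")
        highs[b] = np.percentile(resample, 97.5, method="median_unbiased")

    # point estimate of the reference interval and the spread of the resampled limits
    print(f"bootstrap reference interval: [{lows.mean():.2f}, {highs.mean():.2f}]")
    print(f"90% CI for the lower limit: [{np.percentile(lows, 5):.2f}, {np.percentile(lows, 95):.2f}]")
    print(f"90% CI for the upper limit: [{np.percentile(highs, 5):.2f}, {np.percentile(highs, 95):.2f}]")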
Topics: Humans; Models, Statistical; Normal Distribution; Reference Values; Software
PubMed: 26527366
DOI: 10.11613/BM.2015.031
Psychological Research, Sep 2022
Humans are surprisingly good at learning the statistical characteristics of their visual environment. Recent studies have revealed that the visual system can learn not only repeated features of visual search distractors but also their actual probability distributions: search times were determined by the frequency of distractor features over consecutive search trials. The search displays used in those studies involved many exemplars of distractors on each trial. While there is clear evidence that feature distributions can be learned from large distractor sets, it is less clear whether distributions are learned equally well for single targets presented on each trial. Here, we investigated potential learning of probability distributions of single targets during visual search. Over blocks of trials, observers searched for an oddly colored target that was drawn from either a Gaussian or a uniform distribution. Search times for the different target colors were clearly influenced by the probability of that feature within trial blocks. The same search targets, coming from the extremes of the two distributions, were found significantly more slowly during blocks where the targets were drawn from a Gaussian distribution than from a uniform distribution, indicating that observers were sensitive to the target probability determined by the distribution shape. In Experiment 2, we replicated the effect using binned distributions and revealed the limitations of encoding complex target distributions. Our results demonstrate detailed internal representations of target feature distributions and show that the visual system integrates probability distributions of target colors over surprisingly long trial sequences.
Topics: Attention; Humans; Learning; Normal Distribution; Probability; Reaction Time; Visual Perception
PubMed: 34997327
DOI: 10.1007/s00426-021-01621-3
Forensic Science International, Mar 2020
While most evidence types considered by forensic scientists result from the interactions between criminals, objects or victims at crime scenes, dust evidence arises from the mere presence of individuals and objects at locations of interest. Dust is ubiquitous. Yet, the use of dust evidence is anecdotal and is limited to cases where rare and characteristic particles are observed. The dust at any given location contains a large number of particles of different types, and the dust present on an object or individual traveling across locations may be indicative of the locations recently visited and, in particular, of the presence of an individual at a particular site of interest, e.g., the scene of a crime. In this paper, we propose to represent dust mixtures as vectors of counts of the individual particles, which can be characterised by any appropriate analytical technique. This strategy enables us to describe a dust mixture as a mixture of multinomial distributions over a fixed number of dust particle types. Using a latent Dirichlet allocation model, we make inference on (a) the contributions of sites of interest to a dust mixture, and (b) the particle profiles associated with these sites.
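As a rough off-the-shelf analogue of this idea (not the authors' implementation), each dust specimen can be treated as a "document" of particle-type counts and decomposed with scikit-learn's latent Dirichlet allocation; the counts, the 20 particle types and the 3 latent sites below are simulated for illustration.

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    rng = np.random.default_rng(7)
    n_specimens, n_particle_types, n_sites = 50, 20, 3

    # simulate site-specific particle profiles and specimens that mix them
    site_profiles = rng.dirichlet(alpha=np.full(n_particle_types, 0.3), size=n_sites)
    mix_weights = rng.dirichlet(alpha=np.ones(n_sites), size=n_specimens)
    counts = np.vstack([rng.multinomial(500, w @ site_profiles) for w in mix_weights])

    lda = LatentDirichletAllocation(n_components=n_sites, random_state=0)
    contributions = lda.fit_transform(counts)   # (a) per-specimen site contributions
    profiles = lda.components_ / lda.components_.sum(axis=1, keepdims=True)   # (b) particle profiles

    print("estimated site contributions for specimen 0:", np.round(contributions[0], 2))
    print("estimated particle profile of latent site 0:", np.round(profiles[0], 3))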
Topics: Algorithms; Bayes Theorem; Dust; Models, Theoretical; Statistical Distributions
PubMed: 32058271
DOI: 10.1016/j.forsciint.2020.110144
Briefings in Bioinformatics, Jan 2023
The progress of single-cell RNA sequencing (scRNA-seq) has led to a large amount of scRNA-seq data, which are widely used in biomedical research. The noise in the raw data and the tens of thousands of measured genes make it challenging to capture the real structure and effective information of scRNA-seq data. Most existing single-cell analysis methods assume that the low-dimensional embedding of the raw data belongs to a Gaussian distribution or to a low-dimensional nonlinear space without any prior information, which greatly limits the flexibility and controllability of the model. In addition, many existing methods have a high computational cost, which makes them difficult to apply to large-scale datasets. Here, we design and develop a deep generative model named Gaussian mixture adversarial autoencoder (scGMAAE), which assumes that the low-dimensional embeddings of different cell types follow different Gaussian distributions and integrates Bayesian variational inference with adversarial training, so as to give an interpretable latent representation of complex data and discover the statistical distributions of different cell types. scGMAAE offers good controllability, interpretability and scalability. Therefore, it can process large-scale datasets in a short time and gives competitive results. scGMAAE outperforms existing methods in several ways, including dimensionality reduction and visualization, cell clustering, differential expression analysis and batch-effect removal. Importantly, compared with most deep learning methods, scGMAAE requires fewer iterations to generate the best results.
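A heavily simplified sketch of this architecture (not the authors' scGMAAE code) is given below: an adversarial autoencoder whose latent codes are pushed toward a mixture-of-Gaussians prior, with one mixture component per assumed cell type. The dimensions, network sizes, loss weights and the random "expression" batch are all placeholders.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_genes, latent_dim, n_cell_types = 2000, 10, 5

    encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, latent_dim))
    decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, n_genes))
    discriminator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def sample_prior(n):
        """Draw latent vectors from a mixture of Gaussians (one mode per assumed cell type)."""
        component = torch.randint(0, n_cell_types, (n,))
        means = 4.0 * F.one_hot(component, num_classes=latent_dim).float()
        return means + torch.randn(n, latent_dim)

    bce = nn.BCEWithLogitsLoss()
    opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

    x = torch.randn(128, n_genes)   # placeholder for a normalized expression batch

    for step in range(100):
        # 1) reconstruction term for the autoencoder
        z = encoder(x)
        recon_loss = ((decoder(z) - x) ** 2).mean()

        # 2) discriminator: prior samples are labeled "real", encoded cells "fake"
        real = discriminator(sample_prior(x.size(0)))
        fake = discriminator(z.detach())
        disc_loss = bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))
        opt_disc.zero_grad()
        disc_loss.backward()
        opt_disc.step()

        # 3) adversarial term: the encoder tries to make its codes look like prior samples
        adv = discriminator(encoder(x))
        adv_loss = bce(adv, torch.ones_like(adv))
        opt_ae.zero_grad()
        (recon_loss + 0.1 * adv_loss).backward()
        opt_ae.step()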
Topics: Gene Expression Profiling; Sequence Analysis, RNA; Normal Distribution; Bayes Theorem; Single-Cell Gene Expression Analysis; Single-Cell Analysis; Cluster Analysis
PubMed: 36592058
DOI: 10.1093/bib/bbac585
Biometrics, Jun 2023
Combining dependent tests of significance has broad applications, but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (e.g., Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially less than 0.05. The problem can lead to significant false discoveries in big data analyses. This paper provides two main contributions. First, it presents a general family of Fisher-type statistics, referred to as the GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, the weighted Z-score combination, and so forth. The GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts proper weights and the statistic-defining parameters to the given data. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under the multivariate Gaussian distribution, and more robust under the generalized linear model and the multivariate t-distribution. The applications of the GFisher and the new p-value calculation methods are demonstrated by a gene-based single nucleotide polymorphism (SNP)-set association study. The relevant computation has been implemented in the R package GFisher, available on the Comprehensive R Archive Network.
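The classical building blocks behind this family can be sketched in a few lines (the paper's contributions are the harder dependent case and the GFisher generalizations, which are not reproduced here); the p-values and weights below are made up for illustration.

    import numpy as np
    from scipy import stats

    p_values = np.array([0.03, 0.20, 0.004, 0.51, 0.08])   # illustrative inputs

    # Fisher's combination: T = -2 * sum(log p) ~ chi-squared with 2k df under independence
    T = -2.0 * np.sum(np.log(p_values))
    combined_p = stats.chi2.sf(T, df=2 * p_values.size)
    print(f"Fisher statistic = {T:.2f}, combined p-value = {combined_p:.4g}")

    # a weighted Z-score (Stouffer-type) combination, another member of the family above
    weights = np.array([1.0, 0.5, 2.0, 0.5, 1.0])
    z = stats.norm.isf(p_values)   # one-sided Z-scores
    z_comb = np.dot(weights, z) / np.sqrt(np.sum(weights ** 2))
    print(f"weighted Z combination p-value = {stats.norm.sf(z_comb):.4g}")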
Topics: Linear Models; Statistical Distributions; Genetic Association Studies; Normal Distribution
PubMed: 35178716
DOI: 10.1111/biom.13634
Biometrical Journal. Biometrische..., Feb 2023
Common count distributions, such as the Poisson (binomial) distribution for unbounded (bounded) counts considered here, can be characterized by appropriate Stein identities. These identities, in turn, can be utilized to define a corresponding goodness-of-fit (GoF) test, the test statistic of which involves the computation of weighted means for a user-selected weight function f. Here, the choice of f should be made with respect to the relevant alternative scenario, as it has a great impact on the GoF test's performance. We derive the asymptotics of both the Poisson and binomial Stein-type GoF statistics for general count distributions (we also briefly consider the negative-binomial case), such that the asymptotic power is easily computed for arbitrary alternatives. This allows for an efficient implementation of optimal Stein tests, that is, tests that are most powerful within a given class of weight functions. The performance and application of the optimal Stein-type GoF tests are investigated by simulations and several medical data examples.
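The Poisson case can be illustrated concretely: X ~ Poisson(lambda) satisfies the Stein identity E[lambda·f(X+1) − X·f(X)] = 0 for suitable f, so the empirical version of this quantity (with lambda replaced by the sample mean) measures departure from the Poisson model. The weight function, the parametric-bootstrap calibration and the data below are illustrative; the paper instead derives the asymptotic distribution of such statistics.

    import numpy as np

    def stein_statistic(x, f):
        lam_hat = x.mean()
        return np.mean(lam_hat * f(x + 1) - x * f(x))

    f = lambda k: np.exp(-k)   # one possible weight function (assumption)
    rng = np.random.default_rng(3)

    data = rng.negative_binomial(5, 0.5, size=200)   # overdispersed, hence non-Poisson
    t_obs = stein_statistic(data, f)

    # calibrate by parametric bootstrap under the fitted Poisson null
    t_null = np.array([
        stein_statistic(rng.poisson(data.mean(), size=data.size), f)
        for _ in range(2000)
    ])
    p_value = np.mean(np.abs(t_null) >= np.abs(t_obs))
    print(f"observed Stein statistic = {t_obs:.4f}, bootstrap p-value = {p_value:.3f}")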
Topics: Binomial Distribution; Models, Statistical
PubMed: 36166681
DOI: 10.1002/bimj.202200073
BMC Medical Research Methodology, Dec 2017
BACKGROUND
The statistical analysis of health care cost data is often problematic because these data are usually non-negative, right-skewed and have excess zeros for non-users. This prevents the use of linear models based on the Gaussian or Gamma distribution. A common way to counter this is the use of two-part or Tobit models, which makes interpretation of the results more difficult. In this study, I explore a statistical distribution from the Tweedie family of distributions that can simultaneously model the probability of a zero outcome, i.e. of being a non-user of health care, and the continuous costs for users.
METHODS
I assess the usefulness of the Tweedie model in a Monte Carlo simulation study that addresses two common situations of low and high correlation between users and non-users of health care. Furthermore, I compare the Tweedie model with several other models using a real data set from the RAND health insurance experiment.
RESULTS
I show that the Tweedie distribution fits cost data very well and provides a better fit than the competing models, especially when the number of non-users is low and the correlation between users and non-users is high.
CONCLUSION
The Tweedie distribution provides an interesting solution to many statistical problems in health economic analyses.
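A small sketch of the modeling idea (not the paper's analysis): a Tweedie GLM with variance power between 1 and 2 corresponds to a compound Poisson-gamma distribution, so it places positive probability on exactly-zero costs while modeling skewed positive costs for users. The simulated covariate, the cost-generating process and the var_power value of 1.5 below are illustrative.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    n = 1000
    age = rng.uniform(20, 80, size=n)

    # simulate compound Poisson-gamma costs: many exact zeros plus skewed positives
    expected_visits = np.exp(-2.0 + 0.03 * age)
    visits = rng.poisson(expected_visits)
    costs = np.array([rng.gamma(shape=2.0, scale=300.0, size=v).sum() for v in visits])
    print("share of exact zeros in simulated costs:", np.mean(costs == 0))

    # Tweedie GLM with variance power 1.5 fitted to the zero-inflated, skewed costs
    X = sm.add_constant(age)
    result = sm.GLM(costs, X, family=sm.families.Tweedie(var_power=1.5)).fit()
    print(result.summary())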
Topics: Algorithms; Computer Simulation; Health Care Costs; Health Services Research; Humans; Models, Economic; Monte Carlo Method; Patient Acceptance of Health Care; Statistical Distributions
PubMed: 29258428
DOI: 10.1186/s12874-017-0445-y
Behavior Therapy, Nov 2015 (Review)
When researchers are interested in the effect of certain interventions on certain individuals, single-subject studies are often performed. In their simplest form, such single-subject studies require that a subject be measured on relevant criterion variables several times before an intervention and several times during or after the intervention. Scores from the two phases are then compared in order to investigate the intervention effect. Since observed scores typically consist of a mixture of true scores and random measurement error, simply looking at the difference in scores can be misleading. Hence, de Vries and Morey (2013) developed models and hypothesis tests for single-subject data, quantifying the evidence in the data for the size and presence of an intervention effect. In this paper we give a non-technical overview of the models and hypothesis tests and show how they can be applied to real data using the BayesSingleSub R package, with the aid of an empirical data set.
Topics: Bayes Theorem; Data Interpretation, Statistical; Humans; Likelihood Functions; Models, Statistical; Psychology; Research Design; Sample Size; Statistical Distributions
PubMed: 26520223
DOI: 10.1016/j.beth.2014.09.013