Neural Networks, Aug 2023
In this article, we formulate the standard mixture learning problem as a Markov Decision Process (MDP). We theoretically show that the objective value of the MDP is equivalent to the log-likelihood of the observed data with a slightly different parameter space constrained by the policy. Unlike classic mixture learning methods such as the Expectation-Maximization (EM) algorithm, the proposed reinforced algorithm requires no distributional assumptions and can handle non-convexly clustered data by constructing a model-free reward that evaluates the mixture assignment based on spectral graph theory and Linear Discriminant Analysis (LDA). Extensive experiments on both synthetic and real examples demonstrate that the proposed method is comparable with the EM algorithm when the Gaussian mixture assumption is satisfied, and significantly outperforms it and other clustering methods in most scenarios when the model is misspecified. A Python implementation of our proposed method is available at https://github.com/leyuanheart/Reinforced-Mixture-Learning.
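For intuition, here is a minimal sketch of what an LDA-style, model-free reward for a candidate mixture assignment could look like; the function name, the scatter-ratio definition, and the toy data are illustrative assumptions, not the authors' implementation (see the linked repository for their code):

```python
import numpy as np

def lda_reward(X, labels):
    """Illustrative model-free reward for a cluster assignment:
    ratio of between-cluster to within-cluster scatter (LDA-style).
    A sketch, not the paper's exact reward."""
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        between += len(Xk) * np.sum((mk - overall_mean) ** 2)
        within += np.sum((Xk - mk) ** 2)
    return between / max(within, 1e-12)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
good = np.repeat([0, 1], 50)           # assignment matching the clusters
bad = rng.integers(0, 2, 100)          # random assignment
print(lda_reward(X, good), ">", lda_reward(X, bad))
```

A reward of this kind needs no distributional assumptions, which is what lets the reinforced formulation sidestep the Gaussian mixture model behind EM.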
Topics: Algorithms; Markov Chains; Normal Distribution; Cluster Analysis
PubMed: 37307663
DOI: 10.1016/j.neunet.2023.05.018
Biometrics, Sep 2023
Capturing complex dependence structures between outcome variables (e.g., study endpoints) is of high relevance in contemporary biomedical data problems and medical research. Distributional copula regression provides a flexible tool to model the joint distribution of multiple outcome variables by disentangling the marginal response distributions and their dependence structure. In a regression setup, each parameter of the copula model, that is, the marginal distribution parameters and the copula dependence parameters, can be related to covariates via structured additive predictors. We propose a framework to fit distributional copula regression via model-based boosting, a modern estimation technique that incorporates useful features such as an intrinsic variable selection mechanism, parameter shrinkage, and the capability to fit regression models in high-dimensional data settings, that is, situations with more covariates than observations. Thus, model-based boosting not only complements existing Bayesian and maximum-likelihood-based estimation frameworks for this model class, but also provides unique intrinsic mechanisms that can be helpful in many applied problems. The performance of our boosting algorithm for copula regression models with continuous margins is evaluated in simulation studies that cover low- and high-dimensional data settings and situations with and without dependence between the responses. Moreover, distributional copula boosting is used to jointly analyze and predict the length and the weight of newborns conditional on sonographic measurements of the fetus before delivery, together with other clinical variables.
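As a rough illustration of the object being modeled, the sketch below evaluates the log-likelihood of a bivariate Gaussian-copula model with normal margins. In the proposed framework each parameter would be linked to covariates via structured additive predictors and fitted by boosting; here they are fixed scalars, and all names are my own:

```python
import numpy as np
from scipy import stats

def gaussian_copula_loglik(y1, y2, mu1, s1, mu2, s2, rho):
    """Log-likelihood of a bivariate Gaussian-copula model with normal
    margins: sum of marginal log-densities plus the copula log-density
    evaluated on the normal scores."""
    z1 = stats.norm.ppf(stats.norm.cdf(y1, mu1, s1))
    z2 = stats.norm.ppf(stats.norm.cdf(y2, mu2, s2))
    # Gaussian copula density c(u1, u2; rho) on the normal scores
    log_c = (-0.5 * np.log(1 - rho ** 2)
             - (rho ** 2 * (z1 ** 2 + z2 ** 2) - 2 * rho * z1 * z2)
             / (2 * (1 - rho ** 2)))
    log_margins = (stats.norm.logpdf(y1, mu1, s1)
                   + stats.norm.logpdf(y2, mu2, s2))
    return np.sum(log_c + log_margins)

# Toy data standing in for, e.g., birth length and weight
rng = np.random.default_rng(0)
y = rng.multivariate_normal([50, 3.4], [[4, 1.2], [1.2, 0.25]], size=200)
print(gaussian_copula_loglik(y[:, 0], y[:, 1], 50, 2, 3.4, 0.5, 0.6))
```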
Topics: Infant, Newborn; Humans; Models, Statistical; Likelihood Functions; Bayes Theorem; Computer Simulation; Algorithms
PubMed: 36165288
DOI: 10.1111/biom.13765
The Science of the Total Environment, Jun 2024
This paper highlights the critical role of pH or proton activity measurements in environmental studies and emphasises the importance of applying proper statistical approaches when handling pH data. This allows for more informed decisions to effectively manage environmental data, such as data from mining-influenced water. The pH and {H+} of the same system display different distributions, with pH mostly displaying a normal or bimodal distribution and {H+} showing a lognormal distribution. It is therefore a challenge to decide whether to use pH or {H+} to compute the mean or other measures of central tendency for further environmental statistical analyses. In this study, different statistical techniques were applied to understand the distribution of pH and {H+} from four different mine sites: Metsämonttu in Finland, Felsendome Rabenstein in Germany, and the Eastrand and Westrand mine water treatment plants in South Africa. Based on the statistical results, the geometric mean can be used to calculate the average pH if the distribution is unimodal. For a multimodal pH data distribution, peak-identifying methods can be applied to extract the mean of each data population, which can then be used for further statistical analyses.
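A small numeric illustration of why the choice matters, using made-up pH readings: the geometric mean of {H+} corresponds exactly to the arithmetic mean of pH, whereas the arithmetic mean of {H+} is pulled toward the acidic samples:

```python
import numpy as np

# Hypothetical pH readings from a single monitoring point
ph = np.array([3.2, 3.5, 6.8, 7.1, 7.4])
h_activity = 10.0 ** (-ph)                  # {H+}, since pH = -log10{H+}

arith_mean_ph = ph.mean()
geo_mean_h = np.exp(np.mean(np.log(h_activity)))

# Geometric mean of {H+} is equivalent to the arithmetic mean of pH:
print(arith_mean_ph, -np.log10(geo_mean_h))   # both 5.6
# Arithmetic mean of {H+} is dominated by the two acidic samples:
print(-np.log10(h_activity.mean()))           # about 3.7
```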
PubMed: 38917894
DOI: 10.1016/j.scitotenv.2024.174099
Journal of Korean Medical Science, Jan 2024 (Review)
Determining whether the frequency distribution of a given data set follows a normal distribution is among the first steps of data analysis. Visual examination of the data, commonly with a Q-Q plot, although acceptable to many scientists, is considered subjective and unacceptable by other researchers. The one-sample Kolmogorov-Smirnov test with Lilliefors correction (for a sample size ≥ 50) and the Shapiro-Wilk test (for a sample size < 50) are common statistical tests for checking the normality of a data set quantitatively. Because parametric tests, which assume that the data distribution is normal (Gaussian, bell-shaped), are more powerful than their non-parametric counterparts, we commonly use transformations (e.g., log-transformation, Box-Cox transformation) to bring the frequency distribution of non-normally distributed data close to a normal distribution. Herein, I show how to work with these statistical methods in practice by examining real data sets.
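A minimal sketch of this workflow using SciPy and statsmodels; the synthetic skewed sample is an assumption for illustration:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=0.6, size=80)   # skewed sample

if len(x) < 50:
    stat, p = stats.shapiro(x)               # Shapiro-Wilk for n < 50
else:
    stat, p = lilliefors(x, dist="norm")     # KS with Lilliefors correction
print(f"normality p-value: {p:.4f}")          # small: normality rejected

# If normality is rejected, a transformation may help:
x_bc, lam = stats.boxcox(x)                   # Box-Cox (requires x > 0)
print(f"Box-Cox lambda: {lam:.3f}, p after: {lilliefors(x_bc, dist='norm')[1]:.4f}")
```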
Topics: Humans; Data Analysis; Physicians; Research Personnel; Statistics, Nonparametric
PubMed: 38258367
DOI: 10.3346/jkms.2024.39.e35
IEEE Transactions on Pattern Analysis and Machine Intelligence, Feb 2024
The training and testing data for deep-neural-network-based classifiers are usually assumed to be sampled from the same distribution. When some of the testing samples are drawn from a distribution sufficiently far from that of the training samples (a.k.a. out-of-distribution (OOD) samples), the trained neural network tends to make high-confidence predictions on them. Detecting OOD samples is critical when a neural network is used for image classification, object detection, and similar tasks: it can enhance the classifier's robustness to irrelevant inputs and improve the system's resilience and security under different forms of attack. OOD detection poses three main challenges: (i) the detection method should be compatible with various classifier architectures (e.g., DenseNet, ResNet) without significantly increasing model complexity or computational requirements; (ii) the OOD samples may come from multiple distributions, whose class labels are commonly unavailable; (iii) a score function needs to be defined to effectively separate OOD samples from in-distribution (InD) samples. To overcome these challenges, we propose a Wasserstein-based out-of-distribution detection (WOOD) method. The basic idea is to define a Wasserstein-based score that evaluates the dissimilarity between a test sample and the distribution of InD samples. An optimization problem is then formulated and solved based on the proposed score function. The statistical learning bound of the proposed method is investigated to guarantee that the loss value achieved by the empirical optimizer approximates the global optimum. Comparison studies demonstrate that the proposed WOOD method consistently outperforms existing OOD detection methods.
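To give a flavor of the score-function idea, here is a heavily simplified, hypothetical sketch rather than the paper's formulation: under a 0/1 ground metric on the label space, the Wasserstein distance from a softmax vector to the one-hot distribution of a class reduces to one minus the assigned probability, so the distance to the closest InD class is one minus the maximum softmax probability:

```python
import numpy as np

def wasserstein_style_score(softmax_probs):
    """Illustrative Wasserstein-style OOD score (a sketch, not WOOD's
    exact score). With a 0/1 ground metric, the Wasserstein distance
    from softmax vector p to the one-hot distribution of class k is
    1 - p[k]; the score is the distance to the closest InD class."""
    return 1.0 - np.max(softmax_probs, axis=-1)

ind = np.array([0.92, 0.03, 0.05])   # confident in-distribution output
ood = np.array([0.40, 0.35, 0.25])   # diffuse output on an OOD input
print(wasserstein_style_score(ind), "<", wasserstein_style_score(ood))
# A test sample would be flagged as OOD when its score exceeds a
# threshold calibrated on held-out in-distribution data.
```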
PubMed: 37906483
DOI: 10.1109/TPAMI.2023.3328883
PLoS One, 2023
Testing whether data come from a normal distribution is a classical problem and of great concern in data analysis, since normality is the premise of many statistical methods, such as the t-test, Hotelling's T² test, and ANOVA. There are numerous tests in the literature; the commonly used ones are the Anderson-Darling test, the Shapiro-Wilk test, and the Jarque-Bera test. Each test has its own advantages, since each was developed for specific patterns of departure from normality, and no method performs optimally in all situations. Because the data distributions of practical problems can be complex and diverse, we propose a Cauchy Combination Omnibus Test (CCOT) that is robust and valid in most data settings. We also give theoretical results establishing the good properties of the CCOT. Two clear advantages of the CCOT are that it admits a closed-form expression for calculating statistical significance, and that extensive simulation results show its robustness regardless of the shape of the distribution from which the data come. Applications to South African Heart Disease and Neonatal Hearing Impairment data further illustrate its practicability.
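The Cauchy combination rule itself has a well-known closed form (Liu and Xie's formula). The sketch below combines p-values from three normality tests that expose p-values directly in SciPy; the authors' choice of component tests may differ:

```python
import numpy as np
from scipy import stats

def cauchy_combination(pvals, weights=None):
    """Combine p-values via T = sum_i w_i * tan((0.5 - p_i) * pi);
    under the null, T is approximately standard Cauchy, so the
    combined p-value has a closed form."""
    p = np.asarray(pvals, dtype=float)
    w = np.full(p.shape, 1.0 / p.size) if weights is None else np.asarray(weights)
    T = np.sum(w * np.tan((0.5 - p) * np.pi))
    return stats.cauchy.sf(T)

rng = np.random.default_rng(2)
x = rng.standard_t(df=3, size=200)        # heavy-tailed, non-normal data
_, p_sw = stats.shapiro(x)                # Shapiro-Wilk
_, p_jb = stats.jarque_bera(x)            # Jarque-Bera
_, p_dp = stats.normaltest(x)             # D'Agostino-Pearson
print(cauchy_combination([p_sw, p_jb, p_dp]))   # small combined p-value
```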
Topics: Computer Simulation; Normal Distribution; Sample Size; Data Analysis
PubMed: 37535617
DOI: 10.1371/journal.pone.0289498
Neural Networks, Oct 2023
The problem of vanishing and exploding gradients has been a long-standing obstacle to the effective training of neural networks. Despite various tricks and techniques employed to alleviate the problem in practice, satisfactory theories and provable solutions are still lacking. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide a rigorous result showing, under mild conditions, that the vanishing/exploding gradients problem disappears with high probability if the neural network has sufficient width. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions, namely Gaussian-Poincaré normalized functions, together with orthogonal weight matrices. Experiments on both synthetic and real-world data validate our theory and confirm its effectiveness on very deep neural networks in practice.
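A minimal numerical sketch of the two ingredients, assuming a simple second-moment normalization E[f(Z)^2] = 1 for Z ~ N(0,1) (the paper's exact Gaussian-Poincaré conditions may also constrain the derivative) together with random orthogonal weights:

```python
import numpy as np

rng = np.random.default_rng(3)

# Rescale tanh so that E[f(Z)^2] = 1 under Z ~ N(0,1), estimated by
# Monte Carlo. (A sketch of the normalization idea only.)
z = rng.standard_normal(1_000_000)
c = 1.0 / np.sqrt(np.mean(np.tanh(z) ** 2))
act = lambda x: c * np.tanh(x)

def orthogonal(n):
    """Random orthogonal matrix via QR, with signs fixed."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

# Forward signal through a deep stack: per-unit second moment stays
# O(1) instead of vanishing or exploding with depth.
x = rng.standard_normal(512)
for _ in range(100):
    x = act(orthogonal(512) @ x)
print(np.mean(x ** 2))   # remains near 1 after 100 layers
```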
Topics: Neural Networks, Computer; Normal Distribution
PubMed: 37666186
DOI: 10.1016/j.neunet.2023.08.017
Physical Review Letters, Jun 2023
We analyze transport on a graph with multiple constraints in which the weights of the edges connecting the nodes are dynamical variables. The network dynamics results from the interplay between a nonlinear function of the flow, dissipation, and Gaussian additive noise. For a given set of parameters and finite noise amplitude, the network self-organizes into one of several metastable configurations, according to a probability distribution that depends on the noise amplitude α. At a finite value of α, we find a resonance-like behavior in which one network topology becomes the most probable stationary state. This specific topology maximizes robustness and transport efficiency, is reached with the maximal convergence rate, and is not found by the noiseless dynamics. We argue that this behavior is a manifestation of noise-induced resonances in network self-organization. Our findings show that stochastic dynamics can boost transport on a nonlinear network and, further, suggest a change of paradigm regarding the role of noise in optimization algorithms.
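A toy Euler-Maruyama sketch of dynamics of this general kind on a small graph; the network, the flow nonlinearity, and all parameters below are illustrative assumptions, not the paper's model:

```python
import numpy as np

# Edge conductances w evolve under a nonlinear function of the flow
# they carry, linear dissipation, and additive Gaussian noise.
rng = np.random.default_rng(4)
edges = [(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)]
n, m = 4, len(edges)
source, sink, gamma, dt, alpha = 0, 3, 1.8, 0.01, 0.05

w = np.ones(m)
inj = np.zeros(n)
inj[source], inj[sink] = 1.0, -1.0        # unit flow in at 0, out at 3

for _ in range(20_000):
    # Kirchhoff's laws: solve L p = injections for node potentials
    L = np.zeros((n, n))
    for (i, j), we in zip(edges, w):
        L[i, i] += we; L[j, j] += we; L[i, j] -= we; L[j, i] -= we
    p = np.linalg.lstsq(L, inj, rcond=None)[0]
    q = np.array([we * (p[i] - p[j]) for (i, j), we in zip(edges, w)])
    # growth driven by |flow|^gamma, dissipation, additive noise
    w += dt * (np.abs(q) ** gamma - w) + alpha * np.sqrt(dt) * rng.standard_normal(m)
    w = np.clip(w, 1e-6, None)            # conductances stay positive

print(np.round(w, 3))   # surviving edges define the selected topology
```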
Topics: Algorithms; Normal Distribution; Probability
PubMed: 37450810
DOI: 10.1103/PhysRevLett.130.267401
Nature, Jul 2023
Quantum computers promise to solve certain computational problems much faster than classical computers. However, current quantum processors are limited by their modest size and appreciable error rates. Recent efforts to demonstrate quantum speedups have therefore focused on problems that are both classically hard and naturally suited to current quantum hardware, such as sampling from complicated (although not explicitly useful) probability distributions. Here we introduce and experimentally demonstrate a quantum algorithm that is similarly well suited to current hardware, but which samples from complicated distributions arising in several applications. The algorithm performs Markov chain Monte Carlo (MCMC), a prominent iterative technique, to sample from the Boltzmann distribution of classical Ising models. Unlike most near-term quantum algorithms, ours provably converges to the correct distribution, despite being hard to simulate classically. But like most MCMC algorithms, its convergence rate is difficult to establish theoretically, so we instead analysed it through both experiments and simulations. In experiments, our quantum algorithm converged in fewer iterations than common classical MCMC alternatives, suggesting unusual robustness to noise. In simulations, we observed a polynomial speedup, between cubic and quartic, over such alternatives. This empirical speedup, should it persist to larger scales, could ease computational bottlenecks posed by this sampling problem in machine learning, statistical physics and optimization. This algorithm therefore opens a new path for quantum computers to solve useful, not merely difficult, sampling problems.
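For contrast, here is the kind of classical single-spin Metropolis MCMC baseline such a quantum algorithm is compared against; the couplings, fields, and parameters are illustrative:

```python
import numpy as np

# Metropolis sampling from the Boltzmann distribution of a classical
# Ising model with energy E(s) = -0.5 * s^T J s - h^T s.
rng = np.random.default_rng(5)
n, beta, steps = 10, 1.0, 50_000
J = rng.standard_normal((n, n))
J = np.triu(J, 1)
J = J + J.T                           # symmetric, zero-diagonal couplings
h = rng.standard_normal(n)

def energy(s):
    return -0.5 * s @ J @ s - h @ s

s = rng.choice([-1, 1], size=n)
for _ in range(steps):
    i = rng.integers(n)
    dE = 2 * s[i] * (J[i] @ s + h[i])   # energy change from flipping spin i
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        s[i] = -s[i]                    # accept the flip
print(energy(s))
```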
PubMed: 37438591
DOI: 10.1038/s41586-023-06095-4
Entropy (Basel, Switzerland), Jul 2023
Inspired by developments in modern data science, a shift is increasingly visible in the foundations of statistical inference, away from a real space, where random variables reside, toward a nonmetrized and nonordinal alphabet, where more general random elements reside. While statistical inferences based on random variables are theoretically well supported by the rich literature of probability and statistics, inferences on alphabets, mostly by way of various entropies and their estimation, are less systematically supported in theory. Without the familiar notions of neighborhood, real or complex moments, tails, et cetera, associated with random variables, probability and statistics based on random elements on alphabets need more attention to foster a sound framework for the rigorous development of entropy-based statistical exercises. In this article, several basic elements of entropic statistics are introduced and discussed, including notions of general entropies, entropic sample spaces, entropic distributions, entropic statistics, entropic multinomial distributions, entropic moments, and entropic bases, among other entropic objects. In particular, an entropic-moment-generating function is defined and shown to uniquely characterize the underlying distribution from an entropic perspective and, hence, all entropies. An entropic version of the Glivenko-Cantelli convergence theorem is also established.
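As a concrete entry point, the following sketch computes the basic plug-in entropy estimate on a finite alphabet, one elementary example of entropy-based inference rather than the article's full entropic-statistics framework:

```python
import numpy as np
from collections import Counter

def plug_in_entropy(sample):
    """Plug-in (empirical) Shannon entropy of a sample from a countable
    alphabet: estimate letter probabilities by relative frequencies and
    evaluate H = -sum p log p (in nats)."""
    counts = np.array(list(Counter(sample).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

letters = np.random.default_rng(6).choice(list("abcd"), size=1000,
                                          p=[0.4, 0.3, 0.2, 0.1])
print(plug_in_entropy(letters))   # near the true H = 1.2799 nats
```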
PubMed: 37510007
DOI: 10.3390/e25071060