- Optics Express Jul 2023
In the underwater optical wireless communication (UOWC) scenario, a photomultiplier tube (PMT) with higher sensitivity, lower noise, and a larger receiver area is employed as the photon detector to further extend the transmission distance. Due to the complex underwater environment, the high directionality of the light beam, and the vibration of the transceiver, the incident optical power usually spans a very wide dynamic range, and the PMT may operate in any one of three regimes: pulse, transition, and waveform. Because it is difficult to obtain an analytical characterization of the output electric signals across these regimes, this paper resorts to experimental measurements of the upsampled discrete samples within a training symbol duration. Among different statistical distribution fitting options, the generalized extreme value (GEV) distribution is found to show excellent performance in fitting the probability density function (PDF) of either multiple samples or the superimposition of all samples within a symbol duration. Joint sample distribution (JSD) based and superimposed sample distribution (SSD) based symbol detection methods are then proposed by adopting the GEV distribution and the log-likelihood ratio (LLR) testing criterion. The proposed methods are experimentally evaluated under different received optical signal powers, data rates, and sampling rates. They are shown to outperform the Poisson-based and Gaussian-based maximum likelihood detection methods, which are employed for the pulse regime and the waveform regime, respectively. Furthermore, the effectiveness of the proposed methods in alleviating strong ambient radiation is experimentally verified.
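The idea of fitting a GEV distribution per symbol hypothesis and deciding by log-likelihood ratio can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's JSD or SSD detector: the training values, the GEV parameters used to generate them, and the helper names `fit_gev`, `llr`, and `detect` are all assumptions made for the example. Note that SciPy's shape parameter `c` has the opposite sign to the usual GEV shape parameter.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)

# Synthetic stand-ins for the superimposed samples of training symbols
# (hypothetical values, not measured PMT data).
train_off = genextreme.rvs(c=-0.1, loc=1.0, scale=0.5, size=2000, random_state=rng)
train_on = genextreme.rvs(c=-0.1, loc=5.0, scale=1.0, size=2000, random_state=rng)

def fit_gev(samples):
    """Maximum likelihood GEV fit; returns (c, loc, scale) in SciPy's convention."""
    # The positional -0.1 is only a starting guess for the shape parameter.
    return genextreme.fit(samples, -0.1, loc=np.median(samples), scale=np.std(samples))

params_off = fit_gev(train_off)
params_on = fit_gev(train_on)

def llr(x):
    """Log-likelihood ratio of the 'on' hypothesis against the 'off' hypothesis."""
    return genextreme.logpdf(x, *params_on) - genextreme.logpdf(x, *params_off)

def detect(x):
    """Decide 1 when the LLR is positive (equal priors assumed), else 0."""
    return (llr(np.asarray(x, dtype=float)) > 0).astype(int)
```

With equal priors the decision threshold on the LLR is zero; unequal priors would shift that threshold by the log prior ratio.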
PubMed: 37475336
DOI: 10.1364/OE.494311
- Nature Communications Sep 2023
Solar and wind resources are vital for the sustainable energy transition. Although renewable potentials have been widely assessed in existing literature, few studies have examined the statistical characteristics of the inherent renewable uncertainties arising from natural randomness, which is inevitable in stochastic-aware research and applications. Here we develop a rule-of-thumb statistical learning model for wind and solar power prediction and generate a year-long dataset of hourly prediction errors of 30 provinces in China. We reveal diversified spatiotemporal distribution patterns of prediction errors, indicating that over 60% of wind prediction errors and 50% of solar prediction errors arise from scenarios with high utilization rates. The first-order difference and peak ratio of generation series are two primary indicators explaining the uncertainty distribution. Additionally, we analyze the seasonal distributions of the provincial prediction errors that reveal a consistent law in China. Finally, policies including incentive improvements and interprovincial scheduling are suggested.
PubMed: 37666800
DOI: 10.1038/s41467-023-40670-7
- Scientific Reports Jul 2023
The paper presents a novel statistical approach for analyzing daily coronavirus case and fatality statistics. The survival discretization method was used to generate a two-parameter discrete distribution, referred to as the "Discrete Marshall-Olkin Length Biased Exponential (DMOLBE) distribution". Because its probability mass and failure rate functions can take varied forms, the DMOLBE distribution is adaptable. We calculated the mean, variance, skewness, kurtosis, dispersion index (DI), hazard and survival functions, and second failure rate function for the suggested distribution. The DI demonstrates that the proposed model can represent both over-dispersed and under-dispersed data sets. We estimated the parameters of the DMOLBE distribution. The behavior of the ML estimates is checked via a comprehensive simulation study. The behavior of the Bayesian estimates is checked by generating 10,000 iterations of Markov chain Monte Carlo techniques, plotting the trace, and checking the proposed distribution. The simulation studies showed that the bias and mean square error decreased as the sample size increased. To show the importance and flexibility of the DMOLBE distribution, two data sets on deaths due to coronavirus in China and Pakistan are analyzed. The DMOLBE distribution provides a better fit than several important discrete models, namely the discrete Burr-XII, discrete Bilal, discrete Burr-Hatke, discrete Rayleigh, and Poisson distributions. We conclude that the newly proposed distribution works well in analyzing these data sets. The data sets used in the paper were collected in 2020.
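Survival discretization turns a continuous lifetime model with survival function S into a discrete one via P(X = k) = S(k) - S(k + 1) for k = 0, 1, 2, ..., and the dispersion index is the variance divided by the mean. The sketch below applies both to a plain exponential baseline, which is an assumption chosen for illustration; it is not the DMOLBE distribution, whose survival function is not given in the abstract. The function names and the truncation point `k_max` are likewise illustrative.

```python
import numpy as np

def discretize_survival(survival, k_max):
    """Discrete pmf on 0..k_max from a continuous survival function S(x)."""
    k = np.arange(k_max + 1)
    pmf = survival(k) - survival(k + 1)
    return k, pmf

def dispersion_index(k, pmf):
    """Variance-to-mean ratio: above 1 is over-dispersed, below 1 under-dispersed."""
    mean = np.sum(k * pmf)
    var = np.sum((k - mean) ** 2 * pmf)
    return var / mean

lam = 0.5  # hypothetical rate of the exponential baseline

def exponential_survival(x):
    return np.exp(-lam * x)

k, pmf = discretize_survival(exponential_survival, k_max=200)
di = dispersion_index(k, pmf)
```

For this baseline the discretized law is geometric with success probability 1 - exp(-lam), so the dispersion index equals 1 / (1 - exp(-lam)) and the model is always over-dispersed. A more flexible baseline is what allows both regimes.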
Topics: Humans; Bayes Theorem; COVID-19; Computer Simulation; Probability; Markov Chains; Monte Carlo Method
PubMed: 37507438
DOI: 10.1038/s41598-023-39183-6
- Analysis of occupational radiation dose data and determination of suitable probability distribution. Radiation Protection Dosimetry Jul 2023
The first study on fitting dose data for workers was performed by Gale (1) in 1965, where log-normal and normal distributions were used. Since then, various models of dose distribution have been proposed. The log-normal distribution and its different forms have been widely used for fitting the dose data. Most of the studies considered only one or two distributions. In this study, five distributions are considered for fitting, and four are selected based on inspection of the Cullen-Frey graph. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are applied to find the most suitable distribution to fit the occupational dose data. The maximum likelihood method was used for parameter estimation and for calculating the AIC and BIC values. Computer code was written in R, the language and environment for statistical computing and graphics, to analyze the occupational dose data of three institutions.
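The selection step rests on AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of fitted parameters, n the sample size, and L the maximized likelihood. The study's code is in R and is not reproduced here; the following is a Python sketch under stated assumptions: the data are synthetic log-normal draws rather than dose records, the four candidate distributions are chosen for illustration, and the location parameter is fixed at zero for the positive-support candidates.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
doses = rng.lognormal(mean=0.0, sigma=1.0, size=2000)  # synthetic stand-in data

candidates = {
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
    "weibull": stats.weibull_min,
    "normal": stats.norm,
}

def information_criteria(dist, data):
    """Fit by maximum likelihood and return (AIC, BIC)."""
    if dist is stats.norm:
        params = dist.fit(data)
        n_free = len(params)            # mean and standard deviation
    else:
        params = dist.fit(data, floc=0)  # location fixed at zero
        n_free = len(params) - 1         # shape and scale
    loglik = np.sum(dist.logpdf(data, *params))
    n = len(data)
    return 2 * n_free - 2 * loglik, n_free * np.log(n) - 2 * loglik

results = {name: information_criteria(dist, doses) for name, dist in candidates.items()}
best_by_aic = min(results, key=lambda name: results[name][0])
```

Lower values indicate a better trade-off between fit and complexity. Because every candidate here has two free parameters, AIC and BIC rank them identically; the two criteria can disagree when candidates differ in parameter count.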
Topics: Humans; Bayes Theorem; Probability; Normal Distribution; Statistical Distributions; Radiation Dosage
PubMed: 37259618
DOI: 10.1093/rpd/ncad160
- Mathematical Biosciences and... Nov 2023
Factorization reduces computational complexity, and is therefore an important tool in statistical machine learning of high dimensional systems. Conventional molecular modeling, including molecular dynamics and Monte Carlo simulations of molecular systems, is a large research field based on approximate factorization of molecular interactions. Recently, the local distribution theory was proposed to factorize the joint distribution of a given molecular system into trainable local distributions. Belief propagation algorithms are a family of exact factorization algorithms for (junction) trees, and have been extended to approximate loopy belief propagation algorithms for graphs with loops. Although factorization of probability distributions is their common foundation, computational research on molecular systems and machine learning studies utilizing belief propagation algorithms have been carried out independently, each with its own track of algorithm development. The connections and differences among these factorization algorithms are briefly presented in this perspective, in the hope of stimulating further development of factorization algorithms for physical modeling of complex molecular systems.
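The exactness of belief propagation on trees can be checked on the simplest tree, a chain. The sketch below runs sum-product message passing on a four-variable chain with random positive potentials and compares the resulting marginals with brute-force enumeration. The potentials, sizes, and function names are assumptions made for the example and do not come from the article.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_vars, n_states = 4, 2
unary = rng.uniform(0.5, 1.5, size=(n_vars, n_states))
# pair[i][a, b] couples x_i = a with x_{i+1} = b
pair = rng.uniform(0.5, 1.5, size=(n_vars - 1, n_states, n_states))

def chain_marginals(unary, pair):
    """Exact single-variable marginals via forward and backward messages."""
    n, s = unary.shape
    fwd = [np.ones(s) for _ in range(n)]  # message reaching node i from the left
    bwd = [np.ones(s) for _ in range(n)]  # message reaching node i from the right
    for i in range(1, n):
        fwd[i] = (fwd[i - 1] * unary[i - 1]) @ pair[i - 1]
    for i in range(n - 2, -1, -1):
        bwd[i] = pair[i] @ (bwd[i + 1] * unary[i + 1])
    beliefs = np.array([fwd[i] * unary[i] * bwd[i] for i in range(n)])
    return beliefs / beliefs.sum(axis=1, keepdims=True)

def brute_force_marginals(unary, pair):
    """Reference marginals by summing over every joint configuration."""
    n, s = unary.shape
    marg = np.zeros((n, s))
    for states in itertools.product(range(s), repeat=n):
        weight = np.prod([unary[i, x] for i, x in enumerate(states)])
        weight *= np.prod([pair[i, states[i], states[i + 1]] for i in range(n - 1)])
        for i, x in enumerate(states):
            marg[i, x] += weight
    return marg / marg.sum(axis=1, keepdims=True)

bp = chain_marginals(unary, pair)
exact = brute_force_marginals(unary, pair)
```

Message passing costs time linear in the chain length, while enumeration grows exponentially. On graphs with loops the same updates can be iterated, but the result is then only an approximation.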
PubMed: 38124591
DOI: 10.3934/mbe.2023935
- Biometrics Mar 2024
The current Poisson factor models often assume that the factors are unknown, which overlooks the explanatory potential of certain observable covariates. This study focuses on high dimensional settings, where the number of the count response variables and/or covariates can diverge as the sample size increases. A covariate-augmented overdispersed Poisson factor model is proposed to jointly perform a high-dimensional Poisson factor analysis and estimate a large coefficient matrix for overdispersed count data. A group of identifiability conditions is provided to theoretically guarantee computational identifiability. We incorporate the interdependence of both response variables and covariates by imposing a low-rank constraint on the large coefficient matrix. To address the computation challenges posed by nonlinearity, two high-dimensional latent matrices, and the low-rank constraint, we propose a novel variational estimation scheme that combines Laplace and Taylor approximations. We also develop a criterion based on a singular value ratio to determine the number of factors and the rank of the coefficient matrix. Comprehensive simulation studies demonstrate that the proposed method outperforms the state-of-the-art methods in estimation accuracy and computational efficiency. The practical merit of our method is demonstrated by an application to the CITE-seq dataset. A flexible implementation of our proposed method is available in the R package COAP.
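A singular value ratio criterion can be illustrated generically: pick the index at which the ratio of consecutive singular values is largest. The sketch below is a common heuristic of that kind applied to a synthetic low-rank-plus-noise matrix; it is an assumption for illustration and is not claimed to be the exact criterion implemented in the COAP package. The function name and the `max_rank` cap are hypothetical.

```python
import numpy as np

def rank_by_singular_value_ratio(matrix, max_rank):
    """Return the k in 1..max_rank that maximizes sigma_k / sigma_{k+1}."""
    s = np.linalg.svd(matrix, compute_uv=False)
    ratios = s[:max_rank] / s[1:max_rank + 1]
    return int(np.argmax(ratios)) + 1

rng = np.random.default_rng(3)
true_rank = 3
# Synthetic matrix: rank-3 signal plus small Gaussian noise.
signal = rng.normal(size=(200, true_rank)) @ rng.normal(size=(true_rank, 50))
noisy = signal + 0.1 * rng.normal(size=(200, 50))

estimated_rank = rank_by_singular_value_ratio(noisy, max_rank=10)
```

The largest gap appears where the singular values drop from signal scale to noise scale, so the heuristic works best when the two are well separated.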
Topics: Poisson Distribution; Computer Simulation; Humans; Models, Statistical; Sample Size; Biometry; Factor Analysis, Statistical
PubMed: 38682464
DOI: 10.1093/biomtc/ujae031
- Heliyon Jan 2024
In this paper, we propose a novel class of distributions created by merging the Topp-Leone distribution with the generated families of Kumaraswamy and Marshall-Olkin. The family is characterized by its cumulative distribution function, which includes rational and polynomial functions. In particular, the following properties of the new family are presented: Shannon entropy, order statistics, the quantile power series, and several associated measures and functions. Then, using a specific member of the family identified earlier, we create a parametric statistical model with the inverse exponential distribution as the baseline. Finally, the new distribution is applied to three data sets: the glass fibers data set, the glass Alumina data set, and the hailing times data set. In comparison to six prominent competitors, the new model performs favorably on all statistical tests and criteria that were examined.
PubMed: 38298704
DOI: 10.1016/j.heliyon.2024.e24001
- NEJM Evidence Nov 2023
Increasingly, investigators are choosing to use Bayesian methods for the analysis of clinical trial data. Unlike classical statistical methods that treat model parameter values (such as treatment effects) as fixed, Bayesian methods view parameters as following a probability distribution. As we have written previously, by analyzing clinical trial data using Bayesian methods one can obtain quantities that may be of interest to clinicians, providers, and patients, such as the probability that a treatment effect is more or less than 0, that is, the probability that a treatment is effective.
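The quantity mentioned above, the probability that a treatment effect exceeds 0, can be read directly off posterior draws. The sketch below assumes a two-arm trial with a binary outcome, independent uniform Beta(1, 1) priors on each arm's response rate, and made-up counts; none of these choices comes from the article.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical trial results: responders out of patients in each arm.
responders_treat, n_treat = 30, 100
responders_ctrl, n_ctrl = 20, 100

# With a Beta(1, 1) prior, the posterior of each response rate is also a Beta.
draws = 100_000
rate_treat = rng.beta(1 + responders_treat, 1 + n_treat - responders_treat, size=draws)
rate_ctrl = rng.beta(1 + responders_ctrl, 1 + n_ctrl - responders_ctrl, size=draws)

# Treatment effect as a difference in response rates.
effect = rate_treat - rate_ctrl
prob_effective = np.mean(effect > 0)
```

The same posterior draws answer related questions without refitting, for example the probability that the effect exceeds a clinically meaningful margin, computed as `np.mean(effect > margin)`.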
Topics: Humans; Bayes Theorem; Probability
PubMed: 38320533
DOI: 10.1056/EVIDe2300250
- Microorganisms Nov 2023
A comprehensive overview of the recent physics-inspired genome analysis tool, GenomeBits, is presented. This is based on traditional signal processing methods such as the discrete Fourier transform (DFT). GenomeBits can be used to extract underlying genomic features from the distribution of nucleotides, and can be further used to analyze the mutation patterns in viral genomes. Examples of the main GenomeBits findings, outlining the intrinsic signal organization of genomic sequences for different SARS-CoV-2 variants over the pandemic years 2020-2022 and for Monkeypox cases in 2021, are presented to show the usefulness of GenomeBits. GenomeBits results for the DFT of SARS-CoV-2 genomes in different geographical regions are discussed, together with the GenomeBits analysis of complete genome sequences for the first coronavirus variants reported: Alpha, Beta, Gamma, Epsilon and Eta. Interesting features of the Delta and Omicron variants in the form of a unique 'order-disorder' transition are uncovered from these samples, as well as from their cumulative distribution function and scatter plots. This class of transitions might reveal the cumulative outcome of mutations on the spike protein. A salient feature of GenomeBits is the mapping of the nucleotide bases (A,T,C,G) into an alternating spin-like numerical sequence via a series having binary (0,1) indicators for each A,T,C,G. This leads to the derivation of a set of statistical distribution curves. Furthermore, the quantum-based extension of the GenomeBits model to an analogous probability measure is shown to identify properties of genome sequences as wavefunctions via a superposition of states. An association of the integral of the GenomeBits coding and a binding-like energy can, in principle, also be established. The relevance of these different results in bioinformatics is analyzed.
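One plausible reading of the mapping described above is a running sum, for each base, of a 0/1 indicator multiplied by an alternating sign +1, -1, +1, ... along the sequence. The sketch below implements that reading and takes the DFT magnitude of one resulting series. The exact GenomeBits definition should be taken from the tool's own documentation; the function name and the short sequence here are made up for illustration.

```python
import numpy as np

def alternating_indicator_sums(sequence):
    """Per-base cumulative sums of sign-alternating 0/1 indicators."""
    signs = (-1.0) ** np.arange(len(sequence))  # +1, -1, +1, ...
    sums = {}
    for base in "ATCG":
        indicator = np.array([1.0 if ch == base else 0.0 for ch in sequence])
        sums[base] = np.cumsum(signs * indicator)
    return sums

seq = "ATGCGATACGCTTGA"  # made-up sequence, not real genome data
sums = alternating_indicator_sums(seq)

# Magnitude spectrum of the series for base A via the discrete Fourier transform.
spectrum_a = np.abs(np.fft.fft(sums["A"]))
```

Because exactly one indicator is 1 at each position, the four series always add up to the running sum of the signs alone, which is a convenient sanity check on any implementation.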
PubMed: 38004745
DOI: 10.3390/microorganisms11112733
- Biometrics Dec 2023
The Dirichlet-multinomial (DM) distribution plays a fundamental role in modern statistical methodology development and application. Recently, the DM distribution and its variants have been used extensively to model multivariate count data generated by high-throughput sequencing technology in omics research, owing to their ability to accommodate the compositional structure of the data as well as overdispersion. A major limitation of the DM distribution is that it is unable to handle the excess zeros typically found in practice, which may bias inference. To fill this gap, we propose a novel Bayesian zero-inflated DM model for multivariate compositional count data with excess zeros. We then extend our approach to regression settings and embed sparsity-inducing priors to perform variable selection for high-dimensional covariate spaces. Throughout, modeling decisions are made to boost scalability without sacrificing interpretability or imposing limiting assumptions. Extensive simulations and an application to a human gut microbiome dataset are presented to compare the performance of the proposed method to existing approaches. We provide an accompanying R package with a user-friendly vignette to apply our method to other datasets.
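The overdispersion that the DM distribution accommodates can be seen by simulation: draw a probability vector from a Dirichlet, then draw counts from a multinomial with that vector. The sketch below covers only the plain DM distribution and compares it with a multinomial that has the same mean; it does not implement the zero-inflated model proposed in the paper. The concentration parameters, sample sizes, and function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_dirichlet_multinomial(alpha, n_trials, size, rng):
    """Draw count vectors: probabilities from Dirichlet(alpha), then multinomial counts."""
    probs = rng.dirichlet(alpha, size=size)
    return np.array([rng.multinomial(n_trials, p) for p in probs])

alpha = np.array([1.0, 2.0, 3.0])  # hypothetical concentration parameters
n_trials, size = 50, 5000

dm_counts = sample_dirichlet_multinomial(alpha, n_trials, size, rng)
# Multinomial with the same expected proportions, for comparison.
mn_counts = rng.multinomial(n_trials, alpha / alpha.sum(), size=size)

dm_var = dm_counts[:, 0].var()
mn_var = mn_counts[:, 0].var()
```

In theory the DM variance of each count is the multinomial variance inflated by (n + a0) / (1 + a0), where a0 is the sum of the concentration parameters; here that factor is 56 / 7 = 8.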
Topics: Humans; Models, Statistical; Bayes Theorem; Microbiota; Gastrointestinal Microbiome; Poisson Distribution
PubMed: 36896642
DOI: 10.1111/biom.13853