-
Scientific Reports Jul 2023
Among diseases, cancer exhibits the fastest global spread, presenting a substantial challenge for patients, their families, and their communities. This paper is devoted to modeling such a disease as a special case. A new distribution, the binomial-discrete Erlang-truncated exponential (BDETE), is introduced. The BDETE is a binomial mixture in which the number of trials (parameter [Formula: see text]) follows a discrete Erlang-truncated exponential distribution. A comprehensive mathematical treatment of the proposed distribution is provided, including expressions for its density, cumulative distribution function, survival function, failure rate function, quantile function, moment generating function, Shannon entropy, order statistics, and stress-strength reliability. The distribution's parameters are estimated using the maximum likelihood method. Two real-world cancer lifetime count data sets, both right-skewed and over-dispersed, are fitted with the proposed BDETE distribution to evaluate its efficacy and viability. We expect the findings to become standard works in probability theory and its related fields.
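The mixture construction described above can be illustrated with a minimal sketch. This is not the paper's BDETE: the abstract does not give the discrete Erlang-truncated exponential PMF, so `trials_pmf` below is a hypothetical geometric-type stand-in, and the names `p`, `q`, `n_max`, and `x_max` are assumptions made for illustration only.

```python
import math

def trials_pmf(n, q):
    """Hypothetical trial-count law (a stand-in, not the paper's DETE): P(N = n) = (1 - q) * q**n."""
    return (1.0 - q) * q**n

def mixture_pmf(x, p, q, n_max=400):
    """P(X = x) for X | N ~ Binomial(N, p), summing over N = x..n_max (truncated sum)."""
    return sum(
        math.comb(n, x) * p**x * (1.0 - p) ** (n - x) * trials_pmf(n, q)
        for n in range(x, n_max + 1)
    )

def mixture_summary(p, q, x_max=150):
    """Total probability, mean, and variance of the mixture on the support 0..x_max."""
    probs = [mixture_pmf(x, p, q) for x in range(x_max + 1)]
    mean = sum(x * pr for x, pr in enumerate(probs))
    var = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))
    return sum(probs), mean, var

total, mean, var = mixture_summary(p=0.5, q=0.8)
print(f"total={total:.6f} mean={mean:.4f} variance={var:.4f} dispersion={var / mean:.4f}")
```

With these illustrative values the variance exceeds the mean, which is the kind of over-dispersion the abstract attributes to the cancer count data; a plain binomial with a fixed number of trials cannot produce that.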
Topics: Humans; Reproducibility of Results; Statistical Distributions; Entropy; Neoplasms
PubMed: 37507433
DOI: 10.1038/s41598-023-38709-2
-
Mathematical Biosciences and... Sep 2022
In this work, we propose a reduced, two-parameter version of the modified Weibull distribution to avoid some estimation difficulties. The hazard rate function of the reduced distribution can be decreasing, increasing, or bathtub-shaped. The reduced distribution can be applied to many problems of modelling lifetime data. Some statistical properties of the proposed distribution are discussed. Maximum likelihood is employed to estimate the model parameters. The Fisher information matrix is derived and then used to construct confidence intervals for the parameters. A simulation is conducted to illustrate the performance of maximum likelihood estimation. Four real data sets are analyzed to demonstrate the advantages of the proposed distribution. According to the statistical criteria, the proposed distribution fits the tested data better than some well-known two- and three-parameter distributions.
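The step from Fisher information to confidence intervals can be sketched on a simpler model. The example below swaps in the one-parameter exponential distribution, not the reduced modified Weibull of the paper, because its information has a simple closed form; the function name and the lifetimes are hypothetical.

```python
import math

def exponential_mle_with_ci(times, z=1.96):
    """MLE of the exponential rate and a Wald interval from the Fisher information.

    For n observations the Fisher information is I(rate) = n / rate**2,
    so the standard error of the MLE is rate_hat / sqrt(n).
    """
    n = len(times)
    rate_hat = n / sum(times)
    std_err = rate_hat / math.sqrt(n)
    return rate_hat, (rate_hat - z * std_err, rate_hat + z * std_err)

# Hypothetical lifetimes (illustrative values, not data from the paper).
lifetimes = [0.5, 1.2, 0.3, 2.4, 0.9, 1.7, 0.6, 3.1, 1.1, 0.2]
rate_hat, (lower, upper) = exponential_mle_with_ci(lifetimes)
print(f"rate_hat={rate_hat:.4f}, 95% Wald CI=({lower:.4f}, {upper:.4f})")
```

For a multi-parameter model the same idea applies with the inverse of the information matrix in place of the scalar `rate_hat**2 / n`.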
Topics: Likelihood Functions; Computer Simulation; Statistical Distributions; Engineering
PubMed: 36654042
DOI: 10.3934/mbe.2022617
-
Cognitive Science Apr 2023
Randomized Controlled Trial
Humans can learn complex functional relationships between variables from small amounts of data. In doing so, they draw on prior expectations about the form of these relationships. In three experiments, we show that people learn to adjust these expectations through experience, learning about the likely forms of the functions they will encounter. Previous work has used Gaussian processes (a statistical framework that extends Bayesian nonparametric approaches to regression) to model human function learning. We build on this work, modeling the process of learning to learn functions as a form of hierarchical Bayesian inference about the Gaussian process hyperparameters.
Topics: Humans; Bayes Theorem; Learning; Normal Distribution; Models, Psychological
PubMed: 37051879
DOI: 10.1111/cogs.13262
-
Computational Intelligence and... 2022
In this study, a new one-parameter count distribution is proposed by combining Poisson and XLindley distributions. Some of its statistical and reliability properties, including order statistics, hazard rate function, reversed hazard rate function, mode, factorial moments, probability generating function, moment generating function, index of dispersion, Shannon entropy, Mills ratio, mean residual life function, and associated measures, are investigated. All these properties can be expressed in explicit forms. It is found that the new probability mass function can be used to model positively skewed data with a leptokurtic shape. Moreover, the new discrete distribution is considered a suitable tool for modeling equi- and over-dispersed phenomena with an increasing hazard rate function. The distribution parameter is estimated by six different estimation approaches, and the behavior of these methods is explored using Monte Carlo simulation. Finally, two real-life applications are presented to illustrate the flexibility of the new model.
Topics: Computer Simulation; Likelihood Functions; Models, Statistical; Monte Carlo Method; Poisson Distribution; Reproducibility of Results; Statistical Distributions
PubMed: 35463286
DOI: 10.1155/2022/6503670
-
Biometrical Journal. Biometrische... Oct 2022
The features in a high-dimensional biomedical prediction problem are often well described by low-dimensional latent variables (or factors). We use this to include unlabeled features and additional information on the features when building a prediction model. Such additional feature information is often available in biomedical applications. Examples are annotation of genes, metabolites, or p-values from a previous study. We employ a Bayesian factor regression model that jointly models the features and the outcome using Gaussian latent variables. We fit the model using a computationally efficient variational Bayes method, which scales to high dimensions. We use the extra information to set up a prior model for the features in terms of hyperparameters, which are then estimated through empirical Bayes. The method is demonstrated in simulations and two applications. One application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data.
Topics: Algorithms; Bayes Theorem; Normal Distribution; Research Design
PubMed: 35730912
DOI: 10.1002/bimj.202100105
-
Journal of the Mechanical Behavior of... Feb 2024
Aseptic loosening due to mechanical failure of bone cement is considered to be a leading cause of revision of joint replacement systems. Detailed quantified information on the number, size and distribution pattern of pores can help to obtain a deeper understanding of the bone cement's fatigue behavior. The objective of this study was to provide statistical descriptions for the pore distribution characteristics of laboratory bone cement specimens with different amounts of antibiotic content. For four groups of bone cement (Palacos) specimens, containing 0.3, 0.6, 1.2 and 2.4 wt/wt% of telavancin antibiotic, seven samples per group were scanned with micro-computed tomography (38.97 μm voxel size). The images were first preprocessed in Mimics and then analyzed in Dragonfly, with the threshold level set such that single-pixel pores become visible. The normalized pore volume data of the specimens were then used to extract the logarithmic histograms of the pore densities for the antibiotic groups, as well as their three-parameter Weibull probability density functions. Statistical comparison of the pore distribution data of the antibiotic groups using the Mann-Whitney non-parametric test revealed a significantly larger porosity (p < 0.05) in groups with larger added antibiotic content (2.4 and 0.6 wt/wt% vs 0.3 wt/wt%). Further analysis revealed that this effect was associated with the significantly larger frequency of micropores of 0.1-0.5 mm diameter (p < 0.05) in groups with larger antibiotic content (2.4 wt/wt% vs 0.6 and 0.3 wt/wt%), implying that the elution of the added antibiotic mainly produces micropores in this diameter range. Based on this observation and the fatigue test results in the literature, it was suggested that micropore clusters have a detrimental effect on the mechanical properties of bone cement and play a major role in initiating fatigue cracks in specimens with high antibiotic content.
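For reference, the three-parameter Weibull density mentioned above adds a location (threshold) parameter to the usual shape and scale. The sketch below writes out the standard density and distribution function; the parameter values are hypothetical and are not fitted to the study's pore data.

```python
import math

def weibull3_pdf(x, shape, scale, location):
    """Three-parameter Weibull density; returns 0 at or below the location (threshold)."""
    if x <= location:
        return 0.0
    z = (x - location) / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z**shape))

def weibull3_cdf(x, shape, scale, location):
    """Three-parameter Weibull distribution function."""
    if x <= location:
        return 0.0
    return 1.0 - math.exp(-(((x - location) / scale) ** shape))

# Hypothetical parameter values, chosen only to exercise the functions.
shape, scale, location = 1.5, 0.4, 0.05
for x in (0.1, 0.3, 0.6, 1.0):
    print(f"x={x:.1f} pdf={weibull3_pdf(x, shape, scale, location):.4f} "
          f"cdf={weibull3_cdf(x, shape, scale, location):.4f}")
```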
Topics: Animals; Polymethyl Methacrylate; Anti-Bacterial Agents; Bone Cements; Odonata; X-Ray Microtomography; Statistical Distributions
PubMed: 38100980
DOI: 10.1016/j.jmbbm.2023.106297
-
Bioinformatics (Oxford, England) Nov 2022
MOTIVATION
Gaussian graphical models (GGMs) are network representations of random variables (as nodes) and their partial correlations (as edges). GGMs overcome the challenges of high-dimensional data analysis by using shrinkage methodologies. Therefore, they have become useful to reconstruct gene regulatory networks from gene-expression profiles. However, it is often ignored that the partial correlations are 'shrunk' and cannot be compared or assessed directly. Therefore, accurate (differential) network analyses need to account for the number of variables, the sample size, and the shrinkage value; otherwise, the analysis and its biological interpretation become biased. To date, there are no appropriate methods to account for these factors and address these issues.
RESULTS
We derive the statistical properties of the partial correlation obtained with the Ledoit-Wolf shrinkage. Our result provides a toolbox for (differential) network analyses consisting of (i) confidence intervals, (ii) a test for zero partial correlation (null effects), and (iii) a test to compare partial correlations. Our novel (parametric) methods account for the number of variables, the sample size and the shrinkage values. Additionally, they are computationally fast, simple to implement and require only basic statistical knowledge. Our simulations show that the novel tests perform better than DiffNetFDR, a recently published alternative, in terms of the trade-off between true and false positives. The methods are demonstrated on synthetic data and two gene-expression datasets from Escherichia coli and Mus musculus.
AVAILABILITY AND IMPLEMENTATION
The R package with the methods and the R script with the analysis are available at https://github.com/V-Bernal/GeneNetTools.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
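The point that 'shrunk' partial correlations are not directly comparable can be seen in a three-variable sketch. It assumes shrinkage of the correlation matrix toward the identity, which scales every off-diagonal entry by (1 - lam), and it uses a hand-picked `lam` instead of the Ledoit-Wolf estimate. This is plain Python written for illustration, not the GeneNetTools API, and the correlation values are hypothetical.

```python
import math

def shrink(r, lam):
    """Shrink an off-diagonal correlation toward 0 (identity target): (1 - lam) * r."""
    return (1.0 - lam) * r

def partial_corr_12_given_3(r12, r13, r23):
    """First-order partial correlation between variables 1 and 2, controlling for 3."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13**2) * (1.0 - r23**2))

# Hypothetical pairwise correlations and shrinkage intensity.
r12, r13, r23 = 0.6, 0.5, 0.4
lam = 0.3

raw = partial_corr_12_given_3(r12, r13, r23)
shrunk = partial_corr_12_given_3(shrink(r12, lam), shrink(r13, lam), shrink(r23, lam))
print(f"unshrunk partial correlation: {raw:.4f}")
print(f"shrunk partial correlation (lam={lam}): {shrunk:.4f}")
```

The same underlying correlations give a smaller partial correlation once shrinkage is applied, so two networks estimated with different shrinkage values cannot be compared edge by edge without accounting for it.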
Topics: Mice; Animals; Normal Distribution; Gene Regulatory Networks; Sample Size; Gene Expression
PubMed: 36179082
DOI: 10.1093/bioinformatics/btac657
-
International Journal of Environmental... Jun 2023
To characterize the pollutant dispersal across major metropolitan cities in India, daily particulate matter (PM and PM) data for the study areas were collected from the National Air Quality Monitoring stations database provided by the Central Pollution Control Board (CPCB) of India. The data were analysed for three temporal ranges, i.e. before the pandemic-induced lockdown, during the lockdown, and after the lifting of lockdown restrictions. For this purpose, the time scale ranged from 1st April to 31st May for the years 2019 (pre), 2020, and 2021 (post). Statistical distributions (lognormal, Weibull, and Gamma), aerosol optical thickness, and back trajectories were assessed for all three time periods. Most cities followed the lognormal distribution for PM during the lockdown period except Mumbai and Hyderabad. For PM, all the regions followed the lognormal distribution. Delhi and Kolkata observed a maximum decline in particulate pollution of 41% and 52% for PM and 49% and 53% for PM, respectively. Air mass back trajectory suggests local transmission of air mass during the lockdown period, and an undeniable decline in aerosol optical thickness was observed from the MODIS sensor. It can be concluded that statistical distribution analysis coupled with pollution models can be a counterpart in studying the dispersal and developing pollution abatement policies for specific sites. Moreover, incorporating remote sensing in pollution studies can enhance knowledge about the origin and movement of air parcels and can be helpful in making decisions in advance.
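Fitting a lognormal distribution to concentration data, as described above, reduces to taking the mean and standard deviation of the logged values. The sketch below shows that maximum likelihood fit together with a percent-decline helper; the function names and the concentration values are hypothetical and do not come from the CPCB data.

```python
import math

def fit_lognormal(values):
    """Maximum likelihood estimates (mu, sigma) of a lognormal: moments of log(values)."""
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((lv - mu) ** 2 for lv in logs) / len(logs))
    return mu, sigma

def lognormal_loglik(values, mu, sigma):
    """Log-likelihood of the data under a lognormal(mu, sigma) model."""
    return sum(
        -math.log(v * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(v) - mu) ** 2 / (2.0 * sigma**2)
        for v in values
    )

def percent_decline(before, after):
    """Relative decline from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

# Hypothetical daily concentrations in micrograms per cubic metre.
daily = [38.0, 52.5, 41.2, 77.9, 60.3, 45.8, 95.1, 49.6]
mu, sigma = fit_lognormal(daily)
print(f"mu={mu:.4f} sigma={sigma:.4f} loglik={lognormal_loglik(daily, mu, sigma):.4f}")
```

Comparing the log-likelihood (or a criterion derived from it) across lognormal, Weibull, and gamma fits is one common way to decide which distribution a series follows.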
PubMed: 37360554
DOI: 10.1007/s13762-023-05025-1
-
PloS One 2021
Disease mapping aims to determine the underlying disease risk from scattered epidemiological data and to represent it on a smoothed colored map. This methodology is based on Bayesian inference and is classically dedicated to non-infectious diseases whose incidence is low and whose case distribution is spatially (and possibly temporally) structured. Over the last decades, disease mapping has received many major improvements that extend its scope of application: integrating the temporal dimension, dealing with missing data, taking into account various kinds of prior information (environmental and population covariates, assumptions concerning the distribution and the evolution of the risk), dealing with overdispersion, etc. We aim to adapt this approach to model rare infectious diseases by proposing specific and generic variants of this methodology. In the context of a contagious disease, a primary case can additionally generate secondary occurrences of the pathology in a close spatial and temporal neighborhood; this can result in local overdispersion and in stronger spatial and temporal dependencies due to direct and/or indirect transmission. Consequently, we test models including a negative binomial distribution (instead of the usual Poisson distribution) to deal with local overdispersion. We also use a specific spatio-temporal link to better model the stronger spatial and temporal dependencies due to the transmission of the disease. We proposed and tested 60 Bayesian hierarchical models on 400 simulated datasets and on real bovine tuberculosis data. This analysis shows the relevance of CAR (Conditional AutoRegressive) processes for dealing with the structure of the risk. We can also conclude that the negative binomial models outperform the Poisson models with Gaussian noise in handling overdispersion. In addition, our study provided relevant maps that are congruent with the real risk (simulated data) and with existing knowledge of bovine tuberculosis (real data).
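The reason a negative binomial likelihood handles overdispersed counts better than a Poisson one can be seen in a small, non-spatial sketch. It compares log-likelihoods on hypothetical clustered counts, using a method-of-moments fit for the negative binomial in place of a full maximum likelihood fit. It is not the Bayesian hierarchical CAR model of the paper, and all names and data are assumptions.

```python
import math

def poisson_loglik(counts, mean):
    """Poisson log-likelihood at the given mean."""
    return sum(x * math.log(mean) - mean - math.lgamma(x + 1) for x in counts)

def negbin_loglik(counts, r, p):
    """Negative binomial log-likelihood with size r and success probability p."""
    return sum(
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(p) + x * math.log(1.0 - p)
        for x in counts
    )

def negbin_moments_fit(counts):
    """Method-of-moments estimates (not the MLE); requires variance > mean."""
    n = len(counts)
    m = sum(counts) / n
    v = sum((x - m) ** 2 for x in counts) / n
    r = m * m / (v - m)
    p = r / (r + m)
    return r, p

# Hypothetical case counts per area: mostly zeros with a few local clusters.
cases = [0, 0, 0, 1, 0, 2, 0, 0, 7, 0, 1, 12, 0, 0, 3, 0]
mean_cases = sum(cases) / len(cases)
r, p = negbin_moments_fit(cases)
pois_ll = poisson_loglik(cases, mean_cases)
nb_ll = negbin_loglik(cases, r, p)
print(f"Poisson log-likelihood:           {pois_ll:.3f}")
print(f"Negative binomial log-likelihood: {nb_ll:.3f}")
```

The extra dispersion parameter lets the negative binomial put mass on both the many zeros and the occasional large cluster, which a single Poisson mean cannot do.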
Topics: Animals; Bayes Theorem; Binomial Distribution; Cattle; Disease; Humans; Incidence; Models, Statistical; Poisson Distribution; Tuberculosis, Bovine
PubMed: 33439868
DOI: 10.1371/journal.pone.0222898
-
PloS One 2020
One aim of data mining is the identification of interesting structures in data. For better analytical results, the basic properties of an empirical distribution, such as skewness and possible clipping (i.e., hard limits in value ranges), need to be assessed. Of particular interest is the question of whether the data originate from one process or contain subsets related to different states of the data-producing process. Data visualization tools should deliver a clear picture of the univariate probability density function (PDF) for each feature. Visualization tools for PDFs typically use kernel density estimates and include both the classical histogram and modern tools such as ridgeline plots, bean plots, and violin plots. If density estimation parameters remain at their default settings, conventional methods pose several problems when visualizing the PDF of uniform, multimodal, or skewed distributions and distributions with clipped data. For that reason, a new visualization tool called the mirrored density plot (MD plot), which is specifically designed to discover interesting structures in continuous features, is proposed. The MD plot does not require adjusting any parameters of density estimation, which may make it particularly compelling to non-experts. The visualization tools in question are evaluated against statistical tests with regard to typical challenges of explorative distribution analysis. The results of the evaluation are presented using bimodal Gaussian and skewed distributions and several features with already published PDFs. In an exploratory data analysis of 12 features describing quarterly financial statements, where statistical testing poses great difficulty, only the MD plots can identify the structure of their PDFs. In sum, the MD plot outperforms the above-mentioned methods.
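The sensitivity of conventional kernel density estimates to their parameters can be shown with a few lines of code. The sketch below implements a plain Gaussian kernel density estimate, not the MD plot or its density estimator, and evaluates it on hypothetical bimodal data with a narrow and a wide bandwidth; the function name and values are assumptions.

```python
import math

def gaussian_kde(x, data, bandwidth):
    """Gaussian kernel density estimate at x with a fixed bandwidth."""
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm

# Hypothetical bimodal sample with modes near -2 and +2.
sample = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]

narrow_at_mode = gaussian_kde(2.0, sample, bandwidth=0.5)
narrow_at_gap = gaussian_kde(0.0, sample, bandwidth=0.5)
wide_at_mode = gaussian_kde(2.0, sample, bandwidth=3.0)
wide_at_gap = gaussian_kde(0.0, sample, bandwidth=3.0)

print(f"bandwidth 0.5: density at mode {narrow_at_mode:.4f}, at gap {narrow_at_gap:.4f}")
print(f"bandwidth 3.0: density at mode {wide_at_mode:.4f}, at gap {wide_at_gap:.4f}")
```

With the narrow bandwidth the two modes are visible, while the wide bandwidth places the highest density in the gap between them and hides the bimodality. This is the kind of parameter dependence the abstract describes for conventional tools.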
Topics: Algorithms; Data Interpretation, Statistical; Data Mining; Data Visualization; Humans; Monte Carlo Method; Normal Distribution; Probability; Software; Stochastic Processes
PubMed: 33052923
DOI: 10.1371/journal.pone.0238835