CPT: Pharmacometrics & Systems Pharmacology, Nov 2021
Review
Metaheuristics are powerful optimization tools that are increasingly used across disciplines to tackle general-purpose optimization problems. Nature-inspired metaheuristic algorithms are a subclass of metaheuristic algorithms and have been shown to be particularly flexible and useful for solving complicated optimization problems in computer science and engineering. A common practice is to hybridize a metaheuristic with another suitably chosen algorithm for enhanced performance. This paper reviews metaheuristic algorithms and demonstrates some of their utility in tackling pharmacometric problems. Specifically, we provide three applications using one of their most celebrated members, particle swarm optimization (PSO), and show that PSO can effectively estimate parameters in complicated nonlinear mixed-effects models and provide insight into statistical identifiability issues in a complex compartment model. In the third application, we demonstrate how to hybridize PSO with sparse grid, an often-used technique for evaluating high-dimensional integrals, to search for D-efficient designs for estimating parameters in nonlinear mixed-effects models with a count outcome. We also show that the proposed hybrid algorithm outperforms its competitors when sparse grid is replaced by its competitor, adaptive Gaussian quadrature, to approximate the integral, or when PSO is replaced by three notable nature-inspired metaheuristic algorithms.
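For readers unfamiliar with PSO, the sketch below shows the canonical velocity and position updates on a toy objective. This is a minimal illustration, not the authors' implementation; the objective, bounds, and hyperparameters (inertia w, acceleration constants c1 and c2) are placeholder choices. In a pharmacometric setting the objective would typically be a negative log-likelihood evaluated at a candidate parameter vector.

```python
import numpy as np

def pso(f, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box via particle swarm optimization."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    dim = lo.size
    x = rng.uniform(lo, hi, (n_particles, dim))           # particle positions
    v = np.zeros_like(x)                                  # particle velocities
    pbest = x.copy()                                      # personal bests
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[pbest_f.argmin()].copy()                    # global best
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.apply_along_axis(f, 1, x)
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Toy stand-in for a model's loss surface: a 2-D quadratic with optimum (1, -2).
best_x, best_f = pso(lambda p: (p[0] - 1.0)**2 + (p[1] + 2.0)**2,
                     bounds=([-5, -5], [5, 5]))
print(best_x, best_f)
```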
Topics: Algorithms; Computer Simulation; Humans; Normal Distribution
PubMed: 34562342
DOI: 10.1002/psp4.12714
Revista Latino-Americana de Enfermagem, 2020
Randomized Controlled Trial
OBJECTIVE
To assess the effect of a breastfeeding educational intervention on the counseling provided to postpartum women.
METHOD
This is a randomized controlled trial including 104 postpartum women (intervention group = 52; control group = 52) from a private hospital. The educational intervention was based on pragmatic theory and on the use of a soft-hard technology called the Breastfeeding Educational Kit (Kit Educativo para Aleitamento Materno, KEAM). Women were followed up for up to 60 days after childbirth. The Chi-Squared Test, Fisher's Exact Test, and Generalized Estimating Equations were used, with a significance level of 5% (p < 0.05). The analyses were performed using the Statistical Package for the Social Sciences, version 24.
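As a rough illustration of the repeated-measures comparison described above (the study used SPSS; this Python sketch is not the authors' analysis), a GEE with a binomial family and exchangeable working correlation could be fitted as follows. The data frame, column names, follow-up times, and effect sizes are hypothetical placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical stand-in data: 104 women, binary exclusive-breastfeeding
# outcome (ebf) measured at three follow-up points within 60 days.
rng = np.random.default_rng(1)
n, times = 104, [7, 30, 60]
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), len(times)),
    "group": np.repeat(rng.integers(0, 2, n), len(times)),  # 0 = control, 1 = intervention
    "time": np.tile(times, n),
})
df["ebf"] = rng.binomial(1, 0.5 + 0.2 * df["group"])  # assumed effect, for illustration

# GEE accounts for the within-woman correlation of repeated binary outcomes.
model = smf.gee("ebf ~ group + time", groups="id", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())  # Wald tests judged at the 5% level
```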
RESULTS
The postpartum women in the intervention group had fewer breastfeeding difficulties and a higher percentage of exclusive breastfeeding at all time points compared with those in the control group.
CONCLUSION
The educational intervention based on active methodologies and stimulating instructional resources was effective in developing greater practical mastery among postpartum women with regard to adherence to and maintenance of exclusive breastfeeding. Registry: REBEC RBR-8p9v7v.
Topics: Breast Feeding; Chi-Square Distribution; Delivery, Obstetric; Female; Humans; Parturition; Patient Education as Topic; Postpartum Period; Pregnancy
PubMed: 33027400
DOI: 10.1590/1518-8345.3081.3335
Cognition, Oct 2022
Humans can rapidly estimate the statistical properties of groups of stimuli, including their average and variability. But recent studies of so-called Feature Distribution Learning (FDL) have shown that observers can quickly learn even more complex aspects of feature distributions. In FDL, observers learn the full shape of a distribution of features in a set of distractor stimuli and use this information to improve visual search: response times (RTs) are slowed if the target feature lies inside the previous distractor distribution, and the RT patterns closely reflect the distribution shape. FDL requires only a few trials and is markedly sensitive to different distribution types. It is unknown, however, whether our perceptual system encodes feature distributions automatically, by passive exposure, or whether this learning requires active engagement with the stimuli. In two experiments, we sought to answer this question. During an initial exposure stage, participants passively viewed a display of 36 lines that included either one orientation singleton or no singletons. In the following search display, they had to find an oddly oriented target. The orientations of the lines were determined either by a Gaussian or a uniform distribution. We found evidence for FDL only when the passive trials contained an orientation singleton. Under these conditions, RTs decreased as a function of the orientation distance between the target and the mean of the exposed distractor distribution. These results suggest that passive exposure to a distribution of visual features can affect subsequent search performance, but only if a singleton appears during exposure to the distribution.
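As a purely illustrative sketch of the stimulus logic (the distribution spread and singleton offset below are assumptions, not the paper's exact values), one exposure display can be generated like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def exposure_display(dist="gaussian", mean=0.0, spread=10.0, singleton=True):
    """36 line orientations (degrees) drawn from the distractor distribution,
    optionally with one orientation singleton inserted."""
    n = 36
    if dist == "gaussian":
        ori = rng.normal(mean, spread, n)
    else:  # uniform distribution with a comparable range
        ori = rng.uniform(mean - 2 * spread, mean + 2 * spread, n)
    if singleton:
        ori[rng.integers(n)] = mean + 90.0  # hypothetical odd-one-out offset
    return ori

# The key predictor for subsequent search RTs: distance between the search
# target's orientation and the mean of the exposed distractor distribution.
target, distractor_mean = 30.0, 0.0
print(exposure_display().round(1), abs(target - distractor_mean))
```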
Topics: Attention; Humans; Learning; Reaction Time; Statistical Distributions; Visual Perception
PubMed: 35785655
DOI: 10.1016/j.cognition.2022.105211
Journal of Epidemiology and Global Health, Jun 2021
This manuscript brings attention to inaccurate epidemiological concepts that emerged during the COVID-19 pandemic. In social media and scientific journals, incorrect references were made to a "normal epidemic curve" and to a "log-normal curve/distribution". For many years, textbooks and courses of reputable institutions and scientific journals have disseminated misleading concepts, for example, by calling plots of epidemic curves histograms, or by using epidemic data to introduce the concept of a Gaussian distribution while ignoring its temporal indexing. Although an epidemic curve may look like a Gaussian curve and may eventually be modelled by a Gauss function, it is not a normal distribution or a log-normal distribution, as some authors claim. A pandemic produces highly complex data, and to tackle them effectively, statistical and mathematical modelling needs to go beyond the "one-size-fits-all" solution. Classical textbooks need to be updated, since pandemics happen and epidemiology needs to provide reliable information for policy recommendations and actions.
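The distinction the authors draw can be made concrete with a small sketch (synthetic numbers, purely illustrative): fitting a Gauss function to a time-indexed epidemic curve is curve fitting, whereas a normality claim concerns i.i.d. samples of a random variable.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import shapiro

# An epidemic curve: daily case COUNTS indexed by time, not i.i.d. samples.
days = np.arange(60)
cases = np.round(500 * np.exp(-0.5 * ((days - 30) / 8.0) ** 2)).astype(int)

# Fitting a Gauss FUNCTION to the curve is legitimate curve fitting ...
gauss = lambda t, a, mu, s: a * np.exp(-0.5 * ((t - mu) / s) ** 2)
params, _ = curve_fit(gauss, days, cases, p0=[500, 30, 8])

# ... but it says nothing about normality of a random variable. A normality
# test applies to i.i.d. draws (e.g., onset dates of individual cases):
onsets = np.repeat(days, cases)     # one entry per case
print(shapiro(onsets[::20]))        # this sample, not the curve, is what is tested
```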
Topics: COVID-19; Epidemiologic Research Design; Humans; Models, Statistical; Normal Distribution; Pandemics; Reproducibility of Results; SARS-CoV-2
PubMed: 33605119
DOI: 10.2991/jegh.k.210108.001
BMC Bioinformatics, May 2022
BACKGROUND
Cluster algorithms are gaining in popularity in biomedical research due to their compelling ability to identify discrete subgroups in data, and their increasing accessibility in mainstream software. While guidelines exist for algorithm selection and outcome evaluation, there are no firmly established ways of computing a priori statistical power for cluster analysis. Here, we estimated power and classification accuracy for common analysis pipelines through simulation. We systematically varied subgroup size, number, separation (effect size), and covariance structure. We then subjected generated datasets to dimensionality reduction approaches (none, multi-dimensional scaling, or uniform manifold approximation and projection) and cluster algorithms (k-means, agglomerative hierarchical clustering with Ward or average linkage and Euclidean or cosine distance, HDBSCAN). Finally, we directly compared the statistical power of discrete (k-means), "fuzzy" (c-means), and finite mixture modelling approaches (which include latent class analysis and latent profile analysis).
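A minimal sketch of one arm of such a simulation pipeline (not the authors' code; the success criterion, the ARI threshold, and the way Δ is spread across features are assumptions made for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def simulate_power(n_per_group=20, delta=4.0, n_features=10, n_sims=200, seed=0):
    """Fraction of simulations in which k-means recovers two Gaussian
    subgroups whose centroids are separated by Euclidean distance delta."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        shift = np.full(n_features, delta / np.sqrt(n_features))  # total separation = delta
        X = np.vstack([rng.normal(0, 1, (n_per_group, n_features)),
                       rng.normal(shift, 1, (n_per_group, n_features))])
        truth = np.repeat([0, 1], n_per_group)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        hits += adjusted_rand_score(truth, labels) > 0.5  # assumed "success" threshold
    return hits / n_sims

print(simulate_power(delta=4.0), simulate_power(delta=2.0))
```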
RESULTS
We found that clustering outcomes were driven by large effect sizes or the accumulation of many smaller effects across features, and were mostly unaffected by differences in covariance structure. Sufficient statistical power was achieved with relatively small samples (N = 20 per subgroup), provided cluster separation was large (Δ = 4). Finally, we demonstrated that fuzzy clustering can provide a more parsimonious and powerful alternative for identifying separable multivariate normal distributions, particularly those with slightly lower centroid separation (Δ = 3).
CONCLUSIONS
Traditional intuitions about statistical power only partially apply to cluster analysis: increasing the number of participants above a sufficient sample size did not improve power, but effect size was crucial. Notably, for the popular dimensionality reduction and clustering algorithms tested here, power was only satisfactory for relatively large effect sizes (clear separation between subgroups). Fuzzy clustering provided higher power in multivariate normal distributions. Overall, we recommend that researchers (1) only apply cluster analysis when large subgroup separation is expected, (2) aim for sample sizes of N = 20 to N = 30 per expected subgroup, (3) use multi-dimensional scaling to improve cluster separation, and (4) use fuzzy clustering or mixture modelling approaches that are more powerful and more parsimonious with partially overlapping multivariate normal distributions.
Topics: Algorithms; Cluster Analysis; Humans; Normal Distribution; Sample Size; Software
PubMed: 35641905
DOI: 10.1186/s12859-022-04675-1
Medical Physics, Oct 2020
PURPOSE
Radiotherapy, especially with charged particles, is sensitive to executional and preparational uncertainties that propagate to uncertainty in dose and plan quality indicators, for example, dose-volume histograms (DVHs). Current approaches to quantify and mitigate such uncertainties rely on explicitly computed error scenarios and are thus subject to statistical uncertainty and limitations regarding the underlying uncertainty model. Here we present an alternative, analytical method to approximate moments, in particular expectation value and (co)variance, of the probability distribution of DVH-points, and evaluate its accuracy on patient data.
METHODS
We use Analytical Probabilistic Modeling (APM) to derive moments of the probability distribution over individual DVH-points based on the probability distribution over dose. By using the computed moments to parameterize distinct probability distributions over DVH-points (here normal or beta distributions), not only the moments but also percentiles, that is, α-DVHs, are computed. The model is subsequently evaluated on three patient cases (intracranial, paraspinal, prostate) in 30- and single-fraction scenarios by assuming the dose to follow a multivariate normal distribution, whose moments are computed in closed form with APM. The results are compared to a benchmark based on discrete random sampling.
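The moment-matching step described here can be sketched as follows: given an APM-computed mean and variance of a DVH volume point (the numbers below are placeholders), a beta or normal distribution is parameterized by the method of moments and the α-quantiles read off.

```python
import numpy as np
from scipy.stats import beta, norm

def alpha_dvh_point(mu, var, alphas=(0.05, 0.5, 0.95)):
    """Given mean/variance of a DVH volume point in [0, 1] (as computed by
    APM), moment-match a beta and a normal distribution and return the
    alpha-quantiles under each assumption."""
    # Beta(a, b) moments: mu = a/(a+b), var = mu*(1-mu)/(a+b+1)
    nu = mu * (1 - mu) / var - 1.0          # nu = a + b
    a, b = mu * nu, (1 - mu) * nu
    q_beta = beta.ppf(alphas, a, b)
    q_norm = norm.ppf(alphas, loc=mu, scale=np.sqrt(var))
    return q_beta, q_norm

# Placeholder moments for one DVH-point: expected volume 80%, variance 0.004.
print(alpha_dvh_point(mu=0.8, var=0.004))
```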
RESULTS
The evaluation of the new probabilistic model on the three patient cases against a sampling benchmark proves its correctness under perfect assumptions, as well as good agreement in realistic conditions. More precisely, ca. 90% of all computed expected DVH-points and their standard deviations agree within 1% volume with their empirical counterparts from sampling computations, for both fractionated and single-fraction treatments. When computing α-DVHs, the assumption of a beta distribution achieved better agreement with empirical percentiles than the assumption of a normal distribution: while in both cases probabilities locally showed large deviations (up to ±0.2), the respective α-DVHs for α = {0.05, 0.5, 0.95} only showed small deviations in respective volume (up to ±5% volume for a normal distribution, and up to ±2% for a beta distribution). A previously published model from the literature, which was included for comparison, exhibited substantially larger deviations.
CONCLUSIONS
With APM we could derive a mathematically exact description of moments of probability distributions over DVH-points given a probability distribution over dose. The model generalizes previous attempts and performs well for both choices of probability distributions, that is, normal or beta distributions, over DVH-points.
Topics: Humans; Male; Models, Statistical; Normal Distribution; Probability; Radiotherapy Dosage; Radiotherapy Planning, Computer-Assisted
PubMed: 32740930
DOI: 10.1002/mp.14414
PloS One, 2022
RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods available in open-source R and Bioconductor packages, we conducted multiple simulation studies comparing eight of them (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, and voom). The comparisons covered scenarios with equal or unequal library sizes, different distribution assumptions, and different sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. Equal versus unequal library sizes made no significant difference to FDR control, power, or stability for any method. For RNA-seq count data with a negative binomial distribution, when the sample size was 3 per group, EBSeq performed better than the other methods as indicated by FDR control, power, and stability. When sample sizes increased to 6 or 12 per group, DESeq2 performed slightly better than the other methods. All methods except DESeq improved as the sample size increased to 12 per group. For RNA-seq count data with a log-normal distribution, both DESeq and DESeq2 performed better than the other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and the stability of discoveries for each method. For RNA-seq data analysis, the EBSeq method is recommended for studies with sample sizes as small as 3 per group, and the DESeq2 method is recommended for sample sizes of 6 or more per group, when the data follow the negative binomial distribution. Both DESeq and DESeq2 are recommended when the data follow the log-normal distribution.
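As an illustration of how FDR control and power can be scored against simulation ground truth (a generic sketch, not the paper's pipeline; the Benjamini-Hochberg step and the synthetic p-value model are assumptions):

```python
import numpy as np

def fdr_and_power(pvals, is_de, q=0.05):
    """Apply Benjamini-Hochberg at level q, then score the empirical FDR and
    power against the simulation ground truth `is_de`."""
    m = len(pvals)
    order = np.argsort(pvals)
    thresh = q * np.arange(1, m + 1) / m
    passed = pvals[order] <= thresh
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    rejected = np.zeros(m, bool)
    rejected[order[:k]] = True
    fdr = (rejected & ~is_de).sum() / max(rejected.sum(), 1)
    power = (rejected & is_de).sum() / max(is_de.sum(), 1)
    return fdr, power

# Toy check with synthetic p-values: 100 truly DE genes among 1000.
rng = np.random.default_rng(0)
is_de = np.arange(1000) < 100
pvals = np.where(is_de, rng.beta(0.1, 1, 1000), rng.uniform(0, 1, 1000))
print(fdr_and_power(pvals, is_de))
```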
Topics: Binomial Distribution; High-Throughput Nucleotide Sequencing; RNA-Seq; Sample Size; Sequence Analysis, RNA
PubMed: 36112652
DOI: 10.1371/journal.pone.0264246
PloS One, 2023
Testing whether data come from a normal distribution is a classical problem and of great concern in data analysis. Normality is the premise of many statistical methods, such as the t-test, Hotelling's T² test, and ANOVA. There are numerous tests in the literature; the commonly used ones are the Anderson-Darling test, the Shapiro-Wilk test, and the Jarque-Bera test. Each test has its own advantages, since each was developed for specific patterns of departure, and no method performs optimally in all situations. Since the data distributions of practical problems can be complex and diverse, we propose a Cauchy Combination Omnibus Test (CCOT) that is robust and valid in most data settings. We also give some theoretical results establishing the good properties of CCOT. Two clear advantages of CCOT are that it has an explicit expression for calculating statistical significance, and that extensive simulation results show its robustness regardless of the shape of the distribution the data come from. Applications to South African Heart Disease and Neonatal Hearing Impairment data further illustrate its practicability.
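The Cauchy combination principle behind CCOT admits a compact sketch. The component tests below (Shapiro-Wilk, Jarque-Bera, D'Agostino) are assumptions standing in for whatever the authors combine; the combination rule itself is the standard one, T = (1/k) Σ tan((0.5 − p_i)π), with the combined p-value read from the standard Cauchy tail.

```python
import numpy as np
from scipy import stats

def cauchy_combination_normality(x):
    """Combine several normality-test p-values with the Cauchy combination
    rule. Component tests are illustrative, not necessarily the paper's."""
    pvals = np.array([stats.shapiro(x).pvalue,
                      stats.jarque_bera(x).pvalue,
                      stats.normaltest(x).pvalue])
    t = np.mean(np.tan((0.5 - pvals) * np.pi))  # ~ standard Cauchy under H0
    return stats.cauchy.sf(t)                    # combined p-value

rng = np.random.default_rng(0)
print(cauchy_combination_normality(rng.normal(size=200)))       # large p expected
print(cauchy_combination_normality(rng.exponential(size=200)))  # small p expected
```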
Topics: Computer Simulation; Normal Distribution; Sample Size; Data Analysis
PubMed: 37535617
DOI: 10.1371/journal.pone.0289498
BMC Medical Research Methodology, Oct 2022
BACKGROUND
Count data from national surveys capture healthcare utilisation within a specific reference period, resulting in excess zeros and skewed positive tails; such data are often modelled using count data models. This study aims to identify the best-fitting model for outpatient healthcare utilisation, using data from the Malaysian National Health and Morbidity Survey 2019 (NHMS 2019), and to identify utilisation factors among adults in Malaysia.
METHODS
The frequency of outpatient visits is the dependent variable, and instrumental variable selection is based on Andersen's model. Six models were compared: ordinary least squares (OLS) regression, Poisson regression, negative binomial (NB) regression, two inflated models (zero-inflated Poisson (ZIP) and marginalized zero-inflated negative binomial (MZINB)), and a hurdle model. The best-fitting model was identified based on model selection criteria, goodness of fit, and statistical tests of the factors associated with outpatient visits.
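A hedged sketch of the zero-inflated comparison in Python (the study used Stata; the variable names and the data-generating process below are hypothetical stand-ins):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

# Synthetic stand-in for survey-like visit counts with ~90% zeros.
rng = np.random.default_rng(0)
n = 5000
income = rng.integers(1, 6, n)                    # e.g., household income quintile
visits = np.where(rng.random(n) < 0.9, 0,         # structural (excess) zeros
                  rng.poisson(1 + 0.1 * income))  # count process for utilisers
X = sm.add_constant(pd.DataFrame({"income": income}))

# Zero-inflated Poisson vs plain Poisson, compared by information criteria.
zip_fit = ZeroInflatedPoisson(visits, X, exog_infl=X, inflation="logit").fit(disp=0)
poisson_fit = sm.Poisson(visits, X).fit(disp=0)
print("ZIP AIC:", zip_fit.aic, " Poisson AIC:", poisson_fit.aic)
```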
RESULTS
The frequency of zeros was 90%. Of the sample, 8.35% of adults utilized healthcare services only once, and 1.04% utilized them twice. The mean-to-variance values varied between 0.14 and 0.39. Across the six models, the zero-inflated model (ZIM) had the smallest log-likelihood, Akaike information criterion, and Bayesian information criterion, and a positive Vuong corrected value. Fourteen instrumental variables were identified: five predisposing factors, six enablers, and three need factors. Overdispersion in the data is characterized by excess zeros, a large mean-to-variance value, and skewed positive tails. We assumed frequency and true zeros throughout the study reference period. ZIM was the best-fitting model based on the model selection criteria, the smallest root mean square error (RMSE), and the highest R². Both corrected and uncorrected Vuong values, obtained with different Stata commands, were positive, with small differences between them.
CONCLUSION
State as a place of residence, ethnicity, household income quintile, and health needs were significantly associated with healthcare utilisation. Our findings suggest using ZIM over traditional OLS. This study encourages the use of this count data model as it has a better fit, is easy to interpret, and has appropriate assumptions based on the survey methodology.
Topics: Adult; Ambulatory Care; Bayes Theorem; Humans; Models, Statistical; Outpatients; Patient Acceptance of Health Care; Poisson Distribution
PubMed: 36199028
DOI: 10.1186/s12874-022-01733-3
Statistical Methods in Medical Research, Jan 2023
Review
We revisit several conditionally formulated Gaussian Markov random fields, known as the intrinsic conditional autoregressive model, the proper conditional autoregressive model, and the Leroux et al. conditional autoregressive model, as well as convolution models such as the well-known Besag, York and Mollié model, its (adaptive) re-parameterization, and its scaled alternatives, for their roles in modelling underlying spatial risks in Bayesian disease mapping. Analytic and simulation studies, with graphic visualizations, and disease mapping case studies present insights into and critiques of these models, examining their nature and capacity to characterize spatial dependencies, local influences, and spatial covariance and correlation functions, and to facilitate stabilized and efficient posterior risk prediction and inference. It is illustrated that these models are Gaussian (Markov) random fields with different spatial dependence, local influence, and covariance (correlation) functions, and that they can play different and complementary roles in Bayesian disease mapping applications.
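The three conditional formulations reviewed here differ only in their precision matrices, which can be written down directly (a sketch of the standard forms, with the overall precision scale τ omitted and ρ, λ as placeholder values):

```python
import numpy as np

def car_precisions(W, rho=0.9, lam=0.9):
    """Precision matrices of three CAR priors on a graph with adjacency W.
    Standard textbook forms; the precision scale tau is omitted."""
    D = np.diag(W.sum(axis=1))                              # neighbour counts
    Q_icar = D - W                                          # intrinsic CAR (singular)
    Q_proper = D - rho * W                                  # proper CAR
    Q_leroux = lam * (D - W) + (1 - lam) * np.eye(len(W))   # Leroux et al. CAR
    return Q_icar, Q_proper, Q_leroux

# Four regions on a line: 1-2-3-4.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
for Q in car_precisions(W):
    print(np.round(Q, 2))
```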
Topics: Bayes Theorem; Computer Simulation; Normal Distribution; Spatial Analysis; Models, Statistical
PubMed: 36317373
DOI: 10.1177/09622802221129040