Annals of Cardiac Anaesthesia, 2019
Descriptive statistics are an important part of biomedical research and are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Measures of central tendency and dispersion are used to describe quantitative data. For continuous data, testing for normality is an important step in deciding which measures of central tendency and which statistical methods should be used for data analysis. When the data follow a normal distribution, parametric tests are used to compare groups; otherwise, nonparametric methods are used. There are several methods for testing the normality of data, including numerical and visual methods, and each has its own advantages and disadvantages. In the present study, we discuss the summary measures and the methods used to test the normality of data.
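As a minimal illustration of the workflow this abstract describes (a sketch of my own, not code from the paper; the choice of the Shapiro-Wilk test and the alpha threshold are assumptions), the following tests two continuous samples for normality and then selects a parametric or nonparametric group comparison with scipy.stats.

```python
import numpy as np
from scipy import stats

def compare_groups(x, y, alpha=0.05):
    """Test both samples for normality, then pick a two-sample test.

    If both groups are plausibly normal (Shapiro-Wilk p >= alpha),
    summarize with mean +/- SD and use an independent-samples t-test;
    otherwise summarize with median (IQR) and use the Mann-Whitney U test.
    """
    normal = all(stats.shapiro(g).pvalue >= alpha for g in (x, y))
    if normal:
        stat, p = stats.ttest_ind(x, y)
        summary = [(np.mean(g), np.std(g, ddof=1)) for g in (x, y)]
        return "t-test", stat, p, summary
    stat, p = stats.mannwhitneyu(x, y)
    summary = [(np.median(g), np.percentile(g, 75) - np.percentile(g, 25))
               for g in (x, y)]
    return "Mann-Whitney U", stat, p, summary

rng = np.random.default_rng(0)
print(compare_groups(rng.normal(5, 1, 40), rng.normal(5.5, 1, 40)))
```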
Topics: Biomedical Research; Data Interpretation, Statistical; Humans; Normal Distribution
PubMed: 30648682
DOI: 10.4103/aca.ACA_157_18
CPT: Pharmacometrics & Systems Pharmacology, Nov 2021
Review
Metaheuristics are powerful optimization tools that are increasingly used across disciplines to tackle general-purpose optimization problems. Nature-inspired metaheuristic algorithms are a subclass of metaheuristic algorithms and have been shown to be particularly flexible and useful for solving complicated optimization problems in computer science and engineering. A common practice is to hybridize a metaheuristic with another suitably chosen algorithm for enhanced performance. This paper reviews metaheuristic algorithms and demonstrates some of their utility in tackling pharmacometric problems. Specifically, we provide three applications using one of their most celebrated members, particle swarm optimization (PSO), and show that PSO can effectively estimate parameters in complicated nonlinear mixed-effects models and provide insight into statistical identifiability issues in a complex compartment model. In the third application, we demonstrate how to hybridize PSO with sparse grid, an often-used technique for evaluating high-dimensional integrals, to search for -efficient designs for estimating parameters in nonlinear mixed-effects models with a count outcome. We also show that the proposed hybrid algorithm outperforms its competitors when sparse grid is replaced by a competing integration technique, adaptive Gaussian quadrature, or when PSO is replaced by three notable nature-inspired metaheuristic algorithms.
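PSO itself is a generic, population-based search. Below is a minimal textbook-style sketch of the algorithm (not the authors' implementation; the inertia and acceleration constants and the mono-exponential objective are illustrative assumptions) applied to a simple least-squares curve-fitting problem.

```python
import numpy as np

def pso(objective, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization over box-constrained parameters."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(lo)
    x = rng.uniform(lo, hi, size=(n_particles, dim))          # positions
    v = np.zeros_like(x)                                      # velocities
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, pbest_f.min()

# Illustrative objective: fit a mono-exponential decay C(t) = A * exp(-k*t).
t = np.linspace(0.5, 12, 10)
obs = 100 * np.exp(-0.3 * t) + np.random.default_rng(1).normal(0, 1, t.size)
obj = lambda p: np.sum((obs - p[0] * np.exp(-p[1] * t)) ** 2)
print(pso(obj, np.array([[10.0, 200.0], [0.01, 2.0]])))
```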
Topics: Algorithms; Computer Simulation; Humans; Normal Distribution
PubMed: 34562342
DOI: 10.1002/psp4.12714
Biostatistics (Oxford, England), Apr 2018
Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume that global differences in the distribution are induced only by technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, while allowing distributions to differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
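qsmooth itself is provided as an R package at the link above. As a simplified, package-independent illustration of the underlying idea, the sketch below (my own minimal example, not the qsmooth algorithm, which smooths between group-level and global reference quantiles) performs ordinary quantile normalization separately within each biological group, so distributions are forced to match within groups but may differ between groups.

```python
import numpy as np

def quantile_normalize_within_groups(counts, groups):
    """Quantile-normalize a features x samples matrix within each group.

    Every sample in a group is mapped onto that group's mean sorted
    profile, so distributions match within groups but can differ
    between groups (a crude stand-in for the smoothing done by qsmooth).
    """
    out = np.empty_like(counts, dtype=float)
    for g in np.unique(groups):
        cols = np.where(groups == g)[0]
        block = counts[:, cols].astype(float)
        ranks = block.argsort(axis=0).argsort(axis=0)        # per-sample ranks
        reference = np.sort(block, axis=0).mean(axis=1)      # group reference quantiles
        out[:, cols] = reference[ranks]
    return out

rng = np.random.default_rng(0)
counts = rng.poisson(lam=np.array([5, 20])[None, :].repeat(3, axis=1), size=(1000, 6))
groups = np.array(["A", "A", "A", "B", "B", "B"])
print(quantile_normalize_within_groups(counts, groups)[:3])
```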
Topics: Biostatistics; Data Interpretation, Statistical; Genomics; High-Throughput Nucleotide Sequencing; Humans; Models, Statistical
PubMed: 29036413
DOI: 10.1093/biostatistics/kxx028
Cognition, Oct 2022
Humans can rapidly estimate the statistical properties of groups of stimuli, including their average and variability. But recent studies of so-called Feature Distribution Learning (FDL) have shown that observers can quickly learn even more complex aspects of feature distributions. In FDL, observers learn the full shape of a distribution of features in a set of distractor stimuli and use this information to improve visual search: response times (RTs) are slowed if the target feature lies inside the previous distractor distribution, and the RT patterns closely reflect the distribution shape. FDL requires only a few trials and is markedly sensitive to different distribution types. It is unknown, however, whether our perceptual system encodes feature distributions automatically and by passive exposure, or whether this learning requires active engagement with the stimuli. In two experiments, we sought to answer this question. During an initial exposure stage, participants passively viewed a display of 36 lines that included one orientation singleton or no singletons. In the following search display, they had to find an oddly oriented target. The orientations of the lines were determined by either a Gaussian or a uniform distribution. We found evidence for FDL only when the passive trials contained an orientation singleton. Under these conditions, RTs decreased as a function of the orientation distance between the target and the mean of the exposed distractor distribution. These results suggest that passive exposure to a distribution of visual features can affect subsequent search performance, but only if a singleton appears during exposure to the distribution.
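As a small sketch of the stimulus manipulation described above (illustrative only; the mean, spread, and wrapping convention are assumptions rather than the paper's parameters), the following samples 36 distractor line orientations from either a Gaussian or a uniform distribution.

```python
import numpy as np

def distractor_orientations(mean_deg, dist="gaussian", n=36, sd=10.0, half_range=20.0, seed=None):
    """Sample distractor line orientations (degrees) for one display.

    dist="gaussian": normal with the given SD around mean_deg.
    dist="uniform":  uniform on [mean_deg - half_range, mean_deg + half_range].
    Values are wrapped into [0, 180) since line orientation is periodic.
    """
    rng = np.random.default_rng(seed)
    if dist == "gaussian":
        ori = rng.normal(mean_deg, sd, n)
    else:
        ori = rng.uniform(mean_deg - half_range, mean_deg + half_range, n)
    return np.mod(ori, 180.0)

print(distractor_orientations(90, "gaussian", seed=0)[:5])
print(distractor_orientations(90, "uniform", seed=0)[:5])
```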
Topics: Attention; Humans; Learning; Reaction Time; Statistical Distributions; Visual Perception
PubMed: 35785655
DOI: 10.1016/j.cognition.2022.105211
Caries Research, 2018
Review
Oral epidemiology involves studying and investigating the distribution and determinants of dental-related diseases in a specified population group to inform decisions in the management of health problems. In oral epidemiology studies, the hypothesis is typically followed by a cogent study design and data collection. Appropriate statistical analysis is essential to demonstrate the scientific association between the independent factors and the target variable, and it helps to develop and refine a statistical model. Poisson regression and its extensions have gained more attention in caries epidemiology than other working models such as logistic regression. This review discusses the fundamental principles and basic knowledge of Poisson regression models. It also introduces the use of a robust variance estimator, with a focus on the "robust" interpretation of the model. In addition, extensions of the regression model, including the zero-inflated model, the hurdle model, and the negative binomial model, and their interpretation in caries studies are reviewed. Principles of model fitting, including goodness-of-fit measures, are also discussed. Clinicians and researchers should pay attention to the statistical context of the models used and interpret the models appropriately to improve the oral and general health of the communities in which they live.
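As a generic illustration of the modelling approach this review covers (a sketch with simulated data and assumed covariate names, not an analysis from the paper), the following fits a Poisson regression to a simulated caries-count outcome with statsmodels and requests a robust sandwich variance estimator.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
sugar = rng.normal(0, 1, n)            # standardized sugar-intake score (simulated)
fluoride = rng.binomial(1, 0.5, n)     # exposure to fluoridated water (simulated)
mu = np.exp(0.5 + 0.4 * sugar - 0.3 * fluoride)
dmft = rng.poisson(mu)                 # simulated caries count (e.g., a DMFT-like index)

X = sm.add_constant(np.column_stack([sugar, fluoride]))
model = sm.GLM(dmft, X, family=sm.families.Poisson())
robust_fit = model.fit(cov_type="HC0")  # robust "sandwich" variance estimator
print(robust_fit.summary())
# Exponentiated coefficients are interpreted as rate ratios.
print(np.exp(robust_fit.params))
```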
Topics: Data Interpretation, Statistical; Dental Caries; Humans; Poisson Distribution; Regression Analysis
PubMed: 29478049
DOI: 10.1159/000486970
Biometrical Journal. Biometrische Zeitschrift, Nov 2018
Review
Meta-analysis is a widely used statistical technique. The simplicity of the calculations required when performing conventional meta-analyses belies the parametric nature of the assumptions that justify them. In particular, the normal distribution is extensively, and often implicitly, assumed. Here, we review how the normal distribution is used in meta-analysis. We discuss when the normal distribution is likely to be adequate and also when it should be avoided. We discuss alternative and more advanced methods that make less use of the normal distribution. We conclude that statistical methods that make fewer normality assumptions should be considered more often in practice. In general, statisticians and applied analysts should understand the assumptions made by their statistical analyses. They should also be able to defend these assumptions. Our hope is that this article will foster a greater appreciation of the extent to which assumptions involving the normal distribution are made in statistical methods for meta-analysis. We also hope that this article will stimulate further discussion and methodological work.
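To make the normality assumptions discussed here concrete, the following sketch (with made-up effect estimates, not data from the article) pools study-level log odds ratios using a DerSimonian-Laird random-effects model; normality is assumed both for the within-study estimates and for the distribution of true study effects, and again when forming the confidence interval.

```python
import numpy as np
from scipy import stats

# Made-up study-level log odds ratios and their standard errors.
yi = np.array([0.30, 0.10, 0.55, -0.05, 0.40])
sei = np.array([0.15, 0.20, 0.25, 0.18, 0.30])

# Fixed-effect weights assume yi ~ Normal(theta, sei^2).
w = 1.0 / sei**2
theta_fe = np.sum(w * yi) / np.sum(w)

# DerSimonian-Laird estimate of between-study variance tau^2
# (assumes the true study effects are themselves normally distributed).
Q = np.sum(w * (yi - theta_fe) ** 2)
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (len(yi) - 1)) / c)

w_re = 1.0 / (sei**2 + tau2)
theta_re = np.sum(w_re * yi) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))

# Normal-based 95% confidence interval for the pooled effect.
z = stats.norm.ppf(0.975)
print(f"pooled log OR = {theta_re:.3f}, "
      f"95% CI ({theta_re - z*se_re:.3f}, {theta_re + z*se_re:.3f}), tau^2 = {tau2:.3f}")
```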
Topics: Aversive Therapy; C-Reactive Protein; Humans; Meta-Analysis as Topic; Normal Distribution; Smoking Cessation; Statistics as Topic
PubMed: 30062789
DOI: 10.1002/bimj.201800071
BMC Bioinformatics, May 2022
BACKGROUND
Cluster algorithms are gaining in popularity in biomedical research due to their compelling ability to identify discrete subgroups in data, and their increasing accessibility in mainstream software. While guidelines exist for algorithm selection and outcome evaluation, there are no firmly established ways of computing a priori statistical power for cluster analysis. Here, we estimated power and classification accuracy for common analysis pipelines through simulation. We systematically varied subgroup size, number, separation (effect size), and covariance structure. We then subjected generated datasets to dimensionality reduction approaches (none, multi-dimensional scaling, or uniform manifold approximation and projection) and cluster algorithms (k-means, agglomerative hierarchical clustering with Ward or average linkage and Euclidean or cosine distance, HDBSCAN). Finally, we directly compared the statistical power of discrete (k-means), "fuzzy" (c-means), and finite mixture modelling approaches (which include latent class analysis and latent profile analysis).
RESULTS
We found that clustering outcomes were driven by large effect sizes or the accumulation of many smaller effects across features, and were mostly unaffected by differences in covariance structure. Sufficient statistical power was achieved with relatively small samples (N = 20 per subgroup), provided cluster separation was large (Δ = 4). Finally, we demonstrated that fuzzy clustering can provide a more parsimonious and powerful alternative for identifying separable multivariate normal distributions, particularly those with slightly lower centroid separation (Δ = 3).
CONCLUSIONS
Traditional intuitions about statistical power only partially apply to cluster analysis: increasing the number of participants above a sufficient sample size did not improve power, but effect size was crucial. Notably, for the popular dimensionality reduction and clustering algorithms tested here, power was only satisfactory for relatively large effect sizes (clear separation between subgroups). Fuzzy clustering provided higher power in multivariate normal distributions. Overall, we recommend that researchers (1) only apply cluster analysis when large subgroup separation is expected, (2) aim for sample sizes of N = 20 to N = 30 per expected subgroup, (3) use multi-dimensional scaling to improve cluster separation, and (4) use fuzzy clustering or mixture modelling approaches that are more powerful and more parsimonious with partially overlapping multivariate normal distributions.
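As a reduced illustration of the simulation logic behind these results (a simplified sketch of my own, not the authors' pipeline; the success threshold, effect size, and feature count are assumptions), the following generates two multivariate normal subgroups at a given centroid separation, clusters them with k-means, and scores subgroup recovery with the adjusted Rand index across replicates.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def cluster_recovery(n_per_group=20, n_features=10, delta=4.0, n_reps=200, seed=0):
    """Proportion of replicates in which k-means recovers two simulated subgroups well."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        # Two spherical multivariate normal subgroups separated by `delta`
        # along the first feature (all other features are pure noise).
        shift = np.zeros(n_features)
        shift[0] = delta
        a = rng.normal(0, 1, (n_per_group, n_features))
        b = rng.normal(0, 1, (n_per_group, n_features)) + shift
        X = np.vstack([a, b])
        labels = np.repeat([0, 1], n_per_group)
        pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        hits += adjusted_rand_score(labels, pred) >= 0.5   # arbitrary "success" threshold
    return hits / n_reps

print(cluster_recovery(delta=4.0), cluster_recovery(delta=1.0))
```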
Topics: Algorithms; Cluster Analysis; Humans; Normal Distribution; Sample Size; Software
PubMed: 35641905
DOI: 10.1186/s12859-022-04675-1
PloS One, 2022
RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods available in open-source R and Bioconductor packages, we conducted multiple simulation studies comparing the performance of eight methods (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons spanned scenarios with equal or unequal library sizes and different distribution assumptions and sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. Whether library sizes were equal or unequal had little effect on FDR control, power, or stability for any method. For RNA-seq count data with a negative binomial distribution, when the sample size was 3 per group, EBSeq performed better than the other methods as indicated by FDR control, power, and stability. When sample sizes increased to 6 or 12 per group, DESeq2 performed slightly better than the other methods. All methods except DESeq showed improved performance when the sample size increased to 12 per group. For RNA-seq count data with a log-normal distribution, both DESeq and DESeq2 performed better than the other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and the stability of discoveries for each method. For RNA-seq data analysis, the EBSeq method is recommended for studies with sample sizes as small as 3 per group, and the DESeq2 method is recommended for sample sizes of 6 or more per group, when the data follow a negative binomial distribution. Both DESeq and DESeq2 are recommended when the data follow a log-normal distribution.
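The methods compared above are R/Bioconductor packages; as a toy, package-independent illustration of the kind of simulation involved (assumed dispersion, fold change, and a naive per-gene test rather than any of the listed methods), the following simulates negative binomial counts for two groups and estimates empirical FDR and power after Benjamini-Hochberg correction.

```python
import numpy as np
from scipy import stats

def simulate_and_test(n_genes=2000, n_de=200, n_per_group=6, dispersion=0.2,
                      fold_change=2.0, alpha=0.05, seed=0):
    """Simulate NB counts, test each gene, and report empirical FDR and power."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(20, 200, n_genes)                    # baseline means per gene
    mu_b = base.copy()
    mu_b[:n_de] *= fold_change                              # first n_de genes are truly DE

    def nb_sample(mu, size):
        # Negative binomial parameterized by mean and dispersion (var = mu + disp*mu^2).
        r = 1.0 / dispersion
        p = r / (r + mu)
        return rng.negative_binomial(r, p, size=size)

    grp_a = np.column_stack([nb_sample(base, n_genes) for _ in range(n_per_group)])
    grp_b = np.column_stack([nb_sample(mu_b, n_genes) for _ in range(n_per_group)])

    # Naive per-gene test on log counts (a stand-in for edgeR/DESeq2-style models).
    pvals = stats.ttest_ind(np.log1p(grp_a), np.log1p(grp_b), axis=1).pvalue

    # Benjamini-Hochberg adjustment.
    order = np.argsort(pvals)
    ranked = pvals[order] * n_genes / (np.arange(n_genes) + 1)
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    rejected = np.zeros(n_genes, dtype=bool)
    rejected[order] = adj <= alpha

    truth = np.arange(n_genes) < n_de
    fdr = np.sum(rejected & ~truth) / max(1, np.sum(rejected))
    power = np.sum(rejected & truth) / n_de
    return fdr, power

print(simulate_and_test())
```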
Topics: Binomial Distribution; High-Throughput Nucleotide Sequencing; RNA-Seq; Sample Size; Sequence Analysis, RNA
PubMed: 36112652
DOI: 10.1371/journal.pone.0264246
PloS One, 2023
Testing whether data come from a normal distribution is a classical problem and is of great concern in data analysis. Normality is the premise of many statistical methods, such as the t-test, Hotelling's T2 test, and ANOVA. There are numerous tests in the literature; the commonly used ones are the Anderson-Darling test, the Shapiro-Wilk test, and the Jarque-Bera test. Each test has its own strengths, since each was developed for specific departure patterns, and no method performs optimally in all situations. Because the data distributions arising in practical problems can be complex and diverse, we propose a Cauchy Combination Omnibus Test (CCOT) that is robust and valid in most settings. We also give theoretical results analyzing the favorable properties of CCOT. Two clear advantages of CCOT are that it has an explicit expression for calculating statistical significance and that extensive simulation results show its robustness regardless of the shape of the distribution from which the data come. Applications to South African Heart Disease and Neonatal Hearing Impairment data further illustrate its practicability.
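The paper's CCOT statistic is not reproduced here, but the Cauchy combination device it builds on can be sketched as follows (equal weights and the particular component tests are assumptions of this illustration, not the authors' exact construction; scipy's Anderson-Darling routine does not return a p-value, so the D'Agostino-Pearson test stands in as the third component).

```python
import numpy as np
from scipy import stats

def cauchy_combination_pvalue(pvals, weights=None):
    """Combine p-values with the Cauchy combination rule.

    T = sum_i w_i * tan((0.5 - p_i) * pi); with equal weights summing to 1,
    T is approximately standard Cauchy under the null, so the combined
    p-value is 1/2 - arctan(T)/pi.
    """
    pvals = np.asarray(pvals, dtype=float)
    if weights is None:
        weights = np.full(pvals.shape, 1.0 / pvals.size)
    t = np.sum(weights * np.tan((0.5 - pvals) * np.pi))
    return 0.5 - np.arctan(t) / np.pi

def omnibus_normality_pvalue(x):
    """Combine Shapiro-Wilk, Jarque-Bera, and D'Agostino-Pearson p-values."""
    p_sw = stats.shapiro(x).pvalue
    p_jb = stats.jarque_bera(x).pvalue
    p_dp = stats.normaltest(x).pvalue
    return cauchy_combination_pvalue([p_sw, p_jb, p_dp])

rng = np.random.default_rng(0)
print(omnibus_normality_pvalue(rng.normal(size=200)))       # should be large
print(omnibus_normality_pvalue(rng.exponential(size=200)))  # should be tiny
```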
Topics: Computer Simulation; Normal Distribution; Sample Size; Data Analysis
PubMed: 37535617
DOI: 10.1371/journal.pone.0289498
Nature Communications, Apr 2023
Mass spectrometry imaging promises to enable simultaneous, spatially resolved investigation of hundreds of metabolites in tissues, but it primarily relies on traditional ion images for non-data-driven metabolite visualization and analysis. The rendering and interpretation of ion images neither considers nonlinearities in the resolving power of mass spectrometers nor evaluates the statistical significance of differential spatial metabolite abundance. Here, we outline the computational framework moleculaR ( https://github.com/CeMOS-Mannheim/moleculaR ), which is expected to improve signal reliability by data-dependent Gaussian weighting of ion intensities and which introduces probabilistic molecular mapping of statistically significant, nonrandom patterns of relative spatial abundance of metabolites of interest in tissue. moleculaR also enables cross-tissue statistical comparisons and collective molecular projections of entire biomolecular ensembles, followed by evaluation of their spatial statistical significance on a single tissue plane. It thereby fosters the spatially resolved investigation of ion milieus, lipid remodeling pathways, and complex scores such as the adenylate energy charge within the same image.
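moleculaR itself is an R package at the repository linked above; as a rough, package-independent sketch of the Gaussian-weighting idea (the resolving power, m/z value, and exact weighting form are assumptions, not the package's implementation), the following down-weights peak intensities by their distance from a theoretical m/z, with the Gaussian width derived from the instrument FWHM at that mass.

```python
import numpy as np

def gaussian_weighted_intensity(mz_peaks, intensities, mz_theoretical, resolving_power=140000):
    """Weight peak intensities by a Gaussian centered on a theoretical m/z.

    The FWHM at mz_theoretical is mz / resolving_power; peaks far from the
    theoretical mass contribute little, which damps noise and interference.
    """
    fwhm = mz_theoretical / resolving_power
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    weights = np.exp(-0.5 * ((mz_peaks - mz_theoretical) / sigma) ** 2)
    return np.sum(weights * intensities)

# Toy spectrum around a hypothetical lipid at m/z 760.5851.
mz = np.array([760.5839, 760.5850, 760.5862, 760.6100])
inten = np.array([120.0, 900.0, 150.0, 400.0])
print(gaussian_weighted_intensity(mz, inten, 760.5851))
```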
Topics: Reproducibility of Results; Mass Spectrometry; Diagnostic Imaging; Normal Distribution
PubMed: 37005414
DOI: 10.1038/s41467-023-37394-z