Viruses, Dec 2023
Throughout the COVID-19 pandemic, an unprecedented level of clinical nasal swab data from around the globe has been collected and shared. Positive tests have consistently revealed viral titers spanning six orders of magnitude! An open question is whether such extreme population heterogeneity is unique to SARS-CoV-2 or possibly generic to viral respiratory infections. To probe this question, we turn to the computational modeling of nasal tract infections. Employing a physiologically faithful, spatially resolved, stochastic model of respiratory tract infection, we explore the statistical distribution of human nasal infections in the immediate 48 h of infection. The spread, or heterogeneity, of the distribution derives from variations in factors within the model that are unique to the infected host, infectious variant, and timing of the test. Hypothetical factors include: (1) reported physiological differences between infected individuals (nasal mucus thickness and clearance velocity); (2) differences in the kinetics of infection, replication, and shedding of viral RNA copies arising from the unique interactions between the host and viral variant; and (3) differences in the time between initial cell infection and the clinical test. Since positive clinical tests are often pre-symptomatic and independent of prior infection or vaccination status, in the model we assume immune evasion throughout the immediate 48 h of infection. Model simulations generate the mean statistical outcomes of total shed viral load and infected cells throughout 48 h for each "virtual individual", which we define as each fixed set of model parameters (1) and (2) above. The "virtual population" and the statistical distribution of outcomes over the population are defined by collecting clinically and experimentally guided ranges for the full set of model parameters (1) and (2). 
This establishes a model-generated "virtual population database" of nasal viral titers throughout the initial 48 h of infection of every individual, which we then compare with clinical swab test data. Support for model efficacy comes from sampling infection dynamics over the virtual population database, which reproduces the six-order-of-magnitude clinical population heterogeneity. However, the goal of this study is to answer a deeper biological and clinical question. To answer it, global data analysis methods are applied to the virtual population database that sample across the entire database and de-correlate (i.e., isolate) the dynamic infection outcome sensitivities of each model parameter. These methods predict that the dominant, indeed exponential, driver of population heterogeneity in dynamic infection outcomes is the latency time of infected cells (from the moment of infection until the onset of viral RNA shedding). The shedding rate of viral RNA from infected cells in the shedding phase is a strong, but not exponential, driver of infection. Furthermore, the unknown timing of the nasal swab test relative to the onset of infection is an equally dominant contributor to extreme population heterogeneity in clinical test data, since infectious viral loads grow from undetectable levels to more than six orders of magnitude within 48 h.
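The paper's spatially resolved stochastic model is not reproduced here, but the qualitative distinction it draws, latency acting as an exponential driver while the shedding rate acts only linearly, can be illustrated with a minimal mean-field sketch. All names and parameter values below are illustrative assumptions, not the paper's calibrated model:

```python
def viral_load(latency_h, shed_rate, t_end_h, beta=0.05):
    """Toy mean-field infection sketch on 1 h steps.

    Each infected cell stays silent for `latency_h` hours, then sheds
    `shed_rate` RNA copies/h; shedding cells recruit new infections at
    rate `beta` cells per shedding cell per hour. Returns the
    cumulative shed viral load at `t_end_h`.
    """
    newly_infected = [1.0] + [0.0] * t_end_h  # cells infected at hour t
    virus = 0.0
    for t in range(1, t_end_h + 1):
        # cells past their latency are currently shedding
        shedding = sum(n for ti, n in enumerate(newly_infected[:t + 1])
                       if t - ti >= latency_h)
        virus += shedding * shed_rate
        newly_infected[t] += beta * shedding
    return virus
```

In this sketch, shortening the latency compounds through successive generations of infected cells, whereas the shedding rate enters the output only as a linear multiplier, mirroring the paper's separation of exponential and non-exponential drivers.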
Topics: Humans; COVID-19; SARS-CoV-2; Pandemics; Common Cold; Computer Simulation; RNA, Viral
PubMed: 38257769
DOI: 10.3390/v16010069
Scientific Reports, Oct 2023
In many biomechanical analyses, the forces acting on a body during dynamic and static activities are often simplified as point loads. However, it is usually more accurate to characterize these forces as distributed loads, varying in magnitude and direction, over a given contact area. Evaluating these pressure distributions while they are applied to different parts of the body can provide effective insights for clinicians and researchers when studying health and disease conditions, for example when investigating the biomechanical factors that may lead to plantar ulceration in diabetic foot disease. At present, most processing and analysis of pressure data is performed using proprietary software, limiting reproducibility, transparency, and consistency across different studies. This paper describes an open-source software package, 'pressuRe', which is built in the freely available R statistical computing environment and is designed to process, analyze, and visualize pressure data collected on a range of different hardware systems in a standardized manner. We demonstrate the use of the package on a pressure dataset from patients with diabetic foot disease, comparing pressure variables between those with longer and shorter durations of the disease. The results matched closely with those from commercially available software, and individuals with longer duration of diabetes were found to have higher forefoot pressures than those with shorter duration. By utilizing R's powerful and openly available tools for statistical analysis and user customization, this package may be a useful tool for researchers and clinicians studying plantar pressures and other pressure-sensor-array-based biomechanical measurements. With regular updates intended, this package allows for continued improvement and we welcome feedback and future contributions to extend its scope. In this article, we detail the package's features and functionality.
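pressuRe itself is an R package; as a language-neutral sketch of the kind of computation it standardizes (per-sensor peak pressure and a regional summary), here is a minimal Python version. The function names and the region mask are illustrative assumptions, not the package's API:

```python
import numpy as np

def peak_pressure(frames):
    """Peak pressure per sensor across a stance phase.

    frames: array of shape (n_frames, rows, cols) holding one pressure
    value per sensor per time frame (e.g., in kPa).
    Returns the per-sensor maximum (the classic 'peak pressure' image).
    """
    return np.asarray(frames).max(axis=0)

def regional_peak(peak_img, mask):
    """Peak pressure within an anatomical region given a boolean mask
    (e.g., a hypothetical forefoot mask)."""
    return float(peak_img[mask].max())
```

Usage on synthetic data: stacking a few frames with isolated pressure spikes, `peak_pressure` collapses them to one image, and a forefoot-style mask then extracts the regional maximum compared between patient groups in the study.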
Topics: Humans; Diabetic Foot; Reproducibility of Results; Foot; Pressure; Biomechanical Phenomena
PubMed: 37798383
DOI: 10.1038/s41598-023-44041-6
Orphanet Journal of Rare Diseases, Aug 2023
BACKGROUND
Inherited metabolic disorders (IMDs) usually occur at a young age and hence severely threaten the health and lives of young people. However, to date there has been no comprehensive study revealing China's nationwide landscape of IMDs. This study aimed to evaluate IMD incidence and regional distributions in China at the national and provincial levels to guide clinicians and policy makers.
METHODS
In this retrospective study, conducted from January 2012 to March 2021, we analyzed and characterized clinical test information and diagnostic data from 372,255 cases from KingMed Diagnostics Laboratory. The samples came from 32 provincial regions of China; urine organic acids were detected by gas chromatography-mass spectrometry (GC-MS), and amino acids and acylcarnitines in dried blood spots were detected by liquid chromatography-tandem mass spectrometry (LC-MS/MS). We performed a statistical analysis of the distribution of the 16 most common IMDs among amino acid disorders and organic acidemias, paying special attention to the age and regional distributions of the different IMDs. Statistical and visualization analyses were performed with the programming language R (version 4.2.1).
RESULTS
There were 4911 positive cases diagnosed, corresponding to 1.32% of the total sample during the ten-year study period. Most diseases tended to occur at ages younger than 18 years. Ornithine transcarbamylase deficiency tended to occur in male infants less than 28 days old, while the number of positive cases of citrin deficiency (CD) peaked at 1-6 months of age. Different IMDs had different distribution patterns across China's provinces. Methylmalonic acidemias and hyperphenylalaninemia had an imbalanced distribution, with positive rates significantly higher in North China than in South China; conversely, the positive rate of CD was significantly higher in South China than in North China.
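The north-south comparisons above rest on tests of differences in positive rates between regions. A generic two-proportion z-test sketch illustrates the idea; the counts below are hypothetical, not figures from the study:

```python
from math import sqrt, erf

def two_proportion_z(pos1, n1, pos2, n2):
    """Two-sided two-proportion z-test (normal approximation).

    pos1/n1 and pos2/n2 are positive counts over sample sizes in the
    two regions being compared. Returns (z, two-sided p-value).
    """
    p1, p2 = pos1 / n1, pos2 / n2
    p = (pos1 + pos2) / (n1 + n2)                 # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))    # pooled standard error
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

With, say, 60 positives in 1000 northern samples against 30 in 1000 southern samples, the test yields z above 3 and a p-value well below 0.01, the kind of result behind a "significantly higher" claim.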
CONCLUSIONS
The results of this work, such as the differences in the distribution patterns of different diseases by age and region, provide important insights and references for clinicians, researchers, and healthcare policy makers. Policy makers could optimize health screening programs to cover children and infants of specific ages and in specific regions based on our findings.
Topics: Infant; Child; Humans; Male; Adolescent; Retrospective Studies; Chromatography, Liquid; Tandem Mass Spectrometry; Metabolic Diseases; China
PubMed: 37537594
DOI: 10.1186/s13023-023-02834-y
Theoretical Population Biology, Jun 2024
Review
A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the 'phases' in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this review is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference. In particular, we show the relation between classical first-step analysis of coalescent models and phase-type calculations. We also show how reward transformations in phase-type theory lead to easy calculation of covariances and correlation coefficients between e.g. tree height, tree length, external branch length, and internal branch length. Furthermore, we discuss how these quantities can be used for statistical inference based on estimating equations. Providing an alternative to previous work based on the Laplace transform, we derive likelihoods for small-size coalescent trees based on phase-type theory. Overall, our main aim is to demonstrate that phase-type distributions provide a convenient general set of tools to understand aspects of coalescent models that are otherwise difficult to derive. 
Throughout the review, we emphasize the versatility of the phase-type framework, which is also illustrated by our accompanying R-code. All our analyses and figures can be reproduced from code available on GitHub.
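The review's accompanying code is in R; as a compact illustration of the phase-type machinery it describes, here is a Python sketch for the Kingman coalescent with n = 3 samples. The mean and variance of the time to the most recent common ancestor follow directly from the sub-intensity matrix (the variable names are ours, not the review's):

```python
import numpy as np

# Phase-type sketch for the Kingman coalescent with n = 3 samples.
# Transient states: "3 lineages" and "2 lineages"; absorption = MRCA.
# Coalescence rate with k lineages is C(k,2): 3, then 1.
S = np.array([[-3.0, 3.0],
              [0.0, -1.0]])       # sub-intensity matrix over transient states
alpha = np.array([1.0, 0.0])      # the process starts with all 3 lineages

ones = np.ones(2)
U = -np.linalg.inv(S)             # Green matrix: expected time spent in each state
t_mrca_mean = alpha @ U @ ones    # E[T_MRCA] = 1/3 + 1 = 4/3
t_mrca_var = 2 * alpha @ U @ U @ ones - t_mrca_mean**2   # 1/9 + 1 = 10/9
```

This is the "analytically tractable via matrix manipulations" point in miniature: the same Green matrix, after reward transformation, also yields total branch length and the covariances the review discusses.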
Topics: Genetics, Population; Markov Chains; Models, Genetic; Humans
PubMed: 38460602
DOI: 10.1016/j.tpb.2024.03.001
PeerJ, 2023
BACKGROUND
Considerable resources are spent to track fish movement in marine environments, often with the intent of estimating behavior, distribution, and abundance. Resulting data from these monitoring efforts, including tagging studies and genetic sampling, often can be siloed. For Pacific salmon in the Northeast Pacific Ocean, predominant data sources for fish monitoring are coded wire tags (CWTs) and genetic stock identification (GSI). Despite their complementary strengths and weaknesses in coverage and information content, the two data streams rarely have been integrated to inform Pacific salmon biology and management. Joint, or integrated, models can combine and contextualize multiple data sources in a single statistical framework to produce more robust estimates of fish populations.
METHODS
We introduce and fit a comprehensive joint model that integrates data from CWT recoveries and GSI sampling to inform the marine life history of Chinook salmon stocks at spatial and temporal scales relevant to ongoing fisheries management efforts. In a departure from similar models based primarily on CWT recoveries, modeled stocks in the new framework encompass both hatchery- and natural-origin fish. We specifically model the spatial distribution and marine abundance of four distinct stocks with spawning locations in California and southern Oregon, one of which is listed under the U.S. Endangered Species Act.
RESULTS
Using the joint model, we generated the most comprehensive estimates of marine distribution to date for all modeled Chinook salmon stocks, including historically data poor and low abundance stocks. Estimated marine distributions from the joint model were broadly similar to estimates from a simpler, CWT-only model but did suggest some differences in distribution in select seasons. Model output also included novel stock-, year-, and season-specific estimates of marine abundance. We observed and partially addressed several challenges in model convergence with the use of supplemental data sources and model constraints; similar difficulties are not unexpected with integrated modeling. We identify several options for improved data collection that could address issues in convergence and increase confidence in model estimates of abundance. We expect these model advances and results provide management-relevant biological insights, with the potential to inform future mixed-stock fisheries management efforts, as well as a foundation for more expansive and comprehensive analyses to follow.
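The core idea of integrating CWT and GSI data, known-origin recoveries and probabilistic stock assignments contributing to one likelihood over stock composition, can be sketched in a toy form. This is our own simplified illustration, not the authors' model, and all names are hypothetical:

```python
import numpy as np

def joint_negloglik(pi, cwt_counts, gsi_probs):
    """Toy joint negative log-likelihood over stock proportions pi.

    pi: stock proportions (sums to 1).
    cwt_counts: tag recoveries with known stock of origin (len n_stocks).
    gsi_probs: (n_fish, n_stocks) genetic assignment probabilities for
    fish of unknown origin.
    CWT fish contribute multinomial terms; each GSI fish contributes a
    mixture over its possible stocks of origin.
    """
    pi = np.asarray(pi, float)
    ll = np.sum(cwt_counts * np.log(pi))    # known-origin (CWT) term
    ll += np.sum(np.log(gsi_probs @ pi))    # mixture (GSI) term
    return -ll
```

Minimizing this over `pi` pools both data streams: when CWT recoveries favor one stock and GSI assignments agree, compositions consistent with both score markedly better than compositions consistent with neither.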
Topics: Animals; Salmon; Fisheries; Pacific Ocean; Endangered Species; Oncorhynchus
PubMed: 38047019
DOI: 10.7717/peerj.16487
BMC Medical Research Methodology, Oct 2023
BACKGROUND
Evidence-based treatment decisions in medicine are founded on population-level evidence obtained in randomized clinical trials. In an era of personalized medicine, these decisions should instead be based on the predicted benefit of a treatment at the patient level. Survival prediction models play a central role here, as they incorporate time-to-event and censoring. In medical applications, uncertainty is critical, especially when treatments differ in their side effect profiles or costs. Additionally, models must be adapted to local populations without diminishing performance, often without the original training data available due to privacy concerns. Both points are supported by Bayesian models, yet they are rarely used. The aim of this work is to evaluate Bayesian parametric survival models on public datasets spanning cardiology, infectious diseases, and oncology.
MATERIALS AND METHODS
Bayesian parametric survival models based on the Exponential and Weibull distribution were implemented as a Python package. A linear combination and a neural network were used for predicting the parameters of the distributions. A superiority design was used to assess whether Bayesian models are better than commonly used models such as Cox Proportional Hazards, Random Survival Forest, and Neural Network-based Cox Proportional Hazards. In a secondary analysis, overfitting was compared between these models. An equivalence design was used to assess whether the prediction performance of Bayesian models after model updating using Bayes rule is equivalent to retraining on the full dataset.
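The paper's package is not reproduced here, but the objective such a model optimizes is standard: a Weibull likelihood with right censoring plus priors on the (log-)parameters. A minimal sketch, with our own function name and a simple weak-prior assumption:

```python
import numpy as np

def weibull_logpost(log_k, log_lam, t, event, prior_sd=1.0):
    """Log-posterior of a Weibull survival model with right censoring.

    t: observed times; event: 1 = event observed, 0 = censored.
    Observed events contribute the log-density; censored cases
    contribute the log-survival function. Weak Normal(0, prior_sd)
    priors on the log-parameters make this a Bayesian objective rather
    than a pure maximum-likelihood one.
    """
    k, lam = np.exp(log_k), np.exp(log_lam)
    z = (t / lam) ** k
    log_pdf = np.log(k / lam) + (k - 1) * np.log(t / lam) - z
    log_surv = -z
    loglik = np.sum(event * log_pdf + (1 - event) * log_surv)
    log_prior = -(log_k**2 + log_lam**2) / (2 * prior_sd**2)
    return loglik + log_prior
```

The model-updating result in the abstract corresponds to using the posterior fitted on the original data as the prior here when new local data arrive, which is what Bayes' rule licenses without access to the original training set.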
RESULTS
In this study, we found that Bayesian parametric survival models perform as well as state-of-the-art models while requiring fewer hyperparameters to be tuned and providing a measure of the uncertainty of the predictions. In addition, these models were less prone to overfitting. Furthermore, we show that updating these models using Bayes' rule yields performance equivalent to that of models trained on the combined original and new datasets.
CONCLUSIONS
Bayesian parametric survival models are non-inferior to conventional survival models while requiring less hyperparameter tuning, being less prone to overfitting, and allowing model updating using Bayes' rule. Further, the Bayesian models provide a measure of the uncertainty in the statistical inference and, in particular, in the prediction.
Topics: Humans; Bayes Theorem; Neural Networks, Computer; Uncertainty
PubMed: 37884857
DOI: 10.1186/s12874-023-02059-4
Journal of Chemical Theory and Computation, Jun 2024
Auger-type processes are ubiquitous in nanoscale materials because quantum confinement enhances Coulomb interactions, and there exist large densities of states. Modeling Auger processes requires the modification of nonadiabatic (NA) molecular dynamics algorithms to include transitions caused by both NA and Coulomb couplings. The system is split into quantum and classical subsystems, e.g., electrons and vibrations, and as a result, energy conservation becomes nontrivial. In surface hopping, an electronic transition induced by NA coupling is accompanied by a classical velocity readjustment to ensure conservation of the total quantum-classical energy. A different treatment is needed for Auger transitions driven by Coulomb interactions. We develop a nonadiabatic molecular dynamics methodology that meticulously differentiates the energy redistribution accompanying hops induced by the NA coupling and the Coulomb interaction and correctly conserves the total energy at each transition. If the transition is driven by a Coulomb interaction, the hop energy is redistributed within the quantum electronic subsystem only. If the transition is NA, the energy is redistributed between the quantum and classical subsystems. Properly maintaining energy conservation for both types of transitions is crucial to generate a correct order of events, obtain accurate transition times, maintain a proper statistical distribution of state populations, and reach thermodynamic equilibrium. We test the method with biexciton annihilation and Auger-assisted hot electron relaxation in a CdSe quantum dot. The sequence of Auger and phonon-driven processes and the calculated time scales are in excellent agreement with the experimental results. The developed approach can be coupled with any surface-hopping method and provides a crucial practical advance to study charge-carrier dynamics in the nanoscale and condensed matter systems.
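The energy-bookkeeping distinction the paper draws can be sketched simply: an NA-driven hop pays for the electronic energy change out of the classical kinetic energy, while a Coulomb-driven (Auger) hop leaves the nuclei untouched. The sketch below uses uniform velocity rescaling for brevity; production surface-hopping codes rescale along the NA coupling vector, and all names here are our own:

```python
import numpy as np

def na_hop_rescale(v, m, dE):
    """Velocity adjustment after a hop driven by NA coupling.

    v: classical velocities, m: masses, dE: electronic energy gained
    by the hop (positive means the nuclei must give up kinetic energy).
    Returns rescaled velocities, or None for a frustrated hop (not
    enough kinetic energy to pay for the transition).
    """
    ke = 0.5 * np.sum(m * v**2)
    if ke < dE:
        return None                      # frustrated hop: reject
    scale = np.sqrt((ke - dE) / ke)      # rescaling conserves total energy
    return v * scale

def coulomb_hop_rescale(v, m, dE):
    """Hop driven by the Coulomb interaction: the energy change is
    redistributed within the electronic subsystem only, so the
    classical velocities are returned unchanged."""
    return v
```

Conflating the two channels, i.e., rescaling velocities on Auger hops too, would misplace energy between the subsystems, which is exactly the error the paper argues corrupts event ordering, time scales, and the approach to thermodynamic equilibrium.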
PubMed: 38902855
DOI: 10.1021/acs.jctc.4c00562
IEEE Transactions on Pattern Analysis and Machine Intelligence, Sep 2023
In human and hand pose estimation, heatmaps are a crucial intermediate representation for a body or hand keypoint. Two popular methods to decode the heatmap into a final joint coordinate are via an argmax, as done in heatmap detection, or via softmax and expectation, as done in integral regression. Integral regression is learnable end-to-end, but has lower accuracy than detection. This paper uncovers an induced bias from integral regression that results from combining the softmax and the expectation operation. This bias often forces the network to learn degenerately localized heatmaps, obscuring the keypoint's true underlying distribution and leading to lower accuracy. Training-wise, by investigating the gradients of integral regression, we show that the implicit guidance of integral regression to update the heatmap makes it slower to converge than detection. To counter the above two limitations, we propose Bias Compensated Integral Regression (BCIR), an integral regression-based framework that compensates for the bias. BCIR also incorporates a Gaussian prior loss to speed up training and improve prediction accuracy. Experimental results on both the human body and hand benchmarks show that BCIR is faster to train and more accurate than the original integral regression, making it competitive with state-of-the-art detection methods.
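The two decoding schemes, and the bias the paper analyzes, are easy to demonstrate. In the sketch below (our own illustration, not BCIR itself), the soft decode of a peaked heatmap on a non-zero background is pulled away from the true maximum toward the image center, because the softmax spreads probability mass over every pixel before the expectation is taken:

```python
import numpy as np

def argmax_decode(heatmap):
    """Hard decode: coordinate of the heatmap maximum (detection-style)."""
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

def integral_decode(heatmap, beta=1.0):
    """Soft decode: softmax over the heatmap, then the expected
    coordinate (integral-regression-style). Larger `beta` sharpens the
    softmax and approaches the argmax."""
    p = np.exp(beta * (heatmap - heatmap.max()))   # stable softmax
    p /= p.sum()
    ys, xs = np.mgrid[:heatmap.shape[0], :heatmap.shape[1]]
    return float((p * ys).sum()), float((p * xs).sum())
```

Pushing the network toward a degenerately sharp heatmap (the large-`beta` regime) removes the bias but obscures the keypoint's underlying distribution, which is the trade-off the paper's bias compensation is designed to break.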
Topics: Humans; Algorithms; Hand; Benchmarking; Learning; Normal Distribution
PubMed: 37018104
DOI: 10.1109/TPAMI.2023.3264742
Briefings in Bioinformatics, Sep 2023
Single cell RNA-sequencing (scRNA-seq) technology has significantly advanced the understanding of transcriptomic signatures. Although various statistical models have been used to describe the distribution of gene expression across cells, a comprehensive assessment of the different models is missing. Moreover, the growing number of features associated with scRNA-seq datasets creates new challenges for analytical accuracy and computing speed. Here, we developed a Python-based package (TensorZINB) to solve the zero-inflated negative binomial (ZINB) model using the TensorFlow deep learning framework. We used a sequential initialization method to solve the numerical stability issues associated with hurdle and zero-inflated models. A recursive feature selection protocol was used to optimize feature selections for data processing and downstream differentially expressed gene (DEG) analysis. We proposed a class of hybrid models combining nested models to further improve the model's performance. Additionally, we developed a new method to convert a continuous distribution to its equivalent discrete form, so that statistical models can be fairly compared. Finally, we showed that the proposed TensorFlow algorithm (TensorZINB) was numerically stable and that its computing speed and performance were superior to those of existing ZINB solvers. Moreover, we implemented seven hurdle and zero-inflated statistical models in Python and systematically assessed their performance using a real scRNA-seq dataset. We demonstrated that the ZINB model achieved the lowest Akaike information criterion compared with other models tested. Taken together, TensorZINB was accurate, efficient and scalable for the implementation of ZINB and for large-scale scRNA-seq data analysis with DEG identification.
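The ZINB model at the heart of TensorZINB has a simple likelihood structure: zeros arise from a mixture of structural zeros and NB-distributed zeros, while positive counts come from the NB component alone. A minimal reference implementation of that likelihood (ours, not TensorZINB's API):

```python
from math import lgamma, log, exp

import numpy as np

def zinb_loglik(y, mu, theta, pi):
    """Log-likelihood of the zero-inflated negative binomial model.

    y: observed counts; mu: NB mean; theta: NB dispersion;
    pi: zero-inflation probability. P(0) mixes structural zeros (pi)
    with NB zeros; positive counts come from the NB component only.
    """
    def nb_logpmf(k):
        # NB in its mean/dispersion parameterization
        return (lgamma(k + theta) - lgamma(theta) - lgamma(k + 1)
                + theta * log(theta / (theta + mu))
                + k * log(mu / (theta + mu)))
    ll = 0.0
    for k in np.asarray(y):
        if k == 0:
            ll += log(pi + (1 - pi) * exp(nb_logpmf(0)))
        else:
            ll += log(1 - pi) + nb_logpmf(k)
    return ll
```

The hurdle variants the paper also benchmarks differ only in the zero term: they model P(0) directly and truncate the NB component to positive counts, which is why a sequential initialization helps stabilize both families.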
Topics: Models, Statistical; Poisson Distribution; Gene Expression Profiling; RNA; Sequence Analysis, RNA
PubMed: 37507115
DOI: 10.1093/bib/bbad272
Otology & Neurotology Open, Dec 2023
OBJECTIVES
Judgments of the subjective visual vertical (SVV) and subjective visual horizontal (SVH) while seated upright are commonly included in standard clinical test batteries for vestibular function. We examined retrospective SVV and SVH control data to assess their statistical distributions and to establish normative values for the magnitude of the preset effect, sex differences, and fixed-head versus head-free device platforms for assessment.
METHODS
Retrospective clinical SVV and SVH data from 2 test platforms, Neuro-otologic Test Center (NOTC) and the Neurolign Dx 100 (I-Portal Portable Assessment System Nystagmograph) were analyzed statistically (SPSS and MATLAB software) for 408 healthy male and female civilians and military service members, aged 18-50 years.
RESULTS
No prominent age-related effects were observed. The preset angle effects for both SVV and SVH, and their deviations from orthogonality, agree in magnitude with previous reports. Differences attributable to interactions with device type and sex are of small magnitude. Analyses confirmed that the common clinical measure for SVV and SVH, the average of equal numbers of clockwise and counterclockwise preset trials, was not significantly affected by the test device or the sex of the subject. Finally, distributional analyses failed to reject the hypothesis of underlying Gaussian distributions for the clinical metrics.
CONCLUSIONS
Scores based on these normative findings can be used for the objective detection of outliers from normal functional limits in the clinic.
PubMed: 38516545
DOI: 10.1097/ONO.0000000000000044