Viruses, Dec 2023
Throughout the COVID-19 pandemic, an unprecedented level of clinical nasal swab data from around the globe has been collected and shared. Positive tests have consistently revealed viral titers spanning six orders of magnitude! An open question is whether such extreme population heterogeneity is unique to SARS-CoV-2 or possibly generic to viral respiratory infections. To probe this question, we turn to the computational modeling of nasal tract infections. Employing a physiologically faithful, spatially resolved, stochastic model of respiratory tract infection, we explore the statistical distribution of human nasal infections in the immediate 48 h of infection. The spread, or heterogeneity, of the distribution derives from variations in factors within the model that are unique to the infected host, infectious variant, and timing of the test. Hypothetical factors include: (1) reported physiological differences between infected individuals (nasal mucus thickness and clearance velocity); (2) differences in the kinetics of infection, replication, and shedding of viral RNA copies arising from the unique interactions between the host and viral variant; and (3) differences in the time between initial cell infection and the clinical test. Since positive clinical tests are often pre-symptomatic and independent of prior infection or vaccination status, in the model we assume immune evasion throughout the immediate 48 h of infection. Model simulations generate the mean statistical outcomes of total shed viral load and infected cells throughout 48 h for each "virtual individual", which we define as each fixed set of model parameters (1) and (2) above. The "virtual population" and the statistical distribution of outcomes over the population are defined by collecting clinically and experimentally guided ranges for the full set of model parameters (1) and (2). 
This establishes a model-generated "virtual population database" of nasal viral titers throughout the initial 48 h of infection of every individual, which we then compare with clinical swab test data. Support for model efficacy comes from sampling infection dynamics over the virtual population database, which reproduces the six-order-of-magnitude clinical population heterogeneity. However, the goal of this study is to answer a deeper biological and clinical question. To answer it, global data analysis methods are applied to the virtual population database that sample across the entire database and de-correlate (i.e., isolate) the dynamic infection outcome sensitivities of each model parameter. These methods predict that the dominant, indeed exponential, driver of population heterogeneity in dynamic infection outcomes is the latency time of infected cells (from the moment of infection until the onset of viral RNA shedding). The shedding rate of viral RNA from infected cells in the shedding phase is a strong, but not exponential, driver of infection. Furthermore, the unknown timing of the nasal swab test relative to the onset of infection is an equally dominant contributor to extreme population heterogeneity in clinical test data, since infectious viral loads grow from undetectable levels to more than six orders of magnitude within 48 h.
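The paper's spatially resolved stochastic model is not reproduced here, but the qualitative distinction it draws, latency acting as an exponential driver while the shedding rate acts only linearly, can be illustrated with a minimal mean-field sketch. All names and parameter values below are illustrative assumptions, not the paper's calibrated model:

```python
def viral_load(latency_h, shed_rate, t_end_h, beta=0.05):
    """Toy mean-field infection sketch on 1 h steps.

    Each infected cell stays silent for `latency_h` hours, then sheds
    `shed_rate` RNA copies/h; shedding cells recruit new infections at
    rate `beta` cells per shedding cell per hour. Returns the
    cumulative shed viral load at `t_end_h`.
    """
    newly_infected = [1.0] + [0.0] * t_end_h  # cells infected at hour t
    virus = 0.0
    for t in range(1, t_end_h + 1):
        # cells past their latency are currently shedding
        shedding = sum(n for ti, n in enumerate(newly_infected[:t + 1])
                       if t - ti >= latency_h)
        virus += shedding * shed_rate
        newly_infected[t] += beta * shedding
    return virus
```

In this sketch, shortening the latency compounds through successive generations of infected cells, whereas the shedding rate enters the output only as a linear multiplier, mirroring the paper's separation of exponential and non-exponential drivers.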
Topics: Humans; COVID-19; SARS-CoV-2; Pandemics; Common Cold; Computer Simulation; RNA, Viral
PubMed: 38257769
DOI: 10.3390/v16010069
Scientific Reports, Oct 2023
In many biomechanical analyses, the forces acting on a body during dynamic and static activities are often simplified as point loads. However, it is usually more accurate to characterize these forces as distributed loads, varying in magnitude and direction, over a given contact area. Evaluating these pressure distributions while they are applied to different parts of the body can provide effective insights for clinicians and researchers when studying health and disease conditions, for example when investigating the biomechanical factors that may lead to plantar ulceration in diabetic foot disease. At present, most processing and analysis of pressure data is performed using proprietary software, limiting reproducibility, transparency, and consistency across different studies. This paper describes an open-source software package, 'pressuRe', which is built in the freely available R statistical computing environment and is designed to process, analyze, and visualize pressure data collected on a range of different hardware systems in a standardized manner. We demonstrate the use of the package on a pressure dataset from patients with diabetic foot disease, comparing pressure variables between those with longer and shorter durations of the disease. The results matched closely with those from commercially available software, and individuals with longer duration of diabetes were found to have higher forefoot pressures than those with shorter duration. By utilizing R's powerful and openly available tools for statistical analysis and user customization, this package may be a useful tool for researchers and clinicians studying plantar pressures and other pressure-sensor-array-based biomechanical measurements. With regular updates intended, this package allows for continued improvement and we welcome feedback and future contributions to extend its scope. In this article, we detail the package's features and functionality.
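pressuRe itself is an R package; as a language-neutral sketch of the kind of computation it standardizes (per-sensor peak pressure and a regional summary), here is a minimal Python version. The function names and the region mask are illustrative assumptions, not the package's API:

```python
import numpy as np

def peak_pressure(frames):
    """Peak pressure per sensor across a stance phase.

    frames: array of shape (n_frames, rows, cols) holding one pressure
    value per sensor per time frame (e.g., in kPa).
    Returns the per-sensor maximum (the classic 'peak pressure' image).
    """
    return np.asarray(frames).max(axis=0)

def regional_peak(peak_img, mask):
    """Peak pressure within an anatomical region given a boolean mask
    (e.g., a hypothetical forefoot mask)."""
    return float(peak_img[mask].max())
```

Usage on synthetic data: stacking a few frames with isolated pressure spikes, `peak_pressure` collapses them to one image, and a forefoot-style mask then extracts the regional maximum compared between patient groups in the study.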
Topics: Humans; Diabetic Foot; Reproducibility of Results; Foot; Pressure; Biomechanical Phenomena
PubMed: 37798383
DOI: 10.1038/s41598-023-44041-6
Orphanet Journal of Rare Diseases, Aug 2023
BACKGROUND
Inherited metabolic disorders (IMDs) usually occur at a young age and hence severely threaten the health and lives of young people. However, to date there has been no comprehensive study revealing China's nationwide landscape of IMDs. This study aimed to evaluate IMD incidence and regional distributions in China at the national and provincial levels to guide clinicians and policy makers.
METHODS
In this retrospective study, conducted from January 2012 to March 2021, we analyzed and characterized clinical test information and diagnostic data from 372,255 cases from KingMed Diagnostics Laboratory. The samples came from 32 provincial regions of China; urine organic acids were detected by gas chromatography-mass spectrometry (GC-MS), and amino acids and acylcarnitines in dried blood spots were detected by liquid chromatography-tandem mass spectrometry (LC-MS/MS). We performed a statistical analysis of the distribution of the 16 most common IMDs among amino acid disorders and organic acidemias, paying special attention to the age and regional distributions of the different IMDs. Statistical and visualization analyses were performed with the programming language R (version 4.2.1).
RESULTS
There were 4911 positive cases diagnosed, corresponding to 1.32% of the total sample during the ten-year study period. Most diseases tended to occur at ages younger than 18 years. Ornithine transcarbamylase deficiency tended to occur in male infants less than 28 days old, while the number of positive cases of citrin deficiency (CD) peaked at 1-6 months of age. Different IMDs had different distribution patterns across China's provinces. Methylmalonic acidemias and hyperphenylalaninemia had an imbalanced distribution, with positive rates significantly higher in North China than in South China; conversely, the positive rate of CD was significantly higher in South China than in North China.
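The north-south comparisons above rest on tests of differences in positive rates between regions. A generic two-proportion z-test sketch illustrates the idea; the counts below are hypothetical, not figures from the study:

```python
from math import sqrt, erf

def two_proportion_z(pos1, n1, pos2, n2):
    """Two-sided two-proportion z-test (normal approximation).

    pos1/n1 and pos2/n2 are positive counts over sample sizes in the
    two regions being compared. Returns (z, two-sided p-value).
    """
    p1, p2 = pos1 / n1, pos2 / n2
    p = (pos1 + pos2) / (n1 + n2)                 # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))    # pooled standard error
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

With, say, 60 positives in 1000 northern samples against 30 in 1000 southern samples, the test yields z above 3 and a p-value well below 0.01, the kind of result behind a "significantly higher" claim.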
CONCLUSIONS
The results of this work, such as the differences in the distribution patterns of different diseases by age and region, provide important insights and references for clinicians, researchers, and healthcare policy makers. Policy makers could optimize health screening programs to cover children and infants of specific ages and in specific regions based on our findings.
Topics: Infant; Child; Humans; Male; Adolescent; Retrospective Studies; Chromatography, Liquid; Tandem Mass Spectrometry; Metabolic Diseases; China
PubMed: 37537594
DOI: 10.1186/s13023-023-02834-y
Theoretical Population Biology, Jun 2024
Review
A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the 'phases' in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this review is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference. In particular, we show the relation between classical first-step analysis of coalescent models and phase-type calculations. We also show how reward transformations in phase-type theory lead to easy calculation of covariances and correlation coefficients between e.g. tree height, tree length, external branch length, and internal branch length. Furthermore, we discuss how these quantities can be used for statistical inference based on estimating equations. Providing an alternative to previous work based on the Laplace transform, we derive likelihoods for small-size coalescent trees based on phase-type theory. Overall, our main aim is to demonstrate that phase-type distributions provide a convenient general set of tools to understand aspects of coalescent models that are otherwise difficult to derive. 
Throughout the review, we emphasize the versatility of the phase-type framework, which is also illustrated by our accompanying R-code. All our analyses and figures can be reproduced from code available on GitHub.
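The review's accompanying code is in R; as a compact illustration of the phase-type machinery it describes, here is a Python sketch for the Kingman coalescent with n = 3 samples. The mean and variance of the time to the most recent common ancestor follow directly from the sub-intensity matrix (the variable names are ours, not the review's):

```python
import numpy as np

# Phase-type sketch for the Kingman coalescent with n = 3 samples.
# Transient states: "3 lineages" and "2 lineages"; absorption = MRCA.
# Coalescence rate with k lineages is C(k,2): 3, then 1.
S = np.array([[-3.0, 3.0],
              [0.0, -1.0]])       # sub-intensity matrix over transient states
alpha = np.array([1.0, 0.0])      # the process starts with all 3 lineages

ones = np.ones(2)
U = -np.linalg.inv(S)             # Green matrix: expected time spent in each state
t_mrca_mean = alpha @ U @ ones    # E[T_MRCA] = 1/3 + 1 = 4/3
t_mrca_var = 2 * alpha @ U @ U @ ones - t_mrca_mean**2   # 1/9 + 1 = 10/9
```

This is the "analytically tractable via matrix manipulations" point in miniature: the same Green matrix, after reward transformation, also yields total branch length and the covariances the review discusses.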
Topics: Genetics, Population; Markov Chains; Models, Genetic; Humans
PubMed: 38460602
DOI: 10.1016/j.tpb.2024.03.001
PeerJ, 2023
BACKGROUND
Considerable resources are spent to track fish movement in marine environments, often with the intent of estimating behavior, distribution, and abundance. Resulting data from these monitoring efforts, including tagging studies and genetic sampling, often can be siloed. For Pacific salmon in the Northeast Pacific Ocean, predominant data sources for fish monitoring are coded wire tags (CWTs) and genetic stock identification (GSI). Despite their complementary strengths and weaknesses in coverage and information content, the two data streams rarely have been integrated to inform Pacific salmon biology and management. Joint, or integrated, models can combine and contextualize multiple data sources in a single statistical framework to produce more robust estimates of fish populations.
METHODS
We introduce and fit a comprehensive joint model that integrates data from CWT recoveries and GSI sampling to inform the marine life history of Chinook salmon stocks at spatial and temporal scales relevant to ongoing fisheries management efforts. In a departure from similar models based primarily on CWT recoveries, modeled stocks in the new framework encompass both hatchery- and natural-origin fish. We specifically model the spatial distribution and marine abundance of four distinct stocks with spawning locations in California and southern Oregon, one of which is listed under the U.S. Endangered Species Act.
RESULTS
Using the joint model, we generated the most comprehensive estimates of marine distribution to date for all modeled Chinook salmon stocks, including historically data poor and low abundance stocks. Estimated marine distributions from the joint model were broadly similar to estimates from a simpler, CWT-only model but did suggest some differences in distribution in select seasons. Model output also included novel stock-, year-, and season-specific estimates of marine abundance. We observed and partially addressed several challenges in model convergence with the use of supplemental data sources and model constraints; similar difficulties are not unexpected with integrated modeling. We identify several options for improved data collection that could address issues in convergence and increase confidence in model estimates of abundance. We expect these model advances and results provide management-relevant biological insights, with the potential to inform future mixed-stock fisheries management efforts, as well as a foundation for more expansive and comprehensive analyses to follow.
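The core idea of integrating CWT and GSI data, known-origin recoveries and probabilistic stock assignments contributing to one likelihood over stock composition, can be sketched in a toy form. This is our own simplified illustration, not the authors' model, and all names are hypothetical:

```python
import numpy as np

def joint_negloglik(pi, cwt_counts, gsi_probs):
    """Toy joint negative log-likelihood over stock proportions pi.

    pi: stock proportions (sums to 1).
    cwt_counts: tag recoveries with known stock of origin (len n_stocks).
    gsi_probs: (n_fish, n_stocks) genetic assignment probabilities for
    fish of unknown origin.
    CWT fish contribute multinomial terms; each GSI fish contributes a
    mixture over its possible stocks of origin.
    """
    pi = np.asarray(pi, float)
    ll = np.sum(cwt_counts * np.log(pi))    # known-origin (CWT) term
    ll += np.sum(np.log(gsi_probs @ pi))    # mixture (GSI) term
    return -ll
```

Minimizing this over `pi` pools both data streams: when CWT recoveries favor one stock and GSI assignments agree, compositions consistent with both score markedly better than compositions consistent with neither.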
Topics: Animals; Salmon; Fisheries; Pacific Ocean; Endangered Species; Oncorhynchus
PubMed: 38047019
DOI: 10.7717/peerj.16487
BMC Medical Research Methodology, Oct 2023
BACKGROUND
Evidence-based treatment decisions in medicine are founded on population-level evidence obtained in randomized clinical trials. In an era of personalized medicine, these decisions should instead be based on the predicted benefit of a treatment at the patient level. Survival prediction models play a central role here, as they incorporate time-to-event and censoring. In medical applications, uncertainty is critical, especially when treatments differ in their side effect profiles or costs. Additionally, models must be adapted to local populations without diminishing performance, often without the original training data available due to privacy concerns. Both points are supported by Bayesian models, yet they are rarely used. The aim of this work is to evaluate Bayesian parametric survival models on public datasets spanning cardiology, infectious diseases, and oncology.
MATERIALS AND METHODS
Bayesian parametric survival models based on the Exponential and Weibull distribution were implemented as a Python package. A linear combination and a neural network were used for predicting the parameters of the distributions. A superiority design was used to assess whether Bayesian models are better than commonly used models such as Cox Proportional Hazards, Random Survival Forest, and Neural Network-based Cox Proportional Hazards. In a secondary analysis, overfitting was compared between these models. An equivalence design was used to assess whether the prediction performance of Bayesian models after model updating using Bayes rule is equivalent to retraining on the full dataset.
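The paper's package is not reproduced here, but the objective such a model optimizes is standard: a Weibull likelihood with right censoring plus priors on the (log-)parameters. A minimal sketch, with our own function name and a simple weak-prior assumption:

```python
import numpy as np

def weibull_logpost(log_k, log_lam, t, event, prior_sd=1.0):
    """Log-posterior of a Weibull survival model with right censoring.

    t: observed times; event: 1 = event observed, 0 = censored.
    Observed events contribute the log-density; censored cases
    contribute the log-survival function. Weak Normal(0, prior_sd)
    priors on the log-parameters make this a Bayesian objective rather
    than a pure maximum-likelihood one.
    """
    k, lam = np.exp(log_k), np.exp(log_lam)
    z = (t / lam) ** k
    log_pdf = np.log(k / lam) + (k - 1) * np.log(t / lam) - z
    log_surv = -z
    loglik = np.sum(event * log_pdf + (1 - event) * log_surv)
    log_prior = -(log_k**2 + log_lam**2) / (2 * prior_sd**2)
    return loglik + log_prior
```

The model-updating result in the abstract corresponds to using the posterior fitted on the original data as the prior here when new local data arrive, which is what Bayes' rule licenses without access to the original training set.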
RESULTS
In this study, we found that Bayesian parametric survival models perform as well as state-of-the-art models while requiring fewer hyperparameters to be tuned and providing a measure of the uncertainty of the predictions. In addition, these models were less prone to overfitting. Furthermore, we show that updating these models using Bayes' rule yields performance equivalent to that of models trained on the combined original and new datasets.
CONCLUSIONS
Bayesian parametric survival models are non-inferior to conventional survival models while requiring less hyperparameter tuning, being less prone to overfitting, and allowing model updating using Bayes' rule. Further, the Bayesian models provide a measure of the uncertainty in the statistical inference and, in particular, in the prediction.
Topics: Humans; Bayes Theorem; Neural Networks, Computer; Uncertainty
PubMed: 37884857
DOI: 10.1186/s12874-023-02059-4
Journal of Chemical Theory and Computation, Jun 2024
Auger-type processes are ubiquitous in nanoscale materials because quantum confinement enhances Coulomb interactions, and there exist large densities of states. Modeling Auger processes requires the modification of nonadiabatic (NA) molecular dynamics algorithms to include transitions caused by both NA and Coulomb couplings. The system is split into quantum and classical subsystems, e.g., electrons and vibrations, and as a result, energy conservation becomes nontrivial. In surface hopping, an electronic transition induced by NA coupling is accompanied by a classical velocity readjustment to ensure conservation of the total quantum-classical energy. A different treatment is needed for Auger transitions driven by Coulomb interactions. We develop a nonadiabatic molecular dynamics methodology that meticulously differentiates the energy redistribution accompanying hops induced by the NA coupling and the Coulomb interaction and correctly conserves the total energy at each transition. If the transition is driven by a Coulomb interaction, the hop energy is redistributed within the quantum electronic subsystem only. If the transition is NA, the energy is redistributed between the quantum and classical subsystems. Properly maintaining energy conservation for both types of transitions is crucial to generate a correct order of events, obtain accurate transition times, maintain a proper statistical distribution of state populations, and reach thermodynamic equilibrium. We test the method with biexciton annihilation and Auger-assisted hot electron relaxation in a CdSe quantum dot. The sequence of Auger and phonon-driven processes and the calculated time scales are in excellent agreement with the experimental results. The developed approach can be coupled with any surface-hopping method and provides a crucial practical advance to study charge-carrier dynamics in the nanoscale and condensed matter systems.
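The energy-bookkeeping distinction the paper draws can be sketched simply: an NA-driven hop pays for the electronic energy change out of the classical kinetic energy, while a Coulomb-driven (Auger) hop leaves the nuclei untouched. The sketch below uses uniform velocity rescaling for brevity; production surface-hopping codes rescale along the NA coupling vector, and all names here are our own:

```python
import numpy as np

def na_hop_rescale(v, m, dE):
    """Velocity adjustment after a hop driven by NA coupling.

    v: classical velocities, m: masses, dE: electronic energy gained
    by the hop (positive means the nuclei must give up kinetic energy).
    Returns rescaled velocities, or None for a frustrated hop (not
    enough kinetic energy to pay for the transition).
    """
    ke = 0.5 * np.sum(m * v**2)
    if ke < dE:
        return None                      # frustrated hop: reject
    scale = np.sqrt((ke - dE) / ke)      # rescaling conserves total energy
    return v * scale

def coulomb_hop_rescale(v, m, dE):
    """Hop driven by the Coulomb interaction: the energy change is
    redistributed within the electronic subsystem only, so the
    classical velocities are returned unchanged."""
    return v
```

Conflating the two channels, i.e., rescaling velocities on Auger hops too, would misplace energy between the subsystems, which is exactly the error the paper argues corrupts event ordering, time scales, and the approach to thermodynamic equilibrium.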
PubMed: 38902855
DOI: 10.1021/acs.jctc.4c00562
IEEE Transactions on Pattern Analysis and Machine Intelligence, Sep 2023
In human and hand pose estimation, heatmaps are a crucial intermediate representation for a body or hand keypoint. Two popular methods to decode the heatmap into a final joint coordinate are via an argmax, as done in heatmap detection, or via softmax and expectation, as done in integral regression. Integral regression is learnable end-to-end, but has lower accuracy than detection. This paper uncovers an induced bias from integral regression that results from combining the softmax and the expectation operation. This bias often forces the network to learn degenerately localized heatmaps, obscuring the keypoint's true underlying distribution and leading to lower accuracy. Training-wise, by investigating the gradients of integral regression, we show that the implicit guidance of integral regression to update the heatmap makes it slower to converge than detection. To counter the above two limitations, we propose Bias Compensated Integral Regression (BCIR), an integral regression-based framework that compensates for the bias. BCIR also incorporates a Gaussian prior loss to speed up training and improve prediction accuracy. Experimental results on both the human body and hand benchmarks show that BCIR is faster to train and more accurate than the original integral regression, making it competitive with state-of-the-art detection methods.
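The two decoding schemes, and the bias the paper analyzes, are easy to demonstrate. In the sketch below (our own illustration, not BCIR itself), the soft decode of a peaked heatmap on a non-zero background is pulled away from the true maximum toward the image center, because the softmax spreads probability mass over every pixel before the expectation is taken:

```python
import numpy as np

def argmax_decode(heatmap):
    """Hard decode: coordinate of the heatmap maximum (detection-style)."""
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

def integral_decode(heatmap, beta=1.0):
    """Soft decode: softmax over the heatmap, then the expected
    coordinate (integral-regression-style). Larger `beta` sharpens the
    softmax and approaches the argmax."""
    p = np.exp(beta * (heatmap - heatmap.max()))   # stable softmax
    p /= p.sum()
    ys, xs = np.mgrid[:heatmap.shape[0], :heatmap.shape[1]]
    return float((p * ys).sum()), float((p * xs).sum())
```

Pushing the network toward a degenerately sharp heatmap (the large-`beta` regime) removes the bias but obscures the keypoint's underlying distribution, which is the trade-off the paper's bias compensation is designed to break.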
Topics: Humans; Algorithms; Hand; Benchmarking; Learning; Normal Distribution
PubMed: 37018104
DOI: 10.1109/TPAMI.2023.3264742
Briefings in Bioinformatics, Sep 2023
Single cell RNA-sequencing (scRNA-seq) technology has significantly advanced the understanding of transcriptomic signatures. Although various statistical models have been used to describe the distribution of gene expression across cells, a comprehensive assessment of the different models is missing. Moreover, the growing number of features associated with scRNA-seq datasets creates new challenges for analytical accuracy and computing speed. Here, we developed a Python-based package (TensorZINB) to solve the zero-inflated negative binomial (ZINB) model using the TensorFlow deep learning framework. We used a sequential initialization method to solve the numerical stability issues associated with hurdle and zero-inflated models. A recursive feature selection protocol was used to optimize feature selections for data processing and downstream differentially expressed gene (DEG) analysis. We proposed a class of hybrid models combining nested models to further improve the model's performance. Additionally, we developed a new method to convert a continuous distribution to its equivalent discrete form, so that statistical models can be fairly compared. Finally, we showed that the proposed TensorFlow algorithm (TensorZINB) was numerically stable and that its computing speed and performance were superior to those of existing ZINB solvers. Moreover, we implemented seven hurdle and zero-inflated statistical models in Python and systematically assessed their performance using a real scRNA-seq dataset. We demonstrated that the ZINB model achieved the lowest Akaike information criterion compared with other models tested. Taken together, TensorZINB was accurate, efficient and scalable for the implementation of ZINB and for large-scale scRNA-seq data analysis with DEG identification.
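The ZINB model at the heart of TensorZINB has a simple likelihood structure: zeros arise from a mixture of structural zeros and NB-distributed zeros, while positive counts come from the NB component alone. A minimal reference implementation of that likelihood (ours, not TensorZINB's API):

```python
from math import lgamma, log, exp

import numpy as np

def zinb_loglik(y, mu, theta, pi):
    """Log-likelihood of the zero-inflated negative binomial model.

    y: observed counts; mu: NB mean; theta: NB dispersion;
    pi: zero-inflation probability. P(0) mixes structural zeros (pi)
    with NB zeros; positive counts come from the NB component only.
    """
    def nb_logpmf(k):
        # NB in its mean/dispersion parameterization
        return (lgamma(k + theta) - lgamma(theta) - lgamma(k + 1)
                + theta * log(theta / (theta + mu))
                + k * log(mu / (theta + mu)))
    ll = 0.0
    for k in np.asarray(y):
        if k == 0:
            ll += log(pi + (1 - pi) * exp(nb_logpmf(0)))
        else:
            ll += log(1 - pi) + nb_logpmf(k)
    return ll
```

The hurdle variants the paper also benchmarks differ only in the zero term: they model P(0) directly and truncate the NB component to positive counts, which is why a sequential initialization helps stabilize both families.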
Topics: Models, Statistical; Poisson Distribution; Gene Expression Profiling; RNA; Sequence Analysis, RNA
PubMed: 37507115
DOI: 10.1093/bib/bbad272
Otology & Neurotology Open, Dec 2023
OBJECTIVES
Judgments of the subjective visual vertical (SVV) and subjective visual horizontal (SVH) while seated upright are commonly included in standard clinical test batteries for vestibular function. We examined retrospective SVV and SVH control data to assess their statistical distributions and to establish normative values for the magnitude of the preset effect, sex differences, and fixed-head versus head-free device platforms for assessment.
METHODS
Retrospective clinical SVV and SVH data from 2 test platforms, Neuro-otologic Test Center (NOTC) and the Neurolign Dx 100 (I-Portal Portable Assessment System Nystagmograph) were analyzed statistically (SPSS and MATLAB software) for 408 healthy male and female civilians and military service members, aged 18-50 years.
RESULTS
No prominent age-related effects were observed. The preset angle effects for both SVV and SVH, and their deviations from orthogonality, agree in magnitude with previous reports. Differences attributable to interactions with device type and sex are of small magnitude. Analyses confirmed that the common clinical measure for SVV and SVH, the average of equal numbers of clockwise and counterclockwise preset trials, was not significantly affected by the test device or the sex of the subject. Finally, distributional analyses failed to reject the hypothesis of underlying Gaussian distributions for the clinical metrics.
CONCLUSIONS
Scores based on these normative findings can be used for the objective detection of outliers from normal functional limits in the clinic.
PubMed: 38516545
DOI: 10.1097/ONO.0000000000000044