Physica A, Feb 2021
At the end of 2019, the current novel coronavirus emerged as a severe acute respiratory disease that has now become a worldwide pandemic. Future generations will look back on this difficult period and see how our society as a whole united and rose to this challenge. Many reports have suggested that this new virus is becoming comparable to the Spanish flu pandemic of 1918. We provide a statistical study on the modelling and analysis of the daily incidence of COVID-19 in eighteen countries around the world. In particular, we investigate whether count regression models can be fitted to the number of daily new COVID-19 cases in various countries and used to make short-term predictions of these numbers. The results suggest that the biggest advantage of these methods is that they are simple and straightforward, allowing us to obtain preliminary results and an overall picture of the trends in daily confirmed COVID-19 cases around the world. For all countries analysed, the best-fitting count regression model for the number of new daily COVID-19 cases was a negative binomial model with a log link function. Whilst the results cannot be used on their own to determine and influence policy decisions, they provide an alternative to more specialised epidemiological models and can help to support or contradict results obtained from other analyses.
PubMed: 33162665
DOI: 10.1016/j.physa.2020.125460
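The count model named in this abstract can be sketched as follows. This is a minimal illustration, assuming an NB2 parametrisation (variance mu + alpha * mu^2) and a simple log-linear time trend log(mu_t) = beta0 + beta1 * t as the only covariate; the covariate choice and all function names are hypothetical, not the authors' specification. Fitting would mean maximising `nb_loglik` over (beta0, beta1, alpha) with a numerical optimiser, after which `forecast_means` gives short-term point predictions.

```python
import math
from typing import Sequence


def nb_logpmf(y: int, mu: float, alpha: float) -> float:
    """Log-pmf of the NB2 negative binomial with mean mu and variance mu + alpha * mu**2."""
    if alpha == 0:
        # alpha = 0 is the Poisson limit
        return y * math.log(mu) - mu - math.lgamma(y + 1)
    r = 1.0 / alpha  # size (shape) parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            - r * math.log1p(mu / r)          # equals r * log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def log_link_mean(t: float, beta0: float, beta1: float) -> float:
    """Log link with an assumed linear time trend: log(mu_t) = beta0 + beta1 * t."""
    return math.exp(beta0 + beta1 * t)


def nb_loglik(counts: Sequence[int], beta0: float, beta1: float, alpha: float) -> float:
    """Log-likelihood of daily counts y_0, y_1, ... indexed by day t = 0, 1, ..."""
    return sum(nb_logpmf(y, log_link_mean(t, beta0, beta1), alpha)
               for t, y in enumerate(counts))


def forecast_means(n_observed: int, horizon: int, beta0: float, beta1: float) -> list[float]:
    """Short-term point forecasts: fitted means for the next `horizon` days."""
    return [log_link_mean(n_observed + h, beta0, beta1) for h in range(horizon)]
```

Setting alpha = 0 recovers the Poisson model, so the negative binomial can be compared against it as a nested alternative when checking for overdispersion.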
BMC Cancer, Jan 2022 (Meta-Analysis)
Meta-Analysis Review
BACKGROUND
The incidence of early-onset colorectal cancer (EOCRC) is increasing at an alarming rate and further studies are needed to identify risk factors and to develop prevention strategies.
METHODS
Risk factors significantly associated with EOCRC were identified using meta-analysis. An individual risk appraisal model was constructed using the Rothman-Keller model. Next, a group of random data sets was generated using the binomial distribution function method to determine the cut-off points (nodes) between risk assessment levels and to identify low-, medium-, and high-risk populations.
RESULTS
A total of 32,843 EOCRC patients were included in this study, and meta-analysis identified nine significant risk factors, including male sex, Caucasian ethnicity, sedentary lifestyle, inflammatory bowel disease, and high intake of red meat and processed meat. After simulating risk assessment data for 10,000 subjects, scores of 0 to 0.0018, 0.0018 to 0.0036, and 0.0036 or more were classified as low, moderate, and high risk, respectively, based on risk trends from the Rothman-Keller model.
CONCLUSION
This model can be used to screen young adults for a high risk of EOCRC and will contribute to primary prevention strategies and to reducing the risk of developing EOCRC.
Topics: Adult; Clinical Decision Rules; Colorectal Neoplasms; Early Detection of Cancer; Female; Humans; Incidence; Male; Middle Aged; Risk Assessment; Risk Factors; Young Adult
PubMed: 35093005
DOI: 10.1186/s12885-022-09238-4
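The scoring, simulation, and grouping steps in this abstract can be sketched as below. The per-factor score RR / (P * RR + 1 - P) is one common formulation associated with the Rothman-Keller model and is an assumption here, as is treating factors as independent and multiplicative. The abstract does not report the relative risks, exposure prevalences, or baseline incidence, so all inputs and function names are hypothetical; only the cut-offs 0.0018 and 0.0036 come from the abstract.

```python
import random
from typing import Sequence


def factor_scores(rr: float, prevalence: float) -> tuple[float, float]:
    """(exposed, unexposed) scores for one factor, relative to the population average.

    Assumed form: divide by the population mean relative risk P * RR + (1 - P),
    so the expected score over the population is exactly 1.
    """
    average_rr = prevalence * rr + (1.0 - prevalence)
    return rr / average_rr, 1.0 / average_rr


def absolute_risk(exposed: Sequence[bool], rrs: Sequence[float],
                  prevalences: Sequence[float], incidence: float) -> float:
    """Population incidence times the product of the per-factor scores."""
    combined = 1.0
    for is_exposed, rr, p in zip(exposed, rrs, prevalences):
        score_exposed, score_unexposed = factor_scores(rr, p)
        combined *= score_exposed if is_exposed else score_unexposed
    return incidence * combined


def simulate_risks(rrs: Sequence[float], prevalences: Sequence[float], incidence: float,
                   n_subjects: int = 10_000, seed: int = 0) -> list[float]:
    """Simulate subjects whose exposure to factor i is a Binomial(1, P_i) draw."""
    rng = random.Random(seed)
    return [absolute_risk([rng.random() < p for p in prevalences], rrs, prevalences, incidence)
            for _ in range(n_subjects)]


def risk_level(score: float, low_cut: float = 0.0018, high_cut: float = 0.0036) -> str:
    """Map a score to the reported groups; a score equal to a cut-off is assigned
    to the higher group (an assumption about boundary handling)."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "moderate"
    return "high"
```

Because each factor's score averages to 1 over the population, the mean simulated risk stays close to the baseline incidence, and the cut-offs then split the simulated distribution into the three groups.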
Journal of Applied Statistics, 2022
We propose a new model, the LogLindley-Binomial and ordinal joint model with random effects, for analyzing mixed overdispersed binomial and ordinal longitudinal responses. A new distribution, the LogLindley-Binomial, is presented that is appropriate for the analysis of overdispersed binomial variables. A full likelihood-based approach is used to obtain maximum likelihood estimates. The LogLindley-Binomial and Beta-Binomial distributions are compared in a simulation study, and further simulation studies are conducted to illustrate the utility of the proposed model. In these simulations, the LogLindley-Binomial distribution and the proposed model perform well in some situations. The new model's performance is also studied on a real dataset extracted from the British Household Panel Survey, where it performs well in comparison with another model. Finally, the proposed distribution and the new model are found to be applicable for analyzing overdispersed binomial and mixed data.
PubMed: 35707561
DOI: 10.1080/02664763.2021.1881455
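The abstract does not give the LogLindley-Binomial pmf, so the sketch below illustrates the overdispersion it targets using the Beta-Binomial comparator it mentions: a binomial whose success probability is itself random. Function names are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(K = k) when K | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    if not 0 <= k <= n:
        return 0.0
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def mean_and_variance(n: int, a: float, b: float) -> tuple[float, float]:
    """Moments computed directly from the pmf."""
    probs = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


n, a, b = 10, 2.0, 3.0
p = a / (a + b)
mean, var = mean_and_variance(n, a, b)
binomial_var = n * p * (1 - p)
print(f"mean={mean:.3f} var={var:.3f} binomial var={binomial_var:.3f}")
```

With n = 10, a = 2, b = 3 the mean is 4, as for a Binomial(10, 0.4), but the variance is 6.0 instead of 2.4, an inflation factor of (a + b + n) / (a + b + 1) = 2.5.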
Journal of Statistical Theory and..., 2022
Two families of bivariate discrete Poisson-Lindley distributions are introduced. The first is derived by mixing the common parameter of a bivariate Poisson distribution over different univariate continuous Lindley distributions. The second is obtained by generalizing a bivariate binomial distribution with respect to its exponent when that exponent follows any of five different univariate discrete Poisson-Lindley distributions with one or two parameters. Probability-generating functions are mainly used to derive general properties of both families and specific characteristics of each of their members. We obtain expressions for probabilities, moments, conditional distributions, and regression functions, as well as characterizations for certain bivariate models and their marginals. An attractive property of all the individual bivariate models is that they contain only two or three parameters, one of which is readily estimated by a simple ratio of sample means. This feature, together with the fact that all marginal distributions are over-dispersed, strongly suggests their potential use for describing bivariate dependent count data in many different areas.
PubMed: 35493334
DOI: 10.1007/s42519-022-00261-z
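The abstract does not list its five univariate Poisson-Lindley models. As an assumption, the sketch below uses Sankaran's classical one-parameter Poisson-Lindley distribution, obtained by mixing a Poisson mean over a Lindley distribution, to show the over-dispersed marginals the abstract relies on. Function names are hypothetical.

```python
import math


def poisson_lindley_pmf(x: int, theta: float) -> float:
    """One-parameter Poisson-Lindley pmf:
    P(X = x) = theta^2 (x + theta + 2) / (theta + 1)^(x + 3), x = 0, 1, 2, ..."""
    return theta ** 2 * (x + theta + 2) / (theta + 1) ** (x + 3)


def poisson_lindley_mean(theta: float) -> float:
    return (theta + 2) / (theta * (theta + 1))


def poisson_lindley_variance(theta: float) -> float:
    # Var = E[lambda] + Var[lambda] for lambda ~ Lindley(theta), so it exceeds the mean
    return (theta ** 3 + 4 * theta ** 2 + 6 * theta + 2) / (theta ** 2 * (theta + 1) ** 2)


def theta_from_mean(sample_mean: float) -> float:
    """Moment estimator: the positive root of m * theta^2 + (m - 1) * theta - 2 = 0."""
    m = sample_mean
    return (-(m - 1) + math.sqrt((m - 1) ** 2 + 8 * m)) / (2 * m)
```

The variance exceeds the mean by exactly the variance of the Lindley mixing distribution, which is why every such mixture is over-dispersed relative to the Poisson.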
PloS One, 2022
RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods available in the open-source R and Bioconductor packages, we conducted multiple simulation studies comparing the performance of eight methods (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons covered scenarios with equal or unequal library sizes, different distribution assumptions, and different sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences in FDR control, power, or stability were observed across methods, whether library sizes were equal or unequal. For RNA-seq count data following a negative binomial distribution with 3 samples per group, EBSeq performed better than the other methods in terms of FDR control, power, and stability. When sample sizes increased to 6 or 12 per group, DESeq2 performed slightly better than the other methods. All methods except DESeq showed improved performance when the sample size increased to 12 per group. For RNA-seq count data following a log-normal distribution, both DESeq and DESeq2 performed better than the other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and the stability of discoveries for each method.
For RNA-seq data analysis, the EBSeq method is recommended for studies with as few as 3 samples per group, and the DESeq2 method is recommended for 6 or more samples per group when the data follow the negative binomial distribution. Both DESeq and DESeq2 are recommended when the data follow the log-normal distribution.
Topics: Binomial Distribution; High-Throughput Nucleotide Sequencing; RNA-Seq; Sample Size; Sequence Analysis, RNA
PubMed: 36112652
DOI: 10.1371/journal.pone.0264246
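The evaluation criteria in this abstract can be made concrete with a short sketch. It assumes a simulation in which the truly differentially expressed genes are known; `fdp_and_power` is a hypothetical helper, and the Benjamini-Hochberg adjustment is shown as a standard FDR-controlling procedure, not as the internal code of any of the eight packages.

```python
from typing import Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> list[float]:
    """BH-adjusted p-values: q_(i) = min over j >= i of p_(j) * m / j (capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fdp_and_power(called: set[str], truly_de: set[str]) -> tuple[float, float]:
    """False discovery proportion and power for one simulated data set.
    Averaging the FDP over simulation replicates estimates the FDR."""
    fdp = len(called - truly_de) / len(called) if called else 0.0
    power = len(called & truly_de) / len(truly_de) if truly_de else 0.0
    return fdp, power
```

Genes whose adjusted value falls below a chosen level (for example 0.05) form the `called` set; comparing it with the simulated truth gives the FDP and power for that replicate, and the spread of these values across replicates reflects stability.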
Journal of Multidisciplinary Healthcare, 2022
This article provides a thorough explanation of methods and theoretical concepts for detecting the infectivity of COVID-19. The concept of heterogeneity is discussed, and its impacts on the COVID-19 pandemic are explored. Observable heterogeneity is distinguished from non-observable heterogeneity. The data support the concepts of heterogeneity, as well as the methods used to extract and interpret the evidence behind the conclusions of this paper. Heterogeneity among those vulnerable to COVID-19 is a significant factor in its contagion, as demonstrated with incidence rates using data from the Diamond Princess cruise ship. Given the nature of the pandemic and its heterogeneity across different social norms, pre- and post-voyage quick testing procedures ought to become the new standard for cruise ship passengers and crew. With quick testing to identify infected individuals, so that they are not allowed to embark or are quarantined on disembarking, together with other mitigation strategies, the popular cruise adventure could become the norm for safe voyages. The novel method used in this article adds valuable insight into disease modeling and, specifically, into modeling the COVID-19 virus.
PubMed: 35018100
DOI: 10.2147/JMDH.S322637
Scientific Reports, Jul 2023
Among diseases, cancer exhibits the fastest global spread, presenting a substantial challenge for patients, their families, and the communities they belong to. This paper is devoted to modeling such a disease as a special case. A newly proposed distribution, the binomial-discrete Erlang-truncated exponential (BDETE), is introduced. The BDETE is a mixture of binomial distributions in which the number of trials (parameter [Formula: see text]) follows a discrete Erlang-truncated exponential distribution. A comprehensive mathematical treatment of the proposed distribution is provided, including expressions for its density, cumulative distribution function, survival function, failure rate function, quantile function, moment generating function, Shannon entropy, order statistics, and stress-strength reliability. The distribution's parameters are estimated using the maximum likelihood method. To evaluate its efficacy and viability, the BDETE distribution is fitted to two real-world lifetime count data sets on cancer, both of which are right-skewed and over-dispersed. We expect the findings to become standard works in probability theory and its related fields.
Topics: Humans; Reproducibility of Results; Statistical Distributions; Entropy; Neoplasms
PubMed: 37507433
DOI: 10.1038/s41598-023-38709-2
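The mixture construction described here (a binomial whose number of trials is random) can be computed numerically for any trials distribution. The abstract does not give the discrete Erlang-truncated exponential pmf, so `dete_pmf` below is an assumption: it discretizes the survival function S(x) = exp(-beta * (1 - exp(-lam)) * x) of the continuous ETE, which yields a geometric law on n = 0, 1, 2, ... and may differ from the paper's definition. Under that assumption the mixture has a closed geometric form, used as a check. All names are hypothetical.

```python
import math
from typing import Callable


def dete_pmf(n: int, beta: float, lam: float) -> float:
    """ASSUMED discrete ETE: P(N = n) = S(n) - S(n + 1), a geometric law with
    q = exp(-beta * (1 - exp(-lam)))."""
    q = math.exp(-beta * (1.0 - math.exp(-lam)))
    return (1.0 - q) * q ** n


def binomial_mixture_pmf(k: int, p: float, trials_pmf: Callable[[int], float],
                         n_max: int = 2000) -> float:
    """P(X = k) = sum over n >= k of C(n, k) p^k (1 - p)^(n - k) P(N = n), truncated at n_max.
    Requires 0 < p < 1; binomial terms are built in log space to avoid overflow."""
    total = 0.0
    for n in range(k, n_max + 1):
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_binom) * trials_pmf(n)
    return total


def geometric_thinning_pmf(k: int, p: float, beta: float, lam: float) -> float:
    """Closed form under the geometric assumption: X is geometric with
    q' = q p / (1 - q + q p), from the pgf of N evaluated at 1 - p + p s."""
    q = math.exp(-beta * (1.0 - math.exp(-lam)))
    q_thin = q * p / (1.0 - q + q * p)
    return (1.0 - q_thin) * q_thin ** k
```

`binomial_mixture_pmf` accepts any pmf for the number of trials, so a different DETE definition can be swapped in without changing the mixture code.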
Nature Computational Science, Apr 2021
Most tissue samples are composed of different cell types. Differential expression analysis that does not account for cell type composition cannot separate changes due to cell type composition from changes in cell type-specific expression. We propose a computational framework to address these limitations: Cell Type Aware analysis of RNA-seq (CARseq). CARseq employs a negative binomial distribution that appropriately models the count data from RNA-seq experiments. Simulation studies show that CARseq has substantially higher power than a linear model-based approach and also provides more accurate estimates of the rankings of differentially expressed genes. We have applied CARseq to compare the gene expression of schizophrenia/autism subjects versus controls, and identified the cell types underlying the differences and similarities between these two neurodevelopmental diseases. Our results are consistent with those of differential expression analysis using single-cell RNA-seq data.
PubMed: 34957416
DOI: 10.1038/s43588-021-00055-6
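A mean model in the spirit of CARseq can be sketched as below, assuming the expected bulk count of a gene is the library size times the proportion-weighted sum of cell type-specific expression levels, with negative binomial noise around that mean. This is an illustrative assumption, not the published CARseq implementation; all names are hypothetical.

```python
import math
from typing import Sequence


def bulk_mean(library_size: float, proportions: Sequence[float],
              cell_type_expression: Sequence[float]) -> float:
    """ASSUMED mean model: library size times the proportion-weighted sum of
    cell type-specific expression levels (relative abundances)."""
    if abs(sum(proportions) - 1.0) > 1e-8:
        raise ValueError("cell type proportions must sum to 1")
    return library_size * sum(rho * mu for rho, mu in zip(proportions, cell_type_expression))


def nb_logpmf(y: int, mean: float, size: float) -> float:
    """Negative binomial log-pmf with the given mean and variance mean + mean**2 / size."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mean)) + y * math.log(mean / (size + mean)))


def gene_loglik(counts: Sequence[int], library_sizes: Sequence[float],
                proportions_per_sample: Sequence[Sequence[float]],
                cell_type_expression: Sequence[float], size: float) -> float:
    """Log-likelihood of one gene's bulk counts across samples."""
    return sum(nb_logpmf(y, bulk_mean(d, rho, cell_type_expression), size)
               for y, d, rho in zip(counts, library_sizes, proportions_per_sample))
```

With a library size of 10^6 and cell type-specific levels 1e-5 and 2e-5, proportions (0.25, 0.75) give an expected count of 17.5 while (0.75, 0.25) give 12.5: the bulk mean changes although no cell type-specific expression changed, which is the confounding a composition-blind analysis cannot separate.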
Infectious Disease Modelling, Dec 2023
Accurately estimating the effective reproduction number is crucial for characterizing the transmissibility of infectious diseases to optimize interventions and responses during epidemic outbreaks. In this study, we improve the estimation of the effective reproduction number through two main approaches. First, we derive a discrete model to represent a time series of case counts and propose an estimation method based on this framework. We also conduct numerical experiments to demonstrate the effectiveness of the proposed discretization scheme. By doing so, we enhance the accuracy of approximating the underlying epidemic process compared to previous methods, even when the counting period is similar to the mean generation time of an infectious disease. Second, we employ a negative binomial distribution to model the variability of count data to accommodate overdispersion. Specifically, given that observed incidence counts follow a negative binomial distribution, the posterior distribution of secondary infections is obtained as a Dirichlet multinomial distribution. With this formulation, we establish posterior uncertainty bounds for the effective reproduction number. Finally, we demonstrate the effectiveness of the proposed method using incidence data from the COVID-19 pandemic.
PubMed: 37701756
DOI: 10.1016/j.idm.2023.08.006
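The abstract does not spell out its discrete model, so the sketch below shows the generic renewal-equation ratio estimator that such methods refine: cases in a window divided by the infection pressure from earlier cases weighted by a discretized generation-interval distribution. The weights w_s and all names are assumptions, and the paper's negative binomial observation model and Dirichlet-multinomial posterior are not included.

```python
from typing import Sequence


def infection_pressure(incidence: Sequence[float], weights: Sequence[float], t: int) -> float:
    """Lambda_t = sum over s = 1..S of w_s * I_{t-s}; days before the series starts are skipped."""
    return sum(w * incidence[t - s] for s, w in enumerate(weights, start=1) if t - s >= 0)


def renewal_rt(incidence: Sequence[float], weights: Sequence[float], t: int,
               window: int = 1) -> float:
    """Point estimate of R over days (t - window, t]: total cases / total infection pressure."""
    days = range(t - window + 1, t + 1)
    pressure = sum(infection_pressure(incidence, weights, u) for u in days)
    if pressure == 0:
        raise ValueError("no infection pressure in the window; R is not identifiable")
    return sum(incidence[u] for u in days) / pressure
```

Constant incidence with weights summing to 1 gives R = 1, and incidence that doubles every step with a one-step generation interval gives R = 2.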
MedRxiv : the Preprint Server For..., May 2020
BACKGROUND
COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was declared a global pandemic in March 2020. Electronic cigarette use (vaping) has rapidly gained popularity in the US in recent years. Whether electronic cigarette users (vapers) are more susceptible to COVID-19 infection is unknown.
METHODS
Using integrated state-level data from the 2018 Behavioral Risk Factor Surveillance System (BRFSS), the United States Census Bureau, and the 1Point3Acres.com website, generalized estimating equation (GEE) models with a negative binomial distribution assumption and log link functions were used to examine the association of the weighted proportion of vapers with the number of COVID-19 infections and deaths in the US.
RESULTS
The weighted proportion of vapers who used e-cigarettes every day or some days ranged from 2.86% to 6.42% across US states. Statistically significant associations were observed between the weighted proportion of vapers and both the number of COVID-19 infected cases and the number of COVID-19 deaths in the US, after adjusting for the weighted proportion of smokers and other significant covariates in the GEE models. With every one percent increase in the weighted proportion of vapers in a state, the number of COVID-19 infected cases increased by 0.3139 (95% CI: 0.0554 to 0.5723) and the number of COVID-19 deaths increased by 0.3705 (95% CI: 0.0623 to 0.6786) on the log scale.
CONCLUSIONS
The positive associations between the proportion of vapers and the number of COVID-19 infected cases and deaths in each US state suggest an increased susceptibility of vapers to COVID-19 infections and deaths.
PubMed: 32511560
DOI: 10.1101/2020.05.05.20092379
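Because these models use a log link, the reported coefficients can be read as multiplicative effects after exponentiation. The arithmetic below is a sketch assuming the coefficients and interval bounds are on the natural-log scale, as the abstract states; `rate_ratio` is a hypothetical helper.

```python
import math


def rate_ratio(coef: float, ci_low: float, ci_high: float) -> tuple[float, float, float]:
    """Exponentiate a log-link coefficient and its CI bounds into a rate ratio."""
    return math.exp(coef), math.exp(ci_low), math.exp(ci_high)


# Coefficients and 95% CIs as reported in the abstract (log scale)
cases = rate_ratio(0.3139, 0.0554, 0.5723)
deaths = rate_ratio(0.3705, 0.0623, 0.6786)
print("cases:  RR=%.2f (95%% CI %.2f to %.2f)" % cases)
print("deaths: RR=%.2f (95%% CI %.2f to %.2f)" % deaths)
```

Read this way, each one percent increase in the weighted proportion of vapers corresponds to about 1.37 times as many cases (95% CI roughly 1.06 to 1.77) and about 1.45 times as many deaths (roughly 1.06 to 1.97).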