Genome Biology 2010
High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.
Topics: Animals; Binomial Distribution; Chromatin Immunoprecipitation; Computational Biology; Drosophila; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Linear Models; Models, Genetic; Saccharomyces cerevisiae; Sequence Analysis, RNA; Stem Cells; Tissue Culture Techniques
PubMed: 20979621
DOI: 10.1186/gb-2010-11-10-r106
Journal of Applied Statistics 2022
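The idea of linking variance to mean can be illustrated with a small sketch. It is not DESeq's implementation. It assumes counts are already normalized for library size, and it computes per-gene sample means and variances. In place of DESeq's local regression it uses a hypothetical least-squares fit of the quadratic form var = mean + alpha * mean^2 (the common NB2 parameterization). The fitted variance is then converted to negative binomial size and probability parameters by the method of moments. All function names are illustrative.

```python
from statistics import mean, variance


def gene_moments(counts_by_gene):
    """Per-gene sample mean and unbiased sample variance of normalized counts."""
    return [(mean(c), variance(c)) for c in counts_by_gene]


def fit_dispersion(moments):
    """Least-squares fit of alpha in var = mu + alpha * mu**2.

    Hypothetical parametric stand-in for DESeq's local regression of variance
    on mean: minimizes sum((var - mu - alpha * mu**2) ** 2) over alpha >= 0.
    """
    num = sum((v - m) * m * m for m, v in moments)
    den = sum(m ** 4 for m, _ in moments)
    return max(num / den, 0.0)


def nb_params(mu, alpha):
    """Method-of-moments NB parameters for mean mu and variance mu + alpha * mu**2.

    Returns (size r, success probability p), so that r * (1 - p) / p == mu
    and r * (1 - p) / p**2 == mu + alpha * mu**2.
    """
    var = mu + alpha * mu * mu
    if var <= mu:
        raise ValueError("NB requires variance > mean (overdispersion)")
    r = mu * mu / (var - mu)  # equals 1 / alpha
    p = r / (r + mu)
    return r, p


# Usage with made-up normalized counts for three genes, four samples each:
counts = [[10, 14, 9, 17], [100, 130, 80, 150], [1000, 1400, 700, 1600]]
alpha = fit_dispersion(gene_moments(counts))
size, prob = nb_params(120.0, alpha)
```

A fixed parametric curve is used only to keep the sketch short. Local regression, as described in the abstract, lets the mean-variance relationship take any smooth shape across the dynamic range.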
We propose a new model, the LogLindley-Binomial and ordinal joint model with random effects, for analyzing mixed overdispersed binomial and ordinal longitudinal responses. A new distribution, the LogLindley-Binomial, is presented, which is appropriate for the analysis of overdispersed binomial variables. A full likelihood-based approach is used to obtain maximum likelihood estimates. A comparison between the LogLindley-Binomial and Beta-Binomial distributions is given by a simulation study. Further simulation studies are conducted to illustrate the utility of the proposed model; in these, the LogLindley-Binomial distribution and the proposed model perform well in some situations. The new model's performance is also studied on a real dataset extracted from the British Household Panel Survey, where it performs well in comparison with another model. Finally, the proposed distribution and the new model are found to be applicable for analyzing overdispersed binomial and mixed data.
PubMed: 35707561
DOI: 10.1080/02664763.2021.1881455
Scientific Reports Jul 2023
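How a LogLindley-Binomial probability mass function can be computed is shown in the following sketch. It assumes the distribution arises by letting the binomial success probability p follow a LogLindley density of the form f(p) = sigma^2 / (1 + lam*sigma) * (lam - log p) * p^(sigma - 1) on (0, 1). The paper's exact parameterization may differ. Under that assumption, integrating the binomial likelihood against f gives a closed form involving a beta function and a digamma difference. Because the second beta argument is an integer, the digamma difference reduces to a finite sum, so only the Python standard library is needed. Function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """log B(a, b) via log-gamma, to avoid overflow."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def loglindley_binomial_pmf(y, n, sigma, lam):
    """P(Y = y) for Y | p ~ Binomial(n, p), p ~ LogLindley(sigma, lam) (assumed form).

    Closed form: C(n, y) * sigma^2 / (1 + lam*sigma) * B(a, b) * (lam + psi(a+b) - psi(a)),
    with a = y + sigma and b = n - y + 1. Since b is a positive integer,
    psi(a + b) - psi(a) = sum_{k=0}^{b-1} 1 / (a + k).
    """
    if not 0 <= y <= n:
        return 0.0
    a, b = y + sigma, n - y + 1
    digamma_diff = sum(1.0 / (a + k) for k in range(b))
    const = sigma ** 2 / (1.0 + lam * sigma)
    return comb(n, y) * const * exp(log_beta(a, b)) * (lam + digamma_diff)


def moments(n, sigma, lam):
    """Mean and variance of Y computed directly from the pmf."""
    probs = [loglindley_binomial_pmf(y, n, sigma, lam) for y in range(n + 1)]
    m = sum(y * q for y, q in enumerate(probs))
    v = sum((y - m) ** 2 * q for y, q in enumerate(probs))
    return m, v


# Overdispersion check: compare with a binomial that has the same mean.
m, v = moments(10, 2.0, 0.5)
binomial_var = m * (1 - m / 10)
```

Mixing over p always inflates the variance: it exceeds the matched-binomial variance by n(n - 1)Var(p). This is the sense in which such mixtures suit overdispersed binomial data. The Beta-Binomial used as the comparator in the abstract is built the same way, with a beta density in place of f.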
Among diseases, cancer exhibits the fastest global spread, presenting a substantial challenge for patients, their families, and the communities they belong to. This paper is devoted to modeling such a disease as a special case. A newly proposed distribution, the binomial-discrete Erlang-truncated exponential (BDETE), is introduced. The BDETE is a mixture of binomial distributions in which the number of trials follows a discrete Erlang-truncated exponential distribution. A comprehensive mathematical treatment of the proposed distribution is provided, including expressions for its density, cumulative distribution function, survival function, failure rate function, quantile function, moment generating function, Shannon entropy, order statistics, and stress-strength reliability. The distribution's parameters are estimated using the maximum likelihood method. To evaluate its efficacy and viability, the proposed BDETE distribution is fitted to two real-world lifetime count data sets on cancer, both of which are right-skewed and over-dispersed. We expect the findings to become standard works in probability theory and its related fields.
Topics: Humans; Reproducibility of Results; Statistical Distributions; Entropy; Neoplasms
PubMed: 37507433
DOI: 10.1038/s41598-023-38709-2
Journal of Statistical Theory and... 2022
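The mixing step behind a binomial distribution with a random number of trials can be sketched generically. The sketch does not reproduce the paper's discrete Erlang-truncated exponential pmf. Instead, it accepts any pmf for the number of trials N as a function and sums the binomial probabilities over N, truncating at a chosen n_max. As a self-check it uses a Poisson N, for which binomial thinning is known to give a Poisson distribution with mean p times the Poisson mean. Supplying the discrete Erlang-truncated exponential pmf in place of the Poisson would give the BDETE pmf. Function names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_mixture_pmf(y, p, trials_pmf, n_max):
    """P(Y = y) where Y | N ~ Binomial(N, p) and N has pmf trials_pmf(n), n >= 0.

    The infinite sum over N is truncated at n_max, so n_max must be large
    enough that P(N > n_max) is negligible for the chosen trials_pmf.
    """
    return sum(
        trials_pmf(n) * comb(n, y) * p ** y * (1 - p) ** (n - y)
        for n in range(y, n_max + 1)
    )


def poisson_pmf(k, mu):
    """Poisson pmf, used here only as a test distribution for N."""
    return exp(-mu) * mu ** k / factorial(k)


# Self-check: thinning Poisson(5) with p = 0.4 should give Poisson(2).
mixed = binomial_mixture_pmf(3, 0.4, lambda n: poisson_pmf(n, 5.0), n_max=80)
direct = poisson_pmf(3, 2.0)
```

Passing the trial distribution as a function keeps the mixing logic separate from the choice of N. This makes it easy to check the code against a case with a known answer before using a new distribution.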
Two families of bivariate discrete Poisson-Lindley distributions are introduced. The first is derived by mixing the common parameter in a bivariate Poisson distribution by different models of univariate continuous Lindley distributions. The second is obtained by generalizing a bivariate binomial distribution with respect to its exponent, which is taken to follow any of five different univariate discrete Poisson-Lindley distributions with one or two parameters. Probability-generating functions are mainly employed to derive some general properties for both families and specific characteristics for each of their members. We obtain expressions for probabilities, moments, conditional distributions and regression functions, as well as characterizations for certain bivariate models and their marginals. An attractive property of all the individual bivariate models is that they contain only two or three parameters, one of which is readily estimated by simple ratios of their sample means. This feature, together with the fact that all marginal distributions are over-dispersed, strongly suggests their potential use to describe bivariate dependent count data in many different areas.
PubMed: 35493334
DOI: 10.1007/s42519-022-00261-z
Physica A Feb 2021
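One simple way to mix a common parameter is to let X and Y be conditionally independent Poisson(lam) counts that share a single lam. Here lam follows a Lindley(theta) distribution with density theta^2 / (theta + 1) * (1 + lam) * exp(-theta * lam). This specific construction is an illustrative assumption, not necessarily any member of the paper's first family. The shared lam induces positive dependence, and each marginal is the univariate Poisson-Lindley distribution, which is over-dispersed. The joint pmf has a closed form, computed below in log space to avoid overflow. Function names are hypothetical.

```python
from math import exp, lgamma, log


def lindley_mixed_bivariate_poisson_pmf(x, y, theta):
    """P(X = x, Y = y) when X, Y | lam are independent Poisson(lam), lam ~ Lindley(theta).

    Uses the integral of lam^k (1 + lam) exp(-c * lam) over (0, inf), which equals
    k!/c^(k+1) + (k+1)!/c^(k+2), with k = x + y and c = theta + 2.
    """
    k = x + y
    c = theta + 2.0
    log_front = 2 * log(theta) - log(theta + 1) - lgamma(x + 1) - lgamma(y + 1)
    log_t1 = lgamma(k + 1) - (k + 1) * log(c)
    log_t2 = lgamma(k + 2) - (k + 2) * log(c)
    return exp(log_front + log_t1) + exp(log_front + log_t2)


def poisson_lindley_pmf(x, theta):
    """Univariate Poisson-Lindley pmf: theta^2 (x + theta + 2) / (theta + 1)^(x + 3)."""
    return theta ** 2 * (x + theta + 2) / (theta + 1) ** (x + 3)


# Summing the joint pmf over y recovers the Poisson-Lindley marginal of X.
theta = 1.5
marginal_x2 = sum(lindley_mixed_bivariate_poisson_pmf(2, y, theta) for y in range(200))
expected_x2 = poisson_lindley_pmf(2, theta)
```

The marginal mean is (theta + 2) / (theta * (theta + 1)), and the marginal variance exceeds it. This matches the abstract's point that the marginals are over-dispersed.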
At the end of 2019, the novel coronavirus emerged as a severe acute respiratory disease that has now become a worldwide pandemic. Future generations will look back on this difficult period and see how our society as a whole united and rose to this challenge. Many reports have suggested that this new virus is becoming comparable to the Spanish flu pandemic of 1918. We provide a statistical study on the modelling and analysis of the daily incidence of COVID-19 in eighteen countries around the world. In particular, we investigate whether count regression models can be fitted to the number of daily new cases of COVID-19 in various countries and used to make short-term predictions of these numbers. The results suggest that the biggest advantage of these methods is that they are simple and straightforward, allowing us to obtain preliminary results and an overall picture of the trends in daily confirmed COVID-19 cases around the world. For all countries analysed, the best-fitting count regression model for the number of new daily COVID-19 cases was a negative binomial model with a log link function. Whilst the results cannot be used on their own to determine and influence policy decisions, they provide an alternative to more specialised epidemiological models and can help to support or contradict results obtained from other analyses.
PubMed: 33162665
DOI: 10.1016/j.physa.2020.125460
PloS One 2022
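A negative binomial count regression with a log link can be sketched as follows. The sketch assumes a single linear time trend, log(mu_t) = b0 + b1 * t, and a dispersion alpha that is fixed and known. In practice alpha is estimated as well, and the paper's covariates are not specified in the abstract. The fit uses iteratively reweighted least squares (Fisher scoring). Under the NB2 variance mu + alpha * mu^2 with a log link, the working weight is mu / (1 + alpha * mu) and the working response is eta + (y - mu) / mu. Function names are hypothetical.

```python
from math import exp, log


def fit_nb_loglinear_trend(counts, alpha, iters=100, tol=1e-10):
    """IRLS fit of log(mu_t) = b0 + b1 * t for NB2 counts with fixed dispersion alpha."""
    ts = list(range(len(counts)))
    b0 = log(max(sum(counts) / len(counts), 1e-8))  # start from a flat trend
    b1 = 0.0
    for _ in range(iters):
        # Accumulate the weighted normal equations for the two-parameter model.
        s_w = s_wt = s_wtt = s_wz = s_wtz = 0.0
        for t, y in zip(ts, counts):
            eta = b0 + b1 * t
            mu = exp(eta)
            w = mu / (1.0 + alpha * mu)  # Fisher-scoring weight for log link
            z = eta + (y - mu) / mu      # working response
            s_w += w
            s_wt += w * t
            s_wtt += w * t * t
            s_wz += w * z
            s_wtz += w * t * z
        det = s_w * s_wtt - s_wt * s_wt
        new_b0 = (s_wtt * s_wz - s_wt * s_wtz) / det
        new_b1 = (s_w * s_wtz - s_wt * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def predict(b0, b1, t):
    """Expected count on day t under the fitted log-linear trend."""
    return exp(b0 + b1 * t)


# Usage with made-up daily case counts; forecast two days past the data.
cases = [3, 5, 4, 8, 9, 12, 15, 14, 21, 25, 30, 28, 41, 45]
b0, b1 = fit_nb_loglinear_trend(cases, alpha=0.1)
forecast = [predict(b0, b1, t) for t in (len(cases), len(cases) + 1)]
```

The log link keeps fitted means positive and turns a constant daily growth rate into a linear trend in b1. The NB variance term lets day-to-day scatter grow faster than the mean, as it does in overdispersed case counts.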
RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods available in open source R and Bioconductor packages, we conducted multiple simulation studies comparing the performance of eight RNA-seq differential analysis methods (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons covered scenarios with equal or unequal library sizes, different distribution assumptions and different sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences were observed for FDR control, power, or stability across methods, whether with equal or unequal library sizes. For RNA-seq count data with a negative binomial distribution and a sample size of 3 in each group, EBSeq performed better than the other methods in terms of FDR control, power, and stability. When sample sizes increased to 6 or 12 in each group, DESeq2 performed slightly better than the other methods. All methods except DESeq showed improved performance when the sample size increased to 12 in each group. For RNA-seq count data with a log-normal distribution, both DESeq and DESeq2 performed better than the other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and the stability of discoveries for each method.
For RNA-seq data analysis, the EBSeq method is recommended for studies with sample size as small as 3 in each group, and the DESeq2 method is recommended for sample size of 6 or higher in each group when the data follow the negative binomial distribution. Both DESeq and DESeq2 methods are recommended when the data follow the log-normal distribution.
Topics: Binomial Distribution; High-Throughput Nucleotide Sequencing; RNA-Seq; Sample Size; Sequence Analysis, RNA
PubMed: 36112652
DOI: 10.1371/journal.pone.0264246
Infectious Disease Modelling Dec 2023
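Simulation comparisons like this one typically score each method by computing, for every simulated dataset, the false discovery proportion and the power against the known set of truly differentially expressed genes. These values are then averaged over replicates to give an empirical FDR and power. The sketch below shows that bookkeeping under these generic definitions. The study's exact definitions, and its stability measure, are not reproduced, and the function names are hypothetical.

```python
def empirical_fdr_and_power(called, truly_de):
    """False discovery proportion and power for one simulated dataset.

    called:   gene IDs a method declared differentially expressed
    truly_de: gene IDs simulated as truly differentially expressed
    """
    called, truly_de = set(called), set(truly_de)
    true_pos = len(called & truly_de)
    false_pos = len(called - truly_de)
    fdp = false_pos / len(called) if called else 0.0  # FDP taken as 0 when nothing is called
    power = true_pos / len(truly_de) if truly_de else float("nan")
    return fdp, power


def average_over_replicates(results):
    """Empirical FDR and mean power from per-replicate (fdp, power) pairs."""
    fdps, powers = zip(*results)
    return sum(fdps) / len(fdps), sum(powers) / len(powers)


# Usage: one replicate where a method calls four genes, two of them truly DE.
fdp, power = empirical_fdr_and_power({"g1", "g2", "g3", "g4"}, {"g1", "g2", "g5"})
```

A method "controls" the FDR at a nominal level q when the averaged false discovery proportion stays at or below q across replicates.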
Accurately estimating the effective reproduction number is crucial for characterizing the transmissibility of infectious diseases to optimize interventions and responses during epidemic outbreaks. In this study, we improve the estimation of the effective reproduction number through two main approaches. First, we derive a discrete model to represent a time series of case counts and propose an estimation method based on this framework. We also conduct numerical experiments to demonstrate the effectiveness of the proposed discretization scheme. By doing so, we enhance the accuracy of approximating the underlying epidemic process compared to previous methods, even when the counting period is similar to the mean generation time of an infectious disease. Second, we employ a negative binomial distribution to model the variability of count data to accommodate overdispersion. Specifically, given that observed incidence counts follow a negative binomial distribution, the posterior distribution of secondary infections is obtained as a Dirichlet multinomial distribution. With this formulation, we establish posterior uncertainty bounds for the effective reproduction number. Finally, we demonstrate the effectiveness of the proposed method using incidence data from the COVID-19 pandemic.
PubMed: 37701756
DOI: 10.1016/j.idm.2023.08.006
Statistics in Medicine Oct 2015
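For context, the classic renewal-equation point estimate of the effective reproduction number divides the current case count by the infection pressure from earlier cases. The infection pressure is the past incidence weighted by a discretized generation-interval distribution: R_t = I_t / sum_{s>=1} w_s I_{t-s}. This is the baseline idea only. It is not the paper's improved discretization scheme or its negative binomial / Dirichlet-multinomial posterior. The weights w_s and the function name are assumptions for illustration.

```python
def renewal_rt(incidence, gen_interval):
    """Point estimates R_t = I_t / sum_{s>=1} w_s * I_{t-s}.

    gen_interval[s - 1] is the assumed probability w_s that the generation
    interval spans s counting periods (s = 1..len(gen_interval)); the
    weights should sum to 1. Entries are None where there is too little
    history or zero infection pressure.
    """
    m = len(gen_interval)
    estimates = []
    for t, cases in enumerate(incidence):
        if t < m:
            estimates.append(None)
            continue
        pressure = sum(w * incidence[t - s] for s, w in enumerate(gen_interval, start=1))
        estimates.append(cases / pressure if pressure > 0 else None)
    return estimates


# Usage with made-up weekly counts and an assumed three-period generation interval.
weights = [0.2, 0.5, 0.3]
rt = renewal_rt([12, 15, 20, 26, 30, 33, 31, 28], weights)
```

When the counting period is comparable to the mean generation time, as the abstract notes, a coarse weight vector like this approximates the underlying process poorly. That is the gap the paper's discretization addresses.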
Zero-inflated Poisson (ZIP) and negative binomial (ZINB) models are widely used to model zero-inflated count responses. These models extend the Poisson and negative binomial (NB) to address excessive zeros in the count response. By adding a degenerate distribution centered at 0 and interpreting it as describing a non-risk group in the population, the ZIP (ZINB) models a two-component population mixture. As in applications of Poisson and NB, the key difference between ZIP and ZINB is the allowance for overdispersion by the ZINB in its NB component in modeling the count response for the at-risk group. Overdispersion arising in practice too often does not follow the NB, and applications of ZINB to such data yield invalid inference. If sources of overdispersion are known, other parametric models may be used to directly model the overdispersion. Such models too are subject to assumed distributions. Further, this approach may not be applicable if information about the sources of overdispersion is unavailable. In this paper, we propose a distribution-free alternative and compare its performance with these popular parametric models as well as a moment-based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390-2405]. Like the generalized estimating equations, the proposed approach requires no elaborate distribution assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero-inflated responses. We illustrate our approach with both simulated and real study data.
Topics: Binomial Distribution; Biometry; Computer Simulation; HIV Infections; Humans; Likelihood Functions; Male; Models, Statistical; Poisson Distribution; Randomized Controlled Trials as Topic
PubMed: 26078035
DOI: 10.1002/sim.6560
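The two parametric models that the paper uses as comparators can be written down directly. Each adds a point mass at zero (the non-risk group) to a count distribution for the at-risk group: Poisson for ZIP, negative binomial for ZINB. The sketch below gives both pmfs. It does not show the paper's distribution-free alternative, which avoids assuming such a form. The NB is parameterized by mean mu and size k, so that its variance is mu + mu^2/k. Function names are hypothetical.

```python
from math import exp, lgamma, log


def poisson_pmf(y, mu):
    """Poisson pmf with mean mu > 0, computed in log space."""
    return exp(-mu + y * log(mu) - lgamma(y + 1))


def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and size k (variance mu + mu**2 / k)."""
    log_p = (lgamma(y + k) - lgamma(k) - lgamma(y + 1)
             + k * log(k / (k + mu)) + y * log(mu / (k + mu)))
    return exp(log_p)


def zero_inflated_pmf(y, pi, base_pmf):
    """Two-component mixture: structural zero with probability pi (non-risk group),
    otherwise a draw from base_pmf (at-risk group)."""
    return (pi if y == 0 else 0.0) + (1.0 - pi) * base_pmf(y)


# ZIP and ZINB with the same at-risk mean; only the NB component is overdispersed.
zip_p0 = zero_inflated_pmf(0, 0.3, lambda y: poisson_pmf(y, 3.0))
zinb_p0 = zero_inflated_pmf(0, 0.3, lambda y: nb_pmf(y, 3.0, 1.5))
```

Both mixtures have mean (1 - pi) * mu. Inference from either is valid only if the at-risk counts really follow the assumed base distribution, which is the limitation the abstract raises.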
Journal of Research in Medical Sciences... May 2014
PubMed: 25097634
DOI: No ID Found
Journal of Multidisciplinary Healthcare 2022
This article provides a thorough explanation of methods and theoretical concepts for detecting the infectivity of COVID-19. The concept of heterogeneity is discussed and its impacts on the COVID-19 pandemic are explored. Observable heterogeneity is distinguished from non-observable heterogeneity. The data support the concepts of heterogeneity, and the methods used to extract and interpret the evidence in the data underpin the conclusions of this paper. Heterogeneity among those vulnerable to COVID-19 is a significant factor in its contagion, as demonstrated by incidence rates computed from data on the Diamond Princess cruise ship. Given the nature of the pandemic and its heterogeneity across different social norms, pre- and post-voyage quick testing procedures ought to become the new standard for cruise ship passengers and crew. With quick testing to identify those infected, so that they are not allowed to embark or are quarantined when disembarking, together with other mitigation strategies, the popular cruise adventure could become the norm for safe voyages. The novel method used in this article adds valuable insight into the modeling of disease and, specifically, of the COVID-19 virus.
PubMed: 35018100
DOI: 10.2147/JMDH.S322637