-
Genome Biology 2010
High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.
Topics: Animals; Binomial Distribution; Chromatin Immunoprecipitation; Computational Biology; Drosophila; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Linear Models; Models, Genetic; Saccharomyces cerevisiae; Sequence Analysis, RNA; Stem Cells; Tissue Culture Techniques
PubMed: 20979621
DOI: 10.1186/gb-2010-11-10-r106 -
Infectious Disease Modelling Dec 2023
Accurately estimating the effective reproduction number is crucial for characterizing the transmissibility of infectious diseases to optimize interventions and responses during epidemic outbreaks. In this study, we improve the estimation of the effective reproduction number through two main approaches. First, we derive a discrete model to represent a time series of case counts and propose an estimation method based on this framework. We also conduct numerical experiments to demonstrate the effectiveness of the proposed discretization scheme. By doing so, we enhance the accuracy of approximating the underlying epidemic process compared to previous methods, even when the counting period is similar to the mean generation time of an infectious disease. Second, we employ a negative binomial distribution to model the variability of count data to accommodate overdispersion. Specifically, given that observed incidence counts follow a negative binomial distribution, the posterior distribution of secondary infections is obtained as a Dirichlet multinomial distribution. With this formulation, we establish posterior uncertainty bounds for the effective reproduction number. Finally, we demonstrate the effectiveness of the proposed method using incidence data from the COVID-19 pandemic.
PubMed: 37701756
DOI: 10.1016/j.idm.2023.08.006 -
Journal of Statistical Theory and... 2022
Two families of bivariate discrete Poisson-Lindley distributions are introduced. The first is derived by mixing the common parameter in a bivariate Poisson distribution by different models of univariate continuous Lindley distributions. The second is obtained by generalizing a bivariate binomial distribution with respect to its exponent when it follows any of five different univariate discrete Poisson-Lindley distributions with one or two parameters. Probability-generating functions are mainly employed to derive some general properties for both families and specific characteristics for each of their members. We obtain expressions for probabilities, moments, conditional distributions, and regression functions, as well as characterizations for certain bivariate models and their marginals. An attractive property of all the individual bivariate models is that they contain only two or three parameters, one of which is readily estimated by simple ratios of their sample means. This feature, together with the fact that all marginal distributions are over-dispersed, strongly suggests their potential use to describe bivariate dependent count data in many different areas.
PubMed: 35493334
DOI: 10.1007/s42519-022-00261-z -
Scientific Reports Jul 2023
Among diseases, cancer exhibits the fastest global spread, presenting a substantial challenge for patients, their families, and the communities they belong to. This paper is devoted to modeling such a disease as a special case. A newly proposed distribution called the binomial-discrete Erlang-truncated exponential (BDETE) is introduced. The BDETE is a mixture of the binomial distribution with the number of trials (parameter [Formula: see text]) following a discrete Erlang-truncated exponential distribution. A comprehensive mathematical treatment of the proposed distribution is provided, together with expressions for its density, cumulative distribution function, survival function, failure rate function, quantile function, moment generating function, Shannon entropy, order statistics, and stress-strength reliability. The distribution's parameters are estimated using the maximum likelihood method. Two real-world lifetime count data sets from the cancer disease, both of which are right-skewed and over-dispersed, are fitted using the proposed BDETE distribution to evaluate its efficacy and viability. We expect the findings to become standard works in probability theory and its related fields.
Topics: Humans; Reproducibility of Results; Statistical Distributions; Entropy; Neoplasms
PubMed: 37507433
DOI: 10.1038/s41598-023-38709-2 -
Physica A Feb 2021
At the end of 2019, the current novel coronavirus emerged as a severe acute respiratory disease that has now become a worldwide pandemic. Future generations will look back on this difficult period and see how our society as a whole united and rose to this challenge. Many reports have suggested that this new virus is becoming comparable to the Spanish flu pandemic of 1918. We provide a statistical study on the modelling and analysis of the daily incidence of COVID-19 in eighteen countries around the world. In particular, we investigate whether it is possible to fit count regression models to the number of daily new cases of COVID-19 in various countries and make short term predictions of these numbers. The results suggest that the biggest advantage of these methods is that they are simplistic and straightforward allowing us to obtain preliminary results and an overall picture of the trends in the daily confirmed cases of COVID-19 around the world. The best fitting count regression model for modelling the number of new daily COVID-19 cases of all countries analysed was shown to be a negative binomial distribution with log link function. Whilst the results cannot solely be used to determine and influence policy decisions, they provide an alternative to more specialised epidemiological models and can help to support or contradict results obtained from other analysis.
PubMed: 33162665
DOI: 10.1016/j.physa.2020.125460 -
PloS One 2022
RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods used in the open source R and Bioconductor packages, we conducted multiple simulation studies to compare the performance of eight RNA-seq differential analysis methods used in RNA-seq data analysis (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons were across different scenarios with either equal or unequal library sizes, different distribution assumptions and sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences were observed for FDR control, power, or stability across methods, whether with equal or unequal library sizes. For RNA-seq count data with negative binomial distribution, when sample size is 3 in each group, EBSeq performed better than the other methods as indicated by FDR control, power, and stability. When sample sizes increase to 6 or 12 in each group, DESeq2 performed slightly better than other methods. All methods have improved performance when sample size increases to 12 in each group except DESeq. For RNA-seq count data with log-normal distribution, both DESeq and DESeq2 methods performed better than other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and stability of discoveries for each method. 
For RNA-seq data analysis, the EBSeq method is recommended for studies with sample size as small as 3 in each group, and the DESeq2 method is recommended for sample size of 6 or higher in each group when the data follow the negative binomial distribution. Both DESeq and DESeq2 methods are recommended when the data follow the log-normal distribution.
Topics: Binomial Distribution; High-Throughput Nucleotide Sequencing; RNA-Seq; Sample Size; Sequence Analysis, RNA
PubMed: 36112652
DOI: 10.1371/journal.pone.0264246 -
Journal of Research in Medical Sciences... May 2014
PubMed: 25097634
DOI: No ID Found -
Statistics in Medicine Oct 2015
Zero-inflated Poisson (ZIP) and negative binomial (ZINB) models are widely used to model zero-inflated count responses. These models extend the Poisson and negative binomial (NB) to address excessive zeros in the count response. By adding a degenerate distribution centered at 0 and interpreting it as describing a non-risk group in the population, the ZIP (ZINB) models a two-component population mixture. As in applications of Poisson and NB, the key difference between ZIP and ZINB is the allowance for overdispersion by the ZINB in its NB component in modeling the count response for the at-risk group. Overdispersion arising in practice too often does not follow the NB, and applications of ZINB to such data yield invalid inference. If sources of overdispersion are known, other parametric models may be used to directly model the overdispersion. Such models too are subject to assumed distributions. Further, this approach may not be applicable if information about the sources of overdispersion is unavailable. In this paper, we propose a distribution-free alternative and compare its performance with these popular parametric models as well as a moment-based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390-2405]. Like the generalized estimating equations, the proposed approach requires no elaborate distribution assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero-inflated responses. We illustrate our approach with both simulated and real study data.
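The two-component mixture described in this abstract can be illustrated with a minimal sketch of the standard ZIP probability mass function. It is not code from the paper: the names `zip_pmf`, `zip_mean_var`, `lam` (Poisson rate for the at-risk group) and `pi` (probability of belonging to the non-risk group) are chosen here purely for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P(Y = k) for rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson pmf.

    With probability pi the count comes from a point mass at 0
    (non-risk group); otherwise it comes from Poisson(lam)
    (at-risk group).
    """
    if k < 0:
        return 0.0
    at_risk = (1.0 - pi) * poisson_pmf(k, lam)
    # Zeros can arise from either component; positive counts only
    # from the Poisson component.
    return pi + at_risk if k == 0 else at_risk


def zip_mean_var(lam: float, pi: float) -> tuple[float, float]:
    """Closed-form mean and variance of the ZIP distribution."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var


if __name__ == "__main__":
    lam, pi = 2.0, 0.3  # illustrative values only
    print("P(Y = 0):", zip_pmf(0, lam, pi))
    print("mean, variance:", zip_mean_var(lam, pi))
```

For any `pi > 0` the variance exceeds the mean, so zero inflation alone already produces overdispersion relative to a plain Poisson model. The ZINB replaces the Poisson component with a negative binomial to allow extra overdispersion within the at-risk group.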
Topics: Binomial Distribution; Biometry; Computer Simulation; HIV Infections; Humans; Likelihood Functions; Male; Models, Statistical; Poisson Distribution; Randomized Controlled Trials as Topic
PubMed: 26078035
DOI: 10.1002/sim.6560 -
Computer Methods and Programs in... Oct 2021
Review
BACKGROUND AND OBJECTIVE
Our goal is to provide an overall strategy for utilizing continuous accelerated life models in the discrete setting that provides a unique and flexible modeling approach across a variety of hazard shapes.
METHODS
We convert well-known continuous accelerated life distributions into their discrete counterparts and show theoretically that existing software that accommodates left, right and interval censoring in the continuous case is reusable in the discrete setting due to the structure of the likelihood equations.
RESULTS
We demonstrate across a variety of simulated and real-world data that our modeling approach can accommodate discrete data that may be approximately symmetric, left-skewed or right-skewed, overcoming the limitations of more traditional modeling approaches.
CONCLUSIONS
We illustrate both theoretically and through simulations that our approach for accommodating discrete failure time and count data is quite flexible. We demonstrate that the special case of the discrete Weibull model can readily accommodate truly Poisson distributed data and has a great degree of flexibility for non-Poisson distributed data.
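The abstract does not spell out how the continuous distributions are converted. One common construction, shown here only as an assumed illustration, assigns to each integer k the probability that the continuous lifetime falls in [k, k + 1), that is P(X = k) = S(k) - S(k + 1) with S the continuous survival function. The function names and the Weibull `shape`/`scale` parameterization below are hypothetical and may differ from the authors' implementation.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Survival function S(t) = exp(-(t / scale) ** shape) of a continuous Weibull."""
    if t <= 0:
        return 1.0
    return math.exp(-((t / scale) ** shape))


def discrete_weibull_pmf(k: int, shape: float, scale: float) -> float:
    """P(X = k) = S(k) - S(k + 1): mass of the continuous lifetime on [k, k + 1)."""
    if k < 0:
        return 0.0
    return weibull_survival(k, shape, scale) - weibull_survival(k + 1, shape, scale)


if __name__ == "__main__":
    shape, scale = 1.5, 3.0  # illustrative values only
    for k in range(5):
        print(k, discrete_weibull_pmf(k, shape, scale))
```

Because each probability is a difference of survival values, it has the same form as an interval-censored likelihood contribution on [k, k + 1). This is consistent with the abstract's remark that software built for censored continuous data can be reused.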
Topics: Models, Statistical; Software; Survival Analysis
PubMed: 34469807
DOI: 10.1016/j.cmpb.2021.106337 -
Archives of Disease in Childhood Feb 1993
Review
Topics: Analysis of Variance; Binomial Distribution; Discriminant Analysis; Predictive Value of Tests; Statistics as Topic
PubMed: 8481051
DOI: 10.1136/adc.68.2.246