The New England Journal of Medicine, Apr 2020
BACKGROUND
Efforts to prevent Clostridioides difficile infection continue to expand across the health care spectrum in the United States. Whether these efforts are reducing the national burden of C. difficile infection is unclear.
METHODS
The Emerging Infections Program identified cases of C. difficile infection (a stool specimen positive for C. difficile in a person ≥1 year of age with no positive test in the previous 8 weeks) at 10 U.S. sites. We used case and census sampling weights to estimate the national burden of C. difficile infection, first recurrences, hospitalizations, and in-hospital deaths from 2011 through 2017. Health care-associated infections were defined as those with onset in a health care facility or associated with a recent admission to a health care facility; all others were classified as community-associated infections. For trend analyses, we used weighted random-intercept models with a negative binomial distribution and logistic-regression models to adjust for the higher sensitivity of nucleic acid amplification tests (NAATs) as compared with other test types.
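The case definition above is essentially a de-duplication rule over positive tests. A minimal sketch follows, assuming a hypothetical input of (person, test date) pairs and a per-person age lookup; treating a gap of exactly 56 days as still "within" the 8-week window is our assumption, not something the abstract specifies.

```python
from datetime import date, timedelta

# Look-back window from the case definition: no positive test in the previous 8 weeks.
# ASSUMPTION: a gap of exactly 56 days counts as inside the window.
LOOKBACK = timedelta(weeks=8)

def incident_cases(positives, age_years):
    """Filter positive stool tests down to incident cases.

    positives: iterable of (person_id, test_date) pairs for positive specimens
               (hypothetical input format).
    age_years: hypothetical dict mapping person_id -> age in years at testing.
    """
    last_positive = {}  # person_id -> date of that person's most recent positive test
    cases = []
    for person, test_date in sorted(positives, key=lambda rec: rec[1]):
        previous = last_positive.get(person)
        is_new = previous is None or test_date - previous > LOOKBACK
        if age_years[person] >= 1 and is_new:
            cases.append((person, test_date))
        # Every positive test restarts the look-back, whether or not it was counted.
        last_positive[person] = test_date
    return cases

tests = [("A", date(2017, 1, 1)), ("A", date(2017, 2, 1)),
         ("A", date(2017, 4, 15)), ("B", date(2017, 3, 1))]
ages = {"A": 45, "B": 0}
print(incident_cases(tests, ages))
```

In this toy input, person A contributes two cases (1 January and 15 April, since the 1 February test falls inside the window), and person B, under 1 year old, contributes none.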
RESULTS
The number of cases of C. difficile infection at the 10 U.S. sites was 15,461 in 2011 (10,177 health care-associated and 5284 community-associated cases) and 15,512 in 2017 (7973 health care-associated and 7539 community-associated cases). The estimated national burden of C. difficile infection was 476,400 cases (95% confidence interval [CI], 419,900 to 532,900) in 2011 and 462,100 cases (95% CI, 428,600 to 495,600) in 2017. After accounting for NAAT use, the adjusted estimate of the total burden of C. difficile infection decreased by 24% (95% CI, 6 to 36) from 2011 through 2017; the adjusted estimate of the national burden of health care-associated C. difficile infection decreased by 36% (95% CI, 24 to 54), whereas the adjusted estimate of the national burden of community-associated C. difficile infection was unchanged. The adjusted estimate of the burden of hospitalizations for C. difficile infection decreased by 24% (95% CI, 0 to 48), whereas the adjusted estimates of the burden of first recurrences and in-hospital deaths did not change significantly.
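The unadjusted figures reported above can be checked with simple arithmetic. The sketch below uses only those reported counts: the crude national point estimate falls by about 3%, much less than the 24% adjusted decline, which presumably reflects the NAAT adjustment described in the Methods. Meanwhile, the community-associated share of site cases rises from about 34% to about 49%.

```python
# Case counts at the 10 surveillance sites and national point estimates, as reported.
cases = {
    2011: {"health_care": 10_177, "community": 5_284},
    2017: {"health_care": 7_973, "community": 7_539},
}
national = {2011: 476_400, 2017: 462_100}

for year, c in cases.items():
    total = c["health_care"] + c["community"]
    share_community = c["community"] / total
    print(f"{year}: {total} site cases, {share_community:.1%} community-associated")

# Crude (unadjusted) relative change in the national point estimate.
crude_change = (national[2017] - national[2011]) / national[2011]
print(f"Crude change in national estimate: {crude_change:.1%}")
```

This prints 15,461 and 15,512 site cases with community-associated shares of 34.2% and 48.6%, and a crude change of -3.0%.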
CONCLUSIONS
The estimated national burden of C. difficile infection and associated hospitalizations decreased from 2011 through 2017, owing to a decline in health care-associated infections. (Funded by the Centers for Disease Control and Prevention.)
Topics: Clostridioides difficile; Clostridium Infections; Community-Acquired Infections; Cross Infection; Hospital Mortality; Hospitalization; Humans; Incidence; Population Surveillance; Recurrence; Treatment Outcome; United States
PubMed: 32242357
DOI: 10.1056/NEJMoa1910215
NAR Genomics and Bioinformatics, Sep 2020
The benefit of integrating batches of genomic data to increase statistical power is often hindered by batch effects: unwanted variation in the data caused by differences in technical factors across batches. It is therefore critical to address batch effects in genomic data effectively. Many existing methods for batch-effect adjustment assume that the data follow a continuous, bell-shaped Gaussian distribution. However, in RNA-seq studies the data are typically skewed, over-dispersed counts, so this assumption is not appropriate and may lead to erroneous results. Negative binomial regression models have previously been used to better capture the properties of counts. We developed a batch correction method, ComBat-seq, using a negative binomial regression model that retains the integer nature of count data in RNA-seq studies, making the batch-adjusted data compatible with common differential expression software packages that require integer counts. We show in realistic simulations that ComBat-seq-adjusted data yield better statistical power and control of false positives in differential expression than data adjusted by the other available methods. We further demonstrate in a real data example that ComBat-seq successfully removes batch effects and recovers the biological signal in the data.
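One way to keep adjusted data as integers is to map each count between two negative binomial distributions by matching quantiles. The sketch below shows only that mapping step. It assumes the batch-specific and batch-free means and dispersions have already been estimated, and its function names are hypothetical. It is not the ComBat-seq implementation, and how ComBat-seq estimates those parameters is not covered here.

```python
import math

def nb_pmf(k, mu, phi):
    """Negative binomial pmf with mean mu > 0 and variance mu + phi * mu**2 (phi > 0)."""
    r = 1.0 / phi            # size parameter
    p = r / (r + mu)         # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def nb_cdf(k, mu, phi):
    """P(Y <= k), accumulated term by term."""
    total = 0.0
    for i in range(k + 1):
        total += nb_pmf(i, mu, phi)
    return total

def nb_quantile(u, mu, phi, max_k=100_000):
    """Smallest k with P(Y <= k) >= u (the usual discrete quantile convention)."""
    total = 0.0
    for k in range(max_k + 1):
        total += nb_pmf(k, mu, phi)
        if total >= u:
            return k
    return max_k

def adjust_count(y, mu_batch, phi_batch, mu_target, phi_target):
    """Map an observed count to the same quantile of a (hypothetical) batch-free NB."""
    u = nb_cdf(y, mu_batch, phi_batch)
    return nb_quantile(u, mu_target, phi_target)

print(adjust_count(25, mu_batch=20.0, phi_batch=0.1, mu_target=10.0, phi_target=0.1))
```

Because the output is itself a quantile of a count distribution, the adjusted value is always a non-negative integer. The mapping is also monotone, so the ordering of counts within a gene is preserved.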
PubMed: 33015620
DOI: 10.1093/nargab/lqaa078
Physica A, Feb 2021
At the end of 2019, the novel coronavirus emerged as a severe acute respiratory disease that has now become a worldwide pandemic. Future generations will look back on this difficult period and see how our society as a whole united and rose to this challenge. Many reports have suggested that this new virus is becoming comparable to the Spanish flu pandemic of 1918. We provide a statistical study of the modelling and analysis of the daily incidence of COVID-19 in eighteen countries around the world. In particular, we investigate whether count regression models can be fitted to the number of daily new COVID-19 cases in various countries and used to make short-term predictions of these numbers. The results suggest that the biggest advantage of these methods is that they are simple and straightforward, allowing us to obtain preliminary results and an overall picture of the trends in daily confirmed COVID-19 cases around the world. For all countries analysed, the best-fitting count regression model for the number of new daily COVID-19 cases was a negative binomial model with a log link function. Whilst the results cannot be used on their own to determine or influence policy decisions, they provide an alternative to more specialised epidemiological models and can help to support or contradict results obtained from other analyses.
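As a rough illustration of a negative binomial model with a log link, the sketch below fits a log-linear time trend to a daily count series and extrapolates a few days ahead. The abstract does not list the covariates used. Using only the day index, and estimating the dispersion by moments rather than by maximum likelihood, are our simplifying assumptions.

```python
import math

def _wls_line(t, z, w):
    """Weighted least squares fit of z = b0 + b1 * t; returns (b0, b1)."""
    sw = sum(w)
    swt = sum(wi * ti for wi, ti in zip(w, t))
    swtt = sum(wi * ti * ti for wi, ti in zip(w, t))
    swz = sum(wi * zi for wi, zi in zip(w, z))
    swtz = sum(wi * ti * zi for wi, ti, zi in zip(w, t, z))
    det = sw * swtt - swt * swt
    return (swtt * swz - swt * swtz) / det, (sw * swtz - swt * swz) / det

def fit_nb_trend(y, n_iter=200, tol=1e-10):
    """Fit log E[y_t] = b0 + b1*t with NB2 variance mu + alpha*mu**2.

    Coefficients by IRLS (Fisher scoring); alpha by a moment estimate,
    alternated until the coefficients stop changing.
    """
    n = len(y)
    t = list(range(n))
    # Start from an ordinary least squares fit to log(y + 0.5) to avoid log(0).
    b0, b1 = _wls_line(t, [math.log(yi + 0.5) for yi in y], [1.0] * n)
    alpha = 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * ti for ti in t]
        mu = [math.exp(e) for e in eta]
        w = [m / (1.0 + alpha * m) for m in mu]                 # IRLS weights, log link
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        new_b0, new_b1 = _wls_line(t, z, w)
        mu = [math.exp(new_b0 + new_b1 * ti) for ti in t]
        # Moment estimate of alpha, truncated at 0 (alpha = 0 is the Poisson case).
        alpha = max(0.0, sum(((yi - m) ** 2 - m) / m ** 2 for yi, m in zip(y, mu)) / (n - 2))
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            break
    return b0, b1, alpha

def forecast(b0, b1, days_ahead, n_obs):
    """Point forecasts exp(b0 + b1*t) for the days after the last observed day."""
    return [math.exp(b0 + b1 * (n_obs - 1 + h)) for h in range(1, days_ahead + 1)]
```

A fitted `alpha` above zero signals extra-Poisson variation. Because the model is log-linear, `exp(b1)` is the estimated day-on-day growth factor.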
PubMed: 33162665
DOI: 10.1016/j.physa.2020.125460
BMC Cancer, Jan 2022
Meta-Analysis Review
BACKGROUND
The incidence of early-onset colorectal cancer (EOCRC) is increasing at an alarming rate, and further studies are needed to identify risk factors and to develop prevention strategies.
METHODS
Risk factors significantly associated with EOCRC were identified by meta-analysis. An individual risk appraisal model was constructed using the Rothman-Keller model. Next, random data sets were generated with the binomial distribution function to determine the cut-off points (nodes) between risk assessment levels and to identify low-, medium-, and high-risk populations.
RESULTS
A total of 32,843 EOCRC patients were identified in this study, and nine significant risk factors were identified by meta-analysis, including male sex, Caucasian ethnicity, sedentary lifestyle, inflammatory bowel disease, and high intake of red meat and processed meat. After the risk assessment data of 10,000 subjects were simulated, scores of 0 to 0.0018, 0.0018 to 0.0036, and 0.0036 or more were classified as low, moderate, and high risk for EOCRC, respectively, based on risk trends from the Rothman-Keller model.
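The pipeline described above (multiplicative risk scoring, binomial simulation of subjects, and classification by cut-offs) can be sketched as follows. The score uses a multiplicative form often attributed to the Rothman-Keller model: population incidence divided by the population-average relative risk, times the relative risk of each exposed factor. That form, and every prevalence, relative risk, and incidence value below, are assumptions chosen for illustration, not the meta-analysis estimates. Only the 0.0018 and 0.0036 cut-offs come from the abstract.

```python
import random

# HYPOTHETICAL inputs: (exposure prevalence, relative risk) per binary risk factor.
# Placeholders only; NOT the meta-analysis estimates.
FACTORS = {
    "male_sex": (0.50, 1.2),
    "sedentary_lifestyle": (0.30, 1.3),
    "inflammatory_bowel_disease": (0.01, 2.0),
    "high_red_meat_intake": (0.25, 1.2),
}
BASELINE_INCIDENCE = 0.002  # hypothetical population risk

def population_average_rr():
    """Product over factors of (p * RR + (1 - p)), the average multiplicative risk."""
    prod = 1.0
    for prevalence, rr in FACTORS.values():
        prod *= prevalence * rr + (1.0 - prevalence)
    return prod

def risk_score(exposures):
    """exposures: dict factor -> bool. Assumed multiplicative Rothman-Keller-style score."""
    score = BASELINE_INCIDENCE / population_average_rr()
    for name, (_, rr) in FACTORS.items():
        if exposures[name]:
            score *= rr
    return score

def risk_level(score):
    # Cut-offs from the abstract; assigning boundary values upward is our assumption.
    if score < 0.0018:
        return "low"
    if score < 0.0036:
        return "moderate"
    return "high"

def simulate(n_subjects=10_000, seed=1):
    """Each exposure is one Bernoulli(prevalence) draw, i.e. a binomial with n = 1."""
    rng = random.Random(seed)
    counts = {"low": 0, "moderate": 0, "high": 0}
    for _ in range(n_subjects):
        exposures = {name: rng.random() < p for name, (p, _) in FACTORS.items()}
        counts[risk_level(risk_score(exposures))] += 1
    return counts

print(simulate())
```

With real prevalences and relative risks, the resulting distribution of simulated scores is what the cut-offs are read from.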
CONCLUSION
This model can be used to screen young adults for high risk of EOCRC and will contribute to primary prevention strategies and to reducing the risk of developing EOCRC.
Topics: Adult; Clinical Decision Rules; Colorectal Neoplasms; Early Detection of Cancer; Female; Humans; Incidence; Male; Middle Aged; Risk Assessment; Risk Factors; Young Adult
PubMed: 35093005
DOI: 10.1186/s12885-022-09238-4
Public Health, Aug 2020
OBJECTIVES
This study aimed to examine the link between human mobility and the number of people infected with coronavirus disease 2019 (COVID-19) in different countries.
STUDY DESIGN
Our data set covers 144 countries for which complete data are available. To analyze the link between human mobility and the number of people infected with COVID-19, our study focused on the volume of air travel, the number of airports, and membership in the Schengen system.
METHODS
To analyze cross-country variation in the number of people infected with COVID-19, we used negative binomial regression analysis.
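Two standard pieces of reasoning sit behind a negative binomial regression of counts. First, the counts' variance exceeds their mean, which a Poisson model cannot capture. Second, with a log link, each coefficient acts multiplicatively, as an incidence rate ratio. The sketch below illustrates both. The moment formula ignores covariates and is only a rough marginal check, and the function names are ours.

```python
import math
from statistics import mean, variance

def nb_alpha_moments(counts):
    """Marginal NB2 dispersion by moments: Var = mu + alpha * mu**2,
    so alpha = (Var - mu) / mu**2. Values well above 0 indicate overdispersion."""
    mu = mean(counts)
    return (variance(counts) - mu) / mu ** 2

def incidence_rate_ratio(beta):
    """With a log link, a one-unit covariate increase multiplies the expected count by exp(beta)."""
    return math.exp(beta)

print(nb_alpha_moments([0, 2, 4, 10]))   # sample variance 56/3 vs mean 4
print(incidence_rate_ratio(math.log(2)))
```

For the toy counts shown, the sample variance (about 18.7) is more than four times the mean, giving an estimated alpha of 11/12.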
RESULTS
Our findings suggest a positive relationship between the volume of airline passenger traffic carried in a country and the number of patients with COVID-19. We further found that countries with more airports have higher numbers of COVID-19 cases. Schengen countries, countries with higher population density, and countries with a higher percentage of elderly people were also more likely to have more COVID-19 cases than other countries.
CONCLUSIONS
The article offers novel insight into the COVID-19 pandemic from a human mobility perspective. Future research should assess the impact of the scale of sea, bus, and car travel on the epidemic. The findings of this article are relevant for public health authorities, community and health service providers, and policy-makers.
Topics: Airports; Binomial Distribution; COVID-19; Coronavirus Infections; Global Health; Humans; Pandemics; Pneumonia, Viral; Regression Analysis; Travel
PubMed: 32739776
DOI: 10.1016/j.puhe.2020.07.002
Journal of Applied Statistics, 2022
We propose a new model, the LogLindley-Binomial and ordinal joint model with random effects, for analyzing mixed overdispersed binomial and ordinal longitudinal responses. A new distribution, the LogLindley-Binomial, is presented, which is appropriate for analyzing overdispersed binomial variables. A full likelihood-based approach is used to obtain maximum likelihood estimates. The LogLindley-Binomial and Beta-Binomial distributions are compared in a simulation study. To illustrate the utility of the proposed model, further simulation studies are conducted; in these, the LogLindley-Binomial distribution and the proposed model perform well in some situations. The new model's performance is also studied on a real dataset extracted from the British Household Panel Survey, where it performs well in comparison with another model. Finally, the proposed distribution and the new model are found to be applicable for analyzing overdispersed binomial and mixed data.
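The abstract does not give the LogLindley-Binomial density, so the sketch below covers only the comparator it is benchmarked against: the Beta-Binomial, where the success probability of a binomial is itself Beta-distributed. This randomisation is what inflates the variance above the plain binomial's n·p·(1 − p). As we read it, the proposed distribution swaps a LogLindley law in for the Beta mixing law, but that reading is an assumption.

```python
import math

def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(K = k) when K | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

def moments(pmf):
    """Mean and variance of a pmf given as a list indexed by k = 0, 1, ..."""
    m = sum(k * p for k, p in enumerate(pmf))
    v = sum((k - m) ** 2 * p for k, p in enumerate(pmf))
    return m, v

n, a, b = 10, 2.0, 3.0
pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
m, v = moments(pmf)
p_bar = a / (a + b)
print(m, v, n * p_bar * (1 - p_bar))  # Beta-Binomial mean/variance vs binomial variance
```

With a = 2, b = 3 and n = 10, the mean is 4 as for a Binomial(10, 0.4). The variance, however, is 6 rather than 2.4.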
PubMed: 35707561
DOI: 10.1080/02664763.2021.1881455
Journal of Statistical Theory and..., 2022
Two families of bivariate discrete Poisson-Lindley distributions are introduced. The first is derived by mixing the common parameter of a bivariate Poisson distribution with different univariate continuous Lindley models. The second is obtained by generalizing a bivariate binomial distribution with respect to its exponent, when the exponent follows any of five different univariate discrete Poisson-Lindley distributions with one or two parameters. Probability-generating functions are mainly used to derive general properties of both families and specific characteristics of each of their members. We obtain expressions for probabilities, moments, conditional distributions, and regression functions, as well as characterizations of certain bivariate models and their marginals. An attractive property of all the individual bivariate models is that they contain only two or three parameters, one of which is readily estimated by simple ratios of sample means. This feature, together with the over-dispersion of all marginal distributions, strongly suggests their potential use for describing bivariate dependent count data in many different areas.
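The sketch below simulates one simple construction in the spirit of the first family: two Poisson counts that share a single Lindley-distributed rate, with the second rate scaled by a factor beta. Whether this coincides with any specific member of the paper's families is our assumption. It does show the properties the abstract highlights: overdispersed marginals, positive dependence, and a parameter (beta) estimated by a ratio of sample means.

```python
import math
import random
from statistics import mean, variance

def sample_lindley(theta, rng):
    """Draw from Lindley(theta), density theta**2/(theta+1) * (1+x) * exp(-theta*x).

    Standard mixture: Exp(rate theta) with probability theta/(theta+1),
    otherwise Gamma(shape 2, rate theta).
    """
    if rng.random() < theta / (theta + 1.0):
        return rng.expovariate(theta)
    return rng.gammavariate(2.0, 1.0 / theta)  # gammavariate takes (shape, scale)

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_pairs(n, theta, beta, seed=0):
    """HYPOTHETICAL model: X | lam ~ Poisson(lam), Y | lam ~ Poisson(beta * lam),
    with one shared lam ~ Lindley(theta) per pair."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        lam = sample_lindley(theta, rng)
        pairs.append((sample_poisson(lam, rng), sample_poisson(beta * lam, rng)))
    return pairs

def summarize(pairs):
    """Ratio-of-means estimate of beta, plus marginal and joint moments of X and Y."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs) / (len(pairs) - 1)
    return {"beta_hat": my / mx, "mean_x": mx, "var_x": variance(xs), "cov_xy": cov}

stats = summarize(simulate_pairs(20_000, theta=1.0, beta=2.0))
print(stats)
```

With theta = 1, the shared rate has mean 1.5 and variance 1.75. The X marginal therefore has variance of about 3.25 against a mean of 1.5, and the ratio of sample means should land close to the true beta of 2.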
PubMed: 35493334
DOI: 10.1007/s42519-022-00261-z
PloS One, 2022
RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods implemented in open-source R and Bioconductor packages, we conducted multiple simulation studies comparing the performance of eight RNA-seq differential analysis methods (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons covered scenarios with equal or unequal library sizes, different distributional assumptions, and different sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences in FDR control, power, or stability were observed across methods, whether library sizes were equal or unequal. For RNA-seq count data with a negative binomial distribution and a sample size of 3 in each group, EBSeq performed better than the other methods in terms of FDR control, power, and stability. When sample sizes increased to 6 or 12 in each group, DESeq2 performed slightly better than the other methods. All methods except DESeq showed improved performance when the sample size increased to 12 in each group. For RNA-seq count data with a log-normal distribution, both DESeq and DESeq2 performed better than the other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and the stability of discoveries for each method.
For RNA-seq data analysis, EBSeq is recommended for studies with sample sizes as small as 3 in each group, and DESeq2 is recommended for sample sizes of 6 or more in each group when the data follow a negative binomial distribution. Both DESeq and DESeq2 are recommended when the data follow a log-normal distribution.
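In a simulation study like the one described, the true set of differentially expressed genes is known. Each method's calls can therefore be scored directly. The sketch below shows one common way to do this: Benjamini-Hochberg adjustment of per-gene p-values, then empirical FDR and power against the truth. The abstract does not say which adjustment each package applies internally, so the BH step here is an assumption, and the function names are ours.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def fdr_and_power(called, truly_de):
    """Empirical FDR = false calls / all calls; power = true calls / all truly DE genes."""
    called, truly_de = set(called), set(truly_de)
    true_pos = len(called & truly_de)
    fdr = (len(called) - true_pos) / len(called) if called else 0.0
    power = true_pos / len(truly_de) if truly_de else 0.0
    return fdr, power

pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
called = [i for i, q in enumerate(qvals) if q < 0.05]
print(qvals, called, fdr_and_power(called, truly_de={0, 2}))
```

Repeating this over many simulated data sets, and looking at the spread of the results, is one way to quantify the "stability" criterion as well.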
Topics: Binomial Distribution; High-Throughput Nucleotide Sequencing; RNA-Seq; Sample Size; Sequence Analysis, RNA
PubMed: 36112652
DOI: 10.1371/journal.pone.0264246
Journal of Multidisciplinary Healthcare, 2022
This article provides a thorough explanation of methods and theoretical concepts for detecting the infectivity of COVID-19. The concept of heterogeneity is discussed and its impact on the COVID-19 pandemic is explored. Observable heterogeneity is distinguished from non-observable heterogeneity. The data support the concepts of heterogeneity and the methods used to extract and interpret the evidence behind this paper's conclusions. Heterogeneity among people vulnerable to COVID-19 is a significant factor in its contagion, as demonstrated with incidence rates computed from Diamond Princess cruise ship data. Given the nature of the pandemic and its heterogeneity across different social norms, quick testing before and after voyages ought to become the new standard for cruise ship passengers and crew. With quick testing to identify those infected, so that they are not allowed to embark or are quarantined on disembarking, together with other mitigation strategies, the popular cruise adventure could become the norm for safe voyages. The novel method used in this article adds valuable insight to the modeling of disease and, specifically, of the COVID-19 virus.
PubMed: 35018100
DOI: 10.2147/JMDH.S322637
Scientific Reports, Jul 2023
Among diseases, cancer exhibits the fastest global spread, presenting a substantial challenge for patients, their families, and their communities. This paper is devoted to modeling such a disease as a special case. A new distribution, the binomial-discrete Erlang-truncated exponential (BDETE), is introduced. The BDETE is a binomial mixture in which the number of trials (parameter [Formula: see text]) follows a discrete Erlang-truncated exponential distribution. A comprehensive mathematical treatment of the proposed distribution is provided, including expressions for its density, cumulative distribution function, survival function, failure rate function, quantile function, moment generating function, Shannon entropy, order statistics, and stress-strength reliability. The distribution's parameters are estimated by maximum likelihood. Two real-world lifetime count data sets from cancer, both right-skewed and over-dispersed, are fitted with the proposed BDETE distribution to evaluate its efficacy and viability. We expect the findings to become standard works in probability theory and its related fields.
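The mixing mechanism described above is this: X given N is Binomial(N, p), where N itself is random. It can be computed by summing over N. In the sketch below, the discrete Erlang-truncated exponential pmf is an ASSUMPTION: we discretize an Erlang-truncated exponential with survival S(x) = exp(-beta·(1 − e^(−lam))·x) via P(N = n) = S(n) − S(n + 1). This form reduces to a geometric law and may differ from the paper's parameterization. The name N is our notation for the trials parameter.

```python
import math

def dete_pmf(n, beta, lam):
    """ASSUMED discrete Erlang-truncated exponential pmf on n = 0, 1, 2, ...

    Survival discretization P(N = n) = S(n) - S(n + 1) of an Erlang-truncated
    exponential with S(x) = exp(-beta * (1 - exp(-lam)) * x).
    """
    q = math.exp(-beta * (1.0 - math.exp(-lam)))
    return (1.0 - q) * q ** n

def binomial_mixture_pmf(k, p, trials_pmf, n_max=2_000):
    """P(X = k) where X | N ~ Binomial(N, p) and N has pmf trials_pmf (truncated at n_max)."""
    log_p, log_1mp = math.log(p), math.log1p(-p)
    total = 0.0
    for n in range(k, n_max + 1):
        pn = trials_pmf(n)
        if pn == 0.0:
            continue
        # log of C(n, k) * p**k * (1 - p)**(n - k), computed stably via log-gamma
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * log_p + (n - k) * log_1mp)
        total += math.exp(log_binom) * pn
    return total

beta_, lam_, p = 0.2, 1.0, 0.5
probs = [binomial_mixture_pmf(k, p, lambda n: dete_pmf(n, beta_, lam_)) for k in range(151)]
mean_x = sum(k * pr for k, pr in enumerate(probs))
var_x = sum((k - mean_x) ** 2 * pr for k, pr in enumerate(probs))
print(mean_x, var_x)
```

Whatever the trials distribution, the mixture has mean p·E[N]. Its variance is p(1 − p)·E[N] + p²·Var[N], which is how a heavy-tailed N produces the over-dispersion mentioned in the abstract.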
Topics: Humans; Reproducibility of Results; Statistical Distributions; Entropy; Neoplasms
PubMed: 37507433
DOI: 10.1038/s41598-023-38709-2