statistical significance - OpenMD.com Journal Search

The Cochrane Database of Systematic... Jun 2014

Meta-Analysis Review

Authors: Anne M Gadomski, Melissa B Scribani

BACKGROUND

Bronchiolitis is an acute, viral lower respiratory tract infection affecting infants and is sometimes treated with bronchodilators.

OBJECTIVES

To assess the effects of bronchodilators on clinical outcomes in infants (0 to 12 months) with acute bronchiolitis.

SEARCH METHODS

We searched CENTRAL 2013, Issue 12, MEDLINE (1966 to January Week 2, 2014) and EMBASE (1998 to January 2014).

SELECTION CRITERIA

Randomized controlled trials (RCTs) comparing bronchodilators (other than epinephrine) with placebo for bronchiolitis.

DATA COLLECTION AND ANALYSIS

Two authors assessed trial quality and extracted data. We obtained unpublished data from trial authors.

MAIN RESULTS

We included 30 trials (35 data sets) representing 1992 infants with bronchiolitis. In 11 inpatient and 10 outpatient studies, oxygen saturation did not improve with bronchodilators (mean difference (MD) -0.43, 95% confidence interval (CI) -0.92 to 0.06, n = 1242). Outpatient bronchodilator treatment did not reduce the rate of hospitalization (11.9% in bronchodilator group versus 15.9% in placebo group, odds ratio (OR) 0.75, 95% CI 0.46 to 1.21, n = 710). Inpatient bronchodilator treatment did not reduce the duration of hospitalization (MD 0.06, 95% CI -0.27 to 0.39, n = 349).

Effect estimates for inpatients (MD -0.62, 95% CI -1.40 to 0.16) were slightly larger than for outpatients (MD -0.25, 95% CI -0.61 to 0.11) for oximetry. Oximetry outcomes showed significant heterogeneity (I² statistic = 81%). Including only studies with low risk of bias had little impact on the overall effect size of oximetry (MD -0.38, 95% CI -0.75 to 0.00) but results were close to statistical significance.

In eight inpatient studies, there was no change in average clinical score (standardized MD (SMD) -0.14, 95% CI -0.41 to 0.12) with bronchodilators. In nine outpatient studies, the average clinical score decreased slightly with bronchodilators (SMD -0.42, 95% CI -0.79 to -0.06), a statistically significant finding of questionable clinical importance. The clinical score outcome showed significant heterogeneity (I² statistic = 73%). Including only studies with low risk of bias reduced the heterogeneity but had little impact on the overall effect size of average clinical score (SMD -0.22, 95% CI -0.41 to -0.03).

Sub-analyses limited to nebulized albuterol or salbutamol among outpatients (nine studies) showed no effect on oxygen saturation (MD -0.19, 95% CI -0.59 to 0.21, n = 572), average clinical score (SMD -0.36, 95% CI -0.83 to 0.11, n = 532) or hospital admission after treatment (OR 0.77, 95% CI 0.44 to 1.33, n = 404).

Adverse effects included tachycardia, oxygen desaturation and tremors.

AUTHORS' CONCLUSIONS

Bronchodilators such as albuterol or salbutamol do not improve oxygen saturation, do not reduce hospital admission after outpatient treatment, do not shorten the duration of hospitalization and do not reduce the time to resolution of illness at home. Given the adverse side effects and the expense associated with these treatments, bronchodilators are not effective in the routine management of bronchiolitis. This meta-analysis continues to be limited by the small sample sizes and the lack of standardized study design and validated outcomes across the studies. Future trials with large sample sizes, standardized methodology across clinical sites and consistent assessment methods are needed to answer completely the question of efficacy.

Topics: Acute Disease; Albuterol; Ambulatory Care; Bronchiolitis; Bronchodilator Agents; Hospitalization; Humans; Infant; Infant, Newborn; Oxygen; Randomized Controlled Trials as Topic

PubMed: 24937099
DOI: 10.1002/14651858.CD001266.pub4

The chi-square test of independence.

Biochemia Medica 2013

Authors: Mary L McHugh

The Chi-square statistic is a non-parametric (distribution free) tool designed to analyze group differences when the dependent variable is measured at a nominal level. Like all non-parametric statistics, the Chi-square is robust with respect to the distribution of the data. Specifically, it does not require equality of variances among the study groups or homoscedasticity in the data. It permits evaluation of both dichotomous independent variables and multiple group studies. Unlike many other non-parametric and some parametric statistics, the calculations needed to compute the Chi-square provide considerable information about how each of the groups performed in the study. This richness of detail allows the researcher to understand the results and thus to derive more detailed information from this statistic than from many others. The Chi-square is a significance statistic, and should be followed with a strength statistic. Cramér's V is the most common strength test used to test the data when a significant Chi-square result has been obtained. Advantages of the Chi-square include its robustness with respect to distribution of the data, its ease of computation, the detailed information that can be derived from the test, its use in studies for which parametric assumptions cannot be met, and its flexibility in handling data from both two group and multiple group studies. Limitations include its sample size requirements, difficulty of interpretation when there are large numbers of categories (20 or more) in the independent or dependent variables, and the tendency of Cramér's V to produce relatively low correlation measures, even for highly significant results.
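The computation described in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the article's own worked example: the counts are hypothetical, the function names are invented for this sketch, no continuity correction is applied, and the P-value shortcut only holds for one degree of freedom (a 2 x 2 table).

```python
import math

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, Cramér's V)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    n_rows, n_cols = len(table), len(table[0])
    dof = (n_rows - 1) * (n_cols - 1)
    # Cramér's V rescales chi-square to the range 0..1.
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return chi2, dof, cramers_v

def p_value_one_dof(chi2):
    """Upper-tail P-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: rows are two study groups, columns are outcome yes / no.
table = [[30, 20], [20, 30]]
chi2, dof, v = chi_square_independence(table)
p = p_value_one_dof(chi2)
print(chi2, dof, round(v, 3), round(p, 3))  # 4.0 1 0.2 0.046
```

With these made-up counts the test crosses the conventional 0.05 threshold while Cramér's V is only 0.2, which is the pattern the abstract lists as a limitation of the strength statistic.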

Topics: Chi-Square Distribution; Data Interpretation, Statistical

PubMed: 23894860
DOI: 10.11613/bm.2013.018

A systematic review assessing soft tissue augmentation techniques.

Clinical Oral Implants Research Sep 2009

Meta-Analysis Review

Authors: Daniel S Thoma, Goran I Benić, Marcel Zwahlen...

AIM

The aim of the present review was to systematically assess the dental literature in terms of soft tissue grafting techniques. The focused question was: is one method superior over others for augmentation and stability of the augmented soft tissue in terms of increasing the width of keratinized tissue (part 1) and gain in soft tissue volume (part 2).

METHODS

A Medline search was performed for human studies focusing on augmentation of keratinized tissue and/or soft tissue volume, and complemented by additional hand searching. Relevant studies were identified and statistical results were reported for meta-analyses including the test minus control weighted mean differences with 95% confidence intervals, the I-squared statistic for tests of heterogeneity, and the number of significant studies.

RESULTS

Twenty-five (part 1) and three (part 2) studies met the inclusion criteria; 14 studies (part 1) were eligible for comparison using meta-analyses. An apically positioned flap/vestibuloplasty (APF/V) procedure resulted in a statistically significantly greater gain in keratinized tissue than untreated controls. APF/V plus autogenous tissue revealed statistically significantly more attached gingiva compared with untreated controls and a borderline statistical significance compared with APF/V plus allogenic tissue. Statistically significantly more shrinkage was observed for the APF/V plus allogenic graft compared with the APF/V plus autogenous tissue. Patient-centered outcomes did not reveal any of the treatment methods to be superior regarding postoperative complications. The three studies reporting on soft tissue volume augmentation could not be compared due to lack of homogeneity. The use of subepithelial connective tissue grafts (SCTGs) resulted in statistically significantly more soft tissue volume gain compared with free gingival grafts (FGGs).

CONCLUSIONS

APF/V is a successful treatment concept to increase the width of keratinized tissue or attached gingiva around teeth. The addition of autogenous tissue statistically significantly increases the width of attached gingiva. For soft tissue volume augmentation, only limited data are available favoring SCTGs over FGG.

Topics: Collagen; Connective Tissue; Gingiva; Gingivoplasty; Humans; Keratins; Skin, Artificial; Vestibuloplasty

PubMed: 19663961
DOI: 10.1111/j.1600-0501.2009.01784.x

There is life beyond the statistical significance.

Reproductive Health Apr 2021

Authors: Agustín Ciapponi, José M Belizán, Gilda Piaggio...

This article challenges the "tyranny of P-value" and promotes more valuable and applicable interpretations of the results of research on health care delivery. We provide here solid arguments to retire statistical significance as the unique way to interpret results, after presenting the current state of the debate inside the scientific community. Instead, we promote reporting the much more informative confidence intervals and eventually adding exact P-values. We also provide some clues to integrate statistical and clinical significance by referring to minimal important differences and integrating the effect size of an intervention and the certainty of evidence ideally using the GRADE approach. We have argued against interpreting or reporting results as statistically significant or statistically non-significant. We recommend showing important clinical benefits with their confidence intervals in cases of point estimates compatible with results benefits and even important harms. It seems fair to report the point estimate and the more likely values along with a very clear statement of the implications of extremes of the intervals. We recommend drawing conclusions, considering the multiple factors besides P-values such as certainty of the evidence for each outcome, net benefit, economic considerations and values and preferences. We use several examples and figures to illustrate different scenarios and further suggest a wording to standardize the reporting. Several statistical measures have a role in the scientific communication of studies, but it is time to understand that there is life beyond the statistical significance. There is a great opportunity for improvement towards a more complete interpretation and to a more standardized reporting.

Topics: Data Interpretation, Statistical; Decision Making; Humans; Jurisprudence; Statistics as Topic

PubMed: 33865412
DOI: 10.1186/s12978-021-01131-w

Predictive power of statistical significance.

World Journal of Methodology Dec 2017

Authors: Thomas F Heston, Jackson M King

A statistically significant research finding should not be defined as a P-value of 0.05 or less, because this definition does not take into account study power. Statistical significance was originally defined by Fisher RA as a P-value of 0.05 or less. According to Fisher, any finding that is likely to occur by random variation no more than 1 in 20 times is considered significant. Neyman J and Pearson ES subsequently argued that Fisher's definition was incomplete. They proposed that statistical significance could only be determined by analyzing the chance of incorrectly considering a study finding was significant (a Type I error) or incorrectly considering a study finding was insignificant (a Type II error). Their definition of statistical significance is also incomplete because the error rates are considered separately, not together. A better definition of statistical significance is the positive predictive value of a P-value, which is equal to the power divided by the sum of power and the P-value. This definition is more complete and relevant than Fisher's or Neyman-Pearson's definitions, because it takes into account both concepts of statistical significance. Using this definition, a statistically significant finding requires a P-value of 0.05 or less when the power is at least 95%, and a P-value of 0.032 or less when the power is 60%. To achieve statistical significance, P-values must be adjusted downward as the study power decreases.
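The arithmetic behind the two thresholds quoted in this abstract can be checked with a short sketch. The formula (power divided by the sum of power and the P-value) is taken from the abstract; the target predictive value of 0.95 is an assumption inferred from the abstract's first worked number (P of 0.05 at 95% power), and the function names are hypothetical.

```python
def positive_predictive_value(p_value, power):
    """Positive predictive value of a P-value as defined in the abstract."""
    return power / (power + p_value)

def p_value_threshold(power, target_ppv=0.95):
    """Largest P-value that still reaches target_ppv at the given power.

    Obtained by solving power / (power + p) = target_ppv for p.
    target_ppv = 0.95 is an assumption, not a value stated in the abstract.
    """
    return power * (1 - target_ppv) / target_ppv

print(round(p_value_threshold(0.95), 3))  # 0.05
print(round(p_value_threshold(0.60), 3))  # 0.032
```

Under that assumed target, the sketch reproduces both figures in the abstract: 0.05 at 95% power and 0.032 at 60% power.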

PubMed: 29354483
DOI: 10.5662/wjm.v7.i4.112

Using the confidence interval confidently.

Journal of Thoracic Disease Oct 2017

Authors: Avijit Hazra

Biomedical research is seldom done with entire populations but rather with samples drawn from a population. Although we work with samples, our goal is to describe and draw inferences regarding the underlying population. It is possible to use a sample statistic and estimates of error in the sample to get a fair idea of the population parameter, not as a single value, but as a range of values. This range is the confidence interval (CI) which is estimated on the basis of a desired confidence level. Calculation of the CI of a sample statistic takes the general form: CI = Point estimate ± Margin of error, where the margin of error is given by the product of a critical value (z) derived from the standard normal curve and the standard error of point estimate. Calculation of the standard error varies depending on whether the sample statistic of interest is a mean, proportion, odds ratio (OR), and so on. The factors affecting the width of the CI include the desired confidence level, the sample size and the variability in the sample. Although the 95% CI is most often used in biomedical research, a CI can be calculated for any level of confidence. A 99% CI will be wider than 95% CI for the same sample. Conflict between clinical importance and statistical significance is an important issue in biomedical research. Clinical importance is best inferred by looking at the effect size, that is how much is the actual change or difference. However, statistical significance in terms of P only suggests whether there is any difference in probability terms. Use of the CI supplements the P value by providing an estimate of actual clinical effect. Of late, clinical trials are being designed specifically as superiority, non-inferiority or equivalence studies. The conclusions from these alternative trial designs are based on CI values rather than the P value from intergroup comparison.
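As an illustration of the general form CI = point estimate ± z × standard error, the following sketch computes a CI for a sample mean. The measurements are hypothetical and the function name is invented here; it uses the z critical value from the standard normal curve as the abstract describes, although a t critical value is commonly preferred for samples this small.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """CI for a mean: point estimate plus or minus z times the standard error."""
    # Two-sided critical value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    point_estimate = mean(sample)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical measurements, for illustration only.
sample = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0]
low95, high95 = mean_confidence_interval(sample, 0.95)
low99, high99 = mean_confidence_interval(sample, 0.99)
print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

For the same sample, the 99% interval comes out wider than the 95% interval, as the abstract states.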

PubMed: 29268424
DOI: 10.21037/jtd.2017.09.14

Reasoning under uncertainty.

Evidence-based Mental Health Feb 2019

Authors: Colin Aitken, Dimitris Mavridis

INTRODUCTION

It is difficult to reason correctly when the information available is uncertain. Reasoning under uncertainty is also known as probabilistic reasoning.

METHODS

We discuss probabilistic reasoning in the context of a medical diagnosis or prognosis. The information available are symptoms for the diagnosis or diagnosis for the prognosis. We show how probabilities of events are updated in the light of new evidence (conditional probabilities/Bayes' theorem). A resolution is explained in which the support of the information for the diagnosis or prognosis is measured by the comparison of two probabilities, a statistic also known as the likelihood ratio.

RESULTS

The likelihood ratio is a continuous measure of support that is not subject to the discrete nature of statistical significance where a result is either classified as 'significant' or 'not significant'. It updates prior beliefs about diagnoses or prognoses in a coherent manner and enables proper consideration of successive pieces of information.

DISCUSSION

Probabilistic reasoning is not innate and relies on good education. Common mistakes include the 'prosecutor's fallacy' and the interpretation of relative measures without consideration of the actual risks of the outcome, for example, interpretation of a likelihood ratio without taking into account the prior odds.
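The updating step described in this abstract follows the odds form of Bayes' theorem: posterior odds equal prior odds multiplied by the likelihood ratio. A minimal sketch, with a hypothetical likelihood ratio and hypothetical prior probabilities, shows why a likelihood ratio should not be interpreted without the prior odds.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Update a prior probability with a likelihood ratio via the odds form of Bayes' theorem."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# The same hypothetical likelihood ratio of 10 applied to two different priors.
rare = posterior_probability(0.01, 10)
common = posterior_probability(0.50, 10)
print(round(rare, 3))    # 0.092
print(round(common, 3))  # 0.909
```

The same evidence leaves a rare diagnosis unlikely (about 9%) but makes an evenly balanced one very probable (about 91%).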

Topics: Clinical Decision-Making; Humans; Models, Theoretical; Thinking; Uncertainty

PubMed: 30679196
DOI: 10.1136/ebmental-2018-300074

Statistical significance and publication reporting bias in abstracts of reproductive medicine studies.

Human Reproduction (Oxford, England) Nov 2023

Authors: Qian Feng, Ben W Mol, John P A Ioannidis...

STUDY QUESTION

What were the frequency and temporal trends of reporting P-values and effect measures in the abstracts of reproductive medicine studies in 1990-2022, how were reported P-values distributed, and what proportion of articles that present with statistical inference reported statistically significant results, i.e. 'positive' results?

SUMMARY ANSWER

Around one in six abstracts reported P-values alone without effect measures, while the prevalence of effect measures, whether reported alone or accompanied by P-values, has been increasing, especially in meta-analyses and randomized controlled trials (RCTs); the reported P-values were frequently observed around certain cut-off values, notably at 0.001, 0.01, or 0.05, and among abstracts present with statistical inference (i.e. P-value, CIs, or significant terms), a large majority (77%) reported at least one statistically significant finding.

WHAT IS KNOWN ALREADY

Publishing or reporting only results that show a 'positive' finding causes bias in evaluating interventions and risk factors and may incur adverse health outcomes for patients. Despite efforts to minimize publication reporting bias in medical research, it remains unclear whether the magnitude and patterns of the bias have changed over time.

STUDY DESIGN, SIZE, DURATION

We studied abstracts of reproductive medicine studies from 1990 to 2022. The reproductive medicine studies were published in 23 first-quartile journals under the category of Obstetrics and Gynaecology and Reproductive Biology in Journal Citation Reports and 5 high-impact general medical journals (The Journal of the American Medical Association, The Lancet, The BMJ, The New England Journal of Medicine, and PLoS Medicine). Articles without abstracts, animal studies, and non-research articles, such as case reports or guidelines, were excluded.

PARTICIPANTS/MATERIALS, SETTING, METHODS

Automated text-mining was used to extract three types of statistical significance reporting, including P-values, CIs, and text description. Meanwhile, abstracts were text-mined for the presence of effect size metrics and Bayes factors. Five hundred abstracts were randomly selected and manually checked for the accuracy of automatic text extraction. The extracted statistical significance information was then analysed for temporal trends and distribution in general as well as in subgroups of study designs and journals.
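The abstract does not describe the extraction rules the authors used, so the following is only a rough sketch of what mining P-values from abstract text might look like. The regular expression, the function name and the example sentence are all assumptions made for illustration; a real pipeline would need to handle many more notations (for example "P ≤ .05", "p-values of 0.03 and 0.04", or scientific notation).

```python
import re

# Hypothetical pattern: "P", an optional "-value", a comparison operator, a decimal number.
P_VALUE_PATTERN = re.compile(
    r"\bP\s*(?:-\s*value)?\s*([<>=])\s*(0?\.\d+)",
    re.IGNORECASE,
)

def extract_p_values(abstract):
    """Return (operator, value) pairs for each P-value found in the text."""
    return [(op, float(num)) for op, num in P_VALUE_PATTERN.findall(abstract)]

# Made-up sentence, for illustration only.
text = "The rate was lower (P = 0.03) and the OR was 0.75 (95% CI 0.46 to 1.21, P<0.001)."
found = extract_p_values(text)
print(found)  # [('=', 0.03), ('<', 0.001)]
```

Checking a random sample of abstracts by hand, as the authors did with five hundred of them, is one way to estimate how often a rule like this misses or misreads a value.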

MAIN RESULTS AND THE ROLE OF CHANCE

A total of 24 907 eligible reproductive medicine articles were identified from 170 739 screened articles published in 28 journals. The proportion of abstracts not reporting any statistical significance inference halved from 81% (95% CI, 76-84%) in 1990 to 40% (95% CI, 38-44%) in 2021, while reporting P-values alone remained relatively stable, at 15% (95% CI, 12-18%) in 1990 and 19% (95% CI, 16-22%) in 2021. By contrast, the proportion of abstracts reporting effect measures alone increased considerably from 4.1% (95% CI, 2.6-6.3%) in 1990 to 26% (95% CI, 23-29%) in 2021. Similarly, the proportion of abstracts reporting effect measures together with P-values showed substantial growth from 0.8% (95% CI, 0.3-2.2%) to 14% (95% CI, 12-17%) during the same timeframe. Of 30 182 statistical significance inferences, 56% (n = 17 077) conveyed statistical inferences via P-values alone, 30% (n = 8945) via text description alone such as significant or non-significant, 9.3% (n = 2820) via CIs alone, and 4.7% (n = 1340) via both CI and P-values. The reported P-values (n = 18 417), including both a continuum of P-values and dichotomized P-values, were frequently observed around common cut-off values such as 0.001 (20%), 0.05 (16%), and 0.01 (10%). Of the 13 200 reproductive medicine abstracts containing at least one statistical inference, 77% of abstracts made at least one statistically significant statement. Among articles that reported statistical inference, a decline in the proportion of making at least one statistically significant inference was only seen in RCTs, dropping from 71% (95% CI, 48-88%) in 1990 to 59% (95% CI, 42-73%) in 2021, whereas the proportion in the rest of study types remained almost constant over the years. Of abstracts that reported P-value, 87% (95% CI, 86-88%) reported at least one statistically significant P-value; it was 92% (95% CI, 82-97%) in 1990 and reached its peak at 97% (95% CI, 93-99%) in 2001 before declining to 81% (95% CI, 76-85%) in 2021.

LIMITATIONS, REASONS FOR CAUTION

First, our analysis focused solely on reporting patterns in abstracts but not full-text papers; however, in principle, abstracts should include condensed impartial information and avoid selective reporting. Second, while we attempted to identify all types of statistical significance reporting, our text mining was not flawless. However, the manual assessment showed that inaccuracies were not frequent.

WIDER IMPLICATIONS OF THE FINDINGS

There is a welcome trend that effect measures are increasingly reported in the abstracts of reproductive medicine studies, specifically in RCTs and meta-analyses. Publication reporting bias remains a major concern. Inflated estimates of interventions and risk factors could harm decisions built upon biased evidence, including clinical recommendations and planning of future research.

STUDY FUNDING/COMPETING INTEREST(S)

No funding was received for this study. B.W.M. is supported by an NHMRC Investigator grant (GNT1176437); B.W.M. reports research grants and travel support from Merck and consultancy from Merck and ObsEva. W.L. is supported by an NHMRC Investigator Grant (GNT2016729). Q.F. reports receiving a PhD scholarship from Merck. The other author has no conflict of interest to declare.

TRIAL REGISTRATION NUMBER

N/A.

PubMed: 38015794
DOI: 10.1093/humrep/dead248

Problems and alternatives of testing significance using null hypothesis and P-value in food research.

Food Science and Biotechnology May 2023

Review

Authors: Won-Seok Choi

A testing method to identify statistically significant differences by comparing the significance level and the probability value based on the Null Hypothesis Significance Test (NHST) has been used in food research. However, problems with this testing method have been discussed. Several alternatives to the NHST and the P-value test methods have been proposed including lowering the P-value threshold and using confidence interval (CI), effect size, and Bayesian statistics. The CI estimates the extent of the effect or difference and determines the presence or absence of statistical significance. The effect size index determines the degree of effect difference and allows for the comparison of various statistical results. Bayesian statistics enable predictions to be made even when only a small amount of data is available. In conclusion, CI, effect size, and Bayesian statistics can complement or replace traditional statistical tests in food research by replacing the use of NHST and P-value.

PubMed: 37363053
DOI: 10.1007/s10068-023-01348-4

Health significance and statistical uncertainty. The value of P-value.

La Medicina Del Lavoro Oct 2017

Authors: Dario Consonni, Pier Alberto Bertazzi

BACKGROUND

The P-value is widely used as a summary statistic of scientific results. Unfortunately, there is a widespread tendency to dichotomize its value into "P<0.05" (defined as "statistically significant") and "P>0.05" ("statistically not significant"), with the former implying a "positive" result and the latter a "negative" one.

OBJECTIVE

To show the unsuitability of such an approach when evaluating the effects of environmental and occupational risk factors.

METHODS

We provide examples of distorted use of P-value and of the negative consequences for science and public health of such a black-and-white vision.

RESULTS

The rigid interpretation of P-value as a dichotomy favors the confusion between health relevance and statistical significance, discourages thoughtful thinking, and distorts attention from what really matters, the health significance.

DISCUSSION

A much better way to express and communicate scientific results involves reporting effect estimates (e.g., risks, risk ratios or risk differences) and their confidence intervals (CI), which summarize and convey both health significance and statistical uncertainty. Unfortunately, many researchers do not usually consider the whole CI but only examine whether it includes the null value, thereby degrading this procedure to the same P-value dichotomy (statistical significance or not).
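To make the point about reading the whole interval concrete, here is a small sketch that computes a risk ratio with a CI using the common log-transformation method. The counts are hypothetical, the function name is invented for this example, and the method is a standard textbook approximation rather than anything specified in the abstract.

```python
import math
from statistics import NormalDist

def risk_ratio_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, confidence=0.95):
    """Risk ratio with a CI computed on the log scale (large-sample approximation)."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_unexposed - 1 / n_unexposed
    )
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)

# Hypothetical cohort: 30 of 200 exposed and 20 of 200 unexposed workers develop the disease.
rr, lower, upper = risk_ratio_ci(30, 200, 20, 200)
print(round(rr, 2), round(lower, 2), round(upper, 2))  # 1.5 0.88 2.55
```

A dichotomous reading would call this result "not significant" because the interval includes 1, yet the same interval is compatible with anything from a slight reduction to a more than twofold increase in risk, which is the information the abstract argues should be reported.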

CONCLUSIONS

In reporting statistical results of scientific research, present effect estimates with their confidence intervals and do not qualify the P-value as "significant" or "not significant".

Topics: Humans; Occupational Health; Uncertainty

PubMed: 29084124
DOI: 10.23749/mdl.v108i5.6603