Indian Journal of Psychological Medicine, 2019 (Review)
The calculation of a P value in research, and especially the use of a threshold to declare the statistical significance of the value, have both been challenged in recent years. There are at least two important reasons for this challenge: research data contain much more meaning than is summarized in a P value and its statistical significance, and these two concepts are frequently misunderstood and consequently inappropriately interpreted. This article considers why 5% may be set as a reasonable cut-off for statistical significance, explains the correct interpretation of P < 0.05 and other values of P, examines arguments for and against the concept of statistical significance, and suggests other and better ways for analyzing data and for presenting, interpreting, and discussing the results.
PubMed: 31142921
DOI: 10.4103/IJPSYM.IJPSYM_193_19
American Journal of Epidemiology, Feb 2021
Measures of information and surprise, such as the Shannon information value (S value), quantify the signal present in a stream of noisy data. We illustrate the use of such information measures in the context of interpreting P values as compatibility indices. S values help communicate the limited information supplied by conventional statistics and cast a critical light on cutoffs used to judge and construct those statistics. Misinterpretations of statistics may be reduced by interpreting P values and interval estimates using compatibility concepts and S values instead of "significance" and "confidence."
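The S value described above is simply the negative base-2 logarithm of the P value, so it measures compatibility in bits of information. A minimal sketch (the function name is illustrative):

```python
import math

def s_value(p):
    """Shannon information (surprisal) of a P value, in bits."""
    return -math.log2(p)

# P = 0.05 carries about 4.3 bits of information against the test
# hypothesis: roughly as surprising as 4 heads in a row from a fair coin.
print(round(s_value(0.05), 2))   # 4.32
print(round(s_value(0.005), 2))  # 7.64
```

An S value of k bits corresponds to the surprise of seeing k heads in k tosses of a fair coin, which is what makes it an intuitive replacement for a raw P value.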
Topics: Confidence Intervals; Data Interpretation, Statistical; Epidemiologic Methods; Humans; Uncertainty
PubMed: 32648906
DOI: 10.1093/aje/kwaa136
Global Epidemiology, Dec 2022
Misinterpretations of P-values and 95% confidence intervals are ubiquitous in medical research. Specifically, the terms significance or confidence, extensively used in medical papers, ignore biases and violations of statistical assumptions and hence should be called overconfidence terms. In this paper, we present the compatibility view of P-values and confidence intervals; the P-value is interpreted as an index of compatibility between data and the model, including the test hypothesis and background assumptions, whereas a confidence interval is interpreted as the range of parameter values that are compatible with the data under background assumptions. We also suggest the use of a surprisal measure, often referred to as the S-value, a novel metric that transforms the P-value, for gauging compatibility in terms of an intuitive experiment of coin tossing.
PubMed: 37637018
DOI: 10.1016/j.gloepi.2022.100085
The Journal of Arthroplasty, Oct 2022 (Review)
The results of statistical tests in orthopedic studies are typically reported using P-values. If a P-value is smaller than the pre-determined level of significance (eg, P < .05), the null hypothesis is rejected in support of the alternative. This automaticity in interpreting statistical results without consideration of the power of the study has been denounced over the years by statisticians, since it can potentially lead to misinterpretation of the study conclusions. In this paper, we review fundamental misconceptions and misinterpretations of P-values and power, along with their connection with confidence intervals, and we provide guidelines to orthopedic researchers for evaluating and reporting study results. We provide real-world orthopedic examples to illustrate the main concepts. Please visit https://youtu.be/bdPU4luYmF0 for videos that explain the highlights of the paper in practical terms.
Topics: Biomedical Research; Humans; Orthopedics; Statistics as Topic
PubMed: 36162927
DOI: 10.1016/j.arth.2022.05.026
Journal of Thoracic Disease, Oct 2017
Biomedical research is seldom done with entire populations but rather with samples drawn from a population. Although we work with samples, our goal is to describe and draw inferences regarding the underlying population. It is possible to use a sample statistic and estimates of error in the sample to get a fair idea of the population parameter, not as a single value, but as a range of values. This range is the confidence interval (CI), which is estimated on the basis of a desired confidence level. Calculation of the CI of a sample statistic takes the general form: CI = Point estimate ± Margin of error, where the margin of error is given by the product of a critical value (z) derived from the standard normal curve and the standard error of the point estimate. Calculation of the standard error varies depending on whether the sample statistic of interest is a mean, proportion, odds ratio (OR), and so on. The factors affecting the width of the CI include the desired confidence level, the sample size, and the variability in the sample. Although the 95% CI is most often used in biomedical research, a CI can be calculated for any level of confidence. A 99% CI will be wider than a 95% CI for the same sample. Conflict between clinical importance and statistical significance is an important issue in biomedical research. Clinical importance is best inferred by looking at the effect size, that is, how much actual change or difference there is. However, statistical significance in terms of P only suggests whether there is any difference in probability terms. Use of the CI supplements the P value by providing an estimate of actual clinical effect. Of late, clinical trials are being designed specifically as superiority, non-inferiority, or equivalence studies. The conclusions from these alternative trial designs are based on CI values rather than the P value from intergroup comparison.
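The general form CI = point estimate ± margin of error can be sketched for a sample mean as follows. This uses the normal critical value z = 1.96 for 95% confidence, as in the abstract; for small samples a t critical value would be more appropriate, and the function name and sample data are illustrative:

```python
import math

def mean_ci(xs, z=1.96):
    """CI = point estimate +/- margin of error, here for a sample mean.
    z = 1.96 gives a 95% CI from the standard normal curve; z = 2.576
    gives a 99% CI, which is wider for the same sample."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample SD
    se = sd / math.sqrt(n)                                      # standard error
    return mean - z * se, mean + z * se

# Hypothetical sample of 5 measurements:
lo, hi = mean_ci([1, 2, 3, 4, 5])
print(round(lo, 2), round(hi, 2))  # 1.61 4.39
```

For a proportion the same form applies, with the standard error computed as sqrt(p(1-p)/n) instead of sd/sqrt(n).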
PubMed: 29268424
DOI: 10.21037/jtd.2017.09.14
Patterns (New York, N.Y.), Dec 2023 (Review)
Since the 18th century, the p value has been an important part of hypothesis-based scientific investigation. As statistical and data science engines accelerate, questions emerge: to what extent are scientific discoveries based on p values reliable and reproducible? Should one adjust the significance level or find alternatives for the p value? Inspired by these questions and everlasting attempts to address them, here, we provide a systematic examination of the p value from its roles and merits to its misuses and misinterpretations. For the latter, we summarize modest recommendations to handle them. In parallel, we present the Bayesian alternatives for seeking evidence and discuss the pooling of p values from multiple studies and datasets. Overall, we argue that the p value and hypothesis testing form a useful probabilistic decision-making mechanism, facilitating causal inference, feature selection, and predictive modeling, but that the interpretation of the p value must be contextual, considering the scientific question, experimental design, and statistical principles.
PubMed: 38106615
DOI: 10.1016/j.patter.2023.100878
BMC Medical Research Methodology, Jun 2020
BACKGROUND
In medical research and practice, the p-value is arguably the most often used statistic and yet it is widely misconstrued as the probability of the type I error, which comes with serious consequences. This misunderstanding can greatly affect the reproducibility in research, treatment selection in medical practice, and model specification in empirical analyses. By using plain language and concrete examples, this paper is intended to elucidate the p-value confusion from its root, to explicate the difference between significance and hypothesis testing, to illuminate the consequences of the confusion, and to present a viable alternative to the conventional p-value.
MAIN TEXT
The confusion with p-values has plagued the research community and medical practitioners for decades. However, efforts to clarify it have been largely futile, in part, because intuitive yet mathematically rigorous educational materials are scarce. Additionally, the lack of a practical alternative to the p-value for guarding against randomness also plays a role. The p-value confusion is rooted in the misconception of significance and hypothesis testing. Most, including many statisticians, are unaware that p-values and significance testing formed by Fisher are incomparable to the hypothesis testing paradigm created by Neyman and Pearson. And most otherwise great statistics textbooks tend to cobble the two paradigms together and make no effort to elucidate the subtle but fundamental differences between them. The p-value is a practical tool gauging the "strength of evidence" against the null hypothesis. It informs investigators that a p-value of 0.001, for example, is stronger than 0.05. However, p-values produced in significance testing are not the probabilities of type I errors as commonly misconceived. For a p-value of 0.05, the chance a treatment does not work is not 5%; rather, it is at least 28.9%.
CONCLUSIONS
A long-overdue effort to understand p-values correctly is much needed. However, in medical research and practice, just banning significance testing and accepting uncertainty are not enough. Researchers, clinicians, and patients alike need to know the probability a treatment will or will not work. Thus, the calibrated p-values (the probability that a treatment does not work) should be reported in research papers.
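The abstract quotes a calibrated value of at least 28.9% for P = 0.05 without giving the formula. That figure matches the widely used Sellke-Berger bound, which limits the Bayes factor for the null to -e·p·ln(p); a sketch under that assumption, also assuming equal prior odds (function name illustrative):

```python
import math

def false_positive_lower_bound(p):
    """Lower bound on the probability that the null hypothesis is true,
    given an observed P value p < 1/e and equal prior odds, using the
    Sellke-Berger -e*p*ln(p) bound on the Bayes factor for the null."""
    bound_bf = -math.e * p * math.log(p)  # bound on the Bayes factor
    return bound_bf / (1 + bound_bf)

# For P = 0.05 the bound is about 0.289: the chance the treatment does
# not work is at least 28.9%, the figure quoted in the abstract.
print(round(false_positive_lower_bound(0.05), 3))  # 0.289
```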
Topics: Biomedical Research; Humans; Probability; Reproducibility of Results; Research Design; Research Personnel
PubMed: 32580765
DOI: 10.1186/s12874-020-01051-6
Journal of Evidence-based Medicine, Nov 2018 (Review)
P-values are often calculated when testing hypotheses in quantitative settings, and low P-values are typically used as evidential measures to support research findings in published medical research. This article reviews old and new arguments questioning the evidential value of P-values. Critiques of the P-value include that it is confounded, fickle, and overestimates the evidence against the null. P-values may turn out falsely low in studies due to random or systematic errors. Even correctly low P-values do not logically provide support to any hypothesis. Recent studies show low replication rates of significant findings, questioning the dependability of published low P-values. P-values are poor indicators in support of scientific propositions; they must be interpreted in light of a thorough understanding of the study's question, design, and conduct. Null hypothesis significance testing will likely remain an important method in quantitative analysis but may be complemented with other statistical techniques that more straightforwardly address the size and precision of an effect or the plausibility that a hypothesis is true.
Topics: Biomedical Research; Evidence-Based Medicine; Humans; Research Design; Statistics as Topic
PubMed: 30398018
DOI: 10.1111/jebm.12319
Journal of Investigative Medicine, Oct 2016 (Review)
A threshold probability value of 'p≤0.05' is commonly used in clinical investigations to indicate statistical significance. To allow clinicians to better understand evidence generated by research studies, this review defines the p value, summarizes the historical origins of the p value approach to hypothesis testing, describes various applications of p≤0.05 in the context of clinical research and discusses the emergence of p≤5×10⁻⁸ and other values as thresholds for genomic statistical analyses. Corresponding issues include a conceptual approach of evaluating whether data do not conform to a null hypothesis (ie, no exposure-outcome association). Importantly, and in the historical context of when p≤0.05 was first proposed, the 1-in-20 chance of a false-positive inference (ie, falsely concluding the existence of an exposure-outcome association) was offered only as a suggestion. In current usage, however, p≤0.05 is often misunderstood as a rigid threshold, sometimes with a misguided 'win' (p≤0.05) or 'lose' (p>0.05) approach. Also, in contemporary genomic studies, a threshold of p≤10⁻⁸ has been endorsed as a boundary for statistical significance when analyzing numerous genetic comparisons for each participant. A value of p≤0.05, or other thresholds, should not be employed reflexively to determine whether a clinical research investigation is trustworthy from a scientific perspective. Rather, and in parallel with conceptual issues of validity and generalizability, quantitative results should be interpreted using a combined assessment of strength of association, p values, CIs, and sample size.
Topics: Confidence Intervals; Genomics; Probability; Sample Size; Superstitions
PubMed: 27489256
DOI: 10.1136/jim-2016-000206
Perspectives on Medical Education, 2024
The use of the p-value in quantitative research, particularly its threshold of "P < 0.05" for determining "statistical significance," has long been a cornerstone of statistical analysis in research. However, this standard has been increasingly scrutinized for its potential to mislead findings, especially when the practical significance, the number of comparisons, or the suitability of statistical tests are not properly considered. In response to controversy around use of p-values, the American Statistical Association published a statement in 2016 that challenged the research community to abandon the term "statistically significant". This stance has been echoed by leading scientific journals to urge a significant reduction or complete elimination in the reliance on p-values when reporting results. To provide guidance to researchers in health professions education, this paper provides a succinct overview of the ongoing debate regarding the use of p-values and the definition of p-values. It reflects on the controversy by highlighting the common pitfalls associated with p-value interpretation and usage, such as misinterpretation, overemphasis, and false dichotomization between "significant" and "non-significant" results. This paper also outlines specific recommendations for the effective use of p-values in statistical reporting including the importance of reporting effect sizes, confidence intervals, the null hypothesis, and conducting sensitivity analyses for appropriate interpretation. These considerations aim to guide researchers toward a more nuanced and informative use of p-values.
Topics: Humans; Data Interpretation, Statistical; Research Design
PubMed: 38680196
DOI: 10.5334/pme.1324