-
Indian Journal of Psychological Medicine 2019 (Review)
The calculation of a P value in research and especially the use of a threshold to declare the statistical significance of the P value have both been challenged in recent years. There are at least two important reasons for this challenge: research data contain much more meaning than is summarized in a P value and its statistical significance, and these two concepts are frequently misunderstood and consequently inappropriately interpreted. This article considers why 5% may be set as a reasonable cut-off for statistical significance, explains the correct interpretation of P < 0.05 and other values of P, examines arguments for and against the concept of statistical significance, and suggests other and better ways for analyzing data and for presenting, interpreting, and discussing the results.
PubMed: 31142921
DOI: 10.4103/IJPSYM.IJPSYM_193_19 -
Global Epidemiology Dec 2022
Misinterpretations of P-values and 95% confidence intervals are ubiquitous in medical research. Specifically, the terms significance or confidence, extensively used in medical papers, ignore biases and violations of statistical assumptions and hence should be called overconfidence terms. In this paper, we present the compatibility view of P-values and confidence intervals; the P-value is interpreted as an index of compatibility between data and the model, including the test hypothesis and background assumptions, whereas a confidence interval is interpreted as the range of parameter values that are compatible with the data under background assumptions. We also suggest the use of a surprisal measure, often referred to as the S-value, a novel metric that transforms the P-value, for gauging compatibility in terms of an intuitive experiment of coin tossing.
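The abstract above does not spell out the transformation. A minimal sketch, assuming the usual definition of the S-value as the binary surprisal S = -log2(p), is given here; the function name s_value is hypothetical and chosen only for illustration. Under this definition, S can be read as the number of consecutive heads from a fair coin that would be about as surprising as the observed P-value.

```python
import math


def s_value(p: float) -> float:
    """Return the surprisal of a P-value in bits, assuming S = -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


# Illustrative P-values only; they are not taken from any study listed here.
for p in (0.25, 0.05, 0.005):
    print(f"p = {p}: S = {s_value(p):.2f} bits")
```

With this definition, p = 0.05 carries a little over 4 bits, roughly the surprise of seeing four heads in a row from a coin assumed to be fair.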
PubMed: 37637018
DOI: 10.1016/j.gloepi.2022.100085 -
American Journal of Epidemiology Feb 2021
Measures of information and surprise, such as the Shannon information value (S value), quantify the signal present in a stream of noisy data. We illustrate the use of such information measures in the context of interpreting P values as compatibility indices. S values help communicate the limited information supplied by conventional statistics and cast a critical light on cutoffs used to judge and construct those statistics. Misinterpretations of statistics may be reduced by interpreting P values and interval estimates using compatibility concepts and S values instead of "significance" and "confidence."
Topics: Confidence Intervals; Data Interpretation, Statistical; Epidemiologic Methods; Humans; Uncertainty
PubMed: 32648906
DOI: 10.1093/aje/kwaa136 -
Patterns (New York, N.Y.) Dec 2023 (Review)
Since the 18th century, the p value has been an important part of hypothesis-based scientific investigation. As statistical and data science engines accelerate, questions emerge: to what extent are scientific discoveries based on p values reliable and reproducible? Should one adjust the significance level or find alternatives for the p value? Inspired by these questions and everlasting attempts to address them, here, we provide a systematic examination of the p value from its roles and merits to its misuses and misinterpretations. For the latter, we summarize modest recommendations to handle them. In parallel, we present the Bayesian alternatives for seeking evidence and discuss the pooling of p values from multiple studies and datasets. Overall, we argue that the p value and hypothesis testing form a useful probabilistic decision-making mechanism, facilitating causal inference, feature selection, and predictive modeling, but that the interpretation of the p value must be contextual, considering the scientific question, experimental design, and statistical principles.
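The abstract mentions pooling p values across studies without naming a method. As one possible illustration, the sketch below uses Fisher's combined probability test, which is an assumption made here and not necessarily the approach taken in the review. The helper names are hypothetical, and the method assumes the pooled p values come from independent tests.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_pooled_p(p_values: list[float]) -> float:
    """Pool independent p values with Fisher's method: X = -2 * sum(ln p)."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p value, each in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))


# Illustrative inputs only: two independent studies, each with p = 0.05.
print(f"pooled p = {fisher_pooled_p([0.05, 0.05]):.4f}")
```

A single p value is returned unchanged by this procedure, which is a quick sanity check on the implementation.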
PubMed: 38106615
DOI: 10.1016/j.patter.2023.100878 -
The Journal of Arthroplasty Oct 2022 (Review)
The results of statistical tests in orthopedic studies are typically reported using P-values. If a P-value is smaller than the pre-determined level of significance (eg, < .05), the null hypothesis is rejected in support of the alternative. This automaticity in interpreting statistical results without consideration of the power of the study has been denounced over the years by statisticians, since it can potentially lead to misinterpretation of the study conclusions. In this paper, we review fundamental misconceptions and misinterpretations of P-values and power, along with their connection with confidence intervals, and we provide guidelines to orthopedic researchers for evaluating and reporting study results. We provide real-world orthopedic examples to illustrate the main concepts. Please visit the following https://youtu.be/bdPU4luYmF0 for videos that explain the highlights of the paper in practical terms.
Topics: Biomedical Research; Humans; Orthopedics; Statistics as Topic
PubMed: 36162927
DOI: 10.1016/j.arth.2022.05.026 -
European Journal of Nutrition Dec 2021
This article discusses the variability and randomness of p values, the most widely used currency of evidence in nutritional and health studies. One implication of this, the importance of always testing interaction terms when subgroups are examined and presented separately, is also discussed.
PubMed: 33585951
DOI: 10.1007/s00394-021-02498-z -
The American Statistician 2011
P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-a-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels .05, .01, and .001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates.
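The sample-to-sample variability described above can be made concrete with a small simulation. This is only a sketch under stated assumptions, not the bootstrap procedure of the paper: it assumes a one-sample two-sided z-test with known standard deviation 1, a true mean of 0.5, 32 observations per study, and 1000 replicate studies, all of which are hypothetical choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def two_sided_z_p(sample: list[float], sigma: float = 1.0) -> float:
    """Two-sided p value for H0: mean = 0 with known standard deviation."""
    z = mean(sample) * math.sqrt(len(sample)) / sigma
    return 2.0 * NormalDist().cdf(-abs(z))


def replicate_p_values(n_studies: int = 1000, n: int = 32,
                       true_mean: float = 0.5, seed: int = 0) -> list[float]:
    """Simulate identical studies and return one p value per study."""
    rng = random.Random(seed)
    return [
        two_sided_z_p([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(n_studies)
    ]


p_values = replicate_p_values()
log_p = [math.log10(p) for p in p_values]
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)

print(f"smallest p: {min(p_values):.2e}, largest p: {max(p_values):.2f}")
print(f"standard deviation of log10(p): {stdev(log_p):.2f}")
print(f"share with p < 0.05: {share_significant:.2f}")
```

Even though every simulated study samples the same population with the same design, the replicate p values span several orders of magnitude, which is the kind of spread the abstract refers to.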
PubMed: 22690019
DOI: 10.1198/tas.2011.10129 -
Journal of Evidence-based Medicine Nov 2018 (Review)
P-values are often calculated when testing hypotheses in quantitative settings, and low P-values are typically used as evidential measures to support research findings in published medical research. This article reviews old and new arguments questioning the evidential value of P-values. Critiques of the P-value include that it is confounded, fickle, and overestimates the evidence against the null. P-values may turn out falsely low in studies due to random or systematic errors. Even correctly low P-values do not logically provide support to any hypothesis. Recent studies show low replication rates of significant findings, questioning the dependability of published low P-values. P-values are poor indicators in support of scientific propositions. P-values must be inferred by a thorough understanding of the study's question, design, and conduct. Null hypothesis significance testing will likely remain an important method in quantitative analysis but may be complemented with other statistical techniques that more straightforwardly address the size and precision of an effect or the plausibility that a hypothesis is true.
Topics: Biomedical Research; Evidence-Based Medicine; Humans; Research Design; Statistics as Topic
PubMed: 30398018
DOI: 10.1111/jebm.12319 -
BMC Medical Research Methodology Jun 2020
BACKGROUND
In medical research and practice, the p-value is arguably the most often used statistic and yet it is widely misconstrued as the probability of the type I error, which comes with serious consequences. This misunderstanding can greatly affect the reproducibility in research, treatment selection in medical practice, and model specification in empirical analyses. By using plain language and concrete examples, this paper is intended to elucidate the p-value confusion from its root, to explicate the difference between significance and hypothesis testing, to illuminate the consequences of the confusion, and to present a viable alternative to the conventional p-value.
MAIN TEXT
The confusion with p-values has plagued the research community and medical practitioners for decades. However, efforts to clarify it have been largely futile, in part, because intuitive yet mathematically rigorous educational materials are scarce. Additionally, the lack of a practical alternative to the p-value for guarding against randomness also plays a role. The p-value confusion is rooted in the misconception of significance and hypothesis testing. Most, including many statisticians, are unaware that p-values and significance testing formed by Fisher are incomparable to the hypothesis testing paradigm created by Neyman and Pearson. And most otherwise great statistics textbooks tend to cobble the two paradigms together and make no effort to elucidate the subtle but fundamental differences between them. The p-value is a practical tool gauging the "strength of evidence" against the null hypothesis. It informs investigators that a p-value of 0.001, for example, is stronger than 0.05. However, p-values produced in significance testing are not the probabilities of type I errors as commonly misconceived. For a p-value of 0.05, the chance a treatment does not work is not 5%; rather, it is at least 28.9%.
CONCLUSIONS
A long-overdue effort to understand p-values correctly is much needed. However, in medical research and practice, just banning significance testing and accepting uncertainty are not enough. Researchers, clinicians, and patients alike need to know the probability a treatment will or will not work. Thus, the calibrated p-values (the probability that a treatment does not work) should be reported in research papers.
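The abstract does not state how the calibrated p-value is computed. The sketch below assumes the Sellke-Bayarri-Berger calibration with equal prior odds for the null and the alternative, where the Bayes factor in favour of the null is bounded below by -e * p * ln(p) for p < 1/e; this assumption reproduces the "at least 28.9%" figure quoted for p = 0.05. The function name calibrated_p is hypothetical.

```python
import math


def calibrated_p(p: float) -> float:
    """Lower bound on the posterior probability of the null hypothesis.

    Assumes the Sellke-Bayarri-Berger bound on the Bayes factor,
    B(p) = -e * p * ln(p), valid for 0 < p < 1/e, and prior odds of 1.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound is only valid for 0 < p < 1/e")
    bayes_factor_bound = -math.e * p * math.log(p)
    return bayes_factor_bound / (1.0 + bayes_factor_bound)


# Illustrative P-values only.
for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: calibrated value >= {calibrated_p(p):.3f}")
```

Under these assumptions, p = 0.05 maps to about 0.289, far from the 5% that the p-value is often mistaken to represent.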
Topics: Biomedical Research; Humans; Probability; Reproducibility of Results; Research Design; Research Personnel
PubMed: 32580765
DOI: 10.1186/s12874-020-01051-6 -
Journal of Investigative Medicine : the... Oct 2016 (Review)
A threshold probability value of 'p≤0.05' is commonly used in clinical investigations to indicate statistical significance. To allow clinicians to better understand evidence generated by research studies, this review defines the p value, summarizes the historical origins of the p value approach to hypothesis testing, describes various applications of p≤0.05 in the context of clinical research and discusses the emergence of p≤5×10⁻⁸ and other values as thresholds for genomic statistical analyses. Corresponding issues include a conceptual approach of evaluating whether data do not conform to a null hypothesis (ie, no exposure-outcome association). Importantly, and in the historical context of when p≤0.05 was first proposed, the 1-in-20 chance of a false-positive inference (ie, falsely concluding the existence of an exposure-outcome association) was offered only as a suggestion. In current usage, however, p≤0.05 is often misunderstood as a rigid threshold, sometimes with a misguided 'win' (p≤0.05) or 'lose' (p>0.05) approach. Also, in contemporary genomic studies, a threshold of p≤10⁻⁸ has been endorsed as a boundary for statistical significance when analyzing numerous genetic comparisons for each participant. A value of p≤0.05, or other thresholds, should not be employed reflexively to determine whether a clinical research investigation is trustworthy from a scientific perspective. Rather, and in parallel with conceptual issues of validity and generalizability, quantitative results should be interpreted using a combined assessment of strength of association, p values, CIs, and sample size.
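The abstract does not explain where a genomic threshold such as 5×10⁻⁸ comes from. One common rationale, assumed here and not stated in the source, is a Bonferroni correction of the conventional 0.05 level for roughly one million independent comparisons. The function name and the count of tests below are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise
    false-positive rate at or below alpha across n_tests comparisons."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed number of independent genetic comparisons (illustrative only).
n_comparisons = 1_000_000
print(f"per-test threshold: {bonferroni_threshold(0.05, n_comparisons):.1e}")
```

The same arithmetic shows why a fixed 0.05 cut-off is unsuitable when many comparisons are made per participant: the more tests, the smaller the per-test threshold needed to hold the overall false-positive rate.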
Topics: Confidence Intervals; Genomics; Probability; Sample Size; Superstitions
PubMed: 27489256
DOI: 10.1136/jim-2016-000206