-
American Journal of Epidemiology, Feb 2021
Measures of information and surprise, such as the Shannon information value (S value), quantify the signal present in a stream of noisy data. We illustrate the use of such information measures in the context of interpreting P values as compatibility indices. S values help communicate the limited information supplied by conventional statistics and cast a critical light on cutoffs used to judge and construct those statistics. Misinterpretations of statistics may be reduced by interpreting P values and interval estimates using compatibility concepts and S values instead of "significance" and "confidence."
Topics: Confidence Intervals; Data Interpretation, Statistical; Epidemiologic Methods; Humans; Uncertainty
PubMed: 32648906
DOI: 10.1093/aje/kwaa136
-
Global Epidemiology, Dec 2022
Misinterpretations of P-values and 95% confidence intervals are ubiquitous in medical research. Specifically, the terms significance or confidence, extensively used in medical papers, ignore biases and violations of statistical assumptions and hence should be called overconfidence terms. In this paper, we present the compatibility view of P-values and confidence intervals; the P-value is interpreted as an index of compatibility between data and the model, including the test hypothesis and background assumptions, whereas a confidence interval is interpreted as the range of parameter values that are compatible with the data under background assumptions. We also suggest the use of a surprisal measure, often referred to as the S-value, a novel metric that transforms the P-value, for gauging compatibility in terms of an intuitive experiment of coin tossing.
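The S-value transformation mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the usual definition s = -log2(p); the function name `s_value` is hypothetical and does not come from the paper.

```python
import math


def s_value(p: float) -> float:
    """Return the Shannon surprisal, in bits, of a P-value: s = -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return -math.log2(p)


if __name__ == "__main__":
    # Each S-value can be read as the number of consecutive heads from a
    # fair coin that would be about as surprising as the observed P-value.
    for p in (0.5, 0.05, 0.005):
        print(f"p = {p:<6} s = {s_value(p):.2f} bits")
```

Read this way, p = 0.05 carries a little over 4 bits of information against the tested model, roughly the surprise of seeing four heads in a row from a coin assumed to be fair.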
PubMed: 37637018
DOI: 10.1016/j.gloepi.2022.100085
-
European Journal of Endocrinology, Sep 2019
P values should not merely be used to categorize results into significant and non-significant. This practice disregards clinical relevance, confounds non-significance with no effect and underestimates the likelihood of false-positive results. Rather than serving as a dichotomizing instrument, P values and the confidence intervals around effect estimates can be used to put research findings in context, thereby taking genuine account of both clinical relevance and uncertainty.
Topics: Data Interpretation, Statistical; False Positive Reactions; Humans; Probability; Uncertainty
PubMed: 31330499
DOI: 10.1530/EJE-19-0531
-
Patterns (New York, N.Y.), Dec 2023
Review
Since the 18th century, the p value has been an important part of hypothesis-based scientific investigation. As statistical and data science engines accelerate, questions emerge: to what extent are scientific discoveries based on p values reliable and reproducible? Should one adjust the significance level or find alternatives for the p value? Inspired by these questions and everlasting attempts to address them, here, we provide a systematic examination of the p value from its roles and merits to its misuses and misinterpretations. For the latter, we summarize modest recommendations to handle them. In parallel, we present the Bayesian alternatives for seeking evidence and discuss the pooling of p values from multiple studies and datasets. Overall, we argue that the p value and hypothesis testing form a useful probabilistic decision-making mechanism, facilitating causal inference, feature selection, and predictive modeling, but that the interpretation of the p value must be contextual, considering the scientific question, experimental design, and statistical principles.
PubMed: 38106615
DOI: 10.1016/j.patter.2023.100878
-
Saudi Journal of Anaesthesia, 2023
Review
UNLABELLED
The derivation and interpretation of P values derived from inferential testing remain somewhat vague and ambiguous in the minds of some researchers/editors/reviewers/readers. The British polymath Fisher famously averred: "the value for which P = 0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant." This sometimes leads to an almost reductio ad absurdum mindset with an automatic discardment of studies with results where P > 0.05. It must be remembered that results may be negatively impacted by myriad factors that may be out of the researcher's control, such as small sample sizes, small effects, bias, and random error. This paper briefly reviews the historical events leading to the acceptance of P ≤ 0.05 for statistical significance, the rationale behind the null hypothesis (H0), the meaning of (and the potential for Type 1 and 2 Errors), α, β, the possibility of using non-0.05 cut-offs when studies are "trending toward statistical significance," and the importance of including confidence intervals (CIs) in results. P values are vital but must be tempered by judicious consideration of CI and study design. P is a probability spectrum and not simply a binary significant/non-significant statistical metric.
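Fisher's correspondence between a deviation of 1.96 standard deviations and P = 0.05 can be checked numerically. The sketch below assumes a standard normal test statistic and a two-sided test; the function name `two_sided_p` is illustrative only.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided tail probability for a standard normal deviate z.

    Uses the identity P(|Z| > z) = erfc(z / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # 1.96 standard deviations gives almost exactly 1 in 20;
    # "nearly 2" gives a slightly smaller tail probability.
    for z in (1.96, 2.0):
        print(f"z = {z:.2f}  two-sided p = {two_sided_p(z):.4f}")
```

Nothing in the calculation singles out 0.05 as special: the tail probability changes smoothly as the deviation grows, which is consistent with the abstract's point that P is a spectrum rather than a binary metric.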
MESH
95% confidence interval, biostatistics, P value.
PubMed: 37601497
DOI: 10.4103/sja.sja_223_23
-
Pediatric Emergency Care, Jul 2021
Review
UNLABELLED
Single-sided (1-tailed) and double-sided (2-tailed) probabilities are products of statistical tests that can be crucial to drawing accurate conclusions in scientific studies. In a review of articles published in issues of Pediatric Emergency Care from 2020, we identified 2 where single-sided versus double-sided probability issues potentially reversed a conclusion of study investigators. The purpose of this study is to describe single-sided versus double-sided probability issues found in Pediatric Emergency Care 2020 articles to increase awareness surrounding these issues.
METHODS
This study involved a review of all articles from 2020 issues of the Pediatric Emergency Care journal, examining whether P values from 0.05 to 0.10 inclusive were characterized as not significant when, in fact, they resulted from a double-sided test and arguably should have been halved to yield significant single-sided probabilities less than or equal to 0.05.
RESULTS
Two such studies were identified. In the first study, researchers concluded that their intervention resulted in "no statistically significant improvement," citing a P value of 0.08, but if a single-sided P value was used, it would have been 0.04 and the authors would have instead concluded that their intervention resulted in significant improvement. In the second study, researchers measured resuscitation times in pediatric and adult manikin simulations. They concluded no difference, citing a P value of 0.088, but if a single-sided P value was used, it would have been 0.044, and the authors would have instead concluded that the resuscitation times took longer in the pediatric simulation.
CONCLUSIONS
These articles demonstrate how single-sided versus double-sided probability issues can cause researchers to draw inaccurate conclusions. As such, we would urge that this be more rigorously evaluated when the P values are between 0.05 and 0.10.
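The halving described in the abstract above can be sketched as follows. This is an illustration under two assumptions that the abstract does not spell out: the test statistic has a symmetric null distribution, and the direction of the effect was specified before the data were seen. The function name `one_sided_p` is hypothetical.

```python
def one_sided_p(two_sided: float, in_predicted_direction: bool = True) -> float:
    """Convert a two-sided P value to a one-sided P value.

    Assumes a symmetric null distribution. If the observed effect lies in
    the pre-specified direction, the one-sided value is half the two-sided
    value; if it lies in the opposite direction, it is one minus that half.
    """
    if not 0.0 <= two_sided <= 1.0:
        raise ValueError("two_sided must lie in [0, 1]")
    half = two_sided / 2.0
    return half if in_predicted_direction else 1.0 - half


if __name__ == "__main__":
    # The two values cited in the RESULTS section of the abstract.
    for p in (0.08, 0.088):
        print(f"two-sided p = {p}  one-sided p = {one_sided_p(p):.3f}")
```

The second branch matters: halving is only justified when the observed effect lies in the hypothesized direction, so the conversion should not be applied after the fact simply to move a result below 0.05.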
Topics: Adult; Child; Computer Simulation; Humans; Probability; Resuscitation
PubMed: 34116549
DOI: 10.1097/PEC.0000000000002477
-
The Journal of Arthroplasty, Oct 2022
Review
The results of statistical tests in orthopedic studies are typically reported using P-values. If a P-value is smaller than the pre-determined level of significance (eg, < .05), the null hypothesis is rejected in support of the alternative. This automaticity in interpreting statistical results without consideration of the power of the study has been denounced over the years by statisticians, since it can potentially lead to misinterpretation of the study conclusions. In this paper, we review fundamental misconceptions and misinterpretations of P-values and power, along with their connection with confidence intervals, and we provide guidelines to orthopedic researchers for evaluating and reporting study results. We provide real-world orthopedic examples to illustrate the main concepts. Please visit the following link, https://youtu.be/bdPU4luYmF0, for videos that explain the highlights of the paper in practical terms.
Topics: Biomedical Research; Humans; Orthopedics; Statistics as Topic
PubMed: 36162927
DOI: 10.1016/j.arth.2022.05.026
-
European Journal of Nutrition, Dec 2021
This article discusses the variability and randomness of p values, the most widely used currency of evidence in nutritional and health studies. One implication of this, the importance of always testing interaction terms when subgroups are examined and presented separately, is also discussed.
PubMed: 33585951
DOI: 10.1007/s00394-021-02498-z
-
Knee Surgery, Sports Traumatology,... Oct 2022
Due to its frequent misuse, the p value has become a point of contention in the research community. In this editorial, we seek to clarify some of the common misconceptions about p values and the hazardous implications associated with misunderstanding this commonly used statistical concept. This article will discuss issues related to p value interpretation in addition to problems such as p-hacking and statistical fragility; we will also offer some thoughts on addressing these issues. The aim of this editorial is to provide clarity around the concept of statistical significance for those attempting to increase their statistical literacy in Orthopedic research.
Topics: Humans; Orthopedics
PubMed: 35920843
DOI: 10.1007/s00167-022-07083-3
-
Emergency Medicine Australasia : EMA, Feb 2022
Observational Study
OBJECTIVE
Language that implies a conclusion not supported by the evidence is common in the medical literature. The hypothesis of the present study was that medical journal publications are more likely to use misleading language for the interpretation of a demonstrated null (i.e. chance or not statistically significant) effect than a demonstrated real (i.e. statistically significant) effect.
METHODS
This was an observational study of the medical literature with a systematic sampling method. Articles published in The Journal of the American Medical Association, The Lancet and The New England Journal of Medicine over the last two decades were eligible. The language used around the P-value was assessed for misleadingness (i.e. either suggesting an effect existed when a real effect did not exist or vice versa).
RESULTS
There were 228 unique manuscripts examined, containing 400 statements interpreting a P-value proximate to 0.05. The P-value was between 0.036 and 0.050 for 303 (75.8%) statements and between 0.050 and 0.064 for 97 (24.3%) statements. Forty-four (11%) of the statements were misleading. There were 40 (41.2%) false-positive sentences, implying statistical significance when the P-value was >0.05, and four (1.3%) false-negative sentences, implying no statistical significance when the P-value was <0.05 (relative risk 31.2; 95% confidence interval 11.5-85.1; P < 0.0001). The proportion of included manuscripts containing at least one misleading sentence was 16.2% (95% confidence interval 12.0-21.6).
CONCLUSIONS
Among a random selection of sentences in prestigious journals describing P-values close to 0.05, 1 in 10 are misleading (n = 44, 11%) and this is more prevalent when the P-values are above 0.05 compared to below 0.05. Caution is advised for researchers, clinicians and editors to align with the context and purpose of P-values.
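The relative risk in the RESULTS section above can be recomputed from the reported counts (40 misleading of 97 statements with P above 0.05, versus 4 of 303 with P below 0.05). The abstract does not say how the interval was obtained; the sketch below assumes the common large-sample interval on the log scale, and the function name `relative_risk` is illustrative.

```python
import math


def relative_risk(a: int, n1: int, b: int, n2: int, z: float = 1.96):
    """Relative risk (a/n1) / (b/n2) with a log-scale Wald interval.

    a, b   : event counts in group 1 and group 2
    n1, n2 : group sizes
    z      : normal quantile for the interval (1.96 for about 95%)
    """
    rr = (a / n1) / (b / n2)
    # Standard error of log(RR) under the large-sample approximation.
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


if __name__ == "__main__":
    rr, lower, upper = relative_risk(40, 97, 4, 303)
    print(f"RR = {rr:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

Under this assumed method the counts give a relative risk of about 31.2 with an interval of roughly 11.5 to 85.1, matching the figures in the abstract. The width of that interval reflects how few false-negative sentences (four) were observed.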
Topics: Humans; Probability; Publishing; Research Design; United States
PubMed: 34355494
DOI: 10.1111/1742-6723.13831