Journal of Educational Evaluation For..., 2021 (Review)
Appropriate sample size calculation and power analysis have become major issues in research and publication. However, calculating sample size and power is complex and requires broad statistical knowledge, personnel with programming skills are in short supply, and commercial programs are often too expensive for practical use. This review article aimed to explain the basic concepts of sample size calculation and power analysis; the process of sample estimation; and how to calculate sample size using G*Power software (latest ver. 3.1.9.7; Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany), with 5 statistical examples. The null and alternative hypotheses, effect size, power, alpha, and type I and type II errors should be described when calculating the sample size or power. The process of sample estimation consists of establishing research goals and hypotheses, choosing an appropriate statistical test, choosing one of the 5 possible power analysis methods, inputting the required variables, and clicking the "calculate" button. G*Power supports sample size and power calculation for various statistical methods (F, t, χ², z, and exact tests) and is recommended because it is easy to use and free. This software helps researchers estimate sample size and conduct power analysis.
Topics: Humans; Research Design; Sample Size; Software
PubMed: 34325496
DOI: 10.3352/jeehp.2021.18.17
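The a priori workflow the review describes (choose a test, an effect size, alpha, and power, then solve for n) can be sketched in Python with statsmodels rather than G*Power itself; the medium effect size d = 0.5 here is an illustrative assumption, not a value from the article.

```python
# A priori sample size for an independent-samples t-test: given Cohen's
# d, two-sided alpha, and desired power, solve for the per-group n.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   alternative='two-sided')
print(round(n_per_group))  # 64 per group for a medium effect
```

The same `solve_power` call can instead solve for power or detectable effect size by leaving that argument as `None`, mirroring G*Power's different analysis modes.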
Nephron. Clinical Practice, 2011 (Review)
The sample size is the number of patients or other experimental units that need to be included in a study to answer the research question. Pre-study calculation of the sample size is important; if a sample size is too small, one will not be able to detect an effect, while a sample that is too large may be a waste of time and money. Methods to calculate the sample size are explained in statistical textbooks, but because there are many different formulas available, it can be difficult for investigators to decide which method to use. Moreover, these calculations are prone to errors, because small changes in the selected parameters can lead to large differences in the sample size. This paper explains the basic principles of sample size calculations and demonstrates how to perform such a calculation for a simple study design.
Topics: Clinical Trials as Topic; Humans; Mathematical Concepts; Research Design; Sample Size
PubMed: 21293154
DOI: 10.1159/000322830
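As a concrete instance of the simple pre-study calculation this paper describes, a minimal sketch of the textbook formula for comparing two means, n per group = 2((z_{1-α/2} + z_{1-β})σ/Δ)²; the values for σ and Δ are illustrative assumptions.

```python
# Per-group sample size for a two-sample comparison of means.
import math
from scipy.stats import norm

alpha, power = 0.05, 0.80
sigma = 10.0   # assumed common standard deviation
delta = 5.0    # minimal clinically relevant difference to detect
z_a = norm.ppf(1 - alpha / 2)   # about 1.96
z_b = norm.ppf(power)           # about 0.84
n_per_group = math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)
print(n_per_group)  # 63 per group
```

This also makes the paper's warning concrete: halving Δ to 2.5 quadruples the required n, which is exactly the sensitivity to small parameter changes the authors caution about.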
Social Science & Medicine (1982), Jan 2022
OBJECTIVE
To review empirical studies that assess saturation in qualitative research in order to identify sample sizes for saturation, strategies used to assess saturation, and guidance we can draw from these studies.
METHODS
We conducted a systematic review of four databases to identify studies empirically assessing sample sizes for saturation in qualitative research, supplemented by searching citing articles and reference lists.
RESULTS
We identified 23 articles that used empirical data (n = 17) or statistical modeling (n = 6) to assess saturation. Studies using empirical data reached saturation within a narrow range of interviews (9-17) or focus group discussions (4-8), particularly those with relatively homogeneous study populations and narrowly defined objectives. Most studies had such populations and assessed code saturation; the few outliers (e.g., multi-country research, meta-themes, "code meaning" saturation) needed larger samples to reach saturation.
CONCLUSIONS
Despite varied research topics and approaches to assessing saturation, studies converged on a relatively consistent sample size for saturation for commonly used qualitative research methods. However, these findings apply to certain types of studies (e.g., those with homogeneous study populations). These results provide strong empirical guidance on effective sample sizes for qualitative research, which can be used in conjunction with the characteristics of individual studies to estimate an appropriate sample size prior to data collection. This synthesis also provides an important resource for researchers, academic journals, journal reviewers, ethical review boards, and funding agencies to facilitate greater transparency in justifying and reporting sample sizes in qualitative research. Future empirical research is needed to explore how various parameters affect sample sizes for saturation.
Topics: Data Collection; Focus Groups; Humans; Qualitative Research; Research Design; Sample Size
PubMed: 34785096
DOI: 10.1016/j.socscimed.2021.114523
Psicothema, Nov 2017
BACKGROUND
The robustness of the F-test to non-normality has been studied from the 1930s to the present day. However, this extensive body of research has yielded contradictory results, with evidence both for and against its robustness. This study provides a systematic examination of the F-test's robustness to violations of normality in terms of Type I error, considering a wide variety of distributions commonly found in the health and social sciences.
METHOD
We conducted a Monte Carlo simulation study involving a design with three groups and several known and unknown distributions. The manipulated variables were: equal and unequal group sample sizes; group sample size and total sample size; coefficient of sample size variation; shape of the distribution and equal or unequal group distribution shapes; and pairing of group size with the degree of contamination in the distribution.
RESULTS
The results showed that in terms of Type I error the F-test was robust in 100% of the cases studied, independently of the manipulated conditions.
Topics: Analysis of Variance; Monte Carlo Method; Sample Size
PubMed: 29048317
DOI: 10.7334/psicothema2016.383
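The study's simulation design can be illustrated with a minimal Monte Carlo sketch: draw three equal-sized groups from a skewed distribution under a true null and record how often the F-test rejects. The exponential distribution, group size, and replication count here are illustrative choices, not the study's exact conditions.

```python
# Empirical Type I error of one-way ANOVA under non-normal (skewed) data.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
n_sim, n_per_group, alpha = 2000, 30, 0.05
rejections = 0
for _ in range(n_sim):
    # Three groups from the same exponential distribution: the null is true.
    groups = [rng.exponential(scale=1.0, size=n_per_group) for _ in range(3)]
    _, p = f_oneway(*groups)
    rejections += p < alpha
print(rejections / n_sim)  # should sit near the nominal 0.05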
Biochemia Medica, Feb 2021
Calculating the sample size is one of the critical issues affecting the scientific contribution of a study. The sample size critically affects the hypothesis and the study design, and there is no straightforward way of calculating an effective sample size for reaching an accurate conclusion. Use of a statistically incorrect sample size may lead to inadequate results in both clinical and laboratory studies, as well as wasted time, cost, and ethical problems. This review has two main aims. The first is to explain the importance of sample size and its relationship to effect size (ES) and statistical significance. The second is to assist researchers planning sample size estimation by suggesting and explaining available software, guidelines, and references that serve different scientific purposes.
Topics: Data Interpretation, Statistical; Laboratories; Models, Theoretical; Sample Size; Software
PubMed: 33380887
DOI: 10.11613/BM.2021.010502
Statistical Methods in Medical Research, Aug 2019
Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an events-per-variable (EPV) criterion, notably EPV ≥ 10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study in which we studied the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination, and probability prediction error) of developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance is better approximated by considering the number of predictors, the total sample size, and the events fraction. We propose that the development of new sample size criteria for prediction models should be based on these three parameters, and provide suggestions for improving sample size determination.
Topics: Computer Simulation; Humans; Logistic Models; Models, Statistical; Research Design; Sample Size
PubMed: 29966490
DOI: 10.1177/0962280218784726
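A toy simulation in the spirit of this study, not the authors' protocol, fits a logistic model at several development sample sizes and checks out-of-sample discrimination on a large held-out set; the coefficients, intercept, and sample sizes are all assumed for illustration.

```python
# How development sample size affects out-of-sample AUC of a logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
p = 5
beta = np.array([0.8, -0.5, 0.4, 0.0, 0.3])   # assumed true coefficients

def simulate(n):
    X = rng.normal(size=(n, p))
    prob = 1 / (1 + np.exp(-(X @ beta - 1.0)))  # intercept -1.0 sets event rate
    return X, rng.binomial(1, prob)

X_test, y_test = simulate(20000)            # large external validation set
for n_dev in (100, 400, 1600):              # increasing development sample size
    X, y = simulate(n_dev)
    model = LogisticRegression().fit(X, y)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(n_dev, round(auc, 3))
```

Running variations of this (changing the event fraction or p while holding EPV fixed) is one way to see the paper's point that EPV alone does not pin down out-of-sample performance.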
Statistics in Medicine, Mar 2019
When designing a study to develop a new prediction model with binary or time-to-event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates as defined by a global shrinkage factor of ≥0.9, (ii) small absolute difference of ≤0.05 in the model's apparent and adjusted Nagelkerke's R², and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox-Snell R², which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8 and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (e.g., 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.
Topics: Computer Simulation; Humans; Multivariate Analysis; Regression Analysis; Sample Size; Time
PubMed: 30357870
DOI: 10.1002/sim.7992
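Criterion (i) has a published closed form, n ≥ p / ((S - 1) ln(1 - R²_CS/S)) with shrinkage factor S = 0.9; a sketch of that formula, where the anticipated Cox-Snell R² of 0.2 and p = 10 are assumed inputs for illustration.

```python
# Minimum n so that the expected uniform shrinkage factor is at least S.
import math

def min_n_shrinkage(p, r2_cs, S=0.9):
    """p: candidate predictor parameters; r2_cs: anticipated Cox-Snell R^2."""
    return math.ceil(p / ((S - 1) * math.log(1 - r2_cs / S)))

# e.g. 10 candidate parameters, anticipated Cox-Snell R^2 of 0.2
print(min_n_shrinkage(10, 0.2))  # 398 participants
```

Note how the required n scales linearly with p but is driven mainly by the anticipated R², which is why the authors argue against fixed EPP rules of thumb.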
Biostatistics (Oxford, England), Apr 2020
The bootstrap, introduced in Efron (1979. Bootstrap methods: another look at the jackknife. The Annals of Statistics 7, 1-26), is a landmark method for quantifying variability. It uses sampling with replacement with a sample size equal to that of the original data. We propose the upstrap, which samples with replacement either more or fewer samples than the original sample size. We illustrate the upstrap by solving a hard, but common, sample size calculation problem. The data and code used for the analysis in this article are available on GitHub (2018. https://github.com/ccrainic/upstrap).
Topics: Algorithms; Biostatistics; Data Interpretation, Statistical; Humans; Regression Analysis; Sample Size
PubMed: 30252026
DOI: 10.1093/biostatistics/kxy054
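The upstrap idea can be sketched in a few lines: resample pilot data with replacement at a larger-than-observed sample size and estimate power at that size. The pilot data below are simulated, and the two-sample t-test is an illustrative choice of analysis, not the article's worked example.

```python
# Upstrap power estimate: resample beyond the pilot n and count rejections.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
pilot_a = rng.normal(0.0, 1.0, size=30)   # hypothetical pilot arm A
pilot_b = rng.normal(0.6, 1.0, size=30)   # hypothetical pilot arm B

def upstrap_power(a, b, n_target, n_rep=1000, alpha=0.05):
    hits = 0
    for _ in range(n_rep):
        ra = rng.choice(a, size=n_target, replace=True)  # upsample arm A
        rb = rng.choice(b, size=n_target, replace=True)  # upsample arm B
        hits += ttest_ind(ra, rb).pvalue < alpha
    return hits / n_rep

p_est = upstrap_power(pilot_a, pilot_b, n_target=60)
print(p_est)  # estimated power at n = 60 per arm
```

Sweeping `n_target` upward until `p_est` crosses the desired power (say 0.80) turns this into the sample size calculation the abstract alludes to.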
Canadian Association of Radiologists..., Nov 2019
PURPOSE
The required training sample size for a particular machine learning (ML) model applied to medical imaging data is often unknown. The purpose of this study was to provide a descriptive review of current sample-size determination methodologies in ML applied to medical imaging and to propose recommendations for future work in the field.
METHODS
We conducted a systematic literature search of articles using Medline and Embase with keywords including "machine learning," "image," and "sample size." The search included articles published between 1946 and 2018. Data regarding the ML task, sample size, and train-test pipeline were collected.
RESULTS
A total of 167 articles were identified, of which 22 were included for qualitative analysis. There were only 4 studies that discussed sample-size determination methodologies, and 18 that tested the effect of sample size on model performance as part of an exploratory analysis. The observed methods could be categorized as pre hoc model-based approaches, which relied on features of the algorithm, or post hoc curve-fitting approaches requiring empirical testing to model and extrapolate algorithm performance as a function of sample size. Between studies, we observed great variability in performance testing procedures used for curve-fitting, model assessment methods, and reporting of confidence in sample sizes.
CONCLUSIONS
Our study highlights the scarcity of research in training set size determination methodologies applied to ML in medical imaging, emphasizes the need to standardize current reporting practices, and guides future work in development and streamlining of pre hoc and post hoc sample size approaches.
Topics: Biomedical Research; Diagnostic Imaging; Humans; Machine Learning; Sample Size
PubMed: 31522841
DOI: 10.1016/j.carj.2019.06.002
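The post hoc curve-fitting approach described in the results can be sketched by fitting an inverse power law, acc(n) = a - b * n^(-c), to performance measured at a few training sizes and extrapolating; the accuracy values below are made up for illustration.

```python
# Learning-curve extrapolation: fit an inverse power law to observed
# accuracy at several training sizes, then predict accuracy at a larger n.
import numpy as np
from scipy.optimize import curve_fit

sizes = np.array([100, 200, 400, 800, 1600], dtype=float)
acc = np.array([0.70, 0.76, 0.81, 0.84, 0.86])   # hypothetical test accuracy

def inv_power(n, a, b, c):
    return a - b * n ** (-c)

params, _ = curve_fit(inv_power, sizes, acc, p0=[0.9, 1.0, 0.5], maxfev=10000)
pred = inv_power(3200.0, *params)
print(round(pred, 3))  # extrapolated accuracy at n = 3200
```

The fitted asymptote `a` also gives a rough ceiling on achievable performance, though, as the review notes, confidence in such extrapolations is rarely reported and should be treated cautiously.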
The International Journal of..., Nov 2016
The Bland-Altman method has been widely used for assessing agreement between two methods of measurement. However, sample size estimation for this method has remained an unresolved issue. We propose a new method of sample size estimation for Bland-Altman agreement assessment. In the Bland-Altman method, the conclusion on agreement is based on the width of the confidence interval for the limits of agreement (LOAs) in comparison to a predefined clinical agreement limit. Using the theory of statistical inference, we derive formulae for sample size estimation that depend on the predetermined levels of α and β, the mean and standard deviation of the differences between the two measurements, and the predefined limits. With this new method, sample sizes are calculated under different parameter settings that occur frequently in method comparison studies, and Monte Carlo simulation is used to obtain the corresponding powers. The simulation results showed that the achieved powers coincided with the predetermined power levels, validating the correctness of the method. This method of sample size estimation can be applied in Bland-Altman assessments of agreement between two methods of measurement.
Topics: Biometry; Humans; Monte Carlo Method; Reproducibility of Results; Sample Size
PubMed: 27838682
DOI: 10.1515/ijb-2015-0039
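The approach above can be mimicked with a Monte Carlo power check (an alternative to the closed-form formulae, which the abstract does not reproduce): simulate paired differences, form approximate 95% CIs around both limits of agreement, and count how often both fall inside a predefined clinical limit. All parameter values below are illustrative, and the LOA standard error uses the common Bland-Altman approximation.

```python
# Monte Carlo power for a Bland-Altman agreement study at a candidate n.
import numpy as np

rng = np.random.default_rng(7)
mu_d, sd_d = 0.1, 0.5         # assumed mean and SD of paired differences
clin_limit = 1.5              # predefined clinical agreement limit (+/-)
n, n_sim = 80, 2000
z = 1.959964                  # 97.5th percentile of the standard normal

success = 0
for _ in range(n_sim):
    d = rng.normal(mu_d, sd_d, size=n)
    m, s = d.mean(), d.std(ddof=1)
    se_loa = s * np.sqrt(1 / n + z**2 / (2 * (n - 1)))  # approx. SE of a LOA
    upper = m + z * s + z * se_loa    # upper CI bound of the upper LOA
    lower = m - z * s - z * se_loa    # lower CI bound of the lower LOA
    success += (upper < clin_limit) and (lower > -clin_limit)
power_est = success / n_sim
print(power_est)  # empirical power at n = 80
```

Repeating this over a grid of n values and picking the smallest n whose empirical power meets the target reproduces, by simulation, what the paper's derived formulae give in closed form.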