Journal of Clinical Epidemiology, May 2023 (Review)
OBJECTIVES
In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction.
STUDY DESIGN AND SETTING
We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning published between 1st January, 2019, and 5th September, 2019. We used existing spin frameworks and described areas of highly suggestive spin practices.
RESULTS
We included 62 publications (including 152 developed models; 37 validated models). Reporting was inconsistent between the methods and the results in 27% of studies, owing to additional analyses and selective reporting. Thirty-two studies (out of 36 applicable studies) reported comparisons between developed models in their discussion and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion.
CONCLUSION
The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.
Topics: Humans; Medical Oncology; Prognosis; Research; Machine Learning
PubMed: 36935090
DOI: 10.1016/j.jclinepi.2023.03.012
Implementation Science : IS, Aug 2021 (Review)
BACKGROUND
The important role of leaders in the translation of health research is acknowledged in the implementation science literature. However, the accurate measurement of leadership traits and behaviours in health professionals has not been directly addressed. This review aimed to identify whether scales which measure leadership traits and behaviours have been found to be reliable and valid for use with health professionals.
METHODS
A systematic review was conducted. MEDLINE, EMBASE, PsycINFO, Cochrane, CINAHL, Scopus, ABI/INFORMIT and Business Source Ultimate were searched to identify publications which reported original research testing the reliability, validity or acceptability of a leadership-related scale with health professionals.
RESULTS
Of 2814 records, a total of 39 studies met the inclusion criteria, from which 33 scales were identified as having undergone some form of psychometric testing with health professionals. The most commonly used were the Implementation Leadership Scale (n = 5) and the Multifactor Leadership Questionnaire (n = 3). The majority of the 33 scales were validated in English-speaking countries, including the USA (n = 15) and Canada (n = 4), with some translations and use in Europe and Asia, predominantly with samples of nurses (n = 27) or allied health professionals (n = 10). Only two validation studies included physicians. Content validity and internal consistency were evident for most scales (n = 30 and 29, respectively). Only 20 of the 33 scales satisfied acceptable thresholds for good construct validity. Very limited testing occurred in relation to test-retest reliability, responsiveness, acceptability, cross-cultural revalidation, convergent validity, discriminant validity, and criterion validity.
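The internal consistency tallied above is conventionally quantified with Cronbach's alpha, although the abstract does not name the statistic. A minimal sketch under that assumption, with invented responses for a hypothetical three-item leadership scale (none of this data comes from the reviewed studies):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = items.shape[1]                         # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented responses: five respondents, three items, 1-5 Likert scoring
scores = np.array([
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
])
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")  # >= 0.70 is the conventional "acceptable" cut-off
```

Alpha rises with inter-item correlation and with the number of items, which is why it is reported per scale rather than per item.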
CONCLUSIONS
Seven scales may be sufficiently sound to be used with health professionals, primarily nurses. There is an absence of validation of leadership scales with regard to physicians. Given that physicians, along with nurses and allied health professionals, have a leadership role in driving the implementation of evidence-based healthcare, this constitutes a clear gap in the psychometric testing of leadership scales for use in healthcare implementation research and practice.
TRIAL REGISTRATION
This review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (see Additional File 1) (PLoS Medicine. 6:e1000097, 2009), and the associated protocol has been registered with the PROSPERO International Prospective Register of Systematic Reviews (registration number CRD42019121544).
Topics: Health Personnel; Humans; Leadership; Psychometrics; Reproducibility of Results; Surveys and Questionnaires
PubMed: 34454567
DOI: 10.1186/s13012-021-01141-z
Translational Journal of the American..., 2022
CONTEXT
There are research-grade devices that have been validated to measure either heart rate (HR) by electrocardiography (ECG) with a Polar chest strap, or step count with an ActiGraph accelerometer. However, wearable activity trackers that measure HR and steps concurrently have been tested against research-grade accelerometers and HR monitors with conflicting results. This review examines validation studies of the Fitbit Charge 2 (FBC2) for accuracy in measuring HR and step count and evaluates the device's reliability for use by researchers and clinicians.
DESIGN
This registered review was conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The robvis (risk-of-bias visualization) tool was used to assess the strength of each considered article.
ELIGIBILITY CRITERIA
Eligible articles published between 2018 and 2019 were identified using the PubMed, CINAHL, Embase, Cochrane, and Web of Science databases and hand-searches. All articles were HR and/or step count validation studies of the FBC2 in adult ambulatory populations.
STUDY SELECTION
Eight articles were selected for examination based on alignment with the eligibility criteria and agreement among the authors and a research librarian.
MAIN OUTCOME MEASURES
Concordance correlation coefficients (CCC) were used to measure agreement between the tracker and criterion devices. Mean absolute percent error (MAPE) was used to average the individual absolute percent errors.
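As a sketch of how these two agreement statistics behave, the following computes Lin's CCC and MAPE on invented heart-rate readings (criterion ECG vs. tracker; none of the numbers come from the reviewed studies):

```python
import numpy as np

def lin_ccc(x: np.ndarray, y: np.ndarray) -> float:
    """Lin's concordance correlation coefficient between two measurement series."""
    sxy = np.cov(x, y, ddof=0)[0, 1]  # population covariance of the two series
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

def mape(criterion: np.ndarray, device: np.ndarray) -> float:
    """Mean absolute percent error of the device relative to the criterion."""
    return float(np.mean(np.abs(device - criterion) / criterion) * 100)

# Invented readings across rising exercise intensity (bpm)
ecg     = np.array([72.0, 95.0, 118.0, 140.0, 162.0])  # criterion HR
tracker = np.array([70.0, 93.0, 112.0, 128.0, 145.0])  # wearable HR
print(f"CCC  = {lin_ccc(ecg, tracker):.2f}")
print(f"MAPE = {mape(ecg, tracker):.1f}%")
```

CCC penalizes both scatter and systematic offset from the line of identity, whereas MAPE summarizes only the magnitude of relative error, which is one reason validation studies report both.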
RESULTS
Studies that measured CCC found that agreement between the FBC2 and criterion devices ranged between 26% and 92% for HR monitoring, decreasing in accuracy as exercise intensity increased. Conversely, CCC for step count increased from 38% to 99% as exercise intensity increased. HR MAPE ranged from 9.21% to 68%, with error increasing as exercise intensity increased. Step measurement MAPE was 12% for healthy persons aged 24-72 years but was reported at 46% in an older population with heart failure.
CONCLUSIONS
Relative agreement with criterion devices and low-to-moderate MAPE were consistent across most studies reviewed and support validation of the FBC2 for accurately measuring HR at low or moderate exercise intensities. However, further investigation with controlled testing and measurement congruency is needed to validate its step-counting capabilities. The literature supports the validity of the FBC2 for HR monitoring, but the evidence for step count is inconclusive, so the device may not be suitable for use in all populations.
PubMed: 36711436
DOI: 10.1249/tjx.0000000000000215
BMJ (Clinical Research Ed.), Oct 2021
OBJECTIVE
To assess the methodological quality of studies on prediction models developed using machine learning techniques across all medical specialties.
DESIGN
Systematic review.
DATA SOURCES
PubMed from 1 January 2018 to 31 December 2019.
ELIGIBILITY CRITERIA
Articles reporting on the development, with or without external validation, of a multivariable prediction model (diagnostic or prognostic) developed using supervised machine learning for individualised predictions. No restrictions applied for study design, data source, or predicted patient related health outcomes.
REVIEW METHODS
Methodological quality of the studies was determined and risk of bias evaluated using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). This tool contains 21 signalling questions tailored to identify potential biases in four domains. Risk of bias was measured for each domain (participants, predictors, outcome, and analysis) and each study (overall).
RESULTS
152 studies were included: 58 (38%) included a diagnostic prediction model and 94 (62%) a prognostic prediction model. PROBAST was applied to 152 developed models and 19 external validations. Of these 171 analyses, 148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of bias. The analysis domain was most frequently rated at high risk of bias. Of the 152 models, 85 (56%, 48% to 64%) were developed with an inadequate number of events per candidate predictor, 62 handled missing data inadequately (41%, 33% to 49%), and 59 assessed overfitting improperly (39%, 31% to 47%). Most models used appropriate data sources to develop (73%, 66% to 79%) and externally validate the machine learning based prediction models (74%, 51% to 88%). Information about blinding of outcome and blinding of predictors was, however, absent in 60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models, respectively.
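The abstract does not state how its binomial confidence intervals were derived; as an assumption, a Wilson score interval reproduces the reported "87% (81% to 91%)" for the 148 of 171 analyses rated at high risk of bias:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95% CI)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

low, high = wilson_ci(148, 171)
print(f"{148/171:.0%} ({low:.0%} to {high:.0%})")  # 87% (81% to 91%)
```

Other interval methods (e.g., exact Clopper-Pearson) would give slightly different bounds; the match here only shows the reported figures are internally consistent with a standard method.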
CONCLUSION
Most studies on machine learning based prediction models show poor methodological quality and are at high risk of bias. Factors contributing to risk of bias include small study size, poor handling of missing data, and failure to deal with overfitting. Efforts to improve the design, conduct, reporting, and validation of such studies are necessary to boost the application of machine learning based prediction models in clinical practice.
SYSTEMATIC REVIEW REGISTRATION
PROSPERO CRD42019161764.
Topics: Bias; Clinical Decision Rules; Data Interpretation, Statistical; Humans; Machine Learning; Models, Statistical; Multivariate Analysis; Risk
PubMed: 34670780
DOI: 10.1136/bmj.n2281
MHealth, 2022 (Review)
BACKGROUND
Wearable sensors, particularly accelerometers alone or combined with gyroscopes and magnetometers in an inertial measurement unit (IMU), are a logical alternative for gait analysis. While issues with intrusive and complex sensor placement limit the practicality of multi-point IMU systems, single-point IMUs could maximize patient compliance and allow inconspicuous monitoring in daily living. Therefore, this review aimed to examine the validity of single-point IMUs for gait metrics analysis and identify studies employing them for clinical applications.
METHODS
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed. Four databases (PubMed, MEDLINE, EMBASE, and Cochrane) were systematically searched to obtain relevant journal articles focusing on the measurement of gait metrics using single-point IMU sensors.
RESULTS
A total of 90 articles were selected for inclusion. Critical analysis of the studies was conducted, and the data collected included sensor type(s), sensor placement, study aim(s), study conclusion(s), gait metrics and methods, and clinical application. Validation research primarily focuses on lower-trunk sensors in healthy cohorts. Clinical applications focus on diagnosis and severity assessment, rehabilitation and intervention efficacy, and delineating pathological subjects from healthy controls.
DISCUSSION
This review has demonstrated the validity of single-point IMUs for gait metrics analysis and their ability to assist in clinical scenarios. Further validation for continuous monitoring in daily living scenarios and performance in pathological cohorts is required before commercial and clinical uptake can be expected.
PubMed: 35178440
DOI: 10.21037/mhealth-21-17
Frontiers in Public Health, 2022
Is the Health Behavior in School-Aged Survey Questionnaire Reliable and Valid in Assessing Physical Activity and Sedentary Behavior in Young Populations? A Systematic Review.
BACKGROUND
Using self-reported questionnaires to assess levels of physical activity (PA) and sedentary behavior (SB) is a widely recognized method in public health and epidemiology research. Selected items of the Health Behavior in School-aged Children (HBSC) Survey Questionnaire have been used globally to measure and assess PA and SB in children and adolescents. However, there are no comprehensive and critical reviews assessing the quality of studies on the reliability and validity of the PA and SB items derived from the HBSC. Thus, this review aimed to critically assess the quality of those studies and summarize evidence for future recommendations.
METHODS
A systematic review protocol was used to search for potentially eligible studies assessing the reliability and validity of the PA and SB measures of the HBSC questionnaire. Electronic academic databases were searched. Information on the reliability and validity of the PA and SB measures was extracted and evaluated against well-recognized criteria and assessment tools.
RESULTS
After the literature search, six studies were included in this review. The reliability of the PA measures of the HBSC questionnaire showed moderate agreement, while the reliability of the SB measures varied greatly across different items and subgroups. The validity of the PA measures was acceptable, whereas no studies assessed the validity of the SB measures. All included studies had quality weaknesses in their reliability or validity analyses.
CONCLUSIONS
The PA and SB measures of the HBSC questionnaire were reliable for assessing PA and SB among adolescents. However, only limited evidence showed that the PA measures are partially valid for assessing PA, and no evidence confirmed the validity of the SB measures. All included studies had methodological weaknesses in examining the reliability and validity of the PA and SB measures, which should be addressed in future work. Further studies are encouraged to use more standardized study designs to examine the reliability and validity of the PA and SB measures in a wider range of young populations.
Topics: Adolescent; Child; Exercise; Health Behavior; Humans; Reproducibility of Results; Sedentary Behavior; Surveys and Questionnaires
PubMed: 35419332
DOI: 10.3389/fpubh.2022.729641
The International Journal of Behavioral..., Jun 2023 (Review)
BACKGROUND
Wearable technology is used by consumers and researchers worldwide for continuous activity monitoring in daily life. Results of high-quality laboratory-based validation studies enable us to make a guided decision on which study to rely on and which device to use. However, reviews in adults that focus on the quality of existing laboratory studies are missing.
METHODS
We conducted a systematic review of wearable validation studies with adults. Eligibility criteria were: (i) study under laboratory conditions with humans (age ≥ 18 years); (ii) validated device outcome must belong to one dimension of the 24-hour physical behavior construct (i.e., intensity, posture/activity type, and biological state); (iii) study protocol must include a criterion measure; (iv) study had to be published in a peer-reviewed English language journal. Studies were identified via a systematic search in five electronic databases as well as back- and forward citation searches. The risk of bias was assessed based on the QUADAS-2 tool with eight signaling questions.
RESULTS
Out of 13,285 unique search results, 545 articles published between 1994 and 2022 were included. Most studies (73.8%; N = 420) validated an intensity measure outcome such as energy expenditure; only 14% (N = 80) and 12.2% (N = 70) of studies validated biological state or posture/activity type outcomes, respectively. Most protocols validated wearables in healthy adults between 18 and 65 years. Most wearables were validated only once. Further, we identified six wearables (i.e., ActiGraph GT3X+, ActiGraph GT9X, Apple Watch 2, Axivity AX3, Fitbit Charge 2, Fitbit, and GENEActiv) that had been used to validate outcomes from all three dimensions, but none of them was consistently ranked with moderate to high validity. The risk of bias assessment classified 4.4% (N = 24) of all studies as "low risk", 16.5% (N = 90) as "some concerns", and 79.1% (N = 431) as "high risk".
CONCLUSION
Laboratory validation studies of wearables assessing physical behavior in adults are characterized by low methodological quality, large variability in design, and a focus on intensity. Future research should aim more strongly at all components of the 24-hour physical behavior construct and strive for standardized protocols embedded in a validation framework.
Topics: Humans; Adult; Adolescent; Wearable Electronic Devices; Fitness Trackers; Monitoring, Physiologic; Posture; Sedentary Behavior
PubMed: 37291598
DOI: 10.1186/s12966-023-01473-7
Aging and Disease, Jul 2022 (Review)
Osteoporotic fractures (OF) are currently a global public health problem. Many risk prediction models for OF have been developed, but their performance and methodological quality are unclear. We conducted this systematic review to summarize and critically appraise the OF risk prediction models. Three databases were searched until April 2021. Studies developing or validating multivariable models for OF risk prediction were considered eligible. We used the Prediction model Risk Of Bias ASsessment Tool (PROBAST) to appraise the risk of bias and applicability of the included models. All results were narratively summarized and described. A total of 68 studies describing 70 newly developed prediction models and 138 external validations were included. Most models were explicitly developed (n=31, 44%) and validated (n=76, 55%) only for women. Only 22 developed models (31%) were externally validated. The most validated tool was the Fracture Risk Assessment Tool (FRAX). Overall, only a few models showed outstanding (n=3, 1%) or excellent (n=32, 15%) prediction discrimination. Calibration of developed models (n=25, 36%) or external validation models (n=33, 24%) was rarely assessed. No model was rated at low risk of bias, mostly because of an insufficient number of cases and inappropriate assessment of calibration. A substantial number of OF risk prediction models exist. However, few have been thoroughly internally or externally validated (with calibration unassessed for most), and all models showed methodological shortcomings. Instead of developing completely new models, future research should validate, improve, and analyze the impact of existing models.
PubMed: 35855348
DOI: 10.14336/AD.2021.1206
American Journal of Obstetrics &..., Oct 2023 (Review)
OBJECTIVE
Valid and reliable maternity patient-reported experience measures are critical to understanding women's experiences of care. They can support clinical practice, health service and system performance measurement, and research. The aim of this review is to identify and critically appraise the risk of bias, woman-centricity (content validity), and psychometric properties of maternity patient-reported experience measures published in the scientific literature.
DATA SOURCES
MEDLINE, CINAHL Plus, PsycINFO, and Embase were systematically searched for relevant records between January 1, 2010 and July 10, 2021.
STUDY ELIGIBILITY CRITERIA
We searched for articles describing the instrument development of maternity patient-reported experience measures and measurement properties associated with instrument validity and reliability testing. Articles that described patient-reported experience measures developed outside of the maternity context and articles that did not contribute to the instruments' development, content validation, and/or psychometric evaluation were excluded.
METHODS
Included articles underwent risk of bias, content validity, and psychometric properties assessments in line with the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) guidance. Patient-reported experience measure results were summarized according to language subgroups. An overall recommendation for use was determined for each patient-reported experience measure language subgroup.
RESULTS
A total of 54 studies reported on the development and psychometric evaluation of 25 maternity patient-reported experience measures, grouped into 45 language subgroups. The quality of evidence underpinning the instruments' development was generally poor. Only 2 (4.4%) patient-reported experience measures reported sufficient content validity, and only 1 (2.2%) received a level "A" recommendation, required for real-world use.
CONCLUSION
Maternity patient-reported experience measures demonstrated poor-quality evidence for their measurement properties and insufficient detail about content validity. Future maternity patient-reported experience measure development needs to prioritize women's involvement in deciding what is relevant, comprehensive, and comprehensible to measure. Improving the content validity of maternity patient-reported experience measures will improve overall validity and reliability and facilitate real-world practice improvements. Standardized patient-reported experience measure implementation also needs to be prioritized to support advancements in clinical practice for women.
PubMed: 37517609
DOI: 10.1016/j.ajogmf.2023.101102