-
JAMA Network Open Sep 2023
IMPORTANCE
Artificial intelligence (AI) has gained considerable attention in health care, yet concerns have been raised around appropriate methods and fairness. Current AI reporting guidelines do not provide a means of quantifying overall quality of AI research, limiting their ability to compare models addressing the same clinical question.
OBJECTIVE
To develop a tool (APPRAISE-AI) to evaluate the methodological and reporting quality of AI prediction models for clinical decision support.
DESIGN, SETTING, AND PARTICIPANTS
This quality improvement study evaluated AI studies in the model development, silent, and clinical trial phases using the APPRAISE-AI tool, a quantitative method for evaluating quality of AI studies across 6 domains: clinical relevance, data quality, methodological conduct, robustness of results, reporting quality, and reproducibility. These domains included 24 items with a maximum overall score of 100 points. Points were assigned to each item, with higher points indicating stronger methodological or reporting quality. The tool was applied to a systematic review on machine learning to estimate sepsis that included articles published until September 13, 2019. Data analysis was performed from September to December 2022.
MAIN OUTCOMES AND MEASURES
The primary outcomes were interrater and intrarater reliability and the correlation between APPRAISE-AI scores and expert scores, 3-year citation rate, number of Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) low risk-of-bias domains, and overall adherence to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement.
RESULTS
A total of 28 studies were included. Overall APPRAISE-AI scores ranged from 33 (low quality) to 67 (high quality). Most studies were moderate quality. The 5 lowest scoring items included source of data, sample size calculation, bias assessment, error analysis, and transparency. Overall APPRAISE-AI scores were associated with expert scores (Spearman ρ, 0.82; 95% CI, 0.64-0.91; P < .001), 3-year citation rate (Spearman ρ, 0.69; 95% CI, 0.43-0.85; P < .001), number of QUADAS-2 low risk-of-bias domains (Spearman ρ, 0.56; 95% CI, 0.24-0.77; P = .002), and adherence to the TRIPOD statement (Spearman ρ, 0.87; 95% CI, 0.73-0.94; P < .001). Intraclass correlation coefficient ranges for interrater and intrarater reliability were 0.74 to 1.00 for individual items, 0.81 to 0.99 for individual domains, and 0.91 to 0.98 for overall scores.
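The rank correlations reported above can be illustrated with a minimal sketch. It computes Spearman ρ from average ranks and attaches a confidence interval via the Fisher z transformation. The abstract does not state how its intervals were computed, so the Fisher approximation and the function names `ranks`, `spearman_rho`, and `fisher_ci` are assumptions made for illustration only.

```python
import math
from statistics import NormalDist, mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def fisher_ci(rho, n, level=0.95):
    """Approximate CI for a correlation via the Fisher z transformation.

    Assumption: standard error 1 / sqrt(n - 3); other corrections exist
    for rank correlations and may give slightly different bounds.
    """
    z = math.atanh(rho)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)
```

Calling `fisher_ci(0.82, 28)` gives bounds of roughly 0.64 and 0.91, which agrees with the interval reported for the expert-score correlation, although that agreement does not confirm the method the authors used.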
CONCLUSIONS AND RELEVANCE
In this quality improvement study, APPRAISE-AI demonstrated strong interrater and intrarater reliability and correlated well with several study quality measures. This tool may provide a quantitative approach for investigators, reviewers, editors, and funding organizations to compare the research quality across AI studies for clinical decision support.
Topics: Humans; Artificial Intelligence; Decision Support Systems, Clinical; Reproducibility of Results; Machine Learning; Clinical Relevance
PubMed: 37747733
DOI: 10.1001/jamanetworkopen.2023.35377 -
Npj Mental Health Research Nov 2023
Review
Assessing mental health disorders and determining treatment can be difficult for a number of reasons, including access to healthcare providers. Assessments and treatments may not be continuous and can be limited by the unpredictable nature of psychiatric symptoms. Machine-learning models using data collected in a clinical setting can improve diagnosis and treatment. Studies have used speech, text, and facial expression analysis to identify depression. Still, more research is needed to address challenges such as the need for multimodality machine-learning models for clinical use. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline, we reviewed studies from the past decade that used speech, text, and facial expression analysis to detect depression as defined by the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). For each study, we report the number of participants, the techniques used to assess clinical outcomes, speech-eliciting tasks, machine-learning algorithms, metrics, and other important findings. A total of 544 studies were examined, 264 of which satisfied the inclusion criteria. A database has been created containing the query results and a summary of how different features are used to detect depression. While machine learning shows potential to enhance mental health disorder evaluations, some obstacles must be overcome, especially the need for more transparent machine-learning models for clinical purposes. Considering the variety of datasets, feature extraction techniques, and metrics used in this field, we provide guidelines for collecting data and training machine-learning models to ensure reproducibility and generalizability across different contexts.
PubMed: 38609509
DOI: 10.1038/s44184-023-00040-z -
Journal of Clinical Medicine Aug 2023
Review
BACKGROUND
Sepsis, a life-threatening infection-induced inflammatory condition, has significant global health impacts. Timely detection is crucial for improving patient outcomes as sepsis can rapidly progress to severe forms. The application of machine learning (ML) and deep learning (DL) to predict sepsis using electronic health records (EHRs) has gained considerable attention for timely intervention.
METHODS
PubMed, IEEE Xplore, Google Scholar, and Scopus were searched for relevant studies. All studies that used ML/DL to detect or predict early the onset of sepsis in the adult population using EHRs were considered. Data were extracted from all studies that met the criteria and analyzed, and the studies were also evaluated for quality.
RESULTS
This systematic review examined 1942 articles and selected 42 studies under strict criteria. The chosen studies were predominantly retrospective (n = 38) and spanned diverse geographic settings, with a focus on the United States. Different datasets, sepsis definitions, and prevalence rates were employed, necessitating data augmentation. The studies varied in the parameters used, the models applied, and their quality assessments. Longitudinal data enabled early sepsis prediction. Fulfillment of quality criteria varied, and funding did not correlate consistently with article quality.
CONCLUSIONS
This systematic review underscores the significance of ML/DL methods for sepsis detection and early prediction through EHR data.
PubMed: 37685724
DOI: 10.3390/jcm12175658 -
Sensors (Basel, Switzerland) Sep 2023
Review
The integration of wearable sensor technology and machine learning algorithms has significantly transformed the field of intelligent medical rehabilitation. These innovative technologies enable the collection of valuable movement, muscle, or nerve data during the rehabilitation process, empowering medical professionals to evaluate patient recovery and predict disease development more efficiently. This systematic review aims to study the application of wearable sensor technology and machine learning algorithms in different disease rehabilitation training programs, obtain the best sensors and algorithms that meet different disease rehabilitation conditions, and provide ideas for future research and development. A total of 1490 studies were retrieved from two databases, the Web of Science and IEEE Xplore, and finally 32 articles were selected. In this review, the selected papers employ different wearable sensors and machine learning algorithms to address different disease rehabilitation problems. Our analysis focuses on the types of wearable sensors employed, the application of machine learning algorithms, and the approach to rehabilitation training for different medical conditions. It summarizes the usage of different sensors and compares different machine learning algorithms. It can be observed that the combination of these two technologies can optimize the disease rehabilitation process and provide more possibilities for future home rehabilitation scenarios. Finally, the present limitations and suggestions for future developments are presented in the study.
Topics: Humans; Algorithms; Databases, Factual; Intelligence; Machine Learning; Wearable Electronic Devices
PubMed: 37765724
DOI: 10.3390/s23187667 -
International Journal of Medical... Aug 2023
Review
BACKGROUND
Acute respiratory diseases are a leading cause of morbidity and mortality in children. Cough is a common symptom of acute respiratory diseases and the sound of cough can be indicative of the respiratory disease. However, cough sound assessment in routine clinical practice is limited to human perception and the skills of the clinician. Objective cough sound evaluation has the potential to aid clinicians in acute respiratory disease diagnosis. In this systematic review, we assess and summarize the predictive ability of machine learning algorithms in analyzing cough sounds of acute respiratory diseases in the pediatric population.
METHOD
Our systematic search of the Scopus, Medline, and Embase databases on 25 January 2023 identified six articles meeting the inclusion criteria. Quality assessment of the included studies was performed using the checklist for the assessment of medical artificial intelligence.
RESULTS
Our analysis shows variability in the input to the machine learning algorithms, such as the use of various cough sound features and combining cough sound features with clinical features. The use of the machine learning algorithms also varies from conventional algorithms, such as logistic regression and support vector machine, to deep learning techniques, such as convolutional neural networks. The classification accuracy for the detection of bronchiolitis, croup, pertussis, and pneumonia across five articles is in the range of 82-96%. However, a significant drop is observed in the detection accuracy for bronchiolitis and pneumonia in the remaining article.
CONCLUSION
The number of articles is limited but, in general, the predictive ability of cough sound classification algorithms in childhood acute respiratory diseases shows promise.
Topics: Child; Humans; Cough; Artificial Intelligence; Algorithms; Pneumonia; Bronchiolitis; Machine Learning
PubMed: 37224643
DOI: 10.1016/j.ijmedinf.2023.105093 -
Critical Reviews in Food Science and... 2024
Review
Neural network (NN, i.e., deep learning)-based data analysis techniques have been identified as a pivotal opportunity to protect the integrity and safety of the global food supply chain, with a forecast of $11.2 billion in agriculture markets. As a general-purpose data analysis tool, NN has been applied in several areas of food science, such as food recognition, food supply chain security, and omics analysis. Given the rapid emergence of NN applications in food safety, this review aims to provide the first comprehensive overview of NN applications in food analysis, focusing on domain-specific applications by introducing the fundamental methodology, reviewing recent and notable progress, and discussing challenges and potential pitfalls. NN has shown a bright future through effective collaboration between food specialists and the broader food community, for example, with superior performance in food recognition, sensory evaluation, and pattern recognition in spectroscopy and chromatography. However, major challenges impede wider adoption of NN, including the lack of software packages with interfaces friendly to food scientists, incomprehensible model behavior, and multi-source heterogeneous data. Breakthroughs in other fields suggest that NN has the potential to offer a revolution in the immediate future.
Topics: Neural Networks, Computer; Humans; Food Analysis; Food Safety; Food Technology; Food Supply; Deep Learning
PubMed: 36322538
DOI: 10.1080/10408398.2022.2139217 -
Journal of Medical Systems Jan 2024
Review
Ischemic stroke is a serious disease posing significant threats to human health and life, with the highest absolute and relative risks of a poor prognosis following the first occurrence. More than 90% of strokes are attributable to modifiable risk factors. Currently, machine learning (ML) is widely used for the prediction of ischemic stroke outcomes. By identifying risk factors, predicting the risk of poor prognosis, and thus supporting personalized treatment plans, ML can reduce the probability of poor prognosis and lead to more effective secondary prevention. This review includes 41 studies since 2018 that used ML algorithms to build prognostic prediction models for ischemic stroke, transient ischemic attack (TIA), and acute ischemic stroke (AIS). We analyzed in detail the risk factors used in these studies, the sources and processing methods of the required data, the model building and validation, and their application in different prediction time windows. The results indicate that among the included studies, the top five risk factors in terms of frequency were cardiovascular diseases, age, sex, National Institutes of Health Stroke Scale (NIHSS) score, and diabetes. Furthermore, 64% of the studies used single-center data, 65% of studies using imbalanced data did not perform data balancing, 88% of the studies did not utilize external validation datasets for model validation, and 72% of the studies did not provide explanations for their models. Addressing these issues is crucial for enhancing the credibility and effectiveness of the research, consequently improving the development and implementation of secondary prevention measures.
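As an illustration of the data balancing mentioned above, the following minimal sketch shows random oversampling, one simple balancing technique among several. The review does not say which balancing methods the included studies used; the function name `random_oversample` and its parameters are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes
    have as many rows as the largest class.

    Illustrative only: apply this to the training split, never to
    validation or test data, so evaluation reflects the real class mix.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # Rows belonging to this class in the original data.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels
```

For example, a training set with 8 rows of one outcome and 2 of the other would come back with 8 rows of each, the added rows being copies of existing minority-class rows.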
Topics: United States; Humans; Ischemic Stroke; Secondary Prevention; Stroke; Risk Factors; Machine Learning
PubMed: 38165495
DOI: 10.1007/s10916-023-02020-4 -
International Journal of Medical... Jul 2024
Review
BACKGROUND
Human Emotion Recognition (HER) has been a popular field of study in recent years. Despite the great progress made so far, relatively little attention has been paid to the use of HER in autism. People with autism are known to face problems with daily social communication and the prototypical interpretation of emotional responses, which are most frequently conveyed via facial expressions. This poses significant practical challenges to the application of regular HER systems, which are normally developed for and by neurotypical people.
OBJECTIVE
This study reviews the literature on the use of HER systems in autism, particularly with respect to sensing technologies and machine learning methods, in order to identify existing barriers and possible future directions.
METHODS
We conducted a systematic review of articles published between January 2011 and June 2023 according to the 2020 PRISMA guidelines. Manuscripts were identified by searching the Web of Science and Scopus databases. Manuscripts were included when they related to emotion recognition, used sensors and machine learning techniques, and involved children, young people, or adults with autism.
RESULTS
The search yielded 346 articles. A total of 65 publications met the eligibility criteria and were included in the review.
CONCLUSIONS
Studies predominantly used facial expression techniques as the emotion recognition method. Consequently, video cameras were the most widely used devices across studies, although a growing trend in the use of physiological sensors has been observed recently. Happiness, sadness, anger, fear, disgust, and surprise were most frequently addressed. Classical supervised machine learning techniques were primarily used at the expense of unsupervised approaches or more recent deep learning models. Studies focused on autism in a broad sense, but limited efforts have been directed towards more specific disorders of the spectrum. Privacy or security issues were seldom addressed, and when they were, only at an insufficient level of detail.
Topics: Humans; Machine Learning; Emotions; Autistic Disorder; Facial Expression; Child
PubMed: 38723429
DOI: 10.1016/j.ijmedinf.2024.105469 -
JMIR Medical Informatics May 2024
Review
BACKGROUND
With the increasing availability of data, computing resources, and easier-to-use software libraries, machine learning (ML) is increasingly used in disease detection and prediction, including for Parkinson disease (PD). Despite the large number of studies published every year, very few ML systems have been adopted for real-world use. In particular, a lack of external validity may result in poor performance of these systems in clinical practice. Additional methodological issues in ML design and reporting can also hinder clinical adoption, even for applications that would benefit from such data-driven systems.
OBJECTIVE
To sample the current ML practices in PD applications, we conducted a systematic review of studies published in 2020 and 2021 that used ML models to diagnose PD or track PD progression.
METHODS
We conducted a systematic literature review in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines in PubMed between January 2020 and April 2021, using the following exact string: "Parkinson's" AND ("ML" OR "prediction" OR "classification" OR "detection" or "artificial intelligence" OR "AI"). The search resulted in 1085 publications. After a search query and review, we found 113 publications that used ML for the classification or regression-based prediction of PD or PD-related symptoms.
RESULTS
Only 65.5% (74/113) of studies used a holdout test set to avoid potentially inflated accuracies, and approximately half (25/46, 54%) of the studies without a holdout test set did not state this as a potential concern. Surprisingly, 38.9% (44/113) of studies did not report on how or if models were tuned, and an additional 27.4% (31/113) used ad hoc model tuning, which is generally frowned upon in ML model optimization. Only 15% (17/113) of studies performed direct comparisons of results with other models, severely limiting the interpretation of results.
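The holdout practice discussed above can be sketched as follows: data are split once into training, validation, and test portions, model tuning consults only the validation portion, and the test portion is scored a single time at the end. This is a generic illustration, not the protocol of any reviewed study; `three_way_split`, `select_by_validation`, and their parameters are hypothetical names.

```python
import random


def three_way_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle sample indices and split them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)

    test = indices[:n_test]                  # held out until the very end
    val = indices[n_test:n_test + n_val]     # used for model tuning
    train = indices[n_test + n_val:]         # used for model fitting
    return train, val, test


def select_by_validation(candidates, validation_score):
    """Pick the candidate setting with the best validation score.

    `validation_score` is a caller-supplied function that fits a model
    on the training split and scores it on the validation split. The
    test split is deliberately not involved in this choice.
    """
    return max(candidates, key=validation_score)
```

Keeping the test indices out of both fitting and tuning is what prevents the potentially inflated accuracies that the review attributes to studies without a holdout test set.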
CONCLUSIONS
This review highlights the notable limitations of current ML systems and techniques that may contribute to a gap between reported performance in research and the real-life applicability of ML models aiming to detect and predict diseases such as PD.
PubMed: 38771237
DOI: 10.2196/50117 -
BMC Psychiatry Sep 2023
Meta-Analysis
Perinatal depression (PND) is a significant contributor to maternal morbidity globally. Because it is recognized as a major cause of poor infant development, epidemiological and interventional research on PND has increased over the last decade. Recently, studies have pointed out that PND is a heterogeneous condition, with variability in its phenotypes, rather than a homogeneous latent entity and a concrete diagnosis, as previously conceptualized in psychometric literature and diagnostic systems. Therefore, it is pertinent that researchers recognize this to progress in elucidating its aetiology and developing efficacious interventions. This systematic review is conducted in accordance with the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) guidelines. It aims to provide an updated and comprehensive account of research on heterogeneity in phenotypes of PND and its implications in research, public health, and clinical practice. It provides a synthesis and quality assessment of studies reporting heterogeneity in PND using cutting-edge statistical techniques and machine learning algorithms. After reporting the phenotypes of PND, based on heterogeneous trajectories and symptom profiles, it also elucidates the risk factors associated with severe forms of PND, followed by robust evidence for adverse child outcomes. Furthermore, recommendations are made to improve public health and clinical practice in screening, diagnosis, and treatment of PND.
Topics: Female; Pregnancy; Humans; Depression; Depressive Disorder; Algorithms; Machine Learning; Phenotype; Observational Studies as Topic
PubMed: 37667216
DOI: 10.1186/s12888-023-05121-z