-
JB & JS Open Access 2023
Review
INTRODUCTION
Publicly available AI language models such as ChatGPT have demonstrated utility in text generation and even problem-solving when provided with clear instructions. Amidst this transformative shift, the aim of this study is to assess ChatGPT's performance on the orthopaedic surgery in-training examination (OITE).
METHODS
All 213 OITE 2021 web-based questions were retrieved from the AAOS-ResStudy website (https://www.aaos.org/education/examinations/ResStudy). Two independent reviewers copied and pasted the questions and response options into ChatGPT Plus (version 4.0) and recorded the generated answers. All media-containing questions were flagged and carefully examined. Twelve OITE media-containing questions that relied purely on images (clinical pictures, radiographs, MRIs, CT scans) and could not be rationalized from the clinical presentation were excluded. Cohen's Kappa coefficient was used to examine the agreement of ChatGPT-generated responses between reviewers. Descriptive statistics were used to summarize the performance (% correct) of ChatGPT Plus. The 2021 norm table was used to compare ChatGPT Plus' performance on the OITE to national orthopaedic surgery residents in that same year.
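The inter-reviewer agreement statistic named above can be sketched as follows. This is a minimal illustration of Cohen's Kappa for two raters, not the authors' analysis code; the function name `cohens_kappa` and the six example answers are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical answers recorded by two reviewers for six questions (not study data).
reviewer_1 = ["A", "B", "C", "D", "A", "B"]
reviewer_2 = ["A", "B", "C", "D", "A", "C"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why it is preferred over a simple match rate when answer options are few.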
RESULTS
A total of 201 questions were evaluated by ChatGPT Plus. Excellent agreement was observed between raters for the 201 ChatGPT-generated responses, with a Cohen's Kappa coefficient of 0.947. Of the 201 questions, 45.8% (92/201) were media-containing. ChatGPT Plus had an overall score of 61.2% (123/201) and a score of 64.2% (70/109) on non-media questions. When compared to the performance of all national orthopaedic surgery residents in 2021, ChatGPT Plus performed at the level of an average PGY3.
DISCUSSION
ChatGPT Plus is able to pass the OITE with an overall score of 61.2%, ranking at the level of a third-year orthopaedic surgery resident. It provided logical reasoning and justifications that may help residents improve their understanding of OITE cases and general orthopaedic principles. Further studies are still needed to examine its efficacy and impact on long-term learning and OITE/ABOS performance.
PubMed: 38638869
DOI: 10.2106/JBJS.OA.23.00103 -
Molecular Systems Biology Aug 2023
The assessment of variant effect predictor (VEP) performance is fraught with biases introduced by benchmarking against clinical observations. In this study, building on our previous work, we use independently generated measurements of protein function from deep mutational scanning (DMS) experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data circularity. Many top-performing VEPs are unsupervised methods including EVE, DeepSequence and ESM-1v, a protein language model that ranked first overall. However, the strong performance of recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for discriminating between known pathogenic and putatively benign missense variants. Our findings are mixed, demonstrating that some DMS datasets perform exceptionally at variant classification, while others are poor. Notably, we observe a striking correlation between VEP agreement with DMS data and performance in identifying clinically relevant variants, strongly supporting the validity of our rankings and the utility of DMS for independent benchmarking.
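One way to quantify "VEP agreement with DMS data" is a rank correlation between predictor scores and measured functional scores for the same variants. The sketch below assumes Spearman's correlation as that measure; the abstract does not name the statistic, and the function names and example scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five variants (not data from the study).
dms_scores = [0.10, 0.40, 0.35, 0.80, 0.95]
vep_scores = [0.20, 0.30, 0.50, 0.70, 0.90]
rho = spearman(dms_scores, vep_scores)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure suits this comparison because predictor outputs and assay readouts are on different, often nonlinear, scales.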
Topics: Humans; Benchmarking; Mutation; Mutation, Missense; Proteins
PubMed: 37310135
DOI: 10.15252/msb.202211474 -
Cureus Nov 2023
Large language models (LLMs) have broad potential applications in medicine, such as aiding with education, providing reassurance to patients, and supporting clinical decision-making. However, there is a notable gap in understanding their applicability and performance in the surgical domain and how their performance varies across specialties. This paper aims to evaluate the performance of LLMs in answering surgical questions relevant to clinical practice and to assess how this performance varies across different surgical specialties. We used the MedMCQA dataset, a large-scale multi-choice question-answer (MCQA) dataset consisting of clinical questions across all areas of medicine. We extracted the 23,035 relevant surgical questions and submitted them to the popular LLMs Generative Pre-trained Transformer (GPT)-3.5 and GPT-4 (OpenAI OpCo, LLC, San Francisco, CA). GPT is a large language model that can generate human-like text by predicting subsequent words in a sentence based on the context of the words that come before it. It is pre-trained on a diverse range of texts and can perform a variety of tasks, such as answering questions, without needing task-specific training. The question-answering accuracy of GPT was calculated and compared between the two models and across surgical specialties. GPT-3.5 and GPT-4 achieved accuracies of 53.3% and 64.4%, respectively, on surgical questions, a statistically significant difference in performance. When compared to their performance on the full MedMCQA dataset, the two models performed differently: GPT-4 performed worse on surgical questions than on the dataset as a whole, while GPT-3.5 showed the opposite pattern. Significant variations in accuracy were also observed across different surgical specialties, with strong performances in anatomy, vascular, and paediatric surgery and worse performances in orthopaedics, ENT, and neurosurgery.
Large language models exhibit promising capabilities in addressing surgical questions, although the variability in their performance between specialties cannot be ignored. The lower performance of the latest GPT-4 model on surgical questions relative to questions across all medicine highlights the need for targeted improvements and continuous updates to ensure relevance and accuracy in surgical applications. Further research and continuous monitoring of LLM performance in surgical domains are crucial to fully harnessing their potential and mitigating the risks of misinformation.
PubMed: 38098921
DOI: 10.7759/cureus.48788 -
Science Advances Jul 2023
Industrial heterogeneous catalysts show high performance coupled with high material complexity. Deconvoluting this complexity into simplified models eases mechanistic studies. However, this approach dilutes the relevance because the models often perform worse than the real catalysts. We present a holistic approach to reveal the origin of high performance without losing the relevance by pivoting the system at an industrial benchmark. Combining kinetic and structural analyses, we show how the performance of Bi-Mo-Co-Fe-K-O industrial acrolein catalysts arises. The surface BiMoO ensembles decorated with K supported on β-CoFeMoO perform the propene oxidation, while the K-doped iron molybdate pools electrons to activate dioxygen. The nanostructured vacancy-rich and self-doped bulk phases ensure the charge transport between the two active sites. The features particular to the real system enable the high performance.
PubMed: 37436998
DOI: 10.1126/sciadv.adh5331 -
BMC Ecology and Evolution Jan 2024
Abrupt environmental changes can lead to evolutionary shifts in trait evolution. Identifying these shifts is an important step in understanding the evolutionary history of phenotypes. The detection performances of different methods are influenced by many factors, including different numbers of shifts, shift sizes, where a shift occurs on a tree, and the types of phylogenetic structure. Furthermore, the model assumptions are oversimplified, so are likely to be violated in real data, which could cause the methods to fail. We perform simulations to assess the effect of these factors on the performance of shift detection methods. To make the comparisons more complete, we also propose an ensemble variable selection method (R package ELPASO) and compare it with existing methods (R packages ℓ1ou and PhylogeneticEM). The performances of methods are highly dependent on the selection criterion. ℓ1ou+pBIC is usually the most conservative method and it performs well when signal sizes are large. ℓ1ou+BIC is the least conservative method and it performs well when signal sizes are small. The ensemble method provides more balanced choices between those two methods. Moreover, the performances of all methods are heavily impacted by measurement error, tree reconstruction error and shifts in variance.
Topics: Phylogeny; Phenotype
PubMed: 38245667
DOI: 10.1186/s12862-024-02201-w -
Journal of Cachexia, Sarcopenia and... Jun 2024
BACKGROUND
The way physical activity (PA) and sedentary behaviour (SB) independently and interactively modify the age-related decline in physical capacity remains poorly understood. This cross-sectional study investigated the independent and interactive associations of PA and SB with physical function and performance throughout the adult life course.
METHODS
Data from 499 community-dwelling adults (63% female) aged 20-92 years, involved in the INSPIRE Human Translational Cohort, were used in this cross-sectional study. Daily time spent on moderate-to-vigorous PA (MVPA, min/day) and SB (h/day) was measured with activPAL triaxial accelerometers. Physical function and performance were assessed through the measurement of the 4-m usual gait speed (m/s), handgrip strength (kg), lower-limb strength (isokinetic knee extension torque, N·m), estimated lower-limb power (five-time chair-rise test performance, s) and cardiorespiratory fitness (V̇O₂max, mL/kg/min). Confounder-adjusted multiple linear and curvilinear regressions were performed to investigate how MVPA, SB and their interactions were associated with the physical outcomes (all square root-transformed except gait speed) throughout the adulthood spectrum.
RESULTS
Interaction analyses revealed that the combination of higher levels of MVPA with lower levels of SB favourably reshaped the negative relationship between handgrip strength and age (age × SB × MVPA: B = -7E-08, SE = 3E-08, P < 0.05). In addition, higher levels of MVPA were independently associated with an improved age-related profile in gait speed (age × MVPA: B = 3E-06, SE = 1E-06, P < 0.05), chair-rise performance (age × MVPA: B = -9E-05, SE = 4E-05, P < 0.05) and V̇O₂max (MVPA at 21 years: B = 3E-02, SE = 7E-03, P < 0.05; age × MVPA: B = -5E-04, SE = 2E-04, P < 0.05). Conversely, the detrimental association of age with lower-limb muscle strength (age × SB: B = -1E-04, SE = 6E-05, P < 0.05) and chair-rise performance (age × SB: B = 1E-05, SE = 7E-06, P < 0.05) was exacerbated with increasing duration of SB, independently of MVPA. Supplementary analyses further revealed that some of these associations were age and sex specific.
CONCLUSIONS
This cross-sectional study demonstrated that reduced sedentary time and increased activity duration were independently and synergistically associated with an attenuated age-related loss in physical capacity. These findings need to be confirmed with longitudinal data but encourage both adopting an active lifestyle and reducing sedentary time as preventive measures against physical aging.
Topics: Humans; Sedentary Behavior; Female; Male; Adult; Middle Aged; Aged; Cross-Sectional Studies; Exercise; Aged, 80 and over; Young Adult; Hand Strength; Muscle Strength
PubMed: 38638004
DOI: 10.1002/jcsm.13457 -
Plants (Basel, Switzerland) Jul 2023
Macroevolutionary patterns in the association between plant species and their herbivores result from ecological divergence promoted by, among other factors, plants' defenses and nutritional quality, and herbivore adaptations. Here, we assessed the performance of the herbivores , a trophic specialist on , and , a polyphagous pest herbivore, when fed with species of . We used comparative phylogenetics and multivariate methods to examine the effects of species' tropane alkaloids, leaf trichomes, and plant macronutrients on the two herbivores´ performances (amount of food consumed, number of damaged leaves, larval biomass increment, and larval growth efficiency). The results indicate that species of do vary in their general suitability as food host for the two herbivores. Overall, the specialist performs better than the generalist herbivore across species, and performance of both herbivores is associated with suites of plant defenses and nutrient characteristics. Leaf trichomes and major alkaloids of the species are strongly related to herbivores' food consumption and biomass increase. Although hyoscyamine better predicts the key components of the performance of the specialist herbivore, scopolamine better predicts the performance of the generalist; however, only leaf trichomes are implicated in most performance components of the two herbivores. Nutrient quality more widely predicts the performance of the generalist herbivore. The contrasting effects of plant traits and the performances of herbivores could be related to adaptive differences to cope with plant toxins and achieve nutrient balance and evolutionary trade-offs and synergisms between plant traits to deal with a diverse community of herbivores.
PubMed: 37514225
DOI: 10.3390/plants12142611 -
Journal of Stroke and Cerebrovascular... Nov 2023
Meta-Analysis
BACKGROUND
Stroke diagnosis is dependent on lengthy clinical and neuroimaging assessments, while rapid treatment initiation improves clinical outcome. Currently, more sensitive biomarker assays of both non-coding RNA- and protein biomarkers have improved their detectability, which could accelerate stroke diagnosis. This systematic review and meta-analysis compares non-coding RNA- with protein biomarkers for their potential to diagnose and differentiate acute stroke (subtypes) in (pre-)hospital settings.
METHODS
We performed a systematic review and meta-analysis of studies evaluating diagnostic performance of non-coding RNA- and protein biomarkers to differentiate acute ischemic and hemorrhagic stroke, stroke mimics, and (healthy) controls. Quality appraisal of individual studies was assessed using the QUADAS-2 tool while the meta-analysis was performed with the sROC approach and by assessing pooled sensitivity and specificity, diagnostic odds ratios, positive- and negative likelihood ratios, and the Youden Index.
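The accuracy measures listed above all derive from a study's 2×2 table of test result against true diagnosis. The sketch below computes them for a single study using the standard definitions; it does not reproduce the pooled sROC analysis, and the class name `TwoByTwo`, the function name, and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic accuracy study (hypothetical container)."""
    true_pos: int
    false_neg: int
    false_pos: int
    true_neg: int


def diagnostic_metrics(t):
    """Return standard accuracy measures for a single 2x2 table."""
    if min(t.true_pos, t.false_neg, t.false_pos, t.true_neg) <= 0:
        # Zero cells make some ratios undefined; a continuity correction
        # would be needed and is outside the scope of this sketch.
        raise ValueError("every cell must be positive")
    diseased = t.true_pos + t.false_neg
    healthy = t.true_neg + t.false_pos
    sensitivity = t.true_pos / diseased
    specificity = t.true_neg / healthy
    lr_pos = sensitivity / (t.false_pos / healthy)   # sens / (1 - spec)
    lr_neg = (t.false_neg / diseased) / specificity  # (1 - sens) / spec
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_likelihood_ratio": lr_pos,
        "negative_likelihood_ratio": lr_neg,
        "diagnostic_odds_ratio": lr_pos / lr_neg,
        "youden_index": sensitivity + specificity - 1,
    }


# Hypothetical counts (not data from the review).
metrics = diagnostic_metrics(TwoByTwo(true_pos=80, false_neg=20, false_pos=10, true_neg=90))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The diagnostic odds ratio equals (TP × TN) / (FN × FP), so it summarizes both likelihood ratios in one number, while the Youden Index weights sensitivity and specificity equally.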
SUMMARY OF REVIEW
112 studies were included in the systematic review and 42 studies in the meta-analysis containing 11627 patients with ischemic strokes, 2110 patients with hemorrhagic strokes, 1393 patients with a stroke mimic, and 5548 healthy controls. Proteins (IL-6 and S100 calcium-binding protein B (S100B)) and microRNAs (miR-30a) have similar performance in ischemic stroke diagnosis. To differentiate between ischemic- or hemorrhagic strokes, glial fibrillary acidic protein (GFAP) levels and autoantibodies to the NR2 peptide (NR2aAb, a cleavage product of NMDA neuroreceptors) were best performing whereas no investigated protein or non-coding RNA biomarkers differentiated stroke from stroke mimics with high diagnostic potential.
CONCLUSIONS
Despite sampling time differences, circulating microRNAs (< 24 h) and proteins (< 4.5 h) perform equally well in ischemic stroke diagnosis. GFAP differentiates stroke subtypes, while a biomarker panel of GFAP and UCH-L1 improved the sensitivity and specificity of UCH-L1 alone to differentiate stroke.
Topics: Humans; Hemorrhagic Stroke; Stroke; Biomarkers; Ischemic Stroke; Glial Fibrillary Acidic Protein; MicroRNAs; RNA, Untranslated
PubMed: 37778160
DOI: 10.1016/j.jstrokecerebrovasdis.2023.107388 -
Journal of Sports Science & Medicine Sep 2023
The aim of the present study was two-fold: (i) to analyze the progression and variability of swimming performance (from entry times to best performances) in the 50, 100, and 200 m at the most recent FINA World Championships and (ii) to compare the performance of the Top16, semifinalists, and finalists between all rounds. Swimmers who qualified with the FINA A and B standards for the Budapest 2022 World Championships were considered. A total of 1102 individual swimmers' performances were analyzed in freestyle, backstroke, breaststroke, and butterfly events. The data were retrieved from the official open-access websites of OMEGA and FINA. The Wilcoxon test was used to compare swimmers' entry times and best performances. Repeated measures ANOVA followed by the Bonferroni post-hoc test was performed to analyze the round-to-round progression. The percentage of improvement and variation in the swimmers' performance was computed between rounds. A negative progression (entry times better than best performance) and a high variability (> 0.69%) were found for most events. The finalists showed a positive progression with a greater improvement (~1%) from the heats to the semifinals. However, the performance progression remained unchanged between the semifinals and finals. The variability tended to decrease between rounds, making each round more homogeneous. Coaches and swimmers can use these indicators to prepare a race strategy between rounds.
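The round-to-round indicators described above can be illustrated with a short sketch. It assumes that improvement is the relative time drop between two rounds and that variability is the coefficient of variation of a swimmer's times; the abstract does not give the exact formulas, and the function names and race times are hypothetical.

```python
from statistics import mean, stdev


def percent_improvement(earlier_time_s, later_time_s):
    """Relative time drop from one round to the next; positive means faster."""
    if earlier_time_s <= 0:
        raise ValueError("race times must be positive")
    return (earlier_time_s - later_time_s) / earlier_time_s * 100


def coefficient_of_variation(times_s):
    """Sample standard deviation as a percentage of the mean time."""
    if len(times_s) < 2:
        raise ValueError("need at least two times")
    return stdev(times_s) / mean(times_s) * 100


# Hypothetical 100 m times in seconds for one swimmer (not study data).
heat, semifinal, final = 50.00, 49.50, 49.50
heat_to_semi = percent_improvement(heat, semifinal)
semi_to_final = percent_improvement(semifinal, final)
variability = coefficient_of_variation([heat, semifinal, final])
print(f"heat -> semifinal: {heat_to_semi:.2f}%")
print(f"semifinal -> final: {semi_to_final:.2f}%")
print(f"variability (CV): {variability:.2f}%")
```

With these invented times the swimmer improves by 1% from the heat to the semifinal and then holds the same time in the final, the pattern the abstract reports for finalists.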
Topics: Humans; Swimming; Hot Temperature
PubMed: 37711703
DOI: 10.52082/jssm.2023.417