-
Drug Discovery Today May 2024
Review
In the past 40 years, therapeutic antibody discovery and development have advanced considerably, with machine learning (ML) offering a promising way to speed up the process by reducing costs and the number of experiments required. Recent progress in ML-guided antibody design and development (D&D) has been hindered by the diversity of data sets and evaluation methods, which makes it difficult to conduct comparisons and assess utility. Establishing standards and guidelines will be crucial for the wider adoption of ML and the advancement of the field. This perspective critically reviews current practices, highlights common pitfalls and proposes method development and evaluation guidelines for various ML-based techniques in therapeutic antibody D&D. Addressing challenges across the ML process, best practices are recommended for each stage to enhance reproducibility and progress.
PubMed: 38762089
DOI: 10.1016/j.drudis.2024.104025 -
Acta Psychologica May 2024
Speech is a complex auditory signal that contains multiple layers of linguistic and non-linguistic structure, carrying both linguistic and social class information. Perceivers are exquisitely sensitive to this layered structure and extract not only linguistic properties, but also indexical characteristics that provide information about individual talkers and groups of talkers. Perceiving social class information involves inferring the speaker's social class or forming an impression of their social status based on their speech. Previous research on social class perception in speech has primarily focused on English, with relatively little research on Chinese. This study examines social class perception in Chinese speech. Study 1 employed class judgment and evaluation tasks, with a subjective social class scale as the main measure, to examine whether listeners could infer class information from Chinese speech and how their own class background influenced their perception. The results of Study 1 showed that participants could accurately discriminate between speakers' social classes, but there may be a response bias toward judging lower-class speakers as upper-class. Study 2 examined whether the speech of speakers from different classes actually differed on a number of indicators. The speech of higher-class speakers was perceived as more standardised, more pleasant to listen to and less heavily accented. Overall, listeners can perceive class information from Chinese speech, and the speech of different classes does contain different levels of indexical information. In Chinese-speaking societies, individuals can also judge a speaker's class from their speech, which is consistent with the findings of related research on English.
PubMed: 38761753
DOI: 10.1016/j.actpsy.2024.104324 -
Acta Psychologica May 2024
Chinese universities have placed increasing emphasis on incorporating English as a medium of instruction (EMI) courses to enhance their competitiveness both nationally and internationally. However, the successful implementation of these courses requires learners to acquire content knowledge. To promote EMI courses, universities have put specific initiatives and policies in place, reflecting a trend toward the globalization of higher education and the growing demand for English proficiency in academic settings. Despite the attention given to education in Chinese society, many learners are not adequately prepared to overcome the challenges associated with EMI classes. This leads to inefficiencies and drawbacks within the educational system. For instance, some learners struggle to understand subject matter because of language barriers or have difficulty fully engaging with course materials because of language-related challenges. This study aims to fill a significant research gap by providing a comprehensive exploration of the main challenges faced by learners in the Chinese EMI context and by highlighting the crucial contribution of EMI courses to the international competitiveness of China's higher education. By identifying factors and variables that can predict success in EMI contexts, particularly learners' academic language-related skills, this study offers novel insights into the impact of EMI courses on China's position in the global academic arena. A total of 361 male and female EFL learners participated in the study and completed the EMI Challenges Scale. The data analysis, including descriptive statistics and multiple linear regression, revealed that writing, reading, and listening skills significantly predicted success in EMI. Writing emerged as the best predictor, explaining 28.19% of the variance in EMI success, followed by reading (19.54%).
The results of this study contribute to current debates on international affairs in higher education by not only illustrating the problems students face in EMI courses, but also providing important suggestions for improving the EMI learning environment in China. This bears on ongoing debates about the globalization of higher education and the need for English proficiency in academic settings. In addition, the results are useful for teachers and policy makers: they emphasize the importance of improving EMI learners' writing, reading, and listening skills to enhance their success in academic settings. In particular, teachers and policy makers may consider implementing language development programs and providing EMI learners with additional support to improve these skills. This study also highlights the need for appropriate support and resources to address the specific language challenges faced by EMI students, paving the way for better instructional strategies and guidelines in the Chinese EMI context.
PubMed: 38761752
DOI: 10.1016/j.actpsy.2024.104309 -
BMC Medical Research Methodology May 2024
BACKGROUND
Smoking is a critical risk factor responsible for over eight million annual deaths worldwide. It is essential to obtain information on smoking habits to advance research and implement preventive measures such as screening of high-risk individuals. In most countries, including Denmark, smoking habits are not systematically recorded and are at best documented within unstructured free-text segments of electronic health records (EHRs). Using this information requires researchers and clinicians to manually navigate through extensive amounts of unstructured data, which is one of the main reasons that smoking habits are rarely integrated into larger studies. Our aim is to develop machine learning models to classify patients' smoking status from their EHRs.
METHODS
This study proposes an efficient natural language processing (NLP) pipeline capable of classifying patients' smoking status and providing explanations for the decisions. The proposed NLP pipeline comprises four distinct components: (1) preprocessing techniques to address abbreviations, punctuation, and other textual irregularities; (2) four cutting-edge feature extraction techniques, i.e., Embedding, BERT, Word2Vec, and Count Vectorizer, employed to extract the optimal features; (3) a Stacking-based Ensemble (SE) model and a Convolutional Long Short-Term Memory Neural Network (CNN-LSTM) for the identification of smoking status; and (4) local interpretable model-agnostic explanations to explain the decisions rendered by the detection models. The EHRs of 23,132 patients with suspected lung cancer were collected from the Region of Southern Denmark during the period 1/1/2009-31/12/2018. A medical professional annotated the data into 'Smoker' and 'Non-Smoker' with further classifications as 'Active-Smoker', 'Former-Smoker', and 'Never-Smoker'. Subsequently, the annotated dataset was used for the development of binary and multiclass classification models. An extensive comparison was conducted of the detection performance across various model architectures.
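The first two pipeline components (preprocessing and count-based feature extraction) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abbreviation map, the English example notes, and the function names `preprocess`, `build_vocabulary`, and `count_vector` are all hypothetical, and the study's actual preprocessing rules are not given in the abstract.

```python
import re
from collections import Counter

# Hypothetical abbreviation map (assumption); the study's real rules are not stated.
ABBREVIATIONS = {"pt": "patient", "cig": "cigarettes", "yrs": "years"}


def preprocess(note: str) -> list[str]:
    """Lowercase the note, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", note.lower())
    return [ABBREVIATIONS.get(tok, tok) for tok in tokens]


def build_vocabulary(notes: list[str]) -> list[str]:
    """Collect the sorted set of tokens seen across all notes."""
    return sorted({tok for note in notes for tok in preprocess(note)})


def count_vector(note: str, vocabulary: list[str]) -> list[int]:
    """Represent one note as token counts over a fixed vocabulary."""
    counts = Counter(preprocess(note))
    return [counts[tok] for tok in vocabulary]


# Made-up example notes, for illustration only.
notes = ["Pt. smokes 10 cig/day.", "Pt never smoked."]
vocab = build_vocabulary(notes)
vectors = [count_vector(note, vocab) for note in notes]
```

Each note becomes a fixed-length vector of token counts that a downstream classifier could consume; the learned representations named in the abstract (Embedding, BERT, Word2Vec) would replace this counting step.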
RESULTS
The results of experimental validation confirm the consistency among the models. However, for binary classification, the BERT method with the CNN-LSTM architecture outperformed other models by achieving precision, recall, and F1-scores between 97% and 99% for both Never-Smokers and Active-Smokers. In multiclass classification, the Embedding technique with the CNN-LSTM architecture yielded the most favorable results in class-specific evaluations, with equal performance measures of 97% for Never-Smoker and measures in the range of 86 to 89% for Active-Smoker and 91-92% for Never-Smoker.
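For readers less familiar with the class-specific measures reported above, the following sketch shows how precision, recall, and F1 are computed for one class at a time. The labels are made up for illustration and are not the study's data; the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F1) with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels for illustration only; these are not the study's data.
y_true = ["Never-Smoker", "Never-Smoker", "Active-Smoker",
          "Active-Smoker", "Former-Smoker", "Never-Smoker"]
y_pred = ["Never-Smoker", "Active-Smoker", "Active-Smoker",
          "Active-Smoker", "Former-Smoker", "Never-Smoker"]

# One (precision, recall, F1) triple per class.
scores = {label: precision_recall_f1(y_true, y_pred, label)
          for label in sorted(set(y_true))}
```

In this toy example one Never-Smoker is misclassified as an Active-Smoker, which lowers recall for Never-Smoker and precision for Active-Smoker while leaving Former-Smoker unaffected.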
CONCLUSION
Our proposed NLP pipeline achieved a high level of classification performance. In addition, we presented the explanation of the decision made by the best performing detection model. Future work will expand the model's capabilities to analyze longer notes and a broader range of categories to maximize its utility in further research and screening applications.
Topics: Humans; Denmark; Electronic Health Records; Natural Language Processing; Smoking; Machine Learning; Female; Male; Middle Aged; Neural Networks, Computer
PubMed: 38760718
DOI: 10.1186/s12874-024-02231-4 -
Clinical Proteomics May 2024
BACKGROUND
COVID-19 is a complex, multi-system disease with varying severity and symptoms. Identifying changes in critically ill COVID-19 patients' proteomes enables a better understanding of markers associated with susceptibility, symptoms, and treatment. We performed plasma antibody microarray and machine learning analyses to identify novel proteins of COVID-19.
METHODS
We conducted a case-control study comparing the concentration of 2000 plasma proteins in age- and sex-matched COVID-19 inpatients, non-COVID-19 sepsis controls, and healthy control subjects. Machine learning was used to identify a unique proteome signature in COVID-19 patients. Protein expression was correlated with clinically relevant variables and analyzed for temporal changes over hospitalization days 1, 3, 7, and 10. Expert-curated protein expression information was analyzed with natural language processing (NLP) to determine organ- and cell-specific expression.
RESULTS
Machine learning identified a 28-protein model that accurately differentiated COVID-19 patients from ICU non-COVID-19 patients (accuracy = 0.89, AUC = 1.00, F1 = 0.89) and healthy controls (accuracy = 0.89, AUC = 1.00, F1 = 0.88). An optimal nine-protein model (PF4V1, NUCB1, CrkL, SerpinD1, Fen1, GATA-4, ProSAAS, PARK7, and NET1) maintained high classification ability. Specific proteins correlated with hemoglobin, coagulation factors, hypertension, and high-flow nasal cannula intervention (P < 0.01). Time-course analysis of the 28 leading proteins demonstrated no significant temporal changes within the COVID-19 cohort. NLP analysis identified multi-system expression of the key proteins, with the digestive and nervous systems being the leading systems.
CONCLUSIONS
The plasma proteome of critically ill COVID-19 patients was distinguishable from that of non-COVID-19 sepsis controls and healthy control subjects. The leading 28 proteins and their subset of 9 proteins yielded accurate classification models and are expressed in multiple organ systems. The identified COVID-19 proteomic signature helps elucidate COVID-19 pathophysiology and may guide future COVID-19 treatment development.
PubMed: 38760690
DOI: 10.1186/s12014-024-09488-3 -
Journal of Medical Internet Research May 2024
Comparative Study
Utility of Large Language Models for Health Care Professionals and Patients in Navigating Hematopoietic Stem Cell Transplantation: Comparison of the Performance of ChatGPT-3.5, ChatGPT-4, and Bard.
BACKGROUND
Artificial intelligence is increasingly being applied to many workflows. Large language models (LLMs) are publicly accessible platforms trained to understand, interact with, and produce human-readable text; their ability to deliver relevant and reliable information is of particular interest to health care providers and patients. Hematopoietic stem cell transplantation (HSCT) is a complex medical field that requires extensive knowledge, background, and training to practice successfully and can be challenging for a nonspecialist audience to comprehend.
OBJECTIVE
We aimed to test the applicability of 3 prominent LLMs, namely ChatGPT-3.5 (OpenAI), ChatGPT-4 (OpenAI), and Bard (Google AI), in guiding nonspecialist health care professionals and advising patients seeking information regarding HSCT.
METHODS
We submitted 72 open-ended HSCT-related questions of variable difficulty to the LLMs and rated their responses based on consistency (defined as replicability of the response), response veracity, language comprehensibility, specificity to the topic, and the presence of hallucinations. We then rechallenged the 2 best performing chatbots by resubmitting the most difficult questions and prompting them to respond as if communicating with either a health care professional or a patient and to provide verifiable sources of information. Responses were then rerated with the additional criterion of language appropriateness, defined as language adaptation for the intended audience.
RESULTS
ChatGPT-4 outperformed both ChatGPT-3.5 and Bard in terms of response consistency (66/72, 92%; 54/72, 75%; and 63/69, 91%, respectively; P=.007), response veracity (58/66, 88%; 40/54, 74%; and 16/63, 25%, respectively; P<.001), and specificity to the topic (60/66, 91%; 43/54, 80%; and 27/63, 43%, respectively; P<.001). Both ChatGPT-4 and ChatGPT-3.5 outperformed Bard in terms of language comprehensibility (64/66, 97%; 53/54, 98%; and 52/63, 83%, respectively; P=.002). All displayed episodes of hallucinations. ChatGPT-3.5 and ChatGPT-4 were then rechallenged with a prompt to adapt their language to the audience and to provide sources of information, and the responses were rated. ChatGPT-3.5 showed a better ability to adapt its language to a nonmedical audience than ChatGPT-4 (17/21, 81% and 10/22, 46%, respectively; P=.03); however, both failed to consistently provide correct and up-to-date information resources, reporting out-of-date materials, incorrect URLs, or unfocused references, which made their output unverifiable by the reader.
CONCLUSIONS
In conclusion, despite LLMs' potential capability in confronting challenging medical topics such as HSCT, the presence of mistakes and lack of clear references make them not yet appropriate for routine, unsupervised clinical use, or patient counseling. Implementation of LLMs' ability to access and to reference current and updated websites and research papers, as well as development of LLMs trained in specialized domain knowledge data sets, may offer potential solutions for their future clinical application.
Topics: Humans; Hematopoietic Stem Cell Transplantation; Health Personnel; Artificial Intelligence; Language
PubMed: 38758582
DOI: 10.2196/54758 -
Data in Brief Jun 2024
In the Islamic domain, Hadiths hold significant importance, standing as crucial texts following the Holy Quran. Each Hadith contains three main parts: the ISNAD (chain of narrators), TARAF (starting part, often from Prophet Muhammad), and MATN (Hadith content). The ISNAD is the chain of narrators involved in transmitting a particular MATN. Hadith scholars determine the trustworthiness of the transmitted MATN by the quality of the ISNAD. The ISNAD data is available in the original Arabic, with narrator names transliterated into English. This paper presents the Multi-IsnadSet (MIS), which has great potential for use by social scientists and theologians. A multi-directed graph structure is used to represent the complex interactions among the narrators of Hadith. The MIS dataset is a directed graph consisting of 2092 nodes, representing individual narrators, and 77,797 edges, representing the Sanad-Hadith connections. It represents the multiple ISNAD of the Hadith in the Sahih Muslim Hadith book. The dataset was carefully extracted from multiple online Hadith sources using data scraping and web crawling techniques, providing extensive Hadith details. Each dataset entry provides a complete view of a specific Hadith, including the original book, Hadith number, textual content (MATN), list of narrators, narrator count, sequence of narrators, and ISNAD count. In this paper, four different tools were employed for modeling and analyzing the narrative network: the Python library NetworkX, the graph database Neo4j, and two network analysis tools, Gephi and Cytoscape. The Neo4j graph database is used to represent the multi-dimensional graph data for ease of extraction and for establishing new relationships among nodes.
Researchers can use MIS to explore Hadith credibility, including the classification of Hadiths (Sahih = perfection in the Sanad; Dhaif = imperfection in the Sanad) and of narrators (trustworthy or not). Traditionally, scholars have focused on identifying the longest and shortest Sanad between two narrators, but in MIS the emphasis shifts to determining the optimum, authentic Sanad by considering narrator qualities. The graph representation of this authentic, manually curated dataset will open the way for computational models that could identify the significance of a chain and of a narrator. The dataset provides researchers with Hadith narrators and Hadith ISNAD that could be used in a wide variety of future studies related to Hadith authentication and rule extraction. Moreover, the dataset encourages cross-disciplinary research, bridging the gap between Islamic studies, artificial intelligence (AI), social network analysis (SNA), and graph neural networks (GNNs).
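The multi-directed graph structure described above can be sketched with the standard library alone. This is an illustrative toy, not the MIS dataset or the authors' code: the class `NarratorGraph`, its methods, the placeholder narrator names, the Hadith identifiers, and the choice of edge direction (from each narrator to the next one in the chain) are all assumptions. In practice, NetworkX's `MultiDiGraph` offers the same idea with far more functionality.

```python
from collections import defaultdict


class NarratorGraph:
    """Toy directed multigraph: parallel edges are allowed, one per Hadith."""

    def __init__(self):
        self._adj = defaultdict(list)  # source -> list of (target, hadith_id)
        self._nodes = set()

    def add_edge(self, source, target, hadith_id):
        """Add one directed edge labelled with the Hadith it belongs to."""
        self._nodes.update((source, target))
        self._adj[source].append((target, hadith_id))

    def add_chain(self, narrators, hadith_id):
        """Add one ISNAD as consecutive narrator-to-narrator edges."""
        for source, target in zip(narrators, narrators[1:]):
            self.add_edge(source, target, hadith_id)

    def number_of_nodes(self):
        return len(self._nodes)

    def number_of_edges(self):
        return sum(len(edges) for edges in self._adj.values())

    def hadiths_between(self, source, target):
        """List the Hadith ids carried by the parallel edges source -> target."""
        return [h for t, h in self._adj.get(source, []) if t == target]


# Placeholder narrators and Hadith ids (assumptions, not real MIS entries).
graph = NarratorGraph()
graph.add_chain(["Narrator A", "Narrator B", "Narrator C"], hadith_id=1)
graph.add_chain(["Narrator A", "Narrator B", "Narrator D"], hadith_id=2)
```

Because two chains share the link from Narrator A to Narrator B, that pair is connected by two parallel edges, one per Hadith, which is what distinguishes a multigraph from a simple directed graph.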
PubMed: 38756930
DOI: 10.1016/j.dib.2024.110439 -
Proceedings of the Conference on... Dec 2023
Biomedical entity linking (BioEL) is the process of connecting entities referenced in documents to entries in biomedical databases such as the Unified Medical Language System (UMLS) or Medical Subject Headings (MeSH). The study objective was to comprehensively evaluate nine recent state-of-the-art biomedical entity linking models under a unified framework. We compare these models along axes of (1) accuracy, (2) speed, (3) ease of use, (4) generalization, and (5) adaptability to new ontologies and datasets. We additionally quantify the impact of various preprocessing choices such as abbreviation detection. Systematic evaluation reveals several notable gaps in current methods. In particular, current methods struggle to correctly link genes and proteins and often have difficulty effectively incorporating context into linking decisions. To expedite future development and baseline testing, we release our unified evaluation framework and all included models on GitHub at https://github.com/davidkartchner/biomedical-entity-linking.
PubMed: 38756862
DOI: 10.18653/v1/2023.emnlp-main.893 -
Frontiers in Human Neuroscience 2024
INTRODUCTION
Previous studies underscore the importance of speech input, particularly infant-directed speech (IDS) during one-on-one (1:1) parent-infant interaction, for child language development. We hypothesize that infants' attention to speech input, specifically IDS, supports language acquisition. In infants, attention and orienting responses are associated with heart rate deceleration. In a longitudinal study, we examined whether individual differences in infants' heart rate measured during 1:1 mother-infant interaction are related to speech input and later language development scores.
METHODS
Using a sample of 31 3-month-olds, we assessed infant heart rate during mother-infant face-to-face interaction in a laboratory setting. Multiple measures of speech input were gathered at 3 months of age during naturally occurring interactions at home using the Language ENvironment Analysis (LENA) system. Language outcome measures were assessed in the same children at 30 months of age using the MacArthur-Bates Communicative Development Inventory (CDI).
RESULTS
Two novel findings emerged. First, we found that higher maternal IDS in a 1:1 context at home, as well as more mother-infant conversational turns at home, are associated with a lower heart rate measured during mother-infant social interaction in the laboratory. Second, we found significant associations between infant heart rate during mother-infant interaction in the laboratory at 3 months and prospective language development (CDI scores) at 30 months of age.
DISCUSSION
Considering the current results in conjunction with other converging theoretical and neuroscientific data, we argue that high IDS input in the context of 1:1 social interaction increases infants' attention to speech and that infants' attention to speech in early development fosters their prospective language growth.
PubMed: 38756844
DOI: 10.3389/fnhum.2024.1380075 -
BMJ Surgery, Interventions, & Health... 2024
OBJECTIVES
To build the theoretical and evidence base for a digital platform (map-OR) which delivers intraoperative language tests during awake craniotomy and facilitates collaborative sharing of brain mapping data.
DESIGN
Mixed methodology study including two scoping reviews, an international survey, a synthesis of development guiding principles and a risk assessment using failure modes and effects analysis.
SETTING
The two scoping reviews examined the literature published in the English language. The international survey was completed by members of awake craniotomy teams from 14 countries.
MAIN OUTCOME MEASURES
Scoping review 1: number of technologies described for language mapping during awake craniotomy. Scoping review 2: barriers and facilitators to adopting novel technology in surgery. International survey: degree of language mapping technology penetration into clinical practice.
RESULTS
A total of 12 research articles describing 6 technologies were included. The technologies required a range of hardware components including portable devices, virtual reality headsets and large integrated multiscreen stacks. The facilitators and barriers of technology adoption in surgery were extracted from 11 studies and mapped onto the 4 Unified Theory of Acceptance and Use of Technology constructs. A total of 37 awake craniotomy teams from 14 countries completed the survey. Of the responses, 20 (54.1%) delivered their language tests digitally, 10 (27.0%) delivered tests using cards and 7 (18.9%) used a combination of both. The most commonly used devices were tablet computers (67.7%; n=21) and the most common software used was Microsoft PowerPoint (60.6%; n=20). Four key risks for the proposed digital platform were identified, the highest risk being a software and internet connectivity failure during surgery.
CONCLUSIONS
This work represents a rigorous and structured approach to the development of a digital platform for standardized intraoperative language testing during awake craniotomy and for collaborative sharing of brain mapping data.
TRIAL REGISTRATION NUMBER
Scoping review protocol registrations in OSF registries (scoping review 1: osf.io/su9xm; scoping review 2: osf.io/x4wsc).
PubMed: 38756704
DOI: 10.1136/bmjsit-2023-000234