Computers in Biology and Medicine May 2024
Over the past two decades, machine analysis of medical imaging has advanced rapidly, opening up significant potential for several important medical applications. As complicated diseases increase and the number of cases rises, the role of machine-based imaging analysis has become indispensable. It serves as both a tool and an assistant to medical experts, providing valuable insights and guidance. A particularly demanding task in this area is lesion segmentation, which is challenging even for experienced radiologists. The complexity of this task highlights the urgent need for robust machine learning approaches to support medical staff. In response, we present our novel solution: the D-TrAttUnet architecture. This framework is based on the observation that different diseases often target specific organs. Our architecture includes an encoder-decoder structure with a composite Transformer-CNN encoder and dual decoders. The encoder includes two paths: the Transformer path and the Encoders Fusion Module path. The Dual-Decoder configuration uses two identical decoders, each with attention gates. This allows the model to simultaneously segment lesions and organs and to integrate their segmentation losses. To validate our approach, we performed evaluations on the Covid-19 and Bone Metastasis segmentation tasks. We also investigated the adaptability of the model by testing it without the second decoder in the segmentation of glands and nuclei. The results confirmed the superiority of our approach, especially in Covid-19 infections and the segmentation of bone metastases. In addition, the hybrid encoder showed exceptional performance in the segmentation of glands and nuclei, solidifying its role in modern medical image analysis.
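The dual-decoder idea of segmenting lesions and organs simultaneously while integrating their losses can be illustrated with a minimal NumPy sketch: a soft Dice loss applied to each decoder's output and summed with weights. This is an illustrative sketch with hypothetical weights, not the authors' D-TrAttUnet implementation.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a predicted probability map and a binary mask."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def joint_loss(lesion_pred, lesion_mask, organ_pred, organ_mask,
               w_lesion=1.0, w_organ=1.0):
    """Weighted sum of the two decoders' segmentation losses."""
    return (w_lesion * dice_loss(lesion_pred, lesion_mask)
            + w_organ * dice_loss(organ_pred, organ_mask))
```

Because the organ branch is only an auxiliary training signal, it can be dropped at inference, consistent with the paper's single-decoder experiments on glands and nuclei.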
PubMed: 38763066
DOI: 10.1016/j.compbiomed.2024.108590 -
Chinese Journal of Traumatology =... Apr 2024
PURPOSE
Intertrochanteric fracture (ITF) classification is crucial for surgical decision-making. However, orthopedic trauma surgeons have shown lower accuracy in ITF classification than expected. The objective of this study was to utilize an artificial intelligence (AI) method to improve the accuracy of ITF classification.
METHODS
We trained a network called YOLOX-SwinT, which is based on the You Only Look Once X (YOLOX) object detection network with Swin Transformer (SwinT) as the backbone architecture, using 762 radiographic ITF examinations as the training set. Subsequently, we recruited 5 senior orthopedic trauma surgeons (SOTS) and 5 junior orthopedic trauma surgeons (JOTS) to classify the 85 original images in the test set, as well as the images with the prediction results of the network model in sequence. Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS) 20.0 (IBM Corp., Armonk, NY, USA) to compare the differences among the SOTS, JOTS, SOTS + AI, JOTS + AI, SOTS + JOTS, and SOTS + JOTS + AI groups. All images were classified according to the AO/OTA 2018 classification system by 2 experienced trauma surgeons and verified by another expert in this field. Based on the actual clinical needs, after discussion, we integrated 8 subgroups into 5 new subgroups, and the dataset was divided into training, validation, and test sets by the ratio of 8:1:1.
RESULTS
The mean average precision at the intersection over union (IoU) of 0.5 (mAP50) for subgroup detection reached 90.29%. The classification accuracy values of SOTS, JOTS, SOTS + AI, and JOTS + AI groups were 56.24% ± 4.02%, 35.29% ± 18.07%, 79.53% ± 7.14%, and 71.53% ± 5.22%, respectively. The paired t-test results showed that the difference between the SOTS and SOTS + AI groups was statistically significant, as well as the difference between the JOTS and JOTS + AI groups, and the SOTS + JOTS and SOTS + JOTS + AI groups. Moreover, the difference between the SOTS + JOTS and SOTS + JOTS + AI groups in each subgroup was statistically significant, with all p < 0.05. The independent samples t-test results showed that the difference between the SOTS and JOTS groups was statistically significant, while the difference between the SOTS + AI and JOTS + AI groups was not statistically significant. With the assistance of AI, the subgroup classification accuracy of both SOTS and JOTS was significantly improved, and JOTS achieved the same level as SOTS.
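The mAP50 figure above rests on a simple rule: a detection counts as a true positive when its intersection-over-union with a ground-truth box is at least 0.5. A minimal sketch of that criterion (generic, not the YOLOX-SwinT evaluation code):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """mAP50 counts a detection as correct when IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold
```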
CONCLUSION
The YOLOX-SwinT network algorithm enhances the accuracy of AO/OTA subgroup classification of ITF by orthopedic trauma surgeons.
PubMed: 38762418
DOI: 10.1016/j.cjtee.2024.04.002 -
PLoS One 2024
Automatic Urdu handwritten text recognition is a challenging task in the OCR industry. Unlike printed text, Urdu handwriting lacks a uniform font and structure. This lack of uniformity causes data inconsistencies and recognition issues. Different writing styles, cursive script, and limited data make Urdu text recognition a complicated task. Major languages, such as English, have experienced advances in automated recognition, whereas low-resource languages, such as Urdu, still lag. Transformer-based models are promising for automated recognition in both high- and low-resource languages. This paper presents a transformer-based method called ET-Network that integrates self-attention into EfficientNet for feature extraction and uses a transformer for language modeling. The self-attention layers in EfficientNet help to extract global and local features that capture long-range dependencies. These features are then passed to a vanilla transformer to generate text, and prefix beam search is used to select the best output. NUST-UHWR, UPTI2.0, and MMU-OCR-21 are the three datasets used to train and test the ET-Network for handwritten Urdu script. The ET-Network improved the character error rate by 4% and the word error rate by 1.55%, establishing a new state-of-the-art character error rate of 5.27% and word error rate of 19.09% for Urdu handwritten text.
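The character error rate reported here is conventionally the Levenshtein edit distance between hypothesis and reference, normalised by reference length. A small self-contained sketch of that metric (not the authors' evaluation code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via two-row dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance normalised by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)
```

The word error rate is the same computation applied to whitespace-tokenised word sequences instead of characters.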
Topics: Handwriting; Deep Learning; Humans; Language; Pattern Recognition, Automated; Algorithms
PubMed: 38758731
DOI: 10.1371/journal.pone.0302590 -
Digital Health 2024
INTRODUCTION
Pharmacists play a pivotal role in ensuring patients are administered safe and effective medications; however, they encounter obstacles such as elevated workloads and a scarcity of qualified professionals. Despite the prospective utility of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), in addressing pharmaceutical inquiries, their applicability in real-world cases remains unexplored.
OBJECTIVE
To evaluate GPT-based chatbots' accuracy in real-world drug-related inquiries, comparing their performance to licensed pharmacists.
METHODS
In this cross-sectional study, authors analyzed real-world drug inquiries from a Drug Information Inquiry Database. Two independent pharmacists evaluated the performance of GPT-based chatbots (GPT-3, GPT-3.5, GPT-4) against human pharmacists using accuracy, detail, and risk of harm criteria. Descriptive statistics described inquiry characteristics. Absolute proportion comparative analyses assessed accuracy, detail, and risk of harm. Stratified analyses were performed for different inquiry types.
RESULTS
Seventy inquiries were included. Most inquiries were received from physicians (41%) and pharmacists (44%). Inquiry types included dosage/administration (34.2%), drug interaction (12.8%), and pregnancy/lactation (15.7%). The majority of inquiries involved adults (83%) and female patients (54.3%). GPT-4 gave completely accurate responses to 64.3% of inquiries, comparable to human pharmacists. Both GPT-4 and human pharmacists provided sufficiently detailed responses, with GPT-4 offering additional relevant details. Both delivered 95% safe responses; however, GPT-4 provided proactive risk-mitigation information in 70% of instances, whereas similar information was included in 25.7% of human pharmacists' responses.
CONCLUSION
Our study showcased GPT-4's potential in addressing drug-related inquiries accurately and safely, comparable to human pharmacists. Current GPT-4-based chatbots could support healthcare professionals and foster global health improvements.
PubMed: 38757086
DOI: 10.1177/20552076241253523 -
Cyborg and Bionic Systems (Washington,... 2024
Three-dimensional skeleton-based action recognition (3D SAR) has gained considerable attention within the computer vision community, owing to the inherent advantages offered by skeleton data. As a result, a plethora of impressive works, including those based on conventional handcrafted features and learned feature extraction methods, have been conducted over the years. However, prior surveys on action recognition have primarily focused on video or red-green-blue (RGB) data-dominated approaches, with limited coverage of reviews related to skeleton data. Furthermore, despite the extensive application of deep learning methods in this field, there has been a notable absence of research that provides an introductory or comprehensive review from the perspective of deep learning architectures. To address these limitations, this survey first underscores the importance of action recognition and emphasizes the significance of 3-dimensional (3D) skeleton data as a valuable modality. Subsequently, we provide a comprehensive introduction to mainstream action recognition techniques based on 4 fundamental deep architectures, i.e., recurrent neural networks, convolutional neural networks, graph convolutional networks, and Transformers. All methods with the corresponding architectures are then presented in a data-driven manner with detailed discussion. Finally, we offer insights into the current largest 3D skeleton dataset, NTU-RGB+D, and its new edition, NTU-RGB+D 120, along with an overview of several top-performing algorithms on these datasets. To the best of our knowledge, this research represents the first comprehensive discussion of deep learning-based action recognition using 3D skeleton data.
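For readers new to the modality: skeleton data for the GCN-based methods surveyed here is typically a (frames, joints, 3) array of xyz coordinates plus a joint-connectivity graph. A minimal sketch of the symmetrically normalised adjacency matrix commonly used by graph convolutions (the three-joint edge list is hypothetical, for illustration only):

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalised adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)                    # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0               # undirected bone connections
    d = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return d @ a @ d

# A skeleton sequence would then be a (frames, num_joints, 3) coordinate array,
# and one graph-convolution step multiplies joint features by this matrix.
adj = normalized_adjacency([(0, 1), (1, 2)], num_joints=3)
```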
PubMed: 38757045
DOI: 10.34133/cbsystems.0100 -
Frontiers in Artificial Intelligence 2024
Building on the growing body of research highlighting the capabilities of Large Language Models (LLMs) like Generative Pre-trained Transformers (GPT), this paper presents a structured pipeline for the annotation of cultural (big) data through such LLMs, offering a detailed methodology for leveraging GPT's computational abilities. Our approach provides researchers across various fields with a method for efficient and scalable analysis of cultural phenomena, showcasing the potential of LLMs in the empirical study of human cultures. LLMs' proficiency in processing and interpreting complex data finds relevance in tasks such as annotating descriptions of non-industrial societies, measuring the importance of specific themes in stories, or evaluating psychological constructs in texts across societies or historical periods. These applications demonstrate the model's versatility in serving disciplines like cultural anthropology, cultural psychology, cultural history, and the cultural sciences at large.
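Such an annotation pipeline reduces to three pieces: a prompt template, a model call, and parsing of structured output. The sketch below is a hedged illustration only: `call_model`, the template wording, and the JSON schema are assumptions, not the paper's actual pipeline.

```python
import json

# Hypothetical template for scoring how central a theme is to a text.
PROMPT_TEMPLATE = (
    "Rate how central the theme '{theme}' is to the following text "
    "on a scale of 0-10. Answer with JSON: {{\"score\": <int>}}.\n\n"
    "Text: {text}"
)

def annotate_corpus(texts, theme, call_model):
    """Annotate each text with a theme score. `call_model` is any callable
    that takes a prompt string and returns the model's raw text reply."""
    scores = []
    for text in texts:
        reply = call_model(PROMPT_TEMPLATE.format(theme=theme, text=text))
        scores.append(json.loads(reply)["score"])
    return scores
```

Keeping the model call behind a plain function makes the pipeline testable with a stub and lets the underlying LLM be swapped without touching the annotation logic.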
PubMed: 38756758
DOI: 10.3389/frai.2024.1365508 -
Frontiers in Oncology 2024
OBJECTIVE
Neoadjuvant chemotherapy (NAC) is a key element of treatment for locally advanced breast cancer (LABC). Predicting the response to NAC for patients with LABC before treatment initiation could help optimize therapy, ensuring the administration of effective treatments. The objective of this work was to develop a model to predict tumor response to NAC for LABC using deep learning networks and computed tomography (CT).
MATERIALS AND METHODS
Several deep learning approaches were investigated, including the ViT transformer and the VGG16, VGG19, ResNet-50, ResNet-101, ResNet-152, InceptionV3, and Xception transfer learning networks. These networks were applied to CT images to assess the response to NAC. Performance was evaluated using balanced accuracy, accuracy, sensitivity, and specificity as classification metrics. The ViT transformer uses an attention mechanism to increase the weight of important image regions, leading to better discrimination between classes.
RESULTS
Amongst the 117 LABC patients studied, 82 (70%) had a clinical-pathological response and 35 (30%) had no response to NAC. The ViT transformer obtained the best performance range (accuracy = 71 ± 3% to accuracy = 77 ± 4%, specificity = 86 ± 6% to specificity = 76 ± 3%, sensitivity = 56 ± 4% to sensitivity = 52 ± 4%, and balanced_accuracy = 69 ± 3% to balanced_accuracy = 69 ± 3%) depending on the split ratio of training and test data. The Xception network obtained the second best results (accuracy = 72 ± 4% to accuracy = 65 ± 4%, specificity = 81 ± 6% to specificity = 73 ± 3%, sensitivity = 55 ± 4% to sensitivity = 52 ± 5%, and balanced_accuracy = 66 ± 5% to balanced_accuracy = 60 ± 4%). The worst results were obtained using the VGG-16 transfer learning network.
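Balanced accuracy, reported alongside plain accuracy above, is the mean of sensitivity and specificity, which makes it robust to the roughly 70/30 class imbalance in this cohort. A minimal sketch from confusion-matrix counts:

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0
```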
CONCLUSION
Deep learning networks in conjunction with CT imaging can predict tumor response to NAC for patients with LABC prior to treatment initiation. The ViT transformer obtained the best performance, demonstrating the importance of the attention mechanism.
PubMed: 38756659
DOI: 10.3389/fonc.2024.1359148 -
Scientific Reports May 2024
Flood forecasting using traditional physical hydrology models requires consideration of multiple complex physical processes, including the spatio-temporal distribution of rainfall, the spatial heterogeneity of watershed sub-surface characteristics, and runoff generation and routing behaviours. Data-driven models offer novel solutions to these challenges, though they are hindered by difficulties in hyperparameter selection and a decline in prediction stability as the lead time extends. This study introduces a hybrid model, the RS-LSTM-Transformer, which combines Random Search (RS), Long Short-Term Memory networks (LSTM), and the Transformer architecture. Applied to the typical Jingle watershed in the middle reaches of the Yellow River, this model utilises rainfall and runoff data from basin sites to simulate flood processes, and its outcomes are compared against those of RS-LSTM, RS-Transformer, RS-BP, and RS-MLP models using the Nash-Sutcliffe Efficiency Coefficient (NSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Bias percentage as metrics. At a 1-h lead time, the RS-LSTM-Transformer model achieved NSE, RMSE, MAE, and Bias values of 0.970, 14.001 m³/s, 5.304 m³/s, and 0.501% in calibration and 0.953, 14.124 m³/s, 6.365 m³/s, and 0.523% in validation, respectively. These results demonstrate the model's superior simulation capabilities and robustness, providing more accurate peak flow forecasts as the lead time increases. The study highlights the RS-LSTM-Transformer model's potential in flood forecasting and the advantages of integrating various data-driven approaches for innovative modelling.
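The NSE metric used above compares residual variance to the variance of the observations about their mean: 1 is a perfect fit, and 0 means the model is no better than predicting the observed mean. A minimal NumPy sketch:

```python
import numpy as np

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance
```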
PubMed: 38755303
DOI: 10.1038/s41598-024-62127-7 -
Scientific Reports May 2024
The application of deep neural networks has achieved remarkable success in object detection. However, network structures must still evolve and be finely tuned to achieve better performance. This responds to the continuing demand for high performance in complex scenes where multi-scale objects are scattered throughout. To this end, this paper proposes a network structure called Multi-Scale Coupled Attention (MSCA) under the framework of self-attention learning with methodologies of importance assessment. Architecturally, it consists of a Multi-Scale Coupled Channel Attention (MSCCA) module and a Multi-Scale Coupled Spatial Attention (MSCSA) module. Specifically, the MSCCA module is developed to achieve the goal of self-attention learning linearly on the multi-scale channels. In parallel, the MSCSA module is constructed to achieve this goal nonlinearly on the multi-scale spatial grids. The MSCCA and MSCSA modules can be connected in sequence and used as a plugin to develop end-to-end learning models for object detection. Finally, our proposed network is compared on two public datasets with 13 classical or state-of-the-art models, including Faster R-CNN, Cascade R-CNN, RetinaNet, SSD, PP-YOLO, YOLO v3, YOLO v5, YOLO v7, YOLOX, DETR, conditional DETR, UP-DETR and FP-DETR. Comparative experimental results with numerical scores, the ablation study, and the performance behaviour all demonstrate the effectiveness of our proposed model.
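The channel-attention idea underlying a module like MSCCA can be illustrated with a generic squeeze-and-excitation style gate: global average pooling over space, a small bottleneck, and sigmoid re-weighting of channels. This is a generic sketch, not the MSCCA module itself; the weight matrices `w1` and `w2` are hypothetical parameters.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map:
    global average pool -> ReLU bottleneck -> sigmoid gates -> re-weight channels."""
    squeeze = features.mean(axis=(1, 2))           # (C,) per-channel statistics
    hidden = np.maximum(0.0, w1 @ squeeze)         # (r,) bottleneck, r < C
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # (C,) sigmoid gates in (0, 1)
    return features * gates[:, None, None]         # scale each channel
```

A multi-scale variant would compute such gates at several feature resolutions and couple them, which is the direction the MSCA design takes.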
PubMed: 38755252
DOI: 10.1038/s41598-024-60897-8 -
Frontiers in Neurorobotics 2024
The object detection method serves as the core technology within the unmanned driving perception module, extensively employed for detecting vehicles, pedestrians, traffic signs, and various objects. However, existing object detection methods still encounter three challenges in intricate unmanned driving scenarios: unsatisfactory performance in multi-scale object detection, inadequate accuracy in detecting small objects, and occurrences of false positives and missed detections in densely occluded environments. Therefore, this study proposes an improved object detection method for unmanned driving, leveraging Transformer architecture to address these challenges. First, a multi-scale Transformer feature extraction method integrated with channel attention is used to enhance the network's capability in extracting features across different scales. Second, a training method incorporating Query Denoising with Gaussian decay is employed to enhance the network's proficiency in learning representations of small objects. Third, a hybrid matching method combining Optimal Transport and Hungarian algorithms is used to facilitate the matching process between predicted and actual values, thereby enriching the network with more informative positive sample features. Experimental evaluations conducted on datasets including KITTI demonstrate that the proposed method achieves 3% higher mean Average Precision (mAP) than existing methods.
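Hungarian matching, one half of the hybrid matcher above, finds the one-to-one assignment of predictions to ground truths that minimises total matching cost. For illustration only, a brute-force version over a small square cost matrix; the Hungarian algorithm computes the same assignment in polynomial time.

```python
from itertools import permutations

def min_cost_matching(cost):
    """Exhaustive one-to-one matching on an n x n cost matrix.
    Returns (minimum total cost, column assigned to each row)."""
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best, list(best_perm)
```

In detection matchers, the cost entries typically combine classification score and box-overlap terms between each prediction and ground-truth pair.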
PubMed: 38752022
DOI: 10.3389/fnbot.2024.1342126