Diagnostics (Basel, Switzerland), Jul 2021
Over the past decade, convolutional neural networks (CNNs) have shown very competitive performance in medical image analysis tasks such as disease classification, tumor segmentation, and lesion detection. CNNs excel at extracting local image features, but the locality of the convolution operation makes it difficult for them to model long-range relationships. Recently, transformers have been applied to computer vision and achieved remarkable success on large-scale datasets. Compared with natural images, multi-modal medical images have explicit and important long-range dependencies, and effective multi-modal fusion strategies can greatly improve the performance of deep models. This prompted us to study transformer-based structures and apply them to multi-modal medical images. Existing transformer-based network architectures require large-scale datasets to perform well, whereas medical imaging datasets are relatively small, making it difficult to apply pure transformers to medical image analysis. We therefore propose TransMed for multi-modal medical image classification. TransMed combines the advantages of CNNs and transformers to efficiently extract low-level image features and establish long-range dependencies between modalities. We evaluated our model on two datasets: parotid gland tumor classification and knee injury classification. Combining our contributions, we achieve improvements of 10.1% and 1.9% in average accuracy, respectively, outperforming other state-of-the-art CNN-based models. The results of the proposed method are promising and have tremendous potential to be applied to a large number of medical image analysis tasks. To the best of our knowledge, this is the first work to apply transformers to multi-modal medical image classification.
PubMed: 34441318
DOI: 10.3390/diagnostics11081384
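The fusion step TransMed describes hinges on self-attention across patch tokens drawn from several imaging modalities. The following is a minimal NumPy sketch of scaled dot-product self-attention over such a token sequence; the shapes, random weights, and two-modality split are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence.

    tokens: (n, d) patch embeddings, e.g. CNN features from several
    MRI modalities concatenated along the sequence axis, so every
    patch can attend to patches of every modality.
    """
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n, n) pairwise affinities
    return softmax(scores, axis=-1) @ v       # (n, d) mixed features

rng = np.random.default_rng(0)
d = 8
# Two hypothetical modalities contribute 4 patch tokens each (n = 8).
tokens = rng.standard_normal((8, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (8, 8)
```

Because the attention weights couple every token with every other token, features from one modality can directly influence the representation of another, which is the long-range, cross-modal dependency the abstract emphasizes.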
Computer Methods and Programs in..., Feb 2023
BACKGROUND AND OBJECTIVE
Age-related macular degeneration (AMD) is an eye disease in which ageing damages the macula, and it is the leading cause of blindness in developed countries. Screening retinal fundus images allows ophthalmologists to detect, diagnose, and treat this disease early; however, manual interpretation of the images is a time-consuming task. In this paper, we study different deep learning methods to diagnose AMD.
METHODS
We have conducted a thorough study of two families of deep learning models based on convolutional neural networks (CNN) and transformer architectures to automatically diagnose referable/non-referable AMD, and grade AMD severity scales (no AMD, early AMD, intermediate AMD, and advanced AMD). In addition, we have analysed several progressive resizing strategies and ensemble methods for convolutional-based architectures to further improve the performance of the models.
RESULTS
As a first result, we have shown that transformer-based architectures obtain considerably worse results than convolutional-based architectures for diagnosing AMD. Moreover, we have built a model for diagnosing referable AMD that yielded a mean F1-score (SD) of 92.60% (0.47), a mean AUROC (SD) of 97.53% (0.40), and a mean weighted kappa coefficient (SD) of 85.28% (0.91); and an ensemble of models for grading AMD severity scales with a mean accuracy (SD) of 82.55% (2.92), and a mean weighted kappa coefficient (SD) of 84.76% (2.45).
CONCLUSIONS
This work shows that convolutional-based architectures are more suitable than transformer-based models for classifying and grading AMD from retinal fundus images. Furthermore, convolutional models can be improved by means of progressive resizing strategies and ensemble methods.
Topics: Humans; Reproducibility of Results; Macular Degeneration; Neural Networks, Computer; Macula Lutea; Fundus Oculi
PubMed: 36528999
DOI: 10.1016/j.cmpb.2022.107302
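Soft voting, i.e. averaging per-class probabilities across models and taking the argmax, is one common way to build the kind of ensemble the authors describe. A small sketch with invented probabilities for the four AMD severity grades (the numbers and the three-model setup are illustrative, not the paper's models):

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-class probabilities from several models and take
    the argmax class, i.e. a simple soft-voting ensemble."""
    avg = np.mean(prob_list, axis=0)       # (n_samples, n_classes)
    return avg.argmax(axis=1), avg

# Three hypothetical models grading 4 AMD severity classes for 2 images.
m1 = np.array([[0.70, 0.20, 0.05, 0.05], [0.10, 0.20, 0.30, 0.40]])
m2 = np.array([[0.60, 0.30, 0.05, 0.05], [0.05, 0.15, 0.55, 0.25]])
m3 = np.array([[0.80, 0.10, 0.05, 0.05], [0.20, 0.10, 0.30, 0.40]])
labels, probs = ensemble_predict([m1, m2, m3])
print(labels)  # [0 2]
```

Averaging probabilities (rather than hard votes) tends to smooth out disagreements between individual models, which is one reason ensembles often improve grading accuracy.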
Frontiers in Digital Health, 2022
Transformer model architectures have revolutionized the natural language processing (NLP) domain and continue to produce state-of-the-art results in text-based applications. Prior to the emergence of transformers, traditional NLP models such as recurrent and convolutional neural networks demonstrated promising utility for patient-level predictions and health forecasting from longitudinal datasets. However, to our knowledge only a few studies have explored transformers for predicting clinical outcomes from electronic health record (EHR) data, and in our estimation, none have adequately derived a health-specific tokenization scheme to fully capture the heterogeneity of EHR systems. In this study, we propose a dynamic method for tokenizing both discrete and continuous patient data, and present a transformer-based classifier utilizing a joint embedding space for integrating disparate temporal patient measurements. We demonstrate the feasibility of our clinical AI framework through multi-task ICU patient acuity estimation, where we simultaneously predict six mortality and readmission outcomes. Our longitudinal EHR tokenization and transformer modeling approaches resulted in more accurate predictions than baseline machine learning models, suggesting opportunities for future multimodal data integration and algorithmic support tools using clinical transformer networks.
PubMed: 36440460
DOI: 10.3389/fdgth.2022.1029191
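The paper's exact tokenization scheme is not given in the abstract; one common way to put discrete event codes and continuous measurements into a shared discrete vocabulary is to assign integer ids to codes and quantile-bin the continuous values. A hedged sketch along those lines (all event codes, values, and function names are invented for illustration):

```python
import numpy as np

def bin_continuous(values, n_bins=4):
    """Map continuous measurements (e.g. lab values or vitals) to
    discrete quantile-bin tokens so they can share a vocabulary with
    discrete event codes."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)  # bin index per value, 0..n_bins-1

def build_vocab(discrete_events):
    """Assign an integer id to every distinct discrete event code."""
    return {code: i for i, code in enumerate(sorted(set(discrete_events)))}

heart_rates = np.array([62., 75., 88., 110., 95., 70., 130., 80.])
hr_tokens = bin_continuous(heart_rates)  # quartile id per measurement
vocab = build_vocab(["ICD:I21", "RX:ASA", "ICD:I21", "LAB:TROP"])
print(hr_tokens, vocab)
```

Once both data types are integers, the two token streams can be embedded into the same vector space and fed to a transformer jointly, which is the joint-embedding idea the abstract describes.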
Proceedings of Machine Learning Research, Jul 2022
Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions, and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback, together with design choices informed by modern hardware and implementation challenges, eventually yield an MRA-based approach to self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. Code is available at https://github.com/mlpen/mra-attention.
PubMed: 37139473
DOI: No ID Found
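To make the cost argument concrete: the simplest multiresolution-flavored approximation pools keys and values into coarse blocks so each query attends to n/block summaries rather than all n tokens. The sketch below illustrates only that idea in NumPy; it is not the paper's MRA/wavelet algorithm, and all shapes are invented.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def coarse_attention(q, k, v, block=4):
    """Toy low-resolution attention: average-pool keys/values into
    blocks, so the score matrix is (n, n/block) instead of (n, n).
    An illustration of multiresolution approximation, not the MRA
    method proposed in the paper."""
    n, d = k.shape
    kc = k.reshape(n // block, block, d).mean(axis=1)  # coarse keys
    vc = v.reshape(n // block, block, d).mean(axis=1)  # coarse values
    scores = q @ kc.T / np.sqrt(d)                     # (n, n/block)
    return softmax(scores) @ vc

rng = np.random.default_rng(1)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = coarse_attention(q, k, v)
print(out.shape)  # (16, 8)
```

A true multiresolution scheme refines this by spending fine-grained attention only where the coarse scores indicate it matters, which is how such methods stay accurate while remaining cheaper than full self-attention.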
Heliyon, Oct 2023
The upsurge of multifarious endeavors across scientific fields propelled Big Data in the scientific domain. Despite the advancements in management systems, researchers find that mathematical knowledge remains one of the most challenging to manage due to its inherent heterogeneity. One novel recourse being explored is variable typing, where current work remains preliminary and thus provides wide room for contribution. In this study, a first attempt to implement an end-to-end Entity Recognition (ER) and Relation Extraction (RE) approach to variable typing was made using the BERT (Bidirectional Encoder Representations from Transformers) model. A micro-dataset was developed for this process. According to our findings, the ER and RE models, respectively, achieve a Precision of 0.8142 and 0.4919, a Recall of 0.7816 and 0.6030, and an F1-score of 0.7975 and 0.5418. Despite the limited dataset, the models performed on par with values reported in the literature. This work also discusses the factors affecting this BERT-based approach, giving rise to suggestions for future implementations.
PubMed: 37842594
DOI: 10.1016/j.heliyon.2023.e20505
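The reported F1-scores follow from the stated precision and recall as their harmonic mean, which is a quick consistency check on the numbers:

```python
def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Entity Recognition: P=0.8142, R=0.7816 -> F1 ~ 0.7976 (reported 0.7975)
# Relation Extraction: P=0.4919, R=0.6030 -> F1 ~ 0.5418 (reported 0.5418)
print(f1(0.8142, 0.7816), f1(0.4919, 0.6030))
```

Both values agree with the abstract to rounding, so the three metrics per model are internally consistent.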
Sensors (Basel, Switzerland), Oct 2023
Medical image segmentation is crucial for medical image processing and the development of computer-aided diagnostics. In recent years, deep Convolutional Neural Networks (CNNs) have been widely adopted for medical image segmentation and have achieved significant success. UNet, which is based on CNNs, is the mainstream method used for medical image segmentation. However, its performance suffers owing to its inability to capture long-range dependencies. Transformers, initially designed for Natural Language Processing (NLP) and sequence-to-sequence applications, have demonstrated the ability to capture long-range dependencies; however, their ability to acquire local information is limited. Hybrid architectures of CNNs and Transformers, such as TransUNet, have been proposed to benefit from the Transformer's long-range dependencies and the CNN's low-level details. Nevertheless, automatic medical image segmentation remains a challenging task due to factors such as blurred boundaries, low-contrast tissue environments, and, in the context of ultrasound, issues like speckle noise and attenuation. In this paper, we propose a new model that combines the strengths of both CNNs and Transformers, with network architectural improvements designed to enrich the feature representation captured by the skip connections and the decoder. To this end, we devised a new attention module called Three-Level Attention (TLA). This module is composed of an Attention Gate (AG), channel attention, and a spatial normalization mechanism. The AG preserves structural information, whereas channel attention helps to model the interdependencies between channels. Spatial normalization employs the spatial coefficient of the Transformer to improve spatial attention, akin to TransNorm. To further improve the skip connections and reduce the semantic gap, the skip connections between the encoder and decoder were redesigned in a manner similar to the UNet++ dense connection.
Moreover, deep supervision using a side-output channel was introduced, analogous to BASNet, which was originally used for saliency predictions. Two datasets from different modalities, a CT scan dataset and an ultrasound dataset, were used to evaluate the proposed UNet architecture. The experimental results showed that our model consistently improved the prediction performance of the UNet across different datasets.
Topics: Diagnosis, Computer-Assisted; Electric Power Supplies; Image Processing, Computer-Assisted; Natural Language Processing; Neural Networks, Computer
PubMed: 37896682
DOI: 10.3390/s23208589
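Of the three TLA ingredients, channel attention is the simplest to sketch: squeeze each channel to a scalar, pass the channel statistics through a small bottleneck, and rescale the feature map per channel. The NumPy sketch below shows that squeeze-and-excitation-style pattern only; the weights are random placeholders and this is not the paper's TLA module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, W1, W2):
    """Squeeze-and-excitation-style channel attention: global-average-
    pool each channel, pass the statistics through a bottleneck MLP,
    and rescale the feature map per channel with gates in (0, 1)."""
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)          # (C,) channel stats
    excite = sigmoid(np.maximum(squeeze @ W1, 0) @ W2)  # (C,) gates
    return feat * excite[:, None, None]                 # rescale channels

rng = np.random.default_rng(2)
C, H, W = 8, 4, 4
feat = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((C, C // 2))  # bottleneck down-projection
W2 = rng.standard_normal((C // 2, C))  # bottleneck up-projection
out = channel_attention(feat, W1, W2)
print(out.shape)  # (8, 4, 4)
```

Because the gates lie in (0, 1), the module can only suppress or preserve channels, letting the network emphasize informative channels without changing the feature map's shape.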
Journal of Cardiothoracic Surgery, Feb 2024
Review
Artificial intelligence (AI) is a transformative technology with many benefits, but also risks when applied to healthcare, and to cardiac surgery in particular. Surgeons must be aware of AI and its application through generative pre-trained transformers (GPT/ChatGPT) to fully understand what this offers to clinical care, decision making, training, research and education. Clinicians must appreciate that the advantages and potential for transformative change in practice are balanced by risks typified by validation, ethical challenges and medicolegal concerns. ChatGPT should be seen as a tool to support and enhance the skills of surgeons, rather than a replacement for their experience and judgment. Human oversight and intervention will always be necessary to ensure patient safety and to make complex decisions that may require a refined understanding of individual patient circumstances.
Topics: Humans; Artificial Intelligence; Cardiac Surgical Procedures; Heart Transplantation; Educational Status; Patient Safety
PubMed: 38409178
DOI: 10.1186/s13019-024-02541-0
Computers in Biology and Medicine, Aug 2022
The accurate identification of Drug-Target Interactions (DTIs) remains a critical turning point in drug discovery and understanding of the binding process. Despite recent advances in computational solutions to overcome the challenges of in vitro and in vivo experiments, most of the proposed in silico-based methods still focus on binary classification, overlooking the importance of characterizing DTIs with unbiased binding strength values to properly distinguish primary interactions from those with off-targets. Moreover, several of these methods usually simplify the entire interaction mechanism, neglecting the joint contribution of the individual units of each binding component and the interacting substructures involved, and have yet to focus on more explainable and interpretable architectures. In this study, we propose an end-to-end Transformer-based architecture for predicting drug-target binding affinity (DTA) using 1D raw sequential and structural data to represent the proteins and compounds. This architecture exploits self-attention layers to capture the biological and chemical context of the proteins and compounds, respectively, and cross-attention layers to exchange information and capture the pharmacological context of the DTIs. The results show that the proposed architecture is effective in predicting DTA, achieving superior performance in both correctly predicting the value of interaction strength and being able to correctly discriminate the rank order of binding strength compared to state-of-the-art baselines. The combination of multiple Transformer-Encoders was found to result in robust and discriminative aggregate representations of the proteins and compounds for binding affinity prediction, in which the addition of a Cross-Attention Transformer-Encoder was identified as an important block for improving the discriminative power of these representations. 
Overall, this research study validates the applicability of an end-to-end Transformer-based architecture in the context of drug discovery, capable of self-providing different levels of potential DTI and prediction understanding due to the nature of the attention blocks. The data and source code used in this study are available at: https://github.com/larngroup/DTITR.
Topics: Drug Development; Drug Discovery; Proteins; Software
PubMed: 35777085
DOI: 10.1016/j.compbiomed.2022.105772
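The cross-attention layers described above let one token stream query the other, so information flows between the two binding partners. A schematic NumPy sketch of that exchange, with projection matrices omitted for brevity; the token counts and the protein/compound naming are illustrative, not the DTITR implementation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(queries, context):
    """Cross-attention: one stream (e.g. protein residue tokens)
    queries the other (e.g. compound tokens), so each query token is
    re-expressed as a weighted mixture of the other stream's tokens."""
    d = context.shape[-1]
    scores = queries @ context.T / np.sqrt(d)  # (n_q, n_ctx) affinities
    return softmax(scores) @ context           # queries enriched with
                                               # context information

rng = np.random.default_rng(3)
protein = rng.standard_normal((10, 8))   # 10 residue tokens
compound = rng.standard_normal((6, 8))   # 6 SMILES tokens
fused = cross_attention(protein, compound)
print(fused.shape)  # (10, 8)
```

The attention weights themselves indicate which compound substructures each residue attends to, which is the source of the interpretability the abstract attributes to the attention blocks.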
Biomedical Engineering Online, Sep 2023
Review
Transformers have been widely used in many computer vision challenges and have shown the capability of producing better results than convolutional neural networks (CNNs). Taking advantage of capturing long-range contextual information and learning more complex relations in the image data, Transformers have been applied to histopathological image processing tasks. In this survey, we present a thorough analysis of the uses of Transformers in histopathological image analysis, covering several topics, from newly built Transformer models to unresolved challenges. To be more precise, we first outline the fundamental principles of the attention mechanism included in Transformer models and other key frameworks. Second, we analyze Transformer-based applications in the histopathological imaging domain and provide a thorough evaluation of more than 100 research publications across different downstream tasks to cover the most recent innovations, including survival analysis and prediction, segmentation, classification, detection, and representation. Within this survey, we also compare the performance of CNN-based techniques to Transformers based on recently published papers, highlight major challenges, and provide interesting future research directions. Despite the outstanding performance of Transformer-based architectures in a number of the papers reviewed in this survey, we anticipate that further improvements and exploration of Transformers in the histopathological imaging domain are still required. We hope that this survey will give readers in this field of study a thorough understanding of Transformer-based techniques in histopathological image analysis; an up-to-date paper list is provided at https://github.com/S-domain/Survey-Paper.
Topics: Image Processing, Computer-Assisted; Learning; Neural Networks, Computer
PubMed: 37749595
DOI: 10.1186/s12938-023-01157-0
Diagnostics (Basel, Switzerland), Jan 2023
Breast mass identification is a crucial procedure during mammogram-based early breast cancer diagnosis. However, it is difficult to determine whether a breast lump is benign or cancerous at early stages. Convolutional neural networks (CNNs) have been used to solve this problem and have provided useful advancements. However, CNNs focus only on a certain portion of the mammogram while ignoring the rest, and they incur computational complexity because of multiple convolutions. Recently, vision transformers have been developed as a technique to overcome such limitations of CNNs, ensuring better or comparable performance in natural image classification. However, the utility of this technique has not been thoroughly investigated in the medical image domain. In this study, we developed a transfer learning technique based on vision transformers to classify breast mass mammograms. The area under the receiver operating characteristic curve of the new model was estimated as 1 ± 0, thus outperforming the CNN-based transfer-learning models and vision transformer models trained from scratch. The technique can, hence, be applied in a clinical setting to improve the early diagnosis of breast cancer.
PubMed: 36672988
DOI: 10.3390/diagnostics13020178
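A vision transformer consumes an image as a sequence of flattened, non-overlapping patches rather than as a convolutional feature map. The sketch below shows only that front-end step in NumPy; the 224×224 input and 16-pixel patch size are common ViT defaults used here as assumptions, not details from the study's pipeline.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an image (H, W, C) into flattened non-overlapping patches,
    i.e. the token sequence a vision transformer consumes."""
    h, w, c = image.shape
    patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                    .transpose(0, 2, 1, 3, 4)     # group by patch grid
                    .reshape(-1, patch * patch * c))
    return patches  # (num_patches, patch*patch*C)

mammogram = np.zeros((224, 224, 1))  # toy grayscale input
tokens = patchify(mammogram)
print(tokens.shape)  # (196, 256): 14x14 patches of 16*16*1 pixels
```

Each patch token is then linearly embedded and processed with self-attention, which is why a ViT can relate distant regions of a mammogram in a single layer, unlike the local receptive fields of a CNN.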