Journal of Biomedical Informatics, Feb 2022
Review
Transformer-based pretrained language models (PLMs) have started a new era in modern natural language processing (NLP). These models combine the power of transformers, transfer learning, and self-supervised learning (SSL). Following the success of these models in the general domain, the biomedical research community has developed various in-domain PLMs, starting from BioBERT up to the latest BioELECTRA and BioALBERT models. We strongly believe there is a need for a paper that provides a comprehensive survey of the various transformer-based biomedical pretrained language models (BPLMs). In this survey, we start with a brief overview of foundational concepts such as self-supervised learning, the embedding layer, and transformer encoder layers. We discuss core concepts of transformer-based PLMs such as pretraining methods, pretraining tasks, fine-tuning methods, and the various embedding types specific to the biomedical domain. We introduce a taxonomy for transformer-based BPLMs and then discuss all the models. We discuss various challenges and present possible solutions. We conclude by highlighting some of the open issues that will drive the research community to further improve transformer-based BPLMs. The list of all publicly available transformer-based BPLMs, along with their links, is provided at https://mr-nlp.github.io/posts/2021/05/transformer-based-biomedical-pretrained-language-models-list/.
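As background for the transformer encoder layers this survey covers, a minimal NumPy sketch of single-head scaled dot-product self-attention may help orient readers; the weight matrices here are random placeholders, not taken from any pretrained model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a
    sequence of token embeddings X (seq_len x d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # mix value vectors by attention

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                      # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

A full encoder layer would add multiple heads, residual connections, layer normalization, and a feed-forward sublayer on top of this core operation.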
Topics: Biomedical Research; Language; Natural Language Processing
PubMed: 34974190
DOI: 10.1016/j.jbi.2021.103982
Bioinformatics Advances, 2023
Review
SUMMARY
The transformer-based language models, including vanilla transformer, BERT and GPT-3, have achieved revolutionary breakthroughs in the field of natural language processing (NLP). Since there are inherent similarities between various biological sequences and natural languages, the remarkable interpretability and adaptability of these models have prompted a new wave of their application in bioinformatics research. To provide a timely and comprehensive review, we introduce key developments of transformer-based language models by describing the detailed structure of transformers and summarize their contribution to a wide range of bioinformatics research from basic sequence analysis to drug discovery. While transformer-based applications in bioinformatics are diverse and multifaceted, we identify and discuss the common challenges, including heterogeneity of training data, computational expense and model interpretability, and opportunities in the context of bioinformatics research. We hope that the broader community of NLP researchers, bioinformaticians and biologists will be brought together to foster future research and development in transformer-based language models, and inspire novel bioinformatics applications that are unattainable by traditional methods.
SUPPLEMENTARY INFORMATION
Supplementary data are available online.
PubMed: 36845200
DOI: 10.1093/bioadv/vbad001
eLife, Jan 2023
Review
Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and proteins with known properties based on lab experiments. Language models from the field of natural language processing have gained popularity for protein property predictions and have led to a new computational revolution in biology, where old prediction results are being improved regularly. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences and can be used, for instance, to predict protein properties. The field of natural language processing is growing quickly because of developments in a class of models built on one particular model: the Transformer. We review recent developments and the use of large-scale Transformer models in applications for predicting protein characteristics and how such models can be used to predict, for example, post-translational modifications. We review shortcomings of other deep learning models and explain how the Transformer models have quickly proven to be a very promising way to unravel information hidden in the sequences of amino acids.
Topics: Deep Learning; Amino Acid Sequence; Amino Acids; Biological Science Disciplines; Language
PubMed: 36651724
DOI: 10.7554/eLife.82819
IEEE Transactions on Pattern Analysis..., Oct 2023
Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prohibits vision transformers from scaling up to high-resolution images, because both its computational complexity and its memory footprint are quadratic. Linear attention, which reorders the self-attention mechanism to mitigate a similar issue, was introduced in natural language processing (NLP), but directly applying existing linear attention to vision may not lead to satisfactory results. We investigate this problem and point out that existing linear attention methods ignore an inductive bias in vision tasks, i.e., 2D locality. In this article, we propose Vicinity Attention, a type of linear attention that integrates 2D locality. Specifically, for each image patch, we adjust its attention weight based on its 2D Manhattan distance from its neighbouring patches. In this way, we achieve 2D locality with linear complexity: neighbouring image patches receive stronger attention than faraway patches. In addition, we propose a novel Vicinity Attention Block that comprises Feature Reduction Attention (FRA) and Feature Preserving Connection (FPC) to address the computational bottleneck of linear attention approaches, including our Vicinity Attention, whose complexity grows quadratically with respect to the feature dimension. The Vicinity Attention Block computes attention in a compressed feature space with an extra skip connection to retrieve the original feature distribution. We experimentally validate that the block further reduces computation without degrading accuracy. Finally, to validate the proposed methods, we build a linear vision transformer backbone named Vicinity Vision Transformer (VVT). Targeting general vision tasks, we build VVT in a pyramid structure with progressively reduced sequence length.
We perform extensive experiments on CIFAR-100, ImageNet-1k, and ADE20K datasets to validate the effectiveness of our method. Our method has a slower growth rate in terms of computational overhead than previous transformer-based and convolution-based networks when the input resolution increases. In particular, our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous approaches.
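The abstract does not give the exact weighting formula, but the core idea of modulating attention by the 2D Manhattan distance between patches can be sketched as follows; the exponential decay and the parameter `alpha` are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def manhattan_locality_weights(h, w, alpha=0.1):
    """Locality weights for an h x w grid of image patches: pairs of
    patches that are close in 2D Manhattan distance get larger weights.
    exp(-alpha * d) is an illustrative decay, not the paper's exact form."""
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)   # (h*w, 2) patch positions
    d = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
    return np.exp(-alpha * d)

W = manhattan_locality_weights(3, 3)
print(W.shape)            # (9, 9): one weight per patch pair
print(W[0, 0] > W[0, 8])  # True: the far corner is down-weighted
```

In a linear-attention formulation such weights cannot simply multiply a full attention matrix (that matrix is never materialized), so the paper's contribution lies in folding the locality bias into the linear computation.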
PubMed: 37310842
DOI: 10.1109/TPAMI.2023.3285569
IEEE Transactions on Cybernetics, Apr 2024
Vision transformers (ViTs) are rapidly evolving and are widely used in computer vision. However, high-performance ViTs require many computations, which limits their further development in the vision field. In this article, a novel evolutionary dual-stream transformer (E-DST) model is proposed to alleviate the demand on computational resources. A hybrid attention mechanism structure is proposed for the DST model, which uses a dual-branch structure to fuse convolutional and transformer features. Combining the features learned by the transformer and the convolution effectively saves computational resources. In addition, an evolutionary optimizer is proposed to optimize the parameters of the model, exploiting the strong search ability of evolutionary algorithms. The convergence of the evolutionary optimizer is proved in this article. The proposed E-DST model is experimentally compared with a variety of classic models and their variants on three datasets, and the evolutionary optimizer demonstrates its generality in convolutional and recurrent neural networks. The experimental results show that the E-DST model can effectively reduce computational resources and that the evolutionary optimizer can solve large-scale optimization problems. In conclusion, our proposed method is feasible and effective.
PubMed: 36279360
DOI: 10.1109/TCYB.2022.3213537
BMC Bioinformatics, Nov 2023
BACKGROUND
Galaxy is a web-based open-source platform for scientific analyses. Researchers use thousands of high-quality tools and workflows for their respective analyses in Galaxy. A tool recommender system predicts a collection of tools that can be used to extend an analysis. In this work, a tool recommender system is developed by training a transformer on workflows available on Galaxy Europe, and its performance is compared to that of other neural networks such as recurrent, convolutional, and dense neural networks.
RESULTS
The transformer neural network converges twice as fast, has significantly lower model usage (model reconstruction and prediction) time, and generalises better beyond the training workflows than the older RNN-based tool recommender system in Galaxy. In addition, the transformer also outperforms CNN and DNN on several key indicators. It achieves a faster convergence time, lower model usage time, and higher-quality tool recommendations than CNN. Compared to DNN, it converges faster to a higher precision@k metric (approximately 0.98 for the transformer compared to approximately 0.9 for DNN) and shows higher-quality tool recommendations.
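For readers unfamiliar with the precision@k metric quoted above, a minimal sketch follows; the tool names are made up for illustration and are not from the Galaxy dataset:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

# Hypothetical tool recommendations for one workflow step.
recs = ["bwa", "samtools", "bcftools", "multiqc"]
truth = {"samtools", "bcftools"}
print(precision_at_k(recs, truth, 2))  # 0.5
```

A precision@k of roughly 0.98 therefore means that nearly all of the top-k tools the model suggests actually appear as valid next steps in the held-out workflows.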
CONCLUSION
Our work shows a novel usage of transformers to recommend tools for extending scientific workflows. A more robust tool recommendation model created using a transformer, with significantly lower usage time than the RNN and CNN, higher precision@k than the DNN, and higher-quality tool recommendations than all three neural networks, will benefit researchers in creating scientifically significant workflows and exploratory data analysis in Galaxy. Additionally, the ability to train faster than all three neural networks imparts more scalability for training on larger datasets consisting of millions of tool sequences. Open-source scripts to create the recommendation model are available under the MIT licence at https://github.com/anuprulez/galaxy_tool_recommendation_transformers.
Topics: Software; Neural Networks, Computer; Workflow; Data Analysis; Europe
PubMed: 38012574
DOI: 10.1186/s12859-023-05573-w
IEEE Transactions on Pattern Analysis..., Jan 2023
The transformer, first applied to the field of natural language processing, is a type of deep neural network based mainly on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to apply the transformer to computer vision tasks. In a variety of visual benchmarks, transformer-based models perform similarly to or better than other types of networks such as convolutional and recurrent neural networks. Given its high performance and lesser need for vision-specific inductive bias, the transformer is receiving more and more attention from the computer vision community. In this paper, we review these vision transformer models by categorizing them by task and analyzing their advantages and disadvantages. The main categories we explore include the backbone network, high/mid-level vision, low-level vision, and video processing. We also include efficient transformer methods for pushing the transformer into real device-based applications. Furthermore, we take a brief look at the self-attention mechanism in computer vision, as it is the base component of the transformer. Toward the end of this paper, we discuss the challenges and provide several further research directions for vision transformers.
PubMed: 35180075
DOI: 10.1109/TPAMI.2022.3152247
Journal of Imaging, Sep 2023
This work presents BlinkLinMulT, a transformer-based framework for eye blink detection. While most existing approaches rely on frame-wise eye state classification, recent advancements in transformer-based sequence models have not been explored in the blink detection literature. Our approach effectively combines low- and high-level feature sequences with linear complexity cross-modal attention mechanisms and addresses challenges such as lighting changes and a wide range of head poses. Our work is the first to leverage the transformer architecture for blink presence detection and eye state recognition while successfully implementing an efficient fusion of input features. In our experiments, we utilized several publicly available benchmark datasets (CEW, ZJU, MRL Eye, RT-BENE, EyeBlink8, Researcher's Night, and TalkingFace) to extensively show the state-of-the-art performance and generalization capability of our trained model. We hope the proposed method can serve as a new baseline for further research.
PubMed: 37888303
DOI: 10.3390/jimaging9100196
Applied Optics, Jan 2022
Transformer oil used in oil-filled electrical power transformers serves to insulate, stop arcing and corona discharge, and dissipate transformer heat. Transformer operation inevitably induces molecular decomposition, leading to gases released into the transformer oil. The released gases not only reduce the transformer oil's performance but may also induce transformer faults. To prevent catastrophic failure, approaches using, e.g., chromatography and spectroscopy precisely measure dissolved gases to monitor transformer oil quality; however, many of these approaches still suffer from complicated operation, high cost, or slow speed. To solve these problems, we provide a new transformer oil quality evaluation method based on quantitative phase microscopy. Using our designed phase real-time microscopic camera (PhaseRMiC), under- and over-focus images of gas bubbles in transformer oil can be captured simultaneously during field-of-view scanning. The oil-to-gas volume ratio can then be computed, after phase retrieval via solving the transport of intensity equation, to evaluate the transformer oil quality. Compared with traditional and widely used approaches, this newly designed method can successfully distinguish transformer oil quality while relying only on rapid operations and low costs, thus delivering a new solution for transformer prognosis and diagnosis.
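The transport of intensity equation mentioned here is a standard relation in phase retrieval; in its usual form (as commonly stated in the optics literature, not quoted from this paper), it links the axial derivative of the measured intensity to the unknown phase:

```latex
% Transport of intensity equation (TIE)
% k: wavenumber, I: intensity, \varphi: phase, \nabla_{\perp}: transverse gradient
-k \, \frac{\partial I(x, y; z)}{\partial z}
  = \nabla_{\perp} \cdot \left[ I(x, y; z) \, \nabla_{\perp} \varphi(x, y) \right]
```

The under- and over-focus images described in the abstract provide a finite-difference estimate of the left-hand side, from which the phase can be solved numerically.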
PubMed: 35200879
DOI: 10.1364/AO.440583
Sensors (Basel, Switzerland), Oct 2022
Review
Transformers play an essential role in power networks, ensuring that generated power gets to consumers at the safest voltage level. However, they are prone to insulation failure from ageing, which has fatal and economic consequences if left undetected or unattended. Traditional detection methods are based on scheduled maintenance practices that often involve taking samples from in situ transformers and analysing them in laboratories using several techniques. This conventional method exposes the engineer performing the test to hazards, requires specialised training, and does not guarantee reliable results because samples can be contaminated during collection and transportation. This paper reviews the transformer oil types and some traditional ageing detection methods, including breakdown voltage (BDV), spectroscopy, dissolved gas analysis, total acid number, interfacial tension, and corresponding regulating standards. In addition, a review of sensors, technologies to improve the reliability of online ageing detection, and related online transformer ageing systems is covered in this work. A non-destructive online ageing detection method for in situ transformer oil is a better alternative to the traditional offline detection method. Moreover, when combined with the Internet of Things (IoT) and artificial intelligence, a prescriptive maintenance solution emerges, offering more advantages and robustness than offline preventive maintenance approaches.
Topics: Artificial Intelligence; Reproducibility of Results; Electric Power Supplies; Maintenance
PubMed: 36298273
DOI: 10.3390/s22207923