
Capabilities of AI Transformers

Background

Whether it is GPT, ChatGPT, DALL-E, Whisper, Stability AI, or any other significant development you see in the AI world nowadays, it is because of the Transformer architecture. Transformers are a type of neural network architecture with several properties that make them effective for modeling data with long-range dependencies. They generally feature a combination of multi-headed attention mechanisms, residual connections, layer normalization, feedforward layers, and positional embeddings.
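
To make these components concrete, here is a minimal sketch of a single transformer encoder block in PyTorch. The layer sizes and arrangement are illustrative defaults, not taken from any particular paper:

```python
# A minimal transformer encoder block: multi-headed self-attention,
# residual connections, layer normalization, and a feedforward network.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Self-attention sublayer with residual connection + layer norm
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        # Feedforward sublayer with residual connection + layer norm
        return self.norm2(x + self.dropout(self.ff(x)))

x = torch.randn(2, 16, 512)          # (batch, sequence length, embedding)
print(TransformerBlock()(x).shape)   # torch.Size([2, 16, 512])
```

(Positional embeddings are added to the token embeddings before the first block; they are omitted here for brevity.)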

The precursors of Transformers were the RNN, LSTM, and GRU architectures. Transformers are based on the 2017 research paper "Attention Is All You Need".

Initially, Transformers were used for NLP-related tasks. Gradually, researchers explored the power of the Transformer architecture, and as of 2023 it is used for hundreds of tasks across AI domains such as the following (a short usage sketch follows the list):

  • Text Models (NLP, NLU, NLG)
  • Vision Models (Computer Vision)
  • Audio Models (Audio Processing, Classification, Audio Generation)
  • Reinforcement Learning (RL) Models
  • Time-series Models
  • Multimodal: OCR (extract information from scanned documents), video classification, visual QA, table data question answering
  • Graph Models
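
As a small illustration of this breadth, the Hugging Face `transformers` library exposes one-line pipelines for several of these domains. The snippet below assumes the library is installed; each pipeline downloads a default pre-trained checkpoint on first use:

```python
# Illustrative pipelines across task domains (default checkpoints are
# downloaded at runtime; all task names below are standard pipeline tasks).
from transformers import pipeline

# Text (NLP): sentiment classification
clf = pipeline("sentiment-analysis")
print(clf("Transformers turned out to be remarkably versatile."))

# Vision: image classification
vision = pipeline("image-classification")

# Audio: speech recognition
asr = pipeline("automatic-speech-recognition")

# Multimodal: question answering over tables
table_qa = pipeline("table-question-answering")
```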

Starting the journey in 2017, as of now (2023) we have approximately 200 Transformer-based architectures proposed by various researchers for various purposes. Using these architectures and various benchmark datasets, thousands of models have been created that deliver SOTA performance on a wide range of tasks. Based on your needs, you can choose the architecture that helps you meet your project objective. There is a high chance you will find a pre-trained model you can use without any training (zero-shot) or with a small fine-tuning (one-shot or few-shot) effort. For that, explore Hugging Face and Papers with Code.
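
For example, here is a hedged zero-shot sketch; `facebook/bart-large-mnli` is one commonly used public checkpoint for this task, not the only choice:

```python
# Zero-shot classification: an NLI-trained model scores text against
# candidate labels it was never explicitly trained on.
from transformers import pipeline

zero_shot = pipeline("zero-shot-classification",
                     model="facebook/bart-large-mnli")
result = zero_shot(
    "The new GPU cut our training time in half.",
    candidate_labels=["hardware", "cooking", "politics"],
)
print(result["labels"][0])  # highest-scoring label, expected: "hardware"
```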

This article lists the major Transformer-related research papers, their objectives, and their capabilities.

Note: Names starting with * are not Transformers; most of them are pre-Transformer-era architectures.

Capabilities of AI Transformers

Sno Transformer Objective Summary NLP Tasks CV Tasks
1 *AlexNet Image Classification A deep convolutional neural network architecture for image classification tasks. - Image Classification, Object Detection
2 *VGG16 Visual Geometry Group Network (16 layers) A deep CNN model with 16 convolutional layers developed by the Visual Geometry Group at Oxford University. - Image Classification, Object Detection
3 *VGG19 Visual Geometry Group Network (19 layers) A deep CNN model with 19 convolutional layers, an extended version of VGG16. - Image Classification, Object Detection
4 *ResNet Residual Networks A deep CNN architecture that introduces residual connections to alleviate the vanishing gradient problem. - Image Classification, Object Detection
5 *InceptionResNet Combination of Inception and ResNet A hybrid CNN model that combines the strengths of the Inception and ResNet architectures. - Image Classification, Object Detection
6 *ConvNeXt Improved Convolutional Neural Network A convolutional neural network architecture that aims to capture richer spatial relationships in images. - Image Classification, Object Detection
7 *DenseNet Dense Connections in Convolutional Networks A densely connected convolutional neural network architecture that encourages feature reuse and reduces the number of parameters. - Image Classification, Object Detection
8 *MobileNetV1 Mobile-oriented CNN Architecture A lightweight convolutional neural network architecture designed for mobile and embedded devices. - Image Classification, Object Detection
9 *Xception Extreme Inception A deep CNN architecture that replaces the standard Inception modules with depthwise separable convolutions. - Image Classification, Object Detection
10 EncoderDecoder Sequence-to-sequence modeling A transformer-based model architecture that combines encoder and decoder for sequence-to-sequence tasks such as machine translation. Machine Translation, Text Summarization -
11 *MobileNetV2 Improved MobileNet Architecture An enhanced version of MobileNet with improved performance and efficiency. - Image Classification, Object Detection
12 Data2Vec Unified self-supervised learning across modalities A framework that applies one self-supervised objective, predicting contextualized latent representations, to speech, vision, and text. Language Representation Learning Image Representation Learning
13 GPT Language modeling and text generation A transformer-based model trained on a large corpus to generate coherent and contextually relevant text. Text Generation, Text Completion, Language Modeling -
14 BERT Pre-training and fine-tuning on various NLP tasks A transformer-based model widely used for pre-training and fine-tuning on NLP tasks. Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
15 MarianMT Multilingual Neural Machine Translation A multilingual neural machine translation model based on the Marian framework. Machine Translation -
16 *BiT Big Transfer for image classification A large-scale pre-trained ResNet-based CNN (Big Transfer) whose checkpoints transfer well to downstream vision tasks. - Image Classification, Transfer Learning
17 Transformer-XL Transformer model with extended context A transformer model architecture that extends the context window, enabling longer-range dependencies. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
18 XLM Cross-lingual Language Model A transformer-based model for cross-lingual language understanding and machine translation. Cross-lingual Language Understanding, Machine Translation -
19 CTRL Text generation with control codes A transformer-based model that allows fine-grained control over generated text using control codes. Text Generation, Controlled Text Generation -
20 GPT-2 Language modeling and text generation A transformer-based model that scales GPT up to a larger architecture and training corpus, generating coherent and contextually relevant text. Text Generation, Text Completion, Language Modeling -
21 Funnel Transformer Improving the efficiency and effectiveness of transformers A transformer-based model architecture that reduces the computational cost of transformers while maintaining their effectiveness. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
22 *EfficientNet B0 Efficient and Scalable CNN A family of convolutional neural network architectures that achieve high accuracy with fewer parameters and computations. - Image Classification, Object Detection
23 ALBERT Improve the efficiency of BERT A lite version of BERT that uses parameter reduction techniques to achieve faster training and lower memory consumption. Classification, Translation, Named Entity Recognition (NER) -
24 *EfficientNet Efficient convolutional neural network architecture A convolutional neural network architecture that achieves state-of-the-art performance with significantly fewer parameters. - Image Classification, Object Detection, Semantic Segmentation
25 *MobileNetV3 Efficient Mobile Neural Network for Computer Vision A lightweight and efficient CNN architecture designed for computer vision tasks on mobile devices. - Image Classification, Object Detection, Semantic Segmentation
26 Nezha Neural contextualized representation for Chinese language understanding A BERT-style transformer for Chinese that adds functional relative positional encoding and whole-word masking. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
27 BART Text generation and summarization A denoising autoencoder model that can be used for text generation and summarization tasks. Text Generation, Summarization -
28 ERNIE Enhanced representation through knowledge integration A transformer-based model that enhances representation learning by integrating external knowledge sources. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
29 ErnieM Enhanced multilingual representation A multilingual transformer that aligns cross-lingual semantics by training on monolingual and parallel corpora. Cross-lingual Language Understanding, Text Classification -
30 FlauBERT French language representation learning A transformer-based model specifically trained for French language representation learning tasks. French Language Processing, Text Classification -
31 LXMERT Vision and Language Multimodal Transformer A multimodal transformer model that combines vision and language information for various tasks. Visual Question Answering (VQA), Visual Dialog, Image Captioning, Visual Grounding -
32 Pegasus Pre-training with Extracted Gap Sentences for Abstractive Summarization A transformer-based model trained for abstractive text summarization tasks. Text Summarization -
33 XLNet Generalized Autoregressive Pretraining A transformer-based model that leverages permutation-based training to learn bidirectional context. Language Modeling, Text Classification -
34 BioGpt Processing biomedical text A variant of the GPT model specifically designed for processing biomedical text. Biomedical Text Processing, Named Entity Recognition (NER), Clinical Text Understanding -
35 Hubert Self-supervised speech representation learning A transformer-based model (Hidden-Unit BERT) that learns speech representations by masked prediction of clustered acoustic units, commonly fine-tuned for ASR. Automatic Speech Recognition, Speech Representation Learning -
36 REALM Retrieval-Augmented Language Model A language model augmented with a dense retrieval mechanism to improve performance on text retrieval tasks. Information Retrieval, Text Classification, Question Answering (QA) -
37 SpeechToTextTransformer Transformer for Speech-to-Text Conversion A transformer-based model designed specifically for speech-to-text conversion tasks. Speech-to-Text Conversion -
38 XLM-V Cross-lingual Language Understanding A transformer-based model for cross-lingual language understanding, leveraging multilingual embeddings. Cross-lingual Language Understanding -
39 RoBERTa Robustly optimized BERT variant An optimized variant of BERT (Bidirectional Encoder Representations from Transformers) for various NLP tasks. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
40 GPT Neo Efficient and scalable variant of GPT A transformer-based model architecture that provides an efficient and scalable variant of GPT for various natural language processing tasks. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
41 CamemBERT French language processing and text classification A transformer-based model specifically trained for French language processing and text classification tasks. French Language Processing, Text Classification -
42 DialoGPT Conversational AI chatbot A transformer-based model trained for generating human-like conversational responses. Conversational AI, Chatbot -
43 DistilBERT Distilled version of BERT A smaller and faster version of BERT with a similar performance on various NLP tasks. Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
44 LiLT Language-independent layout understanding A Language-Independent Layout Transformer that combines text and layout information for structured document understanding across many languages. Document Understanding, Named Entity Recognition (NER) -
45 LUKE Language Understanding with Knowledge-based Entities A model that integrates knowledge-based entities into transformer-based language understanding tasks. Named Entity Recognition (NER), Relation Extraction, Knowledge Graph Completion -
46 MobileBERT Efficient BERT for Mobile and Edge Devices A compact and efficient version of BERT designed for deployment on mobile and edge devices. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
47 MT5 Multilingual Text-to-Text Transfer Transformer A transformer-based model capable of multilingual text-to-text transfer learning across various NLP tasks. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
48 RAG Retrieval-Augmented Generation A model that combines retrieval and generation methods for open-domain question answering. Open-Domain Question Answering -
49 ConvBERT Text classification and named entity recognition (NER) A transformer-based model for text classification and named entity recognition (NER) tasks. Classification, Named Entity Recognition (NER), Sentiment Analysis -
50 Megatron-GPT2 High-performance GPT-2-based language model A high-performance GPT-2-based language model developed using the Megatron framework. Text Generation, Text Completion, Language Modeling -
51 PhoBERT Pretrained language model for Vietnamese A pretrained language model specifically designed for the Vietnamese language. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
52 RoBERTa-PreLayerNorm RoBERTa with PreLayerNorm A variant of RoBERTa with the PreLayerNorm (PLN) technique, which improves training stability and efficiency. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
53 BERTweet Pre-trained BERT models for processing tweets BERT models specifically trained on Twitter data for tweet processing tasks. Classification, Named Entity Recognition (NER), Sentiment Analysis -
54 mBART Multilingual Denoising Autoencoder A multilingual denoising autoencoder based on the BART framework, capable of generating text in multiple languages. Text Generation, Text Completion, Multilingual Language Modeling -
55 Megatron-BERT High-performance BERT-based language model A high-performance BERT-based language model developed using the Megatron framework. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
56 SpeechToTextTransformer2 Transformer model for Speech-to-Text Conversion Another transformer-based model for speech-to-text conversion, providing an alternative approach. Speech-to-Text Conversion -
57 BERT For Sequence Generation Text generation using BERT-based models Fine-tuned BERT models for sequence generation tasks, such as text generation or summarization. Text Generation, Summarization -
58 ConvNeXT Modernized convolutional network for vision A pure convolutional architecture updated with design choices borrowed from vision transformers, competitive with them on vision benchmarks. - Image Classification, Object Detection, Semantic Segmentation
59 ELECTRA Pre-training method for language representation learning A pre-training method that replaces masked language modeling with a generator-discriminator setup for better language representation. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
60 Longformer Long-range sequence modeling with transformers A transformer-based model architecture that extends the standard transformer to handle long-range dependencies. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
61 *RegNet Regular network design spaces A family of CNN architectures obtained by progressively constraining a network design space, offering efficient and scalable training. - Image Classification, Object Detection, Semantic Segmentation
62 SqueezeBERT Lightweight BERT with grouped convolutions A lightweight BERT variant that replaces several self-attention and feedforward operations with grouped convolutions for efficient mobile inference. Text Classification, Sentiment Analysis, Question Answering (QA) -
63 LayoutLM Text and layout understanding for document analysis A transformer-based model that combines text and layout information for document understanding tasks. Document Understanding, OCR, Named Entity Recognition (NER) -
64 MPNet Masked and Permuted Pre-training A transformer-based model that unifies masked language modeling and permuted language modeling in one pre-training objective. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
65 VisualBERT Integrating Visual Information with BERT A BERT-based model that incorporates visual information for multimodal understanding. - Vision-Language Tasks, Image Captioning, Visual Question Answering (VQA)
66 Conditional DETR Object detection and instance segmentation A transformer-based model for object detection and instance segmentation tasks. - Object Detection, Instance Segmentation
67 GPTBigCode Code generation for programming languages A transformer-based model trained on a large corpus of code to generate code snippets or complete programs for various programming languages. Code Generation, Programming Language Processing -
68 M-CTC-T Massively multilingual speech recognition A transformer-based CTC model for massively multilingual speech recognition, trained with pseudo-labeling across dozens of languages. Multilingual Speech Recognition -
69 Pix2Struct Image-to-Structure Translation A transformer-based model for translating images into structured representations. - Image-to-Structure Translation
70 ProphetNet Pretrained Sequence-to-Sequence Model A sequence-to-sequence model pretrained for various NLP tasks, based on the transformer architecture. Text Generation, Text Completion, Machine Translation, Summarization -
71 SEW Squeezed and Efficient Wav2vec A squeezed and efficient wav2vec 2.0 variant that improves the performance-efficiency trade-off for speech recognition. Automatic Speech Recognition -
72 T5 Text-to-Text Transfer Transformer A text-to-text transfer transformer model that can be fine-tuned for various NLP tasks. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
73 DeBERTa Improving the effectiveness of BERT A transformer-based model that enhances BERT by addressing its limitations and improving performance on various NLP tasks. Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
74 Informer Time series forecasting with transformers A transformer-based model for time series forecasting tasks, capturing long-term dependencies in the data. Time Series Forecasting -
75 LED Longformer Encoder-Decoder A Longformer-based encoder-decoder for long-document sequence-to-sequence tasks such as summarization. Long-document Summarization, Text Generation -
76 SwitchTransformers Sparse Mixture-of-Experts transformers A sparse Mixture-of-Experts variant of T5 that routes each token to a single expert, scaling parameter count at roughly constant compute. Text Classification, Machine Translation, Question Answering (QA) -
77 Whisper Robust speech recognition A transformer-based model trained on large-scale weakly supervised audio for multilingual speech recognition and speech translation. Automatic Speech Recognition, Speech Translation -
78 XLM-ProphetNet Cross-lingual Language Generation A transformer-based model for cross-lingual language generation, extending the ProphetNet architecture. Cross-lingual Language Generation -
79 XLM-RoBERTa Cross-lingual Language Representation A cross-lingual variant of RoBERTa, providing multilingual representation learning. Cross-lingual Language Representation -
80 Deformable DETR Object detection and instance segmentation with deformable attention A transformer-based model for object detection and instance segmentation tasks, incorporating deformable attention mechanisms. - Object Detection, Instance Segmentation
81 FNet Token mixing with Fourier transforms A transformer variant that replaces self-attention with unparameterized Fourier transforms, trading a little accuracy for much faster text encoding. Text Classification, Sentiment Analysis -
82 GPTSAN-japanese Japanese language model A transformer-based Japanese language model supporting both autoregressive text generation and fill-in-the-blank style completion. Japanese Language Processing, Text Generation -
83 SEW-D SEW with Disentangled attention A SEW variant augmented with disentangled attention, further improving speech recognition efficiency and accuracy. Automatic Speech Recognition -
84 CPM Chinese language processing and text generation A transformer-based model specifically designed for Chinese language processing and text generation tasks. Chinese Language Processing, Text Generation -
85 GIT Generative image-to-text modeling A generative image-to-text transformer that produces captions or answers conditioned on images and video. - Image Captioning, Visual Question Answering (VQA), Video Captioning
86 LayoutXLM Multilingual document understanding with transformers A transformer-based model for multilingual document understanding, incorporating text and layout information. Multilingual Document Understanding, OCR, Named Entity Recognition (NER) -
87 DETR Object detection and instance segmentation A transformer-based model for object detection and instance segmentation tasks. - Object Detection, Instance Segmentation
88 GPT NeoX Further improved version of GPT Neo An advanced version of GPT Neo that incorporates additional enhancements and optimizations for natural language processing tasks. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
89 RemBERT Rebalanced multilingual BERT A multilingual BERT variant that decouples input and output embeddings, reallocating capacity for better cross-lingual transfer. Text Classification, Named Entity Recognition (NER), Question Answering (QA) -
90 RoCBert Robust Chinese BERT A Chinese BERT variant pre-trained to be robust to adversarial perturbations such as typos and homophones. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
91 TAPAS Table Parsing via Transformer A transformer-based model designed for table parsing, enabling natural language queries over tabular data. Table Parsing, Question Answering (QA) over Tabular Data -
92 UPerNet Unified Perceptual Parsing Network A unified perceptual parsing framework for pixel-level scene understanding, commonly paired with transformer backbones. - Semantic Segmentation, Image Parsing
93 Vision Transformer (ViT) Transformer-based model for image classification A transformer-based model designed for image classification tasks, replacing convolutional layers with self-attention. - Image Classification, Object Detection, Semantic Segmentation
94 Wav2Vec2 Self-supervised Audio Representation Learning A transformer-based model for self-supervised audio representation learning, capturing phonetic information. Speech Recognition, Speech Representation Learning -
95 PLBart Program and Language BART A BART-style model pre-trained on code and natural language pairs for program understanding and generation. Code Summarization, Code Generation, Code Translation -
96 DiT Document Image Transformer A self-supervised vision transformer pre-trained on large-scale document images for document AI tasks. - Document Image Classification, Document Layout Analysis, Table Detection
97 DPR Dense Passage Retrieval A transformer-based model for dense passage retrieval, enabling efficient and accurate retrieval of relevant passages. Passage Retrieval, Document Ranking -
98 GLPN Global-Local Path Networks for depth estimation A transformer-based model that combines a global transformer encoder with a local decoder path for monocular depth estimation. - Monocular Depth Estimation
99 LeViT Vision transformer for fast inference A hybrid convolution/transformer vision model designed for a better speed-accuracy trade-off at inference time. - Image Classification
100 NAT Neighborhood Attention Transformer A hierarchical vision transformer that restricts self-attention to each token's nearest neighbors. - Image Classification, Object Detection, Semantic Segmentation
101 TAPEX Table pre-training via execution A BART-based model that learns table reasoning by pre-training on synthetic SQL execution, enabling question answering over tables. Table Question Answering, Table Fact Verification -
102 VideoMAE Video Masked Autoencoder A masked-autoencoder approach for self-supervised video pre-training that reconstructs masked space-time patches. - Video Classification, Action Recognition
103 Wav2Vec2-Conformer Conformer-based variant of Wav2Vec2 A variant of Wav2Vec2 that incorporates Conformer architecture, improving its performance on speech-related tasks. Speech Recognition, Speech Representation Learning -
104 CLIP Image-text matching and zero-shot learning A transformer-based model that learns to match images and text, enabling zero-shot learning capabilities. - Image-Text Matching, Zero-Shot Learning
105 XLS-R Cross-lingual Speech Recognition A transformer-based model for cross-lingual speech recognition, trained on multilingual speech data. Cross-lingual Speech Recognition -
106 Audio Spectrogram Transformer Processing audio spectrograms A transformer model specifically designed for processing audio spectrograms. Automatic Speech Recognition (ASR), Sound Classification -
107 M2M100 Many-to-many multilingual translation A transformer-based model that translates directly between any pair of 100 languages without pivoting through English. Machine Translation -
108 MEGA Moving average Equipped Gated Attention A transformer variant that augments attention with an exponential moving average, handling long sequences efficiently. Language Modeling, Long-sequence Modeling -
109 BEiT BERT pre-training of image transformers A vision transformer pre-trained with BERT-style masked image modeling over discrete visual tokens. - Image Classification, Semantic Segmentation
110 BigBird-Pegasus Text generation and summarization A variant of the Pegasus model that incorporates the BigBird sparse attention mechanism. Text Generation, Summarization -
111 BigBird-RoBERTa Classification and named entity recognition A variant of the RoBERTa model that incorporates the BigBird sparse attention mechanism. Classification, Named Entity Recognition (NER) -
112 CLIPSeg Prompt-based image segmentation A CLIP-based model that produces image segmentations from free-text or image prompts. - Zero-Shot Image Segmentation
113 DPT Dense Prediction Transformer A vision transformer architecture for dense, pixel-level prediction tasks. - Monocular Depth Estimation, Semantic Segmentation
114 Perceiver IO Perceiver with Input/output processing A transformer model architecture that handles input and output processing jointly, enabling cross-modal tasks. Multimodal Tasks -
115 Reformer Memory-efficient Transformer A transformer model variant designed to be more memory-efficient by using reversible layers. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
116 RoFormer Transformer with rotary position embeddings A transformer variant that encodes positions with rotary embeddings (RoPE), improving the handling of relative positions in text. Text Classification, Named Entity Recognition (NER), Question Answering (QA) -
117 Swin Transformer Shifted Window Transformer A transformer model that uses shifted windows to capture long-range dependencies in images. - Image Classification, Object Detection
118 TrOCR Transformer-based OCR model A transformer-based model designed for Optical Character Recognition (OCR) tasks, converting images to text. Optical Character Recognition (OCR) -
119 Wav2Vec2Phoneme Phoneme-level variants of Wav2Vec2 Phoneme-level variants of Wav2Vec2 designed for speech recognition tasks at the phoneme level. Phoneme-level Speech Recognition -
120 X-CLIP Video-text contrastive learning A CLIP extension for video that aligns video and text representations. - Video-Text Retrieval, Zero-Shot Action Recognition
121 XLSR-Wav2Vec2 Cross-lingual Speech Representation A variant of Wav2Vec2 trained for cross-lingual speech representation learning. Cross-lingual Speech Representation -
122 Blenderbot Conversational AI chatbot A chatbot model designed for multi-turn conversations that combines language and dialogue understanding. Conversational AI, Dialogue Generation -
123 BlenderbotSmall Conversational AI chatbot A smaller version of Blenderbot, designed for multi-turn conversations with language and dialogue understanding capabilities. Conversational AI, Dialogue Generation -
124 BLIP Image classification and image captioning A transformer-based model for image classification and image captioning tasks. - Image Classification, Image Captioning
125 ByT5 Tokenizer-free byte-level T5 A byte-level variant of T5 that operates directly on raw UTF-8 bytes, removing the need for a tokenizer. Translation, Text Classification, Question Answering (QA) -
126 CvT Convolutional vision Transformer A vision transformer that introduces convolutions into token embedding and projection, improving efficiency and accuracy. - Image Classification
127 DeBERTa-v2 Improved version of DeBERTa An updated version of DeBERTa with improved performance and compatibility for various NLP tasks. Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
128 DeiT Data-efficient image transformer A vision transformer trained with knowledge distillation through a dedicated distillation token, needing far less data than the original ViT. - Image Classification
129 GroupViT Semantic segmentation from text supervision A vision transformer that learns to group image regions using only text supervision, enabling zero-shot semantic segmentation. - Zero-Shot Semantic Segmentation
130 LayoutLMv2 Improved version of LayoutLM for document analysis An enhanced version of LayoutLM with improved performance and additional capabilities for document analysis. Document Understanding, OCR, Named Entity Recognition (NER) -
131 MaskFormer Mask classification for segmentation A transformer-based model that unifies semantic and instance segmentation as a single mask-classification problem. - Semantic Segmentation, Instance Segmentation, Panoptic Segmentation
132 SegFormer Segmentation Transformer for computer vision A transformer-based model designed for image segmentation tasks in computer vision. - Semantic Segmentation
133 Time Series Transformer Transformer model for time series data A transformer-based model designed specifically for time series data analysis and forecasting tasks. Time Series Forecasting, Anomaly Detection, Sequence Modeling -
134 TimeSformer Space-time attention for video A video transformer that factorizes self-attention across space and time for video understanding. - Video Action Recognition, Video Classification
135 Trajectory Transformer Reinforcement learning as sequence modeling A transformer that models reinforcement-learning trajectories (states, actions, rewards) as sequences, supporting planning and offline RL. Offline Reinforcement Learning, Trajectory Prediction -
136 UniSpeech Unified speech representation learning A transformer-based model that combines supervised and self-supervised objectives to learn unified speech representations. Speech Recognition, Speech Representation Learning -
137 UniSpeechSat Speaker-aware pre-training for UniSpeech A UniSpeech variant with speaker-aware self-supervised pre-training, improving speaker-related tasks. Speech Recognition, Speaker Verification -
138 ALIGN Large-scale image-text alignment A dual-encoder model that aligns image and text representations through contrastive training on noisy web-scale image-text pairs. - Image-Text Matching, Zero-Shot Image Classification
139 BORT Optimal subarchitecture of BERT A highly compressed BERT variant obtained by extracting an optimal subset of its architectural parameters. Text Classification, Question Answering (QA) -
140 DePlot Plot-to-table translation A model that translates charts and plots into structured tables so that language models can reason over them. Chart Question Answering, Visual Language Reasoning -
141 DETA Detection Transformers with Assignment A DETR-style object detector that replaces one-to-one bipartite matching with traditional one-to-many label assignment. - Object Detection
142 DiNAT Dilated Neighborhood Attention Transformer A hierarchical vision transformer that combines local neighborhood attention with dilated variants to capture global context. - Image Classification, Object Detection, Semantic Segmentation
143 Jukebox Music generation with transformers A transformer-based model architecture for generating music with various styles and genres. Music Generation -
144 mBART-50 mBART extended to 50 languages An extension of mBART fine-tuned for multilingual machine translation across 50 languages. Machine Translation, Multilingual Language Modeling -
145 Nyströmformer Approximating Full Transformers with Nyström A transformer variant that approximates full self-attention using the Nyström method for efficiency. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
146 ViT Hybrid Hybrid Architecture of Vision Transformer A hybrid architecture that combines vision transformer with convolutional neural networks for image understanding. - Image Classification, Object Detection, Semantic Segmentation
147 X-MOD Modular multilingual pre-training A multilingual transformer that pre-trains language-specific modular components, mitigating the curse of multilinguality. Cross-lingual Language Understanding -
148 BARTpho Pre-trained sequence-to-sequence model for Vietnamese A BART variant pre-trained on Vietnamese text for generative NLP tasks. Text Summarization, Text Generation -
149 BridgeTower Vision-language representation learning A vision-language model that bridges the layers of uni-modal encoders with cross-modal layers at multiple depths. - Vision-Language Tasks, Image-Text Retrieval, Visual Question Answering (VQA)
150 CodeGen Code generation A transformer-based model for generating code. Code Generation -
151 GPT-J Open GPT-3-style language model A 6-billion-parameter autoregressive language model from EleutherAI, trained in JAX on the Pile. Text Generation, Language Modeling -
152 LLaMA Open foundation language models A family of efficient foundation language models from Meta AI, trained on publicly available data and strong at few-shot tasks. Text Generation, Language Modeling -
153 MarkupLM Transformer for markup-language documents A transformer-based model that combines text with markup (HTML/XML) structure for webpage and document understanding. Webpage Question Answering, Document Structure Understanding -
154 PoolFormer Pooling-based MetaFormer A MetaFormer architecture that replaces attention with simple pooling, showing that the overall transformer structure drives much of the performance. - Image Classification
155 QDQBert Quantized BERT with Q/DQ nodes A BERT variant that inserts quantize/dequantize (QDQ) operations for 8-bit integer quantization-aware training and inference. Text Classification, Question Answering (QA) -
156 ViLT Vision-and-Language Transformer A transformer-based model that combines vision and language understanding for multimodal tasks. - Vision-Language Tasks, Image Captioning, Visual Question Answering (VQA)
157 BARThez Text generation and summarization A variant of BART model trained specifically for the French language. Text Generation, Summarization -
158 Donut OCR-free document understanding A transformer-based encoder-decoder (Document Understanding Transformer) that parses document images directly, without an external OCR engine. - Document Parsing, Document Classification, Document Visual Question Answering
159 ImageGPT Autoregressive image generation A GPT-style transformer trained to predict pixels autoregressively, enabling image generation and representation learning. - Image Generation, Image Representation Learning
160 OPT Open Pre-trained Transformer An open suite of decoder-only language models from Meta AI, released as a reproducible alternative to GPT-3. Text Generation, Language Modeling -
161 Splinter Few-shot question answering A transformer pre-trained with a recurring-span-selection objective, designed for few-shot extractive question answering. Question Answering (QA) -
162 XGLM Cross-lingual Language Modeling A transformer-based model for cross-lingual language modeling, learning representations across languages. Cross-lingual Language Modeling -
163 YOSO You Only Sample (Almost) Once An efficient transformer that approximates self-attention via locality-sensitive-hashing-based Bernoulli sampling. Text Classification, Long-sequence Modeling -
164 EfficientFormer Efficient vision transformer for mobile A vision transformer redesigned to run at mobile-level latency while keeping high accuracy. - Image Classification, Object Detection, Semantic Segmentation
165 ESM Protein structure prediction A transformer-based model for predicting the 3D structure of proteins from their amino acid sequences. Protein Structure Prediction, Bioinformatics -
166 Mask2Former Masked-attention mask transformer A universal image segmentation model that uses masked attention to handle semantic, instance, and panoptic segmentation in one architecture. - Semantic Segmentation, Instance Segmentation, Panoptic Segmentation
167 MGP-STR Multi-Granularity Prediction for scene text recognition A vision-transformer-based scene text recognition model that fuses character-, subword-, and word-level predictions. - Scene Text Recognition, Optical Character Recognition (OCR)
168 NLLB No Language Left Behind A massively multilingual machine translation model covering roughly 200 languages, with a focus on low-resource languages. Machine Translation -
169 T5v1.1 Version 1.1 of the Text-to-Text Transfer Transformer An updated version of the T5 model with improvements and enhancements for better performance. Text Classification, Named Entity Recognition (NER), Sentiment Analysis, Question Answering (QA) -
170 TVLT Textless Vision-Language Transformer A vision-and-language transformer that learns directly from raw video frames and audio, without any text input. - Vision-Language Tasks, Audio-Visual Learning
171 WavLM Self-supervised speech pre-training A transformer-based model pre-trained on large unlabeled speech corpora, performing well across recognition, speaker, and diarization tasks. Speech Recognition, Speaker Verification, Speech Representation Learning -
172 XLM-RoBERTa-XL Cross-lingual Language Representation A larger variant of XLM-RoBERTa for cross-lingual language representation learning. Cross-lingual Language Representation -
173 Chinese-CLIP Chinese language processing and image-text matching A transformer-based model designed for Chinese language processing and image-text matching tasks. Chinese Language Processing, Image-Text Matching -
174 CLAP Contrastive Language-Audio Pretraining A transformer-based model that learns joint audio-text representations through contrastive training, enabling zero-shot audio classification. Audio-Text Matching, Zero-Shot Audio Classification -
175 Decision Transformer Reinforcement learning as sequence modeling A transformer that casts reinforcement learning as conditional sequence modeling, predicting actions from desired returns and past states. Offline Reinforcement Learning, Decision-Making -
176 BLIP-2 Efficient vision-language pre-training A vision-language model that bootstraps from frozen image encoders and frozen large language models via a lightweight querying transformer. - Image Captioning, Visual Question Answering (VQA), Image-Text Retrieval
177 CANINE Tokenization-free character-level encoder A transformer encoder that operates directly on Unicode characters, removing the need for explicit tokenization. Text Classification, Multilingual NLP -
178 Graphormer Graph representation learning with transformers A transformer-based model architecture specifically designed for graph representation learning. Graph Representation Learning, Node Classification, Graph Classification, Graph Generation -
179 I-BERT Integer-only quantized BERT A quantized BERT variant that runs inference with integer-only arithmetic, speeding up deployment. Text Classification, Named Entity Recognition (NER), Question Answering (QA) -
180 MatCha Math reasoning and chart derendering pre-training A Pix2Struct-based model pre-trained on math reasoning and chart derendering, strengthening visual language understanding of charts. Chart Question Answering, Visual Language Reasoning -
181 mLUKE Multilingual Language Understanding with Knowledge A multilingual model that incorporates knowledge-based entities for language understanding tasks. Named Entity Recognition (NER), Relation Extraction, Knowledge Graph Completion -
182 MobileViT Vision Transformer for Mobile and Edge Devices A mobile-friendly version of Vision Transformer, optimized for efficient deployment on mobile and edge devices. - Image Classification, Object Detection, Semantic Segmentation
183 OWL-ViT Open-vocabulary object detection A vision transformer for open-world localization that detects objects described by free-text queries. - Zero-Shot Object Detection, Open-Vocabulary Object Detection
184 SpeechT5 Unified speech and text pre-training A T5-style encoder-decoder that unifies speech and text modalities, supporting recognition, synthesis, and voice conversion. Speech-to-Text Conversion, Text-to-Speech Synthesis, Voice Conversion -
185 Swin Transformer V2 Advanced version of Swin Transformer An advanced version of the Swin Transformer model, incorporating improvements for better performance in vision tasks. - Image Classification, Object Detection, Semantic Segmentation
186 ViTMAE Masked autoencoder pre-training for ViT A self-supervised vision transformer that reconstructs heavily masked image patches, learning strong representations for downstream tasks. - Self-supervised Pre-training, Image Classification
187 BLOOM Open multilingual language model An open-access multilingual language model from the BigScience collaboration, designed for text generation across dozens of languages. Text Generation, Language Modeling -
188 ConvNeXTV2 ConvNeXt co-designed with masked autoencoders An improved ConvNeXt that adds a fully convolutional masked-autoencoder pre-training scheme. - Image Classification, Object Detection, Semantic Segmentation
189 CPM-Ant Chinese language processing and text generation An enhanced version of CPM with better performance and compatibility for Chinese language processing and text generation tasks. Chinese Language Processing, Text Generation -
190 GPT-Sw3 Swedish language variant of GPT A version of GPT specifically designed and trained for Swedish language understanding and generation tasks. Swedish Language Processing, Text Generation -
191 LongT5 T5 for long input sequences An extension of T5 with efficient local and transient-global attention for long-document tasks. Long-document Summarization, Question Answering (QA) -
192 OneFormer Universal image segmentation A transformer-based model trained once with a task-conditioned design to handle semantic, instance, and panoptic segmentation. - Semantic Segmentation, Instance Segmentation, Panoptic Segmentation
193 Table Transformer Transformer model for table-related tasks A transformer-based model specifically designed for table-related tasks, such as table understanding and extraction. Table Understanding, Table Extraction -
194 VAN Visual Attention Network A convolutional architecture built around large-kernel attention for visual tasks. - Image Classification, Object Detection, Semantic Segmentation
195 AltCLIP Multilingual CLIP with an altered text encoder A CLIP variant that swaps the text encoder for a multilingual encoder, extending image-text matching beyond English. - Image-Text Matching, Vision-Language Tasks
196 MVP Multi-task supervised pre-training for generation A transformer pre-trained with supervised multi-task data for natural language generation. Text Generation, Summarization, Data-to-Text Generation -
197 NLLB-MOE NLLB with Mixture of Experts A Mixture-of-Experts variant of NLLB for higher-quality massively multilingual machine translation. Machine Translation -
198 PEGASUS-X Pegasus extended for long inputs A Pegasus variant with efficient attention mechanisms for abstractive summarization of long documents. Long-document Summarization -
199 Swin2SR Swin Transformer for Super-Resolution A variant of the Swin Transformer model specifically designed for super-resolution tasks in computer vision. - Super-Resolution Image Reconstruction
200 UL2 Unifying Language Learning paradigms A transformer pre-trained with a mixture-of-denoisers objective that unifies span corruption and prefix language modeling. Language Modeling, Text Generation, Text Understanding -
201 ViTMSN Masked Siamese Networks for ViT A self-supervised vision transformer that matches representations of masked and unmasked views, performing well in low-shot classification. - Self-supervised Pre-training, Image Classification
202 YOLOS You Only Look at One Sequence An object detection model that applies a plain vision transformer to a sequence of patches with minimal detection-specific additions. - Object Detection
203 FLAN-T5 Instruction-finetuned T5 A T5 model fine-tuned on a large collection of instruction-phrased tasks, improving zero-shot and few-shot performance. Instruction Following, Text Classification, Question Answering (QA), Machine Translation -
204 GPT NeoX Japanese Japanese language variant of GPT NeoX A version of GPT NeoX specifically designed and trained for Japanese language understanding and generation tasks. Japanese Language Processing, Text Generation -
205 LayoutLMv3 Further improved version of LayoutLM for documents An advanced version of LayoutLM that incorporates additional enhancements and optimizations. Document Understanding, OCR, Named Entity Recognition (NER) -
206 FLAN-UL2 Instruction-finetuned UL2 A UL2 model fine-tuned on the Flan instruction collection, improving zero-shot and few-shot performance. Instruction Following, Text Classification, Question Answering (QA), Machine Translation -
207 FLAVA Foundational Language And Vision Alignment A single model that targets vision, language, and vision-language tasks together, trained with unimodal and multimodal objectives. - Vision-Language Tasks, Image Classification, Image-Text Retrieval
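
Most of the architectures above ship with ready-made checkpoints on the Hugging Face Hub. As a minimal sketch (the checkpoint name is one real public example, and the image path is a placeholder you must supply), loading the Vision Transformer from row 93 of the table takes only a few lines:

```python
# Minimal sketch: run an architecture from the table via Hugging Face.
# "google/vit-base-patch16-224" is one public ViT checkpoint; swap in
# whichever model from the table fits your task.
from transformers import pipeline

vit = pipeline("image-classification", model="google/vit-base-patch16-224")
preds = vit("path/to/your_image.jpg")  # placeholder path to a local image
print(preds[:3])  # top predicted labels with scores
```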

Conclusion

The purpose of this article is to give you a general understanding of the capabilities of the Transformer architecture. It is now up to you to decide which architecture is most suitable for your needs, based on the task in front of you. Afterwards, you can check Hugging Face or TensorFlow Hub to see whether there are already models trained using these architectures. The chances are high that you will be able to complete your work using zero-shot transfer learning.