Getting Started with Transformers

Natural Language Processing (NLP) is a branch of computer science and artificial intelligence that focuses on how computers and human language interact. It involves building models and algorithms that can analyze, understand, and produce human language.

NLP is applied to a wide range of problems, including language translation, sentiment analysis, text summarization, speech recognition, and question-answering systems. As the amount of digital text data continues to grow exponentially, and with it the need to extract insights and knowledge from that data, these applications have become increasingly important.

What are Transformers in NLP?

Transformers are a neural network architecture that has become dominant in NLP because of its ability to model long-range dependencies in text data. They were introduced by Vaswani et al. in a foundational 2017 paper and have since gained wide popularity. Conventional NLP models such as recurrent neural networks (RNNs) process the input sequentially, relying on a hidden state to carry information across time steps. This approach can struggle to capture dependencies between distant positions in the sequence, leading to poor performance on tasks that require long-term context.

Example: The Limits of an RNN

Take the following phrase from a movie review, for instance: "The film was good, but not great."

The reviewer's opinion is mixed: the movie was decent, but not excellent. Depending on which part of the sentence it focuses on, an RNN may fail to capture this nuance and predict the sentiment as simply positive or negative. If the model weighs only the word "good", it may predict a positive sentiment; if it weighs only the phrase "not great", it may predict a negative one. More sophisticated models, such as transformers, were proposed to address this problem and have proven highly successful on sentiment analysis and other natural language processing tasks.
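The recurrence that makes this difficult can be sketched in a few lines of NumPy. The weights and token embeddings below are random placeholders rather than a trained model; the point is that the entire sentence must be funneled through a single fixed-size hidden state, one token at a time.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x):
    # One RNN step: the new hidden state mixes the previous
    # hidden state with the current token's embedding.
    return np.tanh(W_h @ h + W_x @ x)

rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden weights (placeholder)
W_x = rng.normal(size=(4, 3)) * 0.1   # input-to-hidden weights (placeholder)

# Toy embeddings for the tokens of "The film was good, but not great"
tokens = rng.normal(size=(7, 3))

h = np.zeros(4)                       # initial hidden state
for x in tokens:
    h = rnn_step(h, x, W_h, W_x)      # information flows one step at a time

print(h.shape)  # a single 4-dim vector must summarize the whole sentence
```

By the time the final hidden state is produced, the contribution of "good" has passed through several updates, which is why early context can fade.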

Transformers, by contrast, use a self-attention mechanism that computes a weighted sum of the input sequence at every position, enabling them to capture dependencies across the entire sequence. This makes them well suited to tasks like language translation, where it is crucial to understand the context of the complete sentence before producing a translation. Transformers are also highly parallelizable while still modeling long-range dependencies, which makes them ideal for training on huge datasets in distributed computing systems. This has allowed researchers to train larger, more complex models, significantly improving performance on a wide variety of NLP tasks.
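The weighted sum described above can be sketched in plain NumPy. For brevity this sketch omits the learned query, key, and value projections of a real transformer and uses the raw embeddings directly; the pairwise scores and the row-wise softmax are what matter here.

```python
import numpy as np

def self_attention(X):
    # Scaled dot-product self-attention (identity projections):
    # every position attends to every other position in one step.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ X, weights                      # weighted sum of the sequence

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))          # 5 tokens, 8-dim embeddings
out, weights = self_attention(X)

print(out.shape)                     # (5, 8): one context vector per token
print(weights.sum(axis=-1))          # each row of weights sums to 1
```

Unlike the RNN's step-by-step recurrence, every output vector here is computed from the whole sequence at once, which is also what makes the computation parallelizable.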

Several different types of transformers are frequently used in natural language processing (NLP). Four of the most significant categories are listed below, each with an example −

  • Encoder-only transformers

  • Decoder-only transformers

  • Encoder-Decoder transformers

  • Dynamic convolution transformers

Encoder-Only Transformers

Encoder-only transformers have only an encoder component and no decoder. These models are commonly used for tasks where the input sequence is processed and classified as a whole, such as sentence classification and named entity recognition. BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly Optimized BERT Pretraining Approach) are two examples of encoder-only transformers.
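As a rough sketch of how an encoder-only model feeds a classifier, the snippet below pools the encoder's token representations into a single vector and projects it to class logits. All numbers are random placeholders (a real BERT head typically uses the [CLS] token and trained weights); only the data flow is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are the encoder's contextual token representations
# for a 6-token sentence (in BERT the hidden size would be 768).
encoder_output = rng.normal(size=(6, 16))

# Classification head: pool the sequence into one vector, then
# project it to two logits (e.g. positive / negative).
pooled = encoder_output.mean(axis=0)           # simple mean pooling
W, b = rng.normal(size=(2, 16)), np.zeros(2)   # untrained head, illustration only
logits = W @ pooled + b

probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the 2 classes
print(probs.shape)                             # (2,): one probability per class
```

The key property is that the whole input is encoded bidirectionally and then classified as a single unit, rather than generated token by token.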

Decoder-Only Transformers

Decoder-only transformers have only a decoder component and no encoder. These models are commonly used for autoregressive tasks such as language modeling and text generation, where the model must produce an output sequence one token at a time conditioned on what came before. GPT (Generative Pre-trained Transformer) and GPT-2 are two examples of decoder-only transformers.
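The defining trait of a decoder-only model is its causal attention mask: position i may attend only to positions up to i, which is what permits left-to-right generation. The sketch below, using uniform dummy scores, shows how setting disallowed positions to -inf before the softmax zeroes out their attention weights.

```python
import numpy as np

seq_len = 4

# An encoder may attend everywhere; a decoder-only model uses a
# causal (lower-triangular) mask so position i sees only 0..i.
encoder_mask = np.ones((seq_len, seq_len), dtype=bool)
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Dummy attention scores (all zeros, i.e. uniform before masking).
scores = np.zeros((seq_len, seq_len))

# Masked positions get -inf, so exp(-inf) = 0 after the softmax.
masked = np.where(causal_mask, scores, -np.inf)
weights = np.exp(masked) / np.exp(masked).sum(axis=-1, keepdims=True)

print(weights)  # row i is uniform over positions 0..i, zero elsewhere
```

Row 0 attends only to itself; row 3 attends to all four positions. During generation, this guarantees that a token's prediction never depends on tokens that have not been produced yet.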

Encoder-Decoder Transformers

Encoder-decoder transformers have both an encoder and a decoder component. These models are frequently used in applications like machine translation, where the model must first encode an input sequence into an intermediate representation and then decode that representation into an output sequence. The original transformer model proposed by Vaswani et al. and the more recent T5 (Text-to-Text Transfer Transformer) are two examples of encoder-decoder transformers.
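The decode step described above relies on cross-attention: the decoder's queries attend over the full encoder output. The NumPy sketch below uses random placeholder vectors and shows only the shapes involved, producing one source-aware context vector per target token.

```python
import numpy as np

def attention(Q, K, V):
    # Generic scaled dot-product attention.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)

# Encoder output for a 7-token source sentence, and decoder states
# for the 3 target tokens produced so far (placeholder values).
encoder_out = rng.normal(size=(7, 8))
decoder_states = rng.normal(size=(3, 8))

# Cross-attention: decoder queries attend over the full encoder output,
# so every target position can consult the whole source sentence.
context = attention(decoder_states, encoder_out, encoder_out)

print(context.shape)  # (3, 8): one source-aware vector per target token
```

This is what lets the decoder condition each translated word on the entire source sentence rather than on a single compressed summary of it.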

Dynamic Convolution Transformers

Dynamic convolution transformers are a relatively new class of models that replace the conventional self-attention mechanism with dynamic convolutions, whose kernels are predicted from the current input rather than fixed. These models aim to capture long-range dependencies in text data while preserving the computational efficiency of conventional transformers. The lightweight and dynamic convolution models introduced by Wu et al. (2019) are examples of this approach; the Longformer, by contrast, achieves similar efficiency with a sparse sliding-window attention pattern rather than convolutions.

The best type of transformer to choose depends on the specific NLP task at hand, as each has its own strengths and weaknesses. All of them, however, are effective tools for modeling natural language data and have driven significant progress in NLP.

Libraries Used and Examples

Hugging Face Transformers

Hugging Face is a company that builds tools and libraries for natural language processing, including the popular open-source Transformers library. The library provides pre-trained models and pipelines for a variety of NLP tasks, making state-of-the-art models easy for researchers and practitioners to use in their own work.

Here is an example of using the Transformers library to run sentiment analysis on a piece of text with a pre-trained BERT model −


from transformers import pipeline

# Create a sentiment analysis pipeline backed by a pre-trained BERT model.
# Note: bert-base-uncased ships without a fine-tuned classification head,
# so the pipeline returns generic labels such as LABEL_0 / LABEL_1.
classifier = pipeline("sentiment-analysis", model="bert-base-uncased")

# Classify the sentiment of a piece of text
text = "I really enjoyed the movie"
result = classifier(text)[0]

# Print the result
print(f"Text: {text}")
print(f"Sentiment: {result['label']}")
print(f"Score: {result['score']}")


Text: I really enjoyed the movie
Sentiment: LABEL_1
Score: 0.7276840806007385

In this example, we use the pipeline function from the Transformers library with the pre-trained bert-base-uncased model to build a sentiment analysis pipeline. The pipeline is given a text sample and returns a dictionary containing the predicted label and a confidence score, which we then print. Note that because bert-base-uncased is not fine-tuned for sentiment analysis, its classification head is randomly initialized and the labels are generic (LABEL_0 and LABEL_1), as the output above shows; for meaningful positive/negative predictions, use a model fine-tuned on a sentiment dataset, such as distilbert-base-uncased-finetuned-sst-2-english.

Note that this is just one example; the Transformers library offers many other pre-trained models and NLP tasks that can be used in a similar way.

  • The Illustrated Transformer − Developed by Jay Alammar, The Illustrated Transformer is a visual explanation of the inner workings of transformers. It walks through each step of how transformers operate, illustrating the process with diagrams and snippets of code.

  • The Annotated Transformer − The Annotated Transformer is a line-by-line walkthrough of Vaswani et al.'s original transformer paper, complete with working PyTorch code. It offers a deeper understanding of the mathematical ideas underlying transformers and is an excellent starting point for anyone wishing to build a transformer model from scratch.


To sum up, transformers are a powerful technique for processing natural language and are being used in a growing number of applications. There are many resources available to help you get started, whether you want to fine-tune a pre-trained model on your own data or build a transformer from scratch. We hope this article has served as a useful introduction to the world of transformers, and we encourage you to explore further and see what they can do for you.

Updated on: 07-Aug-2023

