Overview
Encoder-decoder models combine both encoder and decoder components of the Transformer architecture. These models, exemplified by T5 (Text-to-Text Transfer Transformer), BART, and others, excel at sequence-to-sequence tasks like translation and summarization.
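To make this concrete, here is a minimal sketch of running a pretrained encoder-decoder model for summarization. It assumes the Hugging Face `transformers` library and the `t5-small` checkpoint; the checkpoint name and generation settings are illustrative rather than prescriptive.

```python
# Minimal sketch: summarization with a pretrained encoder-decoder model.
# Assumes the Hugging Face `transformers` library and the `t5-small` checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

article = "The Transformer architecture relies on attention to model sequences ..."
# T5 frames every task as text-to-text, so the task is named in the input prefix.
inputs = tokenizer("summarize: " + article, return_tensors="pt")

# The encoder reads the full input once; the decoder generates the summary token by token.
summary_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```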
Key characteristics
- Sequence-to-Sequence: Process an input sequence and generate an output sequence of potentially different length
- Cross-Attention: Decoder attends to encoder outputs to condition on the input sequence (see the sketch after this list)
- Versatile Tasks: Can handle various tasks by framing them as text-to-text problems
- Bidirectional Encoder: Encoder processes input in both directions for full context understanding
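The last two points can be illustrated in isolation: cross-attention is simply attention in which the decoder's hidden states act as queries and the encoder's outputs act as keys and values, with no causal mask. A minimal PyTorch sketch, with illustrative dimensions:

```python
# Sketch of cross-attention: decoder positions (queries) attend to encoder
# outputs (keys/values). Dimensions are illustrative.
import torch
import torch.nn as nn

d_model, n_heads = 64, 4
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

encoder_out = torch.randn(1, 10, d_model)    # 10 source tokens, encoded bidirectionally
decoder_states = torch.randn(1, 4, d_model)  # 4 target tokens generated so far

# No causal mask here: every decoder position may look at every encoder position.
context, attn_weights = cross_attn(query=decoder_states, key=encoder_out, value=encoder_out)
print(context.shape)       # (1, 4, 64): one context vector per decoder position
print(attn_weights.shape)  # (1, 4, 10): attention over the 10 source positions
```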
Popular models
- T5 (Text-to-Text Transfer Transformer): Frames all NLP tasks as text generation
- BART (Bidirectional and Auto-Regressive Transformers): Combines bidirectional encoding with autoregressive decoding
- mBART: Multilingual variant of BART for machine translation
- Pegasus: Pre-trained for abstractive summarization
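Despite differing pretraining objectives, these families share the same sequence-to-sequence interface. The sketch below assumes the Hugging Face `transformers` library; the Hub checkpoint names are the commonly published ones and may differ in your environment.

```python
# Sketch: the model families above share one seq2seq interface in Hugging Face
# `transformers`. Checkpoint names are the commonly published Hub identifiers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoints = {
    "T5": "t5-small",
    "BART": "facebook/bart-base",
    "mBART": "facebook/mbart-large-50-many-to-many-mmt",
    "Pegasus": "google/pegasus-xsum",
}

for name, ckpt in checkpoints.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)
    print(f"{name}: type={model.config.model_type}, parameters={model.num_parameters():,}")
```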
Core steps
- Encoder Processing: Input sequence is processed bidirectionally through encoder layers
- Cross-Attention: Decoder layers attend to encoder outputs to condition on input
- Autoregressive Generation: Decoder generates output tokens one at a time
- Output Projection: Final representations are projected to vocabulary logits
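The steps above can be made explicit with a hand-rolled greedy decoding loop. This is a sketch assuming a Hugging Face seq2seq checkpoint (`t5-small`); in practice `model.generate` wraps the same logic with caching, beam search, and sampling.

```python
# Sketch of the core loop: encode once, then decode autoregressively.
# Assumes a Hugging Face seq2seq checkpoint; `model.generate` wraps this logic.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

enc = tokenizer("translate English to German: The house is small.", return_tensors="pt")

# 1. Encoder processing: the input is encoded once, with bidirectional attention.
encoder_outputs = model.get_encoder()(**enc)

# 2-4. The decoder extends the output one token at a time, cross-attending to the
# encoder outputs; its final hidden states are projected to vocabulary logits.
decoder_ids = torch.tensor([[model.config.decoder_start_token_id]])
for _ in range(20):
    out = model(encoder_outputs=encoder_outputs,
                attention_mask=enc["attention_mask"],
                decoder_input_ids=decoder_ids)
    next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy choice
    decoder_ids = torch.cat([decoder_ids, next_id], dim=-1)
    if next_id.item() == model.config.eos_token_id:
        break

print(tokenizer.decode(decoder_ids[0], skip_special_tokens=True))
```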
Encoder-decoder trade-offs
- Input understanding: The encoder sees the full input bidirectionally, whereas decoder-only models are limited to causal (left-to-right) context
- Generation: Autoregressive in both architectures
- Pretraining efficiency: Typically lower than decoder-only models, which compute a loss at every position of a single stream, whereas the encoder-decoder loss covers only target tokens
- Fine-tuning flexibility: Traditionally fine-tuned per task, whereas decoder-only models are more often instruction-tuned across many tasks at once
Training objectives
- Prefix LM: Predict the continuation of a given prefix
- Infilling: Reconstruct masked spans of the input, as in T5's span-corruption objective (see the sketch after this list)
- Seq2seq LM: Map a complete input sequence to a target output sequence, the standard encoder-decoder training setup
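As an illustration of the infilling objective, here is a sketch of a T5-style span-corruption example: masked spans are replaced by sentinel tokens in the input, and the target reconstructs the spans. Passing `labels` yields the standard seq2seq cross-entropy loss; the sentinel conventions follow the T5 tokenizer, and the sentence is purely illustrative.

```python
# Sketch: T5-style span corruption (infilling) trained with the seq2seq LM loss.
# Sentinel tokens (<extra_id_0>, <extra_id_1>, ...) mark the masked spans.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Original sentence: "Thank you for inviting me to your party last week."
corrupted_input = "Thank you <extra_id_0> me to your party <extra_id_1> week."
target = "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"

inputs = tokenizer(corrupted_input, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Teacher forcing: passing `labels` shifts them for the decoder and computes the
# cross-entropy loss over the vocabulary logits at each target position.
loss = model(**inputs, labels=labels).loss
print(loss.item())
```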
Architecture overview
Encoder-decoder models consist of an encoder that processes the input sequence bidirectionally, and a decoder that generates the output autoregressively while attending to the encoder's representations.
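A minimal PyTorch sketch of this wiring, using `torch.nn.Transformer` with illustrative sizes (positional encodings and the training loop are omitted for brevity):

```python
# Minimal PyTorch sketch of the wiring: bidirectional encoder, causally masked
# decoder with cross-attention, and a linear projection to vocabulary logits.
# Sizes are illustrative; positional encodings and training are omitted.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2,
                             batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

src = torch.randint(0, vocab_size, (1, 10))  # input sequence (10 tokens)
tgt = torch.randint(0, vocab_size, (1, 6))   # target prefix (6 tokens)

# Encoder: no mask, so every input position attends to every other (bidirectional).
# Decoder: a causal mask restricts each target position to earlier positions,
# while cross-attention lets every decoder position see the full encoder memory.
causal_mask = transformer.generate_square_subsequent_mask(tgt.size(1))
hidden = transformer(embed(src), embed(tgt), tgt_mask=causal_mask)

logits = lm_head(hidden)  # (1, 6, vocab_size): next-token scores per target position
print(logits.shape)
```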
Applications
Sequence-to-Sequence Tasks
- Machine Translation: Translating text between languages
- Summarization: Generating concise summaries of documents
- Question Generation: Generating questions from given contexts
- Text Simplification: Rewriting text in simpler terms
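Many of these tasks reduce to choosing how the input text is framed. The sketch below shows T5-style task prefixes with the `t5-small` checkpoint; the translation and summarization prefixes come from T5's original multitask setup, and tasks such as question generation or simplification would follow the same pattern after task-specific fine-tuning.

```python
# Sketch: framing different tasks as text-to-text with input prefixes.
# The translation and summarization prefixes are the ones T5 was trained with;
# other tasks would follow the same pattern after task-specific fine-tuning.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

prompts = [
    "translate English to French: The weather is nice today.",
    "summarize: Encoder-decoder models pair a bidirectional encoder with an "
    "autoregressive decoder and are widely used for translation and summarization.",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```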
Generation Tasks
- Data-to-Text: Generating text from structured data or tables
- Code Generation: Generating code from natural language descriptions
- Dialogue Generation: Generating conversational responses
Industry applications
- Translation Services: Real-time and document translation across languages
- Content Summarization: News summarization, document condensation
- Data Extraction: Extracting structured information from unstructured text