LLM Foundations

Architecture Recipes

Explore the foundational architectures that power modern large language models, from transformers to sparse attention mechanisms.

Architecture Comparison

Trade-offs · Selection

Comparing encoder-only, decoder-only, and encoder-decoder architectures for different use cases.
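
The practical difference between the three families comes down to how attention is masked. A toy PyTorch sketch of the three patterns; function names here are illustrative, not from any library:

```python
import torch

def encoder_mask(n):
    """Encoder-only (BERT-style): every token attends to every token."""
    return torch.ones(n, n, dtype=torch.bool)

def decoder_mask(n):
    """Decoder-only (GPT-style): causal mask, token i attends only to tokens <= i."""
    return torch.tril(torch.ones(n, n, dtype=torch.bool))

def encoder_decoder_masks(n_src, n_tgt):
    """Encoder-decoder (T5-style): causal self-attention over the target,
    plus full cross-attention from every target token to every source token."""
    self_attn = torch.tril(torch.ones(n_tgt, n_tgt, dtype=torch.bool))
    cross_attn = torch.ones(n_tgt, n_src, dtype=torch.bool)
    return self_attn, cross_attn

print(decoder_mask(4))
```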

Decoder-only Models

GPT-style · Generation

GPT-style architectures focused on text generation.
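
A minimal generation example, assuming the Hugging Face transformers library and the public gpt2 checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The transformer architecture", return_tensors="pt")
# Autoregressive decoding: each step feeds the tokens generated so far
# back into the model to predict the next token.
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```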

Encoder-Decoder Models

T5-style · Seq2Seq

Encoder-decoder architectures for translation and other sequence-to-sequence tasks.
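
A short seq2seq sketch, assuming the Hugging Face transformers library and the t5-small checkpoint:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# T5 frames everything as text-to-text; the prefix selects the task.
inputs = tokenizer("translate English to German: The house is small.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```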

Encoder-only Models

BERT-style · Understanding

BERT-style architectures optimized for understanding tasks.
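
A sketch of extracting a sentence representation from an encoder, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Attention is all you need.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Every token sees full bidirectional context; the [CLS] embedding is a
# common sentence-level representation for downstream classifiers.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])
```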

KV Cache Optimization

Efficiency · Inference

Techniques for efficient key-value cache management to reduce memory and improve throughput.
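
A toy PyTorch sketch of the core idea: keys and values for past tokens are computed once and reused, so each decoding step only projects the newest token. Shapes and names are illustrative, not from any specific library:

```python
import torch

d_model, n_steps = 64, 5
w_q = torch.randn(d_model, d_model)
w_k = torch.randn(d_model, d_model)
w_v = torch.randn(d_model, d_model)

k_cache, v_cache = [], []
x = torch.randn(1, d_model)  # hidden state for the first token

for step in range(n_steps):
    q = x @ w_q
    # Only the new token's K/V are projected; past entries come from the cache.
    k_cache.append(x @ w_k)
    v_cache.append(x @ w_v)
    k = torch.cat(k_cache, dim=0)  # (step + 1, d_model)
    v = torch.cat(v_cache, dim=0)
    attn = torch.softmax(q @ k.T / d_model ** 0.5, dim=-1)
    x = attn @ v                   # next step's input, for the sketch
    print(f"step {step}: cache holds {k.shape[0]} key/value pairs")
```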

Mixture of Experts

Sparse · Efficiency

Models that route tokens to specialized expert subnetworks, scaling capacity without a proportional increase in compute per token.
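
A toy PyTorch sketch of top-k routing, the core of sparse MoE; layer sizes and the simple linear experts are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=32, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                     # x: (tokens, d_model)
        scores = self.gate(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e      # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(4, 32)).shape)  # torch.Size([4, 32])
```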

Sparse Attention Mechanisms

Efficiency · Optimization

Alternative attention patterns that reduce the quadratic cost of full self-attention.
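
A toy PyTorch sketch of one such pattern, a sliding window in which each query attends only to keys within a fixed radius; the window size and shapes are illustrative:

```python
import torch

def sliding_window_mask(n, window):
    """True where query i may attend to key j, i.e. |i - j| <= window."""
    i = torch.arange(n).unsqueeze(1)
    j = torch.arange(n).unsqueeze(0)
    return (i - j).abs() <= window

n, d, window = 8, 16, 2
q, k, v = (torch.randn(n, d) for _ in range(3))
scores = q @ k.T / d ** 0.5
# Disallowed positions get -inf so they vanish under softmax.
scores = scores.masked_fill(~sliding_window_mask(n, window), float("-inf"))
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)                            # torch.Size([8, 16])
print(sliding_window_mask(n, window).int())
```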

Transformer Architecture

Foundation · Attention

The foundational architecture using self-attention mechanisms.
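
A minimal PyTorch sketch of single-head scaled dot-product self-attention, softmax(QK^T / sqrt(d_k))V; the projections and shapes are illustrative:

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise similarities
    weights = torch.softmax(scores, dim=-1)        # each row sums to 1
    return weights @ v                             # weighted mix of values

seq_len, d_model = 6, 32
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([6, 32])
```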