
FIELD MAP

THE AI STACK — MAY 2026

The stack most teams ship AI applications on is now ten distinct layers deep, flanked by two rails that touch every one of them. This is a working map, not a buyer's guide: where each category sits, what a few representative providers do, and how the pieces connect.

Read it top to bottom, from surface to silicon. The left rail, Observability, and the right rail, Governance, are not steps in the flow; they are concerns that cut across all ten layers.

Layers at a glance, from surface (01) to silicon (10):

01 End-User Surfaces: Cursor, Perplexity, ChatGPT, Claude

02 Agent Runtimes: Claude Code, Devin, Replit Agent, Codex, Cursor Agent

03 Orchestration Frameworks: LangGraph, Microsoft Agent Framework, Pydantic AI, Mastra, Google ADK

04 Protocol Layer (new): MCP, A2A, AG-UI

05 Memory (new): Mem0, Letta, Zep

06 Retrieval: Cohere Rerank, Voyage AI, Neo4j GraphRAG, Elastic

07 Storage: pgvector, Qdrant, Turbopuffer, Pinecone, Neo4j

08 Model Gateway: Portkey, LiteLLM, OpenRouter

09 Foundation Models: Claude (Anthropic), GPT (OpenAI), Gemini (Google), Llama (Meta), DeepSeek, Qwen

10 Inference + Compute: Together AI, Fireworks AI, vLLM, NVIDIA, AMD MI400, Google TPU, AWS, Groq
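The following sketch shows what it means for a rail to cut across every layer. It assumes each layer can be modeled as a plain function. Both rails then become a single wrapper applied to every layer call, not an extra step in the chain. The layer functions, the `BLOCKED_TERMS` policy and the `trace` log are hypothetical illustrations, not any vendor's API.

```python
import time
from typing import Callable

# Hypothetical governance policy: refuse any payload containing these terms.
BLOCKED_TERMS = {"ssn", "password"}

# Observability rail: every layer call appends a span record here.
trace: list[dict] = []


def rails(layer: str) -> Callable:
    """Wrap a layer function with both rails: a policy check and a trace span."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        def wrapped(payload: str) -> str:
            # Governance rail: enforce policy before the layer runs.
            if any(term in payload.lower() for term in BLOCKED_TERMS):
                trace.append({"layer": layer, "status": "blocked"})
                raise PermissionError(f"{layer}: payload violates policy")
            # Observability rail: time the call and record the outcome.
            start = time.perf_counter()
            result = fn(payload)
            trace.append({"layer": layer, "status": "ok",
                          "ms": (time.perf_counter() - start) * 1000})
            return result
        return wrapped
    return decorator


@rails("09 Foundation Models")
def model(prompt: str) -> str:
    return f"answer({prompt})"          # stand-in for a real model call


@rails("08 Model Gateway")
def gateway(prompt: str) -> str:
    return model(prompt)                # a gateway forwards to a model


@rails("01 End-User Surfaces")
def surface(user_input: str) -> str:
    return gateway(user_input.strip())  # a surface calls down the stack


print(surface("  summarize this repo "))   # answer(summarize this repo)
print([span["layer"] for span in trace])   # innermost layer finishes first
```

The same wrapper decorates every layer. Adding a layer therefore never requires a new logging or policy step, which is why the rails sit beside the stack rather than inside it.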

01 End-User Surfaces

Cursor

AI-first code editor; agentic edits and codebase-wide changes from natural language.

Perplexity

Answer engine: conversational search with live sources and citations.

ChatGPT

OpenAI's consumer assistant for chat, reasoning and tool use.

Claude

Anthropic's assistant across web, desktop and mobile, tuned for long-context work.

02 Agent Runtimes

Claude Code

Anthropic's terminal-native coding agent, built to take on delegated multi-step engineering tasks.

Devin

Cognition's autonomous software engineer that plans and executes end-to-end.

Replit Agent

Builds and deploys full apps from a prompt inside Replit's cloud IDE.

Codex

OpenAI's coding agent for the cloud and CLI, running tasks in isolated sandboxes.

Cursor Agent

Cursor's background agent mode for parallel, longer-running coding work.

03 Orchestration Frameworks

LangGraph

Graph-based orchestration for stateful, multi-step agent workflows (LangChain).

Microsoft Agent Framework

Microsoft's unified agent framework, consolidating Semantic Kernel and AutoGen.

Pydantic AI

Type-safe Python agent framework from the Pydantic team.

Mastra

TypeScript framework bundling agents, workflows, memory and evals.

Google ADK

Google's open-source Agent Development Kit (Python, Java, Go, TypeScript).
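LangGraph popularized graph-based orchestration. In this model, a workflow is a set of nodes that read and update shared state. Edges join the nodes and can branch on that state. The sketch below expresses the idea in plain Python and deliberately uses no framework API. The node names (`plan`, `act`, `review`), the state keys and the `max_steps` guard are hypothetical.

```python
from typing import Callable

State = dict  # shared state passed from node to node
Node = Callable[[State], State]


def plan(state: State) -> State:
    # Break the goal into steps (a real node would call a model here).
    return {**state, "steps": ["read", "edit", "test"], "done": []}


def act(state: State) -> State:
    # Execute the next pending step and record it.
    step = state["steps"][len(state["done"])]
    return {**state, "done": state["done"] + [step]}


def review(state: State) -> State:
    # Mark the run finished once every planned step is done.
    return {**state, "finished": len(state["done"]) == len(state["steps"])}


NODES: dict[str, Node] = {"plan": plan, "act": act, "review": review}

# Edges: fixed transitions, plus one conditional edge that loops back to "act".
EDGES: dict[str, Callable[[State], str]] = {
    "plan": lambda s: "act",
    "act": lambda s: "review",
    "review": lambda s: "END" if s["finished"] else "act",
}


def run(state: State, start: str = "plan", max_steps: int = 20) -> State:
    node = start
    for _ in range(max_steps):  # guard against a graph that never terminates
        state = NODES[node](state)
        node = EDGES[node](state)
        if node == "END":
            return state
    raise RuntimeError("graph did not terminate")


final = run({"goal": "fix failing test"})
print(final["done"])  # ['read', 'edit', 'test']
```

The loop back from `review` to `act` is what distinguishes a graph from a linear chain. The explicit state dict lets a framework checkpoint or resume the run between nodes.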

04 Protocol Layer

MCP

Model Context Protocol (Anthropic): a standard way to connect models to tools and data.

A2A

Agent2Agent: a protocol for cross-vendor agent interoperability; created by Google, now hosted by the Linux Foundation.

AG-UI

Agent-User Interaction protocol (CopilotKit): event stream between agent backends and frontends.
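At bottom, each of these protocols is an agreed message shape. MCP, for example, is built on JSON-RPC 2.0. A client invokes a server's tool with a `tools/call` request whose params carry the tool name and its arguments, and the reply carries a list of content blocks. The sketch below builds such a request and a matching reply by hand with only the standard library. The `search_issues` tool and its arguments are hypothetical. A real client would normally use an official MCP SDK and complete the `initialize` handshake first.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def tool_call_result(request_id: int, text: str) -> str:
    """Serialize a server's reply: a list of content blocks, here one text block."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,  # echoes the request id so the client can match it
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    })


req = json.loads(tool_call_request("search_issues", {"query": "login bug", "limit": 5}))
resp = json.loads(tool_call_result(req["id"], "3 open issues match 'login bug'"))
print(req["method"], req["params"]["name"])   # tools/call search_issues
print(resp["result"]["content"][0]["text"])
```

A2A and AG-UI define their own message and event types, one for agent-to-agent traffic and the other for agent-to-frontend traffic. The principle is the same in all three cases: interoperability comes from a shared wire format, not a shared codebase.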

05 Memory

Mem0

Drop-in memory API combining vector, graph and key-value stores for personalization.

Letta

OS-style agent memory with paging between context and archival storage (formerly MemGPT).

Zep

Temporal knowledge-graph memory (Graphiti) that tracks how facts change over time.
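The idea behind temporal memory can be shown without any vendor API. Rather than overwriting a fact, the store keeps each fact with a validity interval and closes the old interval when a newer fact supersedes it. An agent can then ask what is true now, or what was true at an earlier date. This is a minimal sketch under simplifying assumptions: the `Fact` fields and the "same subject and predicate supersedes" rule are illustrative and are not Zep's or Graphiti's data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still true"


class TemporalMemory:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add(self, subject: str, predicate: str, obj: str, when: date) -> None:
        # Close any open fact about the same subject and predicate instead of deleting it.
        for f in self.facts:
            if f.subject == subject and f.predicate == predicate and f.valid_to is None:
                f.valid_to = when
        self.facts.append(Fact(subject, predicate, obj, when))

    def as_of(self, subject: str, predicate: str, when: date) -> Optional[str]:
        # Return the object whose validity interval contains `when`, if any.
        for f in self.facts:
            if (f.subject == subject and f.predicate == predicate
                    and f.valid_from <= when
                    and (f.valid_to is None or when < f.valid_to)):
                return f.obj
        return None


mem = TemporalMemory()
mem.add("alice", "works_at", "Acme", date(2024, 1, 10))
mem.add("alice", "works_at", "Globex", date(2025, 6, 1))
print(mem.as_of("alice", "works_at", date(2024, 12, 1)))  # Acme
print(mem.as_of("alice", "works_at", date(2026, 5, 1)))   # Globex
```

Because history is kept rather than overwritten, the agent can still explain why an older answer was once correct.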

06 Retrieval

Cohere Rerank

Reranking models that reorder candidate passages by their relevance to the query.

Voyage AI

High-quality embedding and reranking models (part of MongoDB).

Neo4j GraphRAG

Graph-based RAG that grounds retrieval in a knowledge graph.

Elastic

Hybrid keyword and vector search on the Elasticsearch engine.
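Hybrid search must merge two ranked lists, one from keyword matching and one from vector similarity, whose scores are on different scales. Reciprocal rank fusion (RRF) is one widely used technique for this, and Elasticsearch supports it. Under RRF, each document scores the sum of 1 / (k + rank) over every list it appears in, so only rank positions matter. The sketch below implements RRF in plain Python with the customary k = 60. The document ids are hypothetical. A reranker such as Cohere Rerank would then typically rescore the fused top results against the query.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc ids; rank positions start at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. BM25 order
vector_hits = ["doc_c", "doc_a", "doc_d"]    # e.g. embedding-similarity order

for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Here `doc_a` comes out on top because it ranks first in one list and second in the other. `doc_c` follows; it also appears in both lists but ranks third in one of them.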

07 Storage

pgvector

Postgres extension adding vector similarity search to an existing database.

Qdrant

Open-source vector database with payload filtering and hybrid search.

Turbopuffer

Serverless vector and full-text search built on object storage for low cost at scale.

Pinecone

Fully managed vector database for production retrieval.

Neo4j

Native graph database for richly connected data.
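Every vector store in this layer answers the same basic query: given an embedding, return the stored vectors closest to it. pgvector, for instance, exposes cosine distance through its `<=>` operator. The brute-force version below shows what that query computes. The three-dimensional toy embeddings are hypothetical. Production systems replace the linear scan with an approximate index such as HNSW.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query: list[float], rows: dict[str, list[float]], k: int = 2) -> list[str]:
    """Exact k-nearest-neighbour search by linear scan (no index)."""
    return sorted(rows, key=lambda row_id: cosine_distance(query, rows[row_id]))[:k]


# Hypothetical 3-dimensional embeddings; real ones have hundreds or thousands of dimensions.
rows = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return an item": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], rows))  # ['refund policy', 'return an item']
```

The linear scan is exact but costs one distance computation per stored row. That cost is what vector indexes, and the databases built around them, exist to avoid.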

08 Model Gateway

Portkey

AI gateway adding routing, caching, guardrails and observability across providers.

LiteLLM

Unified SDK and proxy exposing 100+ model providers behind one OpenAI-style API.

OpenRouter

A single API that routes requests across many models and providers.
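At its core, a model gateway is a router. It presents one call signature and tries providers in a preferred order. When one provider fails, it falls back to the next, and it can answer repeated prompts from a cache. The sketch below shows that control flow. The provider functions are hypothetical stand-ins, not Portkey, LiteLLM or OpenRouter APIs. A real gateway also adds retries with backoff, timeouts, cost tracking and guardrails.

```python
from typing import Callable

Provider = Callable[[str], str]


class Gateway:
    def __init__(self, providers: list[tuple[str, Provider]]) -> None:
        self.providers = providers          # (name, call) pairs in preference order
        self.cache: dict[str, str] = {}     # exact-match prompt cache
        self.log: list[str] = []            # which source served each request

    def complete(self, prompt: str) -> str:
        if prompt in self.cache:
            self.log.append("cache")
            return self.cache[prompt]
        errors = []
        for name, call in self.providers:
            try:
                answer = call(prompt)
            except Exception as exc:        # provider down, rate-limited, etc.
                errors.append(f"{name}: {exc}")
                continue                    # fall back to the next provider
            self.cache[prompt] = answer
            self.log.append(name)
            return answer
        raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("upstream timeout")  # simulate an outage


def backup(prompt: str) -> str:
    return f"backup says: {prompt}"


gw = Gateway([("primary", flaky_primary), ("backup", backup)])
print(gw.complete("hello"))  # served by backup after primary fails
print(gw.complete("hello"))  # served from cache
print(gw.log)                # ['backup', 'cache']
```

Because callers only ever see `complete`, the provider order, the cache policy or the set of providers can change without touching application code. That decoupling is the main reason this layer exists.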

09 Foundation Models

Claude (Anthropic)

Anthropic's Claude model family, tuned for reasoning, coding and long context.

GPT (OpenAI)

OpenAI's GPT family of general-purpose frontier models.

Gemini (Google)

Google DeepMind's multimodal Gemini model family.

Llama (Meta)

Meta's open-weight Llama models for self-hosting and fine-tuning.

DeepSeek

Open-weight models known for strong reasoning at low cost.

Qwen

Alibaba's open-weight Qwen model family across sizes and modalities.

10 Inference + Compute

Together AI

Inference cloud for running and fine-tuning open models at scale.

Fireworks AI

Fast, cost-efficient inference serving for open models.

vLLM

Open-source high-throughput inference engine for LLM serving.

NVIDIA

Data-center GPUs that dominate AI training and inference.

AMD MI400

AMD's Instinct MI400-series data-center AI accelerators, positioned as its main challenge to NVIDIA.

Google TPU

Google's Tensor Processing Units for training and serving on Google Cloud.

AWS

Cloud plus custom Trainium and Inferentia silicon for AI workloads.

Groq

LPU-based inference delivering very low-latency token generation.
