LiteLLM, DSPy, Strands, Google ADK: Choosing the Right AI Framework
Created: 2026-05-05 | Size: 20367 bytes
TL;DR
LiteLLM is infrastructure plumbing, a universal API gateway for 140+ LLM providers. DSPy is a prompt optimization compiler from Stanford that automatically tunes your LLM calls against metrics. Strands Agents is AWS's model-driven agent SDK where the LLM decides the workflow at runtime. Google ADK is Google's modular agent framework with explicit workflow control. They operate at different layers of the stack and often complement each other. Here's how to pick.
The Four Layers
These four tools sit at different layers of the stack; they are not direct competitors.
You might use all four in production: ADK or Strands for agent orchestration, DSPy modules for critical LLM calls that need optimized prompts, and LiteLLM to route those calls to whichever provider is cheapest or fastest.
LiteLLM: The Universal Translator
LiteLLM is a Python SDK and proxy server (AI Gateway) that normalizes 140+ LLM providers and 2,600+ models behind an OpenAI-compatible API. You write code using OpenAI's chat.completions format, and LiteLLM translates it to whatever Anthropic, Cohere, Google, or Bedrock expects.
What it does:
- Unified API across all major providers
- Load balancing, fallback routing, and provider arbitrage (route by latency or per-token cost)
- Per-key/team budgets and spend caps with alerting
- Cost tracking, guardrails, logging
- Proxy mode for centralized gateway deployment
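A minimal sketch of the unified interface. The model strings and Router config below are illustrative, and real calls require the corresponding provider API keys in your environment:

```python
import litellm
from litellm import Router

# Same OpenAI-style call shape, any provider behind it.
resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",  # illustrative model id
    messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
)
print(resp.choices[0].message.content)

# Load balancing and fallbacks go through the Router: two deployments
# share one alias, and LiteLLM picks a healthy one per request.
router = Router(
    model_list=[
        {"model_name": "default", "litellm_params": {"model": "gpt-4o-mini"}},
        {"model_name": "default", "litellm_params": {"model": "gemini/gemini-1.5-flash"}},
    ]
)
# router.completion(model="default", messages=[...])
```

Swapping providers changes only the model string; the calling code stays identical.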
When to use it: You're running multiple models across providers and need a single interface, cost visibility, and automatic failover. If you're locked into one provider, you don't need it.
When to skip it: Single-provider shops, or when you need the provider's native SDK features that LiteLLM doesn't translate.
| Detail | Value (as of March 2026) |
|---|---|
| GitHub | BerriAI/litellm |
| Stars | ~41,400 |
| License | MIT (Enterprise tier for SSO/RBAC) |
| Language | Python |
| Providers | 140+ providers, 2,600+ models |
Security Note
On March 24, 2026, versions 1.82.7 and 1.82.8 were compromised in a supply chain attack. Per LiteLLM's official advisory, the compromise originated from a Trivy dependency in their CI/CD security-scanning workflow, which led to two malicious uploads to PyPI. The backdoored packages exfiltrated environment variables, SSH keys, and cloud-provider credentials (AWS, GCP, Azure). The packages were live during the 10:39–16:00 UTC window; the advisory estimates roughly 40 minutes of effective install exposure before quarantine. If you installed during that window, rotate all credentials immediately. Fixed in v1.83.0 (released from a hardened CI/CD v2 pipeline); versions ≤1.82.6 were audited clean, with published SHA-256 checksums.
DSPy: The Prompt Compiler
> Any sufficiently complicated AI system contains an ad hoc, informally-specified, bug-ridden implementation of half of DSPy.
>
> — adapted from Greenspun's Tenth Rule, via Skylar Payne
DSPy (Declarative Self-improving Python) is a framework from Stanford NLP that replaces manual prompt engineering with programmatic optimization. The tagline is "programming, not prompting," and it's not marketing. You define what you want (input/output signatures), not how to get it (specific prompt text).
How it works:
- Define signatures: input/output specs like `"question -> answer"` or a typed class with field descriptions
- Select modules: strategies like `ChainOfThought`, `ReAct`, or `Predict` that determine how the LLM gets invoked
- Write a metric: a function that scores output quality
- Compile: DSPy's optimizers (MIPROv2, BootstrapFewShot, etc.) automatically search for the best prompts, few-shot examples, and even fine-tuning weights that maximize your metric
The result: your LLM calls get systematically better without you hand-tuning prompts.
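The four steps above look roughly like this in code. The signature string, metric, and `auto="light"` budget are illustrative, and compiling requires a configured LM plus a labeled trainset:

```python
import dspy

# 1. Point DSPy at a model (it accepts LiteLLM-style model strings).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# 2. Signature + module: what goes in, what comes out, and the strategy.
classify = dspy.ChainOfThought("ticket -> category")

# 3. Metric: score a prediction against a labeled example.
def metric(example, pred, trace=None):
    return example.category == pred.category

# 4. Compile: search for prompts and few-shot demos that maximize the metric.
optimizer = dspy.MIPROv2(metric=metric, auto="light")
# optimized = optimizer.compile(classify, trainset=trainset)
# optimized(ticket="My invoice was charged twice").category
```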
Who's using it: JetBlue (customer feedback classification, predictive maintenance), Databricks (deep MLflow integration), Sephora, Replit, Walmart, VMware, and others in production.
When to use it: Your pipeline's quality matters more than its architecture: classification, extraction, RAG, or any task where you have evaluation data and want the framework to find optimal prompts. Best for teams comfortable with a research-oriented workflow: define metrics → compile → evaluate.
The honest critique
Skylar Payne's analysis frames the adoption gap better than the official docs do. The patterns DSPy enforces are not optional at scale:
- Typed I/O. Every LLM call has a pydantic-shaped signature. No raw strings.
- Prompts as first-class artifacts. Separated from application code, versioned, testable.
- Composable modules. Mockable, chainable units instead of monolithic call sites.
- Evaluation infrastructure from day one. Metrics measure improvement, not vibes.
- Single-line model swap. GPT-4 → Claude → Gemini without refactoring.
Any production AI system rediscovers these patterns eventually, usually through outage. DSPy is the shortcut. The alternative is to steal the patterns without the framework: implement typed signatures, modular composition, and centralized prompt management by hand.
When to skip it: Quick prototypes without evaluation criteria. Simple one-off LLM calls where the optimization overhead isn't justified. Teams that won't invest in the conceptual shift. The learning curve is real, and DSPy asks you to think differently before you've felt the pain of thinking the same way everyone else does.
| Detail | Value (as of March 2026) |
|---|---|
| GitHub | stanfordnlp/dspy |
| Stars | ~33,250 |
| License | MIT |
| Language | Python |
| Latest | v3.1.3 |
Note: DSPy is primarily an optimization framework, but it now includes agent capabilities too: dspy.ReAct for agent loops, MCP integration, and tutorials for building agents. The optimization layer remains its core differentiator.
Strands Agents: AWS's Model-Driven Approach
Strands Agents is AWS's open-source agent SDK built on a simple philosophy: give the model a prompt and tools, and let it figure out the rest. No workflow graphs, no state machines, no orchestration code. The LLM autonomously decides when to use tools, how to combine them, when to iterate, and when to stop.
This is what AWS calls the model-driven approach, in deliberate contrast to "workflow-driven" frameworks that require developers to write orchestration logic.
What it does:
- Three components: model + system prompt + tools
- The LLM handles planning, tool use, and reflection autonomously
- Multi-agent orchestration (added in 1.0, July 2025)
- Native Bedrock integration, plus support for Anthropic, OpenAI, Google, Ollama, LiteLLM, and community-contributed providers
- A2A (Agent-to-Agent) protocol support — open cross-framework protocol for agent handoff; a Strands agent and an ADK agent can delegate tasks to each other if both speak it
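The three-component model looks roughly like this. The tool and prompt are illustrative, and running it needs credentials for the default Bedrock model (or any other configured provider):

```python
from strands import Agent, tool

@tool
def lookup_order(order_id: str) -> str:
    """Look up an order's shipping status by id."""
    return "shipped"  # stand-in for a real datastore call

# Model + system prompt + tools; no orchestration code.
agent = Agent(
    system_prompt="You are a support assistant. Use tools when helpful.",
    tools=[lookup_order],
)

# The LLM decides whether, when, and how to call lookup_order.
# agent("Where is order 1234?")
```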
Production pedigree: Strands powers Amazon Q Developer, AWS Glue, and Amazon VPC Reachability Analyzer internally. AWS reports that "what previously took months to develop now took weeks" for Q Developer teams after switching to Strands.
Strands Labs (February 2026) includes Strands Robots (agents wired to physical robotic hardware), a physics simulation environment, and AI Functions — an @ai_function decorator that generates code from natural-language specs at runtime with pre/post-condition validation.
When to use it: You want simplicity and trust frontier models to plan well. You're on AWS or want the lightest possible framework scaffolding. The production path through Bedrock and Lambda/EKS is well-worn.
When to skip it: You need deterministic, auditable pipelines where the workflow must be predictable and identical every time. Model-driven means the LLM decides the path, which can vary between runs.
| Detail | Value (as of March 2026) |
|---|---|
| GitHub | strands-agents/sdk-python |
| Stars | ~5,400 |
| License | Apache-2.0 |
| Languages | Python (GA), TypeScript (preview since Dec 2025) |
| PyPI Downloads | 18.7M+ (last 180 days) |
| Latest | v1.33.0 |
Google ADK: Structured Agent Orchestration
Google Agent Development Kit (ADK) is Google's modular, code-first framework for building agent applications. While optimized for Gemini, it's model-agnostic. ADK is the same framework powering agents within Google products like Agentspace and Google Customer Engagement Suite.
The key differentiator from Strands: ADK gives you explicit structural control. You can define rigid deterministic workflows, let the LLM route dynamically, or mix both in the same hierarchy.
Built-in workflow agents:
- `SequentialAgent`: run sub-agents in order
- `ParallelAgent`: run sub-agents concurrently
- `LoopAgent`: repeat until a condition is met
- `LlmAgent` with transfer: dynamic LLM-driven routing between agents
These aren't LLM-powered. They execute deterministically, giving you predictable pipelines. Combine them with LLM-driven agents for the best of both worlds.
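A rough sketch of mixing the two: the `SequentialAgent` below is deterministic glue, while each step is an LLM-driven sub-agent (names, model ids, and instructions are illustrative):

```python
from google.adk.agents import LlmAgent, SequentialAgent

extract = LlmAgent(
    name="extract",
    model="gemini-2.0-flash",  # illustrative model id
    instruction="Pull the order id out of the user's message.",
)
respond = LlmAgent(
    name="respond",
    model="gemini-2.0-flash",
    instruction="Draft a reply using the extracted order id.",
)

# Deterministic pipeline: extract always runs before respond.
pipeline = SequentialAgent(name="support_pipeline", sub_agents=[extract, respond])
```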
Other capabilities:
- Bidirectional audio/video streaming (Gemini Live API)
- Nearly 30 partner integrations (GitHub, Atlassian, Hugging Face, MongoDB, Stripe, PayPal, etc.) announced February 2026
- A2A protocol support (cross-framework agent handoff, see Strands above)
- Session rewind for debugging
- Deployable to Cloud Run, Vertex AI Agent Engine, or locally via Docker
When to use it: You need predictable, auditable multi-agent pipelines alongside autonomous behavior. You're on GCP or using Gemini. You want explicit control over agent orchestration patterns.
When to skip it: You want minimal framework overhead and trust the model to self-organize (use Strands instead). You're locked into a non-Google cloud ecosystem with no need for cross-provider support.
| Detail | Value (as of March 2026) |
|---|---|
| GitHub | google/adk-python |
| Stars | ~18,650 |
| License | Apache-2.0 |
| Languages | Python (v1.0.0), TypeScript, Java, Go |
| Announced | April 2025 (Cloud NEXT) |
Model support: ADK supports Gemini natively. For non-Google models, it offers opt-in connectors including LiteLLM, Ollama, vLLM, Apigee, and LiteRT-LM. Note that some built-in tools (like SearchTool) only work with Gemini models.
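Via the LiteLLM connector, a non-Gemini model drops in roughly as follows (the model string is illustrative, and the call still needs that provider's API key):

```python
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# Wrap any LiteLLM-routable model so ADK can drive it.
agent = LlmAgent(
    name="claude_agent",
    model=LiteLlm(model="anthropic/claude-3-5-sonnet-20240620"),
    instruction="Answer customer questions concisely.",
)
```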
Head-to-Head: Strands vs ADK
This is the comparison most teams actually need to make. Both are open-source, code-first, and model-agnostic in theory but cloud-native in practice.
| Dimension | Strands Agents | Google ADK |
|---|---|---|
| Philosophy | Model-driven: LLM decides workflow at runtime | Structured: deterministic workflows + LLM routing |
| Cloud affinity | AWS (Bedrock, Lambda, EKS) | GCP (Vertex AI, Cloud Run) |
| Orchestration | Minimal scaffolding, model self-organizes | Explicit workflow agents (Sequential, Parallel, Loop) |
| Languages | Python, TypeScript (preview) | Python, TypeScript, Java, Go |
| Streaming | Standard request/response | Bidirectional audio/video streaming |
| Production users | Amazon Q Developer, AWS Glue | Agentspace, Customer Engagement Suite |
| Maturity | GA since July 2025 | Python v1.0.0, other languages pre-1.0 |
| Predictability | Lower, model chooses the path | Higher, deterministic workflows available |
| Integrations | AWS services + community providers | 30+ partner integrations + Google Cloud services |
The tradeoff is control vs simplicity. Strands bets that frontier models are good enough to self-organize. You write less code but accept runtime variability. ADK bets that you'll want guardrails. You write more structure but get predictable, auditable pipelines.
If you're building an internal assistant where occasional variation is fine, Strands' minimalism wins. If you're building a customer-facing workflow where every step must be logged and reproducible, ADK's structured approach is safer. For a deeper look at how multi-agent delegation decisions shape reliability, see Intelligent AI Delegation: Why Multi-Agent Systems Need More Than Heuristics.
The Broader Landscape
Other notable options:
Agent Frameworks
| Framework | Key Strength | Best For |
|---|---|---|
| LangGraph | Graph-based orchestration, checkpointing, time-travel debugging | Maximum control over agent flow |
| CrewAI | Role-based multi-agent, fast prototyping | Getting a multi-agent demo running quickly |
| OpenAI Agents SDK | Clean handoff model, built-in tracing, guardrails | OpenAI-optimized but actually provider-agnostic |
| Claude Agent SDK | Tool-use-first, safety built into architecture | Regulated industries, Claude-only deployments |
| Microsoft Agent Framework | Merged Semantic Kernel + AutoGen, .NET and Python | Azure/Microsoft ecosystem |
| PydanticAI | Type-safe, minimal magic, from the Pydantic team | Developers who want strong typing and no framework bloat |
| Smolagents | Minimalist, code-first, agents write actions in code | Hugging Face ecosystem, lightweight agent needs |
LLM Gateways
| Tool | Key Strength | Best For |
|---|---|---|
| OpenRouter | Hosted proxy, 500+ models, zero ops | Teams that don't want to self-host a gateway |
| Portkey | Caching, fallbacks, enterprise observability | Enterprise governance and compliance |
Structured Output & Provider Abstraction
| Tool | Key Strength | Best For |
|---|---|---|
| Instructor | Structured output extraction with retries | One thing done well: reliable structured outputs |
| aisuite | Lightweight model switching, by Andrew Ng | Simple provider abstraction without the weight of LiteLLM |
Data/Retrieval Layer
| Tool | Key Strength | Best For |
|---|---|---|
| LlamaIndex | Best-in-class RAG pipeline tooling | Document-heavy applications, enterprise search |
| Haystack | Pipeline-oriented, enterprise search focus | Modular NLP/LLM pipelines with explicit control |
How to Choose
A practical production stack often combines 2-3 of these:
- Agent framework (Strands, ADK, LangGraph) for orchestration
- Retrieval layer (LlamaIndex, Haystack) if you need RAG
- Gateway (LiteLLM, OpenRouter) if you're multi-provider
- Optimization (DSPy) for critical LLM calls that need tuned prompts
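As a sketch of how the layers compose, Strands ships a LiteLLM model provider, so one agent definition can route to any gateway-supported backend (class and parameter names per the Strands model-provider docs; the model id is illustrative):

```python
from strands import Agent
from strands.models.litellm import LiteLLMModel

# Gateway layer: LiteLLM decides which provider actually serves the call.
model = LiteLLMModel(model_id="openai/gpt-4o-mini")

# Framework layer: Strands runs the agent loop on top.
agent = Agent(model=model, system_prompt="You are a concise assistant.")
# agent("One-line summary of what an AI gateway does?")
```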
Pick the layer that hurts most right now. If you're paying three providers manually, gateway first. If your prompts drift in quality run-to-run, optimization first. If your agent code looks like a switch statement of LLM calls glued to a retry loop, framework first. The wrong move is picking all four because the diagram looks tidy — every layer adds dependencies, abstraction debt, and a new way for production to break at 3am.
The right framework is the one whose pain you'd otherwise reinvent badly.
References
- LiteLLM GitHub
- LiteLLM Security Update, March 2026
- Sonatype: Compromised LiteLLM PyPI Package Analysis
- DSPy GitHub
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (ICLR 2024)
- If DSPy is So Great, Why Isn't Anyone Using It? — Skylar Payne
- Strands Agents GitHub
- Introducing Strands Agents
- Strands Agents and the Model-Driven Approach
- Introducing Strands Labs
- Google ADK GitHub
- Agent Development Kit: Easy-to-Build Multi-Agent Applications
- Supercharge Your AI Agents: ADK Integrations Ecosystem
- LangGraph GitHub
- OpenAI Agents SDK Documentation
- Claude Agent SDK Overview
- Microsoft Agent Framework
- PydanticAI Documentation
- Intelligent AI Delegation: Why Multi-Agent Systems Need More Than Heuristics — Daita blog
- Agent Skills: The Paradigm Shift Hiding in Plain Text — Daita blog