LiteLLM, DSPy, Strands, Google ADK: Choosing the Right AI Framework
Created: 2026-05-05 | Size: 20367 bytes
TL;DR
LiteLLM is infrastructure plumbing, a universal API gateway for 140+ LLM providers. DSPy is a prompt optimization compiler from Stanford that automatically tunes your LLM calls against metrics. Strands Agents is AWS's model-driven agent SDK where the LLM decides the workflow at runtime. Google ADK is Google's modular agent framework with explicit workflow control. They operate at different layers of the stack and often complement each other. Here's how to pick.
The Four Layers
These four tools sit at different layers of the stack; they are not direct competitors.
You might use all four in production: ADK or Strands for agent orchestration, DSPy modules for critical LLM calls that need optimized prompts, and LiteLLM to route those calls to whichever provider is cheapest or fastest.
LiteLLM: The Universal Translator
LiteLLM is a Python SDK and proxy server (AI Gateway) that normalizes 140+ LLM providers and 2,600+ models behind an OpenAI-compatible API. You write code using OpenAI's chat.completions format, and LiteLLM translates it to whatever Anthropic, Cohere, Google, or Bedrock expects.
What it does:
- Unified API across all major providers
- Load balancing, fallback routing, and provider arbitrage (route by latency or per-token cost)
- Per-key/team budgets and spend caps with alerting
- Cost tracking, guardrails, logging
- Proxy mode for centralized gateway deployment
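A minimal sketch of the unified interface. The model strings and Router config below are illustrative, and real calls require the corresponding provider API keys in your environment:

```python
import litellm
from litellm import Router

# Same OpenAI-style call shape, any provider behind it.
resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",  # illustrative model id
    messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
)
print(resp.choices[0].message.content)

# Load balancing and fallbacks go through the Router: two deployments
# share one alias, and LiteLLM picks a healthy one per request.
router = Router(
    model_list=[
        {"model_name": "default", "litellm_params": {"model": "gpt-4o-mini"}},
        {"model_name": "default", "litellm_params": {"model": "gemini/gemini-1.5-flash"}},
    ]
)
# router.completion(model="default", messages=[...])
```

Swapping providers changes only the model string; the calling code stays identical.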
When to use it: You're running multiple models across providers and need a single interface, cost visibility, and automatic failover. If you're locked into one provider, you don't need it.
When to skip it: Single-provider shops, or when you need the provider's native SDK features that LiteLLM doesn't translate.
| Detail | Value (as of March 2026) |
|---|---|
| GitHub | BerriAI/litellm |
| Stars | ~41,400 |
| License | MIT (Enterprise tier for SSO/RBAC) |
| Language | Python |
| Providers | 140+ providers, 2,600+ models |
Security Note
On March 24, 2026, versions 1.82.7 and 1.82.8 were compromised in a supply chain attack. Per LiteLLM's official advisory, the compromise originated from a Trivy dependency in their CI/CD security-scanning workflow, which led to two malicious uploads to PyPI. The backdoored packages exfiltrated environment variables, SSH keys, and cloud-provider credentials (AWS, GCP, Azure). The packages were live during the 10:39–16:00 UTC window; the advisory estimates roughly 40 minutes of effective install exposure before quarantine. If you installed during that window, rotate all credentials immediately. Fixed in v1.83.0 (released from a hardened CI/CD v2 pipeline); versions ≤1.82.6 were audited clean, with published SHA-256 checksums.
DSPy: The Prompt Compiler
> Any sufficiently complicated AI system contains an ad hoc, informally-specified, bug-ridden implementation of half of DSPy.
>
> — adapted from Greenspun's Tenth Rule, via Skylar Payne
DSPy (Declarative Self-improving Python) is a framework from Stanford NLP that replaces manual prompt engineering with programmatic optimization. The tagline is "programming, not prompting," and it's not marketing. You define what you want (input/output signatures), not how to get it (specific prompt text).
How it works:
- Define signatures: input/output specs like `"question -> answer"` or a typed class with field descriptions
- Select modules: strategies like `ChainOfThought`, `ReAct`, or `Predict` that determine how the LLM gets invoked
- Write a metric: a function that scores output quality
- Compile: DSPy's optimizers (MIPROv2, BootstrapFewShot, etc.) automatically search for the best prompts, few-shot examples, and even fine-tuning weights that maximize your metric
The result: your LLM calls get systematically better without you hand-tuning prompts.
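The four steps above look roughly like this in code. The signature string, metric, and `auto="light"` budget are illustrative, and compiling requires a configured LM plus a labeled trainset:

```python
import dspy

# 1. Point DSPy at a model (it accepts LiteLLM-style model strings).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# 2. Signature + module: what goes in, what comes out, and the strategy.
classify = dspy.ChainOfThought("ticket -> category")

# 3. Metric: score a prediction against a labeled example.
def metric(example, pred, trace=None):
    return example.category == pred.category

# 4. Compile: search for prompts and few-shot demos that maximize the metric.
optimizer = dspy.MIPROv2(metric=metric, auto="light")
# optimized = optimizer.compile(classify, trainset=trainset)
# optimized(ticket="My invoice was charged twice").category
```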
Who's using it: JetBlue (customer feedback classification, predictive maintenance), Databricks (deep MLflow integration), Sephora, Replit, Walmart, VMware, and others in production.
When to use it: Your pipeline's quality matters more than its architecture: classification, extraction, RAG, or any task where you have evaluation data and want the framework to find optimal prompts. Best for teams comfortable with a research-oriented workflow: define metrics → compile → evaluate.
The honest critique
Skylar Payne's analysis frames the adoption gap better than the official docs do. The patterns DSPy enforces are not optional at scale:
- Typed I/O. Every LLM call has a pydantic-shaped signature. No raw strings.
- Prompts as first-class artifacts. Separated from application code, versioned, testable.
- Composable modules. Mockable, chainable units instead of monolithic call sites.
- Evaluation infrastructure from day one. Metrics measure improvement, not vibes.
- Single-line model swap. GPT-4 → Claude → Gemini without refactoring.
Any production AI system rediscovers these patterns eventually, usually through outage. DSPy is the shortcut. The alternative is to steal the patterns without the framework: implement typed signatures, modular composition, and centralized prompt management by hand.
When to skip it: Quick prototypes without evaluation criteria. Simple one-off LLM calls where the optimization overhead isn't justified. Teams that won't invest in the conceptual shift. The learning curve is real, and DSPy asks you to think differently before you've felt the pain of thinking the same way everyone else does.
| Detail | Value (as of March 2026) |
|---|---|
| GitHub | stanfordnlp/dspy |
| Stars | ~33,250 |
| License | MIT |
| Language | Python |
| Latest | v3.1.3 |
Note: DSPy is primarily an optimization framework, but it now includes agent capabilities too: dspy.ReAct for agent loops, MCP integration, and tutorials for building agents. The optimization layer remains its core differentiator.
Strands Agents: AWS's Model-Driven Approach
Strands Agents is AWS's open-source agent SDK built on a simple philosophy: give the model a prompt and tools, and let it figure out the rest. No workflow graphs, no state machines, no orchestration code. The LLM autonomously decides when to use tools, how to combine them, when to iterate, and when to stop.
This is what AWS calls the model-driven approach, in deliberate contrast to "workflow-driven" frameworks that require developers to write orchestration logic.
What it does:
- Three components: model + system prompt + tools
- The LLM handles planning, tool use, and reflection autonomously
- Multi-agent orchestration (added in 1.0, July 2025)
- Native Bedrock integration, plus support for Anthropic, OpenAI, Google, Ollama, LiteLLM, and community-contributed providers
- A2A (Agent-to-Agent) protocol support — open cross-framework protocol for agent handoff; a Strands agent and an ADK agent can delegate tasks to each other if both speak it
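The three-component model looks roughly like this. The tool and prompt are illustrative, and running it needs credentials for the default Bedrock model (or any other configured provider):

```python
from strands import Agent, tool

@tool
def lookup_order(order_id: str) -> str:
    """Look up an order's shipping status by id."""
    return "shipped"  # stand-in for a real datastore call

# Model + system prompt + tools; no orchestration code.
agent = Agent(
    system_prompt="You are a support assistant. Use tools when helpful.",
    tools=[lookup_order],
)

# The LLM decides whether, when, and how to call lookup_order.
# agent("Where is order 1234?")
```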
Production pedigree: Strands powers Amazon Q Developer, AWS Glue, and Amazon VPC Reachability Analyzer internally. AWS reports that "what previously took months to develop now took weeks" for Q Developer teams after switching to Strands.
Strands Labs (February 2026) includes Strands Robots (agents wired to physical robotic hardware), a physics simulation environment, and AI Functions — an @ai_function decorator that generates code from natural-language specs at runtime with pre/post-condition validation.
When to use it: You want simplicity and trust frontier models to plan well. You're on AWS or want the lightest possible framework scaffolding. The production path through Bedrock and Lambda/EKS is well-worn.
When to skip it: You need deterministic, auditable pipelines where the workflow must be predictable and identical every time. Model-driven means the LLM decides the path, which can vary between runs.
| Detail | Value (as of March 2026) |
|---|---|
| GitHub | strands-agents/sdk-python |
| Stars | ~5,400 |
| License | Apache-2.0 |
| Languages | Python (GA), TypeScript (preview since Dec 2025) |
| PyPI Downloads | 18.7M+ (last 180 days) |
| Latest | v1.33.0 |
Google ADK: Structured Agent Orchestration
Google Agent Development Kit (ADK) is Google's modular, code-first framework for building agent applications. While optimized for Gemini, it's model-agnostic. ADK is the same framework powering agents within Google products like Agentspace and Google Customer Engagement Suite.
The key differentiator from Strands: ADK gives you explicit structural control. You can define rigid deterministic workflows, let the LLM route dynamically, or mix both in the same hierarchy.
Built-in workflow agents:
- `SequentialAgent`: run sub-agents in order
- `ParallelAgent`: run sub-agents concurrently
- `LoopAgent`: repeat until a condition is met
- `LlmAgent` with transfer: dynamic LLM-driven routing between agents
These aren't LLM-powered. They execute deterministically, giving you predictable pipelines. Combine them with LLM-driven agents for the best of both worlds.
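A rough sketch of mixing the two: the `SequentialAgent` below is deterministic glue, while each step is an LLM-driven sub-agent (names, model ids, and instructions are illustrative):

```python
from google.adk.agents import LlmAgent, SequentialAgent

extract = LlmAgent(
    name="extract",
    model="gemini-2.0-flash",  # illustrative model id
    instruction="Pull the order id out of the user's message.",
)
respond = LlmAgent(
    name="respond",
    model="gemini-2.0-flash",
    instruction="Draft a reply using the extracted order id.",
)

# Deterministic pipeline: extract always runs before respond.
pipeline = SequentialAgent(name="support_pipeline", sub_agents=[extract, respond])
```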
Other capabilities:
- Bidirectional audio/video streaming (Gemini Live API)
- Nearly 30 partner integrations (GitHub, Atlassian, Hugging Face, MongoDB, Stripe, PayPal, etc.) announced February 2026
- A2A protocol support (cross-framework agent handoff, see Strands above)
- Session rewind for debugging
- Deployable to Cloud Run, Vertex AI Agent Engine, or locally via Docker
When to use it: You need predictable, auditable multi-agent pipelines alongside autonomous behavior. You're on GCP or using Gemini. You want explicit control over agent orchestration patterns.
When to skip it: You want minimal framework overhead and trust the model to self-organize (use Strands instead). You're locked into a non-Google cloud ecosystem with no need for cross-provider support.
| Detail | Value (as of March 2026) |
|---|---|
| GitHub | google/adk-python |
| Stars | ~18,650 |
| License | Apache-2.0 |
| Languages | Python (v1.0.0), TypeScript, Java, Go |
| Announced | April 2025 (Cloud NEXT) |
Model support: ADK supports Gemini natively. For non-Google models, it offers opt-in connectors including LiteLLM, Ollama, vLLM, Apigee, and LiteRT-LM. Note that some built-in tools (like SearchTool) only work with Gemini models.
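Via the LiteLLM connector, a non-Gemini model drops in roughly as follows (the model string is illustrative, and the call still needs that provider's API key):

```python
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# Wrap any LiteLLM-routable model so ADK can drive it.
agent = LlmAgent(
    name="claude_agent",
    model=LiteLlm(model="anthropic/claude-3-5-sonnet-20240620"),
    instruction="Answer customer questions concisely.",
)
```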
Head-to-Head: Strands vs ADK
This is the comparison most teams actually need to make. Both are open-source, code-first, and model-agnostic in theory but cloud-native in practice.
| Dimension | Strands Agents | Google ADK |
|---|---|---|
| Philosophy | Model-driven: LLM decides workflow at runtime | Structured: deterministic workflows + LLM routing |
| Cloud affinity | AWS (Bedrock, Lambda, EKS) | GCP (Vertex AI, Cloud Run) |
| Orchestration | Minimal scaffolding, model self-organizes | Explicit workflow agents (Sequential, Parallel, Loop) |
| Languages | Python, TypeScript (preview) | Python, TypeScript, Java, Go |
| Streaming | Standard request/response | Bidirectional audio/video streaming |
| Production users | Amazon Q Developer, AWS Glue | Agentspace, Customer Engagement Suite |
| Maturity | GA since July 2025 | Python v1.0.0, other languages pre-1.0 |
| Predictability | Lower, model chooses the path | Higher, deterministic workflows available |
| Integrations | AWS services + community providers | 30+ partner integrations + Google Cloud services |
The tradeoff is control vs simplicity. Strands bets that frontier models are good enough to self-organize. You write less code but accept runtime variability. ADK bets that you'll want guardrails. You write more structure but get predictable, auditable pipelines.
If you're building an internal assistant where occasional variation is fine, Strands' minimalism wins. If you're building a customer-facing workflow where every step must be logged and reproducible, ADK's structured approach is safer. For a deeper look at how multi-agent delegation decisions shape reliability, see Intelligent AI Delegation: Why Multi-Agent Systems Need More Than Heuristics.
The Broader Landscape
Other notable options:
Agent Frameworks
| Framework | Key Strength | Best For |
|---|---|---|
| LangGraph | Graph-based orchestration, checkpointing, time-travel debugging | Maximum control over agent flow |
| CrewAI | Role-based multi-agent, fast prototyping | Getting a multi-agent demo running quickly |
| OpenAI Agents SDK | Clean handoff model, built-in tracing, guardrails | OpenAI-optimized but actually provider-agnostic |
| Claude Agent SDK | Tool-use-first, safety built into architecture | Regulated industries, Claude-only deployments |
| Microsoft Agent Framework | Merged Semantic Kernel + AutoGen, .NET and Python | Azure/Microsoft ecosystem |
| PydanticAI | Type-safe, minimal magic, from the Pydantic team | Developers who want strong typing and no framework bloat |
| Smolagents | Minimalist, code-first, agents write actions in code | Hugging Face ecosystem, lightweight agent needs |
LLM Gateways
| Tool | Key Strength | Best For |
|---|---|---|
| OpenRouter | Hosted proxy, 500+ models, zero ops | Teams that don't want to self-host a gateway |
| Portkey | Caching, fallbacks, enterprise observability | Enterprise governance and compliance |
Structured Output & Provider Abstraction
| Tool | Key Strength | Best For |
|---|---|---|
| Instructor | Structured output extraction with retries | One thing done well: reliable structured outputs |
| aisuite | Lightweight model switching, by Andrew Ng | Simple provider abstraction without the weight of LiteLLM |
Data/Retrieval Layer
| Tool | Key Strength | Best For |
|---|---|---|
| LlamaIndex | Best-in-class RAG pipeline tooling | Document-heavy applications, enterprise search |
| Haystack | Pipeline-oriented, enterprise search focus | Modular NLP/LLM pipelines with explicit control |
How to Choose
A practical production stack often combines 2-3 of these:
- Agent framework (Strands, ADK, LangGraph) for orchestration
- Retrieval layer (LlamaIndex, Haystack) if you need RAG
- Gateway (LiteLLM, OpenRouter) if you're multi-provider
- Optimization (DSPy) for critical LLM calls that need tuned prompts
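As a sketch of how the layers compose, Strands ships a LiteLLM model provider, so one agent definition can route to any gateway-supported backend (class and parameter names per the Strands model-provider docs; the model id is illustrative):

```python
from strands import Agent
from strands.models.litellm import LiteLLMModel

# Gateway layer: LiteLLM decides which provider actually serves the call.
model = LiteLLMModel(model_id="openai/gpt-4o-mini")

# Framework layer: Strands runs the agent loop on top.
agent = Agent(model=model, system_prompt="You are a concise assistant.")
# agent("One-line summary of what an AI gateway does?")
```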
Pick the layer that hurts most right now. If you're paying three providers manually, gateway first. If your prompts drift in quality run-to-run, optimization first. If your agent code looks like a switch statement of LLM calls glued to a retry loop, framework first. The wrong move is picking all four because the diagram looks tidy — every layer adds dependencies, abstraction debt, and a new way for production to break at 3am.
The right framework is the one whose pain you'd otherwise reinvent badly.
References
- LiteLLM GitHub
- LiteLLM Security Update, March 2026
- Sonatype: Compromised LiteLLM PyPI Package Analysis
- DSPy GitHub
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (ICLR 2024)
- If DSPy is So Great, Why Isn't Anyone Using It? — Skylar Payne
- Strands Agents GitHub
- Introducing Strands Agents
- Strands Agents and the Model-Driven Approach
- Introducing Strands Labs
- Google ADK GitHub
- Agent Development Kit: Easy-to-Build Multi-Agent Applications
- Supercharge Your AI Agents: ADK Integrations Ecosystem
- LangGraph GitHub
- OpenAI Agents SDK Documentation
- Claude Agent SDK Overview
- Microsoft Agent Framework
- PydanticAI Documentation
- Intelligent AI Delegation: Why Multi-Agent Systems Need More Than Heuristics — Daita blog
- Agent Skills: The Paradigm Shift Hiding in Plain Text — Daita blog