We needed to select a primary LLM for our agentic healthcare AI platform, which handles prior authorization, claims adjudication, and denial management. The system requires high reliability, strong instruction-following, and safe behavior in high-stakes clinical and financial contexts. We evaluated multiple providers over six weeks of production-like testing.
Adopt Claude Sonnet as the primary reasoning engine, with Claude Haiku for high-volume, low-complexity classification tasks. OpenAI is retained as a fallback for the specific use cases where GPT-4o demonstrates a measurable advantage.
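The routing policy above can be sketched as a small dispatch function. The model identifiers, the fallback allowlist, and the complexity signal are illustrative assumptions, not the platform's actual routing table:

```python
PRIMARY = "claude-sonnet"     # primary reasoning engine
HIGH_VOLUME = "claude-haiku"  # cheap, fast classification
FALLBACK = "gpt-4o"           # retained where it measurably wins

# Hypothetical allowlist of tasks routed to the fallback provider.
FALLBACK_TASKS = {"handwriting_ocr"}

def route_model(task_type: str, is_classification: bool) -> str:
    """Pick a model: fallback allowlist first, then cheap tier, then primary."""
    if task_type in FALLBACK_TASKS:
        return FALLBACK
    if is_classification:
        return HIGH_VOLUME
    return PRIMARY
```

The useful property is that the routing decision lives in one place, so benchmark results can change the table without touching agent code.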
The RCM agent system must coordinate six-plus specialist agents (clinical, financial, compliance, retrieval, validation, escalation) with explicit state management, conditional routing, and cycle support. The decision: build a custom orchestration layer, or adopt an existing framework.
Adopt LangGraph as the orchestration layer. The graph model makes agent state transitions explicit and inspectable, which is a production requirement, not a nice-to-have. Use StateGraph with typed state, conditional edges for routing, and MemorySaver for persistence across multi-step workflows.
Our agentic platform requires integrating with 15+ external systems: EHRs, clearinghouses, payer APIs, internal databases, and document stores. Each integration previously required bespoke tool definitions and maintenance. We needed a standardized protocol for tool discovery, invocation, and context management.
Adopt MCP as the standard integration protocol for all tool use. Each external system gets an MCP server. The agent layer discovers tools dynamically at runtime via MCP. New integrations become MCP server implementations rather than bespoke tool definitions.
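The discovery/invocation pattern looks like this. This is a plain-Python stand-in for illustration, not the MCP SDK; the real protocol speaks JSON-RPC (`tools/list`, `tools/call`) over stdio or HTTP, and the tool name and schema here are invented:

```python
class EligibilityServer:
    """Wraps one external system (a payer API) behind MCP-style methods."""

    def list_tools(self) -> list[dict]:
        # Agents discover available tools at runtime instead of
        # shipping with hardcoded tool definitions.
        return [{
            "name": "check_eligibility",
            "description": "Check member eligibility with the payer.",
            "inputSchema": {
                "type": "object",
                "properties": {"member_id": {"type": "string"}},
                "required": ["member_id"],
            },
        }]

    def call_tool(self, name: str, arguments: dict) -> dict:
        if name == "check_eligibility":
            # Stubbed lookup; a real server would call the payer API.
            return {"member_id": arguments["member_id"], "eligible": True}
        raise ValueError(f"unknown tool: {name}")

# The agent layer is generic: discover, then invoke by name.
server = EligibilityServer()
tools = server.list_tools()
result = server.call_tool(tools[0]["name"], {"member_id": "M-123"})
```

The payoff is that adding a sixteenth integration means implementing these two methods for the new system; nothing in the agent layer changes.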
Healthcare clients have varying cloud preferences — some mandate AWS, others GCP or Azure. We needed a primary deployment platform while maintaining the ability to serve clients across clouds without forking the codebase.
Adopt GCP as the primary development and deployment platform. All infrastructure is defined in Terraform with cloud-agnostic abstractions. Services are containerized and run on GKE, deployable to EKS or AKS. Cloud-specific services (BigQuery, Vertex AI) are wrapped behind abstraction interfaces with AWS/Azure equivalents.
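The abstraction-interface pattern can be sketched as follows. The interface, class, and method names are illustrative assumptions; the point is that application code depends on a narrow protocol, never on a vendor SDK:

```python
from typing import Protocol

class AnalyticsWarehouse(Protocol):
    """Narrow interface the application codes against."""
    def run_query(self, sql: str) -> list[dict]: ...

class BigQueryWarehouse:
    """GCP implementation; would wrap google.cloud.bigquery in production."""
    def run_query(self, sql: str) -> list[dict]:
        raise NotImplementedError("requires GCP credentials")

class InMemoryWarehouse:
    """Test stand-in; an AWS port would add a Redshift/Athena implementation."""
    def __init__(self, rows: list[dict]):
        self._rows = rows

    def run_query(self, sql: str) -> list[dict]:
        return self._rows  # toy: ignores the SQL

def denial_rate(warehouse: AnalyticsWarehouse) -> float:
    # Business logic sees only the interface, not the cloud behind it.
    rows = warehouse.run_query("SELECT denied FROM claims")
    return sum(r["denied"] for r in rows) / len(rows)

rate = denial_rate(InMemoryWarehouse([{"denied": 1}, {"denied": 0}]))  # 0.5
```

Swapping clouds then means providing another implementation of the protocol, with no changes to callers.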
Our system needs to reason over rapidly changing payer policies, clinical guidelines, and billing codes — data that changes monthly. We evaluated whether to encode this knowledge via fine-tuning or retrieve it dynamically at inference time.
Adopt RAG as the primary knowledge-grounding strategy: hybrid chunking (semantic plus structural for policy documents), dense retrieval via text-embedding-3-large, and re-ranking with a cross-encoder. Fine-tuning is deferred to Phase 2 for style/format adaptation only, not factual grounding.
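The retrieve-then-rerank flow can be sketched end to end. Both models are toy stand-ins: the embedding function is a bag-of-words substitute for text-embedding-3-large, and the "cross-encoder" is a keyword-overlap score, so only the pipeline shape is real here:

```python
import math
import re

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words vector standing in for a dense embedding model.
    words = re.findall(r"\w+", text.lower())
    return {w: float(words.count(w)) for w in set(words)}

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Stage 1: cheap dense retrieval over the whole corpus.
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def rerank(query: str, candidates: list[str]) -> list[str]:
    # Stage 2: placeholder for a cross-encoder scoring query/chunk pairs.
    qwords = set(re.findall(r"\w+", query.lower()))
    def overlap(c: str) -> int:
        return len(qwords & set(re.findall(r"\w+", c.lower())))
    return sorted(candidates, key=overlap, reverse=True)

chunks = [
    "Prior authorization is required for MRI of the spine.",
    "Claims must be filed within 90 days of service.",
    "MRI prior authorization requests need clinical notes attached.",
]
query = "prior authorization for MRI"
top = rerank(query, retrieve(query, chunks))
```

Because the knowledge lives in the retrieved chunks rather than in model weights, a monthly payer-policy update is a re-indexing job, not a retraining run.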
On ADRs
Architecture Decision Records are the most underused tool in software engineering. They cost 30 minutes to write and save weeks of re-litigating decisions. The format matters less than the habit: capture the context, the options, the choice, and the consequences. Future you will thank present you.
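One minimal shape that covers those four elements (section names are a common convention, not a standard):

```markdown
# ADR-NNN: <decision title>

## Status
Proposed | Accepted | Superseded by ADR-MMM

## Context
What forces are at play? What constraints matter?

## Options Considered
The alternatives, and briefly why each lost.

## Decision
The choice, stated in one sentence.

## Consequences
What becomes easier, what becomes harder, what we now owe.
```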