We needed to select a primary LLM for our agentic healthcare AI platform, which handles prior authorization, claims adjudication, and denial management. The system requires high reliability, strong instruction-following, and safe behavior in high-stakes clinical and financial contexts. We evaluated multiple providers over six weeks of production-like testing.
Adopt Claude Sonnet as the primary reasoning engine, with Claude Haiku for high-volume, low-complexity classification tasks. OpenAI is retained as a fallback for the specific use cases where GPT-4o demonstrates a measurable advantage.
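The routing policy above can be sketched as a small dispatch function. The model identifiers, the fallback allowlist, and the complexity signal are illustrative assumptions, not the platform's actual routing table:

```python
PRIMARY = "claude-sonnet"     # primary reasoning engine
HIGH_VOLUME = "claude-haiku"  # cheap, fast classification
FALLBACK = "gpt-4o"           # retained where it measurably wins

# Hypothetical allowlist of tasks routed to the fallback provider.
FALLBACK_TASKS = {"handwriting_ocr"}

def route_model(task_type: str, is_classification: bool) -> str:
    """Pick a model: fallback allowlist first, then cheap tier, then primary."""
    if task_type in FALLBACK_TASKS:
        return FALLBACK
    if is_classification:
        return HIGH_VOLUME
    return PRIMARY
```

The useful property is that the routing decision lives in one place, so benchmark results can change the table without touching agent code.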
The RCM agent system must coordinate six-plus specialist agents (clinical, financial, compliance, retrieval, validation, escalation) with explicit state management, conditional routing, and cycle support. The decision: build a custom orchestration layer, or adopt an existing framework.
Adopt LangGraph as the orchestration layer. The graph model makes agent state transitions explicit and inspectable, which is a production requirement, not a nice-to-have. Use StateGraph with typed state, conditional edges for routing, and MemorySaver for persistence across multi-step workflows.
Our agentic platform requires integrating with 15+ external systems: EHRs, clearinghouses, payer APIs, internal databases, and document stores. Each integration previously required bespoke tool definitions and maintenance. We needed a standardized protocol for tool discovery, invocation, and context management.
Adopt MCP as the standard integration protocol for all tool use. Each external system gets an MCP server. The agent layer discovers tools dynamically at runtime via MCP. New integrations become MCP server implementations rather than bespoke tool definitions.
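The discovery/invocation pattern looks like this. This is a plain-Python stand-in for illustration, not the MCP SDK; the real protocol speaks JSON-RPC (`tools/list`, `tools/call`) over stdio or HTTP, and the tool name and schema here are invented:

```python
class EligibilityServer:
    """Wraps one external system (a payer API) behind MCP-style methods."""

    def list_tools(self) -> list[dict]:
        # Agents discover available tools at runtime instead of
        # shipping with hardcoded tool definitions.
        return [{
            "name": "check_eligibility",
            "description": "Check member eligibility with the payer.",
            "inputSchema": {
                "type": "object",
                "properties": {"member_id": {"type": "string"}},
                "required": ["member_id"],
            },
        }]

    def call_tool(self, name: str, arguments: dict) -> dict:
        if name == "check_eligibility":
            # Stubbed lookup; a real server would call the payer API.
            return {"member_id": arguments["member_id"], "eligible": True}
        raise ValueError(f"unknown tool: {name}")

# The agent layer is generic: discover, then invoke by name.
server = EligibilityServer()
tools = server.list_tools()
result = server.call_tool(tools[0]["name"], {"member_id": "M-123"})
```

The payoff is that adding a sixteenth integration means implementing these two methods for the new system; nothing in the agent layer changes.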
Healthcare clients have varying cloud preferences — some mandate AWS, others GCP or Azure. We needed a primary deployment platform while maintaining the ability to serve clients across clouds without forking the codebase.
Adopt GCP as the primary development and deployment platform. All infrastructure is defined in Terraform with cloud-agnostic abstractions. Services are containerized and run on GKE, deployable to EKS or AKS. Cloud-specific services (BigQuery, Vertex AI) are wrapped behind abstraction interfaces with AWS/Azure equivalents.
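The abstraction-interface pattern can be sketched as follows. The interface, class, and method names are illustrative assumptions; the point is that application code depends on a narrow protocol, never on a vendor SDK:

```python
from typing import Protocol

class AnalyticsWarehouse(Protocol):
    """Narrow interface the application codes against."""
    def run_query(self, sql: str) -> list[dict]: ...

class BigQueryWarehouse:
    """GCP implementation; would wrap google.cloud.bigquery in production."""
    def run_query(self, sql: str) -> list[dict]:
        raise NotImplementedError("requires GCP credentials")

class InMemoryWarehouse:
    """Test stand-in; an AWS port would add a Redshift/Athena implementation."""
    def __init__(self, rows: list[dict]):
        self._rows = rows

    def run_query(self, sql: str) -> list[dict]:
        return self._rows  # toy: ignores the SQL

def denial_rate(warehouse: AnalyticsWarehouse) -> float:
    # Business logic sees only the interface, not the cloud behind it.
    rows = warehouse.run_query("SELECT denied FROM claims")
    return sum(r["denied"] for r in rows) / len(rows)

rate = denial_rate(InMemoryWarehouse([{"denied": 1}, {"denied": 0}]))  # 0.5
```

Swapping clouds then means providing another implementation of the protocol, with no changes to callers.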
Our system needs to reason over rapidly changing payer policies, clinical guidelines, and billing codes — data that changes monthly. We evaluated whether to encode this knowledge via fine-tuning or retrieve it dynamically at inference time.
Adopt RAG as the primary knowledge-grounding strategy: hybrid chunking (semantic plus structural for policy documents), dense retrieval via text-embedding-3-large, and re-ranking with a cross-encoder. Fine-tuning is deferred to Phase 2 for style/format adaptation only, not factual grounding.
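The retrieve-then-rerank flow can be sketched end to end. Both models are toy stand-ins: the embedding function is a bag-of-words substitute for text-embedding-3-large, and the "cross-encoder" is a keyword-overlap score, so only the pipeline shape is real here:

```python
import math
import re

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words vector standing in for a dense embedding model.
    words = re.findall(r"\w+", text.lower())
    return {w: float(words.count(w)) for w in set(words)}

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Stage 1: cheap dense retrieval over the whole corpus.
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def rerank(query: str, candidates: list[str]) -> list[str]:
    # Stage 2: placeholder for a cross-encoder scoring query/chunk pairs.
    qwords = set(re.findall(r"\w+", query.lower()))
    def overlap(c: str) -> int:
        return len(qwords & set(re.findall(r"\w+", c.lower())))
    return sorted(candidates, key=overlap, reverse=True)

chunks = [
    "Prior authorization is required for MRI of the spine.",
    "Claims must be filed within 90 days of service.",
    "MRI prior authorization requests need clinical notes attached.",
]
query = "prior authorization for MRI"
top = rerank(query, retrieve(query, chunks))
```

Because the knowledge lives in the retrieved chunks rather than in model weights, a monthly payer-policy update is a re-indexing job, not a retraining run.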
On ADRs
Architecture Decision Records are the most underused tool in software engineering. They cost 30 minutes to write and save weeks of re-litigating decisions. The format matters less than the habit: capture the context, the options, the choice, and the consequences. Future you will thank present you.
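One minimal shape that covers those four elements (section names are a common convention, not a standard):

```markdown
# ADR-NNN: <decision title>

## Status
Proposed | Accepted | Superseded by ADR-MMM

## Context
What forces are at play? What constraints matter?

## Options Considered
The alternatives, and briefly why each lost.

## Decision
The choice, stated in one sentence.

## Consequences
What becomes easier, what becomes harder, what we now owe.
```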