Architecture Decision Records

Decisions

The real trade-off reasoning behind architectural choices. Not the polished post-hoc narrative — the actual context, options considered, and why we chose what we chose.

ADR-001
Adopt Anthropic Claude as Primary Reasoning Engine
Accepted · LLM Selection · Agentic AI

We needed to select a primary LLM for our agentic healthcare AI platform handling prior authorization, claims adjudication, and denial management. The system requires high reliability, strong instruction-following, and safe behavior in high-stakes clinical and financial contexts. We evaluated multiple providers over six weeks of production-like testing.

✓ Anthropic Claude

Strongest instruction-following, lowest hallucination rate on structured output tasks, Constitutional AI alignment, best tool-use reliability in complex chains.

OpenAI GPT-4o

Strong general capability, large ecosystem. Higher variance on structured outputs, less predictable refusals in edge cases.

Google Gemini

Strong multimodal, native GCP integration. Tool use less mature at evaluation time, context handling inconsistent.

Adopt Claude Sonnet as primary reasoning engine with Claude Haiku for high-volume, low-complexity classification tasks. OpenAI maintained as fallback for specific use cases where GPT-4o demonstrates measurable advantage.

Anthropic API dependency — mitigated by abstraction layer enabling model swap without application changes
Strong alignment properties reduce risk of harmful outputs in clinical context
MCP ecosystem tightly integrated with Claude — native advantage for tool-use patterns
Cost premium vs open-source alternatives — justified by reliability requirements in healthcare context
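The Sonnet/Haiku split plus the abstraction layer can be sketched as a complexity-based router behind a provider-neutral entry point. This is a minimal sketch, not the production implementation: the model identifiers, `Task.kind` values, and the `complete` signature are illustrative assumptions, and a real deployment would call actual provider SDKs through the backend registry.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Illustrative model identifiers -- real IDs depend on the API version in use.
PRIMARY_MODEL = "claude-sonnet"     # complex reasoning: prior auth, adjudication
CLASSIFIER_MODEL = "claude-haiku"   # high-volume, low-complexity classification

@dataclass
class Task:
    kind: str        # e.g. "classification", "adjudication" (hypothetical taxonomy)
    payload: str

def select_model(task: Task) -> str:
    """Route low-complexity classification to the cheaper model;
    everything else goes to the primary reasoning engine."""
    if task.kind == "classification":
        return CLASSIFIER_MODEL
    return PRIMARY_MODEL

def complete(task: Task, backends: dict[str, Callable[[str], Any]]) -> Any:
    """The abstraction layer: callers depend on this function, not on a vendor
    SDK, so swapping providers only changes the backend registry."""
    return backends[select_model(task)](task.payload)
```

Because the fallback (GPT-4o) is just another entry in `backends`, promoting it for a specific use case is a registry change, not an application change.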
ADR-002
Use LangGraph for Multi-Agent Orchestration Over Custom Framework
Accepted · Orchestration · Multi-Agent

The RCM agent system requires coordinating 6+ specialist agents (clinical, financial, compliance, retrieval, validation, escalation) with explicit state management, conditional routing, and cycle support. The decision: build a custom orchestration layer or adopt an existing framework.

✓ LangGraph

Graph-based state machine. Explicit state transitions, cycle support, built-in persistence, human-in-the-loop checkpoints. Debuggable by design.

Custom Framework

Maximum control. High build cost, ongoing maintenance burden, no community, reinvents solved problems.

CrewAI / AutoGen

Higher-level abstractions. Less control over state, harder to debug, role-based model doesn't map cleanly to our domain.

Adopt LangGraph as the orchestration layer. The graph model makes agent state transitions explicit and inspectable — a production requirement, not a nice-to-have. Use StateGraph with typed state, conditional edges for routing, and MemorySaver for persistence across multi-step workflows.

State transitions are explicit and auditable — critical for compliance in healthcare
Human-in-the-loop checkpoints are first-class — enables escalation patterns
LangChain/LangGraph versioning can be unstable — pin versions strictly
Learning curve for team members unfamiliar with graph-based state machines
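The pattern LangGraph formalizes, typed state, nodes as functions, and conditional edges chosen by a router, can be sketched in plain Python to show why transitions stay auditable. The node names and the escalation threshold are illustrative assumptions; the real system uses LangGraph's StateGraph and MemorySaver rather than this hand-rolled loop.

```python
from typing import Callable, TypedDict

class AgentState(TypedDict):
    claim: str
    risk: float
    route: str            # set by each node to pick the next hop
    decisions: list[str]  # audit trail of every transition

def clinical_review(state: AgentState) -> AgentState:
    state["decisions"].append("clinical_review")
    # Conditional edge: high-risk claims go to a human checkpoint.
    state["route"] = "escalate" if state["risk"] > 0.8 else "finalize"
    return state

def escalate(state: AgentState) -> AgentState:
    state["decisions"].append("escalate_to_human")  # human-in-the-loop stop
    state["route"] = "end"
    return state

def finalize(state: AgentState) -> AgentState:
    state["decisions"].append("auto_approve")
    state["route"] = "end"
    return state

NODES: dict[str, Callable[[AgentState], AgentState]] = {
    "clinical": clinical_review, "escalate": escalate, "finalize": finalize,
}

def run(state: AgentState, entry: str = "clinical") -> AgentState:
    """Walk the graph: every hop is explicit in state, hence inspectable."""
    node = entry
    while node != "end":
        state = NODES[node](state)
        node = state["route"]
    return state
```

The `decisions` list is the compliance story in miniature: an auditor can replay exactly which nodes ran and why.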
ADR-003
Adopt MCP as Standard Tool Integration Protocol
Accepted · MCP · Tool Use · Integration

Our agentic platform requires integrating with 15+ external systems: EHR, clearinghouses, payer APIs, internal databases, document stores. Each integration previously required bespoke tool definitions and maintenance. We needed a standardized protocol for tool discovery, invocation, and context management.

✓ MCP

Open standard by Anthropic. Standardized tool schemas, server-client architecture, composable. Growing ecosystem. Native Claude integration.

Custom Tool Registry

Full control. High maintenance cost — every new tool requires custom schema, validation, and error handling from scratch.

OpenAI Function Calling

Mature, well-documented. Vendor-locked to OpenAI, doesn't compose across systems cleanly.

Adopt MCP as the standard integration protocol for all tool use. Each external system gets an MCP server. The agent layer discovers tools dynamically at runtime via MCP. New integrations become MCP server implementations — not bespoke tool definitions.

New system integrations reduce to implementing a standard MCP server — dramatically lower per-integration cost
Tool schemas are self-describing — agents can reason about capabilities without hardcoded knowledge
Protocol is young — some rough edges in auth patterns and error semantics
ServiceNow MCP authentication patterns required custom implementation — known gap in current ecosystem
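What "self-describing schemas discovered at runtime" buys can be sketched without the protocol machinery. This stand-in mimics an MCP server's two core operations, listing tool schemas and dispatching calls by name; the `ToolServer` class, the tool name, and the stubbed payer lookup are all illustrative assumptions, not the MCP SDK.

```python
from typing import Any, Callable

class ToolServer:
    """Minimal stand-in for an MCP server: exposes tool schemas for
    discovery and dispatches invocations by name."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[dict, Callable[..., Any]]] = {}

    def tool(self, name: str, description: str, input_schema: dict):
        def register(fn: Callable[..., Any]):
            self._tools[name] = ({"name": name, "description": description,
                                  "inputSchema": input_schema}, fn)
            return fn
        return register

    def list_tools(self) -> list[dict]:
        # Discovery: the agent layer reads these schemas at runtime
        # instead of shipping hardcoded tool definitions.
        return [schema for schema, _ in self._tools.values()]

    def call(self, name: str, arguments: dict) -> Any:
        _, fn = self._tools[name]
        return fn(**arguments)

server = ToolServer()

@server.tool("check_eligibility", "Check member eligibility with a payer",
             {"type": "object", "properties": {"member_id": {"type": "string"}}})
def check_eligibility(member_id: str) -> dict:
    return {"member_id": member_id, "eligible": True}  # stubbed payer lookup
```

A new integration is just another `@server.tool` registration: the agent never needs code changes to learn about it, which is the per-integration cost reduction the decision claims.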
ADR-004
GCP as Primary Cloud, Multi-Cloud via Abstraction Layer
Accepted · Infrastructure · Multi-Cloud

Healthcare clients have varying cloud preferences — some mandate AWS, others GCP or Azure. We needed a primary deployment platform while maintaining the ability to serve clients across clouds without forking the codebase.

✓ GCP Primary + Abstraction

GCP for primary development. Cloud-agnostic abstraction layer (Terraform + containerized services) enables cross-cloud deployment without codebase changes.

AWS Only

Largest ecosystem. Would exclude GCP-mandated clients. Vendor lock-in risk.

Cloud-Agnostic Only

Maximum portability. Highest complexity, lowest velocity, misses cloud-native advantages.

GCP as primary development and deployment platform. All infrastructure defined in Terraform with cloud-agnostic abstractions. Services containerized via GKE, deployable to EKS or AKS. Cloud-specific services (BigQuery, Vertex AI) wrapped behind abstraction interfaces with AWS/Azure equivalents.

Client cloud mandates accommodated without forking — strategic sales advantage
Vertex AI + Anthropic API gives best-in-class AI infrastructure on primary platform
Abstraction layer adds complexity and requires disciplined enforcement to prevent cloud-specific leakage
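"Wrapped behind abstraction interfaces" in application code looks roughly like the following sketch, here for a warehouse service. The interface name, the factory, and the stubbed implementations are illustrative assumptions; real implementations would call the BigQuery and Redshift SDKs, with the cloud selected once at deploy time from Terraform-injected config.

```python
from typing import Protocol

class Warehouse(Protocol):
    """Application code depends on this interface, never on a cloud SDK."""
    def query(self, sql: str) -> list[dict]: ...

class BigQueryWarehouse:
    def query(self, sql: str) -> list[dict]:
        # A real implementation would use google-cloud-bigquery here.
        return [{"backend": "bigquery", "sql": sql}]

class RedshiftWarehouse:
    def query(self, sql: str) -> list[dict]:
        # AWS-equivalent implementation behind the same interface.
        return [{"backend": "redshift", "sql": sql}]

def make_warehouse(cloud: str) -> Warehouse:
    """Chosen once from deployment config; nothing downstream
    branches on the cloud, which is what prevents leakage."""
    return {"gcp": BigQueryWarehouse, "aws": RedshiftWarehouse}[cloud]()
```

The "disciplined enforcement" consequence translates to a lint/review rule: no module outside these wrappers may import a cloud SDK.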
ADR-005
RAG Over Fine-tuning for Domain Knowledge Grounding
Accepted · RAG · Knowledge Grounding

Our system needs to reason over rapidly changing payer policies, clinical guidelines, and billing codes — data that changes monthly. We evaluated whether to encode this knowledge via fine-tuning or retrieve it dynamically at inference time.

✓ RAG

Dynamic retrieval from vector store. Knowledge updates without retraining. Grounded outputs with source citations. Scales to millions of policy documents.

Fine-tuning

Knowledge encoded in weights. Cannot be updated without retraining. Expensive and slow for monthly policy changes. No citation capability.

RAG + Fine-tuning

Best of both but highest complexity. Evaluated for Phase 2 when base model behavior needs domain adaptation.

Adopt RAG as primary knowledge grounding strategy. Hybrid chunking (semantic + structural for policy documents), dense retrieval via text-embedding-3-large, re-ranking with cross-encoder. Fine-tuning deferred to Phase 2 for style/format adaptation only — not factual grounding.

Policy updates reflected in system within hours of indexing — not weeks of retraining
Every output traceable to source documents — critical for audit and compliance
Retrieval quality is the bottleneck — chunking strategy and embedding model selection require ongoing optimization
Latency overhead of retrieval step — mitigated by async pre-fetching and caching for common query patterns
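The retrieve-then-rerank shape can be sketched with toy scoring standing in for the real components: word overlap in place of dense embeddings over the vector store, and an injected scorer in place of the cross-encoder. The function names and corpus are illustrative assumptions.

```python
from typing import Callable

def retrieve(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """First stage: cheap, recall-oriented scoring over the whole corpus
    (a real system scores dense embeddings from the vector store)."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def rerank(query: str, candidates: list[str],
           scorer: Callable[[str, str], float]) -> list[str]:
    """Second stage: precision-oriented ordering of the shortlist
    (a real system runs a cross-encoder over query-document pairs)."""
    return sorted(candidates, key=lambda d: scorer(query, d), reverse=True)
```

Whatever document wins the rerank travels with the answer as its citation, which is what makes every output traceable to a source.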

On ADRs

Architecture Decision Records are the most underused tool in software engineering. They cost 30 minutes to write and save weeks of re-litigating decisions. The format matters less than the habit: capture the context, the options, the choice, and the consequences. Future you will thank present you.