Chief Architect, AI/ML Engineering


Designing the infrastructure where intelligence meets production. Agentic AI systems, multi-cloud LLM orchestration, MCP ecosystems, and enterprise AI transformation — across every major platform, provider, and deployment model.

Role: Chief Architect, AI/ML
Experience: 15+ Years
LLM Providers: 7+
Cloud Platforms: 5
MCP Integrations: Active
OSS Frameworks: 10+
Status: Available
15+ years in tech · 7+ LLM providers · 5 cloud platforms · 50+ AI/ML projects · 10+ OSS frameworks · MCP integrations
01

Experience

2022 — Present
Enterprise
Full-time
Chief Architect, AI/ML Engineering
Leading enterprise AI/ML strategy at the intersection of technical architecture and organisational transformation. Designing agentic AI systems at production scale — orchestrating LLMs across multi-cloud and on-premises environments via Model Context Protocol. Responsible for AI platform governance, cost attribution, safety guardrails, and cloud-neutral architecture patterns adopted across 12+ business units. Serving as executive liaison between engineering, product, and C-suite on all AI/ML initiatives.
Claude API · MCP Servers · Agentic AI · GCP Vertex AI · Azure OpenAI · AWS Bedrock · LangGraph · RAG Architecture · HuggingFace · AI Governance · FinOps for AI
2019 — 2022
Enterprise
Full-time
Principal AI/ML Architect
Architected platform-agnostic ML pipelines across GCP, AWS, and Azure. Led design of feature stores, model registries, and serving infrastructure supporting 20+ production ML models with combined annual inference load exceeding 2B predictions. Established enterprise MLOps practices — CI/CD for model lifecycle, drift detection, automated retraining pipelines, and A/B deployment patterns. Early adopter of HuggingFace Transformers and PEFT techniques for domain-specific fine-tuning of open-source LLMs on proprietary data.
MLOps · Kubeflow · Vertex AI · SageMaker · HuggingFace · LoRA / PEFT · TensorFlow · PyTorch · Kubernetes · Seldon
2015 — 2019
Enterprise
Full-time
Senior Solutions Architect, Data & Analytics
Designed and delivered large-scale data architecture solutions for Fortune 500 clients — cloud-native data lakes, real-time streaming pipelines processing 50M+ events/day, and self-serve analytics platforms. Built the data foundations (feature engineering pipelines, data contracts, semantic layers) that would later accelerate ML adoption across multiple organisations. Deep expertise in cost-optimised multi-cloud storage architectures and Lakehouse patterns.
Data Architecture · Apache Kafka · Spark · BigQuery · Snowflake · dbt · Delta Lake · Airflow · Terraform
02

Model Ecosystem

Proprietary API
Anthropic Claude
claude-sonnet-4-6, opus-4, haiku-4-5. Deep expertise in tool use, MCP integration, agentic loops, and constitutional AI patterns.
Primary agentic & enterprise reasoning
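The agentic loop mentioned above follows a common shape: the model either requests a tool call or returns a final answer, and the runtime executes tools and feeds results back until the model finishes. A minimal conceptual sketch (not the Anthropic SDK; `fake_model` and the weather tool are illustrative stand-ins):

```python
def get_weather(city: str) -> str:
    """A stub tool the agent can call."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Pretend model: asks for a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "get_weather", "args": {"city": "Oslo"}}
    return {"type": "final", "text": "It is sunny in Oslo."}

def agent_loop(user_prompt, model, tools, max_turns=5):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = model(messages)
        if reply["type"] == "final":
            return reply["text"]
        # Execute the requested tool and append its result for the next turn.
        result = tools[reply["name"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not converge")

answer = agent_loop("What's the weather in Oslo?", fake_model, TOOLS)
```

A real deployment replaces `fake_model` with an API call and bounds the loop with approval gates and turn limits, as described in the architectures below.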
Proprietary API
OpenAI GPT-4o / o1 / o3
Multimodal reasoning, function calling, Assistants API, and structured outputs. Used in multi-LLM routing architectures.
Multimodal · reasoning chains
Proprietary API
Google Gemini 2.0
Gemini Flash and Pro via Vertex AI. Long-context processing, code execution, grounding with Google Search.
Long-context · GCP-native workloads
Cloud-managed
AWS Bedrock Models
Titan, Nova, and third-party models (Claude, Llama, Mistral) via Bedrock. Guardrails, Knowledge Bases, and Agents APIs.
AWS-native enterprise deployments
Open Source
Meta Llama 3.x / 4
Llama 3.1 405B, 70B, 8B. Fine-tuned with LoRA/QLoRA on proprietary datasets via HuggingFace PEFT. vLLM serving.
Self-hosted · fine-tuning · cost control
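LoRA's appeal for the fine-tuning described above is that the frozen weight W is augmented with a low-rank update (alpha / r) * B @ A, so only A and B are trained. A toy sketch of the forward pass (matrix values are illustrative, not from any real checkpoint):

```python
def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # low-rank adapter path
    scale = alpha / r                # standard LoRA scaling
    return [b + scale * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # d_out x d_in (frozen)
A = [[1.0, 1.0]]               # r x d_in   (trained), r = 1
B = [[0.5], [0.5]]             # d_out x r  (trained)
x = [2.0, 3.0]

y = lora_forward(W, A, B, x, alpha=2, r=1)  # [7.0, 8.0]
```

QLoRA applies the same idea with the base weights quantized to 4-bit, which is what makes 70B-class fine-tuning feasible on modest GPU budgets.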
Open Source
Mistral / Mixtral
Mistral 7B, Mixtral 8x7B MoE. Exceptional performance-per-dollar. Used for high-throughput classification and extraction tasks.
High-throughput · batch inference
Open Source / HuggingFace
Qwen 2.5 / DeepSeek
Alibaba Qwen2.5-72B and DeepSeek-V3 for multilingual and code-heavy workloads. Deployed via HuggingFace Inference Endpoints.
Multilingual · code generation
Private / Self-hosted
Domain Fine-tunes
Custom LoRA adapters on Llama / Mistral base models trained on enterprise-proprietary corpora. Deployed on Red Hat OpenShift AI with vLLM or Ollama.
Air-gapped · compliance · IP protection
Private / Self-hosted
Ollama On-Prem Fleet
Ollama-managed inference for developer tooling, internal chat, and agentic prototyping on-premises — zero egress, maximum control.
Developer tooling · on-prem inference
Embedding Models
HuggingFace Embeddings
BGE-M3, E5-large, nomic-embed-text. Used in RAG pipelines for multilingual dense retrieval and semantic search at scale.
RAG · semantic search · reranking
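Dense retrieval, the core of the RAG pipelines above, reduces to embedding query and documents and ranking by cosine similarity. A stdlib-only sketch with tiny hand-made vectors standing in for real embedding outputs (BGE-M3, for instance, produces 1024-dimensional vectors):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=2):
    # Rank all documents by similarity to the query; keep the top k.
    scored = sorted(doc_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = {
    "doc_pricing":  [0.9, 0.1, 0.0],
    "doc_security": [0.1, 0.9, 0.1],
    "doc_onboard":  [0.2, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]
top = retrieve(query, docs, k=2)  # doc_pricing ranks first
```

Production systems swap the linear scan for an approximate nearest-neighbour index and add a reranking stage on the top candidates.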
Cloud-managed
Azure OpenAI Service
GPT-4o and o1 via Azure OpenAI — RBAC, VNet integration, compliance boundaries. Primary for Microsoft ecosystem clients.
Azure-native · compliance-bound
Open Source
Phi-3 / Phi-4 (Microsoft)
Small language models for edge deployment and cost-sensitive inference. Impressive capability-per-parameter ratio for structured tasks.
Edge inference · constrained environments
03

Model Context Protocol — Cloud Integrations

Amazon Web Services
MCP Server Suite
Bedrock Knowledge Bases · RAG
Bedrock Agents · Agentic
S3 + Athena · Data
Lambda Functions · Compute
DynamoDB / RDS · Storage
CloudWatch Logs · Observability
Production — 4 active integrations
Google Cloud Platform
MCP Server Suite
Vertex AI Search · RAG
BigQuery Analytics · Data
Cloud Run / Functions · Compute
Firestore / Spanner · Storage
Pub/Sub Messaging · Streaming
Gemini via Vertex · LLM
Production — 6 active integrations
Microsoft Azure
MCP Server Suite
Azure OpenAI Service · LLM
AI Search (Cognitive) · RAG
Azure Functions · Compute
Cosmos DB · Storage
Service Bus / Event Hub · Streaming
Microsoft Graph API · Enterprise
Production — 5 active integrations
Enterprise SaaS
ServiceNow MCP
Incident management, ITSM automation, CMDB queries, and change workflows — all accessible via agentic tool calls.
ITSM · incidents · workflows
Enterprise SaaS
Salesforce MCP
CRM data access, opportunity management, contact queries, and workflow triggers through MCP tool interface.
CRM · sales · customer data
Enterprise SaaS
Slack MCP
Channel messaging, thread summarisation, notification routing, and agentic notification pipelines integrated with Claude.
Comms · notifications · summaries
Enterprise SaaS
Jira / Confluence MCP
Issue tracking, sprint data, documentation retrieval, and project intelligence surfaced through AI-native tool calls.
Project mgmt · docs · planning
04

OSS Agentic Frameworks

🦜
Graph-based Orchestration
LangGraph
Stateful, multi-actor agent graphs with full control over cycles, branching, and human-in-the-loop checkpoints. Primary choice for complex enterprise agentic workflows requiring auditability and deterministic control flow.
Python · Stateful agents · HITL · Streaming
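The pattern behind this style of orchestration can be sketched without the library: nodes transform a shared state dict, each node names its successor, and a human-in-the-loop gate checkpoints execution before a critical node runs. This is a conceptual illustration, not LangGraph's actual API:

```python
def research(state):
    state["findings"] = f"notes on {state['topic']}"
    return "draft"

def draft(state):
    state["draft"] = f"report: {state['findings']}"
    return "approve"   # routed through the HITL gate

def run_graph(nodes, start, state, approved_nodes):
    node = start
    while node != "END":
        if node in state.get("hitl_gates", set()) and node not in approved_nodes:
            state["paused_at"] = node   # checkpoint: wait for a human
            return state
        node = nodes[node](state)
    return state

nodes = {"research": research, "draft": draft, "approve": lambda s: "END"}
state = {"topic": "Q3 risk", "hitl_gates": {"approve"}}

paused = run_graph(nodes, "research", state, approved_nodes=set())
# paused["paused_at"] is "approve": execution halts pending sign-off
done = run_graph(nodes, "approve", paused, approved_nodes={"approve"})
```

LangGraph adds what this sketch omits: typed state schemas, persistent checkpointers, streaming, and replayable execution history.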
🤖
Multi-agent Collaboration
AutoGen (Microsoft)
Conversational multi-agent patterns with AssistantAgent / UserProxyAgent. Used for automated code generation, debugging pipelines, and multi-model consensus workflows where agents debate before committing to an answer.
Multi-agent · Code gen · Group chat
⚙️
Role-based Agent Teams
CrewAI
Declarative crew definitions with specialised agents (Researcher, Analyst, Writer) coordinated by a manager agent. Excellent for structured document generation, competitive intelligence, and report automation pipelines.
Role-based · Task delegation · Sequential / Parallel
🔗
LLM Application Framework
LangChain
Foundational chains, document loaders, text splitters, and retriever abstractions. Used as the backbone for RAG pipelines and as LangGraph's underlying toolkit. LCEL for composable, streamable chain expression.
RAG · Chains · LCEL · Retrievers
🦙
Data-centric Orchestration
LlamaIndex
Advanced data ingestion, indexing strategies (summary, tree, keyword, vector), and sub-question decomposition for complex RAG. Primary for structured data agents and enterprise knowledge base construction over heterogeneous sources.
Indexing · Data agents · Sub-question
🌊
Workflow Automation
Prefect / Apache Airflow
Production scheduling and orchestration for ML pipelines, batch inference jobs, and data processing workflows. Prefect for modern cloud-native flows; Airflow for legacy enterprise DAG management requiring tight compliance audit trails.
Scheduling · DAGs · ML pipelines
🤗
Model Hub & Fine-tuning
HuggingFace Ecosystem
Transformers, PEFT (LoRA, QLoRA, AdaLoRA), TRL for RLHF/DPO, Accelerate for distributed training, and Inference Endpoints for managed hosting. Full fine-tuning pipeline from dataset curation to model evaluation and deployment.
LoRA/QLoRA · RLHF/DPO · Model Hub · Inference Endpoints
High-performance Inference
vLLM / Ollama / TGI
vLLM's PagedAttention for throughput-optimised serving of large open-source models in production. Ollama for developer-friendly local inference. TGI (HuggingFace Text Generation Inference) for containerised enterprise deployment with tensor parallelism.
vLLM · Ollama · TGI · Tensor parallelism
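A practical consequence of this stack: vLLM, Ollama, and TGI can each expose an OpenAI-compatible `/v1/chat/completions` endpoint, so one client payload serves all three backends. A sketch of that request body (the model name and host are placeholders):

```python
import json

def chat_payload(model, prompt, max_tokens=256, temperature=0.2):
    # Minimal OpenAI-compatible chat completion request body.
    return {
        "model": model,  # the checkpoint name the server was launched with
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
    }

body = json.dumps(chat_payload("llama-3.1-70b-instruct", "Summarise this ticket."))
# POST `body` to http://<inference-host>/v1/chat/completions with any HTTP
# client; the response mirrors the OpenAI schema, so existing SDKs work.
```

This compatibility layer is what makes the self-hosted models drop-in replacements inside the multi-LLM router described below.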
05

Signature Architectures

Architecture 01
Enterprise MCP Orchestration Platform
Multi-server MCP architecture connecting Claude to ServiceNow, Salesforce, Jira, internal APIs, and data warehouses. Full agentic loop with tool approval gates, session replay, PII redaction middleware, and compliance audit logging. Supports 200+ concurrent agent sessions across 8 business units.
Claude API · MCP SDK · ServiceNow · Node.js · Redis · PostgreSQL
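Under the hood, MCP tool invocations are JSON-RPC 2.0 messages: method `tools/call` carrying the tool name and its arguments. A sketch of the wire format (the ServiceNow tool name and arguments are hypothetical examples, not a real server's schema):

```python
import json

def mcp_tool_call(request_id, tool_name, arguments):
    # JSON-RPC 2.0 envelope used by MCP for tool invocation.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

msg = mcp_tool_call(
    42,
    "servicenow_create_incident",
    {"short_description": "VPN outage", "urgency": "high"},
)
wire = json.dumps(msg)
```

Approval gates and PII redaction middleware sit naturally at this layer: every tool call crosses one well-defined message boundary that can be inspected, logged, and blocked.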
Architecture 02
Multi-Cloud LLM Router with Intelligent Fallback
Cost-aware, latency-sensitive routing layer dispatching requests across Claude, GPT-4o, Gemini Flash, and self-hosted Llama based on task classification, budget caps, compliance rules, and provider SLA. Includes semantic caching (60% cache hit rate), per-team cost attribution dashboards, and automatic fallback chains on provider outages.
Python · FastAPI · AWS Lambda · Redis Semantic Cache · Prometheus
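The routing decision itself can be sketched in a few lines: classify the task, pick the cheapest healthy provider that satisfies the capability tier, and escalate tiers as a fallback chain when nothing qualifies. Provider names and prices below are illustrative, not real rate cards:

```python
PROVIDERS = {
    # name: (capability tier, $ per 1M input tokens; illustrative values)
    "gemini-flash": ("light", 0.10),
    "self-hosted-llama": ("light", 0.05),
    "gpt-4o": ("heavy", 2.50),
    "claude-sonnet": ("heavy", 3.00),
}

def route(task_tier, healthy):
    """Cheapest healthy provider matching the tier, else escalate the tier."""
    candidates = sorted(
        (name for name, (tier, _) in PROVIDERS.items()
         if tier == task_tier and name in healthy),
        key=lambda n: PROVIDERS[n][1],
    )
    if candidates:
        return candidates[0]
    if task_tier == "light":   # fallback chain: escalate to heavy models
        return route("heavy", healthy)
    raise RuntimeError("all providers down")

healthy = {"gemini-flash", "gpt-4o", "claude-sonnet"}  # llama host is down
choice = route("light", healthy)  # cheapest healthy light model
```

The production version adds compliance rules, budget caps, and semantic cache lookups ahead of this decision, but the fallback logic keeps the same shape.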
Architecture 03
Hybrid RAG Intelligence Platform
Enterprise-grade retrieval-augmented generation over 500K+ internal documents across SharePoint, Confluence, and S3. Hybrid search combining dense vector retrieval (BGE-M3) with BM25 sparse retrieval, cross-encoder re-ranking, and HyDE query expansion. Citation grounding with source provenance displayed per sentence.
Pinecone · BGE-M3 · LlamaIndex · Vertex AI · GKE
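Hybrid retrieval commonly merges the dense and sparse result lists with reciprocal rank fusion (RRF), where each document scores the sum of 1/(k + rank) across lists. A minimal sketch with illustrative document IDs:

```python
def rrf(rankings, k=60):
    # k = 60 is the conventional RRF smoothing constant.
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["doc_a", "doc_b", "doc_c"]   # vector-search order (e.g. BGE-M3)
sparse = ["doc_b", "doc_d", "doc_a"]   # keyword order (e.g. BM25)
fused = rrf([dense, sparse])  # doc_b wins: near the top of both lists
```

RRF needs no score calibration between retrievers, which is why it is a common first fusion step before cross-encoder re-ranking.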
Architecture 04
Private LLM Platform on Red Hat OpenShift AI
Air-gapped enterprise LLM serving stack running domain fine-tuned Llama-3.1-70B models on-premises. vLLM backend with tensor parallelism across 8× A100s, OpenAI-compatible API surface, RBAC-gated model access, and full inference observability via Prometheus + Grafana. Zero data egress — complete IP and compliance control.
Red Hat OpenShift AI · vLLM · LoRA adapters · NVIDIA A100 · Grafana
Architecture 05
LangGraph Multi-Agent Workflow Engine
Stateful graph-based agent orchestration for complex enterprise workflows — research → analysis → synthesis → review → approval. Each node is a specialised sub-agent (Claude, GPT-4o, or domain fine-tune) with typed state passing, deterministic branching, and human-in-the-loop checkpoints at critical decision nodes.
LangGraph · Claude + GPT-4o · FastAPI · PostgreSQL checkpointer · WebSockets
Architecture 06
AI Service Catalog & FinOps Platform
Organisation-wide AI governance layer tracking adoption, token spend, risk classification, and compliance status across all cloud providers, models, and internal systems. Real-time dashboards per team/project, budget alerting, model deprecation tracking, and automated cost anomaly detection.
React · FastAPI · PostgreSQL · Terraform · Grafana · OpenTelemetry
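The core of per-team cost attribution is straightforward: fold token usage events through a per-model rate card. A sketch with illustrative rates (not any provider's real pricing):

```python
RATES = {  # $ per 1M tokens: (input, output); illustrative values only
    "claude-sonnet": (3.00, 15.00),
    "gemini-flash": (0.10, 0.40),
}

def cost_usd(model, input_tokens, output_tokens):
    rin, rout = RATES[model]
    return (input_tokens * rin + output_tokens * rout) / 1_000_000

def attribute(usage_events):
    """usage_events: iterable of (team, model, in_tokens, out_tokens)."""
    totals = {}
    for team, model, tin, tout in usage_events:
        totals[team] = totals.get(team, 0.0) + cost_usd(model, tin, tout)
    return totals

events = [
    ("support", "gemini-flash", 2_000_000, 500_000),
    ("research", "claude-sonnet", 1_000_000, 200_000),
]
totals = attribute(events)  # support ~$0.40, research $6.00
```

Budget alerting and anomaly detection then reduce to thresholds and time-series baselines over these per-team totals.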
06

Open Source Contributions

mcp-cloud-bridge
Unified MCP server bridging AWS Bedrock, GCP Vertex AI, and Azure OpenAI behind a single tool interface for Claude agents.
⭐ Author · TypeScript · MCP SDK
llm-router-oss
Open-source multi-LLM routing engine with semantic caching, fallback chains, cost attribution, and OpenAI-compatible API surface.
⭐ Author · Python · FastAPI
HuggingFace / PEFT
Contributor — QLoRA training utilities and enterprise fine-tuning documentation for large-scale LoRA adapter management.
⚡ Contributor · Python · PyTorch
openshift-ai-templates
Helm charts and Kustomize overlays for deploying vLLM, TGI, and KServe on Red Hat OpenShift AI with GPU operator configuration.
⭐ Author · Helm · Kubernetes
07

Speaking & Publications

Conference Talk
Model Context Protocol in Production: Lessons from Enterprise-Scale Agentic Deployments
AI Engineering Summit · 2025
2025 · MCP · Agentic AI · Enterprise
Technical Article
Designing Platform-Agnostic LLM Architectures: A Practitioner's Guide to Multi-Cloud AI
Published on Medium / Towards Data Science · 2024
2024 · Architecture · Multi-cloud
Workshop
Fine-tuning Llama with QLoRA on Enterprise Data: From Dataset Curation to vLLM Serving
HuggingFace Community Workshop · 2024
2024 · HuggingFace · LoRA · vLLM
Podcast Guest
The Future of Agentic AI in the Enterprise — Architecture, Governance, and the MCP Ecosystem
Practical AI Podcast · 2025
2025 · Podcast · AI Strategy
08

Architecture Philosophy

Principle 01
Platform-agnostic by design.
No vendor lock-in. Every architecture I design can migrate between Claude, GPT, Gemini, Llama, or any future model without a rewrite. Abstraction layers are not overhead — they are the product.
Principle 02
Agentic is the new default.
Static RAG is a stepping stone. The real value is autonomous agents that plan, act, verify, and iterate — with humans in the loop at the decisions that matter, not as a bottleneck on every step.
Principle 03
Governance is not optional.
Every AI system I deploy has audit trails, cost attribution, safety guardrails, and clear ownership. Production AI without governance isn't production AI — it's a liability.
Principle 04
Open source is a strategic asset.
HuggingFace, vLLM, LangGraph, and the OSS ecosystem are not fallbacks — they are first-class options. The ability to self-host, fine-tune, and own your models is a competitive moat.
Principle 05
Context is everything.
MCP exists because AI is only as good as its access to the right information at the right moment. Building excellent context pipelines — retrieval, memory, tools, state — is the core engineering challenge of the agentic era.
Principle 06
Ship, measure, iterate.
Perfect architecture is the enemy of deployed architecture. Build for observability from day one, instrument everything, and let production telemetry drive the next design decision — not theoretical elegance.
Get in touch
Let's build something that matters.
Open to strategic AI/ML architecture consulting, advisory roles, technical due diligence, and keynote speaking. Particularly interested in agentic systems, the MCP ecosystem, private LLM deployment, and enterprise AI transformation at scale.