Home Blog Secure AI Development: LLM Reference Architecture

Secure AI Development: LLM Reference Architecture

Q: What does secure AI development actually mean?

Secure AI development means shipping LLM applications with a reference architecture in place from the first commit. This includes an inline AI gateway with PII redaction, prompt injection detection, content-free audit, SAML SSO, SCIM lifecycle management, SIEM observability, and provider fallback configured outside the application. Security becomes part of the stack configuration instead of a late-stage checklist.

Q: Do I need to change my application code?

No, beyond a one-line OpenAI base URL change. The reference architecture keeps your existing OpenAI SDK and prompt construction logic. The gateway sits between the SDK and the provider, so the application does not need to be rewritten.

Q: Which SIEMs does the architecture support?

The most common SIEM targets are Splunk, Microsoft Sentinel, Elastic, and Sumo Logic, with native connectors. Any SIEM that can ingest JSON over HTTPS or syslog can receive Vaikora audit and detection events.

Q: What is the fail-closed default actually doing?

Fail-closed means that if the gateway cannot reach the policy engine because of an internal failure or misconfiguration, the request is blocked instead of passing through unguarded. This is the safer default for HIPAA, PCI DSS, and GDPR workspaces where unguarded LLM traffic may create compliance risk. Fail-open can be used for permissive environments where availability is prioritized over control.

Q: Where do budget caps fit in the security architecture?

Budget caps are part of the security posture, not only a FinOps control. A compromised API key without limits can create major spend very quickly. Per-workspace daily token caps and per-key request-per-minute limits turn that scenario into a smaller, recoverable incident.

Q: Is this just for new apps?

No. The reference architecture is designed for existing applications as well as new ones. Adoption usually requires a one-line base URL change and workspace setup, without changing the application's prompt logic, retrieval pipeline, or business logic.

AI Runtime Control, Real-time AI Security, Threat Intelligence, Vaikora

May 5, 2026

This is a reference architecture for secure AI development: an LLM application talks to its existing SDK, which routes through an inline AI gateway (Vaikora), which forwards to one of 12 supported LLM providers, while audit and detection events flow into a SIEM and identity is centralized via SAML/SCIM. The point of a reference architecture is that nothing in it is bolted on after the fact — security controls (PII redaction, prompt-injection detection, content-free audit, SSO, budget caps) are present from the first line of code. This guide presents the architecture, the data and control flows, and a 10-item secure-AI-development checklist that turns the diagram into a configuration.

What “Reference Architecture” Means Here

A reference architecture is a known-good shape for a system that other teams can adopt without reinventing the layout. For LLM applications, the goal is simple: separate the application code from the LLM provider, put a security control plane between them, and integrate that control plane with the identity and observability stack the rest of the company already uses. Teams that ship LLM features without this shape end up writing security as a checklist of patches; teams that adopt it ship features that are auditable on day one.

The Reference Architecture (Five Layers)

The architecture has five layers from the application out to the upstream LLM, plus two cross-cutting integrations (identity and observability). The diagram below describes the layout in plain text so it reproduces well in AI search results and incident write-ups.

Identity Provider
Okta / Azure AD / Google Workspace
SAML SSO + SCIM

user / service identity

LLM Application
(your code)

Uses OpenAI SDK
(Python / Node / TS)

base_url =
api.vaikora.com/v1

Vaikora Inline AI Gateway

PII redaction (synthetic / mask / hash) — reversible
12+ detection vectors / 4 layers
7-factor probabilistic risk
Compliance presets (HIPAA / PCI-DSS / GDPR)
Budget cap + rate limit
Provider fallback policy
SHA-256 hash-chained audit

audit + detection events

LLM Providers
(one of 12)

OpenAI / Anthropic / Gemini / Azure / Bedrock
Mistral / Cohere / Together / Groq / Ollama

SIEM / Logging
Splunk / Sentinel / Elastic / Sumo

content-free metadata + hash chain

Layer-by-Layer: What Each Component Does

Here’s how the architecture breaks down across layers, with each component handling a specific part of the request flow:

Application layer (your LLM app — Python, Node, or TypeScript) This is where your core logic lives: prompt construction, request handling, and response processing. No changes are required here—keep using the existing OpenAI SDK and avoid building a custom client.
SDK layer (OpenAI SDK for Python or Node)
The SDK ensures requests follow the correct wire format (/v1/chat/completions). The only required change is updating the base_url to the Vaikora endpoint and loading the API key from your secrets manager.
Gateway layer (Vaikora inline AI gateway)
This is the control point. It handles detection, PII redaction, audit logging, routing, and budget enforcement before any request reaches a model. Configuration typically includes selecting a compliance preset based on data sensitivity, enabling content-free logging, and integrating SSO.
Provider layer (LLM providers)
This is where inference happens. You can define a primary model and one or two fallback providers in routing policies to balance latency, cost, and availability.
Identity layer (Okta, Azure AD, Google Workspace via SAML/SCIM)
Manages user and service identity, including authentication, group synchronization, and deprovisioning. SAML is used for SSO, while SCIM supports automated lifecycle management.
Observability layer (SIEM tools like Splunk, Microsoft Sentinel, Elastic, or Sumo Logic)
Captures audit and security events. The recommended approach is to ingest content-free metadata along with a SHA-256 hash-chained audit trail for compliance and forensic analysis.

Data Flow and Control Flow Through the Stack

Request data flow

A user prompt enters the application, which hands it to the OpenAI SDK. The SDK forms an HTTPS request to the gateway. The gateway authenticates the request against the identity provider (for human-initiated calls) or against the workspace API key (for service-initiated calls), runs the four detection layers in parallel, applies redaction if the policy says so, writes a content-free audit record with a SHA-256 hash of the inspected payload, and forwards the request to the upstream LLM provider selected by the routing policy. The response comes back through the same gateway, gets a response-side inspection (catching prompt-injection embedded in tool results, for example), and reaches the application as a normal OpenAI-shaped response.

Control flow

Policy decisions are deterministic where possible (block-list patterns, exact-match PII detectors, hard limits) and probabilistic where needed (semantic similarity, ML classifiers, behavioral baselines). The combination is described as deterministic policy enforcement with probabilistic risk scoring. When a request fails policy, the gateway returns a structured error to the client and records the block decision in the audit log; the upstream LLM is never called. When a primary provider fails (429, 5xx, timeout), the gateway transparently routes to the next provider in the fallback chain — every Vaikora control runs identically on the new path.

The 10-Item Secure AI Development Checklist

This is the configuration that turns the reference architecture into a deployment. Run through it once per workspace before traffic moves to production.

Redaction mode set per data class. synthetic for traceable testing, mask for production user data, hash for high-sensitivity fields. The default for HIPAA / PCI / GDPR workspaces is mask + reversible mapping.
Audit logging set to content: false. Metadata-only mode is the default; the SHA-256 hash chain still satisfies SOC 2, HIPAA, GDPR, and PCI DSS evidence requirements without storing prompts.
Compliance preset attached to the workspace. Choose standard, strict, permissive, hipaa, pci-dss, or gdpr based on the workload’s data class. Override individual rules from the preset; do not start from a blank policy.
SAML SSO enabled for human access. Console access for developers, security engineers, and auditors flows through your existing identity provider (Okta / Azure AD / Google Workspace). No local accounts.
SCIM provisioning enabled for automated lifecycle. Group-to-role mappings deprovision access automatically when an employee leaves; do not rely on manual offboarding.
Fail mode configured (fail-closed for regulated workloads). If the gateway cannot reach the policy engine, fail-closed (block the request) is the default for hipaa / pci-dss / gdpr presets. Fail-open is acceptable for permissive presets only.
Budget cap and rate limit set per workspace. A per-workspace daily token budget and per-key request-per-minute cap stops a runaway loop or compromised key from racking up six-figure spend.
Provider fallback declared (primary + 1–2 backups). Resilience is part of the security posture; a regulated app that goes down is an availability incident. Default 3-leg: OpenAI → Anthropic → Azure OpenAI.
SIEM ingest configured (content-free metadata + hash chain). Audit events flow into Splunk / Microsoft Sentinel / Elastic / Sumo Logic. Detection-rule library aligned with the four-layer detection model.
Prompt-injection coverage validated on response side, not just request. The 4-layer detection runs on tool outputs and retrieved RAG content as well as on user prompts — that is where the most consequential injections actually arrive.

Default Config Recommendations (Drop-In)

For a team adopting the reference architecture from scratch, the following defaults match the most common enterprise deployment shape. Override only what your workload actually requires.

Setting	Default	Notes
Compliance preset	standard for general apps; hipaa / pci-dss / gdpr for regulated	Inherit from workspace; override per route only when needed
Logging mode	content: false (metadata + SHA-256 hash)	Compatible with SOC 2, HIPAA, GDPR, PCI DSS, ISO 27001, NIST CSF, CCPA
Redaction mode	mask with reversible mapping	synthetic for staging / dev; hash for high-sensitivity fields
Fail mode	fail-closed for regulated; fail-open for permissive	Fail-closed is the default for hipaa / pci-dss / gdpr
Identity	SAML SSO + SCIM provisioning	No local accounts; group-to-role mapping for lifecycle
Provider fallback	3-leg: OpenAI → Anthropic → Azure OpenAI	Override for latency-first / cost-first / scope-isolated patterns
Budget cap	per-workspace daily token cap; per-key RPM cap	Set initial cap to ~ 1.5× expected steady-state
SIEM connector	Splunk / Sentinel / Elastic / Sumo, your choice	Native connectors; no log shipper needed

Five Anti-Patterns This Architecture Avoids

Bolt-on regex prompt sanitization. Regex on prompts misses encoding bypasses, multilingual injection, and indirect injection from RAG content. Layered detection across pattern, semantic, ML, and behavioral catches what regex cannot.
Logging full prompts “for debugging.” Storing raw prompts creates HIPAA / GDPR / PCI exposure that auditors flag immediately. Metadata + hash chain is the audit-grade alternative.
One LLM provider hardwired in code. Single-provider dependence is an availability risk and a negotiating-leverage risk. Provider fallback is configured at the gateway, not in code.
Local user accounts in the AI control plane. Anything outside the corporate identity provider is invisible to your offboarding process. SAML + SCIM is the only durable answer.
Treating prompt-injection as a request-side problem only. Tool outputs and RAG content are where the consequential injections live. Inspect both sides of the call.

Next Steps

Run through the 10-item checklist against your existing LLM workspace. Anywhere the answer is “not yet,” the corresponding companion guide covers it: the OpenAI proxy integration guide for the SDK base_url change, the 30-minute drop-in setup guide for workspace creation, the cross-protocol control plane guide for the policy engine, the latency guide for the performance budget, and the provider fallback guide for the multi-LLM strategy.

Your AI Agents Need a Control Layer

See how Vaikora intercepts, evaluates, and enforces policy on every AI agent action — in real time, before execution.

Frequently Asked Questions

What does “secure AI development” actually mean?

It means shipping LLM applications with a known-good reference architecture in place from the first commit: an inline AI gateway with PII redaction, prompt-injection detection, and content-free audit; SSO via SAML; lifecycle via SCIM; observability into a SIEM; and provider fallback configured outside the application. Security stops being a bolt-on checklist and becomes a configuration of the stack.

Do I need to change my application code?

No, beyond the one-line OpenAI base_url change. The reference architecture keeps your existing OpenAI SDK and prompt-construction code; the gateway sits between the SDK and the provider.

Which SIEMs does the architecture support?

Splunk, Microsoft Sentinel, Elastic, and Sumo Logic are the most commonly deployed targets, with native connectors. Any SIEM that ingests JSON over HTTPS or syslog can receive Vaikora’s audit and detection events.

How does identity work for service accounts?

Service accounts use workspace-scoped API keys provisioned through the same console as human users, with rotation and revocation in the audit log. Human users authenticate via SAML SSO; SCIM provisioning syncs group membership and deprovisioning.

What is the fail-closed default actually doing?

If the gateway cannot reach the policy engine for any reason (a transient internal failure, a misconfigured upstream), the request is blocked rather than passed through unguarded. This is the right default for hipaa / pci-dss / gdpr workspaces where unguarded LLM traffic is itself a compliance violation. Fail-open is acceptable for permissive presets where availability outweighs control.

Where do budget caps fit in the security architecture?

Budget caps are part of the security posture, not just FinOps. A compromised API key with no cap can ship six-figure spend in hours. A per-workspace daily token cap and a per-key requests-per-minute cap turn that into a small, recoverable incident.

Is this just for new apps?

No. The reference architecture is designed for adoption against existing apps. The change is a one-line base_url swap plus a workspace setup; nothing about the application’s prompt logic, retrieval pipeline, or business logic has to move.