AI Across the Full SDLC: A Practitioner's Map

This is not a framework. It is a reference. Eight structural forces — covered in depth across Articles 07 through 10 — mapped against all nine phases of the SDLC, with an honest annotation for each intersection: what the impact level is, what specifically happens, and which tools are relevant. Keep it open in another tab.

// TL;DR — how to use this map
  • Eight forces × nine SDLC phases = 72 annotated intersections: impact level, what happens, and the relevant tools at each.
  • Use the sticky phase navigation to jump; each force pair links to its deep-dive article (07–10).
  • Leverage concentrates: DDD peaks at design, AI eval at test, AI-aware SRE at monitor. Invest first where your weakest phase meets a high-impact force.

Not every force matters equally at every phase. The leverage of Domain-Driven Design is highest at design and change, not at monitoring. The leverage of AI-Aware SRE sits almost entirely at monitor, with modest contributions at design and deploy. Understanding where each force creates its highest value tells you where to invest first.

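overview matrix

| Force ↓ / Phase → | Analyze | Design | Develop | Test | Build | Deploy | Monitor | Deliver | Change |
|---|---|---|---|---|---|---|---|---|---|
| F01 · 👥 Stakeholders | ●●● | ●●○ | ●●○ | ●●○ | ●○○ | ●○○ | ●●○ | ●●● | ●●○ |
| F02 · 🏛️ DDD | ●●○ | ●●● | ●●● | ●●○ | ●●○ | ●●○ | ●○○ | ●●○ | ●●● |
| F03 · ⚡ Full-Stack | ●●○ | ●●● | ●●● | ●●○ | ●●● | ●●○ | ●●○ | ●●○ | ●●○ |
| F04 · 🗄️ Polyglot Data | ●●○ | ●●● | ●●● | ●●○ | ●●○ | ●●○ | ●●● | ●●● | ●●○ |
| F05 · 🔄 Five-Track CICD | ●○○ | ●●○ | ●●○ | ●●● | ●●● | ●●● | ●●○ | ●●○ | ●●● |
| F06 · 🔗 Middleware | ●○○ | ●●○ | ●●● | ●●○ | ●○○ | ●●○ | ●●● | ●●○ | ●●○ |
| F07 · 🧪 Test Suite | ●○○ | ●●○ | ●●○ | ●●● | ●●● | ●●○ | ●●○ | ●●○ | ●●○ |
| F08 · 🔭 AI-Aware SRE | ●○○ | ●●○ | ●○○ | ●●○ | ●○○ | ●●○ | ●●● | ●●○ | ●●○ |

IMPACT: ●○○ Low · ●●○ Medium · ●●● High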
phase by phase
Phase 01 / 09
Analyze
Requirements · Stakeholders · Feasibility
F01 · 👥 Three-Stakeholder · ●●● High

The analysis phase now produces three artifacts from one session: user stories (UX), structured agent specs (implementation), and compliance checklists (reviewers). Most teams produce one. AI can generate all three — but only if the process is designed for it.

Claude · LlamaIndex · Linear AI
F02 · 🏛️ DDD & Design · ●●○ Medium

Domain discovery during analysis is the foundation for AI agent delegation. Bounded context mapping here prevents the Big Ball of Mud from being AI-generated at scale in development.

EventStorming · Domain Storytelling
F03 · ⚡ Full-Stack · ●●○ Medium

Analysis must cover all six tracks: FE, BE, Middleware, Data, AI/ML, Platform. Requirements scoped to one track create integration surprises in development.

Claude · Miro · Linear
F04 · 🗄️ Polyglot Data · ●●○ Medium

Data requirements analysis must identify which data is transactional (relational), document-oriented (NoSQL), semantic (vector), or real-time (streaming). Getting this wrong here means the wrong DB in design.

Data requirements workshop · EventStorming
F05 · 🔄 Five-Track CICD · ●○○ Low

Identifying which artifact types a feature touches is underrated. "Add semantic search to mobile" is a five-artifact feature. Many teams discover this in build, not analysis.

Linear · Miro
F06 · 🔗 Middleware · ●○○ Low

Cache strategy and queue topology should be identified in analysis — which operations are expensive enough to cache? — but rarely are. The most consistently skipped analysis artifact.

Cache requirements workshop
F07 · 🧪 Test Suite · ●○○ Low

Test strategy as a requirement artifact: defining which of the four test layers applies to each requirement, and what AI eval acceptance criteria look like, is an analysis task few teams do.

Test strategy doc
F08 · 🔭 AI-Aware SRE · ●○○ Low

Reliability requirements — uptime targets, hallucination rate SLOs, agent audit requirements — are requirements, not engineering footnotes. Rarely captured in analysis, consistently regretted in production.

SLO design workshop
Phase 02 / 09
Design
Architecture · DDD · UI/UX · Data Design
F01 · 👥 Three-Stakeholder · ●●○ Medium

Use cases must be written for users AND agents. Agents need unambiguous boundary conditions. Reviewers need traceable design decisions. Design docs that serve all three are rare and valuable.

EventStorming · Confluence AI
F02 · 🏛️ DDD & Design · ●●● High

DDD design artifacts — context maps, aggregate definitions, domain event catalogs, ubiquitous language glossaries — become the prompt context for every AI agent in development. The quality of design is the ceiling of AI output quality.

Miro · AsyncAPI · Figma AI
F03 · ⚡ Full-Stack · ●●● High

Six-track design produces six artifact sets: component library, API contracts, middleware topology, data models, embedding strategy, and pipeline design. Each artifact enables parallel AI agent execution in development.

Figma · OpenAPI · AsyncAPI · dbt
F04 · 🗄️ Polyglot Data · ●●● High

The polyglot data architecture decision: what data lives in PostgreSQL vs MongoDB vs Redis vs vector store vs streaming — and how it flows between them. This is now as important as API design. Getting it right enables every AI feature downstream.

pgvector · Kafka · dbt
F05 · 🔄 Five-Track CICD · ●●○ Medium

CICD pipeline design is a design-phase artifact alongside API design. Which tracks exist? What are their build dependencies? How does a cross-artifact feature coordinate release? Decided in design, not discovered in production.

GitHub Actions design · Turborepo config
F06 · 🔗 Middleware · ●●○ Medium

Cache strategy design: what data is hot (Redis), what LLM responses are cacheable semantically (GPTCache), what operations are async (BullMQ), what workflows need durable execution (Temporal). Designing these in phase 2 saves weeks in phase 3.

Redis · BullMQ · Temporal
F07 · 🧪 Test Suite · ●●○ Medium

Test architecture design: what golden datasets exist for AI eval? What performance SLOs are required? What contract tests define bounded context boundaries? Designed upfront, not retrofitted after the first production incident.

Ragas · Great Expectations · Pact
F08 · 🔭 AI-Aware SRE · ●●○ Medium

Observability as a design artifact: what traces, metrics, and logs does each component emit? What are the AI SLOs (hallucination rate, retrieval precision, cost-per-workflow)? What does the runbook for a qualitative AI failure look like?

OpenTelemetry · LangFuse · Arize
Phase 03 / 09
Develop
All six tracks: FE · BE · Middleware · Data · AI/ML · Platform
F01 · 👥 Three-Stakeholder · ●●○ Medium

Developers build for user needs while working within agent-generated code that must respect domain boundaries, and they leave audit trails that reviewers can trace back to the original requirements.

Claude Code · Notion AI
F02 · 🏛️ DDD & Design · ●●● High

Each bounded context is implemented by an AI agent given the domain model as explicit context. Ubiquitous language prevents naming drift. Domain invariants must be made explicit or AI will violate them. In teams I have worked with, clear DDD models cut AI code review by half or more.

Claude Code · DDD tactical patterns · OpenAPI
F03 · ⚡ Full-Stack · ●●● High

Six tracks run in parallel with AI agents on isolated git worktrees. Senior engineers own the interface contracts between tracks. Code review is primarily contract and domain invariant review, not syntax review.

Claude Code · Cursor · git worktrees · dbt
F04 · 🗄️ Polyglot Data · ●●● High

Data engineering development across all four stores: schema design, CDC configuration, embedding pipeline, streaming topology, transformation models. AI agents write dbt models and pipeline code but require explicit data contracts to operate within.

dbt · LlamaIndex · Kafka Streams
F05 · 🔄 Five-Track CICD · ●●○ Medium

Pipeline-as-code developed alongside application code. Engineers write GitHub Actions workflows, Fastlane configs, and dbt pipeline definitions as first-class development artifacts, not DevOps afterthoughts.

GitHub Actions · Fastlane · dbt
F06 · 🔗 Middleware · ●●● High

Semantic cache implementation: cache LLM responses by embedding similarity, not exact string match. AI task queue with context-aware retry — failed AI tasks carry the failure context into the next attempt's prompt. The highest-ROI layer for AI cost reduction.

Redis + GPTCache · BullMQ · Temporal
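The idea of caching LLM responses by embedding similarity rather than exact string match can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the GPTCache API: `SemanticCache`, `toy_embed`, and the 0.9 threshold are hypothetical names and values, and a production system would use a real embedding model and a vector index instead of a linear scan.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical cache that matches prompts by meaning, not by exact text."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9) -> None:
        self._embed = embed
        self._threshold = threshold
        self._entries: List[Tuple[Vector, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self._entries.append((self._embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        """Return the closest cached response if it clears the threshold."""
        query = self._embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self._threshold else None


def toy_embed(text: str) -> Vector:
    """Stand-in for an embedding model: word counts over a tiny fixed vocabulary."""
    vocabulary = ["refund", "order", "cancel", "weather", "today"]
    words = text.lower().replace("?", "").split()
    return [float(words.count(term)) for term in vocabulary]
```

With this sketch, a cache populated with "How do I cancel my order?" would also serve the differently worded prompt "cancel order", while an unrelated prompt falls through to the LLM.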
F07 · 🧪 Test Suite · ●●○ Medium

Test code developed alongside feature code: unit, integration, contract, AI eval, and data quality tests written as part of the development definition of done — not as a separate QA phase.

Vitest · Playwright · Ragas
F08 · 🔭 AI-Aware SRE · ●○○ Low

Instrumentation code — LLM call traces, agent action logs, PII redaction middleware — developed alongside feature code. Platform engineering abstractions reduce the burden on product engineers.

OTel SDKs · LangFuse SDK · Guardrails AI
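As one way to picture PII redaction sitting between a feature and its LLM traces, here is a small sketch. It is an assumption-laden illustration: `redact_pii` and `traced_llm_call` are hypothetical helpers, only email addresses are handled, and none of this reflects the LangFuse or Guardrails AI APIs.

```python
import re
from typing import Callable, Dict, List

# Deliberately simple email pattern; real PII detection covers far more cases.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_pii(text: str) -> str:
    """Replace email addresses with a placeholder before the text is stored."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)


def traced_llm_call(
    prompt: str,
    llm: Callable[[str], str],
    trace_log: List[Dict[str, str]],
) -> str:
    """Call the model with the raw prompt, but record only redacted text."""
    response = llm(prompt)
    trace_log.append({"prompt": redact_pii(prompt), "response": redact_pii(response)})
    return response
```

The caller still receives the unredacted response; only the trace is scrubbed, which keeps the instrumentation out of the product logic.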
Phase 04 / 09
Test
Functional · Non-Functional · AI Eval · Data Quality
F01 · 👥 Three-Stakeholder · ●●○ Medium

Test cases must validate behavior for human users, correctness for agent consumers, and traceability for compliance reviewers. Three distinct test artifact types from the same test suite.

LangSmith Evals · TestRail
F02 · 🏛️ DDD & Design · ●●○ Medium

Domain invariant testing: does this operation violate an aggregate rule? Domain event tests validate that the right events fire for state changes. Contract tests validate bounded context boundaries.

Pact · Domain event tests
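A domain invariant test and a domain event test might look something like this. The `Order` aggregate, its credit-limit rule, and the `OrderLineAdded` event name are invented for illustration and do not come from any particular codebase.

```python
from dataclasses import dataclass, field
from typing import List


class InvariantViolation(Exception):
    """Raised when an operation would break an aggregate rule."""


@dataclass
class Order:
    """Hypothetical aggregate: the order total may never exceed the credit limit."""

    credit_limit_cents: int
    lines: List[int] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    @property
    def total_cents(self) -> int:
        return sum(self.lines)

    def add_line(self, price_cents: int) -> None:
        if price_cents <= 0:
            raise InvariantViolation("line price must be positive")
        if self.total_cents + price_cents > self.credit_limit_cents:
            raise InvariantViolation("order total would exceed credit limit")
        self.lines.append(price_cents)
        # Record the domain event only after the state change succeeds.
        self.events.append("OrderLineAdded")


def violates_invariant(order: Order, price_cents: int) -> bool:
    """Test helper: True if adding the line is rejected by the aggregate."""
    try:
        order.add_line(price_cents)
    except InvariantViolation:
        return True
    return False
```

Making the invariant an explicit exception gives both human reviewers and AI agents a single place where the rule is stated and enforced.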
F03 · ⚡ Full-Stack · ●●○ Medium

Each track has its own test primitives: component tests (FE), contract tests (BE), throughput tests (Middleware), data quality tests (Data), eval tests (AI/ML), chaos tests (Platform). All integrated into one CI pipeline.

Playwright · k6 · Ragas
F04 · 🗄️ Polyglot Data · ●●○ Medium

Data quality tests validate each store: relational integrity, document schema validation, vector index freshness tests, streaming latency SLOs. Failures here cause silent AI quality degradation downstream.

Great Expectations · dbt tests · Ragas
F05 · 🔄 Five-Track CICD · ●●● High

All four test layers integrated into CICD as mandatory gates. Not three separate tools managed by three teams — one unified pipeline with functional, non-functional, AI eval, and data quality running on every PR.

k6 · Ragas · dbt tests · Snyk
F06 · 🔗 Middleware · ●●○ Medium

Cache hit rate under load, semantic cache precision, queue throughput under AI task spike, Temporal workflow reliability under failure — middleware has unique non-functional test requirements.

Redis test containers · BullMQ test utils · k6
F07 · 🧪 Test Suite · ●●● High

The core phase. Four layers: Functional (unit, integration, E2E, contract), Non-Functional (load, security, accessibility, chaos), AI Eval (faithfulness, relevance, hallucination rate), Data Quality (schema, freshness, distribution). All mandatory gates.

Vitest/Jest · Playwright · DeepEval · Great Expectations
F08 · 🔭 AI-Aware SRE · ●●○ Medium

Chaos engineering tests: what happens when the LLM API is slow? When the vector index is stale? When an agent loops? SRE reliability requires testing failure modes invisible in happy-path tests.

Chaos Monkey · k6 · LangSmith Evals
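To make the "agent loops" failure mode testable, one option is a guarded runner that a chaos test can feed with a deliberately looping step function. The sketch below is illustrative only: `run_agent`, its string states, and the step budget are hypothetical and not tied to any agent framework.

```python
from typing import Callable, Set


def run_agent(step: Callable[[str], str], max_steps: int = 5) -> str:
    """Drive a state machine until it reports 'done', loops, or runs out of budget."""
    seen: Set[str] = set()
    state = "start"
    for _ in range(max_steps):
        if state in seen:
            # The agent returned to a state it already visited: abort.
            return "aborted: loop detected"
        seen.add(state)
        state = step(state)
        if state == "done":
            return "completed"
    return "aborted: step budget exhausted"


def looping_step(state: str) -> str:
    """Fault injection: bounces between two states forever."""
    return {"start": "search", "search": "start"}[state]


def healthy_step(state: str) -> str:
    """Happy path: start, search, done."""
    return {"start": "search", "search": "done"}[state]
```

A chaos test then asserts that the looping agent is stopped rather than left to burn tokens, which is a failure a happy-path test would never exercise.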
Phase 05 / 09
Build
Five artifact types coordinated in CI
F01 · 👥 Three-Stakeholder · ●○○ Low

Build artifacts should include reviewer-facing changelogs, agent-facing API specs, and user-facing release notes — generated automatically from the same commit.

GitHub Actions
F02 · 🏛️ DDD & Design · ●●○ Medium

Each bounded context is its own independently deployable artifact. The domain model is the build boundary. DDD-structured codebases have cleaner build dependency graphs for AI to reason about.

Docker · Turborepo
F03 · ⚡ Full-Stack · ●●● High

Five artifact types in one pipeline: Web (Vite), Mobile (Xcode/Gradle via Fastlane), Backend (Docker/Helm), Data Pipeline (dbt Cloud), AI/ML (MLflow). Coordinated release with cross-artifact dependency awareness is the unsolved frontier.

Turborepo · Fastlane · MLflow · dbt Cloud
F04 · 🗄️ Polyglot Data · ●●○ Medium

Data pipeline is a build artifact with its own versioning, testing, and deployment. Embedding model versions must be pinned and packaged alongside application artifacts — they're not infrastructure, they're code.

dbt Cloud · Airflow · MLflow
F05 · 🔄 Five-Track CICD · ●●● High

The core phase for this force. Five artifact types plus infrastructure: Web (Vite), Mobile (Fastlane), Backend (Docker/Helm), Data Pipeline (dbt Cloud), AI/ML (MLflow), plus cloud infrastructure as a sixth artifact type, provisioned via Terraform/Terragrunt and applied on merge through Atlantis. Cross-artifact dependency management for coordinated release remains the industry's genuinely unsolved problem.

Turborepo/Nx · Fastlane · Docker · Buildkite · Terraform + Atlantis
F06 · 🔗 Middleware · ●○○ Low

Middleware configuration is infrastructure-as-code. Cache topology, queue configuration, and workflow definitions are versioned build artifacts with the same rigor as application code.

Docker Compose · Redis Docker
F07 · 🧪 Test Suite · ●●● High

Test pipeline optimization: parallel execution, smart test selection (only tests affected by changes), result caching, and sharding for long AI eval suites. Running four test layers in CI requires deliberate pipeline engineering to keep feedback loops fast.

GitHub Actions · Test parallelization · Test sharding
F08 · 🔭 AI-Aware SRE · ●○○ Low

Observability collectors, exporters, and sampling configurations are build artifacts deployed as sidecars. Zero-gap telemetry from first deployment — not added after the first production incident.

OTel Collector · Prometheus exporters
Phase 06 / 09
Deploy
Multi-track release for humans and agents
F01 · 👥 Three-Stakeholder · ●○○ Low

Staged rollouts serve users. MCP endpoint versioning serves agents. Compliance-gated deploys serve reviewers. Each consumer needs different deployment controls.

LaunchDarkly · Argo CD
F02 · 🏛️ DDD & Design · ●●○ Medium

Bounded contexts deploy independently. Domain event schema changes require careful versioning — a domain event is a public API for other contexts, not an internal detail.

Kubernetes · Argo CD
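One common way to version domain events is "upcasting": consumers upgrade old event payloads step by step to the version they understand. The sketch below assumes a hypothetical `OrderPlaced` event whose `customer` field was renamed to `customer_name` in version 2; the event shape and the `UPCASTERS` registry are invented for illustration.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def _v1_to_v2(event: Event) -> Event:
    """Version 2 renames 'customer' to 'customer_name' (hypothetical change)."""
    upgraded = {key: value for key, value in event.items() if key != "customer"}
    upgraded["customer_name"] = event["customer"]
    upgraded["version"] = 2
    return upgraded


# Maps a source version to the function that upgrades it by one version.
UPCASTERS: Dict[int, Callable[[Event], Event]] = {1: _v1_to_v2}


def upcast(event: Event, target_version: int) -> Event:
    """Upgrade an event one version at a time until it reaches the target."""
    current = dict(event)  # never mutate the stored event
    while current["version"] < target_version:
        current = UPCASTERS[current["version"]](current)
    return current
```

Because each step only knows about adjacent versions, a context can deploy a new event schema independently while older events in the log remain readable.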
F03 · ⚡ Full-Stack · ●●○ Medium

Each track has different deployment mechanics: CDN (Web), App Store (Mobile), Kubernetes (Backend), Airflow (Data), serverless GPU (AI/ML). Feature flags manage the staggered reality of coordinated multi-track release.

Kubernetes · Fastlane · Modal
F04 · 🗄️ Polyglot Data · ●●○ Medium

Vector index warm-up before traffic switch, CDC replication lag monitoring during DB migrations, streaming topic migration strategies — data deployment has uniquely complex rollout requirements that differ from application deployment.

Qdrant Cloud · RisingWave Cloud · MongoDB Atlas
F05 · 🔄 Five-Track CICD · ●●● High

Five artifact types plus infrastructure: cloud provisioning (Terraform/Terragrunt via Atlantis) completes first — application services deploy into the infrastructure it defines. Each artifact type has its own deployment mechanism and rollback strategy. Feature flags coordinate the staggered reality of multi-track release when App Store review (2–7 days) blocks synchronisation.

Argo CD · Fastlane deliver · LaunchDarkly · Terraform + Atlantis
F06 · 🔗 Middleware · ●●○ Medium

Managed middleware services reduce operational burden. Cache warm-up before traffic cutover prevents degradation on new deployments. Queue draining before shutdown prevents task loss.

Upstash · Redis Cloud · Temporal Cloud
F07 · 🧪 Test Suite · ●●○ Medium

Smoke tests validate basic functionality post-deploy. Canary deployments with automated rollback triggered by AI eval metric degradation — not just error rate — protect against qualitative regressions traditional health checks miss.

Smoke tests · Canary analysis · Argo Rollouts
F08 · 🔭 AI-Aware SRE · ●●○ Medium

Progressive delivery with AI-aware rollback triggers: if hallucination rate SLO breaches during canary deployment, automated rollback fires — not just if HTTP error rate increases.

Argo Rollouts · Grafana SLO · PagerDuty
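The rollback decision itself can be expressed as a small policy check. The following is a sketch under assumptions: `CanaryMetrics`, `RollbackPolicy`, and the threshold values are hypothetical, and in practice the metrics would come from the observability stack and the decision would be wired into the rollout controller.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CanaryMetrics:
    http_error_rate: float      # fraction of requests returning 5xx
    hallucination_rate: float   # fraction of sampled answers judged unfaithful


@dataclass(frozen=True)
class RollbackPolicy:
    # Illustrative thresholds only; real SLOs are set per product.
    max_http_error_rate: float = 0.01
    max_hallucination_rate: float = 0.02


def rollback_reasons(metrics: CanaryMetrics, policy: RollbackPolicy) -> List[str]:
    """Return every breached SLO; an empty list means the canary may proceed."""
    reasons = []
    if metrics.http_error_rate > policy.max_http_error_rate:
        reasons.append("http_error_rate")
    if metrics.hallucination_rate > policy.max_hallucination_rate:
        reasons.append("hallucination_rate")
    return reasons
```

The case that matters is a canary with a perfectly healthy HTTP error rate and a breached hallucination SLO: a traditional health check would pass it, while this check returns a rollback reason.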
Phase 07 / 09
Monitor
AI-aware observability · SLOs · Runbooks
F01 · 👥 Three-Stakeholder · ●●○ Medium

Monitor user satisfaction (NPS, error rates), agent reliability (API contract violations, MCP health), and reviewer metrics (compliance coverage, audit trail completeness) as three separate SLO tracks.

Datadog · LangFuse
F02 · 🏛️ DDD & Design · ●○○ Low

Domain event throughput per bounded context provides business-meaningful monitoring metrics beyond infrastructure metrics. Domain-aligned dashboards make on-call faster because incidents map to business concepts.

OpenTelemetry · Grafana
F03 · ⚡ Full-Stack · ●●○ Medium

Six-track monitoring: FE (Core Web Vitals, crash rates), BE (latency, error rates), Middleware (hit rates, queue depth), Data (freshness, quality), AI/ML (eval metrics), Platform (CICD success rate, infra costs).

Datadog · LangFuse · Great Expectations
F04 · 🗄️ Polyglot Data · ●●● High

Data freshness (how old is retrieved data?), embedding drift (are old embeddings diverging from current data?), retrieval precision (is RAG finding the right context?), query latency per store — these data layer SLOs determine AI feature quality.

Arize Phoenix · Great Expectations · dbt tests in prod
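Two of these data-layer SLOs, retrieval precision and data freshness, reduce to simple ratios. The functions below are a rough sketch: the names, the labelled relevant set, and the maximum-age threshold are assumptions, and dedicated tools compute richer variants of both.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Set


def precision_at_k(retrieved_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are labelled relevant.

    Divides by the number of documents actually returned, so a retriever
    that returns fewer than k results is not penalised twice.
    """
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / len(top_k)


def stale_fraction(indexed_at: List[datetime], now: datetime, max_age: timedelta) -> float:
    """Fraction of indexed documents older than the freshness SLO allows."""
    if not indexed_at:
        return 0.0
    stale = sum(1 for timestamp in indexed_at if now - timestamp > max_age)
    return stale / len(indexed_at)
```

Tracked over time, a falling precision or a rising stale fraction is an early signal of AI quality degradation that infrastructure metrics alone would not show.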
F05 · 🔄 Five-Track CICD · ●●○ Medium

CICD health monitoring: build success rates per track, deployment frequency, lead time for changes, MTTR — the DORA metrics applied to all five tracks independently and as a coordinated system.

GitHub Actions metrics · Buildkite analytics
F06 · 🔗 Middleware · ●●● High

Cache hit rate, miss rate, eviction rate, cost-per-hit — the economic metrics for semantic cache ROI. Queue depth, processing time, dead letter rate — the AI task reliability story. These are the metrics that justify AI infrastructure investment.

Redis Monitor · BullMQ dashboard · Temporal UI
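The economics behind semantic cache ROI can be worked through with a small calculation. This sketch assumes a deliberately simple cost model, in which every request pays for a cache lookup and only misses pay for an LLM call; `cache_economics` and the prices used with it are hypothetical.

```python
from typing import Dict


def cache_economics(
    hits: int,
    misses: int,
    cost_per_llm_call: float,
    cost_per_cache_lookup: float,
) -> Dict[str, float]:
    """Compare spend with and without a semantic cache over the same traffic."""
    total = hits + misses
    hit_rate = hits / total if total else 0.0
    spend_without_cache = total * cost_per_llm_call
    # Every request performs a lookup; only misses fall through to the LLM.
    spend_with_cache = total * cost_per_cache_lookup + misses * cost_per_llm_call
    return {
        "hit_rate": hit_rate,
        "spend_without_cache": spend_without_cache,
        "spend_with_cache": spend_with_cache,
        "net_savings": spend_without_cache - spend_with_cache,
    }
```

Under this model the cache pays for itself whenever the hit rate exceeds the ratio of lookup cost to LLM call cost, which is the kind of number that justifies the infrastructure investment.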
F07 · 🧪 Test Suite · ●●○ Medium

Trends in test pass rate, flaky test rate, AI eval metrics, and data quality scores complement production monitoring. Test degradation in CI often predicts production incidents.

Test result trends · LangSmith
F08 · 🔭 AI-Aware SRE · ●●● High

The core phase: three monitoring layers, all active.
  • Layer 1, cloud infra & APM: CPU, memory, network, service throughput (Datadog APM, Grafana + Prometheus, CloudWatch/Azure Monitor/GCP Cloud Monitoring).
  • Layer 2, AI observability: LLM traces, retrieval precision, hallucination signals, cost per workflow, prompt version drift (LangFuse, Arize, OpenTelemetry).
  • Layer 3, business value: task completion rate, user satisfaction, AI-assisted resolution rate.
AI observability sits on top of APM; it does not replace it. The SRE runbook for qualitative AI failure, where the AI is confident and wrong, remains the gap nobody has filled.

LangFuse · Arize Phoenix · Datadog LLM Obs. · PagerDuty · CloudWatch · Grafana + Prometheus
Phase 08 / 09
Deliver
Value to users · agents · reviewers
F01 · 👥 Three-Stakeholder · ●●● High

Delivery is complete only when all three stakeholders are satisfied: users have a working feature, agents have stable contracts, reviewers have a compliance trail. Most teams declare done after one.

MCP SDK · Swagger · Confluence
F02 · 🏛️ DDD & Design · ●●○ Medium

Delivered features documented in ubiquitous language are understandable to domain experts, improving feedback quality and enabling non-technical reviewers to validate correctness — not just engineers.

Domain docs · AsyncAPI
F03 · ⚡ Full-Stack · ●●○ Medium

Delivered features that work on web but not mobile, or have AI features dependent on stale data, are not done. Six-track delivery criteria are the new definition of feature completeness.

MCP SDK · Swagger · Fastlane
F04 · 🗄️ Polyglot Data · ●●● High

The delivered product's AI features are only as good as the data infrastructure beneath them. Fresh, high-quality polyglot data architecture enables features impossible with single-store approaches: semantic search, personalization, real-time AI context.

RAG APIs · Semantic search endpoints
F05 · 🔄 Five-Track CICD · ●●○ Medium

Coordinated delivery across five artifact types — ensuring backend API is deployed before the mobile app that depends on it, embedding index is warm before semantic search is enabled — is a delivery orchestration problem.

Feature flags · LaunchDarkly
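The ordering constraint in this kind of delivery orchestration is a dependency graph, and a topological sort gives one valid rollout sequence. The sketch below uses Python's standard `graphlib`; the artifact names and the `rollout_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def rollout_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return artifacts in an order where every dependency ships first.

    Keys are artifacts, values are the artifacts they depend on.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Illustrative dependencies for a semantic search feature.
EXAMPLE_DEPENDENCIES: Dict[str, Set[str]] = {
    "mobile_app": {"backend_api"},
    "semantic_search_flag": {"embedding_index", "backend_api"},
}
```

For the example graph, any order the function returns places `backend_api` before `mobile_app` and both `embedding_index` and `backend_api` before the feature flag is switched on.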
F06 · 🔗 Middleware · ●●○ Medium

Intelligent middleware enables features that would be unusably slow synchronously: background document processing, batch embedding generation, async RAG pipeline execution. Delivered as async-first APIs.

Semantic search APIs · Async AI endpoints
F07 · 🧪 Test Suite · ●●○ Medium

Test coverage reports, AI eval metric baselines, and data quality scores are delivery artifacts — evidence for reviewers that the feature was validated against all four quality dimensions before shipping.

Test coverage reports · AI eval dashboards
F08 · 🔭 AI-Aware SRE · ●●○ Medium

Delivered alongside the feature: agent audit trails (compliance and debugging), per-workflow cost reports (ROI visibility), and AI SLO dashboards (stakeholder trust). Enterprise buyers ask for all three in procurement now.

SLO dashboards · Audit trails · Cost reports
Phase 09 / 09
Change
Iteration · Prompt Versioning · Domain Evolution
F01 · 👥 Three-Stakeholder · ●●○ Medium

Change requests from users, agents, and reviewers must all route through the same structured requirement process — not three separate channels that create three inconsistent requirement artifacts.

Linear · Notion AI
F02 · 🏛️ DDD & Design · ●●● High

DDD provides structured change management: which bounded contexts are affected? Which domain events need versioning? Which consumers need migration? Without DDD, AI-assisted refactoring produces unpredictable cascades.

Domain event versioning · Strangler Fig
F03 · ⚡ Full-Stack · ●●○ Medium

Large-scale changes spanning multiple tracks — new data model that changes the API that changes the UI that changes the mobile app — can be delegated to parallel AI agents operating on isolated branches with explicit cross-track contracts.

Claude Code · git worktrees · dbt versioning
F04 · 🗄️ Polyglot Data · ●●○ Medium

Continuous embedding refresh via CDC means vector indices stay current as source data changes — automatically. Schema evolution across four store types requires careful sequencing and rollback planning traditional migration tooling wasn't designed for.

RisingWave CDC · dbt versioning
F05 · 🔄 Five-Track CICD · ●●● High

Pipeline changes are themselves software and go through the same review and testing as application code. Prompt version changes in the AI/ML pipeline trigger automated eval runs before promotion, so change is data-driven, not intuition-driven.

Feature flags · MLflow · Fastlane
F06 · 🔗 Middleware · ●●○ Medium

Invalidating a semantic cache when the underlying data or embedding model changes is harder than TTL-based expiry. Cache-breaking migrations for AI workloads require careful coordination between the data pipeline and the cache layer.

Redis cluster migration · Kafka topic migration
F07 · 🧪 Test Suite · ●●○ Medium

Every change to an AI feature requires updating the golden dataset and re-running evals. Prompt changes, model version changes, and embedding model changes all trigger eval pipelines before promotion.

Eval regression tests · Golden dataset updates
F08 · 🔭 AI-Aware SRE · ●●○ Medium

Every prompt change, model version change, or agent workflow change is treated as an experiment with an eval gate. AI SRE owns the promotion criteria: what eval improvement justifies the change? What condition triggers reversion?

Feature flags · A/B eval frameworks · Prompt versioning
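Promotion criteria of this kind can be written down as an explicit gate. The sketch below is one possible shape, not a standard: `EvalScores`, `should_promote`, the three metrics, and the 0.01 minimum gain are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScores:
    faithfulness: float        # higher is better
    relevance: float           # higher is better
    hallucination_rate: float  # lower is better


def should_promote(baseline: EvalScores, candidate: EvalScores, min_gain: float = 0.01) -> bool:
    """Promote only if nothing regresses and at least one metric clearly improves."""
    no_regression = (
        candidate.faithfulness >= baseline.faithfulness
        and candidate.relevance >= baseline.relevance
        and candidate.hallucination_rate <= baseline.hallucination_rate
    )
    meaningful_gain = (
        candidate.faithfulness - baseline.faithfulness >= min_gain
        or candidate.relevance - baseline.relevance >= min_gain
        or baseline.hallucination_rate - candidate.hallucination_rate >= min_gain
    )
    return no_regression and meaningful_gain
```

The same function answers both questions posed above: a candidate that fails the gate is not promoted, and a promoted prompt whose live scores later fail the gate against its predecessor is a candidate for reversion.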
// the pattern

High-impact intersections concentrate in three zones: Design (DDD, Full-Stack, Polyglot Data), Test/Build (Five-Track CICD, Test Suite), and Monitor (Polyglot Data, Middleware, AI-Aware SRE). These are the phases where the investment compounds, or where the debt accumulates. Most of the rest is medium. Genuine lows are few: 10 of the 72 intersections.

// tool references last reviewed · June 2026