8 Forces Reshaping How Software Gets Built

Chapter 8 of 18 · Practitioner · 12 min

End-to-end, from requirements to reliability: eight structural shifts in software development that are real and happening now. No tool roundups. No productivity statistics. Honest trade-offs and the specific stack for each – from the standpoint of someone building production systems, not selling a course.

// the crux

The AI era puts eight specific pressures on how software gets built – paired across the SDLC from planning to production. This is the map: eight forces, four pairs, each one the ceiling for the next. The four chapters after this walk them in detail.

// in one breath

Eight forces, one continuous arc – from how a requirement is written to how a live AI system is kept honest in production.
Each force card covers: what it is, the key insight, advantages, trade-offs, and a 2026 tool stack.
Chapters 9–12 go deep on each force pair; Chapter 13 maps all eight forces across the nine SDLC phases.

For twenty years, the hardest part of shipping software was writing it. That is no longer where the difficulty lives. AI now sits in every phase of delivery – discovery, design, the build, the test suite, the 3am page – and I keep watching two kinds of team pull apart. One bought the tools and bolted them onto the same process they always ran. The other tore the process up and rebuilt it around how these systems actually behave. The gap between them widens every quarter. These eight forces are the blueprint the second kind is building from.

01 · Three-Stakeholder Analysis

02 · DDD as AI Language

03 · Six-Track Full-Stack

04 · Polyglot Data

05 · Five-Track CICD

06 · Smart Middleware

07 · Unified Test Suite

08 · AI-Aware SRE

👥 Paradigm Shift · Requirements & Discovery · 01 / 08

Three-Stakeholder Analysis: Users, Agents & Reviewers

Every requirement now serves three distinct audiences – the humans who will use it, the AI agents who will build it, and the reviewers who will validate quality and compliance. Writing for one is no longer enough.

// Key insight

The most expensive bugs are not logic errors – they are requirement errors that survive into production because the spec was written for only one reader. In 2026, requirements must be multi-modal: user stories for UX clarity, structured agent specs for implementation precision (unambiguous boundaries, explicit edge cases, machine-parseable acceptance criteria), and compliance checklists for reviewer traceability. AI can generate all three from a single discovery session, but only if engineers deliberately design the analysis process for three outputs, not one. The quality of your upstream analysis now sets the ceiling on the quality of your AI-generated implementation.
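
A minimal sketch of the single-source, three-output idea. It assumes a hypothetical `Requirement` schema, and every name in it (including the control ID `AUDIT-7`) is illustrative rather than taken from any tool in the stack below. One structured record renders three outputs: a user story, an agent spec with explicit boundaries and machine-parseable acceptance criteria, and a reviewer checklist.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class AcceptanceCriterion:
    """One machine-parseable criterion in given/when/then form."""
    given: str
    when: str
    then: str
    edge_case: bool = False  # edge cases are flagged explicitly, not implied


@dataclass
class Requirement:
    """Hypothetical single source record produced by one discovery session."""
    req_id: str
    role: str
    goal: str
    benefit: str
    in_scope: list[str]
    out_of_scope: list[str]
    criteria: list[AcceptanceCriterion]
    controls: list[str] = field(default_factory=list)  # hypothetical compliance control IDs


def user_story(r: Requirement) -> str:
    """Reader 1, users: plain user-story phrasing for UX clarity."""
    return f"As a {r.role}, I want {r.goal} so that {r.benefit}."


def agent_spec(r: Requirement) -> dict:
    """Reader 2, agents: explicit boundaries and structured acceptance criteria."""
    return {
        "id": r.req_id,
        "boundaries": {"in": r.in_scope, "out": r.out_of_scope},
        "acceptance": [asdict(c) for c in r.criteria],
    }


def compliance_checklist(r: Requirement) -> list[str]:
    """Reader 3, reviewers: one traceable line per control and per criterion."""
    lines = [f"[ ] {r.req_id} satisfies control {ctl}" for ctl in r.controls]
    lines += [f"[ ] {r.req_id}-AC{i} has a linked test" for i in range(1, len(r.criteria) + 1)]
    return lines


req = Requirement(
    req_id="REQ-42",
    role="returning customer",
    goal="to reorder a past purchase in one tap",
    benefit="I save time on repeat orders",
    in_scope=["order history screen"],
    out_of_scope=["payment method changes"],
    criteria=[
        AcceptanceCriterion("a past order", "the user taps Reorder", "a new cart is created"),
        AcceptanceCriterion("an out-of-stock item", "the user taps Reorder",
                            "the item is flagged, not silently dropped", edge_case=True),
    ],
    controls=["AUDIT-7"],
)
print(user_story(req))
# As a returning customer, I want to reorder a past purchase in one tap so that I save time on repeat orders.
```

The point of the design is that all three artifacts derive from one record. When a boundary changes, the agent spec and the reviewer checklist change with it instead of drifting apart.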

Advantages

AI agents given structured specs produce dramatically more accurate implementations
Reviewer traceability from requirement → test → deploy can cut audit effort to minutes
Three-audience approach surfaces conflicting assumptions before design starts
RAG over historical decisions prevents reinventing solutions to solved problems

Trade-offs

Three-artifact process takes more upfront time – teams under pressure skip it
Requires engineers to think simultaneously as analysts, agents, and auditors
Legacy orgs often resist structured requirement formats designed for AI consumption
AI-surfaced conflicts between new and existing requirements can paralyze teams

// Stack

Claude / Sonnet · Multi-modal requirement generation
LlamaIndex + RAG · Historical decision retrieval
Linear / Notion AI · Structured requirement management
EventStorming · Domain discovery facilitation
Perplexity Agents · Competitive & technical research
Confluence AI · Reviewer-facing documentation

🏛️ Strategic · Architecture & Domain Design · 02 / 08

DDD as the Language of AI Delegation

Domain-Driven Design has quietly become the most critical AI engineering skill – not philosophically, but practically. Bounded contexts are the natural unit of AI agent assignment, and ambiguous domains produce ambiguous AI output.

// Key insight

When you give an AI agent an ambiguous problem space, you get an ambiguous implementation – passing all tests, violating all intentions. When you give it a clearly defined bounded context – with aggregates, domain events, ubiquitous language, and explicit boundaries – you get a service you can ship. In teams I have worked with, clear DDD models cut the time spent reviewing AI-generated code by half or more, sometimes by closer to two-thirds. Teams without domain models find that AI accelerates the creation of a Big Ball of Mud. DDD precision is prompt engineering at the architectural level. Beyond domain modeling, 2026 design spans wireframes, UI/UX systems, use cases, cache strategy, and polyglot DB selection. These are upstream artifacts that set the quality of everything AI builds downstream. Documentation standards carry the same weight: arc42, the C4 model, and UML give the design deliverable a defined form, one an agent can draft and a reviewer can grade.
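
A minimal sketch of what "clearly defined" can mean in code. It assumes a hypothetical Ordering bounded context; the class and event names are illustrative. The aggregate root owns its invariants and records domain events. An agent implementing inside this boundary therefore has explicit rules to satisfy rather than intentions to guess.

```python
from dataclasses import dataclass, field


class DomainError(Exception):
    """Raised when a command would break an aggregate invariant."""


@dataclass(frozen=True)
class OrderLineAdded:
    order_id: str
    sku: str
    quantity: int


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    total_cents: int


@dataclass
class Order:
    """Aggregate root for a hypothetical Ordering bounded context."""
    order_id: str
    lines: dict[str, tuple[int, int]] = field(default_factory=dict)  # sku -> (qty, unit price in cents)
    placed: bool = False
    events: list = field(default_factory=list)  # domain events waiting to be published

    def add_line(self, sku: str, quantity: int, unit_price_cents: int) -> None:
        # Invariants live inside the aggregate, so no caller (human or agent) can skip them.
        if self.placed:
            raise DomainError("cannot modify a placed order")
        if quantity <= 0:
            raise DomainError("quantity must be positive")
        current_qty, _ = self.lines.get(sku, (0, unit_price_cents))
        self.lines[sku] = (current_qty + quantity, unit_price_cents)
        self.events.append(OrderLineAdded(self.order_id, sku, quantity))

    def total_cents(self) -> int:
        return sum(qty * price for qty, price in self.lines.values())

    def place(self) -> None:
        if self.placed:
            raise DomainError("order already placed")
        if not self.lines:
            raise DomainError("cannot place an empty order")
        self.placed = True
        self.events.append(OrderPlaced(self.order_id, self.total_cents()))


order = Order("o-1")
order.add_line("sku-a", 2, 1500)
order.add_line("sku-a", 1, 1500)
order.place()
print(order.total_cents())  # 4500
print([type(e).__name__ for e in order.events])  # ['OrderLineAdded', 'OrderLineAdded', 'OrderPlaced']
```

The frozen event classes are the part other contexts would consume. Publishing them (to Kafka or elsewhere) is deliberately left out of the sketch.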

Advantages

Clear bounded contexts map directly to AI agent task assignment
Ubiquitous language eliminates ambiguity that causes AI to hallucinate architecture
Domain events become natural inter-service communication contracts
DDD models serve as living documentation agents can query for context

Trade-offs

DDD upfront investment is high – a wrong model is architecturally expensive to undo
Most codebases were not designed with DDD – retrofitting requires careful migration
AI agents can violate domain invariants if the bounded context is not explicit
Domain experts (not engineers alone) must participate in modeling sessions

// Stack

EventStorming · Domain discovery & event mapping
arc42 + C4 + UML · Design documentation standards
Miro / Mural · Context mapping workspace
OpenAPI + AsyncAPI · Contract-first API design
Figma AI · Wireframe & UI/UX design
Claude Code · DDD-aware implementation generation
PostgreSQL + Redis · Relational + cache design baseline

Deep dive → Chapter 9 · Forces 01–02: Before the First Line of Code

⚡ Mainstream Now · Full-Stack Development · 03 / 08

Six-Track Full-Stack AI Orchestration

Full-stack in 2026 is six parallel tracks: Frontend (Web + Mobile), Backend (APIs + DDD microservices), Middleware (Cache + Queues), Data Engineering, AI/ML, and Platform (CICD + SRE). Most teams run two.

// Key insight

Nearly all the AI dev tool adoption I have seen happens at the individual level. The teams winning are those where FE, BE, Data, and ML agents know about each other’s contracts. The senior engineer’s role has fundamentally shifted from implementor to contract owner – defining the interfaces between tracks and letting AI handle implementation within those contracts. Code review is now primarily contract review and domain invariant validation. The biggest productivity gains come not from individual AI tool adoption but from AI-coordinated multi-track development with shared context and explicit interface contracts between every layer.
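
A minimal consumer-driven contract check. It assumes a hypothetical `SEARCH_RESULT_CONTRACT` shared between the frontend and backend tracks. A real team would more likely use a dedicated tool such as Pact (listed in Force 07's stack). This sketch only shows the shape of the idea: the consumer declares what it depends on, and CI fails the provider's build when an AI-generated change breaks that declaration.

```python
from typing import Any

# Hypothetical contract held by the contract owner: the fields the web and mobile
# tracks read from the backend search endpoint, and their types.
SEARCH_RESULT_CONTRACT: dict[str, type] = {"id": str, "title": str, "score": float}


def contract_violations(payload: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """List contract breaches; an empty list means the payload honours the contract.

    Extra fields are tolerated (providers may add), missing or retyped ones are not.
    """
    problems = []
    for name, expected in contract.items():
        if name not in payload:
            problems.append(f"missing field '{name}'")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            problems.append(f"field '{name}' is {actual}, expected {expected.__name__}")
    return problems


ok = {"id": "a1", "title": "Red shoes", "score": 0.91, "highlight": "<b>Red</b>"}
retyped = {"id": "a1", "title": "Red shoes", "score": "0.91"}  # an AI edit in the backend track
print(contract_violations(ok, SEARCH_RESULT_CONTRACT))       # []
print(contract_violations(retyped, SEARCH_RESULT_CONTRACT))  # ["field 'score' is str, expected float"]
```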

Advantages

Parallel track development can compress feature delivery from weeks to days
AI maintains consistency across tracks when given shared interface contracts
A strong data engineering track raises AI/ML quality, which in turn lifts every other track
Platform track (CICD + SRE) can be AI-generated from infra and reliability specs

Trade-offs

Cross-track contract ownership requires new engineering disciplines few teams have
AI-generated code in one track can silently break another without contract tests
Mobile track (App Store cycles) does not accelerate – it remains the bottleneck
Data engineering is the unglamorous blocker for every AI feature

// Stack

Claude Code + git worktrees · Parallel multi-track agent orchestration
React / Next.js + React Native · Web + mobile frontend
FastAPI / Go + DDD · Domain-bounded backend services
Kafka + Redis + BullMQ · Middleware event + cache layer
dbt + Airflow · Data engineering pipeline
LangChain / LlamaIndex · AI/ML integration track

🗄️ Foundational · Data Architecture · 04 / 08

Polyglot Data: SQL + NoSQL + Vector + Streaming

The database decision is now four-way. Relational for consistency, document for flexibility, vector for semantic search, streaming for freshness. Most teams pick one and suffer for it wherever the other three would have fit. The skill is knowing which data lives where.

// Key insight

AI features are only as good as the data infrastructure beneath them – and most teams discover this after shipping. The modern AI-augmented data stack has four distinct layers solving distinct problems: PostgreSQL for transactional consistency and complex queries, MongoDB for flexible schema evolution, pgvector or Qdrant for semantic retrieval and RAG context, and Kafka + RisingWave for real-time event streaming and fresh materialized views via CDC. The design decision is not which database to use – it is how data flows between all four, and what transformation logic (dbt) sits between them. Getting this wrong in design means paying for it in every subsequent phase.
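
A minimal sketch of why CDC matters for the vector layer. It makes two simplifying assumptions. First, change events are modelled loosely on Debezium's `op` codes (`c` create, `u` update, `d` delete, `r` snapshot read). Second, a toy hashed embedding stands in for a real embedding model. Each change in the relational source is applied to the index as it happens, so a deleted product can never be served as RAG context.

```python
import math
import zlib

DIM = 64  # toy embedding width; a real system would call an embedding model


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding, L2-normalised (stand-in for a real model)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorIndex:
    """In-memory stand-in for pgvector or Qdrant: key -> (embedding, source text)."""

    def __init__(self) -> None:
        self.rows: dict[str, tuple[list[float], str]] = {}

    def apply_change(self, event: dict) -> None:
        """Apply one simplified CDC event: create/update/snapshot re-embed, delete removes."""
        op, key = event["op"], event["key"]
        if op in ("c", "u", "r"):
            text = event["after"]["description"]
            self.rows[key] = (embed(text), text)
        elif op == "d":
            self.rows.pop(key, None)

    def search(self, query: str, k: int = 1) -> list[str]:
        """Top-k keys by dot product (equal to cosine here, since vectors are normalised)."""
        q = embed(query)
        ranked = sorted(
            self.rows.items(),
            key=lambda item: sum(a * b for a, b in zip(q, item[1][0])),
            reverse=True,
        )
        return [key for key, _ in ranked[:k]]


index = VectorIndex()
index.apply_change({"op": "c", "key": "p1", "after": {"description": "waterproof hiking boots"}})
index.apply_change({"op": "c", "key": "p2", "after": {"description": "cotton summer dress"}})
index.apply_change({"op": "d", "key": "p1"})  # product deleted upstream
print(index.search("hiking boots"))  # ['p2']: the deleted product can no longer be retrieved
```

In a real deployment the events would arrive from Kafka, and the upsert would target pgvector or Qdrant. The rule stays the same in either case: the index changes when the source changes, not on the next batch reindex.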

Advantages

Storing each data type in an engine optimised for its access pattern unlocks performance no single database can match
Vector layer enables semantic search over private enterprise data
Streaming CDC keeps AI retrieval context fresh without expensive batch reindexing
Polyglot design makes the architecture evolvable as AI data needs change

Trade-offs

Operational complexity of four database systems requires a mature platform team
Data consistency across stores requires careful eventual consistency boundary design
Embedding model selection and RAG chunking strategy require genuine expertise
Cost of four managed database services at enterprise scale is significant

// Stack

PostgreSQL + pgvector · Relational + vector unified store
MongoDB · Document store for flexible schema
Redis · Cache + semantic response cache
Kafka + RisingWave · Streaming + real-time materialized views
dbt · Cross-store transformation layer
Qdrant / Pinecone · Dedicated vector store at scale

Deep dive → Chapter 10 · Forces 03–04: The Build and the Store

🔄 Unsolved · CICD & Build Orchestration · 05 / 08

Five-Track Coordinated CICD Pipelines

Modern delivery produces five artifact types: Web, Mobile, Backend (DDD microservices), Data pipelines, and AI/ML pipelines. Coordinating their build, test, and release as one coherent system is the industry’s genuinely unsolved problem.

// Key insight

A single feature can span all five artifact types simultaneously. “Add semantic search to the mobile app” requires: mobile build, web build, backend service build, data pipeline (embedding generation), and AI/ML pipeline (vector index warm-up). Each has different build tools, different test frameworks, different deployment mechanisms, and different rollback strategies. Turborepo handles monorepo coordination within a track. GitHub Actions handles orchestration between tracks. But the cross-artifact dependency graph for coordinated release – ensuring data pipeline artifacts are valid before AI/ML pipeline artifacts are promoted – is genuinely unsolved at the tooling level in 2026. Teams that crack this compound their delivery velocity dramatically.
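
A minimal sketch of the missing gating rule. It assumes a hypothetical release graph for the semantic-search example, with illustrative artifact names. Python's standard-library `graphlib.TopologicalSorter` orders the artifacts, and an artifact is promoted only if it and everything upstream of it passed validation. This does not substitute for the orchestration tooling the paragraph describes as missing. It only makes the rule explicit enough to encode.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each artifact maps to the artifacts that must be promoted first.
RELEASE_GRAPH: dict[str, set[str]] = {
    "data:embedding-pipeline": set(),
    "ml:vector-index-warmup": {"data:embedding-pipeline"},
    "backend:search-service": {"ml:vector-index-warmup"},
    "web:search-ui": {"backend:search-service"},
    "mobile:search-ui": {"backend:search-service"},
}


def plan_promotion(graph: dict[str, set[str]], validated: set[str]) -> tuple[list[str], list[str]]:
    """Return (promote, blocked), walking artifacts in dependency order.

    An artifact is promoted only if it passed its own validation AND every
    upstream artifact it depends on was promoted.
    """
    promote: list[str] = []
    blocked: list[str] = []
    for artifact in TopologicalSorter(graph).static_order():
        upstream_ok = all(dep in promote for dep in graph[artifact])
        if artifact in validated and upstream_ok:
            promote.append(artifact)
        else:
            blocked.append(artifact)
    return promote, blocked


# The index warm-up failed validation; everything that depends on it must wait.
promote, blocked = plan_promotion(
    RELEASE_GRAPH,
    validated={"data:embedding-pipeline", "backend:search-service", "web:search-ui", "mobile:search-ui"},
)
print(promote)  # ['data:embedding-pipeline']
```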

Advantages

Coordinated release catches cross-track integration failures before they reach production
AI-generated pipeline config reduces DevOps toil and pipeline maintenance burden
Independent track pipelines allow fast iteration where store review is not the constraint
Five-track visibility gives engineering leads true delivery pipeline insight

Trade-offs

App store review (anywhere from hours to several days) is the hard wall for any coordinated release
Cross-artifact dependency management requires custom orchestration tooling today
Data pipeline failures can silently corrupt the output of AI/ML pipelines downstream
Five build contexts multiply infrastructure cost and pipeline maintenance complexity

// Stack

GitHub Actions · Multi-track pipeline orchestration
Turborepo / Nx · Monorepo build coordination
Fastlane · Mobile CI/CD and store automation
dbt Cloud · Data pipeline CI with lineage
MLflow / DVC · ML pipeline versioning + tracking
Buildkite · Enterprise parallel CI execution

🔗 Underrated · Middleware & Integration · 06 / 08

Intelligent Middleware: Semantic Cache + Event Intelligence

Redis is no longer just a cache. Queues are no longer just buffers. In AI-augmented systems, the middleware layer has become the most important cost and latency lever – and the least invested-in.

// Key insight

Semantic caching deduplicates LLM calls by meaning, not exact string match. At enterprise scale, deployments I have seen report inference cost reductions of 40–70%. But the deeper shift is in event-driven queues – they now carry AI task payloads with context, priority levels, retry-with-context semantics, and result caching. Dead letter queues for failed AI tasks need fundamentally different retry logic: the context that caused the failure is often as valuable as the retry, because it informs the next attempt’s prompt. Teams designing middleware for AI workloads discover that cache strategy and queue design are first-class architectural decisions. Most still treat them as infrastructure footnotes. The gap between these two positions can be close to half your AI inference bill.
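
A minimal sketch of the semantic-cache mechanism. It assumes an injected `embed` function; below, a toy vocabulary-count embedding stands in where a real deployment would use an embedding model. It also uses a linear scan where GPTCache or a vector index would do the lookup at scale. The similarity threshold decides the outcome: set it too low and users get answers to questions they did not ask.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Return a cached LLM answer when a new prompt is close enough in meaning."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.9) -> None:
        self.embed = embed
        self.threshold = threshold  # trades hit rate against serving a wrong answer
        self.entries: list[tuple[list[float], str]] = []

    def get(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((self.embed(prompt), answer))


def answer_with_cache(cache: SemanticCache, prompt: str, call_llm: Callable[[str], str]) -> str:
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    answer = call_llm(prompt)  # only cache misses pay for inference
    cache.put(prompt, answer)
    return answer


VOCAB = ["refund", "policy", "shipping", "time", "password", "reset"]


def toy_embed(text: str) -> list[float]:
    # Toy embedding: counts of a few vocabulary words. Illustration only.
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = SemanticCache(toy_embed, threshold=0.9)
r1 = answer_with_cache(cache, "What is your refund policy?", fake_llm)
r2 = answer_with_cache(cache, "Tell me the refund policy", fake_llm)  # paraphrase: cache hit
r3 = answer_with_cache(cache, "How do I reset my password?", fake_llm)
print(len(calls))  # 2
```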

Advantages

Semantic caching can cut LLM inference costs by a reported 40–70% at meaningful scale
Event-driven AI tasks decouple expensive inference from user-facing latency
Queue priority enables real-time and batch AI processing in one unified system
Middleware observability surfaces AI cost attribution at task granularity

Trade-offs

Semantic cache invalidation is architecturally harder than TTL-based expiry
AI task queue monitoring requires new tooling beyond standard queue depth metrics
Context-aware retry logic for AI tasks is complex to implement and test
Middleware adds operational complexity, and teams consistently understaff it
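
A minimal sketch of retry-with-context. It assumes a hypothetical `AITask` payload and caller-supplied `call_llm` and `validate` functions; `flaky_llm` is a stand-in, not a real model. Each failure reason is appended to the task and folded into the next prompt. A task that exhausts its attempts lands in the dead letter queue with that history attached, where it can inform the next attempt.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AITask:
    """Hypothetical queue payload: the prompt plus the reasons earlier attempts failed."""
    task_id: str
    prompt: str
    failures: list[str] = field(default_factory=list)


def build_prompt(task: AITask) -> str:
    """Fold earlier failure reasons into the next attempt instead of retrying blind."""
    if not task.failures:
        return task.prompt
    notes = "\n".join(f"- attempt {i}: {why}" for i, why in enumerate(task.failures, 1))
    return f"{task.prompt}\n\nPrevious attempts failed validation:\n{notes}"


def process(task: AITask, call_llm: Callable[[str], str],
            validate: Callable[[str], Optional[str]],
            dead_letters: list, max_attempts: int = 3) -> Optional[str]:
    for _ in range(max_attempts):
        output = call_llm(build_prompt(task))
        error = validate(output)  # None means the output passed
        if error is None:
            return output
        task.failures.append(error)
    dead_letters.append(task)  # the failure history travels with the task into the DLQ
    return None


def validate_json(output: str) -> Optional[str]:
    try:
        json.loads(output)
        return None
    except json.JSONDecodeError as exc:
        return f"invalid JSON ({exc.msg})"


def flaky_llm(prompt: str) -> str:
    # Stand-in model that only complies once it is told what went wrong.
    return '{"ok": true}' if "Previous attempts" in prompt else "Sure! Here is the JSON:"


dlq: list = []
task = AITask("t-1", "Return the order summary as JSON.")
print(process(task, flaky_llm, validate_json, dlq))  # {"ok": true}
print(task.failures)  # ['invalid JSON (Expecting value)']
```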

// Stack

Redis + GPTCache · Semantic LLM response caching
BullMQ / Kafka · AI task queue with context
Temporal · Durable AI workflow execution
Dapr · Distributed middleware runtime
Upstash · Serverless Redis for edge AI
RabbitMQ · Service message bus

Deep dive → Chapter 11 · Forces 05–06: The Pipeline and the Cache

🧪 Critical Gap · Quality & Testing · 07 / 08

The Unified Test Suite: Functional + Non-Functional + AI Eval

The traditional test pyramid needs a fourth layer: AI evaluation. Functional, non-functional, and AI eval tests belong in one integrated pipeline – not three separate tools owned by three different teams who rarely talk.

// Key insight

The AI eval gap is the most dangerous blind spot in 2026 software delivery. Your load balancer knows if your API is slow. Your error tracker knows if your API crashes. Nothing tells you if your AI feature is confidently wrong. Ragas, DeepEval, and LangSmith provide the primitives – faithfulness, relevance, hallucination rate, context precision – but adoption is low and ownership is unclear. Equally missing: the data quality test suite. AI features are downstream of data pipelines. Data quality issues surface as AI quality issues, silently, in production, affecting real users for days before anyone connects the symptom (bad AI output) with the cause (stale embedding index, schema drift, pipeline failure). The deterministic side has its own neglected standard: orthogonal arrays size the functional test matrix by derivation instead of instinct, a published method an agent can execute and a reviewer can audit.
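
A minimal sketch of an AI eval gate in CI. It assumes a hypothetical golden set of (context, answer) pairs, with cutoffs of 0.8 for support and 10% for the failure rate. The scoring is a deliberately crude lexical proxy for faithfulness. It is not the metric Ragas or DeepEval compute, since those typically rely on an LLM judge. What the sketch does show is the shape of the gate: score every golden case, and fail the build when the share of unsupported answers crosses a threshold the team has agreed on.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "for", "on", "it"}


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def support_score(answer: str, context: str) -> float:
    """Crude lexical proxy: share of the answer's content words found in the context."""
    words = content_words(answer)
    if not words:
        return 1.0
    return len(words & content_words(context)) / len(words)


def eval_gate(cases: list[dict], min_score: float = 0.8,
              max_fail_rate: float = 0.1) -> tuple[bool, float]:
    """Return (passed, fail_rate); CI fails when too many answers look unsupported."""
    failures = sum(1 for c in cases if support_score(c["answer"], c["context"]) < min_score)
    fail_rate = failures / len(cases)
    return fail_rate <= max_fail_rate, fail_rate


golden = [
    {"context": "Refunds are accepted within 30 days of delivery.",
     "answer": "Refunds are accepted within 30 days."},
    {"context": "Standard shipping takes 3 to 5 business days.",
     "answer": "Shipping takes 3 to 5 business days."},
    {"context": "Refunds are accepted within 30 days of delivery.",
     "answer": "Refunds are accepted within 90 days, no receipt needed."},  # confidently wrong
]
passed, rate = eval_gate(golden)
print(passed, round(rate, 2))  # False 0.33
```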

Advantages

AI eval in CI catches hallucination regressions before production impact
Non-functional tests in every PR prevent silent performance regressions at merge time
Unified four-layer suite gives one dashboard view of all quality dimensions
Data quality tests protect AI features from silent upstream pipeline failures
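
A minimal sketch of two data quality checks aimed at the silent causes named above. It assumes a hypothetical expected schema and a one-hour freshness budget. The first check detects schema drift against a declared contract. The second detects embedding-index lag behind its source table. Great Expectations or dbt tests would express the same checks declaratively.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical declared schema for the table that feeds the embedding pipeline.
EXPECTED_SCHEMA = {"product_id": "int", "description": "str", "updated_at": "datetime"}


def schema_drift(observed: dict[str, str], expected: dict[str, str]) -> list[str]:
    """Report missing, unexpected, and retyped columns."""
    issues = [f"missing column {c}" for c in expected if c not in observed]
    issues += [f"unexpected column {c}" for c in observed if c not in expected]
    issues += [
        f"type of {c} changed: {expected[c]} -> {observed[c]}"
        for c in expected if c in observed and observed[c] != expected[c]
    ]
    return issues


def index_is_stale(last_indexed: datetime, source_last_updated: datetime,
                   max_lag: timedelta = timedelta(hours=1)) -> bool:
    """Stale when the embedding index lags its source table by more than max_lag."""
    return source_last_updated - last_indexed > max_lag


now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
observed = {"product_id": "int", "description": "str", "updated_at": "str"}
print(schema_drift(observed, EXPECTED_SCHEMA))  # ['type of updated_at changed: datetime -> str']
print(index_is_stale(now - timedelta(hours=6), now))  # True
```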

Trade-offs

AI eval requires curated golden datasets that must be maintained as behaviour evolves
Four-layer test suite increases CI runtime – smart parallelization is mandatory
No industry standard for AI eval metrics – teams must define their own SLOs
Security and load testing in CI requires careful environment isolation

// Stack

Vitest / Playwright / Pact · Functional + contract testing
Orthogonal arrays / NIST ACTS · Functional test matrix sizing
k6 / Locust · Load & performance testing in CI
Snyk / OWASP ZAP · Security scanning in pipeline
Ragas / DeepEval · AI evaluation framework
Great Expectations / dbt · Data quality validation
LangSmith · LLM eval tracking + golden datasets
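
A minimal worked example of sizing the test matrix "by derivation instead of instinct". It assumes four hypothetical checkout factors at three levels each. The standard L9(3^4) orthogonal array covers every pair of factor levels exactly once in 9 runs, where the full combinatorial matrix needs 81. Here the array is constructed from the linear combinations a, b, a+b and 2a+b mod 3. NIST ACTS generates covering arrays, a related and more flexible construction, for larger or uneven factor sets.

```python
from itertools import combinations, product

# Hypothetical test factors for a checkout flow, three levels each.
FACTORS = {
    "browser": ["chrome", "firefox", "safari"],
    "payment": ["card", "paypal", "voucher"],
    "locale": ["en", "de", "ja"],
    "network": ["wifi", "4g", "offline"],
}


def l9_rows() -> list[tuple[int, int, int, int]]:
    """L9(3^4) orthogonal array over GF(3): columns a, b, a+b, 2a+b (mod 3)."""
    return [(a, b, (a + b) % 3, (2 * a + b) % 3) for a, b in product(range(3), repeat=2)]


def build_matrix(factors: dict[str, list[str]]) -> list[dict[str, str]]:
    """Map each array row onto concrete factor levels."""
    names = list(factors)
    return [{n: factors[n][lvl] for n, lvl in zip(names, row)} for row in l9_rows()]


def covers_all_pairs(rows: list[tuple[int, ...]]) -> bool:
    """Orthogonal-array property: every column pair shows all 9 level pairs exactly once."""
    for i, j in combinations(range(len(rows[0])), 2):
        pairs = sorted((r[i], r[j]) for r in rows)
        if pairs != sorted(product(range(3), repeat=2)):
            return False
    return True


matrix = build_matrix(FACTORS)
print(len(matrix), "runs instead of", 3 ** 4)  # 9 runs instead of 81
```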

🔭 Frontier · Reliability & Platform Engineering · 08 / 08

AI-Aware SRE: When Runbooks Meet Hallucination

SRE runbooks assume deterministic systems. AI systems are not. The discipline needs a new layer – AI Reliability Engineering – with new SLOs, new failure modes, and new runbooks for failures your current monitoring cannot see.

// Key insight

The qualitative failure mode is SRE’s blind spot: AI output is confidently wrong, an agent took a bad autonomous action, or model quality degraded over time – yet latency is normal, the error rate is normal, and no alert fires. This incident reaches nearly every team shipping AI features, typically 3–6 months after launch, when the novelty wears off and edge cases accumulate in production. The answer is not more dashboards. It is defining what “healthy” means for an AI system: a hallucination rate SLO, a retrieval precision floor, an agent action audit rate, a cost-per-delivered-value metric. Platform engineering in 2026 means building infrastructure for both human-facing and agent-facing reliability. That includes MCP server health checks, agent decision traceability, and autonomous action rollback mechanisms that still work when things go wrong at 3am.
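
A minimal sketch of a hallucination-rate SLO with burn-rate paging. It rests on three assumptions: hypothetical names, a 2% target, and a sample of production responses that humans or an eval model judge as hallucinated or not. The burn-rate framing is borrowed from standard SRE error-budget alerting. Nothing here is a product API.

```python
from dataclasses import dataclass


@dataclass
class AISLO:
    """Hypothetical SLO: at most `target` of judged responses may be hallucinations."""
    name: str
    target: float  # 0.02 means 2% of sampled, judged responses


def hallucination_rate(judgements: list[bool]) -> float:
    """judgements[i] is True when sample i was judged hallucinated (by a human or an eval)."""
    return sum(judgements) / len(judgements) if judgements else 0.0


def burn_rate(judgements: list[bool], slo: AISLO) -> float:
    """Error-budget burn: 1.0 means failing at exactly the rate the SLO allows."""
    return hallucination_rate(judgements) / slo.target


def should_page(judgements: list[bool], slo: AISLO,
                threshold: float = 2.0, min_samples: int = 50) -> bool:
    """Page only with enough evidence and a clearly elevated burn rate."""
    return len(judgements) >= min_samples and burn_rate(judgements, slo) >= threshold


slo = AISLO("support-bot hallucination rate", target=0.02)
window = [True] * 6 + [False] * 94  # 6 of 100 sampled answers judged hallucinated
print(round(burn_rate(window, slo), 2))  # 3.0
print(should_page(window, slo))  # True
```

This alert can fire while latency and error rate look perfectly healthy. That is exactly the gap the paragraph describes.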

Advantages

Hallucination SLOs create organisational accountability for AI feature quality
Agent action audit trails enable rollback of harmful autonomous decisions
Platform abstractions let product teams ship AI features safely without deep AI expertise
Cost-per-value metrics surface ROI of AI features with evidence, not assumptions

Trade-offs

No industry standard for AI SLOs – every org defines theirs from scratch
AI observability tooling is 2–3 years behind infrastructure observability maturity
On-call for AI systems requires different debugging skills than traditional SRE
Prevention is often the only real defence against bad autonomous agent actions – many cannot be rolled back
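
A minimal sketch of prevention plus traceability. It assumes a hypothetical `ActionGate` sitting between an agent and its tools; the action kinds and the approver name are illustrative. Actions outside an allow-list are blocked, and irreversible ones wait for a human approver. Every decision, including the blocked ones, lands in an audit log that on-call can trace.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class AgentAction:
    agent: str
    kind: str        # e.g. "update_ticket", "send_email"
    payload: dict
    reversible: bool


@dataclass
class AuditEntry:
    at: datetime
    action: AgentAction
    decision: str    # "executed", "pending_approval" or "blocked"
    reason: str


@dataclass
class ActionGate:
    """Hypothetical gate between an agent and its tools: prevent first, record everything."""
    allowed_kinds: set[str]
    log: list[AuditEntry] = field(default_factory=list)

    def submit(self, action: AgentAction, execute: Callable[[AgentAction], None],
               approved_by: Optional[str] = None) -> str:
        if action.kind not in self.allowed_kinds:
            decision, reason = "blocked", f"kind '{action.kind}' is not on the allow-list"
        elif not action.reversible and approved_by is None:
            decision, reason = "pending_approval", "irreversible action needs a human approver"
        else:
            execute(action)
            decision = "executed"
            reason = f"approved by {approved_by}" if approved_by else "reversible, auto-approved"
        self.log.append(AuditEntry(datetime.now(timezone.utc), action, decision, reason))
        return decision


done: list[AgentAction] = []
gate = ActionGate(allowed_kinds={"update_ticket", "send_email"})
print(gate.submit(AgentAction("support-bot", "update_ticket", {"id": 7}, True), done.append))      # executed
print(gate.submit(AgentAction("support-bot", "send_email", {"to": "customer"}, False), done.append))  # pending_approval
print(gate.submit(AgentAction("support-bot", "issue_refund", {"amount": 500}, False), done.append))   # blocked
```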

// Stack

Langfuse / LangSmith · LLM trace observability
Arize Phoenix · AI eval in production
OpenTelemetry · Unified distributed trace standard
Datadog LLM Observability · Enterprise AI monitoring
Guardrails AI / Llama Guard · Agent safety + PII filtering
PagerDuty + custom SLOs · AI-aware alerting and runbooks

Deep dive → Chapter 12 · Forces 07–08: The Eval and the Runbook

// The intersection point

These eight forces are not independent. Weak requirements cap the domain model; a shaky domain model corrupts everything AI generates from it; thin data infrastructure starves the AI features built on top; and weak monitoring decides how long you stay blind when one of them fails. Each force upstream sets the ceiling for every force downstream.

Eight forces, four pairs, one continuous arc – from how a requirement is written to how a live AI system is kept honest in production. That is the whole map, and a mistake grows more expensive the further down it travels: each force is the ceiling for the next.

// carry forward

Now the territory, two forces at a time. Chapter 9 starts before the first line of code – planning and architecture, where AI quietly rewrites who the requirements are even written for.

// tool references last reviewed · 2026