Some years ago I worked on a real-time, mission-critical system that had to answer in milliseconds, under global load. We shipped it to production and it held – not because the cloud was generous with resources, but because the design underneath it was right: stateless services, in-memory caching layers, graceful degradation, isolated read and write paths, bounded queues with controlled retries. It was the first time I watched, in production, what a genuinely good architecture can do. Most of what I believe about cloud design traces back to that system.
What kept it standing was not one clever trick but a handful of decisions made on purpose. Canary releases, so a bad deploy could singe one corner instead of burning the whole system down. Health checks tied to real business logic, not merely to a process still being alive. Defensive timeouts and circuit breakers, so that one slow dependency could not drag everything upstream down with it. Autoscaling helped – but autoscaling was never the thing that made it resilient. The design was. Stateless services recover and scale without sticky-session drama; microservices, drawn along real boundaries, contain a failure instead of spreading it. Scale, it turned out, is something you earn by building a system worth scaling.
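To make the circuit-breaker idea concrete, here is a minimal sketch in Python. It is not the code from that system: the class name, the thresholds, and the injectable clock are all assumptions for illustration, and the timeout itself is assumed to be enforced by the wrapped call, which raises when the dependency is too slow.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, retries after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of queueing behind a slow dependency.
                raise CircuitOpenError("dependency is failing; call skipped")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The point of the pattern is the fail-fast branch: once the breaker is open, callers get an immediate error they can degrade around, instead of holding threads and connections while a slow dependency times out.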
It taught me where the real work of a system lives. Features are what users touch – the screens, the flows, the outcomes – and features are what get a system launched. The qualities underneath them, the non-functional requirements, are what decide whether it survives once it is live. Functional requirements keep users happy; non-functional requirements keep the business alive. And they are not one team's job in one diagram – they run through every layer, from the cloud infrastructure to the architecture to the code to the pipeline that ships it.
Always available. Fast. Cost-effective. Sustainable. Scalable. Resilient. Every stakeholder can recite what they want from a cloud system, and the list barely changes from one project to the next. Naming the qualities was never the hard part. The hard part is turning each one into a number a system can be held to.
The demand for cloud is no longer about keeping the lights on. It is about enabling innovation at scale – systems that grow without being rebuilt, absorb traffic nobody forecast, and stay honest under cost and regulatory pressure at the same time. Designing for that means making decisions that optimise cost, performance, and flexibility today, while deliberately leaving room for the decisions you have not made yet.
So before I design anything, I run it through four lenses: throughput, load, capacity, and scale. Each one gets defined honestly. Then – this is the step most teams skip – each one gets mapped to a service level indicator, a service level objective, and a service level agreement, with the system's real limits and constraints kept in plain sight. The lenses tell you what to think about. The service levels turn that thinking into a contract.
I have written separately about the wider principles that surround this work – the right mindset, the readable code, the recoverable system. This piece is about the part that has to be measured.
Four Lenses Before Any Design
From a distance these four look like the same question asked four ways. They are not. Conflating them is how a system ends up over-built in one dimension and falls over in another.
The rate of successful processing
Throughput is the volume of transactions, requests, or messages a system completes per second. It is about flow, not storage: how much work gets through, not how much the system can hold. A system can have ample resources and still bottleneck on throughput when a single stage in the path cannot keep up. Measured in transactions, queries, or requests per second (TPS, QPS, or RPS), it is the first number I want to know about any path that matters.
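The bottleneck point can be shown with a few lines of Python. This is a sketch under a simplifying assumption: the stages run strictly in series, so the path can sustain no more than its slowest stage. The stage names and rates are invented for illustration.

```python
def pipeline_throughput(stage_rates):
    """Return (bottleneck_stage, sustained_rate) for stages that run in series.

    stage_rates maps a stage name to the requests per second it can complete.
    """
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


# Hypothetical request path; every number here is made up.
stages = {"gateway": 12_000, "auth": 9_000, "ledger_write": 2_500, "notify": 8_000}

bottleneck, rate = pipeline_throughput(stages)
print(f"{bottleneck} caps the path at {rate} requests/s")
```

Adding resources to any stage other than the bottleneck leaves the sustained rate unchanged, which is why throughput has to be measured per stage and not only at the front door.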
The demand arriving at once
Load is the pressure placed on the system – peak traffic, concurrent requests, the busiest minute of the busiest day. It is the question of limits: how much arrives together, and how the system behaves as that number climbs, rather than how it behaves on a quiet average afternoon. Design for the average and the peak will find you.
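A small Python sketch shows why the average hides the problem. The sample data is invented: an hour of per-minute request counts that is quiet for 57 minutes and spikes for three.

```python
import statistics


def load_profile(requests_per_minute):
    """Summarise a series of per-minute request counts."""
    mean = statistics.fmean(requests_per_minute)
    peak = max(requests_per_minute)
    return {"mean": mean, "peak": peak, "peak_to_mean": peak / mean}


# Hypothetical hour of traffic: 57 quiet minutes, then a three-minute spike.
samples = [1_000] * 57 + [4_000, 9_000, 12_000]

profile = load_profile(samples)
print(f"mean {profile['mean']:.0f}/min, peak {profile['peak']}/min, "
      f"ratio {profile['peak_to_mean']:.1f}x")
```

A system sized for the mean of this series would be short by most of an order of magnitude during the busiest minute, so the peak-to-mean ratio is a useful first number to put next to any load figure.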
How resources meet demand, now and later
Capacity is how resources are allocated to serve current and future demand, and it lives in CPU, memory, storage, and connections. It is the headroom question: how close to the ceiling you run, and how quickly you can raise that ceiling when the data or the user base grows. Capacity is what runs out quietly, long before anything throws an error.
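The headroom question can be turned into arithmetic. The sketch below assumes steady compound growth, which real systems rarely follow exactly, and uses an 80% ceiling purely as an example threshold. The function name and parameters are hypothetical.

```python
import math


def months_until_threshold(used, total, monthly_growth, threshold=0.8):
    """Months until used/total reaches the threshold under compound growth.

    Returns 0 if the threshold is already reached and None if it never will be.
    """
    limit = total * threshold
    if used >= limit:
        return 0
    if monthly_growth <= 0:
        return None
    # Solve used * (1 + g)^n >= limit for the smallest whole n.
    return math.ceil(math.log(limit / used) / math.log(1 + monthly_growth))


# Illustrative figures: 400 GB used of 1,000 GB, growing 10% a month.
print(months_until_threshold(used=400, total=1_000, monthly_growth=0.10))
```

With these made-up inputs the store crosses 80% in the eighth month. Nothing throws an error before then, which is what makes the calculation worth running on a schedule.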
Handling the spike beyond steady load
Scale is the ability to support demand that rises above the steady-state load, adjusting resources to match real-time needs. It is throughput, load, and capacity in motion – the system adding or shedding resources, horizontally or vertically, so that a surge becomes a line on the bill instead of an outage. Scale is the difference between a system that survives success and one that is broken by it.
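One way to picture horizontal scaling is as a sizing rule evaluated over and over as demand moves. The sketch below is a simplified proportional rule, not the algorithm of any particular autoscaler. The per-replica target and the fleet bounds are assumptions chosen for illustration.

```python
import math


def desired_replicas(observed_rps, target_rps_per_replica,
                     min_replicas=2, max_replicas=100):
    """Size the fleet so each replica stays at or below its target load."""
    needed = math.ceil(observed_rps / target_rps_per_replica)
    # Clamp: never scale below the floor, never spend past the ceiling.
    return max(min_replicas, min(max_replicas, needed))


# Hypothetical service where one replica comfortably handles 1,000 RPS.
for rps in (300, 10_500, 250_000):
    print(rps, "->", desired_replicas(rps, target_rps_per_replica=1_000))
```

The clamp is where scale meets the other lenses: the floor protects availability during quiet periods, and the ceiling is a capacity and cost limit that should be known before the surge arrives.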
Make the Promise Measurable: SLI, SLO, SLA
A quality you cannot measure is a hope, not a requirement. Three terms turn each lens into something you can hold a system to. An SLI (service level indicator) is what you actually measure. An SLO (service level objective) is the internal target you hold that measurement to. An SLA (service level agreement) is the external promise, usually with consequences attached when it is missed. If you want the foundational treatment of the three, Google's Site Reliability Engineering team wrote the SRE primer on SLIs, SLOs, and SLAs.
Here is what that looks like when the four lenses are written down honestly. The figures below are illustrative – the discipline is not the specific numbers, it is that every system has its own version of this table, decided early rather than discovered under load.
| Lens | SLI – what we measure | SLO – the target | SLA – the promise |
|---|---|---|---|
| Throughput | Transactions completed per second | Sustains 10,000 TPS, with 99% completing within 50 ms | Guaranteed 1 Gbps for all data transfer |
| Load | 99th-percentile latency under peak concurrency | ≤ 200 ms at 1 million QPS; 95% served under 100 ms during spikes | Handles up to 500,000 QPS at 99.95% success |
| Capacity | CPU and storage utilisation | CPU ≤ 70% in normal operation; storage below 80% for 95% of the time | 1 TB provisioned within 5 minutes of request |
| Scale | Time to adjust allocation as demand moves | Auto-scales within 5 minutes; 99% of scale events with no workload impact | Scales to 2× load within 10 minutes, or credits apply |
None of these numbers are universal, and that is the point. What matters is that someone decided them on purpose, wrote them down, and can now tell whether the system is meeting them. A team that does not know its SLOs cannot tell the difference between a healthy system and one bad deploy away from a breach. A system without an SLA has no contract with the people who depend on it.
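Telling whether a target is met can be as plain as the following sketch, which checks the illustrative throughput row from the table (10,000 TPS, with 99% completing within 50 ms) against one measurement window. The function name and the sample data are hypothetical. A real system would read these values from its metrics pipeline.

```python
def check_throughput_slo(latencies_ms, window_seconds,
                         min_tps=10_000, max_latency_ms=50.0, min_fraction=0.99):
    """Evaluate one window of completed transactions against the throughput SLO."""
    completed = len(latencies_ms)
    tps = completed / window_seconds                      # the SLI: completions per second
    within = sum(1 for ms in latencies_ms if ms <= max_latency_ms)
    fraction = within / completed if completed else 0.0   # share completing in time
    return {
        "tps": tps,
        "fraction_within": fraction,
        "met": tps >= min_tps and fraction >= min_fraction,
    }


# Invented one-second window: 9,950 fast transactions and 50 slow ones.
window = [20.0] * 9_950 + [80.0] * 50
print(check_throughput_slo(window, window_seconds=1.0))
```

The thresholds are parameters on purpose: the check stays the same from system to system, and only the numbers someone decided on change.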
And a contract you cannot prove you kept is barely a contract at all. On the regulated systems I have worked on, that meant every transaction traceable end to end, every error correlated across services in seconds rather than hours, audit logs that genuinely met audit-trail standards, and a rollback you could demonstrate on demand instead of merely hope for. Miss those and it is not technical debt you are carrying. It is customer trust, or compliance, and both cost far more than the engineering ever would have.
Five Commitments That Keep It Honest
The lenses and the service levels describe a system at a single point in time. These five commitments are what keep it trustworthy as it grows into traffic and obligations it has not met yet.
Scalability beyond limits
A system should scale horizontally or vertically with no visible impact on the customer. The user should never be able to tell that you just doubled the fleet behind them. If scaling is something they can feel, it is not finished.
Resilience by design
Failure is not an edge case. It is inevitable. Self-healing mechanisms and globally distributed architectures let a system absorb a failure and finish the journey instead of going dark. Resilience belongs in the first diagram, not bolted on after the first incident.
Data-driven sustainability
Green cloud practices and energy-efficient resource allocation are not only an environmental position. They are a cost position. The same idle capacity that inflates a carbon footprint inflates the bill. Measuring and trimming what you do not use serves both at once.
Uptime as a commitment
Four nines – 99.99% availability – and beyond are not a marketing line. They are the product of deep redundancy, automated failover, fault isolation, and real-time monitoring. Every nine you add costs more than the last, which is exactly why uptime is a decision, not a default.
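The cost of each nine is easier to argue about once it is expressed as a downtime budget. The sketch below does only the arithmetic, assuming a 365-day year and counting every minute of the year as in scope.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability):
    """Minutes of downtime per year allowed by an availability target."""
    return (1 - availability) * MINUTES_PER_YEAR


for target in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{target:.3%} allows {downtime_budget_minutes(target):.1f} min/year")
```

Four nines leaves roughly 52.6 minutes a year, and five nines a little over five. Each extra nine cuts the budget by a factor of ten, which is why the last one usually demands automated failover: no human response fits inside it.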
Security with zero trust
Zero Trust Architecture assumes no request is trusted by default, inside the perimeter or out. In regulated domains – payments, banking, anything touching personal data under GDPR or falling within the scope of DORA – this is also how you satisfy data sovereignty, compliance, and privacy without slowing delivery to a crawl.
What This Adds Up To
Put together, this is a way of working, not a checklist. The four lenses decide what matters. The service levels make those things measurable. The five commitments keep them true as the system grows. Skip the measurement and you are left with adjectives – fast, scalable, resilient – that nobody can verify and everybody will argue about during the incident.
The best cloud decisions optimise for cost, performance, and flexibility while leaving deliberate room for the decisions still to come. That room is not indecision. It is the difference between a system that ages well and one that has to be rebuilt the first time reality disagrees with the plan.
Built with that intention, cloud architecture is transformative. Built without it, it quietly works against the people it was meant to serve.