300 Transactions Per Second – Then Silence

There is a particular kind of frustration that comes from a well-designed system that still underperforms. The architecture is sound. The code is clean. Everything scales. And yet – the numbers tell a different story. In 2010, I found out why.

Everything Looked Right on Paper

The system was a core Java-based online transaction processing (OLTP) platform handling 300 transactions per second at peak load. The architecture was multi-tiered, with each layer capable of scaling independently based on demand. It handled spikes. It was resilient. From a design standpoint, there was very little to fault.

At a glance: 300 TPS at peak; core stack: Java; JVM settings tuned: none; production system, 2010.

But system speed and stability were consistently below expectation. Response times were erratic under load. Transaction processing would stall intermittently – not long enough to trigger an outage, but long enough to accumulate into a real performance problem in a system where every millisecond of delay compounds across hundreds of concurrent transactions.

When I looked deeper, the answer was both obvious in retrospect and completely invisible in the moment. The JVM configuration and garbage collection settings had never been touched. The system was running entirely on defaults. In a high-performance application processing hundreds of transactions per second, this is roughly equivalent to precision-engineering a racing engine and leaving the fuel mixture at factory spec.
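One quick way to check whether a JVM is running on defaults is to ask it which arguments it was started with. The sketch below uses the standard `java.lang.management` API. The class name `JvmDefaultsCheck` and the choice of which prefixes count as tuning are illustrative assumptions, not something taken from the original system.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the heap and GC related flags a JVM was started with. */
final class JvmDefaultsCheck {

    /** Keeps only arguments that look like heap or GC tuning (-XX:..., -Xms, -Xmx, -Xmn). */
    static List<String> tuningFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:") || arg.startsWith("-Xms")
                    || arg.startsWith("-Xmx") || arg.startsWith("-Xmn")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> flags = tuningFlags(jvmArgs);
        if (flags.isEmpty()) {
            System.out.println("No heap or GC flags found: this JVM is running on defaults.");
        } else {
            flags.forEach(System.out::println);
        }
    }
}
```

An empty result does not prove the defaults are wrong for a workload. It only confirms that nobody has made a deliberate choice yet.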

the investigation

Switching to G1GC

The first meaningful decision was switching to the G1 Garbage Collector (G1GC). It was relatively new at the time – it first shipped as an experimental option in Java 6 (update 14) and was not yet mainstream – but its design philosophy was exactly right for an OLTP workload.

Traditional garbage collectors treated the heap as a monolithic block, cleaning it in sweeping passes that could halt all application threads. G1GC takes a different approach: it divides the heap into equal-sized regions, marks live objects concurrently with the application, and then reclaims the regions holding the most garbage first, a few at a time, in short controlled pauses. Rather than waiting for one large Full GC that stops the world, it cleans sections of the old generation incrementally, often avoiding the need for a Full GC altogether.

For a system where even a 200ms pause means dozens of delayed transactions, this was the right trade-off. The catch: having more tuning knobs does not make tuning simpler. G1GC gave us more control and more complexity in equal measure. Finding the right configuration took sustained effort across four distinct dimensions.

the four challenges

Four Dimensions of the Challenge

// 01

Latency vs. Throughput

OLTP systems demand that each individual transaction completes quickly. Any pause – even a brief one – compounds across hundreds of concurrent transactions, creating backlogs that ripple through the entire system. Maximising throughput and minimising latency pull in opposite directions. Tuning for one degrades the other. Finding the right balance for a specific workload profile requires measurement, not intuition.
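Measuring this trade-off usually means looking at latency percentiles rather than averages, because a handful of paused requests barely move the median but dominate the tail. The following is a minimal sketch using the nearest-rank percentile definition. The class name `LatencyPercentiles` and the sample data are made up for illustration and are not measurements from the system described here.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-rank percentiles over latency samples. */
final class LatencyPercentiles {

    /** Smallest sample such that at least p percent of all samples are less than or equal to it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up data: 1,000 requests at 5 ms, 20 of which also sat through a 200 ms pause.
        long[] samples = new long[1000];
        Arrays.fill(samples, 5);
        for (int i = 0; i < 20; i++) {
            samples[i] = 205;
        }
        System.out.println("p50 = " + percentile(samples, 50) + " ms"); // p50 = 5 ms
        System.out.println("p99 = " + percentile(samples, 99) + " ms"); // p99 = 205 ms
    }
}
```

In this made-up data set only 2% of requests hit the pause, yet the 99th percentile is forty times the median. That gap is the number to watch while tuning.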

// 02

Memory Management at Scale

At 300 TPS, the application required large heap sizes to hold the working set of concurrent transactions. Large heaps make garbage collection more expensive – more memory to scan, more objects to evaluate. Without precise tuning, extended pause times during GC cycles would periodically halt transaction processing entirely. Getting the young-to-old generation ratio right was critical and non-obvious.
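To see how the heap is actually split between generations at runtime, the JVM's memory pool beans can be inspected. This is a rough sketch: the class name `HeapPools` is hypothetical, and the name-based classification is a heuristic, since pool names differ between collectors (for example "G1 Eden Space" and "G1 Old Gen" under G1).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: prints heap pool usage grouped by generation. */
final class HeapPools {

    /** Heuristic classification by pool name; names vary by collector. */
    static String generationOf(String poolName) {
        if (poolName.contains("Eden") || poolName.contains("Survivor")) {
            return "young";
        }
        if (poolName.contains("Old") || poolName.contains("Tenured")) {
            return "old";
        }
        return "other";
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.printf("%-22s %-6s used=%d MiB committed=%d MiB%n",
                    pool.getName(), generationOf(pool.getName()),
                    usage.getUsed() >> 20, usage.getCommitted() >> 20);
        }
    }
}
```

Sampling this periodically under real load gives a first picture of how quickly the young generation fills and how fast the old generation grows.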

// 03

Predicting Load Changes

OLTP systems do not run at constant load. Traffic varies by time of day, day of week, month-end cycles, promotional events. GC settings that perform well at average load may degrade severely under peak load – or waste resources during quiet periods. The configuration needed to remain stable and responsive across the full load spectrum, which took significant trial and error to achieve.

// 04

There Is No Universal Configuration

Garbage collection has no one-size-fits-all solution. Every system has a unique object allocation pattern, memory profile, and latency requirement. With G1GC specifically, the number of configurable parameters – pause time targets, region sizes, initiating heap occupancy thresholds – means that configuration from another system is only a starting point. The right settings have to be found through continuous monitoring and evidence-based adjustment.
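The parameters named above correspond to the HotSpot options `MaxGCPauseMillis`, `G1HeapRegionSize`, and `InitiatingHeapOccupancyPercent`, with `UseG1GC` selecting the collector. The sketch below reads their effective values from a running JVM through the HotSpot-specific `HotSpotDiagnosticMXBean`. The class name `G1Settings` is hypothetical, and the code assumes a HotSpot-based JVM; on anything else it reports the flags as not available. No values are suggested here, because the right ones depend on the workload.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports the effective value and origin of a few G1 options. */
final class G1Settings {

    /** Returns "name = value (origin: ...)" or a "not available" message. */
    static String describe(String flagName) {
        try {
            HotSpotDiagnosticMXBean hotspot =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (hotspot == null) {
                return flagName + ": not available (no HotSpot diagnostic bean)";
            }
            VMOption option = hotspot.getVMOption(flagName);
            return flagName + " = " + option.getValue() + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the option, or the bean itself, does not exist on this JVM.
            return flagName + ": not available on this JVM";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "UseG1GC", "MaxGCPauseMillis", "G1HeapRegionSize", "InitiatingHeapOccupancyPercent"
        };
        for (String flag : flags) {
            System.out.println(describe(flag));
        }
    }
}
```

The origin is the useful part. `DEFAULT` means nobody chose the value, `ERGONOMIC` means the JVM picked it based on the machine, and `VM_CREATION` means it was set on the command line.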

the mechanism

Minor GC vs. Full GC: The Distinction That Matters

To tune effectively, you have to understand the two fundamentally different collection events that G1GC – like all generational collectors – performs. They are not just different in scale. They are different in cause, behaviour, predictability, and impact on a running system.

// Minor GC

Young Generation Collection

Focus: Targets the young generation – the memory region where new objects are allocated. Most objects here are short-lived and die young.

Pauses: Shorter and more frequent. Involves a smaller memory region, so the stop-the-world pause is brief and bounded.

Control: Highly predictable with good tuning. By collecting young objects regularly, Minor GCs keep pauses brief and the system running smoothly.

// Full GC

Whole-Heap Collection

Focus: Cleans the entire heap – both young and old generations. A far more intensive operation that processes significantly more memory.

Pauses: Longer and highly disruptive. Stops all application threads during cleanup – the classic "stop-the-world" event that stalls transaction processing.

Control: Harder to control. Triggered by low memory or fragmentation. With G1GC, good configuration can reduce frequency significantly – but never eliminate it entirely.

G1GC's core contribution to this problem is its ability to defer Full GC by collecting the old generation incrementally. Background threads mark live objects concurrently with the application, and short mixed collections then reclaim a few old-generation regions at a time. This keeps the heap clean enough that a Full GC becomes an infrequent last resort rather than a regular occurrence. In our OLTP system, shifting from frequent Full GCs to rare ones was where most of the performance improvement came from.
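A simple way to tell whether collections are mostly young-generation or full is to sample the JVM's collector beans over an interval. The sketch below is illustrative: the class name `GcCounters` and the ten-second interval are assumptions. On HotSpot with G1 the beans are usually named "G1 Young Generation" and "G1 Old Generation", with the latter typically counting full collections, but check the names on your own JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: per-collector collection counts and accumulated time. */
final class GcCounters {

    /** Cumulative {count, total millis} per collector since JVM start; -1 means undefined. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(),
                    new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, long[]> before = snapshot();
        Thread.sleep(10_000); // sampling interval, chosen arbitrarily for this sketch
        snapshot().forEach((name, now) -> {
            long[] then = before.getOrDefault(name, new long[2]);
            System.out.printf("%s: %d collections, %d ms in the last interval%n",
                    name, now[0] - then[0], now[1] - then[1]);
        });
    }
}
```

A rising count on the old-generation bean during normal load is the signal to investigate. A steady stream of short young-generation collections is the healthy pattern.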

the outcome

Finding the Balance

After sustained monitoring, adjustment, and evidence-based tuning – iterating on young-to-old generation ratios, pause time targets, region sizes, and heap occupancy thresholds – the system stabilised. Do the arithmetic and you understand what stabilised actually means here: at 300 transactions per second, every second of stop-the-world pause is three hundred transactions sitting in flight – and Full GC pauses on a large heap are measured in seconds, not milliseconds. Frequent Full GCs meant that backlog was being created, drained, and recreated all day long. That was the erratic behaviour our users felt.
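The arithmetic can be made concrete with a small sketch. The 300 TPS arrival rate comes from the system described here. The 2-second pause and the 400 TPS service capacity are assumed figures chosen only to illustrate how a backlog forms and drains, and the class name `BacklogMath` is hypothetical.

```java
/** Hypothetical helper: backlog created by a stop-the-world pause and the time to drain it. */
final class BacklogMath {

    /** Transactions that arrive while all application threads are paused. */
    static long backlogAfterPause(long arrivalTps, double pauseSeconds) {
        return Math.round(arrivalTps * pauseSeconds);
    }

    /** Seconds needed to clear the backlog while new transactions keep arriving. */
    static double drainSeconds(long backlog, long arrivalTps, long capacityTps) {
        if (capacityTps <= arrivalTps) {
            return Double.POSITIVE_INFINITY; // no spare capacity: the backlog never drains
        }
        return (double) backlog / (capacityTps - arrivalTps);
    }

    public static void main(String[] args) {
        long arrivalTps = 300;  // peak load from the article
        long capacityTps = 400; // assumed maximum service rate

        System.out.println(backlogAfterPause(arrivalTps, 0.2)); // 60 for a 200 ms pause

        long backlog = backlogAfterPause(arrivalTps, 2.0);      // 600 for a 2 s pause
        double drain = drainSeconds(backlog, arrivalTps, capacityTps);
        System.out.println(backlog + " queued, drained in " + drain + " s"); // 600 queued, drained in 6.0 s
    }
}
```

Under these assumptions a 2-second pause disturbs the system for roughly 8 seconds in total: the pause itself plus the time needed to work off the queue.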

Sixteen years on, I no longer have those dashboards, and I will not quote precise latency numbers I cannot stand behind. What I can tell you is what changed shape. Before tuning, the pause profile was spiky and unpredictable – dominated by Full GC events that arrived without warning. After tuning, Full GCs became a rare event instead of a routine one; the work shifted to Minor GCs, whose brief, bounded pauses disappeared into the noise of normal latency. The backlog stopped forming. Transaction processing became smooth – not faster at the median, but honest at the tail, which in payment processing is the only place it counts.

But the deeper lesson was about process, not configuration. JVM tuning is not a one-time exercise you complete and move on from. The right configuration today may need revisiting when transaction volume grows, when the data model changes, when a new release alters the object allocation pattern. Monitoring is not an afterthought – it is the ongoing work.
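Continuous monitoring can start small. The sketch below registers a listener for the JVM's GC notifications and logs any collection that takes longer than a threshold. It relies on the HotSpot-specific `com.sun.management.GarbageCollectionNotificationInfo` class. The class name `GcPauseWatch` and the 200 ms threshold are assumptions for illustration. The reported duration is the collection's elapsed time, which for collectors with concurrent phases is not the same thing as a stop-the-world pause.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

/** Hypothetical helper: logs garbage collections that exceed a duration threshold. */
final class GcPauseWatch {

    static boolean exceeds(long durationMillis, long thresholdMillis) {
        return durationMillis > thresholdMillis;
    }

    /** Registers the listener on every collector that emits notifications; returns how many. */
    static int install(long thresholdMillis) {
        NotificationListener listener = (notification, handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            long duration = info.getGcInfo().getDuration();
            if (exceeds(duration, thresholdMillis)) {
                System.err.printf("GC %s (%s, cause: %s) took %d ms%n",
                        info.getGcName(), info.getGcAction(), info.getGcCause(), duration);
            }
        };
        int registered = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
                registered++;
            }
        }
        return registered;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Listening on " + install(200) + " collector(s)");
        System.gc();        // request a collection so there is something to observe
        Thread.sleep(1000); // give the notification time to arrive
    }
}
```

In a real service the log line would feed whatever alerting is already in place, and GC logging enabled at JVM startup remains the more complete record.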

What this experience made clear is that architectural correctness and runtime correctness are separate concerns. A well-designed system running on misconfigured infrastructure is still a poorly performing system. The two have to be right together.

In high-performance systems, the most expensive problems accumulate quietly, beneath the architecture. The work of a performance engineer is to find them before the system does.

The JVM is not a detail. It is the runtime environment your code lives in – every object allocation, every method call, every pause happens inside it. Leaving its configuration at defaults is a reasonable starting point in development. It is not a reasonable production decision for a system that processes hundreds of transactions per second.

Understand your garbage collector. Choose it deliberately. Monitor it continuously. The defaults were designed for the general case. Your system is not the general case.