By Vamsi Karuturi · Senior Backend Engineer at Salesforce

Production Performance Tuning

The difference between "works in dev" and "survives Black Friday" — real numbers, real formulas, real configs.

Real-World Analogy

Tuning a production Spring Boot app is like tuning a race car. The engine (JVM) needs the right fuel mixture (heap size), the transmission (thread pool) needs proper gear ratios for the track (workload type), the cooling system (GC) must prevent overheating under sustained load, and the tires (connection pool) must match road conditions. Tuning one component without understanding the others makes things worse.

JVM Tuning for Spring Boot

Heap Sizing

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph Heap["JVM Heap"]
        Y["Young Gen<br/>(Eden + Survivor)"] -->|"promotion"| O["Old Gen<br/>(long-lived objects)"]
    end
    M["Metaspace<br/>(class metadata)"] -.->|"outside heap"| Heap

    style Y fill:#ECFDF5,stroke:#6EE7B7,color:#1E40AF
    style O fill:#FEF3C7,stroke:#FCD34D,color:#1E40AF
    style M fill:#EFF6FF,stroke:#DBEAFE,color:#1E40AF

Setting	Recommendation	Why
`-Xms` = `-Xmx`	Set both equal	Avoids heap resize pauses (costly GC during growth)
Heap size	50-75% of container memory	Leave room for metaspace, thread stacks, native memory, OS
Container 2GB	`-Xmx1400m`	Leaves ~600MB for non-heap
Container 4GB	`-Xmx3g`	Sweet spot for most microservices

Container Memory Kills

If JVM uses more memory than the container limit, Kubernetes OOMKills the pod — no graceful shutdown, no heap dump, just gone. Always leave 25-30% headroom.
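The headroom rule comes down to simple arithmetic. The sketch below assumes a hypothetical helper class, `HeapBudget` (not part of any library), that splits a container limit into a heap budget and a non-heap budget. The 70% figure is only the 2GB example from the table above.

```java
/** Hypothetical helper: splits a container memory limit into heap and non-heap budgets. */
final class HeapBudget {

    /** Heap size in MiB for a container limit and a heap percentage (for example 70). */
    static long heapMiB(long containerMiB, int heapPercent) {
        if (heapPercent <= 0 || heapPercent >= 100) {
            throw new IllegalArgumentException("heapPercent must be between 1 and 99");
        }
        return containerMiB * heapPercent / 100; // integer division rounds down
    }

    /** Memory left for metaspace, thread stacks, native buffers, and the OS. */
    static long nonHeapMiB(long containerMiB, int heapPercent) {
        return containerMiB - heapMiB(containerMiB, heapPercent);
    }

    public static void main(String[] args) {
        long containerMiB = 2048; // 2GB container limit
        System.out.println("-Xmx" + heapMiB(containerMiB, 70) + "m");
        System.out.println("non-heap headroom: " + nonHeapMiB(containerMiB, 70) + " MiB");
    }
}
```

For a 2GB container at 70% this yields a heap of 1433 MiB and about 615 MiB of headroom, in line with the `-Xmx1400m` recommendation.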

Bash

# Container-aware JVM settings (container support is on by default since Java 10; the flag is shown for clarity)
java -XX:+UseContainerSupport \
     -XX:MaxRAMPercentage=75.0 \
     -XX:InitialRAMPercentage=75.0 \
     -XX:+UseG1GC \
     -jar app.jar
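To confirm that the percentage flags resolved to the heap you expect inside the container, one option is to log what the JVM actually sees at startup. This is a minimal sketch using only `java.lang.Runtime`; the class name `HeapReport` is hypothetical.

```java
/** Hypothetical startup check: prints the limits the JVM resolved for this process. */
final class HeapReport {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** CPU count visible to the JVM (reflects container CPU limits when container support is active). */
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
        System.out.println("visible CPUs: " + visibleCpus());
    }
}
```

If the reported heap is far from 75% of the container limit, the JVM is probably not picking up the limit or another flag is overriding it.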

GC Algorithm Selection

GC Algorithm	Latency (p99)	Throughput	Heap Range	Best For
G1GC	50-200ms pauses	Good	2-8GB	General purpose APIs (default)
ZGC	<1ms pauses	Slightly lower	8GB-16TB	Low-latency (trading, real-time)
Shenandoah	<10ms pauses	Good	2-16GB	Similar to ZGC; ships in Red Hat and most other OpenJDK builds
Parallel GC	200-500ms pauses	Highest	2-8GB	Batch jobs, throughput-critical

Bash

# For a typical REST API microservice (G1GC with tuning)
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=100 \
     -XX:G1HeapRegionSize=16m \
     -XX:+ParallelRefProcEnabled \
     -Xmx3g -Xms3g \
     -jar app.jar

# For low-latency service (ZGC, Java 21+; generational mode is the default from Java 23, where -XX:+ZGenerational can be dropped)
java -XX:+UseZGC \
     -XX:+ZGenerational \
     -Xmx4g -Xms4g \
     -jar app.jar

Rule of Thumb

If your p99 latency SLA is <50ms → use ZGC. If it's <200ms → G1GC is fine. If you're running batch jobs → Parallel GC gives best throughput.
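The rule of thumb can be written as a small decision function. This is a sketch, not a definitive selector: `GcAdvisor` and its method names are hypothetical, and treating any non-batch service with an SLA of 50ms or more as a G1GC candidate is an assumption that extends the rule to cases it does not mention.

```java
/** Hypothetical helper encoding the GC rule of thumb. */
final class GcAdvisor {

    enum Collector { ZGC, G1GC, PARALLEL }

    /** Picks a collector from the workload type and the p99 latency SLA in milliseconds. */
    static Collector choose(boolean batchJob, long p99SlaMillis) {
        if (batchJob) {
            return Collector.PARALLEL; // throughput matters more than pause time
        }
        if (p99SlaMillis < 50) {
            return Collector.ZGC;      // sub-50ms SLA leaves no room for G1 pauses
        }
        return Collector.G1GC;         // assumption: everything else uses the general-purpose default
    }

    /** JVM flag that enables the chosen collector. */
    static String jvmFlag(Collector collector) {
        switch (collector) {
            case ZGC:      return "-XX:+UseZGC";
            case PARALLEL: return "-XX:+UseParallelGC";
            default:       return "-XX:+UseG1GC";
        }
    }

    public static void main(String[] args) {
        System.out.println(jvmFlag(choose(false, 40)));   // latency-sensitive API
        System.out.println(jvmFlag(choose(false, 150)));  // typical REST API
        System.out.println(jvmFlag(choose(true, 0)));     // batch job
    }
}
```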

Connection Pool Tuning (HikariCP)

The Golden Formula

Pool Size Formula

Text Only

pool_size = (core_count * 2) + effective_spindle_count

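Here core_count is the core count of the database server, not of the application pod. For SSDs (spindle_count = 0), a 4-core database server works out to 8 connections; with one spinning disk it is 9, usually rounded up to ~10.
For most microservices: 10-20 connections is the sweet spot.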
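A minimal sketch of the formula as code, for plugging in your own numbers. The class and method names are hypothetical; the inputs are the database server's core count and its effective spindle count.

```java
/** Hypothetical helper that evaluates pool_size = (core_count * 2) + effective_spindle_count. */
final class PoolSizing {

    static int poolSize(int coreCount, int effectiveSpindleCount) {
        if (coreCount < 1 || effectiveSpindleCount < 0) {
            throw new IllegalArgumentException("coreCount must be >= 1 and spindle count >= 0");
        }
        return coreCount * 2 + effectiveSpindleCount;
    }

    public static void main(String[] args) {
        System.out.println("4 cores, SSD:      " + poolSize(4, 0));
        System.out.println("4 cores, one disk: " + poolSize(4, 1));
        System.out.println("8 cores, SSD:      " + poolSize(8, 0));
    }
}
```

Treat the result as a starting point for load testing rather than a final value.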

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph App["Application (20 Tomcat threads)"]
        T1["Thread 1"] & T2["Thread 2"] & T3["Thread ..."] & T4["Thread 20"]
    end

    subgraph Pool["HikariCP Pool (max=10)"]
        C1["Conn 1 (active)"] & C2["Conn 2 (active)"] & C3["Conn 3 (idle)"]
    end

    subgraph DB["PostgreSQL (max_connections=100)"]
        D1[("Connections")]
    end

    App -->|"borrow"| Pool
    Pool -->|"actual DB"| DB
    T1 -.->|"waiting..."| Pool

    style T1 fill:#FEE2E2,stroke:#FCA5A5,color:#1E40AF
    style C1 fill:#ECFDF5,stroke:#6EE7B7,color:#1E40AF
    style C2 fill:#ECFDF5,stroke:#6EE7B7,color:#1E40AF
    style C3 fill:#FEF3C7,stroke:#FCD34D,color:#1E40AF

Production Configuration

YAML

spring:
  datasource:
    hikari:
      # Pool sizing
      maximum-pool-size: 10          # max connections in pool
      minimum-idle: 5                # keep 5 ready connections
      # Timeouts
      connection-timeout: 30000      # wait up to 30s for a connection (the default); lower it to fail fast
      idle-timeout: 600000           # close idle connections after 10min
      max-lifetime: 1800000          # recycle connections every 30min
      # Leak detection
      leak-detection-threshold: 60000  # warn if held > 60s
      # Validation
      # connection-test-query is omitted: HikariCP advises against it when the driver supports JDBC4 Connection.isValid()
      validation-timeout: 5000

The Biggest Mistake: Pool Too Large

More connections ≠ more performance. Each PostgreSQL connection is a separate backend process that can cost roughly 10MB of RAM. A pool of 100 connections on each of 10 pods adds up to 1,000 connections, ten times PostgreSQL's default max_connections of 100. Keep pools small and let threads queue.
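The multiplication across pods is easy to check before a deployment. The sketch below uses a hypothetical `ConnectionBudget` helper; the `reserved` parameter (connections held back for migrations, monitoring, and admin sessions) is an assumption added for illustration.

```java
/** Hypothetical helper: checks the per-pod pool size against the database connection limit. */
final class ConnectionBudget {

    /** Total connections the database sees if every pod fills its pool. */
    static int totalConnections(int pods, int poolSizePerPod) {
        return Math.multiplyExact(pods, poolSizePerPod);
    }

    /** Largest per-pod pool that keeps the fleet under the database limit. */
    static int maxPoolPerPod(int dbMaxConnections, int reserved, int pods) {
        if (pods < 1 || reserved < 0 || reserved >= dbMaxConnections) {
            throw new IllegalArgumentException("need pods >= 1 and 0 <= reserved < dbMaxConnections");
        }
        return (dbMaxConnections - reserved) / pods;
    }

    public static void main(String[] args) {
        System.out.println("10 pods x 100 connections = " + totalConnections(10, 100));
        System.out.println("max pool per pod (limit 100, 10 reserved, 10 pods) = "
                + maxPoolPerPod(100, 10, 10));
    }
}
```

With a limit of 100 and 10 pods, each pod can afford only 9 connections. When that is too few, a connection pooler in front of the database is usually a better answer than larger pools.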

Pool Sizing by Workload

Workload	Pool Size	Reasoning
Simple CRUD API	5-10	Quick queries, low contention
Report generation	3-5	Long queries, fewer concurrent
Mixed (API + batch)	10-15	Separate pools recommended
High-throughput event processor	15-20	Many concurrent writes

Thread Pool Configuration

Tomcat Thread Pool

YAML

server:
  tomcat:
    threads:
      max: 200         # max worker threads (default: 200)
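      min-spare: 25    # keep 25 threads ready (default: 10)
    max-connections: 8192  # max connections accepted and processed at once (default: 8192)
    accept-count: 100      # backlog queue used once max-connections is reached (default: 100)
    connection-timeout: 20000  # ms to wait for the request line after a connection is accepted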

Sizing Formula

Text Only

Optimal threads = Number of CPUs * Target CPU utilization * (1 + Wait time / Service time)

Workload Type	Wait/Service Ratio	Formula (4 cores)	Threads
CPU-bound (computation)	0	4 * 1.0 * (1 + 0)	4-8
Balanced (typical API)	1	4 * 0.8 * (1 + 1)	~6-12
I/O-bound (DB calls)	5-10	4 * 0.8 * (1 + 5)	~19-35
Highly I/O-bound (external APIs)	10-50	4 * 0.8 * (1 + 20)	~35-160
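The sizing formula is straightforward to evaluate for your own measurements. This sketch uses a hypothetical `ThreadSizing` helper and rounds to the nearest whole thread, which is a choice made here rather than part of the formula.

```java
/** Hypothetical helper: threads = CPUs * target utilization * (1 + wait time / service time). */
final class ThreadSizing {

    static long optimalThreads(int cpus, double targetUtilization, double waitToServiceRatio) {
        if (cpus < 1 || targetUtilization <= 0 || targetUtilization > 1 || waitToServiceRatio < 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double raw = cpus * targetUtilization * (1 + waitToServiceRatio);
        return Math.max(1, Math.round(raw)); // never size a pool below one thread
    }

    public static void main(String[] args) {
        System.out.println("CPU-bound:        " + optimalThreads(4, 1.0, 0));
        System.out.println("Balanced:         " + optimalThreads(4, 0.8, 1));
        System.out.println("I/O-bound:        " + optimalThreads(4, 0.8, 5));
        System.out.println("Highly I/O-bound: " + optimalThreads(4, 0.8, 20));
    }
}
```

The wait-to-service ratio is best taken from traces or profiler data, since guessing it moves the result by an order of magnitude.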

Virtual Threads (Java 21+ / Spring Boot 3.2+)

YAML

# Enable virtual threads — replaces Tomcat thread pool with virtual threads
spring:
  threads:
    virtual:
      enabled: true

Java

// Before: @Async with bounded thread pool
@Async("taskExecutor")
public CompletableFuture<Result> processAsync() { ... }

// After: virtual threads are lightweight and cheap to create
// No thread-pool sizing needed: each request gets its own virtual thread,
// but shared resources such as the HikariCP pool still cap real concurrency

When to Use Virtual Threads

Use when: Your workload is I/O-bound (HTTP calls, database queries, file I/O). Virtual threads shine because blocking is cheap.
Avoid when: Your workload is CPU-bound (encryption, compression, computation). Virtual threads won't help here — you're limited by cores.
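To see why blocking is cheap on virtual threads, here is a minimal plain-JDK sketch (Java 21+, no Spring). It starts one virtual thread per task and blocks in each; `VirtualThreadDemo` is a hypothetical name and `Thread.sleep` stands in for a blocking HTTP or JDBC call.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo: many blocking tasks, each on its own virtual thread. */
final class VirtualThreadDemo {

    /** Runs the given number of blocking tasks and returns how many completed. */
    static int runBlockingTasks(int tasks, Duration blockTime) throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                futures.add(executor.submit(() -> {
                    Thread.sleep(blockTime); // stand-in for a blocking I/O call
                    return 1;
                }));
            }
        } // close() waits for all submitted tasks to finish

        int completed = 0;
        for (Future<Integer> future : futures) {
            completed += future.get();
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        int completed = runBlockingTasks(1_000, Duration.ofMillis(100));
        System.out.println(completed + " tasks completed");
    }
}
```

Running the same 1,000 sleeps on a small fixed platform-thread pool would take many times longer, because each blocked task would hold a pooled thread for its full duration.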

Startup Time Optimization

Benchmark Comparison

Optimization	Startup Time	Memory	Trade-off
Default Spring Boot	3-8s	~250MB	None
+ Lazy initialization	1.5-4s	~200MB	First request slower
+ Class Data Sharing (CDS)	2-5s	~220MB	Requires warmup step
+ AppCDS	1.5-3s	~180MB	Rebuild on dependency change
+ Spring AOT (ahead-of-time)	1-3s	~180MB	Less dynamic features
GraalVM Native Image	0.05-0.5s	~50-80MB	Longer build, limited reflection

Configuration

YAML

# Lazy initialization — beans created on first use
spring:
  main:
    lazy-initialization: true

# Exclude auto-configurations you don't need
  autoconfigure:
    exclude:
      - org.springframework.boot.autoconfigure.mail.MailSenderAutoConfiguration
      - org.springframework.boot.autoconfigure.quartz.QuartzAutoConfiguration

Bash

# Class Data Sharing (CDS) — dump class list, then use archive
# Step 1: Generate class list
java -XX:DumpLoadedClassList=classes.lst -jar app.jar &
# Wait for startup, then kill

# Step 2: Create shared archive
java -Xshare:dump -XX:SharedClassListFile=classes.lst \
     -XX:SharedArchiveFile=app-cds.jsa -jar app.jar

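# Step 3: Use archive on startup
# (-Xshare:auto falls back to normal class loading if the archive cannot be mapped;
#  -Xshare:on aborts startup instead and is intended for testing)
java -Xshare:auto -XX:SharedArchiveFile=app-cds.jsa -jar app.jar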

Caching Strategy

Multi-Layer Cache Architecture

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    R(("Request")) --> L1{"L1: In-Process<br/>(Caffeine)"}
    L1 -->|"miss"| L2{"L2: Distributed<br/>(Redis)"}
    L2 -->|"miss"| DB[("Database")]
    DB -->|"populate L2"| L2
    L2 -->|"populate L1"| L1
    L1 -->|"hit (1μs)"| R
    L2 -->|"hit (1-3ms)"| R

    style L1 fill:#ECFDF5,stroke:#6EE7B7,color:#1E40AF
    style L2 fill:#FEF3C7,stroke:#FCD34D,color:#1E40AF
    style DB fill:#EFF6FF,stroke:#DBEAFE,color:#1E40AF
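The lookup order in the diagram can be sketched in plain Java. This is an illustration under simplifying assumptions: `TwoLevelCache` is a hypothetical class, in-memory maps stand in for Caffeine and Redis, a `Function` stands in for the database, and there is no TTL or eviction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical read-through cache: L1 (in-process), then L2 (distributed), then the database. */
final class TwoLevelCache<K, V> {

    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // stand-in for Caffeine
    private final Map<K, V> l2;                             // stand-in for Redis
    private final Function<K, V> database;                  // stand-in for a repository call

    TwoLevelCache(Map<K, V> l2, Function<K, V> database) {
        this.l2 = l2;
        this.database = database;
    }

    V get(K key) {
        V value = l1.get(key);
        if (value != null) {
            return value;                // L1 hit
        }
        value = l2.get(key);
        if (value == null) {
            value = database.apply(key); // miss in both layers
            if (value == null) {
                return null;             // nothing to cache
            }
            l2.put(key, value);          // populate L2
        }
        l1.put(key, value);              // populate L1
        return value;
    }

    public static void main(String[] args) {
        AtomicInteger dbCalls = new AtomicInteger();
        TwoLevelCache<String, String> cache = new TwoLevelCache<>(
                new ConcurrentHashMap<>(),
                key -> { dbCalls.incrementAndGet(); return "value-for-" + key; });

        cache.get("order-42"); // goes to the database
        cache.get("order-42"); // served from L1
        System.out.println("database calls: " + dbCalls.get());
    }
}
```

A real setup also needs expiry on both layers and a way to invalidate L1 entries on other pods when the data changes.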

Caffeine Configuration (L1)

YAML

spring:
  cache:
    type: caffeine
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=5m,recordStats

Java

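import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration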
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager();
        manager.setCaffeine(Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofMinutes(5))
            .recordStats());  // exposes hit/miss metrics
        return manager;
    }
}

When NOT to Cache

Scenario	Why
Data updated more often than every ~5s	Cache invalidation complexity exceeds benefit
User-specific data with high cardinality	Cache fills with millions of entries, low hit rate
Data that MUST be real-time (stock prices)	Stale data = wrong business decisions
Large objects (> 1MB per entry)	Eats heap space, causes GC pressure

HTTP & Network Optimization

WebClient vs RestTemplate

Java

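import io.netty.channel.ChannelOption;
import io.netty.handler.timeout.ReadTimeoutHandler;
import io.netty.handler.timeout.WriteTimeoutHandler;
import java.time.Duration;
import org.springframework.context.annotation.Bean;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;

// ✅ WebClient (non-blocking) on Reactor Netty, which pools connections by default
// Declare this bean method inside a @Configuration class
@Bean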
public WebClient webClient() {
    HttpClient httpClient = HttpClient.create()
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000)
        .responseTimeout(Duration.ofSeconds(10))
        .doOnConnected(conn -> conn
            .addHandlerLast(new ReadTimeoutHandler(10))
            .addHandlerLast(new WriteTimeoutHandler(5)));

    return WebClient.builder()
        .clientConnector(new ReactorClientHttpConnector(httpClient))
        .build();
}

Response Compression

YAML

server:
  compression:
    enabled: true
    mime-types: application/json,application/xml,text/html,text/plain
    min-response-size: 1024  # only compress responses > 1KB

HTTP/2

YAML

server:
  http2:
    enabled: true  # multiplexing and header compression; browsers require TLS (h2), without SSL Spring Boot serves h2c

Monitoring & Alerting

Key Metrics to Watch

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart TD
    subgraph RED["RED Method (for services)"]
        R["Rate<br/>requests/sec"]
        E["Errors<br/>error rate %"]
        D["Duration<br/>p50, p95, p99 latency"]
    end

    subgraph USE["USE Method (for resources)"]
        U["Utilization<br/>% capacity used"]
        S["Saturation<br/>queue depth"]
        Er["Errors<br/>failure count"]
    end

    style R fill:#ECFDF5,stroke:#6EE7B7,color:#1E40AF
    style E fill:#FEE2E2,stroke:#FCA5A5,color:#1E40AF
    style D fill:#FEF3C7,stroke:#FCD34D,color:#1E40AF
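Since Duration is tracked as p50, p95, and p99, it helps to be precise about what those numbers mean. The sketch below computes a nearest-rank percentile over raw latency samples. `Percentiles` is a hypothetical helper for illustration only; monitoring systems typically estimate percentiles from histograms rather than sorting raw samples.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-rank percentile over raw latency samples. */
final class Percentiles {

    /** Returns the smallest sample such that at least p percent of samples are less than or equal to it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] latencies = {10, 20, 30, 40, 50, 60, 70, 80, 90, 1000};
        System.out.println("p50 = " + percentile(latencies, 50) + " ms");
        System.out.println("p99 = " + percentile(latencies, 99) + " ms");
    }
}
```

In this sample the median is 50 ms while p99 is 1000 ms. A single slow request is invisible in the median, which is why tail percentiles are the ones to alert on.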

Alert Thresholds That Matter

Metric	Warning	Critical	Action
p99 latency	> 500ms	> 2s	Check DB, downstream services
Error rate	> 1%	> 5%	Check logs, recent deployments
CPU utilization	> 70%	> 90%	Scale horizontally
Heap utilization	> 80%	> 95%	GC tuning, memory leak hunt
Connection pool active	> 80%	> 95%	Increase pool or fix leaks
Thread pool queue depth	> 50	> 200	More threads or backpressure
GC pause time	> 200ms	> 1s	Switch to ZGC or tune G1

Micrometer + Prometheus Setup

Java

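import io.micrometer.core.instrument.MeterRegistry;
import java.time.Duration;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Component;

// Custom business metrics (constructor generated by Lombok's @RequiredArgsConstructor)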
@Component
@RequiredArgsConstructor
public class OrderMetrics {
    private final MeterRegistry registry;

    public void recordOrderPlaced(String region, double amount) {
        registry.counter("orders.placed", "region", region).increment();
        registry.summary("orders.amount", "region", region).record(amount);
    }

    public void recordOrderLatency(String operation, Duration duration) {
        registry.timer("orders.latency", "operation", operation)
                .record(duration);
    }
}

Production Checklist

Category	Setting	Recommended	Why
JVM	Heap size	50-75% container RAM	Avoid OOMKill
JVM	GC algorithm	G1GC (general) / ZGC (low-latency)	Latency requirements
JVM	`-Xms` = `-Xmx`	Yes	No resize pauses
Pool	HikariCP `maximum-pool-size`	10-20	More connections ≠ more perf
Pool	`leak-detection-threshold`	60s	Catch connection leaks early
Threads	Tomcat `threads.max`	200 (I/O) / CPU*2 (compute)	Match workload type
Threads	Virtual Threads (Java 21+)	Enable for I/O	Removes thread pool sizing
Cache	L1 (Caffeine)	5min TTL, 10K entries	Reduce DB calls
Network	Compression	Enable for JSON > 1KB	60-80% smaller responses
Network	HTTP/2	Enable	Multiplexing, less overhead
Startup	Lazy init	Enable in containers	Faster pod scaling
Monitoring	Actuator + Prometheus	Expose `/actuator/prometheus`	Know before users complain
Resilience	Graceful shutdown	`server.shutdown=graceful`	Drain in-flight requests
Security	Actuator endpoints	Secure with Spring Security	Don't expose `/heapdump` publicly
Logging	Log level	WARN in prod, DEBUG via env	Reduce I/O overhead
JPA	`open-in-view`	false	Avoid hidden lazy-load queries in the web layer and connections held for the whole request
JPA	Show SQL	false in prod	Logging overhead
