3 min read By Vamsi Karuturi · Senior Backend Engineer at Salesforce

Java Profiling Tools

Production War Story

A payment service hit 5-second latency spikes every 30 minutes. Traditional monitoring showed only "high CPU." Attaching async-profiler in production revealed 80% of CPU was spent in a single regex pattern (Pattern.compile called inside a hot loop). One line fix — pre-compile the pattern — dropped P99 latency from 5s to 12ms.

Profiling Categories

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph LR
    P[Java Profiling] --> CPU[CPU Profiling]
    P --> MEM[Memory Profiling]
    P --> THR[Thread Profiling]
    P --> IO[I/O Profiling]
    P --> GC[GC Profiling]

    CPU --> C1[Hot methods]
    CPU --> C2[Flame graphs]
    MEM --> M1[Allocation sites]
    MEM --> M2[Heap dumps]
    THR --> T1[Contention]
    THR --> T2[Deadlocks]
    IO --> I1[File & socket]
    IO --> I2[Latency tracing]
    GC --> G1[Pause analysis]
    GC --> G2[Promotion rates]

    style P fill:#DBEAFE,stroke:#93C5FD,color:#1E40AF
    style CPU fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style MEM fill:#FEF3C7,stroke:#FCD34D,color:#92400E
    style THR fill:#FEE2E2,stroke:#FCA5A5,color:#991B1B
    style IO fill:#EFF6FF,stroke:#93C5FD,color:#1E40AF
    style GC fill:#ECFDF5,stroke:#6EE7B7,color:#065F46
    style C1 fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style C2 fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style M1 fill:#FEF3C7,stroke:#FCD34D,color:#92400E
    style M2 fill:#FEF3C7,stroke:#FCD34D,color:#92400E
    style T1 fill:#FEE2E2,stroke:#FCA5A5,color:#991B1B
    style T2 fill:#FEE2E2,stroke:#FCA5A5,color:#991B1B
    style I1 fill:#EFF6FF,stroke:#93C5FD,color:#1E40AF
    style I2 fill:#EFF6FF,stroke:#93C5FD,color:#1E40AF
    style G1 fill:#ECFDF5,stroke:#6EE7B7,color:#065F46
    style G2 fill:#ECFDF5,stroke:#6EE7B7,color:#065F46

Category	What It Reveals	Key Metrics
CPU	Hot methods, inefficient algorithms	Samples/sec, self-time, total-time
Memory	Allocation pressure, leaks	Allocation rate (MB/s), live set size
Thread	Contention, deadlocks, parking	Lock wait time, blocked thread count
I/O	Slow disk/network calls	Bytes read/written, latency per call
GC	Pause frequency, promotion rates	GC pause ms, throughput %, tenuring

Java Flight Recorder (JFR)

JFR is a zero-overhead event-based profiling framework built into the JVM (HotSpot). It was made free and open-source in Java 11 (backported to 8u262+). JFR is designed to be always-on in production with less than 1% overhead.

Architecture

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph LR
    APP[Application Code] --> EVT[JFR Events]
    JVM[JVM Internals] --> EVT
    OS[OS Events] --> EVT
    EVT --> BUF[Thread-Local\nBuffers]
    BUF --> GLOB[Global Buffer]
    GLOB --> DISK[.jfr File\nor Stream]
    DISK --> JMC[JMC Analysis]
    DISK --> API[JFR Event\nStreaming API]

    style APP fill:#DBEAFE,stroke:#93C5FD,color:#1E40AF
    style JVM fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style OS fill:#FEF3C7,stroke:#FCD34D,color:#92400E
    style EVT fill:#EFF6FF,stroke:#93C5FD,color:#1E40AF
    style BUF fill:#ECFDF5,stroke:#6EE7B7,color:#065F46
    style GLOB fill:#FFFBEB,stroke:#FCD34D,color:#92400E
    style DISK fill:#FEF2F2,stroke:#FCA5A5,color:#991B1B
    style JMC fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style API fill:#DBEAFE,stroke:#93C5FD,color:#1E40AF

JFR Event Categories

Category	Example Events	Use Case
CPU	`jdk.ExecutionSample`, `jdk.CPULoad`	Find hot methods
Allocation	`jdk.ObjectAllocationInNewTLAB`, `jdk.ObjectAllocationOutsideTLAB`	Allocation profiling
I/O	`jdk.FileRead`, `jdk.FileWrite`, `jdk.SocketRead`	Identify slow I/O
GC	`jdk.GarbageCollection`, `jdk.GCPhasePause`	GC pause analysis
Thread	`jdk.ThreadPark`, `jdk.JavaMonitorWait`	Contention detection
Custom	User-defined `@Name` events	Business-specific metrics

Enabling JFR

JVM Flags (startup)jcmd (live, no restart)Programmatic (Java API)

Bash

# Start recording at JVM launch — continuous ring buffer
java -XX:StartFlightRecording=duration=60s,filename=recording.jfr \
     -jar myapp.jar

# Always-on with max-age ring buffer (production)
java -XX:StartFlightRecording=disk=true,maxage=12h,maxsize=1g,\
dumponexit=true,filename=/var/log/jfr/app.jfr \
     -jar myapp.jar

Bash

# Start recording on a running JVM
jcmd <pid> JFR.start name=profile duration=60s filename=profile.jfr

# Dump current ring buffer without stopping
jcmd <pid> JFR.dump name=profile filename=snapshot.jfr

# Stop recording
jcmd <pid> JFR.stop name=profile

# Check active recordings
jcmd <pid> JFR.check

Java

import jdk.jfr.*;

// Start a recording programmatically
try (Recording recording = new Recording()) {
    recording.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
    recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
    recording.enable("jdk.ObjectAllocationInNewTLAB");
    recording.setDestination(Path.of("app-profile.jfr"));
    recording.start();

    // ... application runs ...
    Thread.sleep(Duration.ofMinutes(1));
}

Custom JFR Events

Java

import jdk.jfr.*;

@Name("com.myapp.OrderProcessed")
@Label("Order Processed")
@Category({"Application", "Orders"})
@StackTrace(false)
public class OrderEvent extends Event {
    @Label("Order ID")
    public String orderId;

    @Label("Processing Time (ms)")
    @Timespan(Timespan.MILLISECONDS)
    public long processingTime;

    @Label("Item Count")
    public int itemCount;
}

// Usage in application code
OrderEvent event = new OrderEvent();
event.begin();
// ... process order ...
event.orderId = order.getId();
event.processingTime = elapsed;
event.itemCount = order.getItems().size();
event.commit();  // recorded only if event is enabled

JFR Event Streaming (Java 14+)

Java

import jdk.jfr.consumer.*;

// Stream events in real-time without writing to disk
try (RecordingStream rs = new RecordingStream()) {
    rs.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
    rs.enable("jdk.GarbageCollection");

    rs.onEvent("jdk.CPULoad", event -> {
        float machineTotal = event.getFloat("machineTotal");
        if (machineTotal > 0.8) {
            System.out.printf("HIGH CPU: %.1f%%%n", machineTotal * 100);
        }
    });

    rs.onEvent("jdk.GarbageCollection", event -> {
        Duration pause = event.getDuration("longestPause");
        if (pause.toMillis() > 200) {
            System.out.printf("LONG GC PAUSE: %d ms%n", pause.toMillis());
        }
    });

    rs.startAsync();  // non-blocking
}

JFR Event Streaming Use Cases

Custom alerting: Trigger PagerDuty when allocation rate exceeds threshold
Live dashboards: Stream JFR events to Prometheus/Grafana via Micrometer
Adaptive tuning: Automatically adjust thread pool size based on contention events

Java Mission Control (JMC)

JMC is the GUI analysis tool for JFR recordings. It provides rich visualizations for identifying performance bottlenecks.

Key Analysis Views

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph LR
    JFR[.jfr Recording] --> JMC[Java Mission\nControl]
    JMC --> FG[Flame Graph]
    JMC --> HM[Hot Methods]
    JMC --> ALLOC[Allocation\nSites]
    JMC --> GCV[GC Analysis]
    JMC --> THR[Thread\nAnalysis]
    JMC --> IO[I/O\nTimeline]

    style JFR fill:#FEF3C7,stroke:#FCD34D,color:#92400E
    style JMC fill:#DBEAFE,stroke:#93C5FD,color:#1E40AF
    style FG fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style HM fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style ALLOC fill:#FEF3C7,stroke:#FCD34D,color:#92400E
    style GCV fill:#ECFDF5,stroke:#6EE7B7,color:#065F46
    style THR fill:#FEE2E2,stroke:#FCA5A5,color:#991B1B
    style IO fill:#EFF6FF,stroke:#93C5FD,color:#1E40AF

View	What It Shows	Look For
Flame Graph	Call stack depth + sample counts	Wide plateaus = CPU hogs
Hot Methods	Top methods by self-time	Methods consuming >5% CPU
Allocation	Object allocation by type & site	High allocation rate = GC pressure
GC	Pause durations, causes, generations	Long pauses, frequent full GCs
Threads	Thread states over time	Blocked/waiting threads
I/O	File/socket reads/writes timeline	High latency I/O calls

Flame Graph Interpretation

Text Only

┌──────────────────────────────────────────────────────────────────┐
│                         main()                                    │
├───────────────────────────────┬──────────────────────────────────┤
│      processOrders()          │       handleMetrics()            │
├──────────────────┬────────────┤                                  │
│  validateOrder() │ saveOrder()│                                  │
├──────────────────┤            │                                  │
│ REGEX MATCH (80%)│            │                                  │
└──────────────────┴────────────┴──────────────────────────────────┘
  ← wider = more CPU time →

Reading Flame Graphs

X-axis: Proportion of samples (NOT time order)
Y-axis: Call stack depth (bottom = entry point)
Wide bars at top: Self-time hogs (optimize these first)
Narrow deep stacks: Deep call chains but little CPU each

VisualVM

VisualVM is a free, lightweight profiling tool that combines JConsole, jstack, jmap, and jstat in a single GUI. Good for development and staging but has higher overhead than JFR.

Capabilities

Feature	Description	Overhead
CPU Sampling	Periodic stack sampling (configurable rate)	Low-medium
CPU Instrumentation	Bytecode injection for exact call counts	High (2-10x slowdown)
Memory Sampling	Track allocations by class	Medium
Heap Dump	Full snapshot of live objects	Pause (seconds for large heaps)
Thread Dump	Snapshot of all thread stacks	Instant
GC Monitoring	Real-time heap usage & GC activity	Negligible
MBean Browser	JMX MBean inspection	Negligible

Usage Patterns

Bash

# Launch VisualVM (ships with JDK or download standalone)
jvisualvm

# Connect to remote JVM via JMX
# (target JVM needs these flags)
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9010 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar myapp.jar

When to Use VisualVM

Development: Quick CPU/memory profiling during coding
Heap dump analysis: Finding memory leaks (retained size, GC roots)
Thread analysis: Diagnosing deadlocks and contention
Remote monitoring: Watching JMX metrics on staging servers

VisualVM Limitations

CPU instrumentation causes significant overhead (not for production)
Sampling suffers from safe-point bias (only samples at safe points)
Cannot profile native methods or JIT-compiled code accurately
Not recommended for production use — prefer JFR or async-profiler

async-profiler

async-profiler is a low-overhead sampling profiler that does NOT suffer from safe-point bias. It uses Linux perf_events and macOS dtrace to collect stack traces at any point in execution, not just at JVM safe points.

Why async-profiler Over JMC/VisualVM

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph LR
    subgraph Safe-Point Bias
        direction LR
        SP1[JVM pauses all\nthreads at safe point] --> SP2[Samples only at\nsafe points]
        SP2 --> SP3[Misses hot code\nbetween safe points]
    end
    subgraph async-profiler
        direction LR
        AP1[OS signal\ninterrupts thread] --> AP2[Samples at ANY\ninstruction]
        AP2 --> AP3[Accurate picture\nof all hot code]
    end

    style SP1 fill:#FEE2E2,stroke:#FCA5A5,color:#991B1B
    style SP2 fill:#FEE2E2,stroke:#FCA5A5,color:#991B1B
    style SP3 fill:#FEE2E2,stroke:#FCA5A5,color:#991B1B
    style AP1 fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style AP2 fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style AP3 fill:#D1FAE5,stroke:#6EE7B7,color:#065F46

Profiling Modes

Mode	Flag	What It Captures
CPU	`-e cpu`	On-CPU execution samples
Wall-clock	`-e wall`	All threads (including blocked/sleeping)
Allocation	`-e alloc`	Object allocation by size and site
Lock	`-e lock`	Lock contention (monitor enter)
Cache misses	`-e cache-misses`	CPU cache miss events

Usage

Bash

# Download and extract
wget https://github.com/async-profiler/async-profiler/releases/latest/download/async-profiler-linux-x64.tar.gz
tar xzf async-profiler-linux-x64.tar.gz

# CPU profiling for 30 seconds — output flame graph HTML
./asprof -d 30 -f flamegraph.html <pid>

# Wall-clock profiling (includes sleeping/blocked threads)
./asprof -e wall -d 30 -f wall-flamegraph.html <pid>

# Allocation profiling — find allocation hot spots
./asprof -e alloc -d 60 -f alloc-flamegraph.html <pid>

# Lock contention profiling
./asprof -e lock -d 30 -f lock-flamegraph.html <pid>

# Start/stop mode (profile a specific operation)
./asprof start -e cpu <pid>
# ... trigger the operation ...
./asprof stop -f result.html <pid>

# Profile from JVM startup (as a Java agent)
java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=startup.html \
     -jar myapp.jar

Flame Graph Output

async-profiler generates interactive HTML flame graphs directly — no external tools needed.

async-profiler Best Practices

Use -e wall for latency issues (includes time waiting on I/O, locks, sleep)
Use -e cpu for throughput issues (only on-CPU time)
Use -e alloc when GC is the bottleneck (find allocation-heavy code)
Combine with --jfrsync to correlate with JFR events
Safe for production: typical overhead <2%

CLI Tools Quick Reference

jcmd — Swiss Army Knife

Bash

# List all Java processes
jcmd -l

# Thread dump (better than jstack — no ptrace needed)
jcmd <pid> Thread.print

# Heap dump
jcmd <pid> GC.heap_dump /tmp/heap.hprof

# Heap histogram (top memory consumers)
jcmd <pid> GC.class_histogram | head -20

# Force garbage collection
jcmd <pid> GC.run

# Get VM flags
jcmd <pid> VM.flags

# Get system properties
jcmd <pid> VM.system_properties

# JFR operations (see JFR section above)
jcmd <pid> JFR.start name=rec duration=60s filename=out.jfr

jstack — Thread Dumps

Bash

# Basic thread dump
jstack <pid>

# Force thread dump (even if JVM is hung)
jstack -F <pid>

# Include locks info
jstack -l <pid>

# Detect deadlocks
jstack <pid> | grep -A 5 "deadlock"

jmap — Memory Inspection

Bash

# Heap summary
jmap -heap <pid>

# Heap histogram (live objects only)
jmap -histo:live <pid> | head -30

# Generate heap dump
jmap -dump:live,format=b,file=heap.hprof <pid>

# Class loader stats
jmap -clstats <pid>

jstat — GC Statistics

Bash

# GC summary every 1 second, 10 samples
jstat -gcutil <pid> 1000 10

# Output:
#   S0     S1     E      O      M     CCS    YGC  YGCT   FGC  FGCT   CGC  CGCT    GCT
#  0.00  98.12  45.67  72.34  95.21  92.14   234  1.45     3  0.89    12  0.12    2.46

# GC capacity
jstat -gccapacity <pid> 1000 5

# New generation stats
jstat -gcnew <pid> 1000

# Compilation stats
jstat -compiler <pid>

Column	Meaning
`S0`, `S1`	Survivor space 0/1 utilization (%)
`E`	Eden space utilization (%)
`O`	Old generation utilization (%)
`M`	Metaspace utilization (%)
`YGC` / `YGCT`	Young GC count / total time (s)
`FGC` / `FGCT`	Full GC count / total time (s)

When to Use Which Tool

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph LR
    Q1{Production\nenvironment?}
    Q1 -->|Yes| Q2{Need always-on?}
    Q1 -->|No| Q3{Deep analysis\nneeded?}
    Q2 -->|Yes| JFR[JFR + JMC]
    Q2 -->|No| AP[async-profiler]
    Q3 -->|Yes| VVM[VisualVM]
    Q3 -->|No| CLI[jcmd / jstat]

    style Q1 fill:#DBEAFE,stroke:#93C5FD,color:#1E40AF
    style Q2 fill:#EFF6FF,stroke:#93C5FD,color:#1E40AF
    style Q3 fill:#FFFBEB,stroke:#FCD34D,color:#92400E
    style JFR fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style AP fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style VVM fill:#FEF3C7,stroke:#FCD34D,color:#92400E
    style CLI fill:#FEE2E2,stroke:#FCA5A5,color:#991B1B

Scenario	Tool	Reason
Production CPU spike	async-profiler	Low overhead, no safe-point bias, instant flame graph
Always-on monitoring	JFR (continuous)	<1% overhead, ring buffer, event streaming
Memory leak investigation	jcmd heap dump + JMC	Full object graph, GC root analysis
GC tuning	JFR + jstat	Detailed GC event data + real-time stats
Deadlock detection	jstack / jcmd Thread.print	Instant thread state snapshot
Development profiling	VisualVM	Easy GUI, no setup, good enough for dev
Allocation profiling	async-profiler `-e alloc`	Shows exact allocation sites without bias
Lock contention	async-profiler `-e lock`	Identifies contested monitors
Startup performance	async-profiler (agent mode)	Profile from JVM bootstrap
Latency debugging	async-profiler `-e wall`	Includes waiting/blocked time

Production Profiling Best Practices

Golden Rules

Always-on JFR — Enable JFR with ring buffer on every production JVM. Zero cost until you need it.
Keep overhead < 2% — Never use instrumentation-based profiling in production.
Profile under load — A profiler on an idle system tells you nothing useful.
Compare baselines — Save recordings from "healthy" state to diff against incidents.
Automate collection — Trigger JFR dumps automatically on high CPU or GC alerts.

Continuous Profiling Architecture

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph LR
    JVM1[JVM Pod 1\nJFR always-on] --> AGENT[Profiling Agent]
    JVM2[JVM Pod 2\nJFR always-on] --> AGENT
    JVM3[JVM Pod 3\nJFR always-on] --> AGENT
    AGENT --> STORE[Profile Store\nPyroscope/Datadog]
    STORE --> DASH[Dashboard\nFlame Graph Diff]
    DASH --> ALERT[Alert on\nRegression]

    style JVM1 fill:#DBEAFE,stroke:#93C5FD,color:#1E40AF
    style JVM2 fill:#DBEAFE,stroke:#93C5FD,color:#1E40AF
    style JVM3 fill:#DBEAFE,stroke:#93C5FD,color:#1E40AF
    style AGENT fill:#D1FAE5,stroke:#6EE7B7,color:#065F46
    style STORE fill:#FEF3C7,stroke:#FCD34D,color:#92400E
    style DASH fill:#ECFDF5,stroke:#6EE7B7,color:#065F46
    style ALERT fill:#FEE2E2,stroke:#FCA5A5,color:#991B1B

Production-Ready JFR Configuration

Bash

# Always-on JFR for every production JVM
java -XX:StartFlightRecording=disk=true,\
maxage=6h,maxsize=500m,\
dumponexit=true,\
filename=/var/log/jfr/app.jfr,\
settings=profile \
     -XX:FlightRecorderOptions=stackdepth=256 \
     -jar myapp.jar

Incident Response Playbook

Step	Command	Purpose
1	`jcmd <pid> JFR.dump filename=incident.jfr`	Capture current state
2	`jcmd <pid> Thread.print`	Thread dump for deadlocks
3	`jstat -gcutil <pid> 1000 5`	Check GC pressure
4	`jcmd <pid> GC.class_histogram`	Top memory consumers
5	`./asprof -d 30 -f cpu.html <pid>`	CPU flame graph
6	`./asprof -e wall -d 30 -f wall.html <pid>`	Full latency picture

Quick Recall

Concept	Key Point
JFR	Built-in, <1% overhead, always-on in production, event-based
JMC	GUI for analyzing .jfr files — flame graphs, allocations, GC
VisualVM	Dev-time profiler, easy GUI, suffers from safe-point bias
async-profiler	No safe-point bias, uses OS perf_events, production-safe
Safe-point bias	JVM can only sample at safe points — misses hot code between them
Wall-clock	Profiles ALL time (CPU + waiting) — use for latency issues
CPU profiling	Profiles only on-CPU time — use for throughput issues
jcmd	Preferred over jstack/jmap — no ptrace, same JVM user
Continuous profiling	Always-on JFR + agent sends to Pyroscope/Datadog
Flame graph	X = sample proportion (not time), Y = stack depth, wide top bars = targets

Interview Template

Sample Question: How would you diagnose a production latency spike?

1. State the approach: "I would follow a structured incident profiling workflow: observe, hypothesize, profile, validate."

2. Immediate triage:

Bash

# Check if it's CPU, GC, or I/O
jstat -gcutil <pid> 1000 5       # GC pressure?
jcmd <pid> Thread.print           # Threads blocked?
jcmd <pid> JFR.dump filename=incident.jfr  # Capture state

3. Deep profiling: "If CPU is high, I attach async-profiler with -e cpu for a flame graph. If latency is high but CPU is low, I use -e wall to capture blocked/waiting time. For memory issues, I dump the heap and analyze in JMC."

4. Key differentiators to mention: - async-profiler has no safe-point bias (unlike VisualVM/JMC sampling) - JFR is always-on in production with <1% overhead - Wall-clock profiling reveals I/O and lock contention that CPU profiling misses - Continuous profiling (Pyroscope/Datadog) enables diffing healthy vs. degraded states

5. Resolution pattern: "Once I identify the hot method or contention point, I fix the root cause (pre-compile regex, add caching, reduce lock scope, batch I/O) and validate with a before/after flame graph comparison."

Java Profiling Tools

Profiling Categories

Java Flight Recorder (JFR)

Architecture

JFR Event Categories

Enabling JFR

Custom JFR Events

JFR Event Streaming (Java 14+)

Java Mission Control (JMC)

Key Analysis Views

Flame Graph Interpretation

VisualVM

Capabilities

Usage Patterns

When to Use VisualVM

async-profiler

Why async-profiler Over JMC/VisualVM

Profiling Modes

Usage

Flame Graph Output

CLI Tools Quick Reference

jcmd — Swiss Army Knife

jstack — Thread Dumps

jmap — Memory Inspection

jstat — GC Statistics

When to Use Which Tool

Production Profiling Best Practices

Golden Rules

Continuous Profiling Architecture

Production-Ready JFR Configuration

Incident Response Playbook

Quick Recall

Interview Template

5-Minute System Design — Weekly