A practical, step‑by‑step playbook to go from “it’s slow” to a confident fix—built for modern .NET 8 services.
Quick Takeaways
- Start with a baseline (clear repro, SLIs, and a stable test) before touching code.
- Use dotnet-counters to triage, then dotnet-trace/speedscope for CPU and dotnet-gcdump/dotnet-dump for memory.
- Watch allocation rate & GC first; in many services the real bottleneck is allocation pressure, not raw CPU.
- Actionable alerts & SLOs keep you honest; error budgets guide when to ship vs. stabilize.
- Validate fixes with the same SLIs and load profile; add guards to prevent regressions.
End‑to‑End Workflow
Move from observation to action with a reliable sequence. Each step narrows the search space and protects production.
- Confirm impact: Which SLI regressed (latency, error rate, saturation)? Why: Ensures you solve what users feel.
- Baseline: Reproduce under controlled load; record CPU%, allocation rate, GC, p95. Why: You need a “before” to validate an “after.”
- Triage: Use dotnet-counters to see hotspots (CPU, GC time, threadpool). Why: Quick signal without heavy overhead.
- Deep dive:
- CPU: dotnet-trace + speedscope or PerfView.
- Allocations: dotnet-gcdump.
- Leaks: dotnet-dump + SOS commands.
- Fix & verify: Apply minimal change(s); re‑run the same baseline; watch SLIs. Why: Detect overfitting and regressions.
- Prevent regressions: Add tests, alerts, and dashboards. Why: Turn the incident into guardrails.
Observability Foundation (metrics, logs, traces)
Invest here first so you can diagnose with confidence:
- Metrics (fast signals): request rate, latency p50/p95/p99, error %, saturation (CPU, memory, threadpool queue), GC % time, allocation rate.
- Logs (context): structure them (JSON), add correlation IDs, avoid high‑cardinality spam.
- Traces (causal path): distributed trace from ingress through dependencies; tag with endpoint, tenant, DB/HTTP spans, and payload sizes.
Why: Metrics tell you that something is wrong, traces tell you where, and logs tell you why.
Dashboards (service health, runtime, dependencies)
Build three layers; link panels to drill down.
- Service health: RPS, p50/p95/p99 latency, error rate, timeouts, saturation.
- Runtime: CPU, working set, GC % time, gen0/1/2, LOH size, allocation rate, thread count, threadpool queue length, exceptions/sec.
- Dependencies: per‑DB and per‑HTTP service p95, error %; connection pool usage; retry counts.
Why: Quickly isolate “is it us or a dependency?” and then “is it CPU, memory, or I/O?”
Alerting Strategy (actionable alerts)
- Alert on SLO burn, not raw metrics. Example: p95 > 300 ms for 10 minutes and consuming > 2% error budget/hr.
- Symptom + supporting signals: pair latency alerts with CPU > 80% or GC% time > 30% to avoid noise.
- Runbooks: Each alert links to a dashboard & a first‑response checklist.
- Dedup & silence: short quiet windows to prevent paging storms.
Why: Actionable alerts reduce fatigue and lead directly to the fix path.
SLIs, SLOs & Error Budgets
- SLIs: Availability, success rate, p95 latency, saturation (CPU, memory), dependency p95.
- SLOs: e.g., 99.9% success rate, p95 latency < 250 ms over a rolling 30‑day window.
- Error budget: 0.1% of minutes may be bad; if burn rate is high, freeze feature work and improve reliability.
Why: Aligns engineering effort to user impact, prioritizing fixes over new features when needed.
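As a rough sketch of the error‑budget arithmetic (the 99.9% target and the hourly error rate below are illustrative assumptions, not prescriptions):
// Burn-rate sketch for a 99.9% success SLO; alert when the budget burns too fast.
double slo = 0.999;
double errorBudget = 1 - slo;                      // 0.1% of requests may fail
double observedErrorRate = 0.004;                  // hypothetical: 0.4% errors over the last hour
double burnRate = observedErrorRate / errorBudget; // 4x: a month of budget gone in roughly a week
Console.WriteLine($"Burn rate: {burnRate:F1}x (investigate when sustained above ~2x)");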
Load Testing & Capacity Planning
- Define target profile: normal, peak, and stress RPS; payload sizes; mix of endpoints.
- Ramp tests: slowly increase RPS; watch p95/p99, CPU, GC% time, allocation rate, and errors.
- Soak tests: multi‑hour run to expose leaks, fragmentation, or connection churn.
- Capacity model: pick a saturation threshold (e.g., CPU ≤ 70%, GC% time ≤ 10%) and compute headroom.
- Autoscale policies: scale out on RPS per instance or latency; scale in conservatively.
Why: Validates that fixes hold under real‑world conditions—not just microbenchmarks.
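A back‑of‑the‑envelope capacity model, assuming you have measured how much throughput one instance sustains at the chosen ceiling (all numbers below are placeholders):
// Capacity sketch: instances needed to keep each node under the saturation threshold at peak.
double peakRps = 3000;                // assumed peak load
double rpsPerInstanceAtCeiling = 400; // measured RPS of one instance at ~70% CPU (placeholder)
int instances = (int)Math.Ceiling(peakRps / rpsPerInstanceAtCeiling);
double headroomPct = 100.0 * (instances * rpsPerInstanceAtCeiling - peakRps) / peakRps;
Console.WriteLine($"Instances: {instances}, headroom at peak: {headroomPct:F0}%");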
Cold Starts & Startup Profiling
- Measure: log time to process start → app ready; trace DI container build and EF model init.
- Optimize: enable ReadyToRun (R2R), trim unused code, lazy‑init heavy singletons, pre‑warm caches.
- NativeAOT (optional): for minimal APIs & tools, NativeAOT can reduce startup time and memory footprint.
Why: Faster cold starts reduce time to recover and improve scale‑out latency.
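A minimal way to log the process‑start‑to‑ready interval, sketched against a standard ASP.NET Core host (the log message shape is an assumption):
// Program.cs fragment: measure cold start as "process start -> ApplicationStarted".
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();
app.Lifetime.ApplicationStarted.Register(() =>
{
    var startup = DateTime.UtcNow - System.Diagnostics.Process.GetCurrentProcess().StartTime.ToUniversalTime();
    app.Logger.LogInformation("Startup took {StartupMs:F0} ms", startup.TotalMilliseconds);
});
app.Run();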
Step 1: Baseline the Problem
Goal: Create a repeatable scenario with clear success criteria.
- Pick 1–3 critical endpoints; define input sizes and headers; fix RPS patterns.
- Warm up for 2–5 min to stabilize tiered JIT and caches.
- Collect: RPS, p50/95/99 latency, CPU%, GC% time, allocation rate (MB/s), exceptions/sec, thread count.
Why: Without a trustworthy baseline, you can’t validate improvements.
Step 2: CPU Bottlenecks
Signals: high CPU%, low GC% time, high p95 latency. Often due to inefficient algorithms, heavy JSON, or excessive LINQ.
- Triage with dotnet-counters: confirm CPU is the limiting factor.
- Profile with dotnet-trace and open in speedscope.
- Fixes: reduce allocations in hot paths, use System.Text.Json source‑gen (see the sketch after this step), cache expensive results, micro‑optimize critical methods.
Why: Flame graphs reveal where time is truly spent, not where you suspect.
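For the JSON piece, a minimal System.Text.Json source‑generation sketch (OrderDto and AppJsonContext are hypothetical names):
using System.Text.Json;
using System.Text.Json.Serialization;

public record OrderDto(int Id, string Customer, decimal Total);

// Source-generated serializer metadata: avoids reflection and trims allocations on the hot path.
[JsonSerializable(typeof(OrderDto))]
public partial class AppJsonContext : JsonSerializerContext { }

public static class JsonHotPath
{
    public static string Serialize(OrderDto order) =>
        JsonSerializer.Serialize(order, AppJsonContext.Default.OrderDto);
}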
Step 3: Allocation Pressure
Signals: high allocation rate, elevated GC% time, frequent gen0/1; latency spikes when GC triggers.
- Use dotnet-counters to watch Allocation Rate, GC Heap Size, GC% Time.
- Capture dotnet-gcdump to see top types and stacks allocating most bytes.
- Fixes: reuse buffers (ArrayPool<T>; see the sketch below), avoid boxing, use Span<T>/Memory<T>, pool serializers, prefer StringBuilder over concatenation in loops.
Why: Reducing allocations lowers GC pressure and latency variability.
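A buffer‑reuse sketch with ArrayPool<T> (the 64 KB buffer size is an arbitrary choice for illustration):
using System.Buffers;

public static class BufferedCopy
{
    // Rent a shared buffer instead of allocating a fresh byte[] on every call.
    public static async Task CopyAsync(Stream source, Stream destination, CancellationToken ct = default)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);
        try
        {
            int read;
            while ((read = await source.ReadAsync(buffer.AsMemory(), ct)) > 0)
                await destination.WriteAsync(buffer.AsMemory(0, read), ct);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer); // always return the buffer, even on failure
        }
    }
}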
Step 4: Memory Leaks
Signals: working set climbs over time, LOH grows, full GCs increase, OOM kills.
- Detect: run a soak test; watch heap size trend.
- Capture a dump with dotnet-dump or take periodic gcdump snapshots.
- Analyze: look for static roots, event handlers not unsubscribed, caches without TTL, long‑lived timers, pinned buffers.
- Fixes: weak references for caches, IDisposable for native handles, unsubscribe event handlers (see the sketch below), bound collection sizes.
Why: Leaks hide in object graphs; root analysis surfaces the culprit holders.
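One of the most common leak shapes, sketched with hypothetical types: a long‑lived publisher roots every subscriber until the handler is removed.
public sealed class PriceFeed
{
    public event EventHandler<decimal>? PriceChanged;
    public void Publish(decimal price) => PriceChanged?.Invoke(this, price);
}

// The subscriber unhooks in Dispose so the long-lived feed cannot keep it (and its object graph) alive.
public sealed class PriceWidget : IDisposable
{
    private readonly PriceFeed _feed;
    public PriceWidget(PriceFeed feed)
    {
        _feed = feed;
        _feed.PriceChanged += OnPriceChanged;
    }
    private void OnPriceChanged(object? sender, decimal price) { /* update state */ }
    public void Dispose() => _feed.PriceChanged -= OnPriceChanged;
}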
GC Tuning & LOH Considerations
- Server GC for services (DOTNET_gcServer=1 or System.GC.Server in runtimeconfig.json): better throughput on multi‑core servers.
- Latency: set GCSettings.LatencyMode (SustainedLowLatency during critical windows) judiciously; see the sketch below.
- LOH (objects > ~85 KB): minimize large‑object churn; chunk big buffers; reuse via pools.
- Heap limits in containers: rely on cgroups; optionally set DOTNET_GCHeapHardLimit to cap the heap.
Why: Balanced GC settings reduce pause times without starving throughput.
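A hedged sketch of scoping the latency mode to a critical window; whether it helps is workload‑dependent, so measure before and after:
using System.Runtime;

// Request fewer blocking collections during a latency-critical window, then restore the previous mode.
var previous = GCSettings.LatencyMode;
GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
try
{
    // ... latency-critical work, e.g. serving a traffic burst ...
}
finally
{
    GCSettings.LatencyMode = previous;
}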
ThreadPool & Sync‑over‑Async
- Watch: threadpool queue length, worker threads, Requests in Application Queue.
- Avoid: .Result/.Wait() on async work; it causes deadlocks and thread starvation (see the sketch below).
- Set minimums: ThreadPool.SetMinThreads(worker, io) for bursty loads, but only after measuring.
- Use async end‑to‑end across the call graph; use ConfigureAwait(false) in library code.
Why: Starved threadpools manifest as high latency with low CPU—a classic trap.
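The trap and the fix side by side (the endpoint URL is a placeholder):
public class ReportService
{
    private readonly HttpClient _client;
    public ReportService(HttpClient client) => _client = client;

    // Bad: blocks a threadpool thread while the async call runs; under load this starves the pool.
    public string GetReportBlocking() =>
        _client.GetStringAsync("https://reports.example.com/latest").Result;

    // Good: async end-to-end; the thread is released while the request is in flight.
    public Task<string> GetReportAsync() =>
        _client.GetStringAsync("https://reports.example.com/latest");
}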
I/O and External Dependencies
- HTTP: use IHttpClientFactory, set timeouts, enable HTTP/2 where possible; monitor connection pools (see the registration sketch below).
- gRPC: choose unary vs. streaming appropriately; tune max streams per connection.
- Retries: apply jittered backoff and timeouts (avoid retry storms on brownouts).
- Bulkheads & circuit breakers: confine blast radius when a dependency degrades.
Why: Many “CPU problems” are actually waiting on slow I/O or congested pools.
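A registration sketch with IHttpClientFactory; the client name, base address, timeout, and handler lifetime below are assumptions to adapt:
// Program.cs fragment: named client with an explicit timeout and a bounded handler lifetime.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHttpClient("inventory", client =>
{
    client.BaseAddress = new Uri("https://inventory.internal");
    client.Timeout = TimeSpan.FromSeconds(5); // fail fast instead of queueing behind a slow dependency
})
.SetHandlerLifetime(TimeSpan.FromMinutes(5)); // recycle handlers so DNS changes are picked up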
Database Query Analysis
- Surface slow queries via tracing and per‑DB dashboards; tag with endpoint and tenant.
- EF Core: log generated SQL, use AsNoTracking() and .TagWith(), avoid N+1 queries, prefer projections (Select); see the sketch below.
- Indexes: validate via EXPLAIN/EXPLAIN ANALYZE or execution plans; ensure selectivity.
- Pooling: watch pool usage, timeouts, and waits; right‑size connection pools.
Why: Query shape and indexes dominate latency at scale.
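A query‑shape sketch in EF Core; the Order entity and AppDbContext below are hypothetical stand‑ins:
using Microsoft.EntityFrameworkCore;

public class Order { public int Id { get; set; } public decimal Total { get; set; } public DateTime CreatedAt { get; set; } }
public class AppDbContext : DbContext { public DbSet<Order> Orders => Set<Order>(); }
public record OrderSummary(int Id, decimal Total);

public static class OrderQueries
{
    // Read-only query: no change tracking, tagged for tracing, projected so full entities are never loaded.
    public static Task<List<OrderSummary>> RecentOrdersAsync(AppDbContext db, CancellationToken ct) =>
        db.Orders
          .AsNoTracking()
          .TagWith("OrderQueries.RecentOrdersAsync")
          .Where(o => o.CreatedAt >= DateTime.UtcNow.AddDays(-7))
          .Select(o => new OrderSummary(o.Id, o.Total))
          .ToListAsync(ct);
}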
Caching Strategy
- What to cache: pure computations, stable lookups, rendered fragments, tokens/claims.
- Where: IMemoryCache for per‑node hot data; IDistributedCache/Redis for shared state (sketched below).
- Policy: TTLs, size limits, eviction, stampede protection (single‑flight), cache versioning.
- Validation: expose cache hit rate and latency; treat cache misses as first‑class metrics.
Why: Effective caching can be the single largest throughput multiplier.
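A per‑node caching sketch with IMemoryCache; the size limit, TTL, and loader delegate are placeholders:
using Microsoft.Extensions.Caching.Memory;

public class RateCache
{
    private readonly IMemoryCache _cache = new MemoryCache(new MemoryCacheOptions { SizeLimit = 10_000 });

    // Cache a computed value with an absolute TTL; each entry counts as Size = 1 toward the limit.
    public Task<decimal> GetRateAsync(string currency, Func<string, Task<decimal>> loadRate) =>
        _cache.GetOrCreateAsync(currency, entry =>
        {
            entry.SetSize(1);
            entry.SetAbsoluteExpiration(TimeSpan.FromMinutes(5));
            return loadRate(currency);
        });
}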
Containers & Runtime Limits
- CPU limits: .NET respects cgroups; choose requests for baseline and limits to prevent noisy neighbors.
- Memory limits: monitor working set vs. limit; avoid close‑to‑cap thrash; consider DOTNET_GCHeapHardLimit with care.
- GC threads: ensure enough cores; server GC scales with CPU count.
- Startup: trim images, pre‑pull, and use R2R to minimize cold start in orchestrators.
Why: Right‑sized limits prevent OOM kills and CPU throttling that masquerade as app bugs.
Production‑Safe Diagnostics
- dotnet-counters & EventCounters: low overhead, safe to run during incidents.
- dotnet-trace (short windows): sample CPU; keep collections brief to limit overhead.
- dotnet-gcdump: heap stats with minimal disruption.
- dotnet-dump: targeted dumps on cgroup pressure or OOM signals.
- dotnet-monitor sidecar: on‑demand traces, dumps, metrics, and collection rules.
Why: Safe tools let you learn from production without making it worse.
Validate Fixes & Prevent Regression
- Re‑run the baseline with identical load; compare SLIs and counters.
- Canary the change (5–10% traffic) and watch SLO burn & dependency p95.
- Add tests: microbenchmarks (BenchmarkDotNet), load tests in CI, and synthetic probes.
- Guardrails: alerts on allocation rate spikes, threadpool queue growth, and error rates.
Why: The best fix includes proof and protection.
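A minimal BenchmarkDotNet guard for an optimized hot path; the two string‑building methods are illustrative stand‑ins for your before/after implementations:
using System.Linq;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // report allocations alongside timings
public class JoinBenchmarks
{
    private readonly string[] _parts = Enumerable.Range(0, 100).Select(i => i.ToString()).ToArray();

    [Benchmark(Baseline = true)]
    public string Concat()
    {
        var s = "";
        foreach (var p in _parts) s += p;
        return s;
    }

    [Benchmark]
    public string Builder()
    {
        var sb = new StringBuilder();
        foreach (var p in _parts) sb.Append(p);
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<JoinBenchmarks>();
}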
Tooling Map
| Symptom | Primary Tool | Overhead | What You Get | When to Use |
|---|---|---|---|---|
| High latency, unknown cause | dotnet-counters | Very low | CPU, GC% time, allocation rate, threads | First triage |
| Suspected CPU hot path | dotnet-trace + speedscope | Low–moderate (short runs) | Flame graph of hot stacks | Identify heavy methods |
| High allocations/GC churn | dotnet-gcdump | Low | Top allocating types & call sites | Reduce allocation pressure |
| Leak/OOM investigation | dotnet-dump | Moderate (point‑in‑time) | Object graph, roots, LOH usage | Find leaks & roots |
| Prod capture automation | dotnet-monitor | Low | Collection rules, export traces/dumps | Safe production diagnostics |
OpenTelemetry Setup Snippet
Add metrics & tracing to illuminate bottlenecks end‑to‑end.
// Program.cs (.NET 8)
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOpenTelemetry()
.ConfigureResource(rb => rb.AddService(
serviceName: "MyService",
serviceVersion: "1.0.0"))
.WithMetrics(mb => mb
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddRuntimeInstrumentation()
.AddProcessInstrumentation()
.AddMeter("Microsoft.AspNetCore.Hosting", "System.Net.Http", "System.Runtime")
.AddOtlpExporter())
.WithTracing(tb => tb
.AddAspNetCoreInstrumentation(o => { o.RecordException = true; o.EnrichWithHttpRequest = (activity, request) => { /* add tags */ }; })
.AddHttpClientInstrumentation()
.AddSqlClientInstrumentation(o => { o.SetDbStatementForText = true; o.RecordException = true; })
.AddOtlpExporter());
var app = builder.Build();
app.MapGet("/health", () => Results.Ok(new { status = "ok" }));
app.Run();
// Environment (deployment)
// OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector:4317
// OTEL_RESOURCE_ATTRIBUTES=service.name=MyService,service.version=1.0.0,service.instance.id=$(HOSTNAME)
Why: Standard telemetry makes your diagnostics repeatable and tool‑agnostic.
CLI Tooling Snippets
dotnet-counters
Triage runtime health in real time.
# List processes
dotnet-counters ps
# Monitor key providers
dotnet-counters monitor -p <pid> \
System.Runtime Microsoft.AspNetCore.Hosting System.Net.Http
# Example: CSV collection for 2 minutes
dotnet-counters collect -p <pid> --duration 00:02:00 -o counters.csv
dotnet-trace
Capture CPU profiles and open in speedscope.
# List processes
dotnet-trace ps
# CPU profile; save as speedscope JSON
dotnet-trace collect -p <pid> --profile cpu-sampling --format speedscope -o trace.speedscope.json
# Open trace.speedscope.json in https://www.speedscope.app
dotnet-gcdump
Analyze allocation patterns and heap composition.
# Collect a GC dump
dotnet-gcdump collect -p <pid> -o heap.gcdump
# Report heap contents (top types by size); open the .gcdump in Visual Studio or PerfView for call stacks
dotnet-gcdump report heap.gcdump
dotnet-dump
Investigate leaks and OOMs via object graphs & roots.
# Capture a memory dump
dotnet-dump collect -p <pid> -o crash.dmp
# Interactive analysis
dotnet-dump analyze crash.dmp
# Inside the analyzer (SOS):
# > clrstack
# > dumpheap -stat
# > dumpheap -type MyType
# > gcroot <object-address>
Optional: Startup attach via diagnostic port (containers)
# In container spec
ENV DOTNET_DiagnosticPorts=/diag/port,suspend
# Then attach from sidecar/host:
dotnet-trace collect --diagnostic-port /diag/port --profile cpu-sampling --format speedscope -o startup.speedscope.json