Deep‑Dive Guide: Identifying Performance Bottlenecks & Memory Leaks in .NET 8

A practical, step‑by‑step playbook to go from “it’s slow” to a confident fix—built for modern .NET 8 services.

Quick Takeaways

  • Start with a baseline (clear repro, SLIs, and a stable test) before touching code.
  • Use dotnet-counters to triage, then dotnet-trace/speedscope for CPU and dotnet-gcdump/dotnet-dump for memory.
  • Watch allocation rate & GC first—most perf issues are allocation pressure, not raw CPU.
  • Actionable alerts & SLOs keep you honest; error budgets guide when to ship vs. stabilize.
  • Validate fixes with the same SLIs and load profile; add guards to prevent regressions.
[Workflow diagram] End-to-end performance diagnostics workflow: Baseline (SLIs & repro) → Triage (dotnet-counters) → Deep dive (trace/gcdump/dump) → Fix (minimal change) → Validate & guard (re-measure & alerts).

End‑to‑End Workflow

Move from observation to action with a reliable sequence. Each step narrows the search space and protects production.

  1. Confirm impact: Which SLI regressed (latency, error rate, saturation)? Why: Ensures you solve what users feel.
  2. Baseline: Reproduce under controlled load; record CPU%, allocation rate, GC, p95. Why: You need a “before” to validate an “after.”
  3. Triage: Use dotnet-counters to see hotspots (CPU, GC time, threadpool). Why: Quick signal without heavy overhead.
  4. Deep dive:
    • CPU: dotnet-trace + speedscope or PerfView.
    • Allocations: dotnet-gcdump.
    • Leaks: dotnet-dump + SOS commands.
    Why: Choose the right lens for the suspected bottleneck.
  5. Fix & verify: Apply minimal change(s); re‑run the same baseline; watch SLIs. Why: Detect overfitting and regressions.
  6. Prevent regressions: Add tests, alerts, and dashboards. Why: Turn the incident into guardrails.

Observability Foundation (metrics, logs, traces)

Invest here first so you can diagnose with confidence:

  • Metrics (fast signals): request rate, latency p50/p95/p99, error %, saturation (CPU, memory, threadpool queue), GC % time, allocation rate.
  • Logs (context): structure them (JSON), add correlation IDs, avoid high‑cardinality spam.
  • Traces (causal path): distributed trace from ingress through dependencies; tag with endpoint, tenant, DB/HTTP spans, and payload sizes.

Why: Metrics tell you that something's wrong, traces tell you where, and logs tell you why.
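
To make the logging bullet concrete, here is a minimal sketch of structured logs with a correlation ID scope. The middleware shape and the X-Correlation-ID header name are assumptions for illustration, not prescriptions from this guide.

// Attach a correlation ID to every structured log entry written during a request.
// Assumes ILogger<T> from Microsoft.Extensions.Logging and a JSON console formatter with scopes enabled.
public class CorrelationIdMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<CorrelationIdMiddleware> _logger;

    public CorrelationIdMiddleware(RequestDelegate next, ILogger<CorrelationIdMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Reuse the caller's ID when present; otherwise generate one (header name is an assumption).
        var correlationId = context.Request.Headers["X-Correlation-ID"].FirstOrDefault()
                            ?? Guid.NewGuid().ToString("N");

        // BeginScope stamps the ID onto every log entry emitted inside the request.
        using (_logger.BeginScope(new Dictionary<string, object> { ["CorrelationId"] = correlationId }))
        {
            context.Response.Headers["X-Correlation-ID"] = correlationId;
            await _next(context);
        }
    }
}

// Registration in Program.cs:
// builder.Logging.AddJsonConsole(o => o.IncludeScopes = true);   // structured (JSON) logs with scopes
// app.UseMiddleware<CorrelationIdMiddleware>();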

Dashboards (service health, runtime, dependencies)

Build three layers; link panels to drill down.

  1. Service health: RPS, p50/p95/p99 latency, error rate, timeouts, saturation.
  2. Runtime: CPU, working set, GC % time, gen0/1/2, LOH size, allocation rate, thread count, threadpool queue length, exceptions/sec.
  3. Dependencies: per‑DB and per‑HTTP service p95, error %; connection pool usage; retry counts.

Why: Quickly isolate “is it us or a dependency?” and then “is it CPU, memory, or I/O?”

Alerting Strategy (actionable alerts)

  • Alert on SLO burn, not raw metrics. Example: p95 > 300 ms for 10 minutes and consuming > 2% error budget/hr.
  • Symptom + supporting signals: pair latency with CPU>80% or GC% time>30% to avoid noise.
  • Runbooks: Each alert links to a dashboard & a first‑response checklist.
  • Dedup & silence: short quiet windows to prevent paging storms.

Why: Actionable alerts reduce fatigue and lead directly to the fix path.

SLIs, SLOs & Error Budgets

  • SLIs: Availability, success rate, p95 latency, saturation (CPU, memory), dependency p95.
  • SLOs: e.g., 99.9% success, 95th < 250 ms over a rolling 30 days.
  • Error budget: 0.1% of minutes may be bad; if burn rate is high, freeze feature work and improve reliability.

Why: Aligns engineering effort to user impact, prioritizing fixes over new features when needed.
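
For intuition, a minimal sketch of the burn-rate arithmetic behind these numbers; the observed values and window are illustrative, not targets from this guide.

// Error-budget burn rate: how fast the allowed unreliability is being spent.
// A 99.9% SLO leaves a budget of 0.1% bad requests (or minutes) over the window.
double slo = 0.999;
double budget = 1 - slo;                        // 0.001

double observedErrorRate = 0.004;               // illustrative: 0.4% of requests failed in the last hour

// Burn rate = observed error rate / budget. 1.0 means spending exactly on budget.
double burnRate = observedErrorRate / budget;   // 4.0 => budget is being exhausted ~4x too fast

Console.WriteLine($"Burn rate: {burnRate:0.0}x (budget {budget:P1}, observed {observedErrorRate:P1})");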

Load Testing & Capacity Planning

  1. Define target profile: normal, peak, and stress RPS; payload sizes; mix of endpoints.
  2. Ramp tests: slowly increase RPS; watch p95/p99, CPU, GC% time, allocation rate, and errors.
  3. Soak tests: multi‑hour run to expose leaks, fragmentation, or connection churn.
  4. Capacity model: pick a saturation threshold (e.g., CPU ≤ 70%, GC% time ≤ 10%) and compute headroom.
  5. Autoscale policies: scale out on RPS per instance or latency; scale in conservatively.

Why: Validates that fixes hold under real‑world conditions—not just microbenchmarks.
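
As a starting point before reaching for a dedicated tool (k6, NBomber, etc.), here is a minimal concurrency-ramp sketch using only HttpClient; the target URL, ramp steps, and percentile math are illustrative assumptions.

// Crude ramp: increase concurrency in steps and record a rough p95 per step.
using System.Diagnostics;

var client = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };
var url = "https://localhost:5001/api/orders";            // assumption: endpoint under test

foreach (var concurrency in new[] { 10, 25, 50, 100 })    // illustrative ramp steps
{
    var latencies = new System.Collections.Concurrent.ConcurrentBag<double>();

    var tasks = Enumerable.Range(0, concurrency).Select(async _ =>
    {
        var sw = Stopwatch.StartNew();
        using var response = await client.GetAsync(url);
        sw.Stop();
        latencies.Add(sw.Elapsed.TotalMilliseconds);
    });
    await Task.WhenAll(tasks);

    var sorted = latencies.OrderBy(x => x).ToArray();
    var p95 = sorted[(int)Math.Ceiling(sorted.Length * 0.95) - 1];
    Console.WriteLine($"concurrency={concurrency} p95={p95:0.0} ms");
}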

Cold Starts & Startup Profiling

  • Measure: log the time from process start to app ready; trace DI container build and EF model init.
  • Optimize: enable ReadyToRun (R2R), trim unused code, lazy‑init heavy singletons, pre‑warm caches.
  • NativeAOT (optional): for minimal APIs & tools, NativeAOT can reduce startup and memory footprint.

Why: Faster cold starts reduce time to recover and improve scale‑out latency.
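
A minimal sketch of the "process start to app ready" measurement; the logging shape is an assumption, and ReadyToRun itself is a publish-time setting (<PublishReadyToRun>true</PublishReadyToRun> in the .csproj).

// Program.cs: log how long the host took from process start to "ready to serve traffic".
using System.Diagnostics;

var processStart = Process.GetCurrentProcess().StartTime;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// ApplicationStarted fires once the server is listening and can accept requests.
app.Lifetime.ApplicationStarted.Register(() =>
{
    var coldStart = DateTime.Now - processStart;
    app.Logger.LogInformation("Cold start: {ColdStartMs} ms", coldStart.TotalMilliseconds);
});

app.Run();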

Step 1: Baseline the Problem

Goal: Create a repeatable scenario with clear success criteria.

  1. Pick 1–3 critical endpoints; define input sizes and headers; fix RPS patterns.
  2. Warm up for 2–5 min to stabilize tiered JIT and caches.
  3. Collect: RPS, p50/95/99 latency, CPU%, GC% time, allocation rate (MB/s), exceptions/sec, thread count.

Why: Without a trustworthy baseline, you can’t validate improvements.

Step 2: CPU Bottlenecks

Signals: high CPU%, low GC% time, high p95 latency. Often due to inefficient algorithms, heavy JSON, or excessive LINQ.

  1. Triage with dotnet-counters: confirm CPU is the limiting factor.
  2. Profile with dotnet-trace and open in speedscope.
  3. Fixes: reduce allocations in hot paths, use System.Text.Json source‑gen, cache expensive results, micro‑optimize critical methods.

Why: Flame graphs reveal where time is truly spent, not where you suspect.
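
For the System.Text.Json source-generation fix mentioned above, a minimal sketch; the OrderDto and AppJsonContext names are illustrative.

using System.Text.Json;
using System.Text.Json.Serialization;

// Payload type (illustrative).
public record OrderDto(int Id, string Customer, decimal Total);

// Source-generated serializer metadata: no runtime reflection, fewer allocations in the hot path.
[JsonSourceGenerationOptions(PropertyNamingPolicy = JsonKnownNamingPolicy.CamelCase)]
[JsonSerializable(typeof(OrderDto))]
[JsonSerializable(typeof(List<OrderDto>))]
public partial class AppJsonContext : JsonSerializerContext { }

// Usage: pass the generated type info instead of relying on reflection.
// var json = JsonSerializer.Serialize(order, AppJsonContext.Default.OrderDto);
//
// Minimal APIs can be wired to it once in Program.cs:
// builder.Services.ConfigureHttpJsonOptions(o =>
//     o.SerializerOptions.TypeInfoResolverChain.Insert(0, AppJsonContext.Default));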

Step 3: Allocation Pressure

Signals: high allocation rate, elevated GC% time, frequent gen0/1; latency spikes when GC triggers.

  1. Use dotnet-counters to watch Allocation Rate, GC Heap Size, GC% Time.
  2. Capture a gcdump (dotnet-gcdump) to see which types dominate the heap; for allocation call stacks, capture a dotnet-trace with the gc-verbose profile.
  3. Fixes: reuse buffers (ArrayPool<T>), avoid boxing, use Span<T>/Memory<T>, pool serializers, prefer StringBuilder over concatenation in loops.

Why: Reducing allocations lowers GC pressure and latency variability.
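
A minimal sketch of the buffer-reuse pattern from the fixes above; the buffer size and method name are illustrative.

using System.Buffers;

public static class PooledCopy
{
    // Rent a reusable buffer instead of allocating a fresh byte[] per call.
    public static async Task CopyAsync(Stream source, Stream destination)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);   // may return a larger array than requested
        try
        {
            int read;
            while ((read = await source.ReadAsync(buffer.AsMemory(0, buffer.Length))) > 0)
            {
                await destination.WriteAsync(buffer.AsMemory(0, read));
            }
        }
        finally
        {
            // Always return the buffer, even on exceptions, or the pool slowly degrades.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}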

Step 4: Memory Leaks

Signals: working set climbs over time, LOH grows, full GCs increase, OOM kills.

  1. Detect: run a soak test; watch heap size trend.
  2. Capture a dump with dotnet-dump or take periodic gcdump snapshots.
  3. Analyze: look for static roots, event handlers not unsubscribed, caches without TTL, long‑lived timers, pinned buffers.
  4. Fixes: weak references for caches, IDisposable for native handles, unsubscribe events, bound collections.

Why: Leaks hide in object graphs; root analysis surfaces the culprit holders.
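
A minimal sketch of two of the fixes above: unsubscribing event handlers and bounding a cache with a size limit plus TTL. PriceFeed, PriceWatcher, and the limits shown are illustrative.

using Microsoft.Extensions.Caching.Memory;

// A long-lived publisher: every subscription it holds roots the subscriber.
public sealed class PriceFeed
{
    public event EventHandler? PriceChanged;
}

public sealed class PriceWatcher : IDisposable
{
    private readonly PriceFeed _feed;

    public PriceWatcher(PriceFeed feed)
    {
        _feed = feed;
        _feed.PriceChanged += OnPriceChanged;      // this subscription keeps the watcher alive
    }

    private void OnPriceChanged(object? sender, EventArgs e) { /* react to updates */ }

    public void Dispose() => _feed.PriceChanged -= OnPriceChanged;   // break the root on teardown
}

// Without a size limit and TTL, a "cache" is just a leak with a nicer name.
public static class BoundedCache
{
    public static readonly MemoryCache Cache = new(new MemoryCacheOptions { SizeLimit = 10_000 });

    public static void Put(string key, object value) =>
        Cache.Set(key, value, new MemoryCacheEntryOptions
        {
            Size = 1,                                               // counted against SizeLimit
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
        });
}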

GC Tuning & LOH Considerations

  • Server GC for services (the ASP.NET Core default; otherwise set DOTNET_gcServer=1 or System.GC.Server in runtimeconfig.json): better throughput on multi‑core servers.
  • Latency: set GCSettings.LatencyMode (SustainedLowLatency during critical windows) judiciously.
  • LOH (> ~85 KB objects): minimize large object churn; chunk big buffers; reuse via pools.
  • Heap limits in containers: rely on cgroups; optionally set DOTNET_GCHeapHardLimit to cap.

Why: Balanced GC settings reduce pause times without starving throughput.
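
A minimal sketch of scoping SustainedLowLatency to a critical window; the window itself (ProcessMarketOpenBurstAsync) is a placeholder.

using System.Runtime;

// Discourage blocking gen2 collections only while latency-critical work runs,
// then restore the previous mode. Use sparingly: it trades memory for pause time.
var previous = GCSettings.LatencyMode;
try
{
    GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
    await ProcessMarketOpenBurstAsync();
}
finally
{
    GCSettings.LatencyMode = previous;
}

static Task ProcessMarketOpenBurstAsync() => Task.CompletedTask;   // placeholder for the critical work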

ThreadPool & Sync‑over‑Async

  • Watch: ThreadPool Queue Length, ThreadPool Thread Count, and completed work items (System.Runtime counters), plus request queueing at the server.
  • Avoid: .Result/.Wait() on async work; sync‑over‑async can deadlock and starves threadpool threads.
  • Set minimums: ThreadPool.SetMinThreads(worker, io) for bursty loads after measuring.
  • Use async end‑to‑end across the call graph; use ConfigureAwait(false) for library code.

Why: Starved threadpools manifest as high latency with low CPU—a classic trap.
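
A minimal sketch contrasting the anti-pattern with the end-to-end async version; CustomerClient and the route are illustrative.

public static class CustomerClient
{
    // Anti-pattern: blocking a threadpool thread on async work.
    // Under load this starves the pool: latency climbs while CPU stays low.
    public static string GetNameBlocking(HttpClient client, int id) =>
        client.GetStringAsync($"/customers/{id}").Result;          // blocks a thread; can deadlock

    // Fix: async end-to-end; ConfigureAwait(false) because this is library-style code.
    public static async Task<string> GetNameAsync(HttpClient client, int id) =>
        await client.GetStringAsync($"/customers/{id}").ConfigureAwait(false);
}

// Only after measuring bursty startup starvation (values are illustrative):
// ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 100);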

I/O and External Dependencies

  • HTTP: use IHttpClientFactory, set timeouts, enable HTTP/2 when possible; monitor connection pools.
  • gRPC: prefer unary/streaming appropriately; tune max streams per connection.
  • Retries: apply jittered backoff and timeouts (avoid retry storms on brownouts).
  • Bulkheads & circuit breakers: confine blast radius when a dependency degrades.

Why: Many “CPU problems” are actually waiting on slow I/O or congested pools.
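
A minimal registration sketch for the HTTP bullet; the client name, base address, and the standard resilience handler (from the Microsoft.Extensions.Http.Resilience package) are assumptions about your stack.

// Program.cs (requires: using System.Net;)
builder.Services.AddHttpClient("billing", client =>
    {
        client.BaseAddress = new Uri("https://billing.internal");   // illustrative dependency
        client.Timeout = TimeSpan.FromSeconds(5);                    // hard upper bound per request
        client.DefaultRequestVersion = HttpVersion.Version20;        // prefer HTTP/2 when supported
    })
    // Jittered retries, per-attempt timeouts, and a circuit breaker in one handler.
    .AddStandardResilienceHandler();

// Consumption: resolve through IHttpClientFactory so connections are pooled and rotated.
// var client = httpClientFactory.CreateClient("billing");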

Database Query Analysis

  • Surface slow queries via tracing and per‑DB dashboards; tag with endpoint and tenant.
  • EF Core: log generated SQL, use AsNoTracking(), .TagWith(), avoid N+1, prefer projections (Select).
  • Indexes: validate via EXPLAIN/EXPLAIN ANALYZE or execution plans; ensure selectivity.
  • Pooling: watch pool usage, timeouts, and waits; right‑size connection pools.

Why: Query shape and indexes dominate latency at scale.
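
A minimal EF Core sketch of the patterns above; db, Orders, Customer, since, and OrderSummaryDto are illustrative names.

// using Microsoft.EntityFrameworkCore;
var orders = await db.Orders
    .AsNoTracking()                                      // read-only: skip change tracking
    .TagWith("GET /orders: recent orders per customer")  // surfaces in the generated SQL for correlation
    .Where(o => o.CreatedAt >= since)
    .Select(o => new OrderSummaryDto                     // projection: fetch only the needed columns
    {
        Id = o.Id,
        Customer = o.Customer.Name,                      // translated into a JOIN, not a per-row lazy load
        Total = o.Total
    })
    .ToListAsync();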

Caching Strategy

  • What to cache: pure computations, stable lookups, rendered fragments, tokens/claims.
  • Where: IMemoryCache for per‑node hot data; IDistributedCache/Redis for shared state.
  • Policy: TTLs, size limits, eviction, stampede protection (single‑flight), cache versioning.
  • Validation: expose cache hit rate and latency; treat cache misses as first‑class metrics.

Why: Effective caching can be the single largest throughput multiplier.
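
A minimal sketch of single-flight (stampede) protection over IMemoryCache; the single coarse gate and TTL handling are illustrative simplifications.

using Microsoft.Extensions.Caching.Memory;

public sealed class SingleFlightCache<T>
{
    private readonly IMemoryCache _cache;
    private readonly SemaphoreSlim _gate = new(1, 1);   // coarse lock; shard per key if contention matters

    public SingleFlightCache(IMemoryCache cache) => _cache = cache;

    public async Task<T> GetOrCreateAsync(string key, Func<Task<T>> factory, TimeSpan ttl)
    {
        if (_cache.TryGetValue(key, out T? hit) && hit is not null)
            return hit;

        await _gate.WaitAsync();
        try
        {
            // Re-check after acquiring the gate: another caller may have filled the entry already.
            if (_cache.TryGetValue(key, out T? refreshed) && refreshed is not null)
                return refreshed;

            var value = await factory();                 // only one caller pays for the miss
            _cache.Set(key, value, ttl);
            return value;
        }
        finally
        {
            _gate.Release();
        }
    }
}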

Containers & Runtime Limits

  • CPU limits: .NET respects cgroups; choose requests for baseline and limits to prevent noisy neighbors.
  • Memory limits: monitor working set vs. limit; avoid close‑to‑cap thrash; consider DOTNET_GCHeapHardLimit with care.
  • GC threads: ensure enough cores; server GC scales with CPU count.
  • Startup: trim images, pre‑pull, and use R2R to minimize cold start in orchestrators.

Why: Right‑sized limits prevent OOM kills and CPU throttling that masquerade as app bugs.

Production‑Safe Diagnostics

  • dotnet-counters & EventCounters: low overhead, safe to run during incidents.
  • dotnet-trace (short windows): sample CPU; keep collections brief to limit overhead.
  • dotnet-gcdump: heap stats with minimal disruption.
  • dotnet-dump: targeted dumps on cgroup pressure or OOM signals.
  • dotnet-monitor sidecar: on‑demand traces, dumps, metrics, and collection rules.

Why: Safe tools let you learn from production without making it worse.

Validate Fixes & Prevent Regression

  1. Re‑run the baseline with identical load; compare SLIs and counters.
  2. Canary the change (5–10% traffic) and watch SLO burn & dependency p95.
  3. Add tests: microbenchmarks (BenchmarkDotNet), load tests in CI, and synthetic probes.
  4. Guardrails: alerts on allocation rate spikes, threadpool queue growth, and error rates.

Why: The best fix includes proof and protection.
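
A minimal BenchmarkDotNet sketch for the microbenchmark guardrail; the benchmarked methods are illustrative stand-ins for your hot path.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]   // reports allocated bytes per operation alongside timing
public class HotPathBenchmarks
{
    private readonly int[] _data = Enumerable.Range(0, 1_000).ToArray();

    [Benchmark(Baseline = true)]
    public string Join_Baseline() => string.Join(",", _data);

    [Benchmark]
    public string StringBuilder_Candidate()
    {
        var sb = new System.Text.StringBuilder(_data.Length * 4);
        foreach (var n in _data) sb.Append(n).Append(',');
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<HotPathBenchmarks>();
}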

Tooling Map

Symptom → primary tool (overhead): what you get, and when to use it.

  • High latency, unknown cause → dotnet-counters (very low overhead): CPU, GC % time, allocation rate, thread counts. Use for first triage.
  • Suspected CPU hot path → dotnet-trace + speedscope (low to moderate; keep captures short): flame graph of hot stacks. Use to identify heavy methods.
  • High allocations / GC churn → dotnet-gcdump (low): top heap types by size and count. Use to reduce allocation pressure.
  • Leak / OOM investigation → dotnet-dump (moderate, point-in-time): object graph, roots, LOH usage. Use to find leaks and their roots.
  • Prod capture automation → dotnet-monitor (low): collection rules, exported traces, dumps, and metrics. Use for safe production diagnostics.

OpenTelemetry Setup Snippet

Add metrics & tracing to illuminate bottlenecks end‑to‑end.


// Program.cs (.NET 8)
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .ConfigureResource(rb => rb.AddService(
        serviceName: "MyService",
        serviceVersion: "1.0.0"))
    .WithMetrics(mb => mb
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()
        .AddProcessInstrumentation()
        .AddMeter("Microsoft.AspNetCore.Hosting", "System.Net.Http", "System.Runtime")
        .AddOtlpExporter())
    .WithTracing(tb => tb
        .AddAspNetCoreInstrumentation(o =>
        {
            o.RecordException = true;
            // Recent OpenTelemetry.Instrumentation.AspNetCore versions expose EnrichWith* callbacks
            // instead of the older Enrich delegate; add per-request tags (tenant, payload size) here.
            o.EnrichWithHttpRequest = (activity, request) => { /* activity.SetTag("tenant", ...) */ };
        })
        .AddHttpClientInstrumentation()
        .AddSqlClientInstrumentation(o => { o.SetDbStatementForText = true; o.RecordException = true; })
        .AddOtlpExporter());

var app = builder.Build();

app.MapGet("/health", () => Results.Ok(new { status = "ok" }));

app.Run();

// Environment (deployment)
// OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector:4317
// OTEL_RESOURCE_ATTRIBUTES=service.name=MyService,service.version=1.0.0,service.instance.id=$(HOSTNAME)

Why: Standard telemetry makes your diagnostics repeatable and tool‑agnostic.

CLI Tooling Snippets

dotnet-counters

Triage runtime health in real time.

# List processes
dotnet-counters ps

# Monitor key providers
dotnet-counters monitor -p <pid> \
  System.Runtime Microsoft.AspNetCore.Hosting System.Net.Http

# Example: CSV collection for 2 minutes
dotnet-counters collect -p <pid> --duration 00:02:00 -o counters.csv

dotnet-trace

Capture CPU profiles and open in speedscope.

# List processes
dotnet-trace ps

# CPU profile; save as speedscope JSON
dotnet-trace collect -p <pid> --profile cpu-sampling --format speedscope -o trace.speedscope.json

# Open trace.speedscope.json in https://www.speedscope.app

dotnet-gcdump

Analyze allocation patterns and heap composition.

# Collect a GC dump
dotnet-gcdump collect -p <pid> -o heap.gcdump

# Report the heap: top types by object count and size
dotnet-gcdump report heap.gcdump

# Or open heap.gcdump in Visual Studio or PerfView to explore the full object graph

dotnet-dump

Investigate leaks and OOMs via object graphs & roots.

# Capture a memory dump
dotnet-dump collect -p <pid> -o crash.dmp

# Interactive analysis
dotnet-dump analyze crash.dmp
# Inside the analyzer (SOS):
# > clrstack
# > dumpheap -stat
# > dumpheap -type MyType
# > gcroot <object-address>

Optional: Startup attach via diagnostic port (containers)

# In container spec
ENV DOTNET_DiagnosticPorts=/diag/port,suspend
# Then attach from sidecar/host:
dotnet-trace collect --diagnostic-port /diag/port --profile cpu-sampling --format speedscope -o startup.speedscope.json

Pro tip: Keep your scripts and dashboards versioned next to your application so every release ships with its own diagnostics.
