Monitoring and Observability in .NET Web API Development

In distributed systems and cloud-native applications, visibility into your .NET 8 Web APIs is crucial for maintaining reliability, performance, and rapid issue resolution. Monitoring and observability go beyond basic logging—they provide deep insights into application behavior, user interactions, and system health. This comprehensive guide explores implementing robust observability in .NET 8, using OpenTelemetry, Application Insights, and other tools to build APIs that are transparent and maintainable at scale.

We'll cover telemetry collection, distributed tracing, metrics aggregation, and alerting strategies, with practical code examples and enterprise-grade best practices. Effective observability transforms reactive debugging into proactive system management.

OpenTelemetry Integration

OpenTelemetry provides a unified approach to collecting traces, metrics, and logs across your entire stack.

Setting Up OpenTelemetry

Configure comprehensive telemetry collection:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracerProviderBuilder =>
    {
        tracerProviderBuilder
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddEntityFrameworkCoreInstrumentation()
            .AddOtlpExporter(options =>
            {
                options.Endpoint = new Uri("http://otel-collector:4317");
            });
    })
    .WithMetrics(metricsProviderBuilder =>
    {
        metricsProviderBuilder
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddRuntimeInstrumentation()
            .AddOtlpExporter(options =>
            {
                options.Endpoint = new Uri("http://otel-collector:4317");
            });
    })
    .WithLogging(loggingBuilder =>
    {
        loggingBuilder.AddOtlpExporter(options =>
        {
            options.Endpoint = new Uri("http://otel-collector:4317");
        });
    });
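One detail the configuration above leaves implicit: observability backends group telemetry by service name, which comes from the OpenTelemetry resource. A minimal sketch (the service name, version, and environment attribute are placeholders for your own):

```csharp
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        // Identifies this process in traces, metrics, and logs
        .AddService(serviceName: "OrderProcessing.API", serviceVersion: "1.0.0")
        .AddAttributes(new Dictionary<string, object>
        {
            ["deployment.environment"] = builder.Environment.EnvironmentName
        }));
```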

Why this matters: Imagine your API as a complex machine in a factory. Without OpenTelemetry, you're trying to fix breakdowns by guessing what's wrong inside a black box. With it, you get a real-time blueprint of every component's performance, like having X-ray vision into your machinery. In the real world, companies like Netflix use OpenTelemetry to monitor their streaming infrastructure, ensuring millions of users get uninterrupted service by quickly identifying and fixing issues before they cascade.

Custom Tracing and Metrics

Add custom instrumentation to your business logic:

public class OrderProcessingService
{
    private readonly ILogger<OrderProcessingService> _logger;
    private readonly Meter _meter;
    private readonly Counter<long> _ordersProcessed;
    private readonly Histogram<double> _processingDuration;

    public OrderProcessingService(ILogger<OrderProcessingService> logger)
    {
        _logger = logger;
        _meter = new Meter("OrderProcessing", "1.0.0");
        // Note: the second positional parameter of CreateCounter/CreateHistogram is the unit, not the description
        _ordersProcessed = _meter.CreateCounter<long>("orders_processed_total",
            unit: "{orders}", description: "Total number of orders processed");
        _processingDuration = _meter.CreateHistogram<double>("order_processing_duration_seconds",
            unit: "s", description: "Time spent processing orders");
    }

    public async Task<OrderResult> ProcessOrderAsync(OrderRequest request)
    {
        using var activity = ActivitySourceProvider.Default.StartActivity("ProcessOrder", ActivityKind.Internal);
        activity?.SetTag("order.id", request.OrderId);
        activity?.SetTag("order.amount", request.TotalAmount);

        var stopwatch = Stopwatch.StartNew();
        
        try
        {
            _logger.LogInformation("Starting order processing for order {OrderId}", request.OrderId);
            
            var result = await ProcessOrderInternalAsync(request);
            
            _ordersProcessed.Add(1, new KeyValuePair<string, object?>("status", "success"));
            _processingDuration.Record(stopwatch.Elapsed.TotalSeconds, 
                new KeyValuePair<string, object?>("status", "success"));
            
            _logger.LogInformation("Order {OrderId} processed successfully", request.OrderId);
            
            return result;
        }
        catch (Exception ex)
        {
            _ordersProcessed.Add(1, new KeyValuePair<string, object?>("status", "error"));
            _processingDuration.Record(stopwatch.Elapsed.TotalSeconds, 
                new KeyValuePair<string, object?>("status", "error"));
            
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            _logger.LogError(ex, "Failed to process order {OrderId}", request.OrderId);
            
            throw;
        }
    }
}

Define ActivitySource:

public static class ActivitySourceProvider
{
    public static readonly ActivitySource Default = new("OrderProcessing.API", "1.0.0");
}
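A custom ActivitySource (and likewise the custom Meter created in OrderProcessingService) only produces exported telemetry if the SDK subscribes to it by name. Assuming the OTLP setup shown earlier, the registration looks roughly like:

```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("OrderProcessing.API"))  // must match the ActivitySource name
    .WithMetrics(metrics => metrics
        .AddMeter("OrderProcessing"));      // must match the Meter name
```

Without these calls, the activities and counters above are created but silently never collected.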

Why this matters: Think of custom tracing like leaving breadcrumbs through a dense forest. When a user reports a lost order, instead of searching blindly, you can follow the exact path the request took through your system—from the API gateway, through validation, payment processing, and inventory checks. Real-world example: Amazon uses similar tracing to track package deliveries across their vast network, ensuring they can pinpoint exactly where a delay occurred in the supply chain, much like debugging a complex e-commerce transaction.

Application Insights Deep Dive

Azure Application Insights offers rich application performance monitoring with minimal configuration.

Advanced Configuration

Set up comprehensive Application Insights monitoring:

builder.Services.AddApplicationInsightsTelemetry(options =>
{
    options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
    options.EnableAdaptiveSampling = true;
    options.EnableHeartbeat = true;
});

builder.Services.Configure<TelemetryConfiguration>(config =>
{
    config.TelemetryInitializers.Add(new OperationCorrelationTelemetryInitializer());
    config.TelemetryInitializers.Add(new HttpDependenciesParsingTelemetryInitializer());
    
    // Custom telemetry initializer
    config.TelemetryInitializers.Add(new CustomTelemetryInitializer());
});

Implement custom telemetry initializer:

public class CustomTelemetryInitializer : ITelemetryInitializer
{
    public void Initialize(ITelemetry telemetry)
    {
        if (telemetry is RequestTelemetry requestTelemetry)
        {
            requestTelemetry.Properties["CustomProperty"] = "API";
            requestTelemetry.Metrics["CustomMetric"] = 42.0;
        }
    }
}
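Initializers enrich every telemetry item; when you also need to drop items—noisy health-check requests, or payloads containing sensitive data—a telemetry processor is the usual companion. A sketch, assuming the /health path convention used later in this guide:

```csharp
public class HealthCheckFilterProcessor : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    public HealthCheckFilterProcessor(ITelemetryProcessor next) => _next = next;

    public void Process(ITelemetry item)
    {
        // Drop successful health-probe requests to cut telemetry volume;
        // failed probes still flow through so outages remain visible
        if (item is RequestTelemetry request &&
            request.Url?.AbsolutePath.StartsWith("/health") == true &&
            request.Success == true)
        {
            return;
        }

        _next.Process(item);
    }
}

// Registration:
builder.Services.AddApplicationInsightsTelemetryProcessor<HealthCheckFilterProcessor>();
```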

Availability Tests

Configure synthetic monitoring:

{
  "name": "API Health Check",
  "type": "Microsoft.Insights/webtests",
  "properties": {
    "Enabled": true,
    "Frequency": 300,
    "Timeout": 30,
    "Kind": "ping",
    "Locations": [
      {
        "Id": "us-il-ch1-azr"
      }
    ],
    "Configuration": {
      "WebTest": "<WebTest Name=\"API Health Check\" Enabled=\"True\" CssProjectStructure=\"\" CssIteration=\"\" Timeout=\"30\" WorkItemIds=\"\" xmlns=\"http://microsoft.com/schemas/VisualStudio/TeamTest/2010\" Description=\"\" CredentialUserName=\"\" CredentialPassword=\"\" PreAuthenticate=\"True\" Proxy=\"default\" StopOnError=\"False\" RecordedResultFile=\"\" ResultsLocale=\"\"><Items><Request Method=\"GET\" Version=\"1.1\" Url=\"https://your-api.azurewebsites.net/health\" ThinkTime=\"0\" Timeout=\"30\" ParseDependentRequests=\"False\" FollowRedirects=\"True\" RecordResult=\"True\" Cache=\"False\" ResponseTimeGoal=\"0\" Encoding=\"utf-8\" ExpectedHttpStatusCode=\"200\" ExpectedResponseUrl=\"\" ReportingName=\"\" IgnoreHttpStatusCode=\"False\" /></Items></WebTest>"
    }
  }
}

Why this matters: Application Insights is like having a car's dashboard that not only shows your speed and fuel level but also predicts when you'll need an oil change. In enterprise scenarios, companies like Microsoft use it internally to monitor their own services, automatically detecting anomalies—like a sudden spike in failed logins—that could indicate a security breach, allowing them to respond before customers are affected.

Structured Logging with Serilog

Implement structured logging for better searchability and analysis.

Serilog Configuration

Set up Serilog with multiple sinks:

builder.Host.UseSerilog((context, services, configuration) => configuration
    .ReadFrom.Configuration(context.Configuration)
    .ReadFrom.Services(services)
    .Enrich.FromLogContext()
    .Enrich.WithMachineName()
    .Enrich.WithEnvironmentName()
    .Enrich.WithProperty("Application", "OrderProcessing.API")
    .WriteTo.Console()
    .WriteTo.ApplicationInsights(
        services.GetRequiredService<TelemetryConfiguration>(),
        TelemetryConverter.Traces)
    .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://elasticsearch:9200"))
    {
        AutoRegisterTemplate = true,
        IndexFormat = "order-api-{0:yyyy.MM.dd}"
    }));

appsettings.json configuration:

{
  "Serilog": {
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        "Microsoft": "Warning",
        "Microsoft.Hosting.Lifetime": "Information"
      }
    },
    "WriteTo": [
      {
        "Name": "Console",
        "Args": {
          "outputTemplate": "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj} {Properties:j}{NewLine}{Exception}"
        }
      }
    ]
  }
}
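The payoff of this setup comes from message templates: log named placeholders rather than interpolated strings, so each value is captured as a searchable field in Elasticsearch or Application Insights. A small illustration (the order object is hypothetical):

```csharp
// Good: OrderId and Amount become structured, queryable properties
_logger.LogInformation("Order {OrderId} charged {Amount}", order.Id, order.Total);

// Avoid: interpolation collapses everything into one opaque string,
// defeating the structured sinks configured above
_logger.LogInformation($"Order {order.Id} charged {order.Total}");
```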

Contextual Logging

Use logging scopes for request correlation:

public class RequestLoggingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<RequestLoggingMiddleware> _logger;

    public RequestLoggingMiddleware(RequestDelegate next, ILogger<RequestLoggingMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var requestId = context.TraceIdentifier;
        
        using (_logger.BeginScope(new Dictionary<string, object>
        {
            ["RequestId"] = requestId,
            ["UserId"] = context.User?.Identity?.Name ?? "Anonymous",
            ["Path"] = context.Request.Path,
            ["Method"] = context.Request.Method
        }))
        {
            _logger.LogInformation("Request started");
            
            var stopwatch = Stopwatch.StartNew();
            try
            {
                await _next(context);
                _logger.LogInformation("Request completed in {ElapsedMs}ms with status {StatusCode}", 
                    stopwatch.ElapsedMilliseconds, context.Response.StatusCode);
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Request failed after {ElapsedMs}ms", stopwatch.ElapsedMilliseconds);
                throw;
            }
        }
    }
}

Register middleware:

app.UseMiddleware<RequestLoggingMiddleware>();

Why this matters: Structured logging is like organizing your paperwork in a filing cabinet versus scattering documents across your desk. When an issue occurs, you can quickly query for "all errors from user X in the last hour" instead of sifting through thousands of log lines. Real-world example: during large-scale incidents like the 2021 Facebook outage, structured, queryable logs are what let engineers reconstruct the exact sequence of events quickly instead of grepping through raw text.

Health Checks and Probes

Implement comprehensive health monitoring for your API and dependencies.

Advanced Health Checks

Configure detailed health checks:

builder.Services.AddHealthChecks()
    .AddDbContextCheck<ApplicationDbContext>("database",
        tags: new[] { "database", "sql", "ready" })
    // Note: AddRedis takes the connection string first; the check name is a named parameter
    .AddRedis(builder.Configuration.GetConnectionString("Redis"), name: "redis",
        tags: new[] { "cache", "redis", "ready" })
    .AddUrlGroup(new Uri("https://external-api.com/health"), "external-api",
        tags: new[] { "external", "api" })
    .AddCheck<CustomHealthCheck>("custom",
        tags: new[] { "custom" });

builder.Services.AddHealthChecksUI(options =>
{
    options.UIPath = "/health-ui";
}).AddInMemoryStorage();

Implement custom health check:

public class CustomHealthCheck : IHealthCheck
{
    private readonly IHttpClientFactory _httpClientFactory;

    public CustomHealthCheck(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, 
        CancellationToken cancellationToken = default)
    {
        try
        {
            var client = _httpClientFactory.CreateClient();
            var response = await client.GetAsync("https://critical-service.com/status", cancellationToken);
            
            if (response.IsSuccessStatusCode)
            {
                return HealthCheckResult.Healthy("Critical service is responding");
            }
            else
            {
                return HealthCheckResult.Degraded($"Critical service returned {response.StatusCode}");
            }
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Critical service is unreachable", ex);
        }
    }
}

Map health endpoints:

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false // Liveness probe - always returns healthy if app is running
});
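If you registered AddHealthChecksUI earlier, its dashboard endpoint needs mapping as well; something along these lines (the UI path mirrors the one configured above):

```csharp
app.MapHealthChecksUI(options => options.UIPath = "/health-ui");
```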

Why this matters: Health checks are like regular doctor's visits for your application. Just as a physician can detect early signs of illness before symptoms become severe, health checks identify failing dependencies before they cause outages. In containerized environments like Kubernetes, this is crucial—think of it as the difference between a patient getting preventive care versus showing up at the emergency room with a full-blown crisis.

Distributed Tracing Across Services

Implement end-to-end tracing in microservices architectures.

Service-to-Service Tracing

Configure tracing headers propagation:

public class TracingHttpHandler : DelegatingHandler
{
    private readonly ActivitySource _activitySource;

    public TracingHttpHandler(ActivitySource activitySource)
    {
        _activitySource = activitySource;
    }

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, 
        CancellationToken cancellationToken)
    {
        using var activity = _activitySource.StartActivity($"HTTP {request.Method} {request.RequestUri?.Host}", 
            ActivityKind.Client);
        
        activity?.SetTag("http.method", request.Method.Method);
        activity?.SetTag("http.url", request.RequestUri?.ToString());
        activity?.SetTag("http.scheme", request.RequestUri?.Scheme);

        // Add tracing headers
        if (Activity.Current != null)
        {
            request.Headers.Add("traceparent", Activity.Current.Id);
            if (Activity.Current.TraceStateString != null)
            {
                request.Headers.Add("tracestate", Activity.Current.TraceStateString);
            }
        }

        var response = await base.SendAsync(request, cancellationToken);
        
        activity?.SetTag("http.status_code", (int)response.StatusCode);
        
        return response;
    }
}

Register in HttpClient:

builder.Services.AddHttpClient("TracedClient")
    .AddHttpMessageHandler<TracingHttpHandler>();
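AddHttpMessageHandler<T> resolves the handler from dependency injection, so the handler and the ActivitySource it consumes must be registered too; roughly:

```csharp
// The handler's ActivitySource dependency (name is an example)
builder.Services.AddSingleton(new ActivitySource("OrderProcessing.API", "1.0.0"));
// DelegatingHandlers used with AddHttpMessageHandler must be transient
builder.Services.AddTransient<TracingHttpHandler>();

// Consuming the named client elsewhere:
var client = httpClientFactory.CreateClient("TracedClient");
```

Without the transient registration, resolving the named client throws at runtime.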

Trace Sampling

Implement intelligent sampling to manage telemetry volume:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracerProviderBuilder =>
    {
        tracerProviderBuilder
            .SetSampler(new TraceIdRatioBasedSampler(0.1)) // Sample 10% of traces
            .AddAspNetCoreInstrumentation()
            .AddOtlpExporter();
    });
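In multi-service setups the ratio sampler is usually wrapped in a parent-based sampler, so downstream services honor the upstream sampling decision instead of re-rolling the dice and producing broken partial traces:

```csharp
// Sample 10% of new root traces; always follow the parent's decision otherwise
tracerProviderBuilder.SetSampler(
    new ParentBasedSampler(new TraceIdRatioBasedSampler(0.1)));
```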

Why this matters: Distributed tracing is like tracking a package through multiple shipping companies and warehouses. When a customer complains about a delayed delivery, you can see exactly where the bottleneck occurred—was it at the initial pickup, during transit, or at the final sorting facility? In microservices, this visibility is invaluable; companies like Uber use distributed tracing to monitor ride requests across dozens of services, ensuring they can quickly identify and fix issues in their complex ecosystem.

Alerting and Incident Response

Set up proactive monitoring with intelligent alerting.

Application Insights Alerts

Configure metric-based alerts:

{
  "name": "High Response Time Alert",
  "type": "Microsoft.Insights/metricAlerts",
  "properties": {
    "description": "Alert when API response time exceeds threshold",
    "severity": 2,
    "enabled": true,
    "scopes": ["/subscriptions/.../resourceGroups/.../providers/Microsoft.Web/sites/your-api"],
    "evaluationFrequency": "PT5M",
    "windowSize": "PT15M",
    "criteria": {
      "allOf": [
        {
          "name": "ResponseTime",
          "metricName": "HttpResponseTime",
          "dimensions": [],
          "operator": "GreaterThan",
          "threshold": 5000,
          "timeAggregation": "Average"
        }
      ]
    },
    "actions": [
      {
        "actionGroupId": "/subscriptions/.../resourceGroups/.../providers/microsoft.insights/actionGroups/email-alerts"
      }
    ]
  }
}

Custom Alerting Logic

Implement application-level alerting:

public class AlertingService
{
    private readonly TelemetryClient _telemetry;
    private readonly ILogger<AlertingService> _logger;

    public AlertingService(TelemetryClient telemetry, ILogger<AlertingService> logger)
    {
        _telemetry = telemetry;
        _logger = logger;
    }

    public async Task CheckAndAlertAsync()
    {
        var metrics = await GetSystemMetricsAsync();
        
        if (metrics.ErrorRate > 0.05) // 5% error rate
        {
            _telemetry.TrackEvent("HighErrorRateAlert", new Dictionary<string, string>
            {
                ["ErrorRate"] = metrics.ErrorRate.ToString("P2"),
                ["TimeWindow"] = "5m"
            });
            
            _logger.LogWarning("High error rate detected: {ErrorRate}", metrics.ErrorRate);
            
            // Send alert to external system
            await SendAlertAsync("High Error Rate", $"Error rate is {metrics.ErrorRate:P2}");
        }
    }
}
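Something has to invoke CheckAndAlertAsync on a schedule; a BackgroundService is the idiomatic host for that. A sketch—the five-minute interval and scoped resolution are assumptions to adapt to your registration lifetimes:

```csharp
public class AlertingBackgroundService : BackgroundService
{
    private readonly IServiceProvider _services;

    public AlertingBackgroundService(IServiceProvider services) => _services = services;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromMinutes(5));
        while (await timer.WaitForNextTickAsync(stoppingToken))
        {
            // Resolve per tick so scoped dependencies (e.g. a DbContext) stay valid
            using var scope = _services.CreateScope();
            var alerting = scope.ServiceProvider.GetRequiredService<AlertingService>();
            await alerting.CheckAndAlertAsync();
        }
    }
}

// Registration:
builder.Services.AddHostedService<AlertingBackgroundService>();
```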

Why this matters: Alerting is like having smoke detectors in your house—they don't prevent fires, but they give you early warning to evacuate and call the fire department. In the financial industry, companies use similar alerting systems to detect fraudulent transactions in real-time, stopping potential losses before they escalate. Without proactive alerts, you're essentially waiting for customers to complain before discovering major issues.

Observability Best Practices Checklist

Use this checklist to ensure comprehensive observability:

Telemetry Collection

  • [ ] OpenTelemetry configured for traces, metrics, and logs
  • [ ] Custom instrumentation added to business logic
  • [ ] Request correlation IDs implemented
  • [ ] Sensitive data filtered from telemetry
  • [ ] Telemetry sampling configured appropriately

Logging Strategy

  • [ ] Structured logging implemented with Serilog
  • [ ] Log levels configured correctly (dev vs prod)
  • [ ] Log aggregation and searchability set up
  • [ ] Log retention policies defined
  • [ ] Security events logged appropriately

Monitoring and Alerting

  • [ ] Health checks implemented for all dependencies
  • [ ] Application Insights or equivalent configured
  • [ ] Key performance indicators defined
  • [ ] Alert thresholds established
  • [ ] Incident response procedures documented

Distributed Tracing

  • [ ] Service-to-service tracing headers propagated
  • [ ] Trace context included in async operations
  • [ ] Trace visualization tools configured
  • [ ] Trace retention and sampling optimized

Operational Readiness

  • [ ] Dashboards created for key metrics
  • [ ] Runbooks written for common issues
  • [ ] On-call rotation established
  • [ ] Post-mortem process defined
  • [ ] Observability costs monitored

Implementing comprehensive monitoring and observability in your .NET 8 Web APIs transforms how you operate and maintain distributed systems. By collecting the right telemetry, setting up intelligent alerting, and establishing clear operational procedures, you'll move from reactive firefighting to proactive system management. Remember, observability is not a one-time implementation—it's an ongoing practice that evolves with your system's complexity and scale.
