Monitoring

First published by Atif Alam

Azure’s monitoring stack is built around Azure Monitor — a unified platform that collects metrics, logs, and traces from all Azure resources and applications.

┌──────────────────────────────────────────────────────────────┐
│ Azure Monitor                                                │
│                                                              │
│ Data Sources       │ Data Stores        │ Consumers          │
│ ─────────          │ ───────────        │ ─────────          │
│ Azure Resources    │ Metrics DB         │ Dashboards         │
│ Applications       │ Log Analytics      │ Alerts             │
│ OS (agents)        │ (Kusto / KQL)      │ Workbooks          │
│ Custom sources     │                    │ Power BI           │
│                    │                    │ Grafana            │
└──────────────────────────────────────────────────────────────┘

| Component | What It Does | AWS Equivalent |
|---|---|---|
| Metrics | Numeric time-series data (CPU, memory, requests) | CloudWatch Metrics |
| Log Analytics | Log collection and querying (KQL) | CloudWatch Logs Insights |
| Application Insights | Application performance monitoring (APM) | X-Ray + CloudWatch |
| Alerts | Notifications and automated actions | CloudWatch Alarms |
| Workbooks | Interactive reports and dashboards | CloudWatch Dashboards |

Every Azure resource automatically emits platform metrics — no agent needed.

| Resource | Metrics |
|---|---|
| VM | CPU %, available memory, disk IOPS, network in/out |
| App Service | HTTP requests, response time, CPU %, memory % |
| Azure SQL | DTU/CPU %, connections, deadlocks, storage |
| Cosmos DB | Request units consumed, latency, availability |
| Storage | Transactions, ingress/egress, latency |
| AKS | Node CPU/memory, pod count, kubelet health |
# List available metrics for a resource
az monitor metrics list-definitions \
--resource /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Compute/virtualMachines/my-vm
# Query a metric
az monitor metrics list \
--resource /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
--metric "Percentage CPU" \
--interval PT1H \
--aggregation Average
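
The --interval and --aggregation flags ask the metrics service to group raw samples into fixed time grains (PT1H is one hour) and reduce each grain to a single value. The rough Python sketch below makes that concrete. The aggregate_average helper and the sample data are hypothetical. Azure Monitor performs this computation server-side.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def aggregate_average(samples, grain=timedelta(hours=1)):
    """Group (timestamp, value) samples into fixed grains and average each grain.

    Hypothetical helper: mimics what `--interval PT1H --aggregation Average`
    asks Azure Monitor to compute. Timestamps must be timezone-aware.
    """
    step = grain.total_seconds()
    buckets = defaultdict(list)
    for ts, value in samples:
        # Round the timestamp down to the start of its grain (aligned to the Unix epoch)
        start = datetime.fromtimestamp((ts.timestamp() // step) * step, tz=timezone.utc)
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


t0 = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
cpu = [(t0, 20.0), (t0 + timedelta(minutes=30), 40.0), (t0 + timedelta(minutes=70), 90.0)]
hourly = aggregate_average(cpu)  # 10:00 grain averages to 30.0, 11:00 grain to 90.0
```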

In the portal, every resource has a Metrics blade where you can build charts, filter by dimensions, and pin to dashboards.

Send custom metrics from your application:

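# Option 1 (legacy): OpenCensus exporter; OpenCensus is deprecated in favor of OpenTelemetry.
# It exports metrics recorded through OpenCensus measures and views (setup not shown here).
from opencensus.ext.azure import metrics_exporter
exporter = metrics_exporter.new_metrics_exporter(
    connection_string="InstrumentationKey=<your-key>"
)
# Option 2 (legacy): Application Insights SDK (applicationinsights package)
from applicationinsights import TelemetryClient
tc = TelemetryClient("<instrumentation-key>")
tc.track_metric("OrdersProcessed", 42)  # metric name, value
tc.flush()  # send queued telemetry now instead of waiting for the next batch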
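
Metric clients generally avoid sending one record per value. The sketch below shows a hypothetical pre-aggregator. The MetricAggregator class is not part of either SDK above. It folds values into a count, sum, minimum and maximum for each metric name and returns a batch on flush. That is roughly the count/sum/min/max shape that Azure Monitor custom metric series are reported in.

```python
from dataclasses import dataclass


@dataclass
class MetricSummary:
    """Running summary of one metric between flushes (hypothetical)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


class MetricAggregator:
    """Hypothetical client-side pre-aggregator, not an Azure SDK class."""

    def __init__(self) -> None:
        self._series: dict[str, MetricSummary] = {}

    def track(self, name: str, value: float) -> None:
        # Fold the value into the summary instead of buffering every data point
        self._series.setdefault(name, MetricSummary()).add(value)

    def flush(self) -> dict[str, dict]:
        """Return one summary per metric (what would be sent) and reset the buffer."""
        batch = {
            name: {"count": s.count, "sum": s.total, "min": s.minimum, "max": s.maximum}
            for name, s in self._series.items()
        }
        self._series.clear()
        return batch


agg = MetricAggregator()
for orders in (42, 8, 10):
    agg.track("OrdersProcessed", orders)
batch = agg.flush()
```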

Log Analytics is a centralized log store that you query with KQL (Kusto Query Language). It is the usual destination for Azure logs.

A workspace is the central container for logs. Resources send their logs to a workspace through diagnostic settings, and VMs send guest OS data through an agent.

# Create a workspace
az monitor log-analytics workspace create \
--resource-group myapp-rg \
--workspace-name myapp-logs \
--location eastus
# Enable diagnostic logs for a resource (e.g. App Service)
az monitor diagnostic-settings create \
--name send-to-log-analytics \
--resource /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Web/sites/my-webapp \
--workspace myapp-logs \
--logs '[{"category":"AppServiceHTTPLogs","enabled":true},{"category":"AppServiceConsoleLogs","enabled":true}]' \
--metrics '[{"category":"AllMetrics","enabled":true}]'
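
Hand-editing the --logs JSON string is error-prone. A small helper can generate both arguments from a list of category names instead. The diagnostic_settings_args function below is hypothetical. With the categories used above, it reproduces the same two strings.

```python
import json


def diagnostic_settings_args(log_categories, metric_categories=("AllMetrics",)):
    """Build the --logs and --metrics JSON strings for
    `az monitor diagnostic-settings create` (hypothetical helper)."""
    logs = [{"category": c, "enabled": True} for c in log_categories]
    metrics = [{"category": c, "enabled": True} for c in metric_categories]
    # Compact separators match the hand-written strings in the shell example
    compact = (",", ":")
    return json.dumps(logs, separators=compact), json.dumps(metrics, separators=compact)


logs_arg, metrics_arg = diagnostic_settings_args(["AppServiceHTTPLogs", "AppServiceConsoleLogs"])
```

Keeping the category list in code or config makes it easier to apply the same diagnostic settings consistently across resources.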

KQL is the query language for Log Analytics. It reads somewhat like SQL but is built around a pipeline for log data: each | passes the rows produced by one step to the next operator.

// Find errors in the last hour
AppServiceHTTPLogs
| where TimeGenerated > ago(1h)
| where ScStatus >= 500
| project TimeGenerated, CsMethod, CsUriStem, ScStatus, TimeTaken
| sort by TimeGenerated desc
// Count errors per 5 minutes
AppServiceHTTPLogs
| where TimeGenerated > ago(24h)
| where ScStatus >= 500
| summarize ErrorCount = count() by bin(TimeGenerated, 5m)
| render timechart
// Top 10 slowest requests
AppServiceHTTPLogs
| where TimeGenerated > ago(1h)
| top 10 by TimeTaken desc
| project TimeGenerated, CsMethod, CsUriStem, TimeTaken, ScStatus
// VMs whose 5-minute average CPU is above 80% (last hour)
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize AvgCPU = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| where AvgCPU > 80
// Kubernetes pod restarts (last 24 hours)
KubePodInventory
| where TimeGenerated > ago(24h)
| where PodRestartCount > 0
| summarize Restarts = max(PodRestartCount) by PodName = Name, Namespace
| sort by Restarts desc
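
The pattern summarize ... by bin(TimeGenerated, 5m) groups rows into fixed time buckets before counting. Below is a rough Python equivalent of the errors-per-5-minutes query. It assumes rows are dicts keyed by the AppServiceHTTPLogs column names and uses made-up sample data. The helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Like KQL bin(): round a timestamp down to a multiple of `size` (epoch-aligned here)."""
    step = size.total_seconds()
    return datetime.fromtimestamp((ts.timestamp() // step) * step, tz=timezone.utc)


def errors_per_bin(rows, size=timedelta(minutes=5)):
    """where ScStatus >= 500 | summarize count() by bin(TimeGenerated, size)"""
    counts = Counter(bin_time(r["TimeGenerated"], size) for r in rows if r["ScStatus"] >= 500)
    return dict(sorted(counts.items()))


t = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"TimeGenerated": t + timedelta(minutes=1), "ScStatus": 500},
    {"TimeGenerated": t + timedelta(minutes=4), "ScStatus": 503},
    {"TimeGenerated": t + timedelta(minutes=6), "ScStatus": 200},
    {"TimeGenerated": t + timedelta(minutes=7), "ScStatus": 502},
]
per_bin = errors_per_bin(rows)
```

Like summarize, this sketch only returns buckets that contain at least one matching row, so empty 5-minute intervals are absent. In KQL, make-series is the operator that fills those gaps.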

| Table | Source | Contains |
|---|---|---|
| AppServiceHTTPLogs | App Service | HTTP request logs |
| AppServiceConsoleLogs | App Service | stdout/stderr |
| AzureActivity | All resources | Control-plane operations (create, delete, modify) |
| Perf | VMs (agent) | Performance counters (CPU, memory, disk) |
| Syslog | VMs (agent) | Linux syslog messages |
| ContainerLog | AKS | Container stdout/stderr |
| KubePodInventory | AKS | Pod metadata (status, restarts, images) |
| AzureDiagnostics | Various | Diagnostic logs from many services |
| AppTraces | Application Insights | App traces and custom logs |
| AppRequests | Application Insights | HTTP requests to your app |
| AppExceptions | Application Insights | Unhandled exceptions |

Application Insights is an APM (Application Performance Monitoring) service — it instruments your application code to collect requests, dependencies, exceptions, and traces.

| Signal | What It Tracks |
|---|---|
| Requests | Incoming HTTP requests (URL, status, duration) |
| Dependencies | Outgoing calls (database, HTTP, Redis, queues) |
| Exceptions | Unhandled exceptions with stack traces |
| Traces | Custom log messages from your code |
| Page views | Browser-side telemetry (load time, client errors) |
| Availability | Synthetic ping tests from multiple locations |
| Performance | Response times, failure rates, throughput |

For many platforms, Application Insights can instrument your app with minimal code changes:

Python (Django/Flask/FastAPI):

from azure.monitor.opentelemetry import configure_azure_monitor
# Call once at startup, before your Django/Flask/FastAPI app is created
configure_azure_monitor(
    connection_string="InstrumentationKey=<your-key>;IngestionEndpoint=https://..."
)
# Requests, dependencies, and exceptions from supported libraries are now collected automatically

Node.js:

const { useAzureMonitor } = require("@azure/monitor-opentelemetry");
useAzureMonitor({
  azureMonitorExporterOptions: { connectionString: process.env.APPLICATIONINSIGHTS_CONNECTION_STRING },
});

.NET:

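// In Program.cs (classic SDK: Microsoft.ApplicationInsights.AspNetCore package;
// the connection string is read from configuration)
builder.Services.AddApplicationInsightsTelemetry();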

The Application Map in Application Insights visualizes your application topology. It shows services, their dependencies, and the health and latency of each connection:

Web App (98.5% success, 45ms avg)
├──► SQL Database (99.9%, 12ms)
├──► Redis Cache (99.99%, 2ms)
├──► External API (95%, 200ms) ← potential issue
└──► Blob Storage (99.99%, 5ms)
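
The figures on each connection are essentially success rates and mean durations over the dependency calls to each target. The hypothetical sketch below rolls up those numbers. It assumes each call record has target, success and duration_ms fields and treats anything below 99% success as worth a look. The field names and the threshold are illustrative, not the Application Insights schema.

```python
from collections import defaultdict


def dependency_health(calls, min_success=0.99):
    """Per-target success rate and average duration, flagging weak targets (hypothetical)."""
    stats = defaultdict(lambda: {"n": 0, "ok": 0, "ms": 0.0})
    for call in calls:
        s = stats[call["target"]]
        s["n"] += 1
        s["ok"] += call["success"]  # True counts as 1, False as 0
        s["ms"] += call["duration_ms"]
    report = {}
    for target, s in stats.items():
        rate = s["ok"] / s["n"]
        report[target] = {
            "success_rate": rate,
            "avg_ms": s["ms"] / s["n"],
            "flagged": rate < min_success,  # like "External API (95%)" in the map above
        }
    return report


calls = [
    {"target": "SQL Database", "success": True, "duration_ms": 10},
    {"target": "SQL Database", "success": True, "duration_ms": 14},
    {"target": "External API", "success": True, "duration_ms": 180},
    {"target": "External API", "success": False, "duration_ms": 220},
]
health = dependency_health(calls)
```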

Live Metrics gives a real-time view of requests, failures, and dependencies, which is useful during deployments and incident response.

Smart detection in Application Insights automatically flags anomalies such as:

  • Sudden spike in failure rate
  • Abnormal rise in response time
  • Memory leak patterns
  • Unusual exception volumes
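
The models behind smart detection are not described here. The following is only a simplified stand-in (a z-score test, not the service's algorithm) that illustrates comparing current behavior with a recent baseline. It flags a failure rate more than k standard deviations above the baseline mean. It also requires a minimum absolute jump, so a nearly flat baseline does not trigger on noise. The function name and default thresholds are assumptions.

```python
from statistics import mean, stdev


def is_failure_spike(baseline_rates, current_rate, k=3.0, min_delta=0.01):
    """Simplified anomaly check on failure rates (hypothetical, not smart detection)."""
    if len(baseline_rates) < 2:
        return False  # not enough history to estimate variability
    mu, sigma = mean(baseline_rates), stdev(baseline_rates)
    # Require both a statistically unusual and a practically meaningful increase
    return current_rate - mu > max(k * sigma, min_delta)


baseline = [0.010, 0.012, 0.011, 0.009, 0.010]  # recent failure rates (1% is typical)
spike = is_failure_spike(baseline, 0.08)
normal = is_failure_spike(baseline, 0.012)
```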

Alerts fire when a condition is met and trigger actions (email, SMS, webhook, Logic App, Azure Function).

| Type | Triggers On |
|---|---|
| Metric alert | A metric crosses a threshold (e.g. CPU > 80%) |
| Log alert | A KQL query returns results (e.g. error count > 10 in 5 min) |
| Activity log alert | A control-plane event (e.g. VM deallocated, resource deleted) |
| Smart detection | Application Insights detects an anomaly |
az monitor metrics alert create \
--resource-group myapp-rg \
--name "HighCPU" \
--scopes /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
--condition "avg Percentage CPU > 80" \
--window-size 5m \
--evaluation-frequency 1m \
--action /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Insights/actionGroups/ops-team
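
The --window-size and --evaluation-frequency flags work together. Every minute, the rule looks back over the last five minutes, averages the CPU samples, and compares that average with 80. The simplified Python model below shows this sliding-window evaluation. It ignores alert state, missing-data handling and dimension splitting, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def evaluate(samples, now, window=timedelta(minutes=5), threshold=80.0):
    """One evaluation: is the average over (now - window, now] above the threshold?"""
    in_window = [v for ts, v in samples if now - window < ts <= now]
    if not in_window:
        return None  # no data in the window; this sketch neither fires nor resolves
    return sum(in_window) / len(in_window) > threshold


def run(samples, start, end, every=timedelta(minutes=1)):
    """Evaluate repeatedly at the given frequency, like --evaluation-frequency 1m."""
    t, results = start, []
    while t <= end:
        results.append((t, evaluate(samples, t)))
        t += every
    return results


t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
# One CPU sample per minute: 50% for five minutes, then 95%
samples = [(t0 + timedelta(minutes=i), 50.0 if i < 5 else 95.0) for i in range(10)]
results = dict(run(samples, t0 + timedelta(minutes=4), t0 + timedelta(minutes=9)))
```

With one sample per minute, a jump from 50% to 95% only pushes the 5-minute average above 80 once four of the five samples in the window are high. Short spikes are therefore smoothed out. A wider window reduces noise, and a narrower one reacts faster.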

An action group defines who gets notified and how:

| Action | What It Does |
|---|---|
| Email | Send to email address |
| SMS | Send text message |
| Webhook | POST to a URL (Slack, PagerDuty, custom) |
| Azure Function | Run a function for auto-remediation |
| Logic App | Trigger a workflow |
| ITSM | Create a ticket in ServiceNow, etc. |
az monitor action-group create \
--resource-group myapp-rg \
--name ops-team \
--short-name ops \
--action email ops <ops-email-address> \
--action webhook slack https://hooks.slack.com/services/...
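
Slack incoming webhooks expect a JSON body with a text field. Azure Monitor, however, posts its own alert payload. A small relay, such as an Azure Function from the action types above, is a common way to bridge the two. The hypothetical sketch below shows the formatting step. It assumes the action group has the common alert schema enabled and reads only a few fields from data.essentials.

```python
import json


def to_slack_message(alert: dict) -> dict:
    """Turn a common-alert-schema payload into a Slack webhook body (hypothetical relay)."""
    e = alert["data"]["essentials"]
    icon = ":red_circle:" if e.get("monitorCondition") == "Fired" else ":white_check_mark:"
    text = f'{icon} {e.get("severity", "?")} {e.get("alertRule", "unknown rule")}: {e.get("monitorCondition", "?")}'
    return {"text": text}


def slack_body(alert: dict) -> str:
    """Serialized JSON to POST to the Slack webhook URL."""
    return json.dumps(to_slack_message(alert))


payload = {
    "schemaId": "azureMonitorCommonAlertSchema",
    "data": {"essentials": {"alertRule": "HighCPU", "severity": "Sev2", "monitorCondition": "Fired"}},
}
body = slack_body(payload)
```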

| | Azure Monitor | Prometheus + Grafana |
|---|---|---|
| Setup | Built-in for Azure resources | Self-hosted (or a managed offering such as Azure Managed Grafana) |
| Query language | KQL | PromQL + LogQL |
| APM | Application Insights (built-in) | OpenTelemetry + Jaeger/Tempo |
| Dashboards | Workbooks, portal dashboards | Grafana (more flexible) |
| Cost | Per GB ingested + per metric/alert | Infrastructure cost only (when self-hosted) |
| Best for | Azure-native workloads | Multi-cloud, K8s-native |

Azure Monitor integrates with Grafana via the Azure Monitor data source — you can use Grafana dashboards with Azure Monitor and Log Analytics data.

  • Azure Monitor is the unified platform for metrics, logs, and alerts. Platform metrics are automatic; resource logs require diagnostic settings (the activity log is collected automatically).
  • Log Analytics stores logs and supports KQL queries — a powerful, SQL-like language for log analysis.
  • Application Insights instruments your app for APM: requests, dependencies, exceptions, and the application map.
  • Alerts trigger on metrics, logs, or activity events. Use action groups for notifications (email, Slack, auto-remediation).
  • Enable diagnostic settings on every production resource to send logs to Log Analytics.
  • Use Application Insights auto-instrumentation for easy APM with minimal code changes.