Monitoring

First Published Feb 17, 2026 By Atif Alam

Azure’s monitoring stack is built around Azure Monitor — a unified platform that collects metrics, logs, and traces from all Azure resources and applications.

Azure Monitor Overview

┌──────────────────────────────────────────────────────────────┐
│                      Azure Monitor                            │
│                                                               │
│  Data Sources          │  Data Stores       │  Consumers      │
│  ─────────             │  ───────────       │  ─────────      │
│  Azure Resources       │  Metrics DB        │  Dashboards     │
│  Applications          │  Log Analytics     │  Alerts         │
│  OS (agents)           │  (Kusto / KQL)     │  Workbooks      │
│  Custom sources        │                    │  Power BI       │
│                        │                    │  Grafana        │
└──────────────────────────────────────────────────────────────┘

Component	What It Does	AWS Equivalent
Metrics	Numeric time-series data (CPU, memory, requests)	CloudWatch Metrics
Log Analytics	Log collection and querying (KQL)	CloudWatch Logs Insights
Application Insights	Application performance monitoring (APM)	X-Ray + CloudWatch
Alerts	Notifications and automated actions	CloudWatch Alarms
Workbooks	Interactive reports and dashboards	CloudWatch Dashboards

Metrics

Every Azure resource automatically emits platform metrics — no agent needed.

Common Metrics

Resource	Metrics
VM	CPU %, available memory, disk IOPS, network in/out
App Service	HTTP requests, response time, CPU %, memory %
Azure SQL	DTU/CPU %, connections, deadlocks, storage
Cosmos DB	Request units consumed, latency, availability
Storage	Transactions, ingress/egress, latency
AKS	Node CPU/memory, pod count, kubelet health

Viewing Metrics

# List available metrics for a resource
az monitor metrics list-definitions \
  --resource /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Compute/virtualMachines/my-vm

# Query a metric
az monitor metrics list \
  --resource /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
  --metric "Percentage CPU" \
  --interval PT1H \
  --aggregation Average

In the portal, every resource has a Metrics blade where you can build charts, filter by dimensions, and pin to dashboards.
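The `--resource` values in the CLI commands above follow a fixed pattern, so it can help to build them in code rather than by hand. The following is a minimal sketch, not an official helper: `resource_id` and `metrics_list_command` are hypothetical names, and the command is only assembled, never executed.

```python
def resource_id(subscription: str, resource_group: str, provider: str,
                resource_type: str, name: str) -> str:
    """Build a resource ID in the same shape as the CLI examples (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription}"
        f"/resourceGroups/{resource_group}"
        f"/providers/{provider}/{resource_type}/{name}"
    )


def metrics_list_command(rid: str, metric: str, interval: str = "PT1H",
                         aggregation: str = "Average") -> list[str]:
    """Return the argument list for `az monitor metrics list` without running it."""
    return [
        "az", "monitor", "metrics", "list",
        "--resource", rid,
        "--metric", metric,
        "--interval", interval,
        "--aggregation", aggregation,
    ]


# "<sub>" is the same subscription placeholder used in the CLI examples.
vm_id = resource_id("<sub>", "myapp-rg", "Microsoft.Compute", "virtualMachines", "my-vm")
cmd = metrics_list_command(vm_id, "Percentage CPU")
print(" ".join(cmd))
```

Keeping the arguments as a list means they can be passed to `subprocess.run` later without shell-quoting the metric name.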

Custom Metrics

Send custom metrics from your application:

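# Option 1: OpenCensus exporter (an older SDK; newer projects generally use
# the OpenTelemetry-based package shown under Application Insights)
from opencensus.ext.azure import metrics_exporter

exporter = metrics_exporter.new_metrics_exporter(
    connection_string="InstrumentationKey=<your-key>"
)

# Option 2: the Application Insights SDK (also an older package)
from applicationinsights import TelemetryClient

tc = TelemetryClient("<instrumentation-key>")
tc.track_metric("OrdersProcessed", 42)  # record one value for the custom metric
tc.flush()  # send buffered telemetry now instead of waiting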

Log Analytics

Log Analytics is a centralized log store queried with KQL (Kusto Query Language). It is the usual destination for logs from Azure resources, agents, and applications.

Log Analytics Workspace

A workspace is the central container for logs. Resources send logs to it once you configure diagnostic settings or an agent.

# Create a workspace
az monitor log-analytics workspace create \
  --resource-group myapp-rg \
  --workspace-name myapp-logs \
  --location eastus

# Enable diagnostic logs for a resource (e.g. App Service)
az monitor diagnostic-settings create \
  --name send-to-log-analytics \
  --resource /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Web/sites/my-webapp \
  --workspace myapp-logs \
  --logs '[{"category":"AppServiceHTTPLogs","enabled":true},{"category":"AppServiceConsoleLogs","enabled":true}]' \
  --metrics '[{"category":"AllMetrics","enabled":true}]'
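The `--logs` and `--metrics` arguments above are JSON arrays, which are easy to mistype inside shell quotes. As a sketch, the same strings can be generated from a plain list of category names; `diagnostic_categories` is a hypothetical helper, not part of any Azure SDK.

```python
import json


def diagnostic_categories(categories: list[str], enabled: bool = True) -> str:
    """Serialize category names into the compact JSON array the CLI expects."""
    entries = [{"category": name, "enabled": enabled} for name in categories]
    # Compact separators reproduce the whitespace-free form used in the command.
    return json.dumps(entries, separators=(",", ":"))


logs_arg = diagnostic_categories(["AppServiceHTTPLogs", "AppServiceConsoleLogs"])
metrics_arg = diagnostic_categories(["AllMetrics"])

print(logs_arg)
print(metrics_arg)
```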

KQL (Kusto Query Language)

KQL is the query language for Log Analytics — similar to SQL but optimized for log data:

// Find errors in the last hour
AppServiceHTTPLogs
| where TimeGenerated > ago(1h)
| where ScStatus >= 500
| project TimeGenerated, CsMethod, CsUriStem, ScStatus, TimeTaken
| sort by TimeGenerated desc

// Count errors per 5 minutes
AppServiceHTTPLogs
| where TimeGenerated > ago(24h)
| where ScStatus >= 500
| summarize ErrorCount = count() by bin(TimeGenerated, 5m)
| render timechart

// Top 10 slowest requests
AppServiceHTTPLogs
| where TimeGenerated > ago(1h)
| top 10 by TimeTaken desc
| project TimeGenerated, CsMethod, CsUriStem, TimeTaken, ScStatus

// VM CPU above 80% (5-minute average per computer)
// Filter after summarizing so the average covers all samples in each bin
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| where AvgCPU > 80

// Kubernetes pod restarts
// The pod name is stored in the Name column; it is aliased to PodName here
KubePodInventory
| where PodRestartCount > 0
| summarize Restarts = max(PodRestartCount) by PodName = Name, Namespace
| sort by Restarts desc
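To make the `summarize ... by bin(TimeGenerated, 5m)` pattern in the queries above concrete, here is a small sketch that reproduces the same bucketing in plain Python. The rows are made-up sample data, and `bin_time` is a hypothetical stand-in for KQL's `bin()`, not a real client API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_time(ts: datetime, width: timedelta) -> datetime:
    """Round a timestamp down to a multiple of `width`, as bin() does in KQL."""
    whole_buckets = (ts - EPOCH) // width  # timedelta // timedelta gives an int
    return EPOCH + whole_buckets * width


# Illustrative (timestamp, HTTP status) rows; not real log data.
rows = [
    (datetime(2026, 2, 17, 10, 1, tzinfo=timezone.utc), 500),
    (datetime(2026, 2, 17, 10, 3, tzinfo=timezone.utc), 200),
    (datetime(2026, 2, 17, 10, 4, tzinfo=timezone.utc), 503),
    (datetime(2026, 2, 17, 10, 7, tzinfo=timezone.utc), 502),
]

# Equivalent of: where ScStatus >= 500 | summarize count() by bin(TimeGenerated, 5m)
error_counts = Counter(
    bin_time(ts, timedelta(minutes=5)) for ts, status in rows if status >= 500
)

for bucket, count in sorted(error_counts.items()):
    print(bucket.isoformat(), count)
```

The two errors at 10:01 and 10:04 land in the 10:00 bucket, and the one at 10:07 lands in the 10:05 bucket.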

Common Log Tables

Table	Source	Contains
`AppServiceHTTPLogs`	App Service	HTTP request logs
`AppServiceConsoleLogs`	App Service	stdout/stderr
`AzureActivity`	All resources	Control plane operations (create, delete, modify)
`Perf`	VMs (agent)	Performance counters (CPU, memory, disk)
`Syslog`	VMs (agent)	Linux syslog messages
`ContainerLog`	AKS	Container stdout/stderr
`KubePodInventory`	AKS	Pod metadata (status, restarts, images)
`AzureDiagnostics`	Various	Diagnostic logs from many services
`AppTraces`	Application Insights	App traces and custom logs
`AppRequests`	Application Insights	HTTP requests to your app
`AppExceptions`	Application Insights	Unhandled exceptions

Application Insights

Application Insights is an APM (Application Performance Monitoring) service — it instruments your application code to collect requests, dependencies, exceptions, and traces.

What It Collects

Signal	What It Tracks
Requests	Incoming HTTP requests (URL, status, duration)
Dependencies	Outgoing calls (database, HTTP, Redis, queues)
Exceptions	Unhandled exceptions with stack traces
Traces	Custom log messages from your code
Page views	Browser-side telemetry (load time, client errors)
Availability	Synthetic ping tests from multiple locations
Performance	Response times, failure rates, throughput

Auto-Instrumentation

For many platforms, Application Insights can instrument your app with minimal code changes:

Python (Django/Flask/FastAPI):

from azure.monitor.opentelemetry import configure_azure_monitor

configure_azure_monitor(
    connection_string="InstrumentationKey=<your-key>;IngestionEndpoint=https://..."
)
# That's it: requests, dependencies, and exceptions are auto-collected

Node.js:

const { useAzureMonitor } = require("@azure/monitor-opentelemetry");
useAzureMonitor({ connectionString: process.env.APPLICATIONINSIGHTS_CONNECTION_STRING });

.NET:

// In Program.cs
builder.Services.AddApplicationInsightsTelemetry();
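The connection strings used above are semicolon-separated `Key=Value` pairs, typically read from the `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable. A startup check along the following lines can catch a malformed value before it reaches the SDK. This is only a sketch: `parse_connection_string` and `load_from_env` are hypothetical helpers, the all-zero key and `example.invalid` endpoint are placeholders, and treating `InstrumentationKey` as required is an assumption.

```python
import os

ENV_VAR = "APPLICATIONINSIGHTS_CONNECTION_STRING"


def parse_connection_string(value: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict and check the expected key exists."""
    parts: dict[str, str] = {}
    for segment in value.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        key, sep, val = segment.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = val.strip()
    if "InstrumentationKey" not in parts:  # assumption: treated as required here
        raise ValueError("connection string has no InstrumentationKey")
    return parts


def load_from_env() -> dict[str, str]:
    """Read and validate the connection string from the environment."""
    value = os.environ.get(ENV_VAR)
    if value is None:
        raise RuntimeError(f"{ENV_VAR} is not set")
    return parse_connection_string(value)


# Placeholder values for illustration only.
example = (
    "InstrumentationKey=00000000-0000-0000-0000-000000000000;"
    "IngestionEndpoint=https://example.invalid/"
)
settings = parse_connection_string(example)
print(sorted(settings))
```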

Application Map

Application Insights generates a visual map of your application topology — showing services, dependencies, and the health/latency of each connection:

Web App (98.5% success, 45ms avg)
    ├──► SQL Database (99.9%, 12ms)
    ├──► Redis Cache (99.99%, 2ms)
    ├──► External API (95%, 200ms)  ← potential issue
    └──► Blob Storage (99.99%, 5ms)
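The kind of judgment the map supports ("External API looks unhealthy") can be sketched as a simple threshold check over the figures in the diagram. The thresholds below (99% success, 100 ms latency) are illustrative assumptions, not values Application Insights uses, and `Dependency` is a hypothetical type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    success_rate: float    # percent of successful calls
    avg_latency_ms: float  # average call duration in milliseconds


# Figures copied from the example map above.
dependencies = [
    Dependency("SQL Database", 99.9, 12),
    Dependency("Redis Cache", 99.99, 2),
    Dependency("External API", 95.0, 200),
    Dependency("Blob Storage", 99.99, 5),
]


def needs_attention(dep: Dependency, min_success: float = 99.0,
                    max_latency_ms: float = 100.0) -> bool:
    """Flag a dependency that is failing too often or responding too slowly."""
    return dep.success_rate < min_success or dep.avg_latency_ms > max_latency_ms


flagged = [dep.name for dep in dependencies if needs_attention(dep)]
print(flagged)  # ['External API']
```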

Live Metrics

Real-time view of requests, failures, and dependencies — useful during deployments and incident response.

Smart Detection

Application Insights automatically detects anomalies:

Sudden spike in failure rate
Abnormal rise in response time
Memory leak patterns
Unusual exception volumes

Alerts

Alerts fire when a condition is met and trigger actions (email, SMS, webhook, Logic App, Azure Function).

Alert Types

Type	Triggers On
Metric alert	A metric crosses a threshold (e.g. CPU > 80%)
Log alert	A KQL query returns results (e.g. error count > 10 in 5 min)
Activity log alert	A control-plane event (e.g. VM deallocated, resource deleted)
Smart detection	Application Insights detects an anomaly

Creating a Metric Alert

az monitor metrics alert create \
  --resource-group myapp-rg \
  --name "HighCPU" \
  --scopes /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
  --condition "avg Percentage CPU > 80" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Insights/actionGroups/ops-team
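The interaction between `--window-size` and `--evaluation-frequency` is easier to see with numbers. The sketch below is a simplified model, not Azure's actual evaluator: it assumes one CPU sample per minute (made-up values), re-evaluates every minute, and checks whether the average of the last five samples exceeds 80.

```python
from statistics import mean


def evaluate_alert(samples: list[float], window_size: int, threshold: float) -> list[bool]:
    """Return one result per evaluation: True when the window average exceeds the threshold.

    `samples` holds one value per evaluation interval, oldest first.
    """
    results = []
    for end in range(window_size, len(samples) + 1):
        window = samples[end - window_size:end]  # the most recent `window_size` samples
        results.append(mean(window) > threshold)
    return results


# Illustrative per-minute CPU percentages; not real measurements.
cpu = [60, 70, 85, 90, 95, 92, 88, 50, 40, 30]
fired = evaluate_alert(cpu, window_size=5, threshold=80)
print(fired)  # [False, True, True, True, False, False]
```

The first window averages exactly 80, so it does not fire because the condition is strictly greater than. The alert also stays active for a while after CPU drops, because the high samples remain inside the window.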

Action Groups

An action group defines who gets notified and how:

Action	What It Does
Email	Send to email address
SMS	Send text message
Webhook	POST to a URL (Slack, PagerDuty, custom)
Azure Function	Run a function for auto-remediation
Logic App	Trigger a workflow
ITSM	Create a ticket in ServiceNow, etc.

az monitor action-group create \
  --resource-group myapp-rg \
  --name ops-team \
  --short-name ops \
  --action email ops <ops-email-address> \
  --action webhook slack https://hooks.slack.com/services/...
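A webhook receiver, such as a small relay service or an Azure Function, often reshapes the alert payload before forwarding it to a chat tool. The sketch below assumes the action uses the common alert schema, where summary fields sit under `data.essentials`, and that the target accepts a Slack-style `{"text": ...}` body. `to_slack_message` is a hypothetical helper and the sample payload is invented for illustration.

```python
import json


def to_slack_message(alert: dict) -> dict:
    """Turn a common-alert-schema payload into a one-line chat message."""
    essentials = alert.get("data", {}).get("essentials", {})
    rule = essentials.get("alertRule", "unknown rule")
    severity = essentials.get("severity", "unknown severity")
    condition = essentials.get("monitorCondition", "unknown state")
    fired_at = essentials.get("firedDateTime", "unknown time")
    return {"text": f"[{severity}] {rule} is {condition} (since {fired_at})"}


# Invented sample payload containing only the fields read above.
sample_alert = {
    "schemaId": "azureMonitorCommonAlertSchema",
    "data": {
        "essentials": {
            "alertRule": "HighCPU",
            "severity": "Sev2",
            "monitorCondition": "Fired",
            "firedDateTime": "2026-02-17T10:05:00Z",
        }
    },
}

message = to_slack_message(sample_alert)
print(json.dumps(message))
```

Using `.get` with defaults keeps the relay from failing on payloads that omit a field, at the cost of a less informative message.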

Azure Monitor vs Prometheus/Grafana

Aspect	Azure Monitor	Prometheus + Grafana
Setup	Built-in for Azure resources	Self-hosted
Query language	KQL	PromQL + LogQL
APM	Application Insights (built-in)	OpenTelemetry + Jaeger/Tempo
Dashboards	Workbooks, portal dashboards	Grafana (more flexible)
Cost	Per GB ingested + per metric/alert	Infrastructure cost only
Best for	Azure-native workloads	Multi-cloud, K8s-native

Azure Monitor integrates with Grafana via the Azure Monitor data source — you can use Grafana dashboards with Azure Monitor and Log Analytics data.

Key Takeaways

Azure Monitor is the unified platform for metrics, logs, and alerts. Platform metrics are automatic; logs require diagnostic settings.
Log Analytics stores logs and supports KQL queries — a powerful, SQL-like language for log analysis.
Application Insights instruments your app for APM: requests, dependencies, exceptions, and the application map.
Alerts trigger on metrics, logs, or activity events. Use action groups for notifications (email, Slack, auto-remediation).
Enable diagnostic settings on every production resource to send logs to Log Analytics.
Use Application Insights auto-instrumentation for easy APM with minimal code changes.