Monitoring
Azure’s monitoring stack is built around Azure Monitor — a unified platform that collects metrics, logs, and traces from all Azure resources and applications.
Azure Monitor Overview

    ┌──────────────────────────────────────────────────────────────┐
    │                        Azure Monitor                         │
    │                                                              │
    │  Data Sources      │  Data Stores       │  Consumers         │
    │  ─────────         │  ───────────       │  ─────────         │
    │  Azure Resources   │  Metrics DB        │  Dashboards        │
    │  Applications      │  Log Analytics     │  Alerts            │
    │  OS (agents)       │  (Kusto / KQL)     │  Workbooks         │
    │  Custom sources    │                    │  Power BI          │
    │                    │                    │  Grafana           │
    └──────────────────────────────────────────────────────────────┘

| Component | What It Does | AWS Equivalent |
|---|---|---|
| Metrics | Numeric time-series data (CPU, memory, requests) | CloudWatch Metrics |
| Log Analytics | Log collection and querying (KQL) | CloudWatch Logs Insights |
| Application Insights | Application performance monitoring (APM) | X-Ray + CloudWatch |
| Alerts | Notifications and automated actions | CloudWatch Alarms |
| Workbooks | Interactive reports and dashboards | CloudWatch Dashboards |
Metrics
Every Azure resource automatically emits platform metrics; no agent is needed.
Common Metrics
| Resource | Metrics |
|---|---|
| VM | CPU %, available memory, disk IOPS, network in/out |
| App Service | HTTP requests, response time, CPU %, memory % |
| Azure SQL | DTU/CPU %, connections, deadlocks, storage |
| Cosmos DB | Request units consumed, latency, availability |
| Storage | Transactions, ingress/egress, latency |
| AKS | Node CPU/memory, pod count, kubelet health |
Viewing Metrics

    # List available metrics for a resource
    az monitor metrics list-definitions \
      --resource /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Compute/virtualMachines/my-vm

    # Query a metric
    az monitor metrics list \
      --resource /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
      --metric "Percentage CPU" \
      --interval PT1H \
      --aggregation Average

In the portal, every resource has a Metrics blade where you can build charts, filter by dimensions, and pin to dashboards.
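The resource IDs in the commands above follow a fixed pattern, so scripts that query many resources often build them programmatically. The sketch below is a minimal illustration under stated assumptions: `vm_resource_id` and `metrics_list_command` are hypothetical helper names, and the subscription ID is a placeholder. It only assembles the argument list for `az monitor metrics list`; it does not call Azure.

```python
import shlex


def vm_resource_id(subscription: str, resource_group: str, vm_name: str) -> str:
    """Build the ARM resource ID for a virtual machine (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/virtualMachines/{vm_name}"
    )


def metrics_list_command(resource_id: str, metric: str,
                         interval: str = "PT1H",
                         aggregation: str = "Average") -> list[str]:
    """Return the argv for `az monitor metrics list` (hypothetical helper)."""
    return [
        "az", "monitor", "metrics", "list",
        "--resource", resource_id,
        "--metric", metric,
        "--interval", interval,
        "--aggregation", aggregation,
    ]


# Placeholder subscription ID; substitute your own.
resource_id = vm_resource_id("00000000-0000-0000-0000-000000000000", "myapp-rg", "my-vm")
command = metrics_list_command(resource_id, "Percentage CPU")

# shlex.join quotes "Percentage CPU" so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list form to `subprocess.run(command, check=True)` avoids shell quoting issues with metric names that contain spaces.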
Custom Metrics
Send custom metrics from your application:

    from opencensus.ext.azure import metrics_exporter

    exporter = metrics_exporter.new_metrics_exporter(
        connection_string="InstrumentationKey=<your-key>"
    )

    # Or use the Application Insights SDK
    from applicationinsights import TelemetryClient

    tc = TelemetryClient("<instrumentation-key>")
    tc.track_metric("OrdersProcessed", 42)
    tc.flush()

Log Analytics
Log Analytics is a centralized log store queried with KQL (Kusto Query Language). It is the usual destination for Azure resource and application logs.
Log Analytics Workspace
A workspace is the central container for logs. Resources send logs to a workspace once diagnostic settings or agents are configured.

    # Create a workspace
    az monitor log-analytics workspace create \
      --resource-group myapp-rg \
      --workspace-name myapp-logs \
      --location eastus

    # Enable diagnostic logs for a resource (e.g. App Service)
    az monitor diagnostic-settings create \
      --name send-to-log-analytics \
      --resource /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Web/sites/my-webapp \
      --workspace myapp-logs \
      --logs '[{"category":"AppServiceHTTPLogs","enabled":true},{"category":"AppServiceConsoleLogs","enabled":true}]' \
      --metrics '[{"category":"AllMetrics","enabled":true}]'

KQL (Kusto Query Language)
KQL is the query language for Log Analytics. It is similar to SQL but optimized for log data:

    // Find errors in the last hour
    AppServiceHTTPLogs
    | where TimeGenerated > ago(1h)
    | where ScStatus >= 500
    | project TimeGenerated, CsMethod, CsUriStem, ScStatus, TimeTaken
    | sort by TimeGenerated desc

    // Count errors per 5 minutes
    AppServiceHTTPLogs
    | where TimeGenerated > ago(24h)
    | where ScStatus >= 500
    | summarize ErrorCount = count() by bin(TimeGenerated, 5m)
    | render timechart

    // Top 10 slowest requests
    AppServiceHTTPLogs
    | where TimeGenerated > ago(1h)
    | top 10 by TimeTaken desc
    | project TimeGenerated, CsMethod, CsUriStem, TimeTaken, ScStatus

    // VM CPU above 80%
    Perf
    | where ObjectName == "Processor" and CounterName == "% Processor Time"
    | where CounterValue > 80
    | summarize AvgCPU = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)

    // Kubernetes pod restarts
    KubePodInventory
    | where PodRestartCount > 0
    | summarize Restarts = max(PodRestartCount) by PodName, Namespace
    | sort by Restarts desc

Common Log Tables
| Table | Source | Contains |
|---|---|---|
| AppServiceHTTPLogs | App Service | HTTP request logs |
| AppServiceConsoleLogs | App Service | stdout/stderr |
| AzureActivity | All resources | Control plane operations (create, delete, modify) |
| Perf | VMs (agent) | Performance counters (CPU, memory, disk) |
| Syslog | VMs (agent) | Linux syslog messages |
| ContainerLog | AKS | Container stdout/stderr |
| KubePodInventory | AKS | Pod metadata (status, restarts, images) |
| AzureDiagnostics | Various | Diagnostic logs from many services |
| AppTraces | Application Insights | App traces and custom logs |
| AppRequests | Application Insights | HTTP requests to your app |
| AppExceptions | Application Insights | Unhandled exceptions |
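The `summarize ... by bin(TimeGenerated, 5m)` pattern used in the KQL examples groups rows from tables like these into fixed five-minute buckets. As a rough mental model only (not how Log Analytics executes queries), the same aggregation can be sketched in Python over a few made-up rows. The sample records and the `bin_timestamp` helper are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def bin_timestamp(ts: datetime, size: timedelta) -> datetime:
    """Round a timestamp down to the start of its bucket, like KQL bin()."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return ts - ((ts - epoch) % size)


# Made-up rows standing in for AppServiceHTTPLogs (TimeGenerated, ScStatus).
rows = [
    (datetime(2024, 1, 1, 12, 1, 10, tzinfo=timezone.utc), 500),
    (datetime(2024, 1, 1, 12, 3, 45, tzinfo=timezone.utc), 200),
    (datetime(2024, 1, 1, 12, 4, 59, tzinfo=timezone.utc), 503),
    (datetime(2024, 1, 1, 12, 7, 0, tzinfo=timezone.utc), 502),
]

# Equivalent in spirit to:
#   where ScStatus >= 500 | summarize ErrorCount = count() by bin(TimeGenerated, 5m)
error_counts = Counter(
    bin_timestamp(ts, timedelta(minutes=5))
    for ts, status in rows
    if status >= 500
)

for bucket, count in sorted(error_counts.items()):
    print(bucket.isoformat(), count)
```

The two errors before 12:05 land in the 12:00 bucket and the 12:07 error lands in the 12:05 bucket, which is the shape of data that `render timechart` plots.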
Application Insights
Application Insights is an APM (Application Performance Monitoring) service. It instruments your application code to collect requests, dependencies, exceptions, and traces.
What It Collects
| Signal | What It Tracks |
|---|---|
| Requests | Incoming HTTP requests (URL, status, duration) |
| Dependencies | Outgoing calls (database, HTTP, Redis, queues) |
| Exceptions | Unhandled exceptions with stack traces |
| Traces | Custom log messages from your code |
| Page views | Browser-side telemetry (load time, client errors) |
| Availability | Synthetic ping tests from multiple locations |
| Performance | Response times, failure rates, throughput |
Auto-Instrumentation
For many platforms, Application Insights can instrument your app with minimal code changes:
Python (Django/Flask/FastAPI):

    from azure.monitor.opentelemetry import configure_azure_monitor

    configure_azure_monitor(
        connection_string="InstrumentationKey=<your-key>;IngestionEndpoint=https://..."
    )
    # That's it: requests, dependencies, and exceptions are auto-collected

Node.js:

    const { useAzureMonitor } = require("@azure/monitor-opentelemetry");

    // The connection string is passed through azureMonitorExporterOptions
    useAzureMonitor({
      azureMonitorExporterOptions: {
        connectionString: process.env.APPLICATIONINSIGHTS_CONNECTION_STRING,
      },
    });

.NET:

    // In Program.cs
    builder.Services.AddApplicationInsightsTelemetry();

Application Map
Application Insights generates a visual map of your application topology, showing services, dependencies, and the health/latency of each connection:

    Web App (98.5% success, 45ms avg)
      ├──► SQL Database (99.9%, 12ms)
      ├──► Redis Cache (99.99%, 2ms)
      ├──► External API (95%, 200ms)  ← potential issue
      └──► Blob Storage (99.99%, 5ms)

Live Metrics
Real-time view of requests, failures, and dependencies, useful during deployments and incident response.
Smart Detection
Application Insights automatically detects anomalies:
- Sudden spike in failure rate
- Abnormal rise in response time
- Memory leak patterns
- Unusual exception volumes
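Smart Detection's models are internal to Application Insights, but the idea behind a "sudden spike in failure rate" can be illustrated with a simple baseline comparison. The following is a toy sketch, not the algorithm Azure uses. The `is_failure_spike` function, the three-standard-deviation rule, and the sample rates are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def is_failure_spike(history: list[float], current: float, sigmas: float = 3.0) -> bool:
    """Flag `current` if it exceeds the historical mean by more than
    `sigmas` population standard deviations (toy heuristic)."""
    baseline = mean(history)
    spread = pstdev(history)
    return current > baseline + sigmas * spread


# Hypothetical hourly failure rates (fraction of failed requests).
history = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013]

print(is_failure_spike(history, 0.012))  # close to the historical baseline
print(is_failure_spike(history, 0.080))  # far above the historical baseline
```

A real detector would also need to account for traffic volume and daily or weekly patterns, which this sketch ignores.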
Alerts
Alerts fire when a condition is met and trigger actions (email, SMS, webhook, Logic App, Azure Function).
Alert Types
| Type | Triggers On |
|---|---|
| Metric alert | A metric crosses a threshold (e.g. CPU > 80%) |
| Log alert | A KQL query returns results (e.g. error count > 10 in 5 min) |
| Activity log alert | A control-plane event (e.g. VM deallocated, resource deleted) |
| Smart detection | Application Insights detects an anomaly |
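A static-threshold metric alert such as "average CPU above 80% over a 5-minute window, evaluated every minute" can be pictured as a sliding-window check. The sketch below is a simplified model of that idea, not Azure Monitor's implementation. The one-sample-per-minute data and the `evaluate_alert` function are assumptions made for illustration.

```python
from statistics import mean


def evaluate_alert(samples: list[float], window: int, threshold: float) -> list[bool]:
    """For each evaluation point, report whether the average of the
    trailing `window` samples exceeds `threshold`.

    Assumes one sample per minute, so window=5 models a 5-minute window
    evaluated every minute. Points without a full window are skipped.
    """
    results = []
    for end in range(window, len(samples) + 1):
        results.append(mean(samples[end - window:end]) > threshold)
    return results


# Hypothetical per-minute CPU percentages: a brief spike, then sustained load.
cpu = [40, 95, 42, 41, 43, 85, 90, 88, 92, 91]

states = evaluate_alert(cpu, window=5, threshold=80)
print(states)
```

In this data the single 95% reading never fires the alert because it is averaged with quieter minutes; only the sustained load at the end does. A longer window reduces noise at the cost of slower detection.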
Creating a Metric Alert

    az monitor metrics alert create \
      --resource-group myapp-rg \
      --name "HighCPU" \
      --scopes /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Compute/virtualMachines/my-vm \
      --condition "avg Percentage CPU > 80" \
      --window-size 5m \
      --evaluation-frequency 1m \
      --action /subscriptions/<sub>/resourceGroups/myapp-rg/providers/Microsoft.Insights/actionGroups/ops-team

Action Groups
An action group defines who gets notified and how:
| Action | What It Does |
|---|---|
| Email | Send to email address |
| SMS | Send text message |
| Webhook | POST to a URL (Slack, PagerDuty, custom) |
| Azure Function | Run a function for auto-remediation |
| Logic App | Trigger a workflow |
| ITSM | Create a ticket in ServiceNow, etc. |

    az monitor action-group create \
      --resource-group myapp-rg \
      --name ops-team \
      --short-name ops \
      --action webhook slack https://hooks.slack.com/services/...

Azure Monitor vs Prometheus/Grafana
| | Azure Monitor | Prometheus + Grafana |
|---|---|---|
| Setup | Built-in for Azure resources | Self-hosted |
| Query language | KQL | PromQL + LogQL |
| APM | Application Insights (built-in) | OpenTelemetry + Jaeger/Tempo |
| Dashboards | Workbooks, portal dashboards | Grafana (more flexible) |
| Cost | Per GB ingested + per metric/alert | Infrastructure cost only |
| Best for | Azure-native workloads | Multi-cloud, K8s-native |
Azure Monitor integrates with Grafana via the Azure Monitor data source — you can use Grafana dashboards with Azure Monitor and Log Analytics data.
Key Takeaways
- Azure Monitor is the unified platform for metrics, logs, and alerts. Platform metrics are automatic; logs require diagnostic settings.
- Log Analytics stores logs and supports KQL queries — a powerful, SQL-like language for log analysis.
- Application Insights instruments your app for APM: requests, dependencies, exceptions, and the application map.
- Alerts trigger on metrics, logs, or activity events. Use action groups for notifications (email, Slack, auto-remediation).
- Enable diagnostic settings on every production resource to send logs to Log Analytics.
- Use Application Insights auto-instrumentation for easy APM with minimal code changes.