Records and iteration

First Published By Atif Alam

This page builds on Data structures — especially list and dict — with patterns you use when each row is a small record (a dict) and you hold many rows in a list.

Examples use a fictional service inventory so the shapes match ops-style scripts. Everything here is stdlib only.

services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2},
]

# Create: append a dict to the list
new_service = {"id": 4, "name": "geocoder", "status": "healthy", "replicas": 4}
services.append(new_service)
print("After create:", [service["name"] for service in services])

# Update: mutate the dict in place (same object the list holds)
def find_by_id(data, service_id):
    return next((service for service in data if service["id"] == service_id), None)

def update_status(data, service_id, new_status):
    # Returns True if a row with this id was found and updated, else False.
    row = find_by_id(data, service_id)
    if row is not None:
        row["status"] = new_status
        return True
    return False

update_status(services, 2, "healthy")
print("After update:", find_by_id(services, 2))

# Delete: remove rows by keeping only those that do not match
# If you ran "Create" above, id=4 exists and this removes it.
services = [service for service in services if service["id"] != 4]
print("After delete:", [service["name"] for service in services])
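One thing worth noting about the delete step: the comprehension builds a new list and rebinds the name, so any other variable that still points at the original list keeps the deleted row. A minimal sketch of the difference, assuming a second reference called `alias` (hypothetical, not part of the example above). Slice assignment replaces the contents of the existing list instead.

```python
rows = [
    {"id": 1, "name": "auth-service"},
    {"id": 4, "name": "geocoder"},
]
alias = rows  # hypothetical second reference to the same list object

# Rebinding: `rows` now names a new list; `alias` still sees the old one.
rows = [row for row in rows if row["id"] != 4]
print(len(rows), len(alias))  # 1 2

# In place: slice assignment replaces the contents of the existing list,
# so every reference to that list sees the deletion.
alias[:] = [row for row in alias if row["id"] != 4]
print(len(alias))  # 1
```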

services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2},
]

# Search: first dict with this id, or None
def find_by_id(data, service_id):
    return next((service for service in data if service["id"] == service_id), None)

print("By id:", find_by_id(services, 2))

# Filter: names of every row that matches a condition
degraded = [service["name"] for service in services if service["status"] == "degraded"]
print("Degraded:", degraded)

# Sort services by replicas.
# key=... picks replicas; reverse=True makes highest come first.
sorted_services = sorted(services, key=lambda service: service["replicas"], reverse=True)
print("By replicas:", [(service["name"], service["replicas"]) for service in sorted_services])

# Aggregate: total replicas across all rows
print("Total replicas:", sum(service["replicas"] for service in services))

find_by_id is a small reusable pattern: scan until match, return one dict or None. It is the same idea as “first row where …” in many APIs.
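The same scan-until-match idea extends to any condition, not only an id. A minimal sketch, assuming a hypothetical helper named `find_first` that takes a predicate function; `next(..., None)` stops at the first match and returns None when nothing matches.

```python
def find_first(data, predicate):
    """Return the first row for which predicate(row) is true, or None."""
    return next((row for row in data if predicate(row)), None)


services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2},
]

first_degraded = find_first(services, lambda row: row["status"] == "degraded")
missing = find_first(services, lambda row: row["replicas"] > 10)

print(first_degraded["name"])  # maps-tile-server
print(missing)                 # None
```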

Grouping: itertools.groupby and a dict of lists

itertools.groupby only groups consecutive rows that share the same key. Sort by that key first, or you get repeated “groups” for the same key.

services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3, "region": "us-west"},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5, "region": "us-east"},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2, "region": "us-west"},
    {"id": 4, "name": "geocoder", "status": "down", "replicas": 0, "region": "eu-west"},
    {"id": 5, "name": "search-index", "status": "healthy", "replicas": 4, "region": "us-east"},
    {"id": 6, "name": "cdn-edge", "status": "degraded", "replicas": 3, "region": "eu-west"},
]

group_by(data, key) is a reusable helper that builds a dict of lists, where each dict key is a field value (for example, a region).
Unlike itertools.groupby, this helper does not require sorting first.

def group_by(data, key):
    result = {}
    for item in data:
        # Read the value for the supplied key (e.g., "region") from this item.
        group_key = item[key]
        # Initialize this group's list if missing, then append this item.
        result.setdefault(group_key, []).append(item)
    return result

# Group by region and print service count and total replicas per region.
for region, items in group_by(services, "region").items():
    total = sum(s["replicas"] for s in items)
    print(f"{region}: {len(items)} services, {total} replicas")
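A variant of the same helper, sketched with `collections.defaultdict` and a key function instead of a field name. The name `group_by_fn` is hypothetical, and accepting a callable is an assumption made here so that rows can be grouped by a derived value such as the region prefix.

```python
from collections import defaultdict


def group_by_fn(data, key_fn):
    """Group rows into a dict of lists keyed by key_fn(row)."""
    result = defaultdict(list)
    for item in data:
        result[key_fn(item)].append(item)
    return dict(result)  # plain dict, so missing keys raise KeyError again


rows = [
    {"name": "auth-service", "region": "us-west"},
    {"name": "maps-tile-server", "region": "us-east"},
    {"name": "geocoder", "region": "eu-west"},
]

# Group by the part of the region before the first hyphen ("us", "eu").
by_prefix = group_by_fn(rows, lambda row: row["region"].split("-")[0])
print({prefix: [r["name"] for r in items] for prefix, items in by_prefix.items()})
# {'us': ['auth-service', 'maps-tile-server'], 'eu': ['geocoder']}
```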

itemgetter("status") is a short way to say “use each row’s status value” (same idea as lambda row: row["status"]).
groupby(...) groups consecutive rows with the same key, so we sort by status first and then group by that same key.

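This status example uses groupby on purpose to show the stdlib alternative; you could also group by status with the same dict-of-lists helper used for region. groupby is most useful for stream-like processing: when rows are already ordered by the key, you can process one group at a time instead of building all groups in memory first. Note that sorted(...) itself builds a full list, so the memory benefit applies mainly when the input arrives already sorted.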

from itertools import groupby
from operator import itemgetter

sorted_by_status = sorted(services, key=itemgetter("status"))
for status, group in groupby(sorted_by_status, key=itemgetter("status")):
    # group is a one-pass iterator; materialize it before reusing it.
    members = list(group)
    print(status, [s["name"] for s in members])
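To see why the sort matters, here is a small sketch that runs groupby on unsorted rows and then on sorted ones. The data is a shortened, illustrative version of the inventory above.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"name": "auth-service", "status": "healthy"},
    {"name": "maps-tile-server", "status": "degraded"},
    {"name": "routing-engine", "status": "healthy"},
]

# Unsorted: "healthy" appears twice because its rows are not adjacent.
unsorted_keys = [status for status, _ in groupby(rows, key=itemgetter("status"))]
print(unsorted_keys)  # ['healthy', 'degraded', 'healthy']

# Sorted by the same key first: one group per status.
sorted_rows = sorted(rows, key=itemgetter("status"))
sorted_keys = [status for status, _ in groupby(sorted_rows, key=itemgetter("status"))]
print(sorted_keys)  # ['degraded', 'healthy']
```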

The group_by helper (dict of lists) does not require sorting and is easy to reuse. itemgetter("field") is a fast, readable key function for sorted and groupby.

filter and map return iterators. In day-to-day Python, list comprehensions and sum / max / min are often clearer; still, recognizing filter / map / reduce helps when reading older code or other languages.

from functools import reduce

services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3, "region": "us-west"},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5, "region": "us-east"},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2, "region": "us-west"},
    {"id": 4, "name": "geocoder", "status": "down", "replicas": 0, "region": "eu-west"},
    {"id": 5, "name": "search-index", "status": "healthy", "replicas": 4, "region": "us-east"},
]

# filter: keep rows that match a predicate
healthy = list(filter(lambda s: s["status"] == "healthy", services))
healthy_lc = [s for s in services if s["status"] == "healthy"]
# Same members as filter() here; list comp is usually preferred for simple filters.
assert [s["id"] for s in healthy] == [s["id"] for s in healthy_lc]

us_healthy = list(
    filter(
        lambda s: s["status"] == "healthy" and s["region"].startswith("us"),
        services,
    )
)
print("US healthy:", [s["name"] for s in us_healthy])

# map: transform every row
names = list(map(lambda s: s["name"], services))
print("All names:", names)

def enrich_service(s):
    # Return a new dict; the original row is left unchanged.
    return {**s, "is_critical": s["replicas"] >= 4}

enriched = list(map(enrich_service, services))
print("Enriched sample:", enriched[1])

# reduce: fold all rows into a single value
total_replicas = reduce(lambda acc, s: acc + s["replicas"], services, 0)
print("Total replicas (reduce):", total_replicas)

busiest = reduce(lambda a, b: a if a["replicas"] > b["replicas"] else b, services)
print("Busiest:", busiest["name"], busiest["replicas"])

id_to_name = reduce(lambda acc, s: {**acc, s["id"]: s["name"]}, services, {})
print("ID lookup:", id_to_name)

# A filtered aggregate without reduce: sum over a generator expression.
# Uses the same startswith("us") test as the filter example above.
total_v2 = sum(
    s["replicas"]
    for s in services
    if s["status"] == "healthy" and s["region"].startswith("us")
)
print("Healthy US replicas (sum):", total_v2)
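Because filter and map return iterators rather than lists, their result can be consumed only once. A small sketch of that behavior, using illustrative data:

```python
rows = [
    {"name": "auth-service", "status": "healthy"},
    {"name": "geocoder", "status": "down"},
]

healthy_iter = filter(lambda s: s["status"] == "healthy", rows)

first_pass = [s["name"] for s in healthy_iter]
second_pass = [s["name"] for s in healthy_iter]  # iterator is already exhausted

print(first_pass)   # ['auth-service']
print(second_pass)  # []
```

Wrapping the call in list(...), as the examples above do, avoids this when you need to iterate more than once.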

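Pragmatic rule: prefer comprehensions and sum(...) / max(..., key=...) when they read naturally. Use reduce when you fold into a non-trivial accumulator (for example building a lookup dict with {**acc, k: v}). Keep in mind that {**acc, k: v} copies the accumulator on every step, so for a plain lookup a dict comprehension is usually simpler.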
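As a rough illustration of that rule, the reduce calls above can often be written with built-ins or comprehensions. This is a sketch on a shortened copy of the data. Note that max returns the first maximal row on ties, whereas the reduce version keeps the last, so the two agree only when the maximum is unique.

```python
from functools import reduce
from operator import itemgetter

services = [
    {"id": 1, "name": "auth-service", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "replicas": 5},
    {"id": 3, "name": "routing-engine", "replicas": 2},
]

# Busiest service: reduce vs. max with a key function.
busiest_reduce = reduce(lambda a, b: a if a["replicas"] > b["replicas"] else b, services)
busiest_max = max(services, key=itemgetter("replicas"))
print(busiest_reduce["name"], busiest_max["name"])  # maps-tile-server maps-tile-server

# id -> name lookup: reduce vs. dict comprehension.
lookup_reduce = reduce(lambda acc, s: {**acc, s["id"]: s["name"]}, services, {})
lookup_comp = {s["id"]: s["name"] for s in services}
print(lookup_reduce == lookup_comp)  # True
```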

The guide Incident records analyzer walks through a small list of incident dicts: filter, enrich, group, aggregate, and a short pipeline — stdlib only, with a full solution you can run.

  • Aggregate: totals, averages, min/max, and counts.
  • Transform / enrich: derive fields like is_critical or normalized names.
  • Project: keep only selected fields for output.
  • Deduplicate: remove repeated rows (for example by id).
  • Join / merge: combine with another dataset by a shared key.
  • Partition: split into two sets by a condition.
  • Validate / clean: check required keys/types and fill defaults.
  • Index: build fast lookups (for example id -> record).
  • Top-k / sample: keep top N rows or random samples for inspection.
  • Export: write JSON/CSV for downstream tools.
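
A few of the operations listed above, sketched on a tiny illustrative dataset. The variable names and the duplicate row are assumptions made for the example.

```python
rows = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},  # duplicate id
]

# Deduplicate by id, keeping the first occurrence of each id.
seen = set()
unique = []
for row in rows:
    if row["id"] not in seen:
        seen.add(row["id"])
        unique.append(row)

# Project: keep only selected fields.
projected = [{"id": r["id"], "name": r["name"]} for r in unique]

# Partition: split into two lists by a condition.
healthy = [r for r in unique if r["status"] == "healthy"]
not_healthy = [r for r in unique if r["status"] != "healthy"]

# Index: build an id -> record lookup.
by_id = {r["id"]: r for r in unique}

# Top-k: the two rows with the most replicas.
top_two = sorted(unique, key=lambda r: r["replicas"], reverse=True)[:2]

print(len(unique), projected[0], by_id[2]["name"], top_two[0]["name"])
# 2 {'id': 1, 'name': 'auth-service'} maps-tile-server maps-tile-server
```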