Records and iteration

First published Mar 30, 2026 by Atif Alam

This page builds on Data structures — especially list and dict — with patterns you use when each row is a small record (a dict) and you hold many rows in a list.

Examples use a fictional service inventory so the shapes match ops-style scripts. Everything here is stdlib only.

List of Dicts: Create, Update, Delete

services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2},
]

Create: Append a Dict

# Create: append a dict to a list
new_service = {"id": 4, "name": "geocoder", "status": "healthy", "replicas": 4}
services.append(new_service)
print("After create:", [service["name"] for service in services])

Update: Mutate in Place

# Update: mutate the dict in place (same object the list holds)
def find_by_id(data, service_id):
    return next((service for service in data if service["id"] == service_id), None)

def update_status(data, service_id, new_status):
    row = find_by_id(data, service_id)
    if row:
        row["status"] = new_status
        return True
    return False

update_status(services, 2, "healthy")
print("After update:", find_by_id(services, 2))
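
Because a list stores references, the dict returned by find_by_id is the very object the list holds. The sketch below illustrates that aliasing, and how a shallow copy gives you an independent row to edit. The names row and draft are illustrative only.

```python
services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
]

def find_by_id(data, service_id):
    return next((service for service in data if service["id"] == service_id), None)

row = find_by_id(services, 2)
same_object = row is services[1]   # the list and `row` share one dict

# Editing a shallow copy leaves the row in the list untouched.
draft = dict(row)
draft["status"] = "healthy"

# Editing `row` itself changes what the list holds.
row["replicas"] = 6

print(same_object, services[1]["status"], services[1]["replicas"], draft["status"])
```

Note that dict(row) copies only the top level; nested lists or dicts inside a row would still be shared.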

Delete: Build a New List

# Delete: remove rows by keeping only those that do not match
# If you ran "Create" above, id=4 exists and this removes it.
services = [service for service in services if service["id"] != 4]
print("After delete:", [service["name"] for service in services])
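
The comprehension above rebinds the name services to a new list, so any other variable that still points at the old list keeps the deleted row. If that matters, slice assignment replaces the contents of the existing list instead. A small sketch, using a hypothetical second name alias:

```python
services = [
    {"id": 1, "name": "auth-service"},
    {"id": 4, "name": "geocoder"},
]
alias = services  # hypothetical second reference to the same list

# Rebinding: `services` now names a new list; `alias` still sees id=4.
services = [service for service in services if service["id"] != 4]
ids_after_rebind = [service["id"] for service in alias]

# Slice assignment: replace the contents of the list `alias` refers to.
alias[:] = [service for service in alias if service["id"] != 4]
ids_after_slice = [service["id"] for service in alias]

print(ids_after_rebind, ids_after_slice)
```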

# Fresh sample data for the search, filter, and sort examples below.
services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2},
]

Search: First Dict by Id

# Search: first dict with this id, or None
def find_by_id(data, service_id):
    return next((service for service in data if service["id"] == service_id), None)

print("By id:", find_by_id(services, 2))

Filter: Subset by Condition

# Filter: names of every service whose status is "degraded"
degraded = [service["name"] for service in services if service["status"] == "degraded"]
print("Degraded:", degraded)

Sort: Highest Replicas First

# Sort services by replicas.
# key=... picks replicas; reverse=True makes highest come first.
sorted_services = sorted(services, key=lambda service: service["replicas"], reverse=True)
print("By replicas:", [(service["name"], service["replicas"]) for service in sorted_services])
print("Total replicas:", sum(service["replicas"] for service in services))
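
When one field is not enough, a tuple key sorts by several fields at once. The sketch below orders rows by status alphabetically, then by replicas from highest to lowest. Negating the number is a common way to mix ascending and descending order, and it only works for numeric fields.

```python
services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2},
]

# Primary key: status (ascending). Secondary key: replicas (descending).
ordered = sorted(services, key=lambda service: (service["status"], -service["replicas"]))
ordered_names = [service["name"] for service in ordered]
print(ordered_names)
```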

find_by_id is a small reusable pattern: scan until match, return one dict or None. It is the same idea as “first row where …” in many APIs.
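
The same scan generalizes to any condition if you pass the test in as a function. This is only a sketch: find_first is a hypothetical helper name, not something this page or the stdlib defines.

```python
def find_first(data, predicate, default=None):
    """Return the first item for which predicate(item) is true, else default."""
    return next((item for item in data if predicate(item)), default)

services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2},
]

first_degraded = find_first(services, lambda s: s["status"] == "degraded")
missing = find_first(services, lambda s: s["replicas"] > 100)
print(first_degraded["name"], missing)
```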

Grouping: itertools.groupby and a dict of lists

itertools.groupby only groups consecutive rows that share the same key. Sort by that key first, or you get repeated “groups” for the same key.
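
A minimal illustration of that pitfall, using a plain list of status strings rather than full records:

```python
from itertools import groupby

statuses = ["healthy", "degraded", "healthy"]

# Unsorted: "healthy" shows up twice because its entries are not adjacent.
unsorted_keys = [key for key, _group in groupby(statuses)]

# Sorted first: each key appears exactly once.
sorted_keys = [key for key, _group in groupby(sorted(statuses))]

print(unsorted_keys)
print(sorted_keys)
```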

services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3, "region": "us-west"},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5, "region": "us-east"},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2, "region": "us-west"},
    {"id": 4, "name": "geocoder", "status": "down", "replicas": 0, "region": "eu-west"},
    {"id": 5, "name": "search-index", "status": "healthy", "replicas": 4, "region": "us-east"},
    {"id": 6, "name": "cdn-edge", "status": "degraded", "replicas": 3, "region": "eu-west"},
]

Group by Region (Dict of Lists Helper)

group_by(data, key) is a reusable helper that builds a dict of lists, where each dict key is a field value (for example, a region).
Unlike itertools.groupby, this helper does not require sorting first.

def group_by(data, key):
    result = {}
    for item in data:
        # Read the value for the supplied key (e.g., "region") from this item.
        group_key = item[key]
        # Initialize this group's list if missing, then append this item.
        result.setdefault(group_key, []).append(item)
    return result

# Group by region and print service count and total replicas per region.
for region, items in group_by(services, "region").items():
    total = sum(s["replicas"] for s in items)
    print(f"{region}: {len(items)} services, {total} replicas")
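
collections.defaultdict is a common alternative to setdefault for the same dict-of-lists shape. A sketch, using the hypothetical name group_by_dd to keep it apart from the helper above:

```python
from collections import defaultdict

def group_by_dd(data, key):
    groups = defaultdict(list)   # missing keys start as an empty list
    for item in data:
        groups[item[key]].append(item)
    return dict(groups)          # plain dict, so unknown keys raise KeyError again

services = [
    {"name": "auth-service", "region": "us-west", "replicas": 3},
    {"name": "maps-tile-server", "region": "us-east", "replicas": 5},
    {"name": "routing-engine", "region": "us-west", "replicas": 2},
]

by_region = group_by_dd(services, "region")
names_by_region = {
    region: [s["name"] for s in items] for region, items in by_region.items()
}
print(names_by_region)
```

Converting back to a plain dict before returning is a design choice: it stops later lookups of an unknown region from silently creating an empty group.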

Group by Status (itertools.groupby)

itemgetter("status") is a short way to say “use each row’s status value” (same idea as lambda row: row["status"]).
groupby(...) groups consecutive rows with the same key, so we sort by status first and then group by that same key.

This status example uses groupby on purpose to show the stdlib alternative; you could also group status with the same dict-of-lists helper used for region. This pattern is useful for stream-like processing: after sorting, you can process one group at a time instead of building all groups in memory first.

from itertools import groupby
from operator import itemgetter

sorted_by_status = sorted(services, key=itemgetter("status"))
for status, group in groupby(sorted_by_status, key=itemgetter("status")):
    members = list(group)
    print(status, [s["name"] for s in members])

The group_by helper (dict of lists) does not require sorting and is easy to reuse. itemgetter("field") is a fast, readable key function for sorted and groupby.
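
itemgetter also accepts several field names and then returns a tuple, which makes it usable as a composite key. The sketch below groups by (region, status) and keeps only a count per group, consuming each group as it goes by instead of storing its rows.

```python
from itertools import groupby
from operator import itemgetter

services = [
    {"name": "auth-service", "status": "healthy", "region": "us-west"},
    {"name": "maps-tile-server", "status": "degraded", "region": "us-east"},
    {"name": "routing-engine", "status": "healthy", "region": "us-west"},
    {"name": "search-index", "status": "healthy", "region": "us-east"},
]

region_status = itemgetter("region", "status")  # returns a (region, status) tuple

counts = {}
for key, group in groupby(sorted(services, key=region_status), key=region_status):
    counts[key] = sum(1 for _row in group)  # count rows without keeping them

print(counts)
```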

filter, map, and reduce

filter and map return iterators. In day-to-day Python, list comprehensions and sum / max / min are often clearer; still, recognizing filter / map / reduce helps when reading older code or other languages.
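
Returning an iterator has a practical consequence: the values are produced lazily and can be consumed only once. A short sketch with plain numbers:

```python
replicas = [3, 5, 2, 0, 4]

busy = filter(lambda n: n >= 3, replicas)  # lazy: nothing is computed yet
first_pass = list(busy)                    # consumes the iterator
second_pass = list(busy)                   # already exhausted, so empty

doubled = map(lambda n: n * 2, replicas)
first_doubled = next(doubled)              # pull a single value on demand

print(first_pass, second_pass, first_doubled)
```

Wrapping the call in list(...), as the examples on this page do, avoids the surprise when you need to read the result more than once.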

from functools import reduce

services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3, "region": "us-west"},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5, "region": "us-east"},
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2, "region": "us-west"},
    {"id": 4, "name": "geocoder", "status": "down", "replicas": 0, "region": "eu-west"},
    {"id": 5, "name": "search-index", "status": "healthy", "replicas": 4, "region": "us-east"},
]

healthy = list(filter(lambda s: s["status"] == "healthy", services))
healthy_lc = [s for s in services if s["status"] == "healthy"]
# Same members as filter() here; list comp is usually preferred for simple filters.
assert [s["id"] for s in healthy] == [s["id"] for s in healthy_lc]

us_healthy = list(
    filter(
        lambda s: s["status"] == "healthy" and s["region"].startswith("us"),
        services,
    )
)
print("US healthy:", [s["name"] for s in us_healthy])

names = list(map(lambda s: s["name"], services))
print("All names:", names)


def enrich_service(s):
    return {**s, "is_critical": s["replicas"] >= 4}


enriched = list(map(enrich_service, services))
print("Enriched sample:", enriched[1])

total_replicas = reduce(lambda acc, s: acc + s["replicas"], services, 0)
print("Total replicas (reduce):", total_replicas)

busiest = reduce(lambda a, b: a if a["replicas"] > b["replicas"] else b, services)
print("Busiest:", busiest["name"], busiest["replicas"])

id_to_name = reduce(lambda acc, s: {**acc, s["id"]: s["name"]}, services, {})
print("ID lookup:", id_to_name)

# Same region test as us_healthy above (startswith), so a region that merely
# contains "us" somewhere in its name is not counted.
total_v2 = sum(
    s["replicas"]
    for s in services
    if s["status"] == "healthy" and s["region"].startswith("us")
)
print("Healthy US replicas (sum):", total_v2)

Pragmatic rule: prefer comprehensions and sum(...) / max(..., key=...) when they read naturally. Use reduce when you fold into a non-trivial accumulator (for example building a lookup dict with {**acc, k: v}).
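
For comparison, the three reduce calls above each have a direct non-reduce spelling. This sketch shows one way to write them. Note that the dict comprehension builds the lookup in one pass, whereas {**acc, k: v} copies the accumulator on every step.

```python
from functools import reduce

services = [
    {"id": 1, "name": "auth-service", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "replicas": 5},
    {"id": 3, "name": "routing-engine", "replicas": 2},
]

total = sum(s["replicas"] for s in services)
busiest = max(services, key=lambda s: s["replicas"])
id_to_name = {s["id"]: s["name"] for s in services}

# Both spellings agree on this data.
assert total == reduce(lambda acc, s: acc + s["replicas"], services, 0)
assert id_to_name == reduce(lambda acc, s: {**acc, s["id"]: s["name"]}, services, {})

print(total, busiest["name"], id_to_name)
```

One difference to keep in mind: on ties, max returns the first row with the largest value, while the reduce version shown earlier keeps the later one because it uses a strict > comparison.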

Try it: incident records

The guide Incident records analyzer walks through a small list of incident dicts: filter, enrich, group, aggregate, and a short pipeline — stdlib only, with a full solution you can run.

Other Common Operations on Record Lists

Aggregate: totals, averages, min/max, and counts.
Transform / enrich: derive fields like is_critical or normalized names.
Project: keep only selected fields for output.
Deduplicate: remove repeated rows (for example by id).
Join / merge: combine with another dataset by a shared key.
Partition: split into two sets by a condition.
Validate / clean: check required keys/types and fill defaults.
Index: build fast lookups (for example id -> record).
Top-k / sample: keep top N rows or random samples for inspection.
Export: write JSON/CSV for downstream tools.
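
Several of the operations listed above fit in a line or two each. The sketch below shows one possible stdlib-only spelling for deduplicate, project, partition, index, join, and aggregate. The owners mapping and its team names are hypothetical sample data invented for the join step.

```python
services = [
    {"id": 1, "name": "auth-service", "status": "healthy", "replicas": 3},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},
    {"id": 2, "name": "maps-tile-server", "status": "degraded", "replicas": 5},  # repeated id
    {"id": 3, "name": "routing-engine", "status": "healthy", "replicas": 2},
]
owners = {1: "identity-team", 3: "navigation-team"}  # hypothetical second dataset

# Deduplicate: keep the first row seen for each id.
seen = set()
unique = []
for s in services:
    if s["id"] not in seen:
        seen.add(s["id"])
        unique.append(s)

# Project: keep only selected fields.
projected = [{"id": s["id"], "name": s["name"]} for s in unique]

# Partition: split into two lists by a condition.
healthy = [s for s in unique if s["status"] == "healthy"]
unhealthy = [s for s in unique if s["status"] != "healthy"]

# Index: id -> record for constant-time lookups.
by_id = {s["id"]: s for s in unique}

# Join: attach an owner from the second dataset, with a default when missing.
joined = [{**s, "owner": owners.get(s["id"], "unassigned")} for s in unique]

# Aggregate: average replicas across the unique rows.
average_replicas = sum(s["replicas"] for s in unique) / len(unique)

print([s["id"] for s in unique], by_id[3]["name"], joined[1]["owner"])
```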