Process Cloud Policies
This guide walks you through building a small Python program that traverses and analyzes cloud permissions policy documents (e.g. AWS IAM–style JSON). You implement one part at a time and run the script after every step to verify. Python 3.9+ is recommended (the code uses list[str] in dataclasses). Create the script in a folder of your choice and run it with python process_cloud_policies.py (or your filename).
Scenario and goal
You are given JSON-like policy documents with a Version and a Statement array. Each statement has Effect, Principal, Action, Resource, and optionally Condition. In real policies, Action / Principal / Resource can be a single string or a list; Statement is sometimes a single object instead of an array. Your goal: parse the document, normalize these shapes, traverse to get one flat record per (statement × action × resource), and support filtering and summarization, which is useful for auditing, diffing, or feeding other tools.
What you’ll build: Data models (dataclasses), a parser, a generator-based traverser, query helpers, tests, and a __main__ block that runs tests and prints a summary. You’ll add a minimal if __name__ == "__main__": in Step 1 and grow it each step; after every step, run the script to confirm nothing breaks.
Step 1 — Data model and a runnable script
Goal: Define typed structures for a policy document and make the script runnable so you can test after each step.
Create a new file (e.g. process_cloud_policies.py). Add the two dataclasses below. Dataclasses give you clear, typed structures that are easy to extend later.
For more on dataclasses, see Dataclasses. For what a policy and policy statements are in AWS, see Policies.
Then add a minimal entry point that builds a small PolicyDocument and prints it (or "Step 1 OK"). You’ll extend this block in every following step.
    from dataclasses import dataclass
    from typing import Any

    @dataclass
    class PolicyStatement:
        sid: str
        effect: str  # "Allow" or "Deny"
        principals: list[str]
        actions: list[str]
        resources: list[str]
        conditions: dict[str, Any]

    @dataclass
    class PolicyDocument:
        version: str
        statements: list[PolicyStatement]

    if __name__ == "__main__":
        doc = PolicyDocument(
            version="2022-01-01",
            statements=[
                PolicyStatement(
                    sid="Test",
                    effect="Allow",
                    principals=["*"],
                    actions=["s3:GetObject"],
                    resources=["*"],
                    conditions={},
                )
            ],
        )
        print("Step 1 OK:", doc.version, len(doc.statements))

Check: Run python process_cloud_policies.py and confirm you see something like Step 1 OK: 2022-01-01 1.
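While the model is still small, it can help to look at it as plain data. The sketch below is an optional aside and not part of the guide's script: it uses dataclasses.asdict from the standard library to turn a PolicyDocument into nested dicts and prints them as JSON. The dataclasses are repeated only so the block runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class PolicyStatement:
    sid: str
    effect: str  # "Allow" or "Deny"
    principals: list[str]
    actions: list[str]
    resources: list[str]
    conditions: dict[str, Any]


@dataclass
class PolicyDocument:
    version: str
    statements: list[PolicyStatement]


doc = PolicyDocument(
    version="2022-01-01",
    statements=[
        PolicyStatement("Test", "Allow", ["*"], ["s3:GetObject"], ["*"], {}),
    ],
)

# asdict() recurses into nested dataclasses, lists, and dicts.
as_plain = asdict(doc)
print(json.dumps(as_plain, indent=2))
```

Note that the keys in the output are the dataclass field names (sid, effect, ...), not the capitalized keys of the original policy JSON.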
Step 2 — Parser
Goal: Turn a raw dict (from JSON) into a PolicyDocument, handling real-world quirks.
What to implement:
- to_list(val): Normalize Principal/Action/Resource to a list of strings.
- _parse_statement(index, raw): Build one PolicyStatement from a raw dict.
- parse_policy_document(raw): Read Version and Statement(s), then parse each statement.
Minimal policy: Add the dict below at module level (after your dataclasses, before the parser). You’ll use it from __main__ in Steps 2–4. It has two statements (one Allow with lists, one Deny with scalars) so you can verify the parser and later steps without jumping ahead.
    MINIMAL_POLICY = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowS3Read",
                "Effect": "Allow",
                "Principal": ["arn:aws:iam::123:role/ReadRole"],
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": ["arn:aws:s3:::my-bucket/*", "arn:aws:s3:::my-bucket"],
            },
            {
                "Sid": "DenyDelete",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:DeleteObject",
                "Resource": "arn:aws:s3:::my-bucket/*",
                "Condition": {"StringEquals": {"aws:RequestedRegion": "us-east-1"}},
            },
        ],
    }

Parser code: Add these functions at module level, after MINIMAL_POLICY and before your if __name__ == "__main__": block.
    # Normalize to list: one string becomes [str], None becomes [].
    # Policy JSON often has Principal/Action/Resource as
    # either a string or a list.
    def to_list(val: Any) -> list[str]:
        if val is None:
            return []
        return [val] if isinstance(val, str) else list(val)

    # Build one PolicyStatement from a raw statement dict.
    # Uses to_list for Principal/Action/Resource.
    # If missing: Sid becomes "Statement-{index}",
    # Effect "Unknown", Condition {}.
    def _parse_statement(index: int, raw: dict) -> PolicyStatement:
        return PolicyStatement(
            sid=raw.get("Sid", f"Statement-{index}"),
            effect=raw.get("Effect", "Unknown"),
            principals=to_list(raw.get("Principal")),
            actions=to_list(raw.get("Action")),
            resources=to_list(raw.get("Resource")),
            conditions=raw.get("Condition", {}),
        )

    # Turn a raw policy dict (e.g. from JSON) into a PolicyDocument.
    # Version defaults to "unknown".
    # If Statement is a single object instead of a list,
    # we treat it as a one-element list.
    def parse_policy_document(raw: dict) -> PolicyDocument:
        version = raw.get("Version", "unknown")
        raw_statements = raw.get("Statement", [])
        if not isinstance(raw_statements, list):
            raw_statements = [raw_statements]
        statements = [_parse_statement(i, s) for i, s in enumerate(raw_statements)]
        return PolicyDocument(version=version, statements=statements)

Update __main__: Parse MINIMAL_POLICY and print the statement count (and optionally the second statement’s principals/actions to confirm scalars became lists).
    if __name__ == "__main__":
        doc = parse_policy_document(MINIMAL_POLICY)
        print("Statements:", len(doc.statements))
        if len(doc.statements) >= 2:
            s = doc.statements[1]
            print("Second statement principals:", s.principals, "actions:", s.actions)

Check: Run the script. You should see 2 statements and the second with principals: ['*'], actions: ['s3:DeleteObject'].
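The parser takes a dict, so policy text has to pass through json.loads first. As a rough sketch of the quirks described in this step, the block below loads a JSON string whose Statement is a single object and whose Action is a scalar, then applies the same normalization. RAW_JSON is made-up sample text, and to_list is repeated only so the block runs on its own.

```python
import json
from typing import Any

# Made-up policy text: Statement is one object, Action is a plain string.
RAW_JSON = """
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": ["arn:aws:s3:::my-bucket/*"]
  }
}
"""


def to_list(val: Any) -> list[str]:
    # Same normalization as in Step 2: None -> [], str -> [str], else list.
    if val is None:
        return []
    return [val] if isinstance(val, str) else list(val)


raw = json.loads(RAW_JSON)

# Wrap a single statement object in a list, as parse_policy_document does.
raw_statements = raw.get("Statement", [])
if not isinstance(raw_statements, list):
    raw_statements = [raw_statements]

first = raw_statements[0]
actions = to_list(first.get("Action"))
resources = to_list(first.get("Resource"))
conditions = first.get("Condition", {})
print(len(raw_statements), actions, resources, conditions)
```

In the full script you would pass raw straight to parse_policy_document instead of normalizing by hand.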
Step 3 — Traversal
Goal: Yield one flat record per (statement × action × resource) so downstream code can filter or analyze easily.
Implement expand_wildcards(actions) as a stub: return the list as-is, or ["*"] if the list is empty. (Later you could plug in a service action catalog to expand wildcards.) Then implement traverse_policy(doc) as a generator: for each statement, get the action list (via expand_wildcards), then for each action and each resource (or ["*"] if none), yield a dict with sid, effect, principals, action, resource, conditions. Each statement yields len(actions) × len(resources) records, so the output grows quickly for large policies, especially once wildcards are expanded against a catalog; because traverse_policy is a generator, callers can consume records one at a time instead of building the full list.
Update __main__: Parse MINIMAL_POLICY, call list(traverse_policy(doc)), and print the length. For the minimal policy you should get 5 records (2 actions × 2 resources + 1 × 1).
    def expand_wildcards(actions: list[str]) -> list[str]:
        return actions if actions else ["*"]

    def traverse_policy(doc: PolicyDocument):
        for stmt in doc.statements:
            actions = expand_wildcards(stmt.actions)
            for action in actions:
                for resource in stmt.resources or ["*"]:
                    yield {
                        "sid": stmt.sid,
                        "effect": stmt.effect,
                        "principals": stmt.principals,
                        "action": action,
                        "resource": resource,
                        "conditions": stmt.conditions,
                    }

Check: Run the script and confirm the printed count is 5.
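The count of 5 can be checked independently of the parser. The sketch below copies only the action and resource lists from MINIMAL_POLICY and pairs them with itertools.product, which is used here in place of the two nested loops and visits pairs in the same order. The statements list is a hand-copied stand-in, not something the script builds.

```python
from itertools import product

# (actions, resources) per statement, hand-copied from MINIMAL_POLICY.
statements = [
    (
        ["s3:GetObject", "s3:ListBucket"],
        ["arn:aws:s3:::my-bucket/*", "arn:aws:s3:::my-bucket"],
    ),
    (["s3:DeleteObject"], ["arn:aws:s3:::my-bucket/*"]),
]

# One (action, resource) pair per traversal record; empty lists fall back to ["*"].
pairs = [
    (action, resource)
    for actions, resources in statements
    for action, resource in product(actions or ["*"], resources or ["*"])
]

for action, resource in pairs:
    print(action, "->", resource)
print("records:", len(pairs))  # 2 * 2 + 1 * 1 = 5
```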
Step 4 — Query helpers
Goal: Filter traversal results and summarize the document.
Implement find_permissions(doc, *, principal=..., action_prefix=..., resource=..., effect=...): loop over traverse_policy(doc) and keep only records where (if provided) principal is in the record’s principals, action starts with action_prefix, the record’s resource equals resource (passing resource="*" matches every record), and effect matches (case-insensitive). All filters are ANDed. Return the list of matching dicts.
Implement summarize_policy(doc): collect all records from traverse_policy(doc) and return a dict with version, total_statements, total_permissions, allow_count, deny_count, unique_actions (sorted), unique_resources (sorted).
    from typing import Optional

    def find_permissions(
        doc: PolicyDocument,
        *,
        principal: Optional[str] = None,
        action_prefix: Optional[str] = None,
        resource: Optional[str] = None,
        effect: Optional[str] = None,
    ) -> list[dict]:
        results = []
        for record in traverse_policy(doc):
            if principal is not None and principal not in record["principals"]:
                continue
            if action_prefix is not None and not record["action"].startswith(action_prefix):
                continue
            if resource is not None and record["resource"] != resource and resource != "*":
                continue
            if effect is not None and record["effect"].lower() != effect.lower():
                continue
            results.append(record)
        return results

    def summarize_policy(doc: PolicyDocument) -> dict:
        records = list(traverse_policy(doc))
        return {
            "version": doc.version,
            "total_statements": len(doc.statements),
            "total_permissions": len(records),
            "allow_count": sum(1 for r in records if r["effect"] == "Allow"),
            "deny_count": sum(1 for r in records if r["effect"] == "Deny"),
            "unique_actions": sorted({r["action"] for r in records}),
            "unique_resources": sorted({r["resource"] for r in records}),
        }

Update __main__: Parse MINIMAL_POLICY, call find_permissions(doc, effect="Deny") and summarize_policy(doc), and print the length of the Deny list and a one-line summary (e.g. total_permissions).
Check: Run the script. find_permissions(doc, effect="Deny") should have length 1; the summary should show 5 total permissions, 4 allow, 1 deny.
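To see how the ANDed filters interact without running the parser, the sketch below applies the same four checks to two hand-written records. The keep helper and the RECORDS data are illustrative assumptions and not part of the guide's script; the conditions inside keep mirror the ones in find_permissions.

```python
from typing import Optional

# Hand-written records in the shape traverse_policy yields (illustrative data).
RECORDS = [
    {
        "sid": "AllowS3Read",
        "effect": "Allow",
        "principals": ["arn:aws:iam::123:role/ReadRole"],
        "action": "s3:GetObject",
        "resource": "arn:aws:s3:::my-bucket/*",
        "conditions": {},
    },
    {
        "sid": "DenyDelete",
        "effect": "Deny",
        "principals": ["*"],
        "action": "s3:DeleteObject",
        "resource": "arn:aws:s3:::my-bucket/*",
        "conditions": {},
    },
]


def keep(
    record: dict,
    *,
    principal: Optional[str] = None,
    action_prefix: Optional[str] = None,
    resource: Optional[str] = None,
    effect: Optional[str] = None,
) -> bool:
    """Hypothetical helper: True when the record passes every supplied filter."""
    if principal is not None and principal not in record["principals"]:
        return False
    if action_prefix is not None and not record["action"].startswith(action_prefix):
        return False
    if resource is not None and record["resource"] != resource and resource != "*":
        return False
    if effect is not None and record["effect"].lower() != effect.lower():
        return False
    return True


# Both filters must pass: s3: prefix AND Deny effect (effect is case-insensitive).
deny_s3 = [r for r in RECORDS if keep(r, action_prefix="s3:", effect="deny")]
# The only s3:Delete* record is a Deny, so asking for Allow leaves nothing.
allow_delete = [r for r in RECORDS if keep(r, action_prefix="s3:Delete", effect="Allow")]
# A filter value of "*" for resource lets every record through.
any_resource = [r for r in RECORDS if keep(r, resource="*")]
print(len(deny_s3), len(allow_delete), len(any_resource))
```

One consequence worth noticing: the principal filter is an exact membership test, so filtering by a role ARN does not return records whose principals list is ["*"].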
Step 5 — Sample policy and tests
Goal: Lock in behavior with a small test suite and cover edge cases.
Add _sample_policy() -> dict that returns the same structure as MINIMAL_POLICY (you can return MINIMAL_POLICY or a copy). This becomes the canonical sample for tests and the final demo. Add run_tests() that:
- Parses the sample policy and asserts 2 statements; asserts the second statement’s principals and actions are lists; asserts 5 traversal records.
- Asserts find_permissions(doc, effect="Deny") has length 1 and find_permissions(doc, action_prefix="s3:") has length 5.
- Parses {} and asserts version "unknown", 0 statements, and empty traversal.
- Parses a single statement without Sid and asserts the first statement’s sid is "Statement-0".
- Prints "All tests passed." on success.
Update __main__: Call run_tests(). Run the script and confirm you see “All tests passed.” with no assertion errors.
Step 6 — Complete the entry point
Goal: Wire the full behavior into the __main__ block you’ve been using since Step 1: one run runs tests, then prints the policy summary and all Allow permissions.
Add import json at the top of the file if you don’t have it yet. In the existing if __name__ == "__main__": block, keep the call to run_tests(). Then parse the sample with parse_policy_document(_sample_policy()), print summarize_policy(doc) as JSON (e.g. print(json.dumps(summarize_policy(doc), indent=2))), and print “All Allow Permissions” by iterating find_permissions(doc, effect="Allow") and printing each record’s sid, action, and resource.
Check: Run python process_cloud_policies.py once. You should see “All tests passed.”, then the summary JSON, then the list of Allow permissions.
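The printing half of the entry point might look roughly like the sketch below. To stay runnable on its own it uses stand-in values: summary has the shape summarize_policy returns for the sample policy, and allow_records has the fields find_permissions(doc, effect="Allow") would provide. The exact line format for each permission is an assumption; the guide only asks for sid, action, and resource.

```python
import json

# Stand-in for summarize_policy(doc) on the sample policy.
summary = {
    "version": "2012-10-17",
    "total_statements": 2,
    "total_permissions": 5,
    "allow_count": 4,
    "deny_count": 1,
    "unique_actions": ["s3:DeleteObject", "s3:GetObject", "s3:ListBucket"],
    "unique_resources": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
}

# Stand-in for find_permissions(doc, effect="Allow"), trimmed to the printed fields.
allow_records = [
    {"sid": "AllowS3Read", "action": "s3:GetObject", "resource": "arn:aws:s3:::my-bucket/*"},
    {"sid": "AllowS3Read", "action": "s3:GetObject", "resource": "arn:aws:s3:::my-bucket"},
    {"sid": "AllowS3Read", "action": "s3:ListBucket", "resource": "arn:aws:s3:::my-bucket/*"},
    {"sid": "AllowS3Read", "action": "s3:ListBucket", "resource": "arn:aws:s3:::my-bucket"},
]

# The summary holds only strings, ints, and lists, so json.dumps needs no custom encoder.
summary_json = json.dumps(summary, indent=2)
print(summary_json)

print("All Allow Permissions")
lines = [f"  {r['sid']}: {r['action']} on {r['resource']}" for r in allow_records]
print("\n".join(lines))
```

In the real __main__ block, replace the two stand-ins with the calls on the parsed sample and keep run_tests() as the first statement.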
Design decisions
- Dataclasses keep the data model clear and typed.
- Generator traversal keeps memory low and lets callers stop early.
- expand_wildcards is a stub so you can later plug in a service action catalog.
- Normalizing scalars to lists matches real policy documents that mix single values and arrays.
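As a rough idea of what plugging a catalog into expand_wildcards could look like, the sketch below matches each action pattern against a small list with fnmatch.fnmatchcase from the standard library. ACTION_CATALOG and the extra catalog parameter are hypothetical; a real catalog would have to come from service metadata you trust. The matching here is case-sensitive and also accepts fnmatch's bracket syntax, so treat it as an approximation of policy wildcard rules rather than an exact reimplementation.

```python
from fnmatch import fnmatchcase

# Hypothetical, tiny action catalog used only for illustration.
ACTION_CATALOG = [
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:ListBucket",
    "ec2:DescribeInstances",
]


def expand_wildcards(actions: list[str], catalog: list[str] = ACTION_CATALOG) -> list[str]:
    """Expand patterns such as "s3:Get*" against the catalog (sketch)."""
    if not actions:
        # Same fallback as the stub: no actions means "any action".
        return ["*"]
    expanded: list[str] = []
    for pattern in actions:
        matches = [name for name in catalog if fnmatchcase(name, pattern)]
        # Keep the pattern itself when nothing in the catalog matches it.
        for name in matches or [pattern]:
            if name not in expanded:
                expanded.append(name)
    return expanded


print(expand_wildcards(["s3:Get*"]))
print(expand_wildcards(["s3:*", "iam:PassRole"]))
```

Because traverse_policy calls expand_wildcards once per statement, swapping the stub for something like this changes the record count without touching the traversal code.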
Extension: Some policies use "Principal": {"AWS": "arn:..."}. To support that, normalize the principal dict to a list of ARNs inside _parse_statement.
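A minimal sketch of that extension follows, assuming the dict maps a principal type (such as "AWS" or "Service") to either one string or a list of strings. normalize_principal is a hypothetical helper name; in _parse_statement it would take the place of to_list for the Principal field only.

```python
from typing import Any


def normalize_principal(val: Any) -> list[str]:
    """Hypothetical helper: flatten a string, list, or dict Principal to a list."""
    if val is None:
        return []
    if isinstance(val, str):
        return [val]
    if isinstance(val, dict):
        flat: list[str] = []
        # Each value is either one identifier or a list of identifiers.
        for entry in val.values():
            flat.extend([entry] if isinstance(entry, str) else list(entry))
        return flat
    return list(val)


print(normalize_principal({"AWS": "arn:aws:iam::123:role/ReadRole"}))
print(normalize_principal({"AWS": ["arn:a", "arn:b"], "Service": "lambda.amazonaws.com"}))
print(normalize_principal("*"))
```

This flattening discards the principal type key, which is enough for the membership filter in find_permissions; if you need to tell account principals from service principals, store the type alongside each value instead.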
Once all steps are done, your module should match the structure above. You can add a “Full script reference” or paste the final file into a gist if you want a single copy to compare against.