Skip to content

Read from API and Process Data

First published by Atif Alam

This page shows a simple pattern for API-driven scripts: fetch JSON, normalize records, summarize, and print a clear report. You will build one piece at a time and run python read_from_api_process_data.py after each step.

Create a file named read_from_api_process_data.py. This walkthrough uses Python standard library modules only (urllib, json, collections, datetime), so no package install is needed.

You want to read recent commit data from the GitHub API endpoint:

  • GET /repos/{owner}/{repo}/commits

Then process the response into useful outputs:

  • Total commits fetched
  • Commits per author
  • Commits per day
  • A short latest-commit list

This example uses public repositories only and no token. Unauthenticated requests have lower rate limits.


Goal: Request commit data from GitHub and confirm you received a JSON list.

Use urllib.request to avoid extra dependencies. Add a User-Agent header because many APIs expect one.

import json
from urllib.request import Request, urlopen
OWNER = "python" # GitHub org or username that owns the target repository
REPO = "cpython" # repository name under OWNER
PER_PAGE = 10 # how many commits to ask for per request (the API may cap this)
def fetch_commits(owner: str, repo: str, per_page: int = 10):
    """Fetch recent commits for a public GitHub repository.

    Args:
        owner: GitHub org or username that owns the repository.
        repo: Repository name.
        per_page: Number of commits to request (the API may cap this).

    Returns:
        A ``(status, data)`` tuple: the HTTP status code and the parsed
        JSON payload (a list of commit objects on success).
    """
    endpoint = f"https://api.github.com/repos/{owner}/{repo}/commits?per_page={per_page}"
    # Many APIs reject requests that lack a User-Agent, so always send one.
    headers = {"User-Agent": "python-api-example"}
    request = Request(endpoint, headers=headers)
    # Nothing touches the network until urlopen runs; cap the wait at 10s.
    with urlopen(request, timeout=10) as response:
        status = response.status  # HTTP result code (200 = success)
        payload = json.load(response)  # read the body and parse JSON
    return status, payload
if __name__ == "__main__":
    # Step 1: fetch and confirm we received a JSON list of commits.
    http_status, commit_items = fetch_commits(OWNER, REPO, PER_PAGE)
    print("Step 1 OK — status:", http_status)
    print("Items fetched:", len(commit_items))

Check: Run the script. You should see a 200 status and a non-zero item count for active repositories.


Goal: Convert each raw API item into a small, consistent dict.

Add parse_commit and apply it to the fetched list. Keep only a few fields you actually need. Highlighted lines are the new parse_commit function.

import json
from urllib.request import Request, urlopen
OWNER = "python" # GitHub org or username that owns the target repository
REPO = "cpython" # repository name under OWNER
PER_PAGE = 10 # how many commits to ask for per request (the API may cap this)
def fetch_commits(owner: str, repo: str, per_page: int = 10):
    """Fetch recent commits for a public GitHub repository.

    Args:
        owner: GitHub org or username that owns the repository.
        repo: Repository name.
        per_page: Number of commits to request (the API may cap this).

    Returns:
        A ``(status, data)`` tuple: the HTTP status code and the parsed
        JSON payload (a list of commit objects on success).
    """
    endpoint = f"https://api.github.com/repos/{owner}/{repo}/commits?per_page={per_page}"
    # Many APIs reject requests that lack a User-Agent, so always send one.
    request = Request(endpoint, headers={"User-Agent": "python-api-example"})
    # Nothing touches the network until urlopen runs; cap the wait at 10s.
    with urlopen(request, timeout=10) as response:
        status = response.status  # HTTP result code (200 = success)
        payload = json.load(response)  # read the body and parse JSON
    return status, payload
def parse_commit(item: dict) -> dict | None:
sha = item.get("sha") # full commit hash from the API
commit = item.get("commit", {}) # nested author/message metadata
author = commit.get("author", {})
message = commit.get("message", "")
date = author.get("date") # ISO-8601 timestamp string
author_name = author.get("name")
if not sha or not date:
return None # skip incomplete API rows
return {
"sha_short": sha[:7],
"author_name": author_name or "Unknown",
"date": date[:10], # YYYY-MM-DD
"message_first_line": message.splitlines()[0] if message else "(no message)",
}
if __name__ == "__main__":
    # Step 2: fetch, then normalize every raw item, dropping rejected rows.
    status, items = fetch_commits(OWNER, REPO, PER_PAGE)
    parsed = [row for row in map(parse_commit, items) if row is not None]
    print("Step 2 OK — status:", status)
    print("Parsed commits:", len(parsed))
    if parsed:
        print("First parsed commit:", parsed[0])

Check: You should see at least one parsed commit with sha_short, author_name, date, and message_first_line.


Goal: Build simple aggregates: commits by author and commits by day.

Use Counter and defaultdict(int) for compact summary logic. Highlighted lines are the new summarize function.

import json
from collections import Counter, defaultdict
from urllib.request import Request, urlopen
OWNER = "python" # GitHub org or username that owns the target repository
REPO = "cpython" # repository name under OWNER
PER_PAGE = 10 # how many commits to ask for per request (the API may cap this)
def fetch_commits(owner: str, repo: str, per_page: int = 10):
    """Fetch recent commits for a public GitHub repository.

    Args:
        owner: GitHub org or username that owns the repository.
        repo: Repository name.
        per_page: Number of commits to request (the API may cap this).

    Returns:
        A ``(status, data)`` tuple: the HTTP status code and the parsed
        JSON payload (a list of commit objects on success).
    """
    endpoint = f"https://api.github.com/repos/{owner}/{repo}/commits?per_page={per_page}"
    # Many APIs reject requests that lack a User-Agent, so always send one.
    request = Request(endpoint, headers={"User-Agent": "python-api-example"})
    # Nothing touches the network until urlopen runs; cap the wait at 10s.
    with urlopen(request, timeout=10) as response:
        status = response.status  # HTTP result code (200 = success)
        payload = json.load(response)  # read the body and parse JSON
    return status, payload
def parse_commit(item: dict) -> dict | None:
sha = item.get("sha") # full commit hash from the API
commit = item.get("commit", {}) # nested author/message metadata
author = commit.get("author", {})
message = commit.get("message", "")
date = author.get("date") # ISO-8601 timestamp string
author_name = author.get("name")
if not sha or not date:
return None # skip incomplete API rows
return {
"sha_short": sha[:7],
"author_name": author_name or "Unknown",
"date": date[:10],
"message_first_line": message.splitlines()[0] if message else "(no message)",
}
def summarize(rows: list[dict]):
    """Tally parsed commit rows by author name and by calendar day.

    Args:
        rows: Parsed commit dicts with ``author_name`` and ``date`` keys.

    Returns:
        ``(commits_by_author, commits_by_day)`` — a Counter keyed by author
        name and a defaultdict(int) keyed by YYYY-MM-DD date string.
    """
    commits_by_author: Counter[str] = Counter(row["author_name"] for row in rows)
    commits_by_day: dict[str, int] = defaultdict(int)
    for row in rows:
        commits_by_day[row["date"]] += 1
    return commits_by_author, commits_by_day
if __name__ == "__main__":
    # Step 3: fetch, normalize, then aggregate by author and by day.
    status, items = fetch_commits(OWNER, REPO, PER_PAGE)
    parsed = []
    for raw in items:
        row = parse_commit(raw)
        if row is not None:  # drop rows parse_commit rejected
            parsed.append(row)
    by_author, by_day = summarize(parsed)
    print("Step 3 OK — status:", status)
    print("Parsed commits:", len(parsed))
    print("Top authors:", by_author.most_common(3))
    print("By day:", dict(sorted(by_day.items())))

Check: You should see top-author counts and one or more date buckets. Results vary by repository activity.


Goal: Format summary output in clear sections for quick scanning. Highlighted lines are the new print_report function.

import json
from collections import Counter, defaultdict
from urllib.request import Request, urlopen
OWNER = "python" # GitHub org or username that owns the target repository
REPO = "cpython" # repository name under OWNER
PER_PAGE = 10 # how many commits to ask for per request (the API may cap this)
def fetch_commits(owner: str, repo: str, per_page: int = 10):
    """Fetch recent commits for a public GitHub repository.

    Args:
        owner: GitHub org or username that owns the repository.
        repo: Repository name.
        per_page: Number of commits to request (the API may cap this).

    Returns:
        A ``(status, data)`` tuple: the HTTP status code and the parsed
        JSON payload (a list of commit objects on success).
    """
    endpoint = f"https://api.github.com/repos/{owner}/{repo}/commits?per_page={per_page}"
    # Many APIs reject requests that lack a User-Agent, so always send one.
    request = Request(endpoint, headers={"User-Agent": "python-api-example"})
    # Nothing touches the network until urlopen runs; cap the wait at 10s.
    with urlopen(request, timeout=10) as response:
        status = response.status  # HTTP result code (200 = success)
        payload = json.load(response)  # read the body and parse JSON
    return status, payload
def parse_commit(item: dict) -> dict | None:
sha = item.get("sha") # full commit hash from the API
commit = item.get("commit", {}) # nested author/message metadata
author = commit.get("author", {})
message = commit.get("message", "")
date = author.get("date") # ISO-8601 timestamp string
author_name = author.get("name")
if not sha or not date:
return None # skip incomplete API rows
return {
"sha_short": sha[:7],
"author_name": author_name or "Unknown",
"date": date[:10],
"message_first_line": message.splitlines()[0] if message else "(no message)",
}
def summarize(rows: list[dict]):
    """Tally parsed commit rows by author name and by calendar day.

    Args:
        rows: Parsed commit dicts with ``author_name`` and ``date`` keys.

    Returns:
        ``(commits_by_author, commits_by_day)`` — a Counter keyed by author
        name and a defaultdict(int) keyed by YYYY-MM-DD date string.
    """
    commits_by_author: Counter[str] = Counter(row["author_name"] for row in rows)
    commits_by_day: dict[str, int] = defaultdict(int)
    for row in rows:
        commits_by_day[row["date"]] += 1
    return commits_by_author, commits_by_day
def print_report(status: int, rows: list[dict], by_author: Counter, by_day: dict[str, int]) -> None: # print labeled sections to stdout
print("=== API Fetch Summary ===")
print("HTTP status:", status)
print("Parsed commits:", len(rows))
print("\n=== Top Authors ===")
for name, count in by_author.most_common(5): # Counter returns highest counts first
print(f" {name}: {count}")
print("\n=== Commits by Day ===")
for day in sorted(by_day): # chronological order by date string
print(f" {day}: {by_day[day]}")
print("\n=== Latest Commits ===")
for row in rows[:5]: # API returns newest commits first
print(f" {row['sha_short']} {row['author_name']} {row['message_first_line']}")
if __name__ == "__main__":
    # Step 4: fetch, normalize, aggregate, and print the final report.
    status, items = fetch_commits(OWNER, REPO, PER_PAGE)
    parsed = [row for row in map(parse_commit, items) if row is not None]
    by_author, by_day = summarize(parsed)
    print_report(status, parsed, by_author, by_day)

Check: Run the script. You should see summary, top authors, per-day counts, and a short latest-commits list.


  • Handle non-200 responses explicitly (for example 403 rate limits).
  • Add retries/timeouts/backoff if this script is run in automation.
  • For larger result sets, add pagination with page= and loop until empty.
  • If you later add auth, use an environment variable and Authorization header.