Leadership and Mentoring

First published Apr 29, 2026 by Atif Alam

This page covers leadership behaviors that scale for senior engineers on infrastructure, platform, and SRE teams: mentoring, feedback, technical judgment, roadmap influence, and cross-team prioritization. It is not a management or HR guide — performance reviews, hiring decisions, and compensation belong with your manager and people-ops.

“Senior” here means scope and behavior, not a ladder title. Engineers at many levels do this work; the patterns generalize.

Mentoring Structures

Mentoring fails when it is informal-only — “ping me if you have questions.” Concrete structures make the same time investment more useful.

Structure	What It Is	When to Use
Goal-based mentoring	Mentee picks 1–2 outcomes for the next 1–3 months; weekly check-ins track them.	Newer engineers, or anyone learning a new domain.
Shadowing	Mentee observes the mentor doing real work (on-call, design review, incident).	Onboarding to on-call, learning incident command.
Reverse shadowing	Mentor observes the mentee doing the work; gives feedback after.	Validating someone is ready to own a rotation or domain.
Pairing on a real task	Two engineers work the same change together end-to-end.	New tech, scary change, or first independent on-call shift.
Office hours	Recurring open block where any engineer can bring questions.	Cross-team mentoring at scale.

Anti-Patterns

Hero-call mentoring — only available during incidents. The mentee never builds independent judgment.
Take-it-over — mentor jumps in and finishes the task. Faster today, no learning tomorrow.
Vague feedback — “looks good” or “needs work” without specifics. See Feedback Patterns below.

Coaching Debugging Methodology

A common gap in less-experienced engineers is not what they know but how they investigate. Coach the method, not the answer.

A reusable triage frame:

What changed? Recent deploys, config, infra, traffic, time of day. Pair with Network troubleshooting flow and Kubernetes troubleshooting and debugging.
What is the smallest reproducer? A single failing request, one pod, one node.
What is the evidence? Metrics first (Prometheus), then logs (Loki), then traces (distributed tracing).
What is the hypothesis? Name it before testing it; write it in the war-room channel.
What is the test? A specific check that will confirm or rule out the hypothesis.

When mentoring, ask the engineer to state the hypothesis out loud before they run the next command. This is the single highest-leverage coaching move on incidents.
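To make the frame concrete for a mentee, it can be kept as a structured incident note. The sketch below is illustrative only: `TriageLog`, `TriageStep`, and their fields are hypothetical names, not part of any incident tool. Its one opinion is the coaching move above: it refuses to record a test that has no stated hypothesis.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TriageStep:
    hypothesis: str               # named before testing; post it in the war-room channel
    test: str                     # the specific check that confirms or rules it out
    result: Optional[str] = None  # filled in only after the test has run


@dataclass
class TriageLog:
    what_changed: List[str] = field(default_factory=list)  # deploys, config, infra, traffic
    reproducer: str = ""                                   # smallest failing case
    evidence: List[str] = field(default_factory=list)      # metrics, then logs, then traces
    steps: List[TriageStep] = field(default_factory=list)

    def propose(self, hypothesis: str, test: str) -> TriageStep:
        """Record a hypothesis and its test; refuse a test with no stated hypothesis."""
        if not hypothesis.strip():
            raise ValueError("state the hypothesis before running the next command")
        step = TriageStep(hypothesis=hypothesis, test=test)
        self.steps.append(step)
        return step

    def open_hypotheses(self) -> List[str]:
        """Hypotheses that have been named but not yet confirmed or ruled out."""
        return [s.hypothesis for s in self.steps if s.result is None]


# Example: one hypothesis, stated before the check that tests it.
log = TriageLog(what_changed=["deploy at 14:02"], reproducer="one failing request to one pod")
step = log.propose("new connection-pool limit is too low",
                   "compare pool saturation before and after the deploy")
step.result = "confirmed"
```

Reviewing `open_hypotheses()` together after the incident shows the mentee which ideas were named but never tested.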

Feedback Patterns

Feedback is a skill. Two compact patterns that work in real engineering settings:

Situation–Behavior–Impact (SBI)

Situation:  "In yesterday's design review for the new auth service..."
Behavior:   "...you asked the presenter four follow-up questions about failure modes..."
Impact:     "...which surfaced the timeout retry bug before it shipped. That was high-leverage."

Works for both reinforcing and corrective feedback. Stays factual and specific.

Continue / Start / Stop

After a project, sprint, or rotation:

Bucket	Prompt
Continue	What worked that we should keep doing?
Start	What is missing that we should add?
Stop	What is wasting time or causing harm?

Receiving Feedback

Listen first, paraphrase, then respond. Do not defend in the same breath as hearing.
Ask for one example when feedback is too abstract.
Separate the feedback from the giver — useful feedback can come awkwardly from someone you disagree with.

Building Consistent Technical Judgment

“Judgment” is hard to teach by lecture. Build it by writing rubrics and running design reviews.

Lightweight Design Review Rubric

A short, shared rubric anchors discussion and trains newer reviewers:

Dimension	Question
Problem	Is the problem clearly stated? Who is affected?
Alternatives	What other approaches were considered, and why were they rejected?
Failure modes	What breaks first? What is the blast radius?
Reversibility	If we are wrong, how do we back out, and how expensive is that?
Operational cost	Who carries the pager? What new alerts and runbooks are needed?
Compliance and security	Any new data flows, secrets, or access boundaries? Pair with CI/CD compliance and audit fieldwork.
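One way to use the rubric before a review is to check a design doc's answers against it and open the meeting with the gaps. A minimal sketch, assuming the author records one free-text answer per rubric dimension in a dictionary; that format and the helper names are hypothetical.

```python
from typing import Dict, List

# Dimensions and questions from the design review rubric above.
RUBRIC: Dict[str, str] = {
    "Problem": "Is the problem clearly stated? Who is affected?",
    "Alternatives": "What other approaches were considered, and why were they rejected?",
    "Failure modes": "What breaks first? What is the blast radius?",
    "Reversibility": "If we are wrong, how do we back out, and how expensive is that?",
    "Operational cost": "Who carries the pager? What new alerts and runbooks are needed?",
    "Compliance and security": "Any new data flows, secrets, or access boundaries?",
}


def unanswered(answers: Dict[str, str]) -> List[str]:
    """Return rubric dimensions with no non-empty answer, in rubric order."""
    return [dim for dim in RUBRIC if not answers.get(dim, "").strip()]


def review_agenda(answers: Dict[str, str]) -> str:
    """One line per gap, so the review starts with the questions still open."""
    return "\n".join(f"- {dim}: {RUBRIC[dim]}" for dim in unanswered(answers))
```

Keeping the rubric as shared data rather than in each reviewer's head is what makes reviews consistent across reviewers.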

Decision Records

Capture significant decisions in architecture decision records (ADRs) in the repo: context, decision, consequences, status. They become onboarding material and audit evidence.
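A template lowers the cost of writing ADRs enough that people actually do it. The sketch below renders a Markdown ADR with the four fields named above. The status values follow a common proposed/accepted/deprecated/superseded convention, and the zero-padded file naming is an assumption; adjust both to your repo's conventions.

```python
from datetime import date
from typing import Optional

# Assumption: a common ADR status convention; replace with your team's values.
ADR_STATUSES = ("proposed", "accepted", "deprecated", "superseded")


def render_adr(number: int, title: str, context: str, decision: str,
               consequences: str, status: str = "proposed",
               decided_on: Optional[date] = None) -> str:
    """Render a Markdown ADR with status, context, decision, and consequences."""
    if status not in ADR_STATUSES:
        raise ValueError(f"unknown ADR status: {status!r}")
    decided_on = decided_on or date.today()
    lines = [
        f"# {number:04d}. {title}",
        "",
        f"Date: {decided_on.isoformat()}",
        "",
        "## Status",
        status.capitalize(),
        "",
        "## Context",
        context,
        "",
        "## Decision",
        decision,
        "",
        "## Consequences",
        consequences,
    ]
    return "\n".join(lines) + "\n"


def adr_filename(number: int, title: str) -> str:
    """Hypothetical naming scheme, e.g. 0007-use-managed-postgres.md."""
    slug = "-".join(title.lower().split())
    return f"{number:04d}-{slug}.md"
```

Numbered files that are never edited in place (a reversed decision gets a new ADR marked as superseding the old one) keep the history readable for onboarding and audits.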

Roadmap Influence

Senior engineers shape roadmaps without owning them. The lever is framing, not authority.

Frame	What It Looks Like
Risk	“Without this work, we estimate a 30% probability of a multi-hour outage in the next quarter, based on three near-misses.”
SLO and error budget	“We are 80% through the latency error budget for the quarter; this work directly addresses the top consumer.” See SLOs.
Toil	“This rotation spends 12 hours per week on manual cert renewal; automating it returns more than one engineer-week per month.”
Compliance	“We are missing evidence for control X; the next SOC 2 audit is in three months.” See audit fieldwork.
Cost	“Right-sizing this fleet is a one-week project that saves $X per month.”

State the case once in numbers, then once in a single sentence that a non-engineer can repeat in a meeting you are not in.
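Numbers hold up better in a roadmap discussion when the arithmetic behind them is written down. A minimal sketch, assuming a 40-hour engineer-week and a request-based SLO where `total_events` covers the whole reporting window; the function names are hypothetical. For the toil row, 12 hours per week averages 52 hours per month, about 1.3 engineer-weeks.

```python
# Assumption: one engineer-week is 40 working hours; adjust to your team's norm.
ENGINEER_WEEK_HOURS = 40


def toil_hours_per_month(hours_per_week: float) -> float:
    """Average monthly hours: 52 weeks spread across 12 months."""
    return hours_per_week * 52 / 12


def engineer_weeks_per_month(hours_per_week: float) -> float:
    """Toil expressed in engineer-weeks per month."""
    return toil_hours_per_month(hours_per_week) / ENGINEER_WEEK_HOURS


def error_budget_consumed(slo_target: float, bad_events: int, total_events: int) -> float:
    """Fraction of the error budget used: bad events over the bad events the SLO allows."""
    if not 0.0 < slo_target < 1.0 or total_events <= 0:
        raise ValueError("slo_target must be in (0, 1) and total_events positive")
    allowed_bad = (1.0 - slo_target) * total_events
    return bad_events / allowed_bad


# Toil row: 12 h/week -> 52 h/month -> 1.3 engineer-weeks per month.
print(round(engineer_weeks_per_month(12), 2))  # 1.3
# A 99.9% SLO over 1,000,000 requests allows 1,000 bad ones; 800 bad uses 80%.
print(round(error_budget_consumed(0.999, 800, 1_000_000), 2))  # 0.8
```

Showing the formula alongside the claim lets a skeptical stakeholder change an assumption (a 35-hour week, a different window) without dismissing the whole argument.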

Resolving Cross-Team Prioritization Conflicts

When two teams disagree on whose work goes first, the shortest path is explicit tradeoffs, not louder advocacy.

A reusable conversation:

State both proposals in one sentence each, neutrally.
List the cost of each delay: who is blocked, what risk increases, what deadline slips.
Name the decision-maker. If unclear, escalate together rather than separately.
Document the decision and the explicit tradeoff that was accepted.
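The four steps map onto a small record that can be pasted into a ticket or ADR. A sketch only: `PrioritizationDecision`, `DelayCost`, and their fields are hypothetical names chosen to mirror the steps, and `missing()` lists whatever has not yet been written down.

```python
from dataclasses import dataclass
from typing import ClassVar, List, Tuple


@dataclass
class DelayCost:
    blocked: str    # who is blocked
    risk: str       # what risk increases
    deadline: str   # what deadline slips


@dataclass
class PrioritizationDecision:
    proposal_a: str           # one neutral sentence
    proposal_b: str           # one neutral sentence
    delay_cost_a: DelayCost   # cost of delaying proposal A
    delay_cost_b: DelayCost   # cost of delaying proposal B
    decision_maker: str = ""
    decision: str = ""
    accepted_tradeoff: str = ""

    # Not a dataclass field: the text fields that must be filled before closing.
    REQUIRED: ClassVar[Tuple[str, ...]] = (
        "proposal_a", "proposal_b", "decision_maker", "decision", "accepted_tradeoff",
    )

    def missing(self) -> List[str]:
        """Required fields still empty; the conversation is not finished until this is []."""
        return [name for name in self.REQUIRED if not getattr(self, name).strip()]
```

An empty `decision_maker` is the usual early warning: it signals the two teams should escalate together before arguing further.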

Patterns that help:

Shared service catalogs with stated SLOs reduce arguments by making expectations visible. See service readiness checklist.
Office hours for the platform team convert ad-hoc fights into queued conversations.
Embedded engineers (a temporary rotation onto another team) can resolve a chronic conflict by building shared context.

Working Across Teams in a Cross-Functional Capacity

Most senior infrastructure work is cross-functional by default — security, product, data, support, and platform all have stakes. Patterns that scale:

Pattern	What It Solves
Single point of contact (SPOC) per partner team	Reduces channel sprawl; partner team always knows who to ping.
Pre-incident relationships	The first time you talk to the database team should not be during a database incident.
Shared dashboards	Both teams look at the same metrics during a discussion; reduces “is it your side or mine?” loops.
Joint runbooks	Co-author with the partner team; they get review credit; the runbook actually gets used.

Checklist

Each junior engineer on your team has at least one named mentor and a concrete goal.
Design reviews follow a shared rubric and produce a written decision (ADR or equivalent).
Feedback is given in Situation–Behavior–Impact form, not vague labels.
Roadmap proposals lead with risk, SLO, toil, compliance, or cost — not preference.
Cross-team conflicts are resolved with a written decision and named owners, not a winning argument.
You have at least one pre-incident relationship with each adjacent team you depend on.

Related

Agile for SRE and platform work — sprint planning and toil budgets that protect mentoring time
Incident response and on-call — coaching opportunities during real incidents
QA and reliability guide — the broader reliability practice this fits into