
Leadership and Mentoring

First published by Atif Alam

This page covers leadership behaviors that scale for senior engineers on infrastructure, platform, and SRE teams: mentoring, feedback, technical judgment, roadmap influence, and cross-team prioritization. It is not a management or HR guide — performance reviews, hiring decisions, and compensation belong with your manager and people-ops.

“Senior” here means scope and behavior, not a ladder title. Engineers at many levels do this work; the patterns generalize.

Related: QA and reliability guide, Incident response and on-call, Agile for SRE and platform work.

Mentoring fails when it is informal-only — “ping me if you have questions.” Concrete structures make the same time investment more useful.

| Structure | What It Is | When to Use |
| --- | --- | --- |
| Goal-based mentoring | Mentee picks 1–2 outcomes for the next 1–3 months; weekly check-ins track them. | Newer engineers, or anyone learning a new domain. |
| Shadowing | Mentee observes the mentor doing real work (on-call, design review, incident). | Onboarding to on-call, learning incident command. |
| Reverse shadowing | Mentor observes the mentee doing the work; gives feedback after. | Validating someone is ready to own a rotation or domain. |
| Pairing on a real task | Two engineers work the same change together end-to-end. | New tech, scary change, or first independent on-call shift. |
| Office hours | Recurring open block where any engineer can bring questions. | Cross-team mentoring at scale. |

Anti-patterns to avoid:

  • Hero-call mentoring — only available during incidents. The mentee never builds independent judgment.
  • Take-it-over — mentor jumps in and finishes the task. Faster today, no learning tomorrow.
  • Vague feedback — “looks good” or “needs work” without specifics. See feedback section below.

A common gap in less-experienced engineers is how to investigate, not what they know. Coach the method, not the answer.

A reusable triage frame:

  1. What changed? Recent deploys, config, infra, traffic, time of day. Pair with Network troubleshooting flow and Kubernetes troubleshooting and debugging.
  2. What is the smallest reproducer? A single failing request, one pod, one node.
  3. What is the evidence? Metrics first (Prometheus), then logs (Loki), then traces (distributed tracing).
  4. What is the hypothesis? Name it before testing it; write it in the war-room channel.
  5. What is the test? A specific check that will confirm or rule out the hypothesis.

When mentoring, ask the engineer to state the hypothesis out loud before they run the next command. This is the single highest-leverage coaching move on incidents.
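
The triage frame can also be kept as a small structured note during an incident. The sketch below is one possible shape, not a prescribed tool: the `TriageNote` class, its field names, and the `set_test` guard are all hypothetical, and it assumes Python 3.9 or later. It illustrates the coaching rule above by refusing to record a test until a hypothesis has been written down.

```python
from dataclasses import dataclass, field


@dataclass
class TriageNote:
    """One pass through the five-question triage frame (hypothetical structure)."""

    what_changed: list[str] = field(default_factory=list)
    smallest_reproducer: str = ""
    evidence: list[str] = field(default_factory=list)
    hypothesis: str = ""
    test: str = ""

    def set_test(self, check: str) -> None:
        """Record the next check, but only after a hypothesis has been named."""
        if not self.hypothesis.strip():
            raise ValueError("State the hypothesis before choosing a test.")
        self.test = check

    def missing_steps(self) -> list[str]:
        """Return the frame questions that are still unanswered, in order."""
        answers = {
            "What changed?": self.what_changed,
            "What is the smallest reproducer?": self.smallest_reproducer,
            "What is the evidence?": self.evidence,
            "What is the hypothesis?": self.hypothesis,
            "What is the test?": self.test,
        }
        return [question for question, answer in answers.items() if not answer]


# Illustrative usage with made-up incident details.
note = TriageNote(what_changed=["config push shortly before the first alert"])
note.hypothesis = "The config push lowered the upstream timeout."
note.set_test("Compare the timeout value before and after the push.")
print(note.missing_steps())
```

A mentor can use the list returned by `missing_steps()` as prompts: each unanswered question is something to ask the engineer rather than answer for them.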

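Feedback is a skill. Two compact patterns work in real engineering settings: Situation–Behavior–Impact (SBI) for feedback to one person, and Continue/Start/Stop for a team looking back on its work. An SBI example: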

Situation: "In yesterday's design review for the new auth service..."
Behavior: "...you asked the presenter four follow-up questions about failure modes..."
Impact: "...which surfaced the timeout retry bug before it shipped. That was high-leverage."

This form works for both reinforcing and corrective feedback, and it stays factual and specific.

After a project, sprint, or rotation, ask Continue/Start/Stop questions:

| Bucket | Prompt |
| --- | --- |
| Continue | What worked that we should keep doing? |
| Start | What is missing that we should add? |
| Stop | What is wasting time or causing harm? |

When receiving feedback:

  • Listen first, paraphrase, then respond. Do not defend in the same breath as hearing.
  • Ask for one example when feedback is too abstract.
  • Separate the feedback from the giver — useful feedback can come awkwardly from someone you disagree with.

“Judgment” is hard to teach by lecture. Build it by writing rubrics and running design reviews.

A short, shared rubric anchors discussion and trains newer reviewers:

| Dimension | Question |
| --- | --- |
| Problem | Is the problem clearly stated? Who is affected? |
| Alternatives | What other approaches were considered, and why were they rejected? |
| Failure modes | What breaks first? What is the blast radius? |
| Reversibility | If we are wrong, how do we back out, and how expensive is that? |
| Operational cost | Who carries the pager? What new alerts and runbooks are needed? |
| Compliance and security | Any new data flows, secrets, or access boundaries? Pair with CI/CD compliance and audit fieldwork. |

Capture significant decisions in architecture decision records (ADRs) in the repo: context, decision, consequences, status. They become onboarding material and audit evidence.
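
As an illustration of the four ADR fields named above, here is a minimal sketch that renders a record as plain text. The `DecisionRecord` class, the numbering scheme, and the example content are assumptions made for this sketch; teams commonly write ADRs by hand from a template, and the exact layout is a matter of local convention.

```python
from dataclasses import dataclass


@dataclass
class DecisionRecord:
    """Hypothetical ADR holder with the four fields: context, decision, consequences, status."""

    number: int
    title: str
    context: str
    decision: str
    consequences: str
    status: str  # for example "proposed", "accepted", or "superseded" (assumed values)

    def render(self) -> str:
        """Return the record as plain text, one labelled section per field."""
        sections = [
            ("Context", self.context),
            ("Decision", self.decision),
            ("Consequences", self.consequences),
            ("Status", self.status),
        ]
        lines = [f"ADR {self.number:04d}: {self.title}", ""]
        for heading, body in sections:
            lines += [heading, body.strip(), ""]
        return "\n".join(lines).rstrip() + "\n"


# Illustrative usage; the decision described here is invented for the example.
adr = DecisionRecord(
    number=7,
    title="Automate certificate renewal",
    context="Manual renewal consumes on-call time and has caused near-misses.",
    decision="Renew certificates automatically and alert on renewal failure.",
    consequences="One new alert and one new runbook; manual renewal is retired.",
    status="accepted",
)
print(adr.render())
```

Keeping the rendered file next to the code it describes means the decision is reviewed and versioned like any other change.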

Senior engineers shape roadmaps without owning them. The lever is framing, not authority.

| Frame | What It Looks Like |
| --- | --- |
| Risk | “Without this work, we estimate a 30% probability of a multi-hour outage in the next quarter, based on three near-misses.” |
| SLO and error budget | “We are 80% through the latency error budget for the quarter; this work directly addresses the top consumer.” See SLOs. |
| Toil | “This rotation spends 12 hours per week on manual cert renewal; automating it returns one engineer-week per month.” |
| Compliance | “We are missing evidence for control X; the next SOC 2 audit is in three months.” See audit fieldwork. |
| Cost | “Right-sizing this fleet is a one-week project that saves $X per month.” |

Frame once in numbers, then in a single sentence a non-engineer can repeat in a meeting you are not in.
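
The numbers behind the SLO and toil frames are simple arithmetic, and it helps to show the working when asked. The sketch below is a rough illustration under stated assumptions: both function names are hypothetical, the error budget is computed from event counts (good versus bad requests) over one window, a month is taken as 52/12 weeks, and an engineer-week is taken as 40 hours. The input figures in the usage lines are invented.

```python
def error_budget_consumed(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the window's error budget already spent (1.0 means fully spent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    # The budget is the number of bad events the SLO tolerates in this window.
    allowed_bad = (1.0 - slo_target) * total_events
    return bad_events / allowed_bad


def toil_engineer_weeks_per_month(hours_per_week: float,
                                  hours_per_engineer_week: float = 40.0) -> float:
    """Convert weekly toil hours into engineer-weeks per month."""
    weeks_per_month = 52 / 12  # assumption: average month length
    return hours_per_week * weeks_per_month / hours_per_engineer_week


# Illustrative usage with made-up traffic numbers and the 12 hours per week from the toil row.
spent = error_budget_consumed(slo_target=0.99, total_events=1_000_000, bad_events=8_000)
print(f"Error budget spent: {spent:.0%}")
print(f"Toil: {toil_engineer_weeks_per_month(12):.1f} engineer-weeks per month")
```

With a 40-hour engineer-week, 12 hours of weekly toil comes to a little over one engineer-week per month, in line with the rounded figure in the toil row.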

Resolving Cross-Team Prioritization Conflicts

When two teams disagree on whose work goes first, the shortest path is explicit tradeoffs, not louder advocacy.

A reusable conversation:

  1. State both proposals in one sentence each, neutrally.
  2. List the cost of each delay: who is blocked, what risk increases, what deadline slips.
  3. Name the decision-maker. If unclear, escalate together rather than separately.
  4. Document the decision and the explicit tradeoff that was accepted.

Patterns that help:

  • Shared service catalogs with stated SLOs reduce arguments by making expectations visible. See service readiness checklist.
  • Office hours for the platform team convert ad-hoc fights into queued conversations.
  • Embedded engineers (a temporary rotation onto another team) can resolve a chronic conflict by building shared context.

Working Across Teams in a Cross-Functional Capacity

Most senior infrastructure work is cross-functional by default — security, product, data, support, and platform all have stakes. Patterns that scale:

| Pattern | What It Solves |
| --- | --- |
| Single point of contact (SPOC) per partner team | Reduces channel sprawl; partner team always knows who to ping. |
| Pre-incident relationships | The first time you talk to the database team should not be during a database incident. |
| Shared dashboards | Both teams look at the same metrics during a discussion; reduces “is it your side or mine?” loops. |
| Joint runbooks | Co-author with the partner team; they get review credit; the runbook actually gets used. |

A quick self-check:

  • Each junior engineer on your team has at least one named mentor and a concrete goal.
  • Design reviews follow a shared rubric and produce a written decision (ADR or equivalent).
  • Feedback is given in Situation–Behavior–Impact form, not vague labels.
  • Roadmap proposals lead with risk, SLO, toil, compliance, or cost — not preference.
  • Cross-team conflicts are resolved with a written decision and named owners, not a winning argument.
  • You have at least one pre-incident relationship with each adjacent team you depend on.