Leadership and Mentoring
This page covers leadership behaviors that scale for senior engineers on infrastructure, platform, and SRE teams: mentoring, feedback, technical judgment, roadmap influence, and cross-team prioritization. It is not a management or HR guide — performance reviews, hiring decisions, and compensation belong with your manager and people-ops.
“Senior” here means scope and behavior, not a ladder title. Engineers at many levels do this work; the patterns generalize.
Related: QA and reliability guide, Incident response and on-call, Agile for SRE and platform work.
Mentoring Structures
Mentoring fails when it is informal-only: “ping me if you have questions.” Concrete structures make the same time investment more useful.
| Structure | What It Is | When to Use |
|---|---|---|
| Goal-based mentoring | Mentee picks 1–2 outcomes for the next 1–3 months; weekly check-ins track them. | Newer engineers, or anyone learning a new domain. |
| Shadowing | Mentee observes the mentor doing real work (on-call, design review, incident). | Onboarding to on-call, learning incident command. |
| Reverse shadowing | Mentor observes the mentee doing the work; gives feedback after. | Validating someone is ready to own a rotation or domain. |
| Pairing on a real task | Two engineers work the same change together end-to-end. | New tech, scary change, or first independent on-call shift. |
| Office hours | Recurring open block where any engineer can bring questions. | Cross-team mentoring at scale. |
Anti-Patterns
- Hero-call mentoring — only available during incidents. The mentee never builds independent judgment.
- Take-it-over — mentor jumps in and finishes the task. Faster today, no learning tomorrow.
- Vague feedback — “looks good” or “needs work” without specifics. See feedback section below.
Coaching Debugging Methodology
A common gap for less-experienced engineers is how they investigate, not what they know. Coach the method, not the answer.
A reusable triage frame:
- What changed? Recent deploys, config, infra, traffic, time of day. Pair with Network troubleshooting flow and Kubernetes troubleshooting and debugging.
- What is the smallest reproducer? A single failing request, one pod, one node.
- What is the evidence? Metrics first (Prometheus), then logs (Loki), then traces (distributed tracing).
- What is the hypothesis? Name it before testing it; write it in the war-room channel.
- What is the test? A specific check that will confirm or rule out the hypothesis.
When mentoring, ask the engineer to state the hypothesis out loud before they run the next command. This is the single highest-leverage coaching move on incidents.
Feedback Patterns
Feedback is a skill. Two compact patterns work well in real engineering settings:
Situation–Behavior–Impact (SBI)
- Situation: "In yesterday's design review for the new auth service..."
- Behavior: "...you asked the presenter four follow-up questions about failure modes..."
- Impact: "...which surfaced the timeout retry bug before it shipped. That was high-leverage."

Works for both reinforcing and corrective feedback. Stays factual and specific.
Continue / Start / Stop
After a project, sprint, or rotation:
| Bucket | Prompt |
|---|---|
| Continue | What worked that we should keep doing? |
| Start | What is missing that we should add? |
| Stop | What is wasting time or causing harm? |
Receiving Feedback
- Listen first, paraphrase, then respond. Do not defend yourself in the same breath as hearing the feedback.
- Ask for one example when feedback is too abstract.
- Separate the feedback from the giver — useful feedback can come awkwardly from someone you disagree with.
Building Consistent Technical Judgment
“Judgment” is hard to teach by lecture. Build it by writing rubrics and running design reviews.
Lightweight Design Review Rubric
A short, shared rubric anchors discussion and trains newer reviewers:
| Dimension | Question |
|---|---|
| Problem | Is the problem clearly stated? Who is affected? |
| Alternatives | What other approaches were considered, and why were they rejected? |
| Failure modes | What breaks first? What is the blast radius? |
| Reversibility | If we are wrong, how do we back out, and how expensive is that? |
| Operational cost | Who carries the pager? What new alerts and runbooks are needed? |
| Compliance and security | Any new data flows, secrets, or access boundaries? Pair with CI/CD compliance and audit fieldwork. |
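The rubric is easiest to apply consistently when it is written down once and reused. The Python sketch below assumes a hypothetical convention in which each design doc has one markdown heading per rubric dimension (for example `## Failure modes`); `missing_dimensions` and `review_prompts` are illustrative names, not an existing tool. It lists the dimensions a doc does not address and turns each gap into the rubric question a reviewer can raise.

```python
"""Sketch: list rubric dimensions a design doc does not address.

Assumption (hypothetical convention): each design doc has one markdown
heading per rubric dimension, e.g. "## Failure modes".
"""
import re

# Dimension -> reviewer question, copied from the rubric table.
RUBRIC = {
    "Problem": "Is the problem clearly stated? Who is affected?",
    "Alternatives": "What other approaches were considered, and why were they rejected?",
    "Failure modes": "What breaks first? What is the blast radius?",
    "Reversibility": "If we are wrong, how do we back out, and how expensive is that?",
    "Operational cost": "Who carries the pager? What new alerts and runbooks are needed?",
    "Compliance and security": "Any new data flows, secrets, or access boundaries?",
}

# A markdown heading line: one or more '#', spaces or tabs, then the title.
HEADING = re.compile(r"^#+[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_dimensions(doc: str) -> list[str]:
    """Rubric dimensions with no matching heading (case-insensitive)."""
    headings = {m.group(1).lower() for m in HEADING.finditer(doc)}
    return [dim for dim in RUBRIC if dim.lower() not in headings]


def review_prompts(doc: str) -> list[str]:
    """One reviewer question per missing dimension, in rubric order."""
    return [f"{dim}: {RUBRIC[dim]}" for dim in missing_dimensions(doc)]


if __name__ == "__main__":
    # Illustrative draft that skips the last three dimensions.
    draft = "# Auth service v2\n\n## Problem\n...\n\n## Alternatives\n...\n\n## Failure modes\n...\n"
    for prompt in review_prompts(draft):
        print(prompt)
```

Matching on headings is deliberately crude: a heading being present says nothing about the quality of the answer, so a check like this only helps a reviewer notice silence, not judge the content.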
Decision Records
Capture significant decisions as architecture decision records (ADRs) in the repo, each recording context, decision, consequences, and status. They become onboarding material and audit evidence.
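A minimal Python sketch for scaffolding a new ADR with those four sections. The directory location, the `NNNN-short-title.md` naming scheme, and the `write_adr` helper are assumptions for illustration, and the demo record's content is made up; adapt them to whatever layout your repo already uses.

```python
"""Sketch: scaffold a new architecture decision record (ADR).

Assumptions for illustration: ADRs live in one directory, files are named
NNNN-short-title.md, and each record has Status, Context, Decision, and
Consequences sections.
"""
import re
import tempfile
from pathlib import Path

TEMPLATE = """# {number}. {title}

## Status

{status}

## Context

{context}

## Decision

{decision}

## Consequences

{consequences}
"""


def slugify(title: str) -> str:
    # Lowercase, then collapse runs of non-alphanumerics into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_number(adr_dir: Path) -> int:
    # Find existing NNNN-*.md records and return the next sequence number.
    existing = [int(p.name[:4]) for p in adr_dir.glob("[0-9][0-9][0-9][0-9]-*.md")]
    return max(existing, default=0) + 1


def write_adr(adr_dir: Path, title: str, context: str, decision: str,
              consequences: str, status: str = "Proposed") -> Path:
    """Write the next numbered ADR into adr_dir and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    number = next_number(adr_dir)
    path = adr_dir / f"{number:04d}-{slugify(title)}.md"
    path.write_text(TEMPLATE.format(number=number, title=title, status=status,
                                    context=context, decision=decision,
                                    consequences=consequences))
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory; in a real repo, point adr_dir at your ADR folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_adr(
            Path(tmp) / "adr",
            title="Adopt Loki for log aggregation",
            context="Illustrative: logs are split across several systems.",
            decision="Illustrative: ship all service logs to Loki.",
            consequences="Illustrative: one query path during incidents.",
        )
        print(path.name)
        print(path.read_text())
```

New records default to `Proposed`; changing the status line to `Accepted` or `Superseded` later keeps the history in the repo next to the code it describes.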
Roadmap Influence
Senior engineers shape roadmaps without owning them. The lever is framing, not authority.
| Frame | What It Looks Like |
|---|---|
| Risk | “Without this work, we estimate a 30% probability of a multi-hour outage in the next quarter, based on three near-misses.” |
| SLO and error budget | “We are 80% through the latency error budget for the quarter; this work directly addresses the top consumer.” See SLOs. |
| Toil | “This rotation spends 12 hours per week on manual cert renewal; automating it returns more than one engineer-week per month.” |
| Compliance | “We are missing evidence for control X; the next SOC 2 audit is in three months.” See audit fieldwork. |
| Cost | “Right-sizing this fleet is a one-week project that saves $X per month.” |
Frame once in numbers, then in a single sentence a non-engineer can repeat in a meeting you are not in.
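The numbers behind the toil and SLO frames are simple arithmetic. This Python sketch assumes a 40-hour engineer-week, an average of 52/12 weeks per month, and a ratio-based SLO (good events divided by total events). It converts weekly toil hours into engineer-weeks per month and computes how much of an error budget has been spent. The function names are hypothetical, and the event counts in the demo are made up, not real data.

```python
# Sketch: turn roadmap frames into numbers.
# Assumptions: a 40-hour engineer-week, 52/12 weeks in an average month,
# and a ratio-based SLO (good events / total events).

HOURS_PER_ENGINEER_WEEK = 40
WEEKS_PER_MONTH = 52 / 12  # about 4.33


def toil_engineer_weeks_per_month(toil_hours_per_week: float) -> float:
    """Engineer-weeks per month freed if the toil is fully automated."""
    hours_per_month = toil_hours_per_week * WEEKS_PER_MONTH
    return hours_per_month / HOURS_PER_ENGINEER_WEEK


def error_budget_consumed(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget spent, measured against events seen so far.

    slo_target is a fraction, e.g. 0.99 means 99% of requests must be good
    (for a latency SLO, "good" means under the latency threshold).
    """
    if not 0 < slo_target < 1 or total_events <= 0:
        raise ValueError("need 0 < slo_target < 1 and total_events > 0")
    allowed_bad = (1 - slo_target) * total_events
    actual_bad = total_events - good_events
    return actual_bad / allowed_bad


if __name__ == "__main__":
    # Illustrative inputs only: 12 h/week of manual cert renewal, and
    # 80,000 slow requests out of 10,000,000 against a 99% latency SLO.
    print(f"{toil_engineer_weeks_per_month(12):.2f} engineer-weeks per month")
    print(f"{error_budget_consumed(0.99, 9_920_000, 10_000_000):.0%} of error budget used")
```

With these inputs it reports about 1.3 engineer-weeks per month for 12 hours of weekly toil, and 80% of the budget used when 80,000 of 10,000,000 requests miss a 99% objective. The budget calculation is a simplification: it measures against events observed so far in the window rather than projected traffic for the whole window.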
Resolving Cross-Team Prioritization Conflicts
When two teams disagree on whose work goes first, the shortest path is explicit tradeoffs, not louder advocacy.
A reusable conversation:
- State both proposals in one sentence each, neutrally.
- List the cost of each delay: who is blocked, what risk increases, what deadline slips.
- Name the decision-maker. If unclear, escalate together rather than separately.
- Document the decision and the explicit tradeoff that was accepted.
Patterns that help:
- Shared service catalogs with stated SLOs reduce arguments by making expectations visible. See service readiness checklist.
- Office hours for the platform team convert ad-hoc fights into queued conversations.
- Embedded engineers (a temporary rotation onto another team) can resolve a chronic conflict by building shared context.
Working Across Teams in a Cross-Functional Capacity
Most senior infrastructure work is cross-functional by default: security, product, data, support, and platform all have stakes. Patterns that scale:
| Pattern | What It Solves |
|---|---|
| Single point of contact (SPOC) per partner team | Reduces channel sprawl; partner team always knows who to ping. |
| Pre-incident relationships | The first time you talk to the database team should not be during a database incident. |
| Shared dashboards | Both teams look at the same metrics during a discussion; reduces “is it your side or mine?” loops. |
| Joint runbooks | Co-author with the partner team; they get review credit; the runbook actually gets used. |
Checklist
- Each junior engineer on your team has at least one named mentor and a concrete goal.
- Design reviews follow a shared rubric and produce a written decision (ADR or equivalent).
- Feedback is given in Situation–Behavior–Impact form, not vague labels.
- Roadmap proposals lead with risk, SLO, toil, compliance, or cost — not preference.
- Cross-team conflicts are resolved with a written decision and named owners, not a winning argument.
- You have at least one pre-incident relationship with each adjacent team you depend on.
Related
- Agile for SRE and platform work — sprint planning and toil budgets that protect mentoring time
- Incident response and on-call — coaching opportunities during real incidents
- QA and reliability guide — the broader reliability practice this fits into