Matthew Evans
Consulting Engineer, WWT
CISSP · Charlotte, NC

Infrastructure background. Today I spend more than half my day writing code.

Credentials

By the Numbers

20+
Years in Cloud
& Reliability Engineering
280+
Production Services
Migrated to Kubernetes
99.99%
Reliability Achieved
on Re-Architected Platforms
Insurance · Financial Services · Security SaaS · Hedge Fund · Banking · Aviation
2011

Before SRE had a name

Automating toil · Reducing alert noise · Enabling confident decisions

Definition

SRE is a Framework

Business
Strategy
Engineering
Execution

A shared language for what you need and what it costs

The Core Concept

Shared Language

Business articulates
what they need
Engineering responds with
what it costs

Effort · Risk · Investment
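
One way to make "what you need" and "what it costs" concrete is to translate an availability target into the downtime it permits. The sketch below is illustrative only: the function name `allowed_downtime_minutes` and the 365-day year are assumptions for this example, not part of any specific engagement or tool.

```python
# Minimal sketch: convert an availability target into a downtime allowance.
# Assumption: a 365-day year (leap years ignored) and a hypothetical helper name.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # The unavailable fraction of the period is the downtime allowance.
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.99% target leaves roughly 52.6 minutes of downtime per year. Each additional "nine" cuts the allowance by a factor of ten, which is the kind of trade-off the business and engineering can discuss in the same terms.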

01
Blameless Culture
When incidents happen, we focus on what broke and how to prevent it.
Never who broke it.
02
Consistent Decision-Making
Business Strategy
Line-of-Business (LOB) Priorities
Engineering
Same playbook. Traceable decisions.
03
Right-Sized Investment
Monitor what the business actually needs.
No over-spending on unused tools.
No blind spots that cause surprises.

SRE isn't about saying anyone did anything wrong. It's about giving your talented people a consistent framework so they can communicate more effectively with the business.

It makes everyone's life easier.

Real-World Examples

Three Organizations, Three Angles

Hedge Fund

Process & Visibility

Security SaaS

Incident Management & Alerts

Fintech

Cost Optimization
Organization A · Hedge Fund

The Situation

  • 15-year-old deployment pipeline
  • Documented in a single Word doc
  • Engineers SSH'd into servers to troubleshoot
  • Identical servers behaved differently in production
  • Zero end-to-end visibility

The Result

Human deployment errors dropped to near-zero

"don't touch it" → "deploy with confidence"
Organization B · Security SaaS
5,000

alerts

Military · Government · Energy · Financial · Medical

No incident process. Engineers couldn't self-serve diagnostics.

5,000
Before
5
After

Every remaining alert: genuinely actionable

Organization C · Fintech

"We're overpaying
for our stage"

They were right.

$10k
Per Month
$1k
Per Month
90% reduction
+ Regional failover they didn't have before
Key Takeaway

SRE is a Framework,
Not a Criticism

Org A

Process & Visibility

Deployment confidence

Org B

Alerts & Incidents

5,000 → 5 alerts

Org C

Cost Optimization

90% spend reduction

Discussion

Let's Talk About Your World

  • What does your team spend the most time on that feels repetitive or manual?
  • When an incident happens today, what does the first 15 minutes look like?
  • If you could have one piece of data you don't have today, what would it be?