How site reliability engineers (SREs) can "shift left" using a unified service catalog

February 17, 2025

Site reliability engineers (SREs) and developers often struggle to balance speed with stability. Developers tend to focus on building and shipping features, while SREs make sure those features run smoothly in production. But when something breaks, the lines blur, and that’s where the problems start.

The "shift left" movement offers a way forward. It allows teams to tackle reliability and operational concerns earlier in the development process. By sharing ownership, teams can reduce friction and work better together.

The SRE and developer disconnect

SREs are responsible for maintaining reliable systems, overseeing uptime, managing incidents, and handling cloud infrastructure. Developers focus on writing code and shipping features. However, these roles often overlap, creating friction between them.

This tension arises from misaligned priorities and a lack of visibility into each other’s workflows. Developers prioritize shipping features and may neglect production requirements until problems arise. While they create applications, they often don’t feel accountable for their reliability. Conversely, SREs strive to maintain uptime but may lack context regarding recent application changes. These dynamics lead to inefficiencies, such as:

Incomplete visibility leaves SREs unprepared during incidents due to missing insights into recent deployments, dependencies, or configuration changes.
Fragmented ownership results in unclear accountability, causing delays in resolving critical issues.
A lack of shared frameworks hinders communication and coordination, particularly during high-pressure incidents.
Without clear processes, product owners or business stakeholders may put additional pressure on SREs, exacerbating an already stressful situation.

Developers increasingly embrace the shift-left movement by focusing on production requirements, secure coding, and AI tools that enhance their workflows, but these efforts alone are insufficient. Developers must take full accountability for their applications, covering both code and reliability. SREs and developers must also collaborate on a shared framework with a unified source of truth for service ownership, health, and dependencies. This foundation enables faster, more effective workflows and narrows the gap between teams.

Step-by-step: Shifting left in incident management

Consider a scenario where a high-severity incident occurs during peak traffic. SREs may have all the infrastructure metrics but lack insights into recent application updates or dependencies. On the other hand, developers might not have access to production monitoring tools, leaving them blind to the issue's root cause. This lack of shared responsibility turns a manageable problem into a prolonged outage.

Let’s explore a step-by-step guide for managing an incident or outage to demonstrate the impact of shifting left.

1. Proactive prevention

Preventing incidents begins long before they occur. Teams can take several proactive steps to ensure production readiness:

Define ownership: Use a unified service catalog to establish clear ownership for every service, including its dependencies, health metrics, and escalation paths.
Automate readiness checks: Implement automated checks for production readiness, such as ensuring proper observability setups, validating CI/CD pipelines, and checking for outdated dependencies.
Monitor proactively: Set up alerts for potential issues, such as increasing error rates, slow response times, or anomalies in deployment processes. These alerts allow teams to address problems before they escalate.
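The readiness checks described above can be sketched in a few lines of Python. This is a minimal illustration rather than any portal's actual API: the `Service` fields, the wording of each gap, and the rule that a dependency counts as outdated whenever its pinned version differs from the latest known version are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Service:
    """Hypothetical catalog entry; field names are illustrative only."""
    name: str
    owner: Optional[str] = None
    escalation_path: list = field(default_factory=list)
    has_dashboards: bool = False
    has_alerts: bool = False
    # Maps dependency name -> (pinned version, latest known version).
    dependencies: dict = field(default_factory=dict)


def readiness_gaps(service: Service) -> list:
    """Return a human-readable list of unmet production readiness criteria."""
    gaps = []
    if not service.owner:
        gaps.append("no owner defined")
    if not service.escalation_path:
        gaps.append("no escalation path")
    if not (service.has_dashboards and service.has_alerts):
        gaps.append("observability setup incomplete")
    for dep, (pinned, latest) in sorted(service.dependencies.items()):
        if pinned != latest:
            gaps.append(f"outdated dependency: {dep} {pinned} (latest {latest})")
    return gaps


# Example entry with an owner and dashboards, but no alerts or escalation path.
cart = Service(
    name="cart",
    owner="team-checkout",
    has_dashboards=True,
    dependencies={"kafka-client": ("3.4.0", "3.7.1")},
)
gaps = readiness_gaps(cart)
for gap in gaps:
    print(f"{cart.name}: {gap}")
```

Running a check like this in CI, or on a schedule against every catalog entry, surfaces gaps before release instead of during an incident.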

2. Detecting and diagnosing the issue

When an incident occurs, swift detection and diagnosis are crucial:

Unified visibility: Teams use a centralized portal to access real-time metrics, logs, and dependency maps. This shared view ensures everyone has the information needed to assess the problem.
Ownership identification: The service catalog automatically identifies the responsible team or individual and notifies them through pre-configured communication channels like Slack or Teams.
Cross-functional insights: Both developers and SREs can see relevant details about recent deployments, configuration changes, and application updates, enabling faster root cause analysis.
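To make the detection step concrete, here is a small Python sketch of what "ownership identification" and "cross-functional insights" might look like as a lookup. Everything in it is hypothetical: the `OWNERS` table, the `Change` record, the channel names, and the two-hour look-back window are assumptions for illustration, not the behavior of a specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """A deployment or configuration change recorded against a service."""
    service: str
    kind: str          # e.g. "deployment" or "config"
    description: str
    at: datetime


# Hypothetical catalog data: service name -> owning team and notification channel.
OWNERS = {
    "cart": {"team": "team-checkout", "channel": "#checkout-oncall"},
    "products": {"team": "team-catalog", "channel": "#catalog-oncall"},
}


def incident_context(service, changes, incident_at, window=timedelta(hours=2)):
    """Return the owner of `service` plus changes made shortly before the incident."""
    owner = OWNERS.get(service)
    if owner is None:
        raise KeyError(f"no owner registered for service {service!r}")
    recent = [
        c for c in changes
        if c.service == service and incident_at - window <= c.at <= incident_at
    ]
    # Newest first: the latest change is usually the first suspect.
    recent.sort(key=lambda c: c.at, reverse=True)
    return {"owner": owner, "recent_changes": recent}


incident_at = datetime(2025, 2, 17, 20, 0)
changes = [
    Change("cart", "deployment", "release v42", datetime(2025, 2, 17, 19, 40)),
    Change("cart", "config", "raise pool size", datetime(2025, 2, 17, 18, 30)),
    Change("cart", "deployment", "release v41", datetime(2025, 2, 16, 9, 0)),
    Change("products", "deployment", "release v7", datetime(2025, 2, 17, 19, 50)),
]
context = incident_context("cart", changes, incident_at)
print(context["owner"]["channel"])
for change in context["recent_changes"]:
    print(change.at.isoformat(), change.kind, change.description)
```

The point of the sketch is that the owner and the recent changes come from the same source, so developers and SREs start the investigation from the same facts.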

3. Coordinating the response

With clear ownership and diagnostic data, the team can focus on resolving the issue:

Automated incident channels: An automated communication channel is created to bring together the right stakeholders and provide access to relevant tools and data.
Self-service remediation: Developers use predefined workflows to address the issue, such as rolling back a faulty deployment, restarting services, or scaling resources. These actions can be executed directly from the portal, reducing dependence on SRE intervention.
Escalation protocols: If the issue requires specialized expertise, SREs step in to handle complex problems or enforce operational standards.
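The split between self-service remediation and escalation can be illustrated with a simple dispatcher. The sketch below is an assumption-laden example: the workflow names, the `remediate` function, and the returned messages are invented for illustration, and a real workflow would call a deployment or orchestration system instead of returning a string.

```python
from typing import Callable

# Hypothetical remediation workflows. Each returns a short status message.
def roll_back(service: str) -> str:
    return f"rolled back {service} to previous release"

def restart(service: str) -> str:
    return f"restarted {service}"

def scale_out(service: str) -> str:
    return f"scaled {service} out by one replica"

# Only predefined actions are available for self-service.
WORKFLOWS: dict[str, Callable[[str], str]] = {
    "rollback": roll_back,
    "restart": restart,
    "scale": scale_out,
}

def remediate(service: str, action: str, audit_log: list) -> str:
    """Run a predefined workflow, or escalate to SREs if none matches."""
    workflow = WORKFLOWS.get(action)
    if workflow is None:
        result = f"escalated {service} to SRE on-call: no workflow for {action!r}"
    else:
        result = workflow(service)
    # Record every outcome so the post-incident review has a timeline.
    audit_log.append(result)
    return result

audit_log = []
print(remediate("cart", "rollback", audit_log))
print(remediate("cart", "failover-database", audit_log))
```

Keeping the set of self-service actions explicit is one way for SREs to enforce standards: anything outside the list goes to them by default.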

4. Post-incident improvements

After resolving the incident, teams focus on continuous improvement:

Root cause analysis: Teams collaborate to understand what went wrong and document their findings in the service catalog.
Tool enhancements: Adjust monitoring tools and automated workflows to prevent similar issues in the future.
Process refinement: Incorporate feedback to improve response procedures, training, and documentation.

As you can see, the solution is to redefine ownership and give everyone access to the tools they need. SREs should focus on setting standards and automating reliability tasks, while developers should own their applications end-to-end, including uptime and health.

Unified service catalogs: A key to shifting left

A unified service catalog can bridge the gap between SREs and developers. It gives both groups a clear view of services, their owners, and their dependencies, which makes it an essential piece of the "shift left" approach. By serving as a single source of truth, it provides:

Clear ownership: Ensuring every service has a defined owner and team responsible for its health and reliability.
Comprehensive visibility: Offering insights into dependencies, configurations, and compliance with production readiness standards.
Efficient collaboration: Supporting self-service actions and automated workflows to enable faster, more effective incident resolution.
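One practical use of the dependency data in a catalog is answering "what else is affected if this service fails?" The Python sketch below walks a dependency map to find every upstream service. The service names and the `DEPENDS_ON` structure are hypothetical; a real catalog would supply this data through its own API.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on.
DEPENDS_ON = {
    "checkout-ui": ["cart", "products"],
    "cart": ["cart-db", "payments"],
    "products": ["products-db"],
    "payments": [],
    "cart-db": [],
    "products-db": [],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first search outward from the failed service.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return sorted(impacted)


print(impacted_by("cart-db", DEPENDS_ON))  # ['cart', 'checkout-ui']
```

Combined with ownership data, the same traversal tells responders which teams to bring into the incident.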

While the service catalog is critical, it’s part of a broader ecosystem that includes self-service workflows, incident management automation, and collaboration tools. Together, these features empower teams to work more efficiently and confidently.

Real wins with unified tools

Teams using unified service catalogs see improvements in proactive prevention and reactive recovery. Here’s a deeper look at the benefits:

Proactive incident prevention: With automated compliance tracking, teams can identify and resolve issues before they escalate. For instance, a team might receive automated alerts when an application isn’t meeting production readiness criteria, such as missing observability setups or outdated dependencies. By addressing these gaps before release, the team avoids outages and ensures smoother launches.
Faster recovery times: When a key service goes down during a peak traffic event, the developer responsible doesn’t have to wait for SREs to intervene. They can follow a predefined remediation path in the portal, such as rolling back a recent deployment, restarting a service, or scaling resources with a single click. This shortens the mean time to recovery (MTTR).
Improved collaboration: With clear visibility into ownership, teams avoid confusion during high-pressure situations. For example, when a failure occurs, a unified portal immediately identifies the service owner and pulls in relevant stakeholders through automated Slack channels. Teams can focus on solving the problem rather than debating who should take action.
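For readers who want to track the recovery metric mentioned above, MTTR is the average time between detecting an incident and resolving it. The following Python sketch computes it from a list of incidents; the timestamps are made up for the example and are not measurements from any real team.

```python
from datetime import datetime, timedelta

# Made-up incidents as (detected, resolved) pairs; not real measurements.
incidents = [
    (datetime(2025, 1, 6, 14, 0), datetime(2025, 1, 6, 14, 45)),   # 45 minutes
    (datetime(2025, 1, 19, 2, 10), datetime(2025, 1, 19, 3, 40)),  # 90 minutes
    (datetime(2025, 2, 3, 9, 30), datetime(2025, 2, 3, 9, 45)),    # 15 minutes
]


def mean_time_to_recovery(incidents):
    """Average of (resolved - detected) across all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


mttr = mean_time_to_recovery(incidents)
print(mttr)  # 0:50:00
```

Comparing this figure before and after introducing self-service workflows is a simple way to check whether recovery is actually getting faster.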

Imagine a critical outage occurs late at night. Instead of scrambling to figure out who owns the impacted service, the unified portal automatically creates a dedicated Slack channel for the incident, notifies the service owner, and provides access to critical metrics, logs, and dependency maps. Within minutes, the team can collaborate effectively to resolve the issue, cutting downtime dramatically. This streamlined approach exemplifies the power of shifting left: equipping teams with tools to act quickly, confidently, and efficiently.

A new ownership model

Shifting left supports a shared accountability model. Developers own their applications, including reliability. SREs provide guidance, tools, and high-level support when needed. This balance ensures everyone can focus on what they do best.

For example, developers take the lead in managing the response during an incident. They use the tools the service catalog provides to diagnose and fix the issue. SREs step in only for complex problems or to ensure standards are met. This approach reduces bottlenecks and empowers teams to work more effectively.

Ready to shift left?

A unified service catalog can transform how SREs and developers collaborate. It reduces bottlenecks and keeps systems reliable. To go further, speak to like-minded people in Port’s community who are also shifting left, or see how you can shift left using Port’s live demo.

Tags: Use Case
