How site reliability engineers (SREs) can "shift left" using a unified service catalog

February 17, 2025

Site reliability engineers (SREs) and developers often struggle to balance speed with stability. Developers generally focus on building and shipping features, while SREs make sure those features run smoothly in production. But when something breaks, the lines blur, and that’s where the problems start.

The "shift left" movement offers a way forward. It allows teams to tackle reliability and operational concerns earlier in the development process. By sharing ownership, teams can reduce friction and work better together.

The SRE and developer disconnect 

SREs are responsible for maintaining reliable systems, overseeing uptime, managing incidents, and handling cloud infrastructure. Developers focus on writing code and shipping features. However, these roles often overlap, creating friction between them.

This tension stems from misaligned priorities and limited visibility into each other’s workflows. Developers prioritize shipping features and may neglect production requirements until problems arise; although they build the applications, they often don’t feel accountable for keeping them reliable. SREs, in turn, strive to maintain uptime but may lack context about recent application changes. These dynamics lead to inefficiencies such as:

  • Incomplete visibility leaves SREs unprepared during incidents because they lack insight into recent deployments, dependencies, or configuration changes.
  • Fragmented ownership blurs accountability and delays the resolution of critical issues.
  • Without a shared framework, communication and coordination suffer, particularly during high-pressure incidents.
  • Without clear processes, product owners and business stakeholders may put additional pressure on SREs, making an already stressful situation worse.

Developers are increasingly embracing the shift-left movement by paying closer attention to production requirements, writing secure code, and using AI tools to enhance their workflows, but these efforts alone are not enough. Developers must take full accountability for their applications, covering both code and reliability. SREs and developers also need a shared framework built on a unified source of truth for service ownership, health, and dependencies. This foundation enables faster, more effective workflows and closes the gap between the teams.

Step-by-step: Shifting left in incident management

Consider a scenario where a high-severity incident occurs during peak traffic. SREs may have all the infrastructure metrics but lack insights into recent application updates or dependencies. On the other hand, developers might not have access to production monitoring tools, leaving them blind to the issue's root cause. This lack of shared responsibility turns a manageable problem into a prolonged outage.

Let’s explore a step-by-step guide for managing an incident or outage to demonstrate the impact of shifting left. 

1. Proactive prevention

Preventing incidents begins long before they occur. Teams can take several proactive steps to ensure production readiness:

  • Define ownership: Use a unified service catalog to establish clear ownership for every service, including its dependencies, health metrics, and escalation paths.
  • Automate readiness checks: Implement automated checks for production readiness, such as ensuring proper observability setups, validating CI/CD pipelines, and checking for outdated dependencies.
  • Monitor proactively: Set up alerts for potential issues, such as increasing error rates, slow response times, or anomalies in deployment processes. These alerts allow teams to address problems before they escalate.
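Automated readiness checks like these are often expressed as rules evaluated against each catalog entry. The Python sketch below is one way to encode the checks listed above; the `Service` fields (owner, dashboards, alerts, pipeline status, and installed versus latest major versions of dependencies) and the one-major-version threshold are illustrative assumptions, not the schema of any particular catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Service:
    """Hypothetical catalog entry; field names are illustrative only."""
    name: str
    owner: Optional[str]
    has_dashboards: bool
    has_alerts: bool
    pipeline_passing: bool
    # dependency name -> (installed major version, latest major version)
    dependencies: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def readiness_failures(svc: Service, max_major_lag: int = 1) -> List[str]:
    """Return the reasons svc fails production readiness (empty list = ready)."""
    failures = []
    if not svc.owner:
        failures.append("no owner defined")
    if not (svc.has_dashboards and svc.has_alerts):
        failures.append("observability incomplete: dashboards and alerts required")
    if not svc.pipeline_passing:
        failures.append("CI/CD pipeline is failing")
    # Flag dependencies that lag the latest release by more than the threshold.
    for dep, (installed, latest) in sorted(svc.dependencies.items()):
        lag = latest - installed
        if lag > max_major_lag:
            failures.append(f"{dep} is {lag} major versions behind")
    return failures


cart = Service(
    name="cart-service",
    owner="team-cart",
    has_dashboards=True,
    has_alerts=False,
    pipeline_passing=True,
    dependencies={"kafka-client": (1, 3), "http-lib": (4, 5)},
)
for reason in readiness_failures(cart):
    print(reason)
```

Running rules like these on every change lets the owning team see gaps before release rather than during an incident.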

2. Detecting and diagnosing the issue

When an incident occurs, swift detection and diagnosis are crucial:

  • Unified visibility: Teams use a centralized portal to access real-time metrics, logs, and dependency maps. This shared view ensures everyone has the information needed to assess the problem.
  • Ownership identification: The service catalog automatically identifies the responsible team or individual and notifies them through pre-configured communication channels like Slack or Teams.
  • Cross-functional insights: Both developers and SREs can see relevant details about recent deployments, configuration changes, and application updates, enabling faster root cause analysis.
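To make the diagnosis step concrete, here is a minimal Python sketch of what a catalog lookup might return when an alert fires: the owning team, where to notify it, and any deployments or configuration changes to the service or its direct dependencies within a recent window. The `CatalogEntry` and `Change` types, the service and channel names, and the six-hour default window are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one service."""
    owner_team: str
    channel: str  # where alerts for this service are sent
    depends_on: List[str] = field(default_factory=list)


@dataclass
class Change:
    """A deployment or configuration change recorded against a service."""
    service: str
    kind: str  # e.g. "deployment" or "config"
    at: datetime
    summary: str


def incident_context(service: str, catalog: Dict[str, CatalogEntry],
                     changes: List[Change], alert_time: datetime,
                     window: timedelta = timedelta(hours=6)) -> dict:
    """Collect the owner and recent changes to a service and its direct dependencies."""
    entry = catalog[service]
    in_scope = {service, *entry.depends_on}
    start = alert_time - window
    recent = sorted(
        (c for c in changes if c.service in in_scope and start <= c.at <= alert_time),
        key=lambda c: c.at,
        reverse=True,  # newest change first
    )
    return {"owner": entry.owner_team, "notify": entry.channel, "recent_changes": recent}


catalog = {
    "checkout": CatalogEntry("team-payments", "#payments-oncall", ["cart", "payments-db"]),
    "cart": CatalogEntry("team-cart", "#cart-oncall"),
}
changes = [
    Change("cart", "deployment", datetime(2025, 2, 17, 21, 40), "cart v2.3.1"),
    Change("checkout", "config", datetime(2025, 2, 17, 14, 0), "raised timeout"),
    Change("search", "deployment", datetime(2025, 2, 17, 21, 50), "search v9.0"),
]
ctx = incident_context("checkout", catalog, changes, datetime(2025, 2, 17, 22, 5))
print(ctx["owner"], ctx["notify"], [c.summary for c in ctx["recent_changes"]])
```

Here the recent `cart` deployment surfaces because `checkout` depends on `cart`, while the unrelated `search` deployment and the older configuration change fall outside the scope and window.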

3. Coordinating the response

With clear ownership and diagnostic data, the team can focus on resolving the issue:

  • Automated incident channels: An automated communication channel is created to bring together the right stakeholders and provide access to relevant tools and data.
  • Self-service remediation: Developers use predefined workflows to address the issue, such as rolling back a faulty deployment, restarting services, or scaling resources. These actions can be executed directly from the portal, reducing dependence on SRE intervention.
  • Escalation protocols: If the issue requires specialized expertise, SREs step in to handle complex problems or enforce operational standards.
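The split between self-service remediation and escalation can be sketched as a table of approved workflows: a developer can run anything in the table directly, and anything else is routed to SREs. In this hypothetical Python sketch the workflow functions only return a description of what they would do; a real implementation would call your deployment tooling or cloud provider.

```python
from typing import Callable, Dict


# Hypothetical predefined workflows. Real ones would call deployment or cloud
# tooling; these only describe the action they stand in for.
def rollback(service: str) -> str:
    return f"rolled back {service} to its previous release"


def restart(service: str) -> str:
    return f"restarted {service}"


def scale_out(service: str) -> str:
    return f"added capacity to {service}"


SELF_SERVICE: Dict[str, Callable[[str], str]] = {
    "rollback": rollback,
    "restart": restart,
    "scale_out": scale_out,
}


def respond(service: str, action: str) -> str:
    """Run an approved workflow, or escalate anything else to SRE on-call."""
    workflow = SELF_SERVICE.get(action)
    if workflow is None:
        return f"escalated '{action}' on {service} to SRE on-call"
    return workflow(service)


print(respond("cart-service", "rollback"))
print(respond("cart-service", "failover_database"))
```

Keeping the approved actions in one table makes the boundary explicit: SREs decide what goes into it, and developers can use anything it contains without a handoff.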

4. Post-incident improvements

After resolving the incident, teams focus on continuous improvement:

  • Root cause analysis: Teams collaborate to understand what went wrong and document their findings in the service catalog.
  • Tool enhancements: Adjust monitoring tools and automated workflows to prevent similar issues in the future.
  • Process refinement: Incorporate feedback to improve response procedures, training, and documentation.

As you can see, the solution is to redefine ownership and give everyone access to the tools they need. SREs should focus on setting standards and automating reliability tasks, while developers should own their applications end-to-end, including uptime and health.

Unified service catalogs: A key to shifting left

A unified service catalog can bridge this gap. It gives a clear view of services, their owners, and their dependencies, which makes it an essential piece of any "shift left" approach. As a single source of truth, it provides:

  • Clear ownership: Ensuring every service has a defined owner and team responsible for its health and reliability.
  • Comprehensive visibility: Offering insights into dependencies, configurations, and compliance with production readiness standards.
  • Efficient collaboration: Supporting self-service actions and automated workflows to enable faster, more effective incident resolution.
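Dependency visibility pays off most when you need to know what a failure affects. Assuming the catalog records each entity's direct dependencies, the impacted services are everything that reaches the failed entity through those relations. The Python sketch below inverts the dependency map and walks it breadth-first; the entity names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical catalog relations: entity -> entities it depends on directly.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout-service": ["cart-service", "payments-service"],
    "cart-service": ["cart-db", "kafka"],
    "payments-service": ["kafka"],
    "search-service": ["products-db"],
}


def impacted_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every entity that depends on `failed`, directly or transitively."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: Dict[str, List[str]] = {}
    for entity, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(entity)

    impacted: Set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in impacted:  # guards against revisits and cycles
                impacted.add(parent)
                queue.append(parent)
    return impacted


print(sorted(impacted_by("kafka", DEPENDS_ON)))
```

A Kafka failure reaches `checkout-service` only indirectly, which is exactly the kind of link that is easy to miss without a shared dependency map.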

While the service catalog is critical, it’s part of a broader ecosystem that includes self-service workflows, incident management automation, and collaboration tools. Together, these features empower teams to work more efficiently and confidently.

Real wins with unified tools

Teams that adopt a unified service catalog can improve both proactive prevention and reactive recovery. Here’s a deeper look at the benefits:

  • Proactive incident prevention: With automated compliance tracking, teams can identify and resolve issues before they escalate. For instance, a team might receive automated alerts when an application isn’t meeting production readiness criteria, such as missing observability setups or outdated dependencies. By addressing these gaps before release, the team avoids outages and ensures smoother launches.
  • Faster recovery times: When a key service goes down during a peak traffic event, the responsible developer doesn’t have to wait for SREs to intervene. They can follow a predefined remediation path in the portal, such as rolling back a recent deployment, restarting a service, or scaling resources with a single click. This can significantly reduce the mean time to recovery (MTTR).
  • Improved collaboration: With clear visibility into ownership, teams avoid confusion during high-pressure situations. For example, when a failure occurs, a unified portal immediately identifies the service owner and pulls in relevant stakeholders through automated Slack channels. Teams can focus on solving the problem rather than debating who should take action.
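MTTR is the average time it takes to restore service across incidents. The short Python sketch below measures each incident from detection to resolution, which is one common convention (some teams measure from when the incident began); the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mttr(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recovery: average of (resolved - detected) across incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined when there are no incidents")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Made-up incidents as (detected, resolved) pairs: 90, 40 and 50 minutes.
incidents = [
    (datetime(2025, 2, 3, 22, 0), datetime(2025, 2, 3, 23, 30)),
    (datetime(2025, 2, 9, 10, 0), datetime(2025, 2, 9, 10, 40)),
    (datetime(2025, 2, 14, 14, 0), datetime(2025, 2, 14, 14, 50)),
]
print(mttr(incidents))  # 1:00:00
```

Tracking this number before and after introducing self-service workflows is one way to check whether they actually shorten recovery.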

Imagine a critical outage occurs late at night. Instead of scrambling to figure out who owns the impacted service, the unified portal automatically creates a dedicated Slack channel for the incident, notifies the service owner, and provides access to critical metrics, logs, and dependency maps. Within minutes, the team can collaborate effectively to resolve the issue, cutting downtime dramatically. This streamlined approach exemplifies the power of shifting left: equipping teams with tools to act quickly, confidently, and efficiently.
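The automated channel in this scenario comes down to two small decisions: what to name the channel and whom to invite. This Python sketch assumes Slack-style naming rules (lowercase letters, digits, hyphens, and underscores, at most 80 characters) and a hypothetical `inc-<id>-<service>` convention; it builds the name and an ordered, de-duplicated invite list but does not call any Slack API.

```python
import re
from typing import List


def incident_channel_name(incident_id: int, service: str) -> str:
    """Build a name such as 'inc-1042-cart-service'.

    Assumes Slack-style rules: lowercase letters, digits, hyphens and
    underscores only, at most 80 characters.
    """
    # Collapse every run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9_-]+", "-", service.lower()).strip("-")
    return f"inc-{incident_id}-{slug}"[:80].rstrip("-")


def incident_invitees(owner: str, on_call: List[str], stakeholders: List[str]) -> List[str]:
    """Owner first, then on-call and stakeholders, with duplicates removed."""
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return list(dict.fromkeys([owner, *on_call, *stakeholders]))


print(incident_channel_name(1042, "Cart Service (EU)"))
print(incident_invitees("alice", ["bob", "alice"], ["carol"]))
```

Putting the service owner first in the invite list mirrors the ownership model: the person responsible for the service is always in the room.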

A new ownership model

Shifting left supports a shared accountability model. Developers own their applications, including reliability. SREs provide guidance, tools, and high-level support when needed. This balance ensures everyone can focus on what they do best.

For example, developers take the lead in managing the response during an incident. They use the tools the service catalog provides to diagnose and fix the issue. SREs step in only for complex problems or to ensure standards are met. This approach reduces bottlenecks and empowers teams to work more effectively.

Ready to shift left?

A unified service catalog can transform how SREs and developers work together. It clarifies ownership, reduces bottlenecks, and keeps systems reliable. Talk with others who are shifting left in Port’s community, or explore Port’s live demo to see how you can shift left.

Order Domain

{
  "properties": {},
  "relations": {},
  "title": "Orders",
  "identifier": "Orders"
}

Cart System

{
  "properties": {},
  "relations": {
    "domain": "Orders"
  },
  "identifier": "Cart",
  "title": "Cart"
}

Products System

{
  "properties": {},
  "relations": {
    "domain": "Orders"
  },
  "identifier": "Products",
  "title": "Products"
}

Cart Resource

{
  "properties": {
    "type": "postgress"
  },
  "relations": {},
  "icon": "GPU",
  "title": "Cart SQL database",
  "identifier": "cart-sql-sb"
}

Cart API

{
 "identifier": "CartAPI",
 "title": "Cart API",
 "blueprint": "API",
 "properties": {
   "type": "Open API"
 },
 "relations": {
   "provider": "CartService"
 },
 "icon": "Link"
}

Core Kafka Library

{
  "properties": {
    "type": "library"
  },
  "relations": {
    "system": "Cart"
  },
  "title": "Core Kafka Library",
  "identifier": "CoreKafkaLibrary"
}

Core Payment Library

{
  "properties": {
    "type": "library"
  },
  "relations": {
    "system": "Cart"
  },
  "title": "Core Payment Library",
  "identifier": "CorePaymentLibrary"
}

Cart Service JSON

{
 "identifier": "CartService",
 "title": "Cart Service",
 "blueprint": "Component",
 "properties": {
   "type": "service"
 },
 "relations": {
   "system": "Cart",
   "resources": [
     "cart-sql-sb"
   ],
   "consumesApi": [],
   "components": [
     "CorePaymentLibrary",
     "CoreKafkaLibrary"
   ]
 },
 "icon": "Cloud"
}

Products Service JSON

{
  "identifier": "ProductsService",
  "title": "Products Service",
  "blueprint": "Component",
  "properties": {
    "type": "service"
  },
  "relations": {
    "system": "Products",
    "consumesApi": [
      "CartAPI"
    ],
    "components": []
  }
}

Component Blueprint

{
 "identifier": "Component",
 "title": "Component",
 "icon": "Cloud",
 "schema": {
   "properties": {
     "type": {
       "enum": [
         "service",
         "library"
       ],
       "icon": "Docs",
       "type": "string",
       "enumColors": {
         "service": "blue",
         "library": "green"
       }
     }
   },
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {
   "system": {
     "target": "System",
     "required": false,
     "many": false
   },
   "resources": {
     "target": "Resource",
     "required": false,
     "many": true
   },
   "consumesApi": {
     "target": "API",
     "required": false,
     "many": true
   },
   "components": {
     "target": "Component",
     "required": false,
     "many": true
   },
   "providesApi": {
     "target": "API",
     "required": false,
     "many": false
   }
 }
}

Resource Blueprint

{
 "identifier": "Resource",
 "title": "Resource",
 "icon": "DevopsTool",
 "schema": {
   "properties": {
     "type": {
       "enum": [
         "postgress",
         "kafka-topic",
         "rabbit-queue",
         "s3-bucket"
       ],
       "icon": "Docs",
       "type": "string"
     }
   },
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {}
}

API Blueprint

{
 "identifier": "API",
 "title": "API",
 "icon": "Link",
 "schema": {
   "properties": {
     "type": {
       "type": "string",
       "enum": [
         "Open API",
         "grpc"
       ]
     }
   },
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {
   "provider": {
     "target": "Component",
     "required": true,
     "many": false
   }
 }
}

Domain Blueprint

{
 "identifier": "Domain",
 "title": "Domain",
 "icon": "Server",
 "schema": {
   "properties": {},
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {}
}

System Blueprint

{
 "identifier": "System",
 "title": "System",
 "icon": "DevopsTool",
 "schema": {
   "properties": {},
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {
   "domain": {
     "target": "Domain",
     "required": true,
     "many": false
   }
 }
}

Microservices SDLC

  • Scaffold a new microservice

  • Deploy (canary or blue-green)

  • Feature flagging

  • Revert

  • Lock deployments

  • Add secret

  • Force merge pull request (skip tests during a crisis)

  • Add environment variable to service

  • Add IaC to the service

  • Upgrade package version

Development environments

  • Spin up a developer environment for 5 days

  • ETL mock data to environment

  • Invite developer to the environment

  • Extend TTL by 3 days

Cloud resources

  • Provision a cloud resource

  • Modify a cloud resource

  • Get permissions to access cloud resource

SRE actions

  • Update pod count

  • Update auto-scaling group

  • Execute incident response runbook automation

Data Engineering

  • Add, remove, or update a table column

  • Run Airflow DAG

  • Duplicate table

Backoffice

  • Change customer configuration

  • Update customer software version

  • Upgrade or downgrade plan tier

  • Create or delete customer

Machine learning actions

  • Train model

  • Pre-process dataset

  • Deploy

  • A/B testing traffic route

  • Revert

  • Spin up remote Jupyter notebook

Engineering tools

  • Observability

  • Task management

  • CI/CD

  • On-call management

  • Troubleshooting tools

  • DevSecOps

  • Runbooks

Infrastructure

  • Cloud resources

  • Kubernetes (K8s)

  • Containers & Serverless

  • IaC

  • Databases

  • Environments

  • Regions

Software and more

  • Microservices

  • Docker images

  • Docs

  • APIs

  • Third-party services

  • Runbooks

  • Cron jobs
