How site reliability engineers (SREs) can "shift left" using a unified service catalog

Site reliability engineers (SREs) and developers often face the challenge of balancing speed with stability. Developers tend to focus on building and shipping features, while SREs make sure those features run smoothly in production. But when something breaks, the lines blur, and that’s where the problems start.
The "shift left" movement offers a way forward. It allows teams to tackle reliability and operational concerns earlier in the development process. By sharing ownership, teams can reduce friction and work better together.
The SRE and developer disconnect
SREs are responsible for maintaining reliable systems, overseeing uptime, managing incidents, and handling cloud infrastructure. Developers focus on writing code and shipping features. However, these roles often overlap, creating friction between them.
This tension arises from misaligned priorities and a lack of visibility into each other’s workflows. Developers prioritize shipping features and may neglect production requirements until problems arise. While they create applications, they often don’t feel accountable for their reliability. Conversely, SREs strive to maintain uptime but may lack context regarding recent application changes. These dynamics lead to inefficiencies, such as:
- Incomplete visibility leaves SREs unprepared during incidents due to missing insights into recent deployments, dependencies, or configuration changes.
- Fragmented ownership results in unclear accountability, causing delays in resolving critical issues.
- A lack of shared frameworks hinders communication and coordination, particularly during high-pressure incidents.
- Without clear processes, product owners or business stakeholders may put additional pressure on SREs, exacerbating an already stressful situation.
Developers are increasingly embracing the shift-left movement by focusing on production requirements, secure coding, and AI tools that enhance their workflows, but these efforts alone are insufficient. Developers must take full accountability for their applications, covering both code and reliability. SREs and developers must also collaborate on a shared framework with a unified source of truth for service ownership, health, and dependencies. This foundation enables faster, more effective workflows and reduces the disconnect between teams.
Step-by-step: Shifting left in incident management
Consider a scenario where a high-severity incident occurs during peak traffic. SREs may have all the infrastructure metrics but lack insights into recent application updates or dependencies. On the other hand, developers might not have access to production monitoring tools, leaving them blind to the issue's root cause. This lack of shared responsibility turns a manageable problem into a prolonged outage.
Let’s explore a step-by-step guide for managing an incident or outage to demonstrate the impact of shifting left.
1. Proactive prevention
Preventing incidents begins long before they occur. Teams can take several proactive steps to ensure production readiness:
- Define ownership: Use a unified service catalog to establish clear ownership for every service, including its dependencies, health metrics, and escalation paths.
- Automate readiness checks: Implement automated checks for production readiness, such as ensuring proper observability setups, validating CI/CD pipelines, and checking for outdated dependencies.
- Monitor proactively: Set up alerts for potential issues, such as increasing error rates, slow response times, or anomalies in deployment processes. These alerts allow teams to address problems before they escalate.
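As a rough illustration of the ownership and readiness points above, the following Python sketch models a catalog entry and a readiness check. The `Service` fields and the `readiness_gaps` function are hypothetical names chosen for this example, not the schema or API of any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Service:
    """One catalog entry. All field names are illustrative assumptions."""
    name: str
    owner_team: Optional[str] = None
    escalation_contact: Optional[str] = None
    dependencies: List[str] = field(default_factory=list)
    has_dashboards: bool = False
    has_alerts: bool = False
    outdated_dependencies: List[str] = field(default_factory=list)


def readiness_gaps(service: Service) -> List[str]:
    """Return human-readable readiness gaps; an empty list means ready."""
    gaps = []
    if not service.owner_team:
        gaps.append("no owning team")
    if not service.escalation_contact:
        gaps.append("no escalation contact")
    if not (service.has_dashboards and service.has_alerts):
        gaps.append("observability incomplete")
    if service.outdated_dependencies:
        gaps.append(
            "outdated dependencies: " + ", ".join(service.outdated_dependencies)
        )
    return gaps
```

A check like this could run in CI or on a schedule, so that a service with a non-empty gap list is flagged before release rather than during an incident.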
2. Detecting and diagnosing the issue
When an incident occurs, swift detection and diagnosis are crucial:
- Unified visibility: Teams use a centralized portal to access real-time metrics, logs, and dependency maps. This shared view ensures everyone has the information needed to assess the problem.
- Ownership identification: The service catalog automatically identifies the responsible team or individual and notifies them through pre-configured communication channels like Slack or Teams.
- Cross-functional insights: Both developers and SREs can see relevant details about recent deployments, configuration changes, and application updates, enabling faster root cause analysis.
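The detection steps above amount to two lookups: who owns the service, and what changed shortly before the incident. The sketch below shows one way to combine them, assuming the catalog exposes owners and a change log. `OWNERS`, `CHANGES`, and `triage_context` are made-up names with sample data, and the two-hour lookback is an arbitrary default.

```python
from datetime import datetime, timedelta

# Hypothetical catalog data, hard-coded here for illustration only.
OWNERS = {"cart": {"team": "checkout", "channel": "#checkout-oncall"}}
CHANGES = [
    {"service": "cart", "kind": "deploy", "at": datetime(2025, 1, 10, 9, 0)},
    {"service": "cart", "kind": "config", "at": datetime(2025, 1, 10, 12, 30)},
    {"service": "cart", "kind": "deploy", "at": datetime(2025, 1, 10, 13, 40)},
    {"service": "products", "kind": "deploy", "at": datetime(2025, 1, 10, 13, 50)},
]


def triage_context(service, incident_start, lookback=timedelta(hours=2)):
    """Collect the owner and the changes made shortly before an incident."""
    window_start = incident_start - lookback
    recent = [
        change
        for change in CHANGES
        if change["service"] == service
        and window_start <= change["at"] <= incident_start
    ]
    # Most recent change first: it is usually the first suspect.
    recent.sort(key=lambda change: change["at"], reverse=True)
    return {"owner": OWNERS.get(service), "recent_changes": recent}
```

In a real setup the returned owner's channel would be passed to a chat integration for notification; that call is left out because it depends on the tooling in use.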
3. Coordinating the response
With clear ownership and diagnostic data, the team can focus on resolving the issue:
- Automated incident channels: An automated communication channel is created to bring together the right stakeholders and provide access to relevant tools and data.
- Self-service remediation: Developers use predefined workflows to address the issue, such as rolling back a faulty deployment, restarting services, or scaling resources. These actions can be executed directly from the portal, reducing dependence on SRE intervention.
- Escalation protocols: If the issue requires specialized expertise, SREs step in to handle complex problems or enforce operational standards.
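The split between self-service remediation and escalation can be pictured as a small routing table. In this sketch the handlers are placeholders that only return a description of what they would do. The names `SELF_SERVICE` and `remediate` are assumptions made for illustration, and a real portal would call deployment or orchestration tooling instead.

```python
from typing import Callable, Dict


def rollback(service: str) -> str:
    # Placeholder: a real handler would trigger the deployment tool.
    return f"rolled back {service} to the previous release"


def restart(service: str) -> str:
    # Placeholder: a real handler would restart the service's instances.
    return f"restarted {service}"


# Actions developers may run without SRE involvement (illustrative set).
SELF_SERVICE: Dict[str, Callable[[str], str]] = {
    "rollback": rollback,
    "restart": restart,
}


def remediate(service: str, action: str) -> str:
    """Run a predefined workflow, or escalate when none exists."""
    handler = SELF_SERVICE.get(action)
    if handler is None:
        return f"escalated to SRE: '{action}' on {service} is not self-service"
    return handler(service)
```

Keeping the allowed actions in one table makes the boundary explicit: SREs decide what goes into it, and developers can run anything listed without waiting.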
4. Post-incident improvements
After resolving the incident, teams focus on continuous improvement:
- Root cause analysis: Teams collaborate to understand what went wrong and document their findings in the service catalog.
- Tool enhancements: Adjust monitoring tools and automated workflows to prevent similar issues in the future.
- Process refinement: Incorporate feedback to improve response procedures, training, and documentation.
As you can see, the solution is to redefine ownership and give everyone access to the tools they need. SREs should focus on setting standards and automating reliability tasks, while developers should own their applications end-to-end, including uptime and health.
Unified service catalogs: A key to shifting left
A unified service catalog can bridge the gap. It provides a clear view of services, their owners, and their dependencies. This is an essential piece when implementing the "shift left" approach. By serving as a single source of truth, it provides:
- Clear ownership: Ensuring every service has a defined owner and team responsible for its health and reliability.
- Comprehensive visibility: Offering insights into dependencies, configurations, and compliance with production readiness standards.
- Efficient collaboration: Supporting self-service actions and automated workflows to enable faster, more effective incident resolution.
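One practical use of the dependency data a catalog holds is estimating which services are affected when a component fails. The following sketch walks a dependency map breadth-first. The service names and the `impacted_by` function are invented for this example and assume the catalog can export a simple "service depends on" mapping.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on.
DEPENDS_ON = {
    "storefront": ["cart-api", "products-api"],
    "cart-api": ["payments-lib", "kafka"],
    "products-api": ["kafka"],
    "payments-lib": [],
    "kafka": [],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    seen = {failed}  # Guards against cycles in the dependency map.
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - {failed}
```

With the sample data, `impacted_by("kafka", DEPENDS_ON)` returns `{"cart-api", "products-api", "storefront"}`, which is the list of owners an incident channel would need to pull in.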
While the service catalog is critical, it’s part of a broader ecosystem that includes self-service workflows, incident management automation, and collaboration tools. Together, these pieces empower teams to work more efficiently and confidently.
Real wins with unified tools
Teams using unified service catalogs see improvements in proactive prevention and reactive recovery. Here’s a deeper look at the benefits:
- Proactive incident prevention: With automated compliance tracking, teams can identify and resolve issues before they escalate. For instance, a team might receive automated alerts when an application isn’t meeting production readiness criteria, such as missing observability setups or outdated dependencies. By addressing these gaps before release, the team avoids outages and ensures smoother launches.
- Faster recovery times: During an incident, such as a key service going down during a peak traffic event, the responsible developer doesn’t have to wait for SREs to intervene. They can follow a predefined remediation path in the portal, such as rolling back a recent deployment, restarting a service, or scaling resources with a single click. This significantly reduces the mean time to recovery (MTTR).
- Improved collaboration: With clear visibility into ownership, teams avoid confusion during high-pressure situations. For example, when a failure occurs, a unified portal immediately identifies the service owner and pulls in relevant stakeholders through automated Slack channels. Teams can focus on solving the problem rather than debating who should take action.
Imagine a critical outage occurs late at night. Instead of scrambling to figure out who owns the impacted service, the unified portal automatically creates a dedicated Slack channel for the incident, notifies the service owner, and provides access to critical metrics, logs, and dependency maps. Within minutes, the team can collaborate effectively to resolve the issue, cutting downtime dramatically. This streamlined approach exemplifies the power of shifting left: equipping teams with tools to act quickly, confidently, and efficiently.
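To check whether recovery times are actually improving, teams can compute MTTR from incident records. This is a minimal sketch that assumes each incident is recorded as a pair of detection and resolution timestamps. The function name and the input shape are illustrative, and some teams measure from the start of impact rather than from detection.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average (resolved - detected) over a list of timestamp pairs."""
    durations = [resolved - detected for detected, resolved in incidents]
    if not durations:
        raise ValueError("need at least one incident to compute MTTR")
    # Start the sum from a zero timedelta so the durations add correctly.
    return sum(durations, timedelta()) / len(durations)
```

For two incidents that took 30 and 90 minutes to resolve, the function returns a `timedelta` of one hour. Comparing this figure before and after introducing self-service workflows gives a simple way to quantify the effect.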
A new ownership model
Shifting left supports a shared accountability model. Developers own their applications, including reliability. SREs provide guidance, tools, and high-level support when needed. This balance ensures everyone can focus on what they do best.
For example, developers take the lead in managing the response during an incident. They use the tools the service catalog provides to diagnose and fix the issue. SREs step in only for complex problems or to ensure standards are met. This approach reduces bottlenecks and empowers teams to work more effectively.
Ready to shift left?
A unified service catalog can transform how SREs and developers collaborate. It reduces bottlenecks and keeps systems reliable. Speak to like-minded people who are also shifting left in Port’s community, or see how you can shift left using Port’s live demo.