Intro
Without day-2 operations, internal developer portals aren’t useful. Here’s why:
Let’s start by talking about self-service actions. Self-service actions are often the first thing that developers associate with internal developer portals. There’s a good reason for that: self-service actions give developers a better experience by promoting autonomy along with guardrails and golden paths. They also help to ensure compliance with standards in areas such as security and cost, because those standards are baked into each action.
Self-service actions come in three different categories:
- Create - the action will result in the creation of a new entity by triggering a provisioning process in your infrastructure.
- Day-2 - the action will trigger logic in your infrastructure to update or modify an existing entity in your catalog.
- Delete - the action will result in the deletion of an existing entity by triggering delete logic in your infrastructure.
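To make the distinction concrete, here is a minimal Python sketch of the three categories. It is not Port's data model: the class names, the `needs_existing_entity` helper and the sample "Delete ephemeral environment" action are assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ActionCategory(Enum):
    CREATE = "create"  # provisions a new entity
    DAY_2 = "day-2"    # updates or modifies an existing entity
    DELETE = "delete"  # removes an existing entity


@dataclass(frozen=True)
class SelfServiceAction:
    name: str
    category: ActionCategory

    def needs_existing_entity(self) -> bool:
        # Create actions start from nothing; day-2 and delete actions
        # always target an entity that is already in the catalog.
        return self.category is not ActionCategory.CREATE


# Illustrative actions only; the names are examples, not a fixed list.
actions = [
    SelfServiceAction("Scaffold a new microservice", ActionCategory.CREATE),
    SelfServiceAction("Change replicas count", ActionCategory.DAY_2),
    SelfServiceAction("Delete ephemeral environment", ActionCategory.DELETE),
]

for action in actions:
    print(action.name, action.category.value, action.needs_existing_entity())
```

The practical difference shows up in `needs_existing_entity`: a day-2 action always runs against something that already exists in the catalog.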
Usually, platform engineers focus on the ‘create’ self-service action category. The typical self-service action they want to provide developers is scaffolding a new microservice, since no one wants to wait on this action, and it usually involves many steps. In fact, it’s perhaps what they think of first when they think of self-service actions and using a portal.
This makes sense, as it’s a real pain point and naturally where anyone would start: adding a cloud resource, scaffolding a service, provisioning a lambda function, etc. This is the bread and butter of platform engineering, and a portal delivers quick wins here - getting devs on board and making DevOps teams’ lives easier.
But while the focus is often on creating, what’s being overlooked is the day-2 operations category. Day-2 operations as self-service actions are crucial for providing everyone in the platform engineering team with the arsenal they need to really impact the organization’s software development life cycle positively.
Here are two key reasons why:
1. Overall goal vs a quick win
No matter which use case you’re using the portal for, you have to consider the overall goal that you’re trying to achieve - and create an end-to-end experience for your developers to achieve that goal.
A ‘create’ self-service action will help you to achieve a quick win, but it is only one part of an end-to-end experience. As a platform engineer, you’ll want to provide the developer with ways that make the entire journey easier - not just one action. What is the entire developer workflow? It’s providing value to the developer from planning to post-production operations (read here for some ideas on a portal MVP that covers the developer SDLC journey).
Boosting the first step of a process is an important way of working toward a goal, but it doesn’t tell the full story.
‘Create’ actions are generally carried out once at the beginning of a process, whereas day-2 operations happen repeatedly. Because these tasks are ongoing, improving day-2 operations can significantly boost teams’ productivity and system reliability over time. It’s crucial to focus on these recurring tasks, such as updates, modifications, and scaling, to provide real, lasting value to developers and the entire organization.
For example, if an organization’s overall objective is to improve developer productivity, one of the key metrics (KPIs) that is an indicator of productivity is deployment frequency. To improve deployment frequency, you could look at how your team currently brings a feature to production. By breaking the process into smaller steps, you can optimize each step. By doing so, you can improve deployment frequency, and as a result, improve developer productivity.
So for bringing a feature to production, enabling a user to ‘scaffold a new service’ as a self-service action is a great start, but it’s not enough to streamline the entire process. There are so many other parts of the journey: managing secrets, creating storage, waiting for PR reviews, CI/CD, etc.
This is where you’ll want to streamline how developers:
- Spin up a dev environment
- Add a secret
- Send a reminder to review a PR
- Build
- Promote
- Roll back to a previous version
Only when you consider these capabilities as a whole, and provide them to the developer in an easy way, can you really impact deployment frequency (and therefore, developer productivity).
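To know whether these capabilities are moving the needle, you need a baseline for deployment frequency. The sketch below is one simple way to compute it, assuming you can export the dates of production deployments. The function name and the sample data are hypothetical.

```python
from datetime import date


def deployments_per_week(deploy_dates, window_start, window_end):
    """Average deployments per 7-day week within an inclusive date window."""
    window_days = (window_end - window_start).days + 1
    if window_days <= 0:
        raise ValueError("window_end must not be before window_start")
    # Only count deployments that fall inside the window.
    in_window = [d for d in deploy_dates if window_start <= d <= window_end]
    return len(in_window) / (window_days / 7)


# Hypothetical data: six production deployments over a two-week window.
deploys = [date(2024, 1, d) for d in (2, 3, 5, 9, 10, 12)]
frequency = deployments_per_week(deploys, date(2024, 1, 1), date(2024, 1, 14))
print(f"{frequency:.1f} deployments per week")
```

Tracking the same number before and after rolling out day-2 self-service actions gives you a like-for-like comparison.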
Take another example, where the main goal is to improve incident response times.
Here, it’s a good start to quickly know who’s on call and who the owner of a particular service is.
Is it enough? In short, no. You’ll also want to enable developers to:
- Get JIT permissions to production
- Easily restart a service
- Roll back to a previous version
- Scale up a cloud resource
Only then can you actually streamline the management and resolution of incidents and improve MTTR.
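MTTR can be tracked in the same spirit. The sketch below takes MTTR to mean mean time to resolve and assumes each incident is recorded as an (opened, resolved) pair of timestamps. Both the function and the sample incidents are illustrative.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Mean of (resolved - opened) across (opened, resolved) pairs."""
    if not incidents:
        raise ValueError("at least one resolved incident is required")
    durations = [resolved - opened for opened, resolved in incidents]
    # sum() needs a timedelta start value to add timedeltas together.
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents: one resolved in 30 minutes, one in 90 minutes.
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 30)),
    (datetime(2024, 1, 8, 14, 0), datetime(2024, 1, 8, 15, 30)),
]
mttr = mean_time_to_resolve(incidents)
print(f"MTTR: {mttr}")
```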
In both examples, as a platform engineer, the focus has to be on the overarching objective, and building an end-to-end developer experience that helps your team to hit those goals, rather than the sporadic, quick-win actions that boost only one step of the process, but won’t make a big impact on the end goal on their own.
2. Best practices require day-2 ops
It’s in our DNA to take shortcuts. We want to use the easiest method to get something done. What often stops developers from being able to do this is that they have to comply with standards using a cumbersome process. If you want devs to follow best practices, you have to make the compliant way the easiest way too.
Golden paths are well-defined practices and workflows that guide developers through ways to develop, deploy and manage software. They provide the right balance between developer autonomy and complying with engineering standards. By setting golden paths, you’re effectively providing developers with the ‘right’ way of doing things, but also, making it clear and easy for them to follow. After all, the easier something is to follow, the better chance you have of people following it.
Offering day-2 operations as self-service actions helps ensure that developers can always follow golden paths.
Take extending the TTL of an ephemeral environment as an example.
You could either:
- Open a ticket to DevOps, which can take days, in which time many devs will get fed up and find a workaround (such as a friend who has the right permissions to do it for them). As a result, you’ll end up with a bad experience and perhaps an ephemeral environment that will not terminate when it should.
This is definitely *not* a golden path.
- Create a self-service action where the developer can request an extension of the TTL of an ephemeral environment. The experience is seamless, it reduces the burden on DevOps, and it takes cost optimization into account.
This *is* a golden path.
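Here is a rough sketch of what the logic behind such an action could look like, with the cost guardrail built in. Everything in it is an assumption for illustration: the `EphemeralEnvironment` type, the `extend_ttl` function and the 14-day lifetime cap are not taken from Port or from any specific platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed policy: an environment may never live longer than 14 days in total.
MAX_TOTAL_LIFETIME = timedelta(days=14)


@dataclass
class EphemeralEnvironment:
    name: str
    created_at: datetime
    expires_at: datetime


def extend_ttl(env: EphemeralEnvironment, extra: timedelta) -> datetime:
    """Extend the environment's expiry, refusing requests that break policy."""
    if extra <= timedelta(0):
        raise ValueError("extension must be positive")
    new_expiry = env.expires_at + extra
    if new_expiry - env.created_at > MAX_TOTAL_LIFETIME:
        # The guardrail: the developer gets an immediate, clear refusal
        # instead of an environment that quietly runs forever.
        raise PermissionError("extension exceeds the maximum environment lifetime")
    env.expires_at = new_expiry
    return new_expiry


env = EphemeralEnvironment(
    name="pr-preview",
    created_at=datetime(2024, 1, 1),
    expires_at=datetime(2024, 1, 6),
)
print(extend_ttl(env, timedelta(days=3)))
```

The developer gets the extension in seconds, and the cap means the environment still terminates when it should.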
It’s clear that day-2 ops as self-service actions can help engineering teams to:
- Maintain best practices, and ensure golden paths can be used and followed
- Achieve their objectives
How day-2 operations work with Port
With Port, you can go beyond only using the ‘create’ self-service actions, and enable your developers to provision, terminate and perform day-2 operations with self-service actions.
That includes:
- Restart running service
- Extend developer environment TTL
- Get temporary permission to cloud resource
- Create new Jira issue
- Change replicas count
- Change ownership of service
- Add a secret
- Toggle feature flag
- Rollback service
…and many more.
Port works with your backend, meaning that a self-service action can trigger any CI pipeline or HTTP endpoint you already have. It supports long-running and asynchronous actions and shows developers the run logs they need, along with TTL support. The self-service actions are loosely coupled to the underlying infrastructure of the platform, so developers get the same experience regardless of how the infrastructure evolves.
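The idea of loose coupling and asynchronous runs can be pictured with a small, self-contained model. To be clear, this is not Port's API: `ActionRun`, `ActionRegistry`, `report_result` and the backend function are hypothetical names used only to show the pattern of triggering a backend, returning immediately, and collecting logs and a final status later.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ActionRun:
    action: str
    inputs: dict
    status: str = "IN_PROGRESS"
    logs: List[str] = field(default_factory=list)


# A backend is anything that can start work for a run: a CI pipeline
# trigger, an HTTP call, and so on. Here it is just a Python callable.
Backend = Callable[[ActionRun], None]


class ActionRegistry:
    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, action: str, backend: Backend) -> None:
        # Swapping the backend does not change the action name that
        # developers see, which is the loose coupling described above.
        self._backends[action] = backend

    def trigger(self, action: str, inputs: dict) -> ActionRun:
        run = ActionRun(action=action, inputs=inputs)
        run.logs.append(f"run created for '{action}'")
        self._backends[action](run)
        return run  # returned while the work may still be in progress


def restart_via_ci_pipeline(run: ActionRun) -> None:
    # Stand-in for queuing a real pipeline; the run stays IN_PROGRESS.
    run.logs.append(f"queued CI job to restart {run.inputs['service']}")


def report_result(run: ActionRun, success: bool, message: str) -> None:
    # Called later by the backend once the long-running work finishes.
    run.logs.append(message)
    run.status = "SUCCESS" if success else "FAILURE"


registry = ActionRegistry()
registry.register("restart_service", restart_via_ci_pipeline)
run = registry.trigger("restart_service", {"service": "cart-service"})
report_result(run, True, "service restarted")
print(run.status, run.logs)
```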
Context through catalog
In Port, self-service actions are reflected in the catalog, which is always kept up to date. Associating day-2 ops with the catalog means you can act with the full context in mind. For instance, you can view all of your running services along with the number of critical alerts associated with them and their CPU and memory usage. With that context, users can trigger day-2 ops to change replica counts, roll back the running service or restart it.
In addition, users can configure permission rules based on the data in their catalog. For example, a user can deploy a service to production only if the user is the leader of the team that owns the service.
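The team-leader rule above can be expressed as a small policy check over catalog data. The sketch below is illustrative only: the `Team` and `Service` types and the `can_deploy` function are hypothetical, and treating non-production deployments as unrestricted is an assumption added for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    name: str
    leader: str


@dataclass(frozen=True)
class Service:
    name: str
    owning_team: Team


def can_deploy(user: str, service: Service, environment: str) -> bool:
    """Allow production deploys only for the leader of the owning team."""
    if environment != "production":
        # Assumption for this sketch: other environments are unrestricted.
        return True
    return user == service.owning_team.leader


checkout = Team(name="checkout", leader="dana")
cart_service = Service(name="cart-service", owning_team=checkout)

print(can_deploy("dana", cart_service, "production"))
print(can_deploy("sam", cart_service, "production"))
```

Because the rule reads ownership from the catalog, it stays correct when a service changes hands, with no permission list to update by hand.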
Conclusion
Day-2 operations as self-service actions are essential for a portal. Without them, you can’t really build an end-to-end experience for developers. And that would mean you can’t:
- Help developers to adhere to standards or provide them with a golden path
- Achieve the team’s wider objectives such as improving developer experience, developer productivity, optimizing costs or meeting SLAs/SLOs.
Want to learn how to use self-service actions in Port? Check out our docs here.
Unsure on how to get started with a portal? Check out this blog.