Give day-2 operations autonomy to your developers

May 13, 2024

Intro

Without day-2 operations, internal developer portals aren’t useful. Here’s why: 

Let's start by talking about self-service actions. Self-service actions are often the first thing that developers associate with internal developer portals. There’s a good reason for that: self-service actions give developers a better experience by promoting autonomy within guardrails and golden paths. They also help ensure compliance with security and cost standards, because those standards are baked into each action. 

Self-service actions come in three different categories: 

  • Create - the action will result in the creation of a new entity by triggering a provisioning process in your infrastructure.
  • Day-2 - the action will trigger logic in your infrastructure to update or modify an existing entity in your catalog.
  • Delete - the action will result in the deletion of an existing entity by triggering delete logic in your infrastructure.
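
To make the three categories concrete, here is a minimal Python sketch of how they could be modelled. The class and field names (`ActionCategory`, `SelfServiceAction`, `needs_existing_entity`) and the "Tear down a dev environment" action are hypothetical and chosen for illustration; this is not Port's data model.

```python
from dataclasses import dataclass
from enum import Enum


class ActionCategory(Enum):
    CREATE = "create"  # provisions a new entity
    DAY_2 = "day-2"    # updates or modifies an existing entity
    DELETE = "delete"  # removes an existing entity


@dataclass(frozen=True)
class SelfServiceAction:
    name: str
    category: ActionCategory

    def needs_existing_entity(self) -> bool:
        # Only 'create' actions start without an entity already in the catalog.
        return self.category is not ActionCategory.CREATE


actions = [
    SelfServiceAction("Scaffold a new microservice", ActionCategory.CREATE),
    SelfServiceAction("Roll back to a previous version", ActionCategory.DAY_2),
    SelfServiceAction("Tear down a dev environment", ActionCategory.DELETE),
]

# Pick out the day-2 actions, the category this post focuses on.
day_2_names = [a.name for a in actions if a.category is ActionCategory.DAY_2]
```

The practical difference shows up in `needs_existing_entity`: day-2 and delete actions always run against something that is already in the catalog, which is why the catalog context matters so much for them.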

Usually, platform engineers focus on the ‘create’ self-service action category. The typical self-service action they want to provide developers is scaffolding a new microservice, since no one wants to wait on this action, and it usually involves many steps. In fact, it’s perhaps what they think of first when they think of self-service actions and using a portal.  

This makes sense, as it’s a real pain point and naturally where anyone would start: adding a cloud resource, scaffolding a service, provisioning a lambda function, etc. This is the bread and butter of platform engineering, and a portal delivers quick wins here - getting devs on board and making DevOps teams’ lives easier.

But while the focus is often on creating, what’s being overlooked is the day-2 operations category. Day-2 operations as self-service actions are crucial for providing everyone in the platform engineering team with the arsenal they need to really impact the organization’s software development life cycle positively.

Here are two key reasons why:

1. Overall goal vs a quick win 

No matter which use case you’re using the portal for, you have to consider the overall goal that you’re trying to achieve - and create an end-to-end experience for your developers to achieve that goal.

A ‘create’ self-service action will help you to achieve a quick win, but it is only one part of an end-to-end experience. As a platform engineer, you’ll want to provide the developer with ways that make the entire journey easier - not just one action. What is the entire developer workflow? It’s providing value to the developer from planning to post-production operations (read here for some ideas on a portal MVP that covers the developer SDLC journey). 

While boosting the first step of a process is an important part of accomplishing a goal, it doesn’t tell the full story.

‘Create’ actions are generally carried out once at the beginning of a process, whereas day-2 operations happen repeatedly. Because these tasks are ongoing, improving day-2 operations can significantly boost teams’ productivity and system reliability over time. It’s crucial to focus on these recurring tasks, such as updates, modifications and scaling, to provide real, lasting value to developers and the entire organization.

For example, if an organization’s overall objective is to improve developer productivity, one of the key metrics (KPIs) that indicates productivity is deployment frequency. To improve deployment frequency, you could look at how your team currently brings a feature to production. By breaking the process into smaller steps, you can optimize each step, improve deployment frequency and, as a result, improve developer productivity. 
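
As a rough illustration of the metric itself, the sketch below computes deployment frequency as the average number of deployments per week over a window. The function name `deployments_per_week` and the sample dates are made up for this example; teams define and measure this KPI in different ways.

```python
from datetime import date


def deployments_per_week(deploy_dates: list[date], start: date, end: date) -> float:
    """Average number of deployments per 7-day week in the window [start, end]."""
    days = (end - start).days + 1  # inclusive window length in days
    if days <= 0:
        raise ValueError("end must not be before start")
    in_window = [d for d in deploy_dates if start <= d <= end]
    return len(in_window) / (days / 7)


# Hypothetical deployment history for one service.
deploys = [
    date(2024, 5, 1),
    date(2024, 5, 3),
    date(2024, 5, 8),
    date(2024, 5, 10),
    date(2024, 5, 13),
]

# Five deployments over a 14-day window.
frequency = deployments_per_week(deploys, date(2024, 5, 1), date(2024, 5, 14))
print(frequency)  # 2.5
```

Tracking a number like this before and after introducing day-2 self-service actions is one way to check whether the portal is actually moving the KPI.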

So for bringing a feature to production, enabling a user to ‘scaffold a new service’ as a self-service action is a great start, but it’s not enough to streamline the entire process. There are so many other parts of the journey: managing secrets, creating storage, waiting for PRs to review, CI/CD, etc.

This is where you’ll want to streamline how developers:

  • Spin up a dev environment
  • Add a secret
  • Send a reminder to review a PR
  • Build
  • Promote
  • Roll back to a previous version

Only when you consider these capabilities as a whole, and provide them to the developer in an easy way, can you really impact deployment frequency (and therefore, developer productivity). 

Take another example, where the main goal is to improve incident response times.

Here, it’s a good start to quickly know who’s on call and who the owner of a particular service is.  

Is it enough? In short, no. You’ll also want to enable developers to:

  • Get JIT permissions to production
  • Easily restart a service
  • Roll back to a previous version
  • Scale up a cloud resource

Only then can you actually streamline incident management and resolution, and improve MTTR. 

In both examples, as a platform engineer, the focus has to be on the overarching objective, and building an end-to-end developer experience that helps your team to hit those goals, rather than the sporadic, quick-win actions that boost only one step of the process, but won’t make a big impact on the end goal on their own.  

2. Best practices require day-2 ops

It’s in our DNA to take shortcuts. We want to use the easiest method to get something done. What often stops developers from being able to do this is that they have to comply with standards using a cumbersome process. If you want devs to follow best practices, you have to make the compliant way the easiest way too.

Golden paths are well-defined practices and workflows that guide developers through ways to develop, deploy and manage software. They provide the right balance between developer autonomy and complying with engineering standards. By setting golden paths, you’re effectively providing developers with the ‘right’ way of doing things, but also, making it clear and easy for them to follow. After all, the easier something is to follow, the better chance you have of people following it.

Day-2 operations can ensure you’re allowing developers to always follow golden paths. 

Take extending the TTL of an ephemeral environment as an example.

You could either:

  1. Open a ticket to DevOps, which can take days, in which time many devs will get fed up and find a workaround (such as a friend who has the right permissions to do it for them). As a result, you’ll end up with a bad experience and perhaps an ephemeral environment that will not terminate when it should.
    This is definitely *not* a golden path.

  2. Create a self-service action where the developer can request an extension of the TTL of an ephemeral environment. The experience is seamless, it reduces the burden on DevOps, and it takes cost optimization into account.
    This *is* a golden path. 
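
The logic behind such a TTL-extension action could look roughly like the sketch below. It assumes two guardrails that the post does not specify: a cap on each extension (`MAX_EXTENSION_DAYS`) and a cap on total environment lifetime (`MAX_LIFETIME_DAYS`). Both limits, and all of the names, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_EXTENSION_DAYS = 3   # hypothetical per-request guardrail
MAX_LIFETIME_DAYS = 14   # hypothetical cost cap on total lifetime


@dataclass(frozen=True)
class EphemeralEnvironment:
    name: str
    created_at: datetime
    expires_at: datetime


def extend_ttl(env: EphemeralEnvironment, days: int) -> EphemeralEnvironment:
    """Return a copy of env with a later expiry, or raise if a guardrail is hit."""
    if not 1 <= days <= MAX_EXTENSION_DAYS:
        raise ValueError(f"extension must be 1 to {MAX_EXTENSION_DAYS} days")
    new_expiry = env.expires_at + timedelta(days=days)
    if new_expiry - env.created_at > timedelta(days=MAX_LIFETIME_DAYS):
        raise ValueError(f"total lifetime may not exceed {MAX_LIFETIME_DAYS} days")
    return EphemeralEnvironment(env.name, env.created_at, new_expiry)


env = EphemeralEnvironment(
    name="feature-x-preview",
    created_at=datetime(2024, 5, 1),
    expires_at=datetime(2024, 5, 6),
)
extended = extend_ttl(env, days=3)  # now expires on 2024-05-09
```

The developer gets an immediate answer without a ticket, and the cost guardrails are enforced in code rather than through a manual review.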

It’s clear that day-2 ops as self-service actions can help engineering teams to:

  • Maintain best practices, and ensure golden paths can be used and followed
  • Achieve their objectives

How day-2 operations work with Port

With Port, you can go beyond only using the ‘create’ self-service actions, and enable your developers to provision, terminate and perform day-2 operations with self-service actions. 

That includes:

  • Restart running service
  • Extend developer environment TTL
  • Get temporary permission to cloud resource
  • Create new Jira issue
  • Change replicas count
  • Change ownership of service
  • Add a secret
  • Toggle feature flag
  • Rollback service 

…and many more.

Port works with your backend, meaning that you can trigger any CI pipeline or HTTP endpoint to run an action. It supports long-running and asynchronous actions and shows developers the run logs they need, along with TTL support. Self-service actions are loosely coupled to the underlying infrastructure of the platform, so developers get the same experience regardless of how the infrastructure evolves.
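
The general pattern of a portal handing an action to a backend over HTTP and tracking the run asynchronously can be sketched as follows. This is not Port's API: the payload shape, the status values, the `ActionRun` class and the `backend.example.com` URL are all assumptions made for illustration, and the request is only built, never sent.

```python
import json
import urllib.request
from dataclasses import dataclass, field


@dataclass
class ActionRun:
    """Tracks one invocation of a self-service action (hypothetical model)."""
    action: str
    entity: str
    status: str = "IN_PROGRESS"
    logs: list[str] = field(default_factory=list)

    def log(self, message: str) -> None:
        self.logs.append(message)

    def finish(self, success: bool) -> None:
        self.status = "SUCCESS" if success else "FAILURE"


def build_webhook_request(run: ActionRun, url: str) -> urllib.request.Request:
    # Serialize the run into a JSON body; the field names are assumptions.
    body = json.dumps({"action": run.action, "entity": run.entity}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


run = ActionRun(action="restart_service", entity="cart-service")
request = build_webhook_request(run, "https://backend.example.com/actions")
run.log("Webhook request prepared")

# A real backend would report progress back later; simulated here.
run.log("Service restarted")
run.finish(success=True)
```

Because the developer-facing action only knows about a name, an entity and a run status, the backend behind the URL can change (a different CI system, a new cluster) without the developer experience changing.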

Context through catalog

In Port, self-service actions are reflected in the catalog, which is always kept up to date. Associating day-2 ops with the catalog means you can act with the full context in mind. For instance, you can view all of your running services along with their CPU and memory usage and the number of critical alerts associated with them. With that context, users can trigger day-2 ops to change replica counts, roll back the running service or restart it.

In addition, users can configure permission rules based on the data in their catalog. For example, a user can deploy a service to production only if the user is the leader of the team that owns the service.
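
The team-leader rule above can be expressed as a small check against catalog data. The sketch below uses plain Python rather than Port's own rule syntax. The `Team` and `Service` classes and the `can_deploy` function are hypothetical, and it assumes (beyond what the post states) that non-production deployments are open to everyone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    name: str
    leader: str


@dataclass(frozen=True)
class Service:
    name: str
    owning_team: Team


def can_deploy(user: str, service: Service, environment: str) -> bool:
    """Allow production deploys only for the leader of the owning team."""
    if environment != "production":
        return True  # assumption: no restriction outside production
    return user == service.owning_team.leader


orders_team = Team(name="orders", leader="dana")
cart_service = Service(name="cart-service", owning_team=orders_team)

print(can_deploy("dana", cart_service, "production"))  # True
print(can_deploy("sam", cart_service, "production"))   # False
print(can_deploy("sam", cart_service, "staging"))      # True
```

The rule reads ownership from the catalog instead of from a hard-coded list, so it stays correct as teams and owners change.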

Conclusion

Day-2 operations as self-service actions are essential for a portal. Without them, you can’t really build an end-to-end experience for developers. And that would mean you can’t: 

  • Help developers to adhere to standards or provide them with a golden path
  • Achieve the team’s wider objectives such as improving developer experience, developer productivity, optimizing costs or meeting SLAs/SLOs. 

Want to learn how to use self-service actions in Port? Check out our docs here.

Unsure how to get started with a portal? Check out this blog.
