A different version of this article first appeared on The New Stack, here.
What are golden paths?
Golden paths are a set of well-defined practices and workflows that guide developers through ways to develop, deploy and manage software.
Why do golden paths exist? Because of two contradictory pushes in software development:
- On the one hand, developers want to run fast and release as many features as possible.
- On the other hand, the DevOps team wants to ensure that whatever developers do complies with engineering standards and doesn’t get the company into trouble. In some cases they (correctly) feel that developers don't really know how to comply with standards or understand the infrastructure they are working with.
This is where the golden path comes in. The golden path is the sweet spot between giving developers complete freedom and letting DevOps do all the work for fear that developers will do something wrong.
Chad Metcalf from Daytona, in “Introduction to guardrails and paved paths”, calls a golden path an “opinionated and supported path to build and run systems”, saying it “provides recommended tools, processes, and documentation tailored for developers within an organization”. Paved roads, according to Metcalf, impose fewer constraints on developers.
Golden paths are ways for developers to do things by themselves, but in a way where platform engineers and DevOps have baked the organization’s standards in. Practically, this means providing developers with some form of automation or self-service so they can carry out the things they need to do quickly, with the right guardrails in place. In many cases, the golden path also reduces the developer’s cognitive load, since they don’t have to weigh many options for performing an action: they just need to follow the golden path.
However, even when developers follow golden paths, standards still need to be checked over time so that they do not degrade long after the first self-service action.
This is why it’s only a golden path if you can keep your developers on it.
The value of golden paths for the developer
Reducing cognitive load - the influx of new dev tools and technologies in modern engineering is impossible for one person to follow. Even if you have some devs who are skilled at Terraform, or who understand the infrastructure because they have some background experience, they can’t keep up if they’re full stack developers. Providing a way for developers to get things done without needing to understand each tool or piece of infrastructure in depth can reduce cognitive load and frustration.
Velocity - If a developer has to wait three days for a Kubernetes cluster to be deployed or a database cluster to be provided for them, then that’s time wasted. If the developer can deploy the cluster themselves with a click of a button, and get the cluster within five or ten minutes, it will inevitably increase velocity.
Golden path lifecycle: RDS cluster as case in point
When services are first created: self-service action that meets standards
A great example of a golden path is providing developers with standardized automation to manage applications, cloud resources and any organizational assets. If the action was to create a new Kubernetes cluster, for instance, that would mean having the security and networking standards, as well as version control, set in place by the self-service action itself, so that developers can create the cluster on their own. This lets them provision what they need to get started while complying with standards.
If we were talking about creating a new AWS RDS cluster, most organizations already have some form of automation for this (or for creating a new cloud resource in general). It is usually carried out by connecting the Terraform environment with GitHub, updating some sort of Git repo, and applying the change after a pull request. The aim is to be able to do this while also adhering to the standards for security, production readiness, availability and version from day 1.
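As a rough sketch of what "standards from day 1" can look like, the snippet below merges a developer's minimal request with platform-owned defaults and renders the result as JSON of the kind Terraform can read from a `.tfvars.json` file. The variable names, the default values, the approved versions and the `build_rds_tfvars` helper are all hypothetical and are not taken from any real module.

```python
import json

# Platform-owned standards (hypothetical names and values).
# Developers cannot override these through the request.
PLATFORM_STANDARDS = {
    "multi_az": True,
    "deletion_protection": True,
    "backup_retention_days": 7,
    "storage_encrypted": True,
}

# Approved engine version per engine (hypothetical values).
APPROVED_VERSIONS = {"postgres": "15.4", "mysql": "8.0.35"}


def build_rds_tfvars(name: str, engine: str, owner_team: str) -> str:
    """Combine the developer's inputs with the platform standards."""
    if engine not in APPROVED_VERSIONS:
        raise ValueError(f"engine {engine!r} is not on the approved list")
    variables = {
        "cluster_name": name,
        "engine": engine,
        "engine_version": APPROVED_VERSIONS[engine],
        "owner_team": owner_team,
        **PLATFORM_STANDARDS,  # applied last so the standards always win
    }
    return json.dumps(variables, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(build_rds_tfvars("orders-db", "postgres", "checkout"))
```

In a setup like the one described, the rendered file would be committed to the Git repo and applied after the pull request is approved, so the developer never edits the standards directly.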
Ensuring standards are still met: scorecards for existing resources
The next step is about measuring the standards of your existing resources. For example, are you able to track whether someone with admin permissions has changed the version or security? While some organizations may carry out this kind of review on a quarterly or monthly basis in a manual process, the better approach is to do this automatically.
In the AWS RDS cluster example, you need to be able to ask, at any given time, which clusters are coming to end-of-life and which clusters are not highly available. For instance, perhaps you have a database where the replica count is one instead of three and therefore it isn’t highly available - making it a single point of failure within the architecture. This can be assessed using a “scorecard” approach, essentially scoring the level of standards compliance.
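The two questions above can be sketched as a small audit over an inventory of clusters. This is only an illustration: the `Cluster` fields, the end-of-life table, the 180-day warning window and the three-replica threshold are assumptions, and the dates are placeholders rather than real vendor dates.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical end-of-life dates per engine version (placeholder values).
END_OF_LIFE = {
    "engine-a": date(2025, 1, 1),
    "engine-b": date(2030, 1, 1),
}


@dataclass
class Cluster:
    name: str
    engine_version: str
    replica_count: int


def nearing_end_of_life(cluster: Cluster, today: date, warn_days: int = 180) -> bool:
    """True if the version's end-of-life date is within warn_days (or already past)."""
    eol = END_OF_LIFE.get(cluster.engine_version)
    return eol is not None and eol - today <= timedelta(days=warn_days)


def is_highly_available(cluster: Cluster, min_replicas: int = 3) -> bool:
    """Assumed rule: fewer than min_replicas means not highly available."""
    return cluster.replica_count >= min_replicas


def audit(clusters: list[Cluster], today: date) -> dict[str, list[str]]:
    """Answer both questions for the whole inventory at once."""
    return {
        "nearing_eol": [c.name for c in clusters if nearing_end_of_life(c, today)],
        "not_highly_available": [c.name for c in clusters if not is_highly_available(c)],
    }


if __name__ == "__main__":
    inventory = [Cluster("orders", "engine-a", 3), Cluster("search", "engine-b", 1)]
    print(audit(inventory, today=date(2024, 10, 1)))
```

Passing `today` in explicitly keeps the check deterministic, which makes it easy to run on a schedule instead of as a quarterly manual review.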
Fixing non-compliant resources
Let’s say that a resource isn’t compliant. For example, the version of the database isn’t right, or perhaps a Kubernetes workload does not have a high enough replica count. How do you go about fixing these? Here, too, you need to give developers a way to fix the bad resources and bring them back to the golden path.
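One way to picture this is a lookup from each failed rule to the self-service action that would fix it. The sketch below is a minimal illustration under that assumption; the rule names, action names and the fallback action are all hypothetical.

```python
# Hypothetical mapping from a failed rule to the self-service action that fixes it.
REMEDIATIONS = {
    "engine_version_supported": "upgrade_engine_version",
    "highly_available": "increase_replica_count",
}

# Hypothetical fallback when no automated fix exists for a rule.
FALLBACK_ACTION = "open_ticket_for_platform_team"


def plan_fixes(resource_name: str, rule_results: dict[str, bool]) -> list[dict[str, str]]:
    """Return one suggested action per failed rule, in the order the rules were given."""
    fixes = []
    for rule, passed in rule_results.items():
        if passed:
            continue
        action = REMEDIATIONS.get(rule, FALLBACK_ACTION)
        fixes.append({"resource": resource_name, "rule": rule, "action": action})
    return fixes


if __name__ == "__main__":
    results = {"engine_version_supported": False, "highly_available": True}
    for fix in plan_fixes("orders-db", results):
        print(fix)
```

Keeping the fix as another self-service action means the developer returns to the golden path through the same guarded route they used to create the resource.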
Golden paths using an internal developer portal
Let’s see how golden paths are implemented in an internal developer portal. We’re going to focus on three of the key pillars of a portal: the software catalog, self-service actions and scorecards.
Step 1 - Setting up the software catalog
Developers should feel at home in the portal, so they should be able to find and access the information they need. This is where the software catalog comes in. The software catalog is an inventory for all of the assets in your organization: a service catalog, cloud resources, JIRA board, and more. It encompasses everything that is relevant for the developers’ routine.
Within the portal, the service catalog would be the landing page for the developer. Here, they can see all of their applications in one place, with the relevant details for them, such as the name, the owning team, the URL to the Git repo, and more. They can also see information which is relevant for their daily routines such as data coming from AWS (whether it’s from RDS clusters, Lambdas or S3 buckets), information from the JIRA board of their team, filtered for their routines, and more. The idea is to visualize everything relevant for their daily routines, with live integrations with different tools.
If you click on the RDS clusters view, you can see different clusters.
Step 2 - Self-service actions
A developer who is working on a new microservice probably needs a database for it. In an ideal situation, they would be able to use self-service actions in the portal, where a form lets them do just that. This self-service capability in the internal developer portal is defined and provided by the platform team.
For example, by clicking on the top right where it says ‘+ RDS Cluster’, the developer can create a new RDS cluster.
The design of this self-service form uses abstraction. The developer needs to provide inputs that are relevant for the creation of the new RDS cluster, but they are simple and straightforward. In this case developers don’t need to have specific knowledge about this AWS resource or about the infrastructure - all they need to provide is the relevant context for the automation that is going to run. The platform engineers have already preconfigured these forms with dynamic parameters. By filling in the self-service form, the developer is essentially using a golden path that has been set out for them by the platform engineers and DevOps teams.
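To make the abstraction concrete, here is a minimal sketch of a form definition and a validator for the developer's answers. The field names, the allowed choices and the `validate_form` helper are hypothetical; a real portal would define its forms in its own format.

```python
# Hypothetical form definition: the only inputs the developer ever sees.
RDS_FORM = {
    "name": {"type": str, "required": True},
    "engine": {"type": str, "required": True, "choices": ["postgres", "mysql"]},
    "environment": {"type": str, "required": True, "choices": ["staging", "production"]},
}


def validate_form(form: dict, answers: dict) -> list[str]:
    """Return human-readable errors; an empty list means the answers are valid."""
    errors = []
    for field, spec in form.items():
        if field not in answers:
            if spec.get("required"):
                errors.append(f"{field}: missing")
            continue
        value = answers[field]
        if not isinstance(value, spec["type"]):
            errors.append(f"{field}: expected {spec['type'].__name__}")
        elif "choices" in spec and value not in spec["choices"]:
            errors.append(f"{field}: must be one of {spec['choices']}")
    # Anything outside the form is rejected, so infrastructure details stay hidden.
    for field in answers:
        if field not in form:
            errors.append(f"{field}: not an allowed input")
    return errors


if __name__ == "__main__":
    print(validate_form(RDS_FORM, {"name": "orders-db", "engine": "oracle"}))
```

The design choice worth noting is the rejection of unknown inputs: the developer supplies context only, and everything else is decided by the automation behind the form.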
The developer can then see the logs reflecting what is happening behind the scenes in this action run: a new RDS cluster is being created, and then it has been created successfully. Behind the scenes, this is enabled by connecting the portal to a GitHub action that updates the Terraform file and provisions the cluster for the developer. This meets the platform team’s standards, and for the developer it is also efficient, as it took only a minute. The developer doesn’t even need permission to the AWS console (but they can go there if they want to). Instead, they can see all the information they need in the catalog. They can click on the newly created resource:
Here, the developer has all of the information they need, like the ARN, as well as the additional details requested, such as the engine and the link to their AWS resource. The new resource that has been added to the software catalog will show the link, the version and all of the other information about it - the portal acts as a source of truth for the environment in one place.
Step 3 - Scorecards
The next step is to show how the developer can measure every one of their resources on an ongoing basis through scorecards.
This means creating rules to measure the quality of the assets in your software catalog. The portal allows you to define the standards of your organization as well as what you would want every resource measured against. This is a scorecard. It’s a metric that you can create on top of the data in the portal, in the context of your organization, to ask questions such as ‘is my DB coming to the end of its life?’ or ‘does my DB have high availability?’
By clicking on the scorecards tab within the new resource, developers will be able to see a screen just for them. In this example, there are two scopes to measure: production readiness and version.
There are three tiers: bronze, silver, and gold. In this example, ‘gold’ is effectively a perfect database, and a database that does not meet any of the requirements is ‘basic’.
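A tiered scorecard of this kind can be sketched as follows. The sketch assumes cumulative tiers, meaning a resource reaches a tier only if it passes every rule at that tier and at all lower tiers, and falls back to "Basic" otherwise. That semantics, the rule names and the `scorecard_level` helper are assumptions made for illustration.

```python
TIERS = ["Bronze", "Silver", "Gold"]  # ordered from lowest to highest


def scorecard_level(rules: dict[str, str], results: dict[str, bool]) -> str:
    """rules maps rule name -> tier; results maps rule name -> whether it passed."""
    level = "Basic"
    for tier in TIERS:
        tier_rules = [name for name, rule_tier in rules.items() if rule_tier == tier]
        # A missing result counts as a failure.
        if not all(results.get(name, False) for name in tier_rules):
            break
        level = tier
    return level


if __name__ == "__main__":
    rules = {
        "has_backups": "Bronze",
        "multi_az": "Silver",
        "deletion_protection": "Gold",
    }
    results = {"has_backups": True, "multi_az": True, "deletion_protection": False}
    print(scorecard_level(rules, results))
```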
The version scope measures three different versions, to make sure that the new cluster is up-to-date and that the self-service action that has been put into place is also up-to-date.
For production readiness, in this example, the scorecard is checking whether the database is running across multiple availability zones, whether it has deletion protection configured, whether backups are running, and whether Performance Insights is configured in AWS. All of the KPIs and standards for scorecards are completely customizable.
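The four production readiness checks could be expressed as simple predicates over a description of the database. In this sketch the input is a plain dictionary whose keys are modeled on the fields AWS returns when describing an RDS DB instance (`MultiAZ`, `DeletionProtection`, `BackupRetentionPeriod`, `PerformanceInsightsEnabled`); treat the exact keys as an assumption and adjust them to whatever your catalog stores. The `production_readiness` helper is hypothetical.

```python
def production_readiness(db: dict) -> dict[str, bool]:
    """Evaluate each check against a dict describing one database."""
    return {
        "multi_az": db.get("MultiAZ") is True,
        "deletion_protection": db.get("DeletionProtection") is True,
        # A retention period of 0 days means automated backups are switched off.
        "backups_enabled": db.get("BackupRetentionPeriod", 0) > 0,
        "performance_insights": db.get("PerformanceInsightsEnabled") is True,
    }


if __name__ == "__main__":
    example = {
        "MultiAZ": True,
        "DeletionProtection": False,
        "BackupRetentionPeriod": 7,
        "PerformanceInsightsEnabled": True,
    }
    for check, passed in production_readiness(example).items():
        print(f"{check}: {'pass' if passed else 'fail'}")
```

Because each check is a separate entry, swapping in different KPIs is a matter of adding or removing a line, which matches the point that the standards are customizable.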
Golden path lifecycle
The golden path in this example isn’t just about creating a new resource; it has factored in the basic questions that the developer had to answer, which were pre-configured by platform engineers. This means that standards are baked in, processes are made more seamless, and the communication and rapport between engineers and developers remains positive. In addition, the process of using the portal makes it easier for organizations to instill the golden path to begin with - but more importantly, ensures that developers can remain on the golden path lifecycle every step of the way. This, after all, is the hardest part to get right.
Check out Port's pre-populated demo and see what it's all about.
No email required
Contact sales for a technical product walkthrough
Open a free Port account. No credit card required
Watch Port live coding videos - setting up an internal developer portal & platform