How to Drive a Culture of Kubernetes Standards With Platform Engineering
Use Case


Matan Heled
Jan 30, 2023

Introduction

Platform engineering is about creating reusable elements so that developers can consume infrastructure resources autonomously and become more productive. An internal developer portal lets platform engineering teams both create a single pane of glass that shows all microservices, cloud resources and infrastructure (the software catalog) and build a product-like self-service interface for developers, in which every self-service action is immediately reflected in the software catalog.

Internal developer portals also drive a culture of quality, by setting guardrails for developer self-service and by providing scorecards that measure quality. We'll walk through some examples in this post, but first, let's take a closer look at what internal developer portals are made of:

  • The software catalog, which contains data about microservices, environments, Kubernetes and more.
  • The self-service actions hub, which lets developers consume infrastructure through existing automations.
  • Scorecards, which evaluate several KPIs for a given software catalog entity to determine its quality, readiness, health and so on.
  • There are additional layers, such as workflow automation connectivity and role-based access control, but we won't touch on them in this post.

In this sense, quality and standards can be part of two distinct elements of the internal developer platform:

  • For developer self-service, guardrails are set by letting developers control only the Kubernetes parameters that should be of interest to them. Exposing just a handful of parameters reduces cognitive load and prevents configuration errors.
  • For ongoing quality, scorecards are associated with software catalog entities, letting you define a baseline quality standard for all services and infrastructure.

This post walks through a detailed example of Kubernetes standards and how they are reflected in scorecards within the internal developer portal.
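The self-service guardrails described above can be sketched in Python as a function that accepts only a handful of developer-facing inputs and expands them into a full Kubernetes `apps/v1` Deployment manifest. The exposed parameters (`name`, `image`, `replicas`), the allowed replica range, the default of two replicas and the fixed resource requests and limits are hypothetical platform-team choices made for illustration, not a recommended policy or Port's API.

```python
from typing import Any

# Hypothetical guardrails: the only inputs a developer may set,
# and the range the replica count is allowed to take.
ALLOWED_INPUTS = {"name", "image", "replicas"}
MIN_REPLICAS, MAX_REPLICAS = 1, 10


def render_deployment(inputs: dict[str, Any]) -> dict[str, Any]:
    """Validate developer inputs and expand them into a full Deployment manifest."""
    unknown = set(inputs) - ALLOWED_INPUTS
    if unknown:
        raise ValueError(f"parameters not exposed to developers: {sorted(unknown)}")

    name = inputs["name"]
    image = inputs["image"]
    replicas = int(inputs.get("replicas", 2))  # hypothetical default
    if not MIN_REPLICAS <= replicas <= MAX_REPLICAS:
        raise ValueError(f"replicas must be between {MIN_REPLICAS} and {MAX_REPLICAS}")

    labels = {"app": name}
    # Everything below is decided by the platform team, not the developer.
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "resources": {
                                "requests": {"cpu": "250m", "memory": "256Mi"},
                                "limits": {"cpu": "500m", "memory": "512Mi"},
                            },
                        }
                    ]
                },
            },
        },
    }


manifest = render_deployment(
    {"name": "cart-service", "image": "registry.example.com/cart:1.4.2", "replicas": 3}
)
print(manifest["spec"]["replicas"])  # 3
```

Passing a field that is not exposed, such as `hostNetwork`, or a replica count outside the allowed range raises a `ValueError`; that rejection is the guardrail.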

Kubernetes standards and scorecards 

Developers usually don't know enough Kubernetes to deal with raw Kubernetes data on their own. That data is usually too detailed, and it often relates to Kubernetes issues that are outside developers' control. To deliver the developer autonomy that platform engineering aims for, Kubernetes data should be abstracted and made consumable by developers. In the internal developer portal, this happens in the software catalog and in the single-service view, which show only the Kubernetes data that is relevant from the developer's point of view. Here's an example.
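One way to picture that abstraction is a small Python sketch that reduces a raw Deployment object, shaped like the output of `kubectl get deployment <name> -o json`, to the few fields a developer cares about. The function name `summarize_deployment` and the output property names (`desiredReplicas`, `availableReplicas`, `healthy`) are hypothetical choices for this example, not a portal's actual schema.

```python
from typing import Any


def summarize_deployment(raw: dict[str, Any]) -> dict[str, Any]:
    """Keep only the developer-relevant parts of a raw Deployment object."""
    spec = raw.get("spec", {})
    status = raw.get("status", {})
    containers = spec.get("template", {}).get("spec", {}).get("containers", [])

    desired = spec.get("replicas", 1)  # Kubernetes defaults spec.replicas to 1
    # status.availableReplicas is omitted from the JSON when it is zero.
    available = status.get("availableReplicas", 0)
    return {
        "name": raw["metadata"]["name"],
        "image": containers[0]["image"] if containers else None,
        "desiredReplicas": desired,
        "availableReplicas": available,
        "healthy": available >= desired,
    }


# Hypothetical, heavily trimmed Deployment object.
raw = {
    "metadata": {"name": "cart-service", "namespace": "prod"},
    "spec": {
        "replicas": 3,
        "template": {"spec": {"containers": [{"name": "cart", "image": "cart:1.4.2"}]}},
    },
    "status": {"replicas": 3, "readyReplicas": 1, "availableReplicas": 1},
}
print(summarize_deployment(raw))
# {'name': 'cart-service', 'image': 'cart:1.4.2', 'desiredReplicas': 3, 'availableReplicas': 1, 'healthy': False}
```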

Let’s see how you can set Kubernetes standards using an internal developer portal and scorecards.

Application production readiness check


One of the basic requirements for production readiness is high availability. Every platform, framework, provider and orchestrator has its own standard, definition and best practices for high availability. For Kubernetes application health, the conversation usually boils down to ReplicaSets and desired versus current replica counts.

A DevOps engineer will immediately spot a current replica count of “1” and understand that something is wrong, but developers won't necessarily notice it.

Below is a scorecard for a running service, that is, a real, live service running in an actual Kubernetes cluster. It has reached the gold tier because it runs more than the two required replicas.

And hey, who doesn’t love gold?!

Showing this information in a scorecard immediately spells out what is production ready and what isn't. On the DevOps side, “high availability” has been defined and tracked: the scorecard checks that the service meets the minimum of two replicas, and the service owner and DevOps are notified when it doesn't.
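The replica check above can be expressed as tiered scorecard rules. The Python sketch below assumes a simple model in which an entity earns a level only if that level's rules and every lower level's rules pass. Only the gold rule (at least two available replicas) comes from this example; the bronze and silver rules, the `Level` type and the entity fields are hypothetical, and this is not Port's scorecard engine or API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Rule = Callable[[dict], bool]


@dataclass
class Level:
    name: str
    rules: list[Rule]


# Ordered lowest to highest. Only the Gold rule comes from the article;
# the Bronze and Silver rules are hypothetical fillers.
REPLICA_LEVELS = [
    Level("Bronze", [lambda e: e["availableReplicas"] >= 1]),
    Level("Silver", [lambda e: e["availableReplicas"] >= e["desiredReplicas"]]),
    Level("Gold", [lambda e: e["availableReplicas"] >= 2]),
]


def evaluate(entity: dict, levels: list[Level]) -> Optional[str]:
    """Return the highest level whose rules, and all lower levels' rules, pass."""
    achieved = None
    for level in levels:
        if not all(rule(entity) for rule in level.rules):
            break
        achieved = level.name
    return achieved


services = [
    {"identifier": "cart-service", "desiredReplicas": 3, "availableReplicas": 3},
    {"identifier": "products-service", "desiredReplicas": 1, "availableReplicas": 1},
]
for svc in services:
    tier = evaluate(svc, REPLICA_LEVELS)
    if tier != "Gold":
        # Placeholder for notifying the service owner and DevOps.
        print(f"{svc['identifier']}: {tier}, below the high-availability standard")
# products-service: Silver, below the high-availability standard
```

The cumulative design means a single-replica service can be perfectly "healthy" (all desired replicas available) and still fall short of gold, which is exactly the case a developer might not notice on their own.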

Resource usage checks

Resource usage is a problematic metric. When is an application using too much memory or CPU? What is the definition of high memory usage? Sixty percent? Eighty percent? 

DevOps teams care about resource usage because it determines how well a service can scale and how likely it is to cause incidents. Resource issues are mainly identified and solved by the DevOps team, which either scales the application up or out, or finds the issue causing the high resource usage. Still, we want to alert developers to certain resource issues ahead of time.

In the scorecard example below, you can see that the running service is not at the gold tier but two tiers down, at bronze. The reason is that the application health scorecard shows problems with CPU and memory usage. This lower tier can serve as an alert for both DevOps and the service owners. Defining a bronze tier also lets you specify the standards your organization aspires to uphold at a finer grain, and makes it easier to generate action items for continuous improvement.
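Because the definition of high usage is left open above, the Python sketch below makes one possible choice explicit: it grades a container by its worst CPU or memory utilization relative to its limits, using hypothetical thresholds of 60% for gold and 80% for silver. The quantity parser handles only a small subset of Kubernetes quantity suffixes (`m`, `Ki`, `Mi`, `Gi`), and where the usage numbers are collected from is left open.

```python
# Hypothetical thresholds: "high usage" is an organizational choice.
GOLD_MAX = 0.60    # below 60% of the limit on every resource -> Gold
SILVER_MAX = 0.80  # below 80% -> Silver; anything higher -> Bronze

# Multipliers for a small subset of Kubernetes quantity suffixes.
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "m": 1e-3}


def parse_quantity(quantity: str) -> float:
    """Convert e.g. "500m" (CPU cores) or "512Mi" (memory bytes) to a plain number."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)


def usage_tier(usage: dict[str, str], limits: dict[str, str]) -> str:
    """Grade a container by its worst CPU or memory utilization relative to its limits."""
    worst = max(
        parse_quantity(usage[resource]) / parse_quantity(limits[resource])
        for resource in ("cpu", "memory")
    )
    if worst < GOLD_MAX:
        return "Gold"
    if worst < SILVER_MAX:
        return "Silver"
    return "Bronze"


limits = {"cpu": "500m", "memory": "512Mi"}
print(usage_tier({"cpu": "100m", "memory": "128Mi"}, limits))  # Gold
print(usage_tier({"cpu": "450m", "memory": "300Mi"}, limits))  # Bronze (CPU at 90%)
```

Containers without limits would need a different baseline, such as their requests, since the ratio is undefined without a limit.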

Infrastructure availability checks

We've shown how developers can understand their applications' scorecards and standards. Now let's examine how DevOps can use scorecards for a bird's-eye view of their many (many) clusters, and for infrastructure in general.

Just as microservices need high availability, so do clusters. Even when clusters are deployed with state-of-the-art CI/CD tools or IaC solutions, it is easy to miss a misconfigured one.

Let's look at this cluster’s scorecard…

A production-ready cluster that was planned with high availability in mind should have at least two running nodes. In this example, the scorecard places a cluster at the gold tier when its node count is greater than or equal to two. Looking at this scorecard, we can conclude that our cluster is in fact highly available.

Let's make sure by taking a look at the relevant entity:

We can see that nodeCount is in fact 2, which meets the gold-tier standard (>= 2).
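A fleet-wide version of this check might look like the Python sketch below. It assumes nodeCount is derived by counting nodes whose `Ready` condition has status `"True"` in a NodeList (the shape returned by `kubectl get nodes -o json`), which is one reasonable reading of "running nodes". The cluster names, the `fleet` mapping and the trimmed node objects are hypothetical.

```python
from typing import Optional

GOLD_MIN_NODES = 2  # the example's standard: gold tier when nodeCount >= 2


def ready_node_count(node_list: dict) -> int:
    """Count nodes whose Ready condition is "True" in a NodeList object."""
    count = 0
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            count += 1
    return count


def cluster_tier(node_count: int) -> Optional[str]:
    """Gold when the cluster meets the high-availability node count, otherwise no tier."""
    return "Gold" if node_count >= GOLD_MIN_NODES else None


def _node(ready: str) -> dict:
    # Minimal stand-in for a Node object; real nodes carry many more fields.
    return {"status": {"conditions": [{"type": "Ready", "status": ready}]}}


# Hypothetical fleet: cluster name -> NodeList JSON.
fleet = {
    "prod-eu": {"items": [_node("True"), _node("True")]},
    "prod-us": {"items": [_node("True"), _node("Unknown")]},
}
for name, nodes in fleet.items():
    count = ready_node_count(nodes)
    print(name, count, cluster_tier(count))
# prod-eu 2 Gold
# prod-us 1 None
```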

Cluster version check

Setting thresholds for cluster versions matters for both security and infrastructure standards. Being able to easily identify which clusters are out of date makes maintenance easier and helps prevent bugs, security issues and version mismatches with infrastructure components.

This also supports initiatives such as updating all versions on specific clusters and specific deployments.

Here, the scorecard checks cluster versions to verify that they are up to date, and places up-to-date clusters at the gold tier. In this example, the gold tier requires a version higher than 1.24.
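Comparing versions as plain strings is a common trap for this kind of check: lexically, `"1.9"` sorts after `"1.24"`. The Python sketch below parses the major and minor numbers instead and applies the example's gold rule (higher than 1.24) at minor-version granularity, so any 1.24.x release does not qualify. That granularity and the sample version strings, including the provider-suffixed one, are assumptions for illustration.

```python
import re
from typing import Optional

# From the example: gold tier means a version higher than 1.24.
GOLD_ABOVE = (1, 24)

# Matches "1.25", "v1.25.3" and versions with provider suffixes; only major.minor is kept.
_VERSION = re.compile(r"^v?(\d+)\.(\d+)")


def minor_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) as integers so they compare numerically."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def version_tier(version: str) -> Optional[str]:
    """Gold if the cluster's minor version is above 1.24, otherwise no tier."""
    return "Gold" if minor_version(version) > GOLD_ABOVE else None


for v in ["v1.25.3", "1.24.9-eks-49d8fe8", "v1.9.11"]:
    print(v, version_tier(v))
# v1.25.3 Gold
# 1.24.9-eks-49d8fe8 None
# v1.9.11 None
```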

Conclusion

Scorecards work well to promote quality: they single out what matters to developers, and they also work for DevOps, who need internal developer portals too. Most importantly, they track metrics on specific software catalog entities, giving DevOps and developers a deep understanding of what quality means and whether a given software catalog entity meets it.
