Alert Management and Unification With Platform Engineering

April 3, 2023

Introduction

Alerts are tricky, especially those coming from DevOps tools, monitoring tools and cloud resources. Datadog, Sentry, Grafana, AWS, Coralogix, Splunk and New Relic all fire alerts about different parts of the engineering infrastructure. This creates not only alert fatigue but also significant cognitive load for developers, who may not know these tools well in the first place. Think of alerts from Kubernetes tools, or infrastructure alerts that developers can’t make sense of and can’t confidently tie to a specific service.

Internal developer portals can help with alert management and unification

Internal developer portals have a software catalog at their center. The catalog’s data model is extremely flexible: you create “types” (called blueprints in Port) that define the kinds of entities the catalog holds. A running service is one type that can be tracked in the software catalog, and you can add other types for resources, such as AWS or Kubernetes. Depending on your use case, you can also use internal developer portal templates that map provider data, use GitOps, connect your CI/CD pipeline, map Kubernetes or cloud resources, or use IaC.
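To make the idea of "types" concrete, here is a minimal sketch of a catalog data model in Python. It is an illustration only, not Port's implementation: the `Blueprint` and `Entity` classes, the `validate` helper and the sample `service` and `k8s_cluster` types are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Blueprint:
    """A catalog 'type': the properties and relations its entities may have."""
    identifier: str
    properties: set[str] = field(default_factory=set)
    # relation name -> identifier of the target blueprint
    relations: dict[str, str] = field(default_factory=dict)


@dataclass
class Entity:
    """One catalog entry, such as a single running service."""
    identifier: str
    blueprint: str
    properties: dict[str, str] = field(default_factory=dict)
    # relation name -> identifier of the target entity
    relations: dict[str, str] = field(default_factory=dict)


def validate(entity: Entity, blueprints: dict[str, Blueprint]) -> list[str]:
    """Return a list of problems; an empty list means the entity fits its blueprint."""
    blueprint = blueprints.get(entity.blueprint)
    if blueprint is None:
        return [f"unknown blueprint: {entity.blueprint}"]
    problems = []
    for name in entity.properties:
        if name not in blueprint.properties:
            problems.append(f"unexpected property: {name}")
    for name in entity.relations:
        if name not in blueprint.relations:
            problems.append(f"unexpected relation: {name}")
    return problems


# Hypothetical types: a service that runs on a Kubernetes cluster.
blueprints = {
    "service": Blueprint("service", {"language"}, {"cluster": "k8s_cluster"}),
    "k8s_cluster": Blueprint("k8s_cluster", {"region"}),
}
checkout = Entity("checkout", "service", {"language": "go"}, {"cluster": "prod-eu"})
print(validate(checkout, blueprints))  # []
```

The point of the sketch is that the type definitions are data, so adding a new kind of entity means adding a blueprint rather than changing code.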

The result is a graph-based software catalog showing entities, from microservices to ephemeral environments, Kubernetes clusters, packages, cloud accounts and more. Internal developer portals are also about providing developers with self-service actions in one place, such as setting up ephemeral environments, requesting permissions and rolling back deployments. The software catalog provides developers with the data they need, in context, to understand the DevOps assets they rely on.

When alerts are also included in the software catalog, developers get a single pane of glass for all alerts, in context within the relevant software catalog entity and with all the relevant information, such as the service or resource owner. Apart from the convenience of not needing to check many alert tools, having alerts in context reduces cognitive load, since each alert is tied to its origin, such as a problem in the production environment related to a certain service. Alerts can also be tied to day-2 operations that help resolve the underlying issue.
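As a rough illustration of what "in context" means, the sketch below attaches an alert to a catalog entity and renders it together with that entity's owner and environment. The `Service` and `Alert` classes, the `describe` function and all sample values are assumptions made for this example, not part of any real portal API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    identifier: str
    owner: str
    environment: str


@dataclass(frozen=True)
class Alert:
    title: str
    source: str     # the tool that fired the alert
    entity_id: str  # catalog entity the alert is attached to


def describe(alert: Alert, catalog: dict[str, Service]) -> str:
    """Render an alert together with the catalog context of its entity."""
    service = catalog.get(alert.entity_id)
    if service is None:
        return f"{alert.title} ({alert.source}): not linked to a catalog entity"
    return (
        f"{alert.title} ({alert.source}): service={service.identifier}, "
        f"environment={service.environment}, owner={service.owner}"
    )


# Hypothetical catalog with one service and one alert attached to it.
catalog = {"checkout": Service("checkout", owner="payments-team", environment="production")}
alert = Alert("High error rate", source="monitoring-tool", entity_id="checkout")
print(describe(alert, catalog))
```

An alert that cannot be matched to an entity is reported as unlinked, which is exactly the situation the catalog is meant to make rare.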

Alert unification in the software catalog

What we’re describing is using the internal developer portal as a unified alerts database. We take data from alert platforms (Prometheus and others) and present it in developer dashboards. This way, developers can easily track the status of their services in one place, without moving between different systems.
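One way to picture a unified alerts database is a set of small adapters that convert each tool's payload into a shared schema. The sketch below assumes two made-up sources, `tool_a` and `tool_b`, with invented payload shapes; real alerting tools have their own formats, so every field name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UnifiedAlert:
    source: str
    title: str
    severity: str  # normalized to "low", "medium" or "high"
    service: str


def from_tool_a(payload: dict) -> UnifiedAlert:
    # Hypothetical shape: {"name": ..., "priority": "P1".."P3", "tags": {"service": ...}}
    severity = {"P1": "high", "P2": "medium", "P3": "low"}[payload["priority"]]
    return UnifiedAlert("tool_a", payload["name"], severity, payload["tags"]["service"])


def from_tool_b(payload: dict) -> UnifiedAlert:
    # Hypothetical shape: {"summary": ..., "level": "critical"|"warning"|"info", "svc": ...}
    severity = {"critical": "high", "warning": "medium", "info": "low"}[payload["level"]]
    return UnifiedAlert("tool_b", payload["summary"], severity, payload["svc"])


# Source name -> adapter that understands that source's payload.
ADAPTERS: dict[str, Callable[[dict], UnifiedAlert]] = {
    "tool_a": from_tool_a,
    "tool_b": from_tool_b,
}


def unify(raw_alerts: list[tuple[str, dict]]) -> list[UnifiedAlert]:
    """Convert (source, payload) pairs into one list with a shared schema."""
    return [ADAPTERS[source](payload) for source, payload in raw_alerts]


def alerts_for(service: str, alerts: list[UnifiedAlert]) -> list[UnifiedAlert]:
    """The per-service view a developer dashboard would show."""
    return [a for a in alerts if a.service == service]


raw = [
    ("tool_a", {"name": "CPU saturation", "priority": "P1", "tags": {"service": "checkout"}}),
    ("tool_b", {"summary": "Slow queries", "level": "warning", "svc": "checkout"}),
    ("tool_b", {"summary": "Cache miss spike", "level": "info", "svc": "search"}),
]
unified = unify(raw)
for a in alerts_for("checkout", unified):
    print(a.source, a.severity, a.title)
```

Once every alert shares the same fields, filtering by service or severity no longer depends on which tool fired it.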

Here is the alerts view in Port:

Acting upon alerts with developer self-service in the internal developer portal

As mentioned above, internal developer portals are used for developer self-service actions, and this comes in handy with alert unification. Once developers see alerts inside the portal, they can act on them: anything from acknowledging an issue, to investigating it in the developer portal, to triggering a playbook or reverting a version. The information about services and resources can help them find owners, who is on call and more.

Clicking the three dots on the right-hand side of the table shown above reveals the self-service actions that are set for each alert, on the entity and in context.
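The per-alert action menu can be sketched as a registry of named handlers. This is a hypothetical illustration: the action names echo the examples in the text (acknowledging, triggering a playbook, reverting a version), while the handlers, the alert fields and the rule that hides "acknowledge" for already acknowledged alerts are assumptions made for the example.

```python
from typing import Callable

# Action name -> handler. The handlers only return a description here; in
# practice they would call whatever backend actually performs the action.
ACTIONS: dict[str, Callable[[dict], str]] = {
    "acknowledge": lambda alert: f"acknowledged '{alert['title']}'",
    "trigger_playbook": lambda alert: f"started playbook for {alert['service']}",
    "revert_version": lambda alert: f"reverting {alert['service']} to previous version",
}


def available_actions(alert: dict) -> list[str]:
    """Actions offered in the alert's menu (assumed rule: no re-acknowledging)."""
    names = list(ACTIONS)
    if alert.get("acknowledged"):
        names.remove("acknowledge")
    return names


def run_action(name: str, alert: dict) -> str:
    """Run one action for one alert, rejecting actions that are not offered."""
    if name not in available_actions(alert):
        raise ValueError(f"action not available for this alert: {name}")
    return ACTIONS[name](alert)


alert = {"title": "High error rate", "service": "checkout", "acknowledged": False}
print(available_actions(alert))
print(run_action("revert_version", alert))
```

Because the handlers receive the alert itself, each action runs with the service and alert details already filled in.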

Using scorecards to drive engineering quality with alert information

Scorecards are an important capability of internal developer portals. They track metrics, from service maturity to health and more, and display them in the context of the entity they belong to. Scorecards can help set engineering standards, and they are also useful for alerts: because they can be based on aggregated alerts coming from various tools, they help developers understand software health.
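A scorecard based on aggregated alerts could look roughly like the sketch below. The level names (`Gold`, `Silver`, `Bronze`) and the thresholds are invented for illustration; an actual scorecard would use whatever rules the engineering organization agrees on.

```python
from collections import Counter


def health_level(open_alerts: list[dict]) -> str:
    """Assumed rules: any high-severity alert, or many medium ones, lowers the level."""
    counts = Counter(alert["severity"] for alert in open_alerts)
    if counts["high"] > 0:
        return "Bronze"
    if counts["medium"] > 2:
        return "Silver"
    return "Gold"


def scorecard(services: list[str], alerts: list[dict]) -> dict[str, str]:
    """Compute a health level per service from alerts gathered across all tools."""
    return {
        service: health_level([a for a in alerts if a["service"] == service])
        for service in services
    }


# Hypothetical open alerts, already normalized to a shared shape.
alerts = [
    {"service": "checkout", "severity": "high"},
    {"service": "search", "severity": "medium"},
    {"service": "search", "severity": "medium"},
    {"service": "search", "severity": "medium"},
    {"service": "billing", "severity": "low"},
]
print(scorecard(["checkout", "search", "billing", "docs"], alerts))
```

Services with no open alerts still appear in the result, so a clean service shows up with the top level instead of disappearing from the scorecard.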

Alert unification and workflow automation

Workflow automation within Port can also be tied to alerts. Because an alert is associated with a specific entity in the software catalog, the actions triggered after the alert can depend on the criticality of the service (if the service is critical and the alert is high severity, a message can be sent to several teams) or on whether the service is internet exposed.
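The routing logic described above can be sketched as a small rule function. This is a simplified illustration under stated assumptions: the `ServiceInfo` fields, the team names `sre-team` and `security-team`, and the exact rules are hypothetical, and sending the message is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInfo:
    identifier: str
    owner_team: str
    critical: bool
    internet_exposed: bool


def teams_to_notify(severity: str, service: ServiceInfo) -> list[str]:
    """Decide who hears about an alert, based on catalog data for its service."""
    teams = [service.owner_team]
    # A high-severity alert on a critical service goes to more than one team.
    if service.critical and severity == "high":
        teams.append("sre-team")
    # Internet-exposed services also involve security (assumed rule).
    if service.internet_exposed:
        teams.append("security-team")
    return teams


checkout = ServiceInfo("checkout", "payments-team", critical=True, internet_exposed=True)
internal_tool = ServiceInfo("report-builder", "data-team", critical=False, internet_exposed=False)

print(teams_to_notify("high", checkout))
print(teams_to_notify("high", internal_tool))
```

The alert payload only needs to carry a severity; everything else the rules use comes from the catalog entity the alert is attached to.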

Order Domain

{
  "properties": {},
  "relations": {},
  "title": "Orders",
  "identifier": "Orders"
}

Cart System

{
  "properties": {},
  "relations": {
    "domain": "Orders"
  },
  "identifier": "Cart",
  "title": "Cart"
}

Products System

{
  "properties": {},
  "relations": {
    "domain": "Orders"
  },
  "identifier": "Products",
  "title": "Products"
}

Cart Resource

{
  "properties": {
    "type": "postgress"
  },
  "relations": {},
  "icon": "GPU",
  "title": "Cart SQL database",
  "identifier": "cart-sql-sb"
}

Cart API

{
 "identifier": "CartAPI",
 "title": "Cart API",
 "blueprint": "API",
 "properties": {
   "type": "Open API"
 },
 "relations": {
   "provider": "CartService"
 },
 "icon": "Link"
}

Core Kafka Library

{
  "properties": {
    "type": "library"
  },
  "relations": {
    "system": "Cart"
  },
  "title": "Core Kafka Library",
  "identifier": "CoreKafkaLibrary"
}

Core Payment Library

{
  "properties": {
    "type": "library"
  },
  "relations": {
    "system": "Cart"
  },
  "title": "Core Payment Library",
  "identifier": "CorePaymentLibrary"
}

Cart Service JSON

{
 "identifier": "CartService",
 "title": "Cart Service",
 "blueprint": "Component",
 "properties": {
   "type": "service"
 },
 "relations": {
   "system": "Cart",
   "resources": [
     "cart-sql-sb"
   ],
   "consumesApi": [],
   "components": [
     "CorePaymentLibrary",
     "CoreKafkaLibrary"
   ]
 },
 "icon": "Cloud"
}

Products Service JSON

{
  "identifier": "ProductsService",
  "title": "Products Service",
  "blueprint": "Component",
  "properties": {
    "type": "service"
  },
  "relations": {
    "system": "Products",
    "consumesApi": [
      "CartAPI"
    ],
    "components": []
  }
}

Component Blueprint

{
 "identifier": "Component",
 "title": "Component",
 "icon": "Cloud",
 "schema": {
   "properties": {
     "type": {
       "enum": [
         "service",
         "library"
       ],
       "icon": "Docs",
       "type": "string",
       "enumColors": {
         "service": "blue",
         "library": "green"
       }
     }
   },
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {
   "system": {
     "target": "System",
     "required": false,
     "many": false
   },
   "resources": {
     "target": "Resource",
     "required": false,
     "many": true
   },
   "consumesApi": {
     "target": "API",
     "required": false,
     "many": true
   },
   "components": {
     "target": "Component",
     "required": false,
     "many": true
   },
   "providesApi": {
     "target": "API",
     "required": false,
     "many": false
   }
 }
}

Resource Blueprint

{
 "identifier": "Resource",
 "title": "Resource",
 "icon": "DevopsTool",
 "schema": {
   "properties": {
     "type": {
       "enum": [
         "postgress",
         "kafka-topic",
         "rabbit-queue",
         "s3-bucket"
       ],
       "icon": "Docs",
       "type": "string"
     }
   },
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {}
}

API Blueprint

{
 "identifier": "API",
 "title": "API",
 "icon": "Link",
 "schema": {
   "properties": {
     "type": {
       "type": "string",
       "enum": [
         "Open API",
         "grpc"
       ]
     }
   },
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {
   "provider": {
     "target": "Component",
     "required": true,
     "many": false
   }
 }
}

Domain Blueprint

{
 "identifier": "Domain",
 "title": "Domain",
 "icon": "Server",
 "schema": {
   "properties": {},
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {}
}

System Blueprint

{
 "identifier": "System",
 "title": "System",
 "icon": "DevopsTool",
 "schema": {
   "properties": {},
   "required": []
 },
 "mirrorProperties": {},
 "formulaProperties": {},
 "calculationProperties": {},
 "relations": {
   "domain": {
     "target": "Domain",
     "required": true,
     "many": false
   }
 }
}

Microservices SDLC

  • Scaffold a new microservice

  • Deploy (canary or blue-green)

  • Feature flagging

  • Revert

  • Lock deployments

  • Add Secret

  • Force merge pull request (skip tests on crises)

  • Add environment variable to service

  • Add IaC to the service

  • Upgrade package version

Development environments

  • Spin up a developer environment for 5 days

  • ETL mock data to environment

  • Invite developer to the environment

  • Extend TTL by 3 days

Cloud resources

  • Provision a cloud resource

  • Modify a cloud resource

  • Get permissions to access cloud resource

SRE actions

  • Update pod count

  • Update auto-scaling group

  • Execute incident response runbook automation

Data Engineering

  • Add / Remove / Update Column to table

  • Run Airflow DAG

  • Duplicate table

Backoffice

  • Change customer configuration

  • Update customer software version

  • Upgrade - Downgrade plan tier

  • Create - Delete customer

Machine learning actions

  • Train model

  • Pre-process dataset

  • Deploy

  • A/B testing traffic route

  • Revert

  • Spin up remote Jupyter notebook

Engineering tools

  • Observability

  • Tasks management

  • CI/CD

  • On-Call management

  • Troubleshooting tools

  • DevSecOps

  • Runbooks

Infrastructure

  • Cloud Resources

  • K8S

  • Containers & Serverless

  • IaC

  • Databases

  • Environments

  • Regions

Software and more

  • Microservices

  • Docker Images

  • Docs

  • APIs

  • 3rd parties

  • Runbooks

  • Cron jobs
