Introduction
Alerts are tricky, especially when they come from DevOps tools, monitoring tools and cloud resources. Datadog, Sentry, Grafana, AWS, Coralogix, Splunk and New Relic all fire alerts about different parts of the engineering infrastructure. This not only creates alert fatigue, it also creates significant cognitive load for developers, who may not know these tools well in the first place. Think of alerts from Kubernetes tools, or infrastructure alerts that developers can’t make sense of and can’t confidently tie to a specific service.
Internal developer portals can help with alert management and unification
Internal developer portals have a software catalog at their center. The catalog’s data model is extremely flexible: you create “types” (called blueprints in Port) that define the kinds of entities the catalog holds. A running service is one type that can be tracked in the software catalog, and you can add other types for resources such as AWS or Kubernetes, or for whatever use case you’d like to support. To populate the catalog, you can use internal developer portal templates that map provider data, use GitOps, connect your CI/CD pipeline, map Kubernetes or cloud resources, or use IaC.
The result is a graph-based software catalog showing entities, from microservices to ephemeral environments, Kubernetes clusters, packages, cloud accounts and more. Internal developer portals are about providing developers with self-service actions in one place, from setting up ephemeral environments and requesting permissions to rolling back deployments and more. The software catalog provides developers with the data they need, in context, to understand the DevOps assets they rely on.
When alerts are also included in the software catalog, developers get a single pane of glass for all their alerts, shown in context within the relevant software catalog entity and with the information they need, such as the service or resource owner. Apart from the convenience of not needing to check many alert tools, having alerts in context reduces cognitive load, since each alert is tied to its origin, such as a problem in the production environment related to a certain service. Alerts can also be tied to day-2 operations that help resolve the underlying issue.
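To make the idea of an alert "in context" concrete, here is a minimal sketch of a catalog in which an alert entity is related to a service entity. It is not Port's API: the `Entity` and `Catalog` classes, the `service` relation name, and the property names (`owner`, `severity`, `environment`) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A catalog entity; `blueprint` names its type (e.g. "service", "alert")."""
    identifier: str
    blueprint: str
    properties: dict = field(default_factory=dict)
    relations: dict = field(default_factory=dict)  # relation name -> target identifier


class Catalog:
    """A tiny in-memory stand-in for a software catalog (hypothetical)."""

    def __init__(self):
        self._entities = {}

    def upsert(self, entity):
        self._entities[entity.identifier] = entity

    def get(self, identifier):
        return self._entities[identifier]

    def alert_context(self, alert_id):
        """Follow the alert's `service` relation and collect what a developer needs."""
        alert = self.get(alert_id)
        service = self.get(alert.relations["service"])
        return {
            "alert": alert.properties["title"],
            "severity": alert.properties["severity"],
            "environment": alert.properties["environment"],
            "service": service.identifier,
            "owner": service.properties["owner"],
        }


catalog = Catalog()
catalog.upsert(Entity("cart-service", "service", {"owner": "checkout-team"}))
catalog.upsert(Entity(
    "alert-101",
    "alert",
    {"title": "High error rate", "severity": "high", "environment": "production"},
    {"service": "cart-service"},
))

context = catalog.alert_context("alert-101")
print(context)
```

Because the alert carries a relation to the service rather than a copy of its data, the owner shown next to the alert stays current whenever the service entity is updated.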
Alert unification in the software catalog
What we’re describing is using the internal developer portal as a unified alerts database. We take data from alert platforms (Prometheus and others) and present it in developer dashboards. In this way, developers can easily track the status of their services, in one place, and not move between different systems.
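As a rough illustration of what a unified alerts database involves, the sketch below normalizes alerts from two sources into one shape and groups them by service. The payload layouts, the source names (`metrics-tool`, `error-tracker`) and the severity mapping are hypothetical simplifications, not the real formats of any tool named in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedAlert:
    source: str
    service: str
    severity: str  # shared scale: "low", "medium", "high", "critical"
    title: str


# Hypothetical mapping from each tool's severity vocabulary to one shared scale.
SEVERITY_MAP = {
    "info": "low",
    "warning": "medium",
    "error": "high",
    "critical": "critical",
}


def from_metrics_tool(payload):
    """Normalize an assumed metrics-tool payload that carries a `labels` dict."""
    labels = payload["labels"]
    return UnifiedAlert(
        source="metrics-tool",
        service=labels["service"],
        severity=SEVERITY_MAP.get(labels["severity"], "low"),
        title=labels["alertname"],
    )


def from_error_tracker(payload):
    """Normalize an assumed error-tracker payload with flat fields."""
    return UnifiedAlert(
        source="error-tracker",
        service=payload["project"],
        severity=SEVERITY_MAP.get(payload["level"], "low"),
        title=payload["message"],
    )


def alerts_by_service(alerts):
    """Group unified alerts by service, as a per-service dashboard would."""
    grouped = {}
    for alert in alerts:
        grouped.setdefault(alert.service, []).append(alert)
    return grouped


unified = [
    from_metrics_tool({"labels": {
        "alertname": "HighLatency", "service": "cart-service", "severity": "warning",
    }}),
    from_error_tracker({
        "project": "cart-service", "level": "error",
        "message": "Unhandled exception in checkout",
    }),
    from_error_tracker({
        "project": "products-service", "level": "info",
        "message": "Deprecated endpoint called",
    }),
]
dashboard = alerts_by_service(unified)
```

Each source gets its own small adapter, so adding another alert platform means writing one more function rather than changing the dashboard logic.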
Here is the alerts view in Port:
Acting upon alerts with developer self-service in the internal developer portal
As mentioned above, internal developer portals are used for developer self-service actions, and this comes in handy when alerts are unified for developers. Inside the portal, once they see alerts, developers can act: anything from acknowledging an issue, to investigating it in the developer portal, or even triggering a playbook or reverting a version. The information about services and resources can help them find owners, the on-call engineer and more.
Clicking on the three dots on the right-hand side of the table shown above reveals the self-service actions that are set for each alert, on the entity and in context.
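One way to picture self-service actions attached to an alert is a small registry that maps action names to handlers. This is a sketch only: the `Alert` class, the action names and the handlers are hypothetical, and the rollback handler merely records the request where a real portal would call a CI/CD backend.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    identifier: str
    service: str
    status: str = "open"
    history: list = field(default_factory=list)


def acknowledge(alert, user):
    """Mark the alert as acknowledged and record who did it."""
    alert.status = "acknowledged"
    alert.history.append(f"acknowledged by {user}")


def request_rollback(alert, user):
    """Record a rollback request; a real backend call is out of scope here."""
    alert.history.append(f"rollback of {alert.service} requested by {user}")


# Hypothetical registry of the actions offered on every alert row.
ACTIONS = {"acknowledge": acknowledge, "rollback": request_rollback}


def run_action(name, alert, user):
    """Look up an action by name and run it against the alert."""
    if name not in ACTIONS:
        raise ValueError(f"unknown action: {name}")
    ACTIONS[name](alert, user)
    return alert


alert = Alert("alert-101", "cart-service")
run_action("acknowledge", alert, "dana")
run_action("rollback", alert, "dana")
```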
Using scorecards to drive engineering quality using alert information
Scorecards are an important capability of internal developer portals. They track metrics, from service maturity through health and more, and display them in the context of the entity they belong to. These scorecards can help set engineering standards, and are also useful for alerts. Because they can be based on aggregated alerts coming from various tools, scorecards can help developers understand software health.
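A scorecard based on aggregated alerts could be as simple as a function from a service's open alert severities to a level. The level names and thresholds below are assumptions chosen for illustration, not rules defined by Port.

```python
def alert_health_level(open_alerts):
    """Return a hypothetical scorecard level from a list of open alert severities."""
    critical = sum(1 for severity in open_alerts if severity == "critical")
    high = sum(1 for severity in open_alerts if severity == "high")

    # Thresholds are illustrative; each team would set its own standards.
    if critical == 0 and high == 0 and len(open_alerts) <= 2:
        return "Gold"
    if critical == 0 and high <= 1:
        return "Silver"
    if critical <= 1:
        return "Bronze"
    return "Basic"


# Severities aggregated across tools for one service.
cart_service_alerts = ["medium", "high"]
print(alert_health_level(cart_service_alerts))
```

Since the input is just a list of severities on a shared scale, the same rule applies no matter which monitoring tool raised each alert.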
Alert unification and workflow automation
Workflow automation within Port can also be tied to alerts. Because an alert is associated with a specific entity in the software catalog, the actions triggered after an alert can depend on the criticality of the service (if the service is critical and the alert is high severity, a message can be sent to several teams) or on whether the service is internet exposed.
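The routing logic described above can be sketched as a function that reads properties of the related service before deciding what to trigger. The field names (`criticality`, `internet_exposed`, `escalation_teams`) are hypothetical, and notifying a security team for internet-exposed services is an assumed example of an exposure-based action.

```python
def plan_response(service, alert):
    """Decide which notifications to send for an alert, using catalog context."""
    actions = ["notify:" + service["owner"]]

    # Critical service plus high-severity alert: message several teams.
    if service["criticality"] == "critical" and alert["severity"] == "high":
        actions += ["notify:" + team for team in service["escalation_teams"]]

    # Assumed rule: internet-exposed services also loop in security.
    if service["internet_exposed"]:
        actions.append("notify:security")

    return actions


cart_service = {
    "owner": "checkout-team",
    "criticality": "critical",
    "internet_exposed": True,
    "escalation_teams": ["sre", "platform"],
}
internal_tool = {
    "owner": "backoffice-team",
    "criticality": "low",
    "internet_exposed": False,
    "escalation_teams": ["sre"],
}
high_alert = {"severity": "high"}

cart_plan = plan_response(cart_service, high_alert)
internal_plan = plan_response(internal_tool, high_alert)
```

The same high-severity alert produces a wider response for the critical, exposed service than for the internal tool, which is the point of tying automation to catalog data.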
Tags: Platform Engineering