Internal developer platforms exist so that you can ship software faster and better.
- Faster, by providing developers with golden paths that give them both autonomy and guardrails.
- Better, by improving production readiness and compliance, and by reducing mean time to recovery (MTTR).
One way to do this is to bake observability into your platform engineering practices. Doing so improves reliability, compliance and velocity.
What is observability?
At its core, observability is a method for understanding what's happening inside your systems. It's often mentioned alongside monitoring. The two are closely related, and both help you identify the root cause of issues, but they serve different purposes in maintaining the health of your software.
- Observability is the more proactive of the two. It's like watching a stock ticker, where you see prices fluctuate over time. Just as a trader infers trends from those changes and makes decisions accordingly, observability helps you understand what's happening in your system in real time, and even anticipate future issues.
- Monitoring is more reactive. It alerts you when something has already gone wrong, for example, a 504 error. Monitoring provides a snapshot of the moment, notifying you of issues that need immediate attention.
In short, observability is a live stream of system information: it shows you what led up to an event and what immediately followed it, so you can understand how the state of a system progresses over time. Monitoring, by contrast, captures point-in-time information.
What is an internal developer portal and platform?
An internal developer platform provides golden paths for developers and managers. It consists of many tools, plus reusable self-service actions that run through them. Its goal is to reduce cognitive load on developers without abstracting away context and underlying technologies.
An internal developer portal is the central hub for the internal developer platform, providing a microservice catalog, a way to set and maintain software standards, and developer self-service.
If your internal developer platform is a collection of technologies that your enterprise has assembled to operate the business, an internal developer portal is the interface through which various users (like developers, operators, and product managers) interact with these technologies.
The portal is designed to simplify this interaction by abstracting away the complexity and specialized knowledge needed to manage these technologies. It also reduces the number of tools that developers need to interact with; in this case, observability and monitoring tools.
In the context of observability, internal developer portals can:
- Ensure better practices around standards and compliance - driving better software and baking in observability; and
- Make incident management processes better and simpler for developers
How internal developer portals make observability better
If observability is about building reliably and anticipating failures in advance, internal developer portals can drive better reliability by ensuring that all software is built with compliance, resilience and standards in mind. Specifically, internal developer portals support the following:
Ensuring all software is built with observability inside
Internal developer portals create golden paths to ensure that standards are met whenever a service is built. One such requirement can be that observability is baked into every new service or self-service action. This ensures that:
- All assets are monitored
- Every critical asset has an owner and an on-call rotation
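A golden-path requirement like this can be expressed as a simple scorecard rule. The sketch below is a minimal Python illustration, assuming a hypothetical `Asset` record with `monitored`, `critical`, `owner` and `on_call` fields; a real portal would read these values from its catalog rather than from hard-coded objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    # Hypothetical fields; a real catalog would define its own schema.
    name: str
    monitored: bool
    critical: bool = False
    owner: Optional[str] = None
    on_call: Optional[str] = None


def observability_violations(assets: list[Asset]) -> list[tuple[str, str]]:
    """Return (asset, problem) pairs for every asset that breaks the two rules:
    all assets are monitored, and critical assets have an owner and an on-call."""
    violations = []
    for asset in assets:
        if not asset.monitored:
            violations.append((asset.name, "not monitored"))
        if asset.critical and not asset.owner:
            violations.append((asset.name, "critical asset has no owner"))
        if asset.critical and not asset.on_call:
            violations.append((asset.name, "critical asset has no on-call"))
    return violations


assets = [
    Asset("cart-service", monitored=True, critical=True, owner="team-cart", on_call="cart-rotation"),
    Asset("products-service", monitored=False),
    Asset("payments-service", monitored=True, critical=True, owner="team-payments"),
]
for name, problem in observability_violations(assets):
    print(f"{name}: {problem}")
```

Running the same rule on every new service or self-service action is what turns "observability inside" from a guideline into a checked standard.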
Acting as a central system of record
The software catalog is at the center of the internal developer portal. It holds various entities, from microservices to cloud resources, running services and APIs, and additional data such as AppSec findings or cost can be added to those entities. There is tremendous value in connecting information from observability platforms to software catalog entities: it takes data from outside tools and immediately adds the context needed to understand what's going on with a service in real time.
Instead of tracking standards across spreadsheets, CMDBs, assorted compliance checks, checklists and SRE reviews, you have all this data in one place in the internal developer portal.
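As a rough illustration of a catalog acting as the system of record, the minimal Python sketch below keeps one record per entity and stores data from external tools alongside it, keyed by source. The `Catalog` class, its methods, and the source names are hypothetical; they do not correspond to any specific portal's API.

```python
from typing import Any


class Catalog:
    """Hypothetical in-memory software catalog: one record per entity, with
    data from external tools stored alongside it, keyed by source."""

    def __init__(self) -> None:
        self._entities: dict[str, dict[str, Any]] = {}

    def register(self, identifier: str, kind: str, **properties: Any) -> None:
        # Core catalog data: what the entity is and who owns it.
        self._entities[identifier] = {
            "identifier": identifier,
            "kind": kind,
            "properties": dict(properties),
            "sources": {},
        }

    def ingest(self, source: str, identifier: str, data: dict[str, Any]) -> None:
        # Attach (or refresh) data reported by an external tool for this entity.
        self._entities[identifier]["sources"].setdefault(source, {}).update(data)

    def view(self, identifier: str) -> dict[str, Any]:
        # One place to see ownership, observability, security and cost data together.
        return self._entities[identifier]


catalog = Catalog()
catalog.register("cart-service", kind="microservice", owner="team-cart")
catalog.ingest("observability", "cart-service", {"error_rate": 0.02, "p99_latency_ms": 480})
catalog.ingest("appsec", "cart-service", {"open_vulnerabilities": 1})
catalog.ingest("cost", "cart-service", {"monthly_usd": 1200})
print(sorted(catalog.view("cart-service")["sources"]))  # ['appsec', 'cost', 'observability']
```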
How portals and observability work together
Here’s an example: without an internal developer portal, when an issue arises, such as a memory error, a developer is typically assigned a ticket to fix it.
At this point they may need to dive into multiple observability tools like Datadog, Grafana, and New Relic to understand what occurred. This can be difficult because many organizations operate with a ticketing system that requires approval to access these platforms. Once granted access, they need to navigate between dozens of different dashboards and may struggle to determine which one to use. This can result in prolonged war room sessions as teams work through the night to identify the root cause.
An internal developer portal can bridge these gaps by connecting the dots between different systems and giving users the autonomy to access the information they need quickly. This reduces the complexity and time needed to diagnose and resolve issues.
How a portal can help in incident management with the right information
Observability and monitoring tools are crucial, but they are just one part of the broader chain of events needed to resolve issues. The journey to resolution often involves navigating through various systems: you're in Splunk for logs, Prometheus for metrics, and Honeycomb for tracing. Sifting through this web of tools to find answers is time-consuming and cumbersome.
This is where an internal developer portal becomes incredibly powerful. Imagine you detect an issue with your recommendation service. With a portal, you can immediately understand the problem's context through a unified view. The portal’s graph allows you to see how services are deployed, their relationships, and the cloud environments they operate in.
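The relationship graph is what lets you jump from "the recommendation service is misbehaving" to "here is everything it depends on, and where it runs". The minimal Python sketch below walks a hypothetical dependency map breadth-first; the entity names and environment labels are invented for illustration, and a real portal would derive them from its catalog relations.

```python
from collections import deque

# Hypothetical catalog relations: each entity -> the entities it depends on.
DEPENDS_ON = {
    "recommendation-service": ["products-service", "user-profile-db"],
    "products-service": ["products-db", "kafka-cluster"],
    "user-profile-db": [],
    "products-db": [],
    "kafka-cluster": [],
}

# Hypothetical environment each entity runs in.
ENVIRONMENT = {
    "recommendation-service": "k8s/prod-eu",
    "products-service": "k8s/prod-eu",
    "user-profile-db": "cloud/db-eu",
    "products-db": "cloud/db-eu",
    "kafka-cluster": "cloud/kafka-eu",
}


def dependencies(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk returning every direct and transitive dependency of
    `start`, nearest first, each listed once."""
    visited = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in visited:
                visited.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


for dep in dependencies(DEPENDS_ON, "recommendation-service"):
    print(f"{dep:<20} {ENVIRONMENT[dep]}")
```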
The visualization is just one part of the equation. The underlying metrics, logs, and traces, often spread across different systems, are the meat of observability. A portal's promise lies in its ability to truly integrate these components, bringing all this data together to provide context and speed up issue resolution.
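One simple way to picture that integration is a single, chronological timeline per service, merged from several tools. The minimal Python sketch below uses a hypothetical `Signal` record and hard-coded feeds labeled with the tool names from the example above; it does not call any of those tools' APIs, which a real portal would do through its own integrations.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Signal:
    timestamp: datetime
    source: str   # which tool reported it, e.g. "splunk", "prometheus", "honeycomb"
    kind: str     # "log", "metric" or "trace"
    service: str
    summary: str


def unified_timeline(service: str, *feeds: list[Signal]) -> list[Signal]:
    """Merge signals from several tools into one chronological view for a service."""
    merged = [s for feed in feeds for s in feed if s.service == service]
    return sorted(merged, key=lambda s: s.timestamp)


def at(minute: int) -> datetime:
    # Helper for the hypothetical example data below.
    return datetime(2024, 5, 1, 3, minute)


# Hypothetical, hard-coded feeds; a real portal would pull these from each tool.
logs = [Signal(at(2), "splunk", "log", "recommendation", "OutOfMemoryError in worker")]
metrics = [Signal(at(0), "prometheus", "metric", "recommendation", "heap usage at 95%")]
traces = [
    Signal(at(1), "honeycomb", "trace", "recommendation", "p99 latency 2.4s on /recommend"),
    Signal(at(1), "honeycomb", "trace", "cart", "unrelated cart trace"),
]

for s in unified_timeline("recommendation", logs, metrics, traces):
    print(s.timestamp.strftime("%H:%M"), s.source, s.kind, s.summary)
```

Read in order, the merged view tells the story (heap pressure, then slow traces, then the crash) that would otherwise take three browser tabs to piece together.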
What SREs can build with an internal developer portal
Site Reliability Engineers (SREs) can use the internal developer portal to drive better incident management outcomes:
A better on-call experience: SREs can use the portal to build a framework that will make on-call work better.
For instance, SRE and platform engineering teams focused on uptime and reliability may have already created a suite of dashboards for critical services. The portal can link to these dashboards so that users or on-call personnel can quickly access the necessary information, whether it's in Grafana, Datadog, or another tool. By applying specific filters to dashboards and associating them with the relevant service, individual contributors can easily find the data they need.
These contributors should already have the appropriate permissions to view this information, as it pertains to their services. Ultimately, an internal developer portal aims to break down barriers and make information readily accessible.
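As an illustration of linking a pre-filtered dashboard to a catalog entity, the minimal Python sketch below builds a Grafana dashboard URL. It relies on Grafana's URL conventions of `var-<name>` query parameters for template variables and `from`/`to` for the time range; the base URL, dashboard UID, slug, and the assumption that the dashboard defines a `service` variable are all hypothetical.

```python
from urllib.parse import urlencode


def grafana_dashboard_link(base_url: str, dashboard_uid: str, slug: str,
                           service: str, since: str = "now-1h") -> str:
    """Build a link to a Grafana dashboard pre-filtered to one service.

    Assumes the dashboard defines a template variable named `service`;
    Grafana reads template variables from `var-<name>` query parameters and
    the time range from `from`/`to`.
    """
    query = urlencode({"var-service": service, "from": since, "to": "now"})
    return f"{base_url.rstrip('/')}/d/{dashboard_uid}/{slug}?{query}"


# Hypothetical values a portal could store on the "cart" catalog entity.
link = grafana_dashboard_link("https://grafana.example.com", "abc123", "service-overview", "cart")
print(link)
# https://grafana.example.com/d/abc123/service-overview?var-service=cart&from=now-1h&to=now
```

Storing the link on the entity, rather than asking the on-call engineer to find the right dashboard and set the filter, is the whole point: one click lands on the relevant view.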
Permissions: SREs can also create dynamic or just-in-time permissions so that on-call engineers can easily self-serve.
This is especially critical during off-hours. If an issue arises at 3 AM and the on-call SRE is working alone, having a portal to access all necessary information can be crucial.
For example, many companies rely on a Confluence page or another documentation tool that provides a step-by-step guide to the troubleshooting process. However, these steps may involve tools the SRE doesn't have permission to use, or point to a dashboard that doesn't make sense to them. A portal can automate and enhance these steps by providing self-service actions to request access to tools or databases, and by presenting relevant information in a form the SRE will understand; for instance, dashboards tailored specifically to them rather than raw access to observability and monitoring tools built for DevOps engineers.
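A just-in-time permission can be thought of as an automatically approved grant with a built-in expiry. The minimal Python sketch below assumes a hypothetical policy (anyone currently on call is approved for four hours) and hypothetical `Grant`, `request_access` and `is_active` names; real access control would be enforced by the underlying identity or cloud provider, not by this code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Grant:
    user: str
    resource: str
    expires_at: datetime


def request_access(user: str, resource: str, on_call: set[str],
                   duration: timedelta = timedelta(hours=4),
                   now: Optional[datetime] = None) -> Grant:
    """Self-service, just-in-time access: approved automatically for whoever is
    currently on call, and valid only for a limited window (hypothetical policy)."""
    now = now or datetime.now(timezone.utc)
    if user not in on_call:
        raise PermissionError(f"{user} is not on call; route to the normal approval flow")
    return Grant(user, resource, now + duration)


def is_active(grant: Grant, now: Optional[datetime] = None) -> bool:
    # Access lapses automatically once the window closes; nothing to revoke by hand.
    now = now or datetime.now(timezone.utc)
    return now < grant.expires_at


three_am = datetime(2024, 5, 1, 3, 0, tzinfo=timezone.utc)
grant = request_access("alice", "prod-orders-db", on_call={"alice"}, now=three_am)
print(is_active(grant, three_am + timedelta(hours=1)))  # True
print(is_active(grant, three_am + timedelta(hours=5)))  # False
```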
Internal developer portals give you the freedom to change the underlying observability tools
Internal developer portals are loosely coupled with the underlying internal developer platform. This means that the underlying platform tools can be replaced without hurting or changing the developer experience. Since observability tools can be costly, this lets you switch tools as needed while developers keep the same experience for addressing observability issues.
This future-proof approach means that whatever tools an organization uses, an internal developer portal can adapt. If the technology stack changes, the portal can accommodate those changes without forcing developers to relearn their workflow. Because it is agnostic to observability tooling, the portal lets the right tools be used for the right jobs while simplifying the user's interaction with the system.
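The loose coupling described above is essentially the adapter pattern: the portal codes against a small interface, and each observability backend is wrapped to fit it. The minimal Python sketch below uses `typing.Protocol`; the `MetricsBackend` interface, both backend classes, and their in-memory data are hypothetical stand-ins rather than wrappers around real vendor APIs.

```python
from typing import Protocol


class MetricsBackend(Protocol):
    """The only contract the portal depends on (hypothetical)."""

    def error_rate(self, service: str) -> float: ...


class CountingBackend:
    """Stand-in for a tool that reports raw error and request counts."""

    def __init__(self, counts: dict[str, dict[str, int]]) -> None:
        self._counts = counts

    def error_rate(self, service: str) -> float:
        c = self._counts[service]
        return c["errors"] / c["requests"]


class RateBackend:
    """Stand-in for a tool that reports error rates directly."""

    def __init__(self, rates: dict[str, float]) -> None:
        self._rates = rates

    def error_rate(self, service: str) -> float:
        return self._rates[service]


def health_badge(backend: MetricsBackend, service: str, threshold: float = 0.01) -> str:
    # Portal-side logic: unchanged no matter which backend is plugged in.
    return "degraded" if backend.error_rate(service) > threshold else "healthy"


print(health_badge(CountingBackend({"cart": {"errors": 5, "requests": 100}}), "cart"))  # degraded
print(health_badge(RateBackend({"cart": 0.001}), "cart"))  # healthy
```

Swapping tools then means writing one new adapter; `health_badge`, and everything developers see in the portal, stays the same.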