Observability in platform engineering

Sooraj Shah

Sam Neill

August 7, 2024

Internal developer platforms exist so that you can ship software faster and better.

Faster, by providing developers with golden paths, giving them autonomy and guardrails.
Better, by improving production readiness and compliance and reducing MTTR.

One way to do this is to bake observability into your platform engineering practices. Doing so can improve reliability, compliance and velocity.

What is observability?

At its core, observability is a method for understanding what's happening inside your systems. It's often mentioned alongside monitoring. The two are closely related, and both help identify the root cause of issues, but they serve different purposes in maintaining the health of your software.

Observability is the more proactive of the two. It's like watching a stock ticker, where you see stock prices fluctuate over time. Just as a stock trader might infer trends and make decisions based on these changes, observability helps you understand what’s happening in your system in real time, or even anticipate future issues.
Monitoring is more reactive. It alerts you when something has already gone wrong, for example, a 504 error. Monitoring provides a snapshot of the moment, notifying you of issues that need immediate attention.

In short, observability is a live stream of system information. It lets you understand what happened up to and immediately after an event, so you can follow how the state of a system progresses over time. Monitoring, by contrast, captures point-in-time information.
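To make the contrast concrete, here is a minimal Python sketch. The memory samples, the 90% threshold and the function names are all made up for illustration, and the linear extrapolation is a deliberately naive stand-in for real trend analysis. The first check looks only at the latest value, the way a monitoring alert would. The second looks at how the values moved across the window.

```python
from statistics import mean
from typing import Optional

# Hypothetical memory-usage samples (percent), oldest first.
samples = [52.0, 55.0, 61.0, 68.0, 74.0, 81.0]

ALERT_THRESHOLD = 90.0  # assumed alerting threshold, for illustration only


def point_in_time_alert(latest: float, threshold: float = ALERT_THRESHOLD) -> bool:
    """Monitoring-style check: only the most recent value matters."""
    return latest >= threshold


def average_step(values: list[float]) -> float:
    """Average change between consecutive samples across the window."""
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    return mean(steps)


def samples_until_threshold(
    values: list[float], threshold: float = ALERT_THRESHOLD
) -> Optional[float]:
    """Trend-style check: estimate how many more samples fit under the
    threshold if the average step continues (naive linear extrapolation)."""
    step = average_step(values)
    if step <= 0:
        return None  # flat or falling, so no crossing is projected
    return (threshold - values[-1]) / step


if __name__ == "__main__":
    print("alert firing now:", point_in_time_alert(samples[-1]))
    print("average step per sample:", average_step(samples))
    print("samples until threshold:", samples_until_threshold(samples))
```

With these sample values no alert is firing yet, but the trend suggests the threshold is only a couple of samples away. That is the kind of early signal the article attributes to observability.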

What is an internal developer portal and platform?

An internal developer platform provides golden paths for developers and managers. It consists of many tools and the reusable self-service actions that run through them. Its goal is to reduce cognitive load on developers without abstracting away context and underlying technologies.

Internal developer portals are the central hub for the internal developer platform, providing a microservice catalog, a way to set and maintain software standards, and developer self-service.

If your internal developer platform is a collection of technologies that your enterprise has assembled to operate the business, an internal developer portal is the interface through which various users (like developers, operators, and product managers) interact with these technologies.

The portal is designed to simplify this interaction by abstracting away the complexity and specialized knowledge needed to manage these technologies. It also reduces the number of tools that developers need to interact with, including, in this case, observability and monitoring tools.

In the context of observability, internal developer portals can:

Ensure better practices around standards and compliance, driving better software and baking in observability; and
Make incident management processes better and simpler for developers.

How internal developer portals make observability better

If observability is about building reliably and anticipating failures in advance, internal developer portals can drive better reliability by ensuring all software is built with compliance, resilience and standards in mind. Specifically, internal developer portals support the following:

Ensuring all software is built with observability inside

Internal developer portals create golden paths to ensure that standards are met whenever a service is built. One such requirement can be that observability is baked into any new service or self-service action. This will ensure that:

All assets are monitored
Every critical asset has an owner and an on-call
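The two rules above can be expressed as a simple check over catalog entries. The sketch below is illustrative only: the `CatalogEntity` fields and the `observability_violations` helper are hypothetical names, not the API of any particular portal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogEntity:
    """Hypothetical catalog record; field names are assumptions."""
    identifier: str
    critical: bool
    monitored: bool
    owner: Optional[str] = None
    on_call: Optional[str] = None


def observability_violations(entity: CatalogEntity) -> list[str]:
    """Return the rules an entity breaks; an empty list means it passes."""
    problems = []
    if not entity.monitored:
        problems.append("not monitored")
    if entity.critical and not entity.owner:
        problems.append("critical asset has no owner")
    if entity.critical and not entity.on_call:
        problems.append("critical asset has no on-call")
    return problems


if __name__ == "__main__":
    entities = [
        CatalogEntity("checkout", critical=True, monitored=True,
                      owner="payments-team", on_call="payments-rota"),
        CatalogEntity("legacy-batch", critical=True, monitored=False),
    ]
    for entity in entities:
        print(entity.identifier, observability_violations(entity))
```

A check like this could run when a service is scaffolded or on a schedule, so gaps show up before an incident rather than during one.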

Acting as a central system of record

The service catalog is at the center of the internal developer portal. It holds various entities, from microservices to cloud resources, running services and APIs, and additional data, such as AppSec data or cost, can be added to those entities. There is tremendous value in connecting information from observability platforms to software catalog entities: it takes data from outside tools and immediately adds the context that helps you understand what’s going on with a service in real time.

Instead of tracking standards across spreadsheets, CMDBs, checklists, SRE reviews and various other methods of checking compliance, the internal developer portal has all this data in one place.
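As a rough illustration of what connecting observability data to catalog entities can look like, the sketch below joins a list of alerts onto catalog entries by service name. The data shapes, the `open_alerts` property and the `attach_alerts` function are assumptions made for the example, not a real integration.

```python
import copy

# Hypothetical catalog, keyed by service identifier.
catalog = {
    "checkout": {"title": "Checkout", "owner": "payments-team", "properties": {}},
    "search": {"title": "Search", "owner": "discovery-team", "properties": {}},
}

# Hypothetical alerts, as they might arrive from an observability tool.
alerts = [
    {"service": "checkout", "severity": "critical", "summary": "High memory usage"},
    {"service": "unknown-svc", "severity": "warning", "summary": "Slow responses"},
]


def attach_alerts(catalog: dict, alerts: list[dict]) -> tuple[dict, list[dict]]:
    """Return a copy of the catalog with alerts attached to matching
    entities, plus the alerts that matched no catalog entity."""
    enriched = copy.deepcopy(catalog)  # leave the caller's catalog untouched
    unmatched = []
    for alert in alerts:
        entity = enriched.get(alert["service"])
        if entity is None:
            unmatched.append(alert)
            continue
        entity["properties"].setdefault("open_alerts", []).append(alert)
    return enriched, unmatched


if __name__ == "__main__":
    enriched, unmatched = attach_alerts(catalog, alerts)
    print(enriched["checkout"]["properties"]["open_alerts"])
    print("alerts without a catalog entry:", unmatched)
```

Once the alert sits next to the owner and the other entity data, the person looking at it no longer has to look up who is responsible. The unmatched list is useful in its own right, because it points to services that are missing from the catalog.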

How portals and observability work together

Here’s an example: without an internal developer portal, when an issue arises, such as a memory error, a developer is typically assigned a ticket to fix it.

At this point they may need to dive into multiple observability tools like Datadog, Grafana, and New Relic to understand what occurred. This can be difficult because many organizations operate with a ticketing system that requires approval to access these platforms. Once granted access, they need to navigate between dozens of different dashboards and may experience difficulty determining the right dashboard to use. This can result in prolonged war room sessions as teams work through the night to identify the root cause.

An internal developer portal can bridge these gaps by connecting the dots between different systems and giving users the autonomy to access the information they need quickly. This reduces the complexity and time needed to diagnose and resolve issues.

How a portal can help in incident management with the right information

Observability and monitoring tools are crucial, but they are just one part of the broader chain of events needed to resolve issues. The journey to resolution often involves navigating through various systems—you're in Splunk for logs, Prometheus for metrics, and Honeycomb for tracing. This complex web of tools can be time-consuming and cumbersome to sift through to find answers.

This is where an internal developer portal becomes incredibly powerful. Imagine you detect an issue with your recommendation service. With a portal, you can immediately understand the problem's context through a unified view. The portal’s graph allows you to see how services are deployed, their relationships, and the cloud environments they operate in.

The visualization is just one part of the equation. The underlying metrics, logs, and traces—often spread across different systems—are the meat of observability. A portal’s promise lies in its ability to truly integrate these components, making it easier to bring all this data together for context and quick issue resolution.
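To give a feel for what a relationship graph buys you during an incident, here is a small Python sketch that walks the dependencies of a recommendation service. The service names and the `relations` mapping are invented for the example. A real portal would build this graph from its catalog rather than a hard-coded dictionary.

```python
from collections import deque

# Hypothetical "depends on" relations between catalog entities.
relations = {
    "recommendation-service": ["feature-store", "catalog-api"],
    "catalog-api": ["catalog-db"],
    "feature-store": ["redis-cache"],
    "catalog-db": [],
    "redis-cache": [],
}


def downstream_dependencies(start: str, relations: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk that lists everything `start` depends on,
    nearest dependencies first."""
    visited = {start}
    ordered = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dependency in relations.get(current, []):
            if dependency not in visited:
                visited.add(dependency)
                ordered.append(dependency)
                queue.append(dependency)
    return ordered


if __name__ == "__main__":
    for name in downstream_dependencies("recommendation-service", relations):
        print(name)
```

The resulting list is a reasonable order in which to check logs, metrics and traces: direct dependencies first, then the components behind them.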

What SREs can build with an internal developer portal

Site Reliability Engineers (SREs) can use the internal developer portal to drive better incident management outcomes:

A better on-call experience: SREs can use the portal to build a framework that will make on-call work better.

For instance, SRE and platform engineering teams focused on uptime and reliability may have already created a suite of dashboards for critical services. The portal can link to these dashboards so that users or on-call personnel can quickly access the necessary information, whether it's in Grafana, Datadog, or another tool. By applying specific filters to dashboards and associating them with the relevant service, individual contributors can easily find the data they need.

These contributors should already have the appropriate permissions to view this information, as it pertains to their services. Ultimately, an internal developer portal aims to break down barriers and make information readily accessible.
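One lightweight way to associate a pre-filtered dashboard with a service is to generate the link from the service's catalog entry. The sketch below assumes a dashboard that accepts filters as URL query parameters. The base URL and the `var-service` and `var-env` parameter names are placeholders, so check how your own dashboard tool expects filters to be passed.

```python
from urllib.parse import urlencode


def dashboard_link(base_url: str, service: str, environment: str = "production") -> str:
    """Build a dashboard URL pre-filtered to one service and environment.

    The query parameter names are assumptions for this example.
    """
    query = urlencode({"var-service": service, "var-env": environment})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Placeholder URL; not a real dashboard.
    base = "https://grafana.example.com/d/abc123/service-overview"
    print(dashboard_link(base, "cart-service"))
```

Storing the generated link on the service's catalog entry means the on-call engineer opens the right dashboard with the right filters already applied.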

Permissions: SREs can also create dynamic or just-in-time permissions to ensure on-call engineers can easily self-serve.

This is especially critical during off-hours. If an issue arises at 3 AM and the on-call SRE is working alone, having a portal to access all necessary information can be crucial.

For example, many companies rely on a Confluence page or other documentation tool that provides a step-by-step guide to the troubleshooting process. However, these steps can include tools the SRE doesn’t have permission to use, or may point to a dashboard that doesn’t make sense to them. A portal can automate and enhance these steps in two ways: by providing self-service actions to request access to tools or databases, and by presenting relevant information in a form the SRE will understand. For instance, it can surface dashboards specifically tailored to them, rather than giving them access to observability and monitoring tools that are tailored to DevOps engineers.
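A just-in-time permission can be thought of as a grant that is only issued to whoever is on call and that expires on its own. The Python sketch below models that idea. `AccessGrant`, `request_on_call_access`, the roster and the four-hour default are all hypothetical, and a real self-service action would call your identity or access-management system instead of returning an in-memory object.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical time-boxed grant for one user on one resource."""
    user: str
    resource: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        return now < self.expires_at


def request_on_call_access(
    user: str,
    resource: str,
    on_call_roster: set[str],
    now: datetime,
    ttl: timedelta = timedelta(hours=4),  # assumed default duration
) -> AccessGrant:
    """Issue a temporary grant, but only to someone currently on call."""
    if user not in on_call_roster:
        raise PermissionError(f"{user} is not on call")
    return AccessGrant(user=user, resource=resource, expires_at=now + ttl)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    grant = request_on_call_access("dana", "orders-db", {"dana"}, now)
    print(grant.is_active(now + timedelta(hours=1)))
    print(grant.is_active(now + timedelta(hours=5)))
```

Because the grant carries its own expiry, nobody has to remember to revoke access after the 3 AM incident is over.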

Internal developer portals give you the freedom to change the underlying observability tools

Internal developer portals are loosely coupled with the underlying internal developer platform. This means that the underlying platform tools can be replaced without hurting or changing the developer experience. Observability tools can be costly, and internal developer portals allow you to change them as you need while ensuring that developers have the same experience when addressing observability issues.

This future-proof approach ensures that regardless of the tools an organization uses, an internal developer portal can adapt. If the technology stack changes, the portal can accommodate those changes. Because it is agnostic to observability tooling, the portal ensures that the right tools are used for the right jobs, all while simplifying the user’s interaction with the system.
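Loose coupling of this kind usually comes down to coding against an interface rather than a vendor. The sketch below shows the idea in Python with two made-up backends that report error rates in different units. `MetricsBackend`, both vendor classes and the 1% threshold are illustrative assumptions, not real client libraries.

```python
from typing import Protocol


class MetricsBackend(Protocol):
    """What the portal needs from any observability tool (assumed contract)."""

    def error_rate(self, service: str) -> float:
        """Fraction of failed requests, between 0.0 and 1.0."""
        ...


class VendorABackend:
    """Hypothetical tool that already reports a fraction."""

    def __init__(self, fractions: dict[str, float]) -> None:
        self._fractions = fractions

    def error_rate(self, service: str) -> float:
        return self._fractions[service]


class VendorBBackend:
    """Hypothetical tool that reports a percentage, converted here."""

    def __init__(self, percentages: dict[str, float]) -> None:
        self._percentages = percentages

    def error_rate(self, service: str) -> float:
        return self._percentages[service] / 100.0


def service_health(service: str, backend: MetricsBackend, max_error_rate: float = 0.01) -> str:
    """Developer-facing view; identical regardless of the backend in use."""
    return "healthy" if backend.error_rate(service) <= max_error_rate else "degraded"


if __name__ == "__main__":
    print(service_health("cart", VendorABackend({"cart": 0.002})))
    print(service_health("cart", VendorBBackend({"cart": 5.0})))
```

Swapping tools then means writing a new adapter class. The `service_health` view that developers rely on does not change.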

Tags:

Platform Engineering
