What you need to know about the data model in an internal developer portal
September 28, 2023
The key to a successful portal is covering the developer routines you need
How do you build an internal developer portal? What data needs to go inside it, and how do you get it there?
You can imagine we get asked these questions often. After all, we keep saying “bring your own data model to Port”, and we mean it. In response, people ask: “what data model should I use?” This post is an answer to those questions.
It’s easy to respond by diving into Port blueprints (the metadata schema definition used by Port) and the software catalog entities they create, but that would be too much detail, too quickly. Instead, let’s take a step back and begin with the basic principles of the internal developer portal and the data we want to bring into it.
One last note before we begin: getting the data model right and ingesting data into Port may seem like a difficult task, but it isn’t. You can easily follow the docs and see what to do. Many of our Port Ocean integrations come with ready-made blueprints (so you don’t need to do much), which makes the real question a different one: which developer routines do you want to cover in the portal?
This is what really matters: the definition of developer self-service actions, scorecards and automations. We won’t touch on those in this post, but do remember they are the key to a successful internal developer portal.
A map of everything software
The internal developer portal gives developers a connected map of everything they need to care about - every devops asset, cloud resource, devtool, infrastructure, Kubernetes cluster and the software that runs on it all. A good data model should provide *everything* that developers need to know so that they can understand the SDLC in context.
Developers need this data so that they can contextualize issues, assets and software, and quickly understand where each fits within the engineering infrastructure. Simply put, they want answers to questions such as:
- Who is on-call right now?
- What is the documentation or README for a given service?
- Which version of a service is running in staging vs. production?
- What is the current TTL of my dev environment?
- Which packages should I use?
- Is the service I deployed compliant with production readiness standards?
- Can I provision an ephemeral environment?
More than microservices
Contrary to conventional wisdom, data in the software catalog isn’t just about microservices. It is also about how these services behave in different environments, as well as data coming in from third-party tools, such as incident management data, vulnerabilities and more. It's also important to remember that the data serves different personas: developers, managers, architects and devops, at all levels.
Wait, what is a data model?
A good data model provides a graph of how everything is related to everything.
A data model is a representation of the layout and architecture of the different components that make up an organization's environment and infrastructure. Its goal is to make it easy to understand the interdependencies in the infrastructure and how the SDLC tech stack comes together. This helps answer questions such as: which cloud resources are connected to the test environment a certain service runs on, and how does all of that fit within a larger domain and system?
It’s important to note that the data model shouldn’t be a closed set of properties and “types” that use ingested data to create the software catalog. Every organization is different - the architecture, tools used, cloud providers and frameworks vary, either slightly or broadly. The data model should reliably describe your infrastructure, in a way that resonates with your organization’s terminology, architecture and workflow.
Begin with use cases (what you want to help developers do), then iterate and extend: developer routines are key
Another way to think about where you want to begin with your data model is the desired use cases for the internal developer portal. Tracking incidents, managing microservices or resolving vulnerabilities will all create different initial data models, which can then be extended later. Taking an iterative approach, you can begin with the first data model and use case you need, and then continue to grow the data model and add use cases.
The following describes different ways of thinking about the data model. Choose the one(s) that best fit the initial portal and platform you’d like to implement.
The three basic elements of the data model
The core model provides developers with a good understanding of the SDLC. It serves the core abstractions and visualizations that portal users need 99% of the time.
The core model for a software catalog is made of three elements:
- A service
- An environment
- A running service
You can probably guess what the “service” and “environment” are. But what is the “running service”?
The running service reflects the real world, where “live” services are deployed to several environments, such as development, staging or production. In the case of a single-tenant architecture, services are also deployed in many different customer environments; each of those is a “running service” too.
Why do these three elements provide the core of the data model? Because they provide developers with the connected map of the software development lifecycle. This “basic” data gives developers a lot of what they need, even if the data model is never extended to include more data. More importantly, even the basic model provides quite a lot of information about the upstream and downstream impact of any change the developer is considering. Once you add visualizations, you can ensure each developer or operations person sees what is most relevant to them.
A deeper dive into the core model
To better understand the core model, let’s dig deeper.
Before we do, let’s explain blueprints. Blueprints are customizable schema definitions for any type of asset in your software catalog, and they are made up of properties. You can think of a blueprint as the schema of a database table, where you can add all the different columns (properties) you want. Blueprints support all of the major types, from standard primitives (strings, numbers and booleans), through arrays and embedded markdown, all the way to iframes, so you can create the most relevant visualization for the developer-users of the internal developer portal. We will use blueprints to define the core model.
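To make this concrete, here is a minimal sketch of what a blueprint definition can look like. Treat it as illustrative: the identifier and properties below are made up, and the exact schema fields are described in Port’s documentation.

```json
{
  "identifier": "example_blueprint",
  "title": "Example Blueprint",
  "schema": {
    "properties": {
      "link": { "title": "Link", "type": "string", "format": "url" },
      "owner": { "title": "Owner", "type": "string" }
    },
    "required": []
  },
  "relations": {}
}
```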
What should a basic Service blueprint include?
The service blueprint is used to represent a static code repository and related metadata for a microservice. Besides ownership information, the service blueprint can also show documentation, change history, dependencies and more. Think of the critical data that would allow you to create a single source of truth about service components, observability, health, production readiness and more.
Here are some common properties for the service blueprint:
- The URL to the service repository
- The team responsible for the service
- The language of the service
- The README documentation
- The service architecture diagram
Remember, this is a suggestion for an initial service blueprint. You can also extend it later, as we show further on in this post.
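As a sketch, a service blueprint with the properties above might look roughly like this. The identifiers and property names are illustrative (for instance, you may prefer to model ownership with Port’s built-in team ownership rather than a plain property):

```json
{
  "identifier": "service",
  "title": "Service",
  "schema": {
    "properties": {
      "repo_url": { "title": "Repository URL", "type": "string", "format": "url" },
      "team": { "title": "Owning Team", "type": "string" },
      "language": { "title": "Language", "type": "string" },
      "readme": { "title": "README", "type": "string", "format": "markdown" },
      "architecture_diagram": { "title": "Architecture Diagram", "type": "string", "format": "url" }
    },
    "required": ["repo_url"]
  },
  "relations": {}
}
```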
What should an environment blueprint include?
The environment blueprint is used to represent an environment where microservices are deployed and resources are hosted.
With the environment blueprint, we want to keep track of the various environments we deploy and manage, their different categories and uses (production, staging, etc.). In addition, we want an organized view of the cost of the environment and its infrastructure, as well as all of the components, cloud resources and running services that make up the environment.
Here are some common properties for the environment blueprint:
- The cloud provider the environment is hosted on (useful when you use multiple cloud providers or are transitioning between providers or to the cloud)
- The type of the environment (production, staging, test, etc.)
- The region of the environment
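Again as an illustrative sketch (the identifier, property names and enum values below are assumptions you should adapt to your own terminology):

```json
{
  "identifier": "environment",
  "title": "Environment",
  "schema": {
    "properties": {
      "cloud_provider": { "title": "Cloud Provider", "type": "string", "enum": ["AWS", "GCP", "Azure"] },
      "type": { "title": "Environment Type", "type": "string", "enum": ["Production", "Staging", "Test"] },
      "region": { "title": "Region", "type": "string" }
    },
    "required": []
  },
  "relations": {}
}
```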
What should a running service blueprint include?
The running service blueprint is used to provide the runtime context for a service running in a given environment. A service on its own is just static code, and an environment is just a collection of resources; the running service is what provides the runtime information.
Here are some common properties for the running service blueprint:
- The commit hash of the deployed version
- The URL to the Grafana dashboard
- The Swagger API reference for the running version
- A URL to the logs dashboard of the running service
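This is also where relations come in: the running service blueprint relates to both the service and the environment blueprints. Here is a sketch (identifiers and property names are illustrative; the relation fields follow Port’s blueprint schema as documented):

```json
{
  "identifier": "running_service",
  "title": "Running Service",
  "schema": {
    "properties": {
      "commit_hash": { "title": "Deployed Commit", "type": "string" },
      "grafana_url": { "title": "Grafana Dashboard", "type": "string", "format": "url" },
      "swagger_url": { "title": "Swagger / API Reference", "type": "string", "format": "url" },
      "logs_url": { "title": "Logs Dashboard", "type": "string", "format": "url" }
    },
    "required": []
  },
  "relations": {
    "service": { "title": "Service", "target": "service", "required": true, "many": false },
    "environment": { "title": "Environment", "target": "environment", "required": true, "many": false }
  }
}
```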
Ingesting data into the core model
To ingest data into the core model, we suggest using one of our exporters, or Port’s open source Ocean extensibility framework. When data is ingested, the blueprint metadata schemas are populated, creating software catalog entities through auto-discovery.
- To bring service data into the developer portal, we suggest using our GitLab, GitHub or Bitbucket exporters. Using these you can ingest pull requests, repositories, packages, monorepos, teams and more into the software catalog. You can also use our integrations with the various Git providers to manage the information in Port using GitOps.
- To bring environment data into the developer portal, we suggest using our AWS or Kubernetes exporters. Using these you can ingest Kubernetes resources and service data, as well as any AWS resource, into the software catalog.
- After ingesting your data, you can create a running service entity and map the environment and the service to it using the relations you defined. You can do this from Port’s UI or via the API, as sketched below.
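For example, a running service entity tying an existing service entity to an existing environment entity might look roughly like this (all identifiers and URLs are placeholders; the exact entity payload shape follows Port’s entity API and UI):

```json
{
  "identifier": "cart-service-production",
  "title": "Cart Service (Production)",
  "properties": {
    "commit_hash": "3f2c1ab",
    "grafana_url": "https://grafana.example.com/d/cart-service",
    "swagger_url": "https://api.example.com/cart/docs",
    "logs_url": "https://logs.example.com/cart-service/production"
  },
  "relations": {
    "service": "cart-service",
    "environment": "production"
  }
}
```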
Port has many additional integrations built on the Ocean extensibility framework. We may use them as we look at extending the data model. Note that some integrations come with ready-made blueprints, making the work easier.
What can you do with the core model?
The real value of the model is derived from the blueprints and their relations, which provide the context. As a result, we can now:
- Look at an environment and see all of the different running services: everything that is currently running in production
- For a given service, see where it is currently deployed
- For a running service, get direct access to the logs and to the API reference, to see whether the service is running as it should, using the expected amount of resources, etc.
Self-service actions, scorecards and automations: your next step
Internal developer portals aren’t just about a software catalog. The catalog is at the core of the portal, but the real justification for an internal developer portal is reducing toil and cognitive load for developers, and that means delivering a robust self-service action layer to developers, as well as scorecards and automations.
For instance, for the basic catalog described above, think about:
- Which developer self-service actions you’d like to provide on top of scaffolding a new microservice. For instance, which day-2 operations would you like to offer developers, and what actions can they take on the environment side, from requesting permissions to spinning up an environment with a TTL, and more.
- What scorecards can drive engineering quality? Production readiness, security standards?
- What automations would you like to add? Alerts on scorecard degradation? Automated triggering of runbooks?
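As a rough illustration of the scorecard idea (the rule below is an assumption about what a production readiness standard might check, and the exact scorecard schema is documented by Port), a scorecard could verify that every service has a README:

```json
{
  "identifier": "production_readiness",
  "title": "Production Readiness",
  "rules": [
    {
      "identifier": "has_readme",
      "title": "Has a README",
      "level": "Bronze",
      "query": {
        "combinator": "and",
        "conditions": [
          { "operator": "isNotEmpty", "property": "readme" }
        ]
      }
    }
  ]
}
```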
Extending the internal developer portal to cover additional “developer routines”
Now is the time to think about extending the core data model.
The extension isn’t simply a function of adding more elements to the software catalog. It is first motivated by a product-driven understanding of what developers need to do in the portal, and of what would drive greater productivity for them. Other users of the internal developer portal should also be taken into account, specifically platform engineers and devops teams that use the portal for their own needs.
While developers may initially only need the information in the core model, extensions can do one of the following:
- Add data that would serve additional users of the software catalog, such as devops engineers. This extends the portal to include more data about the resources that make up the underlying infrastructure beneath each environment, for instance cloud resource data that devops can use. Another option here is adding information that lets managers do a better job of tracking initiatives, such as engineering quality initiatives, appsec and more.
- Add data that can be abstracted for developers, for instance adding Kubernetes data at one level of abstraction for developers, and perhaps at another level for devops. This relieves developers from working in contexts they aren’t familiar with (e.g. Kubernetes), or with tools for vulnerability management, incident management or FinOps.
- Add data that can be used to provide executives and engineering leaders with the ability to see and understand various engineering quality initiatives.
Later in this post we’ll see, for example, how incident management data can both add to a basic service blueprint and create a new one.
Let’s examine the extensions and how they broaden the scope of the internal developer portal.
Cloud resource data
Adding cloud resource data provides another layer of visibility into environments. To address this need, we created the Kubernetes, AWS, GCP and Azure exporters, which bring all of this data into the software catalog, where it can be shown in context, both metadata and runtime data, making developers’ and DevOps engineers’ lives easier.
Developers can make use of cloud resource data that’s abstracted for them, such as which applications are related to which cloud resource. On the self-service actions side, developers can ask for cloud resource permissions, resource provisioning and ephemeral environments, all with baked-in guardrails and golden paths.
DevOps engineers may need fewer abstractions and just want to see the raw data and the interdependencies. Using the exporters you can add AWS Lambda functions, S3 buckets, SQS queues, ECS services and anything in the AWS Cloud Control API (500 resource types!). In GCP you can add Cloud Run services, Compute Engine instances, disks, GKE clusters, Memorystore databases, service accounts and more. The same applies to Azure.
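As a sketch of how cloud resource data could hang off the environment, a generic cloud resource blueprint might look like this (the identifier, properties and relation are illustrative; in practice you may prefer the ready-made blueprints that come with the integrations):

```json
{
  "identifier": "cloud_resource",
  "title": "Cloud Resource",
  "schema": {
    "properties": {
      "kind": { "title": "Resource Kind", "type": "string" },
      "region": { "title": "Region", "type": "string" },
      "console_link": { "title": "Console Link", "type": "string", "format": "url" },
      "tags": { "title": "Tags", "type": "object" }
    },
    "required": []
  },
  "relations": {
    "environment": { "title": "Environment", "target": "environment", "required": false, "many": false }
  }
}
```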
CI/CD data
Adding CI/CD data to the catalog can create a much-needed single source of truth for CI/CD. You can ingest your CI/CD data into Port directly using Azure Pipelines, CircleCI, Codefresh, GitHub Workflows, GitLab CI/CD, Jenkins, etc. It is also possible to integrate any CI/CD system with Port by making calls to Port’s REST API during the CI/CD pipeline execution.
Through the developer portal, these capabilities also help platform engineering teams provide developers with better visibility into the deployment process, as they can see the deployment status and any errors that occur in real time.
This information also provides insights into the R&D process, with metrics such as the number of deployments performed by a team in a given week, which builds are consistently failing and require triage, and which pipelines take much longer than expected and could use optimization.
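For example, a pipeline step could report each deployment to Port as an entity along these lines (a sketch only: the blueprint, property names and identifiers are assumptions, and the payload would be sent to Port’s REST API as described in the docs):

```json
{
  "identifier": "cart-service-deploy-1024",
  "title": "Deploy cart-service #1024",
  "properties": {
    "status": "success",
    "trigger": "push to main",
    "commit_hash": "3f2c1ab",
    "duration_seconds": 412
  },
  "relations": {
    "running_service": "cart-service-production"
  }
}
```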
K8s data
Developers usually don’t know enough Kubernetes to be able to deal with raw Kubernetes data on their own. The data is usually too detailed and relates to Kubernetes concerns that developers may not be familiar with. To drive developer autonomy, the solution is to abstract Kubernetes data and make it accessible to developers, and the first step is adding Kubernetes data to the portal.
Think of showing K8s data in the software catalog as “whitelisting” essential data for developers while retaining the ability to delve deeper into Kubernetes for other user groups, particularly DevOps. This additional data is valuable since DevOps also needs the software catalog.
Port provides an open source Kubernetes exporter that allows you to perform extract, transform, load (ETL) on data from K8s into the desired software catalog data model.
The raw Kubernetes data is then mapped into the Port blueprints you define, at the level of abstraction you choose.
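As a sketch of such an abstraction, a “workload” blueprint could surface only the fields developers typically care about (the identifier and property names here are assumptions; the exporter’s mapping configuration determines how raw Kubernetes fields populate them):

```json
{
  "identifier": "workload",
  "title": "Workload",
  "schema": {
    "properties": {
      "replicas": { "title": "Ready / Desired Replicas", "type": "string" },
      "image": { "title": "Container Image", "type": "string" },
      "namespace": { "title": "Namespace", "type": "string" },
      "health": { "title": "Health", "type": "string", "enum": ["Healthy", "Degraded"] }
    },
    "required": []
  },
  "relations": {
    "running_service": { "title": "Running Service", "target": "running_service", "required": false, "many": false }
  }
}
```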
Incident management and alerts
Managing incidents can be a real challenge, especially when dealing with alerts from a plethora of DevOps and monitoring tools such as Datadog, Sentry, Grafana, AWS, Coralogix, Splunk, and New Relic. These alerts not only contribute to alert fatigue but also impose substantial cognitive load on developers, particularly those who aren't well-versed in these tools. Think about the frustration of receiving alerts from Kubernetes tools or infrastructure that leave developers puzzled, unsure of which service they are actually related to.
When you integrate alerts into your software catalog, you get a single pane of glass for all things alerts, in-context within the relevant software catalog entities, complete with all the information you need like the service or resource owner. Beyond the convenience of not needing to check multiple alert tools, the fact that alerts are in context significantly reduces the cognitive load on developers. Each alert is linked to its origin, such as a production issue tied to a specific service, and it can even be associated with day-2 operations that help resolve the underlying problem.
Within the portal, developers can swiftly act upon encountering alerts, whether it's acknowledging an issue, investigating it directly in the developer portal, or triggering a playbook or version rollback. Information about services and resources can also help users locate owners, who is on-call, and more. Let's use PagerDuty as an example.
For instance, we can add PagerDuty data by creating the relevant blueprints, with properties such as:
- Create time
- Incident status
- Incident urgency
- Assignee
- Escalation policy
- Incident URL
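Whether you use an integration’s ready-made blueprints or define your own, a PagerDuty incident blueprint related to the service could look roughly like this sketch (identifiers, property names and formats are illustrative):

```json
{
  "identifier": "pagerduty_incident",
  "title": "PagerDuty Incident",
  "schema": {
    "properties": {
      "created_at": { "title": "Create Time", "type": "string", "format": "date-time" },
      "status": { "title": "Incident Status", "type": "string" },
      "urgency": { "title": "Incident Urgency", "type": "string" },
      "assignee": { "title": "Assignee", "type": "string" },
      "escalation_policy": { "title": "Escalation Policy", "type": "string" },
      "incident_url": { "title": "Incident URL", "type": "string", "format": "url" }
    },
    "required": []
  },
  "relations": {
    "service": { "title": "Service", "target": "service", "required": false, "many": false }
  }
}
```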
Vulnerabilities and misconfigurations
According to Snyk's State of Open Source Security report, enterprises typically employ an average of nine security tools in parallel. From a developer's perspective, this medley of tools can create an overload of information, leading to cognitive load for both developers and AppSec teams. Moreover, understanding and working with each of these tools requires specific expertise, making it challenging for developers to grasp the broader context or determine the appropriate actions to take.
For vulnerabilities and misconfigurations, data from various tools can be abstracted and consolidated within the software catalog to provide context.
The developer portal can present vulnerabilities and misconfigurations from various tools and stages in the development lifecycle, all within a single interface—the internal developer portal. This approach transforms the internal developer portal into a centralized repository for security information. Developers can then gain a comprehensive view of potential risks, allowing them to assess the security status of a resource or microservice within its specific context and proactively address vulnerabilities and misconfigurations.
Issue management
You can integrate Port with issue management tools. The information from issue management tools can greatly enrich the service blueprint.
For example, you can view Jira issues inside Port, and even extend the model by relating a Jira issue to its related microservice. This allows you to view all of the features currently being worked on for a given service, or to see which Jira issue depends on another microservice.
For instance, we can add Jira data by creating the relevant blueprints, with properties such as:
- Assignee
- Status
- Reporter
- Priority
- Type
FinOps
By incorporating FinOps insights into Port, you can enable FinOps, DevOps, and platform engineering teams to efficiently manage cloud costs, optimize spending, and foster a culture of cost-conscious responsibility without the hassle of spending hours on basic reporting.
Cloud resource cost reporting tools such as Kubecost or AWS Cost Explorer provide cost data related to cloud resources or Kubernetes objects such as deployments, services, namespaces, etc.
But this isn’t the entire story. In many cases the data doesn’t make much sense from the developer’s, or the development team’s, point of view, because it doesn’t provide immediate insight into the microservice or system they are accountable for. This is usually solved with tagging, in order to allocate the right costs to the right features and development efforts, which is labor intensive and not always sustainable.
API management
You can ingest the APIs exposed by your microservices and their exposed routes into Port directly using Port’s REST API. The API routes information makes it easy to understand the functionality exposed by an API, including the expected inputs and documentation, making Port the hub to onboard new APIs and consume existing APIs internally.
In addition, by adding status and metrics properties to the API Routes blueprint, you can keep track of the health of the different specific routes, get an at-a-glance view of the health of an API and also make sure that the response time of APIs and their routes remains within expected parameters.
This is the API endpoint blueprint used to manage an API catalog and keep track of API routes; in an extended model it would probably be connected to the running service.
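A sketch of what such an API route blueprint could look like (the property names, enum values and the relation to the running service are illustrative):

```json
{
  "identifier": "api_route",
  "title": "API Route",
  "schema": {
    "properties": {
      "method": { "title": "HTTP Method", "type": "string", "enum": ["GET", "POST", "PUT", "PATCH", "DELETE"] },
      "path": { "title": "Path", "type": "string" },
      "status": { "title": "Status", "type": "string", "enum": ["Healthy", "Degraded", "Deprecated"] },
      "p95_response_ms": { "title": "p95 Response Time (ms)", "type": "number" },
      "docs_url": { "title": "Documentation", "type": "string", "format": "url" }
    },
    "required": []
  },
  "relations": {
    "running_service": { "title": "Running Service", "target": "running_service", "required": false, "many": false }
  }
}
```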
Example data model: Domain, System, Component, Resource and API blueprints, with example entities such as an Order domain, Cart and Products systems, a Cart resource, a Cart API, the Core Kafka and Core Payment libraries, and Cart Service and Products Service JSON definitions.
Example developer self-service actions, by category:
- Microservices SDLC: scaffold a new microservice, deploy (canary or blue-green), feature flagging, revert, lock deployments, add a secret, force-merge a pull request (skip tests in a crisis), add an environment variable to a service, add IaC to a service, upgrade a package version
- Development environments: spin up a developer environment for 5 days, ETL mock data to an environment, invite a developer to the environment, extend TTL by 3 days
- Cloud resources: provision a cloud resource, modify a cloud resource, get permissions to access a cloud resource
- SRE actions: update pod count, update an auto-scaling group, execute incident response runbook automation
- Data engineering: add/remove/update a column in a table, run an Airflow DAG, duplicate a table
- Backoffice: change customer configuration, update customer software version, upgrade or downgrade plan tier, create or delete a customer
- Machine learning: train a model, pre-process a dataset, deploy, A/B-test traffic routing, revert, spin up a remote Jupyter notebook
Examples of what the software catalog can include:
- Engineering tools: observability, task management, CI/CD, on-call management, troubleshooting tools, DevSecOps, runbooks
- Infrastructure: cloud resources, K8s, containers & serverless, IaC, databases, environments, regions
- Software and more: microservices, Docker images, docs, APIs, 3rd parties, runbooks, cron jobs