Engineering metrics to follow in 2024

September 25, 2024

What are engineering metrics?

Engineering metrics are data points that can be used to help engineering leaders and teams establish the status and quality of various aspects of engineering. They can be separated into metrics that measure:

- Attributes of a product (e.g., quality, scalability and reliability)
- Aspects of a process (e.g., analysis of tools, methods and deployment cycles)
- Progress on a project (e.g., resource utilization — people and costs)
- People (e.g., developer productivity, experience and satisfaction)
- Standards (e.g., code coverage, code quality and complexity, production readiness, security standards)

Engineering leaders want to be able to view metrics in one place, and act upon the data — driving actions that aim to improve these metrics and the developer experience. By doing so, they can improve products, streamline processes, provide transparency to other stakeholders, and prioritize effectively.

Engineering metrics vs. engineering KPIs

Engineering metrics are sometimes confused with key performance indicators (KPIs). The difference is that metrics are standardized measures that engineering leaders and teams can use to monitor and analyze performance, while engineering KPIs are tied to specific goals. Metrics can inform these KPI goals either on their own or in addition to other metrics. Therefore, while all KPIs are metrics, not all metrics are KPIs.

Types of engineering metrics 

  • Quantitative indicators 

Quantitative indicators are number-based metrics. Depending on the type of metric, these could be whole numbers, percentages, or ratios. Examples include lead time for changes, deployment frequency, and code churn. Quantitative indicators are simpler to put in place and give leaders a consistent way to track trends over time.

  • Qualitative indicators

Qualitative indicators are metrics that are less precise than quantitative indicators, as they involve surveys, interviews, evaluation, and feedback rather than numerical measurements. In other words, they are indicators based on data provided by humans. That doesn’t mean that qualitative data is less reliable than quantitative data, just that it is different. In fact, qualitative data can help explain why something is happening; for instance, a survey response provides context that a number alone cannot. It is the combination of qualitative and quantitative indicators that provides true value to an engineering organization.

Qualitative metrics allow leaders to measure things that may not be possible with quantitative metrics — some areas such as technical debt, for instance, require human judgment to compare the system state to its ideal state. Examples of qualitative indicators include code review feedback, feedback on software releases, stakeholder satisfaction, employee net promoter score (eNPS), and developer experience metrics.

  • Leading indicators

A leading indicator is a metric that allows engineering leaders to understand how something is going to perform in the future. Leading indicators can be measured quickly, and enable teams and leaders to address any issues quickly (and sometimes even immediately). Whether a metric counts as a leading indicator depends on the KPI it is attached to. For example, the number of critical bugs in production could be a leading indicator of customer churn, but a lagging indicator (see below for more on this) if the KPI is production stability. Likewise, PR size can be a leading indicator with regard to time to merge, but a lagging indicator if your metric is about files changed within a PR.

  • Lagging indicators

A lagging indicator is a metric focused on the past. Engineering teams can use this type of metric to review past implementations and measure their effect. An example of this is a team hitting their milestones on time when shipping a feature to production; this may show that the right processes are in place and that engineers have the autonomy to get their work done without interruptions. Another example is the number of support tickets filed after a push to production. This could show whether an action caused more issues than expected, and may indicate that the process involved needs further investigation.

Unlike leading indicators, lagging indicators require measurement over lengthier periods, and because of that, teams can’t directly affect them as quickly. While a leader could look at a lagging indicator and make changes promptly, they won’t be able to see whether the changes have had the desired effect for a while.

  • Input indicators 

Input indicators are metrics that measure the resources used throughout processes in the software development lifecycle. This can include the number of developers on a project, the time it took to complete, or the financial investment that the team has made in the project. For example, a KPI can take the financial investment made in a project as an input indicator and compare it to the budget.

Most useful engineering metrics 

1. Lead time for changes

Lead time for changes is a DORA metric and is defined as “the time taken to go from code committed to code successfully running in production.” It is considered a critical DevOps metric and measures the length of time between a code change being committed to the main branch and that change running successfully in production. The metric enables leaders to understand whether existing processes and tools are working as they should; if lead time is high, the team should use further metrics to identify any bottlenecks or issues. In short, lead time for changes measures how quickly a team can act on change requests.
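As a rough illustration of the definition above, lead time can be computed from pairs of commit and deployment timestamps. This is a minimal sketch: the `changes` list and its (committed_at, deployed_at) shape are assumptions made for the example, and the median is just one reasonable way to summarize the values.

```python
from datetime import datetime, timedelta
from statistics import median

def lead_times(changes):
    """Return the commit-to-production duration for each change.

    `changes` is a list of (committed_at, deployed_at) datetime pairs;
    this shape is an assumption made for the example.
    """
    return [deployed - committed for committed, deployed in changes]

# Hypothetical sample data.
changes = [
    (datetime(2024, 9, 2, 9, 0), datetime(2024, 9, 2, 15, 0)),   # 6 hours
    (datetime(2024, 9, 3, 10, 0), datetime(2024, 9, 4, 10, 0)),  # 24 hours
    (datetime(2024, 9, 5, 8, 0), datetime(2024, 9, 5, 20, 0)),   # 12 hours
]

median_lead_time = median(lead_times(changes))
print(median_lead_time)  # 12:00:00
```

A median is less sensitive than a mean to one unusually slow change, which is why it is used here.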

2. Mean time to recover (MTTR)

Mean time to recover is a lagging indicator and refers to the average time it takes to recover from a product or system failure. For time to recovery (TTR), leaders calculate how long it takes to deploy a patch after an issue is reported, ensuring the product or system is fully functioning after this patch. MTTR is, therefore, the average of all the times it took to recover from failures. This can be separated into the MTTR for different systems. Longer MTTR suggests that code may not be of the highest quality and that processes are not efficient.
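The averaging described above can be sketched as follows. The `incidents` list and its (reported_at, resolved_at) shape are hypothetical; in practice the timestamps would come from an incident management tool.

```python
from datetime import datetime, timedelta

def mean_time_to_recover(incidents):
    """Average of (resolved_at - reported_at) across all incidents.

    `incidents` is a list of (reported_at, resolved_at) datetime pairs,
    a shape assumed for this example.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((resolved - reported for reported, resolved in incidents), timedelta())
    return total / len(incidents)

# Hypothetical sample data: recoveries of 30, 90, and 60 minutes.
incidents = [
    (datetime(2024, 9, 1, 10, 0), datetime(2024, 9, 1, 10, 30)),
    (datetime(2024, 9, 8, 14, 0), datetime(2024, 9, 8, 15, 30)),
    (datetime(2024, 9, 15, 9, 0), datetime(2024, 9, 15, 10, 0)),
]

mttr = mean_time_to_recover(incidents)
print(mttr)  # 1:00:00
```

To get the MTTR for different systems, the same function can be applied to the incidents of each system separately.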

3. Deployment frequency

Deployment frequency measures how often new code is deployed to a production environment. It is another one of the four DORA metrics. Read alongside quality metrics such as change failure rate, it helps leaders see how an engineering team’s speed relates to the quality of what it ships. Higher deployment frequency suggests that your team is able to make changes to code at will, without interference from things like unsupported dependencies or issues with infrastructure. It means the team has the agility to deliver consistently, whether it’s an update, a new feature, or a fix.
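One simple way to express deployment frequency is as an average number of production deployments per week over a reporting window. The sketch below assumes a plain list of deployment dates; the function name and the weekly granularity are choices made for the example, not a standard.

```python
from datetime import date

def deployments_per_week(deploy_dates, start, end):
    """Average number of deployments per 7-day week in [start, end], inclusive."""
    days = (end - start).days + 1
    in_window = [d for d in deploy_dates if start <= d <= end]
    return len(in_window) / (days / 7)

# Hypothetical sample data: six deployments inside a two-week window, one outside.
deploy_dates = [
    date(2024, 9, 2), date(2024, 9, 4), date(2024, 9, 6),
    date(2024, 9, 9), date(2024, 9, 11), date(2024, 9, 13),
    date(2024, 9, 20),
]

frequency = deployments_per_week(deploy_dates, date(2024, 9, 2), date(2024, 9, 15))
print(frequency)  # 3.0
```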

4. Change failure rate (CFR)

Change failure rate (CFR) measures the proportion of code deployments that fail in production. It is calculated by dividing the number of deployments that caused failures by the total number of deployments. As a lagging indicator, it is monitored over time, so that leaders understand the effort required to address issues and to release new code. A lower CFR indicates a well-oiled development pipeline and a team with the skills required to carry out changes without causing further problems. Measuring CFR can help teams to keep the deployment process smooth, predict risks, and encourage a level of accountability.
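The calculation above is a single division. A minimal sketch, with hypothetical counts:

```python
def change_failure_rate(failed_deployments, total_deployments):
    """Fraction of deployments that caused a failure in production."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    return failed_deployments / total_deployments

# Hypothetical counts for one reporting period.
cfr = change_failure_rate(failed_deployments=3, total_deployments=40)
print(f"{cfr:.1%}")  # 7.5%
```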

5. Pull request (PR) or merge request size

A pull request (or merge request) is a way for engineers to contribute to a software initiative. Pull request size refers to either the number of lines of code a new feature requires, or the amount of code changes introduced within a single pull request. It is usually calculated by the number of files modified and lines of code added or removed. By measuring the PR size, engineering teams can ensure that their workflow is evenly balanced, avoiding elaborate changes that can slow down the review process; the idea is that the more manageable the size of a PR, the simpler it is to review code, obtain feedback, and merge better quality code.
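To make the measurement concrete, the sketch below computes PR size as lines added plus lines removed and sorts PRs into buckets. The `PullRequest` class and the bucket thresholds are hypothetical; appropriate limits depend on the team and codebase.

```python
from dataclasses import dataclass

# Hypothetical thresholds (total lines changed); tune them to your own review data.
SMALL_MAX = 100
MEDIUM_MAX = 400

@dataclass
class PullRequest:
    title: str
    additions: int      # lines added
    deletions: int      # lines removed
    changed_files: int  # files modified

    @property
    def size(self) -> int:
        """Total lines changed (added plus removed)."""
        return self.additions + self.deletions

def size_bucket(pr: PullRequest) -> str:
    """Classify a PR as small, medium, or large by lines changed."""
    if pr.size <= SMALL_MAX:
        return "small"
    if pr.size <= MEDIUM_MAX:
        return "medium"
    return "large"

pr = PullRequest("Add retry logic", additions=120, deletions=35, changed_files=4)
print(pr.size, size_bucket(pr))  # 155 medium
```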

6. Cycle time

Cycle time is the amount of time from the beginning of a task to its completion; it is a measure of process speed. It usually refers either to the delivery phase of the software development process, from the developer’s first commit to deployment, or to the time between when a commit is first logged and when it is merged. Either way, cycle time shows how quickly an engineering department is working, and can be used to compare internal teams with one another or with external teams.
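The sketch below uses the second definition (first commit to merge) and averages cycle time per team so that teams can be compared. The team names and the (team, first_commit_at, merged_at) tuple shape are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def average_cycle_time_by_team(pull_requests):
    """Average first-commit-to-merge time for each team.

    `pull_requests` is a list of (team, first_commit_at, merged_at) tuples.
    """
    durations = defaultdict(list)
    for team, first_commit_at, merged_at in pull_requests:
        durations[team].append(merged_at - first_commit_at)
    return {team: sum(d, timedelta()) / len(d) for team, d in durations.items()}

# Hypothetical sample data.
pull_requests = [
    ("checkout", datetime(2024, 9, 2, 9), datetime(2024, 9, 3, 9)),  # 24 hours
    ("checkout", datetime(2024, 9, 4, 9), datetime(2024, 9, 6, 9)),  # 48 hours
    ("search", datetime(2024, 9, 2, 9), datetime(2024, 9, 2, 21)),   # 12 hours
]

averages = average_cycle_time_by_team(pull_requests)
print(averages["checkout"])  # 1 day, 12:00:00
print(averages["search"])    # 12:00:00
```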

7. Code churn

Code churn is a metric that shows how often a piece of code (such as a file, a class, or a function) has been altered through rewrites, additions, or deletions within 21 days of being written. By tracking this metric, organizations can get an idea of the stability and health of their software.

A higher churn rate can be a sign of inefficient coding at the outset, suggesting a lack of clarity for developers or issues with certain practices or the codebase. It tends to go hand in hand with less stable software and increases the workload for the engineering team, who have to troubleshoot any issues. Meanwhile, a low code churn rate indicates a more stable codebase, suggesting development processes are well-governed and engineers have the clarity they need.
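One possible way to turn this into a number is to take the share of code changes that were reworked within the 21-day window. This is only a sketch: the (written_on, reworked_on) record shape is hypothetical, and treating day 21 as inside the window is an assumption.

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=21)

def churn_rate(changes):
    """Fraction of changes that were reworked within 21 days of being written.

    `changes` is a list of (written_on, reworked_on) pairs, where
    reworked_on is None if the code has not been touched again.
    """
    if not changes:
        return 0.0
    churned = sum(
        1
        for written_on, reworked_on in changes
        if reworked_on is not None and reworked_on - written_on <= CHURN_WINDOW
    )
    return churned / len(changes)

# Hypothetical sample data.
changes = [
    (date(2024, 9, 1), date(2024, 9, 6)),    # reworked after 5 days: churn
    (date(2024, 9, 1), date(2024, 10, 1)),   # reworked after 30 days: not churn
    (date(2024, 9, 1), None),                # never reworked
    (date(2024, 9, 1), date(2024, 9, 22)),   # reworked after exactly 21 days: churn
]

print(churn_rate(changes))  # 0.5
```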

8. Merge frequency 

Merge frequency measures the number of code changes (PRs or merge requests) that are merged into the codebase over a given period. This metric has a big bearing on developer satisfaction because merging is often where pull requests get stuck. A higher merge frequency means there is less likelihood of a bottleneck slowing things down, and suggests better communication and adherence to DevOps best practices. Keeping a high merge frequency allows teams to spot and resolve issues earlier in the development process.
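As an illustration, merges can be counted per ISO calendar week. The list of merge dates is hypothetical, and the weekly grouping is an assumption; any consistent period would work.

```python
from collections import Counter
from datetime import date

def merges_per_week(merge_dates):
    """Count merged PRs per (ISO year, ISO week)."""
    counts = Counter()
    for merged_on in merge_dates:
        year, week, _weekday = merged_on.isocalendar()
        counts[(year, week)] += 1
    return counts

# Hypothetical sample data.
merge_dates = [date(2024, 9, 2), date(2024, 9, 4), date(2024, 9, 10)]

weekly = merges_per_week(merge_dates)
print(weekly[(2024, 36)])  # 2
print(weekly[(2024, 37)])  # 1
```

A sudden drop from one week to the next is the kind of signal that may point to a review or merge bottleneck.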

9. Mean time between failures (MTBF)

Mean time between failures is a metric that enables engineers and end users to analyze the quality and reliability of software. It measures the time that elapses between one failure and the next; the longer that time, the better. MTBF is calculated by dividing the total time of operation by the number of failures that take place during that time. Using this average value, engineering teams can estimate how long a product or piece of software will run without causing any disruptions.
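The formula above can be written directly in code. The figures are hypothetical: 720 hours of operation (roughly one month) with three failures.

```python
def mean_time_between_failures(operating_hours, failure_count):
    """Total time of operation divided by the number of failures in that time."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return operating_hours / failure_count

mtbf_hours = mean_time_between_failures(operating_hours=720, failure_count=3)
print(mtbf_hours)  # 240.0
```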

10. Code coverage

Code coverage, also called test coverage, measures the percentage of your code that is exercised by automated tests. It is a key indicator of software reliability, with higher coverage implying a lower chance of undetected bugs. Tracking code coverage encourages developers to write thorough tests and adopt better design practices to ensure the code meets test standards. Low coverage areas may be riskier to modify.
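A minimal sketch of the percentage, plus a check for low coverage areas, is shown below. The module names, line counts, and the 80% threshold are all hypothetical; real numbers would come from a coverage tool's report.

```python
LOW_COVERAGE_THRESHOLD = 80.0  # hypothetical cut-off, in percent

def coverage_percent(covered_lines, total_lines):
    """Percentage of lines executed by automated tests."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * covered_lines / total_lines

# Hypothetical per-module data: (covered lines, total lines).
modules = {"billing": (450, 500), "auth": (120, 300)}

report = {name: coverage_percent(c, t) for name, (c, t) in modules.items()}
risky = [name for name, pct in report.items() if pct < LOW_COVERAGE_THRESHOLD]

print(report)  # {'billing': 90.0, 'auth': 40.0}
print(risky)   # ['auth']
```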

Best practices for integrating engineering metrics into your company

Tie metrics to objectives

As we mentioned earlier, engineering metrics are data points that can be tied to key performance indicators (KPIs), and can form part or all of an objective within the OKR (objectives and key results) framework. To do this effectively, you have to identify the team’s pain points and objectives; these can include improving developer experience, reducing costs, or enhancing compliance and security. Then you have to consider which metrics to use to improve in any one of these areas; for instance, ‘number of tickets created for DevOps’ would correlate with ‘developer experience’ and ‘boosting developer productivity’. Then you can identify what action you’re going to take, perhaps introducing self-service as a way to reduce the number of tickets.

Make sure the metrics are relevant to the user

Engineering leaders need answers to specific questions, and the metrics they use need to be able to help them to get to an answer. They typically want to know if they’re on track to meet goals, where bottlenecks are in their workflow, how they can support their teams more effectively, and what needs immediate attention. So for example, merge frequency is a metric that can help them to identify bottlenecks.

Ensure you can pull together metrics and make sense of the data

Metrics can be found in multiple tools and systems, and this makes it difficult for teams and leaders to make sense of their data. Having a central system of record and the ability to combine, view, and analyze metrics in dashboards can help you to make better informed decisions. But what’s more, you should be able to see how different metrics are connected and correlated, such as knowing if a slowdown in deployment frequency is happening at the same time as a spike in incidents.
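To illustrate how two metrics might be checked for a relationship, the sketch below computes a Pearson correlation coefficient between weekly deployment counts and weekly incident counts. Both series are made-up sample data, and a correlation on a handful of points is only a prompt for investigation, not evidence of cause.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly data: deployments slow down while incidents rise.
weekly_deployments = [10, 9, 8, 4, 3]
weekly_incidents = [1, 1, 2, 5, 6]

r = pearson(weekly_deployments, weekly_incidents)
print(round(r, 2))  # strongly negative for this sample
```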

Use metrics that go beyond productivity and quality

Measuring many of the metrics above can be helpful but they don’t tell the entire story. Other metrics may include those about security vulnerabilities, compliance with organizational standards, costs, SLO compliance, and more.

Drive change on the metrics you have

Having the metrics is all well and good, but you need to be able to use the metrics to drive change. Consider an approach that enables you to measure metrics, act on them and optimize accordingly; in other words, create a feedback loop to continuously improve your processes.

Use an internal developer portal for your engineering metrics

All of these best practices may sound like the ideal way to go about using engineering metrics, but they can be hard to put in place at your organization. An internal developer portal can help you adhere to them.

The portal provides:

- a central system of record, pulling in data from many sources and tools, enabling you to track and measure the metrics inside the portal.

- a way to drive continuous improvement by uncovering actionable insights, identifying areas of improvement, prioritizing action plans and tracking key goals.

- the power to drive change by enabling you to set alerts, define self-service actions, drive initiatives, create automations and more.

Find out more about Port’s Insights here and try out our live demo here.
