Introduction
If you're part of an R&D team that is eager to adopt AI tools, assistants, or agents to improve productivity, you probably want to know which tool is best. While you're doing the right thing by considering new tools, you may struggle to understand which investments are working — are these tools making engineers more productive? Has code quality improved?
This is not a new problem or one unique to adopting GenAI tools like GitHub Copilot. Usually, the answers to these questions require analyzing data from multiple systems, and if you’ve dealt with tool sprawl, you know it’s a big challenge to accurately measure the impact of any tool, let alone a specific one. This lack of visibility into tool usage and value is a core issue that platform engineering and internal developer portals have emerged to solve.
Even where teams are able to measure utilization, it's still hard to understand the impact of the tooling on your teams, your engineering metrics, and your bottom line. In this article, we’ll outline how R&D leaders can enable and empower their teams to use AI tools, build an effective measurement system to gauge the impact of tools like GitHub Copilot, and leverage an internal developer portal to understand the value generative AI tools deliver across your software development pipeline.
Why is it important to measure the impact of GenAI tools?
Once engineering leadership has decided to adopt AI tools, they are typically confronted with three challenges:
- It’s difficult to measure their true impact
- There are too many options
- They’re all costly
In most cases, R&D organizations will green-light smaller implementations to pilot GenAI products before making larger investments. This is where it becomes important to measure the return on your initial investment — and make the case for wider adoption programs.

First and foremost, leadership wants to consider the raw costs of adoption. For example, GitHub Copilot can cost up to $40 per month per user, which scales to roughly $240,000 per year once you clear 500 engineers. An investment that large needs to be accompanied by an iron-clad business case that proves a meaningful improvement in productivity and developer experience.

It's also worth measuring the return on your GenAI investment because the technology still requires human intervention and review. After all, the leading models have been trained on a corpus of all the vulnerability-laden, imperfect code available on the internet. Developers may be excited about focusing their attention on business logic while Copilot handles the boilerplate, but that may not be an effective strategy for larger teams with more complex, brownfield environments.
How to effectively measure the impact of GenAI
GenAI is still maturing, and frameworks for measuring its impact are still emerging. Meanwhile, as many AI tools promise faster coding, some R&D leaders have voiced concerns about the sheer volume of code produced, which may increase lead times for changes and average review times for PRs.

One approach to gauging the impact of GenAI across your pipeline and team productivity comes from DORA, which offers two impact metrics:
- Throughput: This refers to how long it takes committed code to make it to production, and is typically measured using two well-known DORA metrics, lead time for changes and deployment frequency. Together, these metrics can give you insight into whether code makes it to production faster with AI, and how many changes your deployment pipeline can handle. With an effective GenAI implementation, throughput should increase across the board.
- Stability: This refers to the consistency, reliability, and security of your engineering environment — which DORA suggests measuring using change failure rate and mean time to recovery (MTTR). Though AI-written code still needs to be approved and managed by humans, these metrics help you understand whether AI-written code introduces more bugs, vulnerabilities, or outages, and if so, how long it takes your teams to recover from them. (Both sets of metrics are sketched in code below.)
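To ground these definitions, here is a minimal sketch of how the four metrics can be computed from raw deployment and incident records. The record shapes and field names are illustrative assumptions, not any particular tool's API:

```python
from datetime import datetime
from statistics import median

# Illustrative records; in practice these come from your CI/CD and
# incident-management tools. The field names are assumptions, not a real API.
deployments = [
    {"committed_at": datetime(2024, 5, 1, 9), "deployed_at": datetime(2024, 5, 1, 15), "failed": False},
    {"committed_at": datetime(2024, 5, 2, 10), "deployed_at": datetime(2024, 5, 3, 11), "failed": True},
]
incidents = [
    {"opened_at": datetime(2024, 5, 3, 11), "resolved_at": datetime(2024, 5, 3, 14)},
]
DAYS_IN_PERIOD = 28

# Throughput: lead time for changes (commit to production) and deployment frequency.
lead_times_h = [(d["deployed_at"] - d["committed_at"]).total_seconds() / 3600 for d in deployments]
print(f"Median lead time for changes: {median(lead_times_h):.1f} hours")
print(f"Deployment frequency: {len(deployments) / DAYS_IN_PERIOD:.2f} deploys/day")

# Stability: change failure rate and mean time to recovery (MTTR).
cfr = sum(d["failed"] for d in deployments) / len(deployments)
mttr_h = sum((i["resolved_at"] - i["opened_at"]).total_seconds() / 3600 for i in incidents) / len(incidents)
print(f"Change failure rate: {cfr:.0%}")
print(f"MTTR: {mttr_h:.1f} hours")
```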
Measuring throughput and stability will help surface both the benefits and the disadvantages of adopting GenAI across your pipeline.
Google’s DORA team doesn’t see these metrics as mutually exclusive — elite performers achieve higher throughput while increasing and maintaining stability. You can increase productivity and improve developer experience at a reasonable cost. If you’re already measuring throughput and stability through DORA metrics, watching them closely during a GenAI trial will help you understand its impact — positive, negative, or neutral — on your teams and systems.
Risks associated with adopting GenAI tools
There is a wide array of GenAI tools available, each of which can uniquely impact the performance and output of your development teams. One of the primary issues with making an investment into GenAI tools is figuring out which ones make sense to use — which comes before you’ll ever need to measure how well they solve your problems. While organizations will typically start with coding assistants, there is scope to leverage AI throughout the DevOps lifecycle, in both coding and non-coding activities. Some examples of commonly implemented generative AI tools include:
- AI agents: This includes tools like Devin, which take instructions and work autonomously, writing code, committing changes, and opening pull requests for the team to review.
- AI assistants for coding activities: These are tools like GitHub Copilot, which are integrated into a developer’s IDE and can generate code suggestions, reply to prompts and chat messages, and overall accelerate developer activities.
- AI assistants for non-coding activities: These are tools like CodeRabbit, which assist with code reviews and offer a variety of solutions for assisting with incident management and other tasks.
- AI app builders: This includes Vercel’s V0 platform, which can perform full-stack web development (though only on a React/Next.js stack).
Adopting new tooling does not necessarily guarantee greater velocity or a better developer experience. And in any case, R&D leadership will need to understand the impact of each tool individually.
Research from Google and DX also demonstrates that developer sentiment around AI can make or break a team’s GenAI trial. Developers still hold a clear advantage over AI when it comes to context: AI may be a marvel, but as it stands, developers are better able to understand features, their environments, the circumstances in which code is written or deployed, and any constraints or organizational standards that apply.
Among our customers, we see a correlation between GenAI adoption and increased chaos. AI accelerates code production, and more code leads to more vulnerabilities, more bugs, and more incidents.

To adopt AI responsibly and keep this compounding chaos in check, organizations need an internal developer portal. Here is how it helps:
- Measuring impact: Tracking KPIs (such as MTTR, incident count, and number of vulnerabilities) is essential to understanding both the risks and the benefits AI introduces. Without measurement, there is no control.
- Reducing chaos: We envision a future where AI, combined with an internal developer portal, doesn’t just mitigate chaos but actively reduces it.
How to set up a successful GenAI tool pilot
As mentioned earlier, small-scale pilots are a common and effective way to measure the value of any potential new software tool. To set up a successful pilot, you’ll want to follow a few guidelines:
- Test with two teams: Launching your pilot with two smaller teams of equal size is more cost-effective than buying licenses for everyone all at once. It also helps you develop an effective launch plan for the wider organization, should you decide to move forward.
- Test one tool at a time: Pilot each tool on its own to preserve the quality of your data. One team should use one AI tool, and the other should use none. This helps you determine whether the tool is responsible for any improvements or degradations you observe during the pilot.
- Choose the metrics you’ll use to gauge AI impact in advance: To avoid confusion and focus on real-world impacts, you’ll want to pay special attention to specific areas the AI tool is meant to help with. DORA metrics can provide a holistic view of GenAI’s impact on your software development pipeline and developer productivity.
Using an internal developer portal to measure the impact of GenAI tools
An internal developer portal is the ideal place to measure throughput and stability, while also fostering a great developer experience. In short, a portal enables devs to do their work efficiently while also allowing engineering leaders to measure and improve software engineering metrics and product delivery goals. This is because portals unify every tool in your software development environment into a single space — bringing together insights from across tools like:
- Usage stats from GitHub Copilot
- Deployment frequency from code repos and CI/CD tools
- Code issue data from security tools, APM tools, and code quality scanners
- Workflow data from tools like Jira and GitHub Actions
Together, these give you the power to look at the impact of your chosen AI tools on every aspect of your pipeline.
GenAI may be the hot new thing, but it should still be treated like any other tool in your technology stack. You can integrate your GenAI tool of choice into your internal developer portal, making it a direct part of your developer workflows and allowing you to measure its impact on your teams. The portal makes it easy to answer questions like:
- Has your throughput truly gone up since you've rolled out AI tools? How has deployment frequency changed?
- Is your stability improving? Have you seen an increase in outages or rollbacks since you introduced AI tools?
- Do your developers feel like they're better off?
Internal developer portals also help you organize your tool pilots according to best practices. You can clearly measure the before-and-after effects on an individual team, or measure across similar teams where one team acts as the control.

Let's take GitHub Copilot as an example and see how Port answers the questions we posed above. There are two potential entry points: either you already have an active internal developer portal and engineering benchmarks in place, or you are starting from scratch and looking to begin improving developer productivity. We’ll consider both options below and how your tests will differ based on where your teams stand at the start of your trial.
Pilot option A: Starting from scratch
You’ve decided to use GitHub Copilot to speed up development and want to conduct a test to measure how much faster development happens with Copilot as opposed to without it. Select two of your teams who have consistently produced high-quality code at similar rates, and set them up as follows:
- Team A is using GitHub Copilot as a pair programming assistant.
- Team B is your control team, who will continue to produce code as they have been without GenAI assistance.
This setup helps you compare teams while controlling for outside factors during your test. Before the pilot begins, record the throughput and stability metrics each team has established; these serve as your baseline, indicating each team’s average performance and capability without GenAI.
Using the GitHub Copilot API, you can track metrics for Team A across a rolling 28-day history: how many times per day Team A uses Copilot to write code, including the code completions and chat responses delivered in their IDEs. With these insights, you can correlate Team A’s Copilot usage with their productivity.

You can also receive summaries of Team A’s usage for the pilot period, which give a detailed breakdown of the suggestions Team A’s developers received, which suggestions they accepted, how many active Copilot users Team A had on a daily basis, and the languages they used Copilot with most frequently.
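As a rough illustration, here is a minimal Python sketch of pulling that daily usage data. The endpoint and field names reflect GitHub's Copilot metrics REST API at the time of writing, and the organization name and token are placeholders; verify the details against GitHub's current documentation before relying on them:

```python
import requests

# A minimal sketch of pulling Copilot usage for the pilot team. The endpoint,
# headers, and field names reflect GitHub's Copilot metrics REST API at the
# time of writing; verify them against GitHub's current documentation.
ORG = "your-org"    # assumption: replace with your GitHub organization
TOKEN = "ghp_..."   # assumption: a token with access to Copilot metrics

resp = requests.get(
    f"https://api.github.com/orgs/{ORG}/copilot/metrics",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    },
)
resp.raise_for_status()

# The API returns one object per day, covering up to the trailing 28 days.
for day in resp.json():
    print(
        day["date"],
        "active users:", day.get("total_active_users"),
        "engaged users:", day.get("total_engaged_users"),
    )
```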
Pilot option B: Starting with an internal developer portal
Your team has successfully implemented an internal developer portal, which has helped you establish engineering benchmarks for throughput and stability. Rather than set up two separate teams to test Copilot, choose one team to participate in the trial and limit the pilot to a fixed period. The length of your trial may depend on your budget, but the GitHub Copilot API retains a rolling 28-day history of usage data, which maps neatly onto a month-long sprint.
In this case, your comparison points will be:
- The team’s throughput and stability metrics pulled from a month-long period prior to their Copilot test, but while the portal was in place and no other tools were being piloted (i.e., a normal development cycle).
- The team’s throughput and stability metrics as measured during the Copilot test phase (compared in the sketch after this list).
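The before-and-after comparison itself can be reduced to a few lines. The numbers below are made up for illustration; in practice, you would pull both periods from your portal's metric history:

```python
# Illustrative numbers only; in practice, pull both periods from the portal.
baseline = {"lead_time_h": 30.0, "deploys_per_day": 1.2, "cfr": 0.12, "mttr_h": 4.0}
pilot    = {"lead_time_h": 24.0, "deploys_per_day": 1.5, "cfr": 0.15, "mttr_h": 4.5}

# Lower is better for lead time, change failure rate, and MTTR;
# higher is better for deployment frequency.
lower_is_better = {"lead_time_h", "cfr", "mttr_h"}

for metric, before in baseline.items():
    after = pilot[metric]
    change = (after - before) / before
    improved = change < 0 if metric in lower_is_better else change > 0
    verdict = "improved" if improved else "regressed"
    print(f"{metric}: {before} -> {after} ({change:+.0%}, {verdict})")
```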
With the portal in place, you’ll already have a history of your team’s metrics available to pick from, as well as the ability to directly integrate the GitHub Copilot API into the portal itself.
From there, Copilot becomes more than just another IDE tool: it becomes a measurable part of your entire software development lifecycle. You can use dashboards to keep track of the pilot program’s impact on key throughput and stability metrics.

In Port’s internal developer portal, you also gain an AI Insights panel that summarizes the impact of GitHub Copilot on your team’s metrics.
This panel directly correlates Copilot adoption with changes in your metrics, making it much easier to understand Copilot’s impact on your team’s productivity.

However, the most important thing an internal developer portal does is keep human developers at the center of your GenAI tool trial. Humans remain ultimately responsible for the code that ships. You might be tempted to assume that because AI produces code faster, the AI is at fault if error rates (bugs, vulnerabilities, etc.) go up. But no API or commit record can tell you which developer wrote each line of code in the final commit. The portal helps you look at teams more holistically:
- If you see stability go down, the reason may be that you don't have adequate testing in place, either in the IDE where the human is working, or in your deployment pipelines.
- Which teams’ stability remains high? The portal gives you the ability to find out why they're doing better.
- You can run additional tests after your pilot, such as testing out AI tools that write tests for you.
The future of GenAI and internal developer portals
At Port, we envision creating a closed-loop feedback cycle through fully agentic workflows. By harnessing the comprehensive context within the portal — where all of your data is accessible in one place — we envision integrating with GitHub Copilot to automate meaningful PRs. For instance, if your portal detects an incident, AI can analyze all recent events on a timeline (code changes, production access, secret rotations, etc.) and then leverage Copilot’s API to propose an optimal PR, streamlining the resolution process and maximizing Copilot’s value for your organization. We believe an internal developer portal will be a meaningful enabler for GenAI technology, and with the context it holds, it will get much more out of it.
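To make the envisioned loop concrete, here is a hypothetical sketch. Every function in it is a stub standing in for a future integration; none of these are real Port or GitHub Copilot APIs today:

```python
# Hypothetical sketch of the envisioned closed loop. Every function below is
# a stub; none of these are real Port or GitHub Copilot APIs today.

def portal_get_timeline(incident_id: str) -> list[dict]:
    # Stub: the portal returns recent events around the incident, such as
    # code changes, production access, and secret rotations.
    return [{"type": "deploy", "service": "cart", "sha": "abc123"}]

def ai_rank_suspects(timeline: list[dict]) -> dict:
    # Stub: an AI agent correlates the timeline and picks the likeliest culprit.
    return timeline[0]

def copilot_propose_fix_pr(change: dict) -> str:
    # Stub: Copilot drafts a fix and opens a PR for human review.
    return f"https://github.com/your-org/pull/123 (fixes {change['sha']})"

def handle_incident(incident_id: str) -> None:
    timeline = portal_get_timeline(incident_id)
    suspect = ai_rank_suspects(timeline)
    print("Proposed fix PR for", incident_id, "->", copilot_propose_fix_pr(suspect))

handle_incident("INC-42")
```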
Wrapping up
While AI only recently became ubiquitous, the concerns you have about measuring its value and ROI are not new. These challenges have arisen alongside many other ubiquitous technologies of the past — and an internal developer portal can play an essential role in solving these age-old issues.

It’s equally important to understand where in your SDLC GenAI tools will make a positive impact: GenAI may not be effective for larger teams with more complex, brownfield environments. This is where human developers can shine!

The key to measuring the return on your AI investment is to match the tool’s usage and cost against the business value it delivers. Following the recommendations above, an internal developer portal can help you run a smooth, successful GenAI pilot that delivers the insights you need to decide whether wider adoption is worth it. All in all, a portal can make your adoption journey faster and help your team reach maturity in using AI coding tools more effectively.
Check out Port's pre-populated demo and see what it's all about (no email required).
Contact sales for a technical product walkthrough
Open a free Port account. No credit card required
Watch Port live coding videos - setting up an internal developer portal & platform