Introduction
Let’s set the scene: imagine a company with more than 1,000 developers. They work in siloed teams with various products, infrastructure, and technology stacks. Each DevOps team has its own technical specialization (Kafka topics, Kubernetes and Hadoop clusters, VMs, databases, etc.) and builds its own internal scripts and control panels to cope.
Some teams manage their instances in Excel files, some SSH into many different servers, and some search their email history for old ticket requests. Sub-optimal resource utilization and general chaos keep growing every single day. Deleting resources becomes an impossible task, mainly out of fear: DevOps teams are afraid to delete a seemingly unused instance because it might be part of the production environment in some way.
Does any of this sound familiar?
If it does, you already know that things need to change. So here are a few lessons I learned as Head of Platform Engineering while building a customizable DevPortal to get this giant mess back on track and provide a great developer experience.
Changes are hard but worth it
The developers had grown used to the freedom of working in their own way. You'll meet resistance if you introduce a new way of working and suddenly ‘force’ them to adopt it, even if it’s just adding a short, simple YAML file to their Git repo.
Convincing someone that a project will only pay off in the long term is nearly impossible, so we had to find ways to give developers immediate value. And it’s not just the developers: the stakeholders who greenlit the DevPortal project also want to see quick value and direct impact. With this group, be prepared to manage expectations; adoption doesn’t happen in a day.
Providing that immediate value meant giving control to the people using the DevPortal. Rather than dictating the new system, we let them shape it to their requirements. This involvement, along with the opportunity for self-service, boosted engagement and interest in the new platform.
Ticketing systems are not enough; self-service for the win
We had many ticket templates, and we were constantly improving them. Yet no matter how hard we tried, there were always special requests or vital missing information. The biggest issue with the ticketing system was the amount of manual work our DevOps teams had to do.
Tying each operation’s definition to its actual execution made the relationship between developers and the infrastructure they consumed completely transparent. Day-two operations suddenly became much more streamlined. There was no need to define ‘sub-sub-sub-tasks’ for each ticket type, and developers could reconfigure their instances themselves in a self-serve fashion. It was a beautiful thing to watch.
The challenge was making these self-service operations dynamic, with parameters that varied by operation and team, so each team could provide exactly the data required. We needed to avoid confusion while preserving the developers’ freedom.
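To make the idea concrete, here is a minimal Python sketch of a self-service action whose form changes per team. It is not Port's data model: the `Action` and `Parameter` classes, the `create-kafka-topic` action, and the `payments` team's `retention` field are all hypothetical. Shared fields apply to everyone, individual teams can layer their own fields on top, and each request is validated before anything executes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Parameter:
    """One input field on a self-service form (hypothetical model)."""
    name: str
    kind: type
    required: bool = True
    choices: Optional[list] = None


@dataclass
class Action:
    """A self-service action whose form varies by team (hypothetical model)."""
    name: str
    base_params: list[Parameter]
    team_params: dict[str, list[Parameter]] = field(default_factory=dict)

    def form_for(self, team: str) -> list[Parameter]:
        # Every team sees the shared fields plus any fields added for that team.
        return self.base_params + self.team_params.get(team, [])

    def validate(self, team: str, inputs: dict[str, Any]) -> list[str]:
        """Return a list of problems; an empty list means the request can run."""
        errors = []
        for p in self.form_for(team):
            if p.name not in inputs:
                if p.required:
                    errors.append(f"missing {p.name}")
                continue
            value = inputs[p.name]
            if not isinstance(value, p.kind):
                errors.append(f"{p.name} must be {p.kind.__name__}")
            elif p.choices is not None and value not in p.choices:
                errors.append(f"{p.name} must be one of {p.choices}")
        return errors


# Hypothetical action: only the payments team must also choose a retention period.
create_topic = Action(
    name="create-kafka-topic",
    base_params=[Parameter("topic_name", str), Parameter("partitions", int)],
    team_params={"payments": [Parameter("retention", str, choices=["1d", "7d"])]},
)

print(create_topic.validate("search", {"topic_name": "orders", "partitions": 6}))    # []
print(create_topic.validate("payments", {"topic_name": "orders", "partitions": 6}))  # ['missing retention']
```

Keeping the per-team fields as data rather than code is what lets each DevOps team adjust its own form without touching the shared action.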
Balancing DevOps <> Developer priorities is a fine line
As someone immersed in DevOps, I need to know that infrastructure is deployed optimally, well monitored, and not hitting peaks. Don’t even mention things like “out of RAM.”
As more and more developers started using the platform we created, we noticed something. They didn’t care whether resources were properly limited and monitored; in fact, they hated running into quotas and limits. They wanted a Kafka consumer that just works and a Kubernetes deployment that just scales and always returns that “200 OK” response.
For our team, being open to change and criticism was vital. Personally, hearing about developers' experiences only ever made me want to improve the platform, even when teams didn’t immediately appreciate our hard work. I learned that a DevPortal will always be a work in progress, and that’s OK. How you balance these priorities will depend primarily on your organizational culture and structure.
Fit your solution to the organizational structure
This was the lesson that took the longest to learn. At the end of the day, it’s all about the people who run your organization and how they manage and communicate their infrastructure allocation. We gave developers out-of-the-box templates called "golden paths," but we also gave more technical teams the ability to consume infrastructure in more dynamic ways. We didn’t restrict teams to working in one specific way.
Some teams were more “technological” than others. Some teams were more “production critical” than others, which could change depending on the business’s strategic priorities. These differences led us to build features such as per-team and per-product quotas, along with several levels of visibility into the infrastructure and the audit log.
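Per-team, per-product quotas can start as simply as a usage counter keyed by both. The Python sketch below is hypothetical and not taken from Port: the `QuotaTracker` class, its unit of measure (vCPUs, GB, or instances; the post doesn't say), and the team and product names are assumptions. It rejects any allocation that would push a (team, product) pair past its limit and, as a further assumption, denies pairs that have no quota configured.

```python
from __future__ import annotations

from collections import defaultdict


class QuotaTracker:
    """Tracks allocated units per (team, product) pair against a limit (hypothetical sketch)."""

    def __init__(self, limits: dict[tuple[str, str], int]):
        self.limits = limits
        self.used = defaultdict(int)  # (team, product) -> units currently allocated

    def request(self, team: str, product: str, units: int) -> bool:
        """Reserve `units` if the pair stays within its quota; return whether it succeeded."""
        key = (team, product)
        limit = self.limits.get(key)
        if limit is None:
            # Assumption: pairs without a configured quota are denied, not unlimited.
            return False
        if self.used[key] + units > limit:
            return False
        self.used[key] += units
        return True

    def release(self, team: str, product: str, units: int) -> None:
        """Return units to the pool, e.g. when a resource is deleted."""
        key = (team, product)
        self.used[key] = max(0, self.used[key] - units)


quotas = QuotaTracker({("payments", "checkout"): 10})
print(quotas.request("payments", "checkout", 6))  # True: 6 of 10 used
print(quotas.request("payments", "checkout", 5))  # False: 11 would exceed 10
print(quotas.request("search", "catalog", 1))     # False: no quota configured
```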
Each DevOps team had a different level of confidence in its “automation readiness,” so we added a manual approval step that kicked in when specific parameters were met. Win-win for everyone!
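One way to express “automation readiness” is a per-team list of rules: if any rule matches a request's parameters, the request pauses for a DevOps reviewer; otherwise it runs straight away. This is a hypothetical Python sketch, since the post doesn't describe how its approval step was implemented. The team names, the `environment` and `replicas` parameters, and the choice to require approval for teams with no policy are all assumptions.

```python
from __future__ import annotations

from typing import Any, Callable

# A rule inspects a request's inputs and returns True if a human should approve it.
Rule = Callable[[dict], bool]

# Hypothetical policies: each DevOps team decides which requests need a manual check.
POLICIES: dict[str, list[Rule]] = {
    "data-platform": [
        lambda inputs: inputs.get("environment") == "production",
        lambda inputs: inputs.get("replicas", 1) > 5,
    ],
    "web": [],  # fully "automation ready": nothing needs approval
}


def route(team: str, inputs: dict[str, Any]) -> str:
    """Return 'auto' to execute immediately or 'manual-approval' to wait for a reviewer."""
    rules = POLICIES.get(team)
    if rules is None:
        # Assumption: teams without a policy always go through manual approval.
        return "manual-approval"
    if any(rule(inputs) for rule in rules):
        return "manual-approval"
    return "auto"


print(route("data-platform", {"environment": "staging", "replicas": 2}))  # auto
print(route("data-platform", {"environment": "production"}))              # manual-approval
print(route("web", {"environment": "production"}))                        # auto
```

As a team's confidence grows, it can remove rules from its list and let more requests run unattended.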
And, of course, the list doesn’t end there. The key takeaway here is to be curious about the differing needs of your various users.
Once the system stabilized, the possibilities were limitless!
I’ve thought long and hard about writing such a bold statement, but the long-term change was remarkable! Being well organized just made everything so much smoother and, well, more organized. The benefits were not only for the developers but also for the DevOps teams.
We also took into account who was performing the operation. For example, when a developer created an S3 bucket, we offered a different set of options than when an analyst did: the analyst had only one way to create a bucket, because we knew the other options would never be relevant to them.
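Tailoring a form to the person filling it in can be modeled as a role-to-options table plus preset values for everything a role cannot choose. The Python sketch below is hypothetical: the post doesn't list the actual S3 options, so the roles, regions, versioning and encryption choices, their defaults, and the `form_fields` and `build_request` helpers are assumptions. Developers get several choices, while analysts only name the bucket and receive the presets.

```python
from __future__ import annotations

# Fields every role must fill in.
COMMON_FIELDS = ["bucket_name"]

# Extra, role-specific choices (hypothetical; the real options are not described in the post).
ROLE_FIELDS = {
    "developer": {
        "region": ["us-east-1", "eu-west-1"],
        "versioning": ["Enabled", "Suspended"],
        "encryption": ["SSE-S3", "SSE-KMS"],
    },
    "analyst": {},  # analysts get a single preset path
}

# Preset values used for any field the role is not allowed to choose.
DEFAULTS = {"region": "eu-west-1", "versioning": "Enabled", "encryption": "SSE-S3"}


def form_fields(role: str) -> list[str]:
    """Fields to render for this role; unknown roles get the most restricted form."""
    return COMMON_FIELDS + list(ROLE_FIELDS.get(role, {}))


def build_request(role: str, inputs: dict) -> dict:
    """Merge the role's allowed inputs over the presets, rejecting anything else."""
    allowed = ROLE_FIELDS.get(role, {})
    request = dict(DEFAULTS)
    for key, value in inputs.items():
        if key in COMMON_FIELDS or (key in allowed and value in allowed[key]):
            request[key] = value
        else:
            raise ValueError(f"{role} cannot set {key}={value!r}")
    missing = [f for f in COMMON_FIELDS if f not in request]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return request


print(form_fields("analyst"))    # ['bucket_name']
print(form_fields("developer"))  # ['bucket_name', 'region', 'versioning', 'encryption']
print(build_request("analyst", {"bucket_name": "quarterly-reports"}))
```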
After the inevitable growing pains of driving adoption and winning people over to the platform, we held discussions with DevOps, team leaders, developers, and stakeholders. Then, taking advantage of the fact that all infrastructure was now centralized, we came up with awesome ideas for improvements.
Things like cost optimization, better resource usage, secure and predictable deployments, and code reuse on the self-serve side (e.g., authentication, form creation, task execution) all improved naturally from being more organized.
It was a beautiful thing to witness.
DevOps teams could focus on improving the infrastructure and deepening their expertise. Developers could focus on development and allocate infrastructure without waiting and without needing to be infrastructure experts. In addition, stakeholders received automated daily, weekly, and monthly reports on resource usage.
And we, as a team, maintained and improved the platform so everyone could enjoy a better development experience and be happier. Which is the ultimate goal, right?
All these lessons (and plenty more) led me to build Port. Working together with a fantastic team, we walk those fine lines daily. We’ve created a customizable platform to fit any organizational structure and workflow. Integration can happen gradually using your existing infrastructure tools, scripts, and processes.
We’ve been there, we know the challenges, and we can help your team experience the rewards of a self-serve DevPortal. Feel free to reach out to me personally on Twitter or LinkedIn.