DevOps Roadmap 2022

Feb 17, 2022

In the last few weeks, I met several people in my mentoring sessions, some new to DevOps and some mid-career, who wanted to know what to learn in 2022. DevOps skills are in high demand, and constant learning is required to keep up with the market.

This post shares notes that may help you. The guidance below is based on my experience and understanding.

Roadmap

Be strong in networking fundamentals

Understand concepts such as HTTP/2, QUIC and HTTP/3, Layer 4 and Layer 7 protocols, mTLS, proxies, DNS, BGP, how load balancing works, iptables, how the Internet works, IP addresses and addressing schemes, and lastly network design. I find Julia Evans’s blog very useful; it is my go-to place when I need to understand something in a simple way. She covers a wide variety of topics in her blog posts and zines.

Master the operating system fundamentals, particularly Linux

As most systems (VMs, containers, etc.) run Linux, it is important to know it from top to bottom. Learn scheduling, the init system and the systemd interface, cgroups and namespaces, and performance tuning, and master command-line utilities such as awk, sed, jq, yq, curl, ssh, and openssl. Learn performance troubleshooting from Brendan Gregg’s blog.

CI/CD

If you are still using Jenkins, that is fine, but the world has moved to cloud-native pipelines. Conceptually not much has changed in this space, but you can look into GitHub Actions, Tekton, and similar tools. To do releases better, understand deployment strategies such as blue-green (switch all traffic between two identical environments) and canary (send a small share of traffic to the new version first).
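To make the canary idea concrete, here is a minimal sketch of how a router might decide which version serves a request. The function name pickVersion, the "user-N" request IDs, and the 10% share are my own assumptions for illustration; real setups usually delegate this to a load balancer, ingress controller, or service mesh.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickVersion (a hypothetical helper) returns "canary" for roughly
// canaryPercent of request IDs and "stable" for the rest. Hashing the
// ID keeps a given caller on the same version across requests.
func pickVersion(requestID string, canaryPercent uint32) string {
	h := fnv.New32a()
	h.Write([]byte(requestID)) // Write on an FNV hash never returns an error
	if h.Sum32()%100 < canaryPercent {
		return "canary"
	}
	return "stable"
}

func main() {
	// Simulate 10,000 callers with 10% of traffic sent to the canary.
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pickVersion(fmt.Sprintf("user-%d", i), 10)]++
	}
	fmt.Println(counts)
}
```

Raising canaryPercent step by step while watching error rates is the usual rollout pattern; setting it to 0 is the rollback.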

Containerisation and Virtualisation

Apart from the popular Docker runtime, try containerd, Podman, and others. Learn how to containerise applications, how to implement container security, and how to run and orchestrate VMs in Kubernetes (see the KubeVirt project).

Container Orchestration

Kubernetes is now the de facto standard for running containers, and there is a lot of content on the Internet for learning it. Focus on configuration best practices, application design, security, and scheduling. Setting up a cluster is becoming trivial, but people will expect you to handle day-2 operations: setting up monitoring, logging, and CI/CD, scaling the cluster, optimising cost, and securing it.

Get a big picture of what is underneath the Kubernetes iceberg in this article by Asankov. It is a series of articles that explores Kubernetes.

Observability at Scale

Most engineers are aware of the Prometheus and Grafana stack or something similar. Trends suggest that many organisations are consolidating their Kubernetes clusters and their observability, which helps with both performance and cost. Learn the advanced configuration and architectures of Prometheus and how to scale it. Look into technologies like Thanos, Cortex, VictoriaMetrics, Datadog, and Loki, continuous profiling tools such as Parca, periscope, and hypertrace, and distributed tracing with OpenTelemetry. Service meshes such as Istio are popular ingredients in cloud-native recipes.

Platform team as a Product team

The Platform team is becoming more like a centralized product team that focuses on its internal customers, such as developers and testers. The goal is to improve ways of working and bring some order to the teams. Try to solve the problems that the developer and QA teams face. You are the enabler for other teams: instead of taking all the work into a central team, coach the dev teams to take up typical DevOps responsibilities. That way you can scale without burning yourself out.

Security

In many small organizations, security used to be a second-class citizen, and product features were given more priority. But due to increasingly sophisticated attacks and strict compliance requirements, companies are adopting shift-left security strategies. End-to-end encryption, strong RBAC, IAM policies, governance and auditing, and implementation of standards and benchmarks such as NIST, CIS, and ISO 27001 are common. Container security, policy as code, cloud governance, and supply chain security are hot topics.

Programming

The DevOps or SRE role now takes on the cross-cutting concerns of developers and creates tooling that improves their productivity while enforcing standards. Good software engineering practices and skills are required to craft high-quality platform components.

I can’t stress this enough. Good organizations look for solid programming experience in platform engineers. It is important in site reliability engineering as well, where you need to be fluent in programming and able to read, understand, debug, and, if necessary, fix code written by others.

Python and Golang are the most popular languages. My suggestion is Golang, because of its strong concurrency support, strict type checking, toolchain, and adoption across organisations. As many major projects are built in Golang, it makes sense to learn it over Python.

A few simple things you can try:

Write a CLI in your chosen programming language.
Write a REST API that interacts with a database.
Experiment with parallelism and concurrency.
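
As a starting point for the first and last exercises, here is a minimal sketch of a CLI that processes its arguments with a small worker pool. The function name processAll, the -workers flag, and the upper-casing step are placeholders I made up; swap in whatever real work you want to practise with (HTTP checks, file hashing, and so on).

```go
package main

import (
	"flag"
	"fmt"
	"strings"
	"sync"
)

// processAll (a hypothetical helper) applies fn to every input using a
// fixed number of worker goroutines and returns results in input order.
func processAll(inputs []string, workers int, fn func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(inputs))
	jobs := make(chan int) // carries indexes into inputs

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker, so
				// writing to results[i] needs no lock.
				results[i] = fn(inputs[i])
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	workers := flag.Int("workers", 4, "number of concurrent workers")
	flag.Parse()

	// Positional arguments are the items to process.
	for _, line := range processAll(flag.Args(), *workers, strings.ToUpper) {
		fmt.Println(line)
	}
}
```

Running it as `go run . -workers 2 alpha beta` prints ALPHA and BETA on separate lines. Passing indexes over the channel, rather than collecting results from a second channel, is one simple way to keep output in input order.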

Infrastructure as Code

Terraform is a common standard across projects. Once you understand the concept, it is easy to adapt to any other tooling, as most of them are based on a DSL.

Cloud

Most cloud providers work in similar ways, so if you know one cloud well, you can easily work with the others. Focus on how to design applications using cloud-native components in a highly available, resilient, secure, and cost-effective way.

Technical Writing

You might be wondering why I am talking about technical writing in a DevOps roadmap. A lot of folks don’t give it enough attention, but it is super important to how you communicate and work with other teams. The future of work is remote, and email, Slack/Teams, and chat are the primary channels for conveying ideas to others.

On a regular basis, you might create documents such as runbooks, postmortems, RFCs, architectural decision records, and software design docs, to name a few. A clear, easy-to-understand document does wonders: it saves both your time and the reader’s and improves overall productivity. I suggest you read this article.

Site Reliability Engineering

The boundary between DevOps and SRE is getting thin. In some organisations, the same person might perform both roles. Understand SRE practices and the concepts behind SLIs, SLOs, and error budgets. Each organisation does it differently, so I don’t recommend copy-pasting someone else’s culture into your team. For reference, look at Google’s SRE culture.
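
The arithmetic behind an error budget is simple enough to sketch. The example below assumes an availability SLO measured as uptime over a 30-day window; the function names errorBudget and budgetRemaining, the 99.9% target, and the 10 minutes of downtime are illustrative values of mine, not a prescribed method.

```go
package main

import (
	"fmt"
	"time"
)

// errorBudget returns how much downtime an availability SLO permits
// over the given window. slo is a fraction, e.g. 0.999 for 99.9%.
func errorBudget(slo float64, window time.Duration) time.Duration {
	return time.Duration((1 - slo) * float64(window))
}

// budgetRemaining returns the fraction of the error budget still
// unspent after the observed downtime. A negative value means the
// SLO has been breached.
func budgetRemaining(slo float64, window, downtime time.Duration) float64 {
	budget := errorBudget(slo, window)
	if budget == 0 {
		return 0 // a 100% SLO leaves no budget to spend
	}
	return 1 - float64(downtime)/float64(budget)
}

func main() {
	window := 30 * 24 * time.Hour // assumed 30-day SLO window
	fmt.Println("budget:", errorBudget(0.999, window).Round(time.Second))
	fmt.Printf("remaining: %.0f%%\n",
		100*budgetRemaining(0.999, window, 10*time.Minute))
}
```

A 99.9% target over 30 days allows roughly 43 minutes of downtime. Teams often slow down risky releases once the remaining budget gets low and spend it on experiments when there is plenty left.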

Conclusion

Personally, I am excited about the following this year. This is not a definitive list, as it keeps changing with time.

Service mesh: Istio, Cilium’s sidecarless mesh, and the Tetrate and Solo Gloo Mesh offerings.
Developer productivity: how to improve it through a mix of culture, automation, and tools.
SRE platforms: Honeycomb, Last9.
DevPortals: again linked to the goal of improving productivity and bridging the knowledge gap.
Observability: technologies such as OpenTelemetry, hypertrace, Thanos, VictoriaMetrics, and Vector.
Security: supply chain security, code signing, and tightening cloud security.
Golang: improving my current skills.
Serverless computing and event-driven architectures.
Web3: understanding the landscape as it relates to DevOps and infrastructure.

Be curious and keep learning. Continuous, bite-size learning is easy to do alongside a full-time job. If you still have any questions, feel free to book some time with me. I am more than happy to help.

Edits:

7 July 2022 — added Demystifying Kubernetes series in the Container orchestration section.

DevOps Roadmap 2022 was originally published in FAUN Publication on Medium.