Monday, January 30, 2023

How To Get Started As an Apache Kafka Developer

Getting Started as an Apache Kafka Developer

Intro – why then how

So, you’ve decided to pursue a career developing applications with Apache Kafka®. You’ve made a wise decision. Kafka is used by over 80% of Fortune 500 companies, and Kafka development ranks as one of the highest-paying job skills in IT. And besides all the practical stuff, it’s a lot of fun to work with!

Now you’re probably wondering what’s the best way to get started in this new career. In this article, we’ll discuss some tips, strategies, and resources to get you off to a great start on this exciting journey.

Developer vs. administrator

First, you should be clear about what type of Kafka work you’d like to pursue. There are a variety of roles in this space, but they mainly fall into one of two camps: developer or administrator. Since I’m a developer, that will be the main focus of this article, but if you lean more toward the administrator or operator role, you may still be able to glean some insights.

For developers, there’s a range of opportunities, such as event-driven microservices, real-time analytics, data pipelines, stream processing, and more. You can get a good feel for some of the ways to use Kafka by perusing these use cases.

Once you have an idea of the type of work you’d like to do, it’s time to drill into more specifics, like programming languages, industries, and types of companies.

Lost in translation

When we talk about software development, we tend to think in terms of the language we’re most familiar with, but there is Kafka client support for many languages, so I want to be careful not to make assumptions. I mean, you’re probably programming in Haskell, but you might not be 🙂. So we need to consider language options and opportunities. The client libraries that ship with Kafka are written in Java and will work with most JVM languages, but there are also very good libraries available for Python, C/C++, .NET, JavaScript, Go, and others.

The level of support and popularity of these different Kafka clients varies, with Java being the most popular and having the strongest support, both from Confluent and the community (e.g., external libraries, books, tutorials). So, if you don’t already have a preference, Java might be the way to go. A related consideration is who in your area is hiring and what language(s) they are using. A search on a job site, like Indeed or Dice, can be helpful there.

Learning resources

Once you’ve decided on a language to focus on, it’s time to start filling those knowledge gaps. Don’t be discouraged by this step. We all have knowledge gaps, and filling them can be very rewarding. First, you’ll want to make sure you have a good understanding of Kafka basics. Fortunately, there are many resources to help you with this. A web search will turn up many great books and other resources. And, of course, Confluent Developer offers interactive courses ranging from introductory (Apache Kafka 101) to advanced (Kafka Internals), full documentation, and other content to help you get started.

The Kafka ecosystem

The Kafka ecosystem is continually growing, but there are some key components that anyone looking to work with Kafka should be familiar with.

Clients

We’ve already mentioned the clients. In whichever language you choose, there will be some code available for producing data to Kafka, consuming data from Kafka, and administering resources and configurations on Kafka brokers. Developers should be proficient with these. Aspiring administrators should be familiar with them, as they will affect how users deal with your Kafka cluster.

Kafka Connect

Kafka Connect is a framework built on top of the Kafka Producer and Consumer client libraries. It allows you to integrate external systems, such as databases, analytics engines, and SaaS applications, with Kafka using plugins called connectors and some configuration. There are hundreds of connectors available, but the best source of vetted connectors is the Confluent Hub. There you will find over 200 connectors with all the information you need to use them.
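To make "plugins plus some configuration" concrete, here is a minimal sketch of what a connector definition can look like. It assumes the Confluent JDBC source connector is installed on the Connect workers; the connector name, database URL, table, and topic prefix are hypothetical values chosen for illustration.

```python
import json

# Hypothetical connector name, database URL, table, and topic prefix.
# Assumes the Confluent JDBC source connector plugin is installed.
connector = {
    "name": "orders-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/shop",
        "mode": "incrementing",            # detect new rows by a growing column
        "incrementing.column.name": "id",
        "table.whitelist": "orders",
        "topic.prefix": "shop-",           # rows would land in the topic "shop-orders"
        "tasks.max": "1",
    },
}

# Kafka Connect's REST API accepts a JSON document of this shape via POST /connectors.
payload = json.dumps(connector, indent=2)
print(payload)
```

Nothing here writes producer or consumer code: the connector plugin does that work, and the configuration only tells it where to read and where to write.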

Kafka Streams

Kafka Streams is a Java library that provides powerful APIs and an easy-to-use DSL for building stateless and stateful event streaming applications. Kafka Streams can only be used with JVM languages, but even developers not working with a JVM language would benefit from understanding how it works and the role it plays. There are similar libraries in other languages, and an understanding of Kafka Streams will help you evaluate them.
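For readers outside the JVM, the difference between stateless and stateful processing can be sketched in plain Python. This is a toy model of the idea only, not the Kafka Streams API; the function names and the sample events are made up for illustration.

```python
from collections import defaultdict

def large_orders(events, threshold):
    """Stateless: each event is judged on its own, with no memory of earlier ones."""
    for key, amount in events:
        if amount >= threshold:
            yield key, amount

def running_count(events):
    """Stateful: the result for an event depends on the events seen before it."""
    counts = defaultdict(int)  # the "state"; Kafka Streams keeps this in a state store
    for key, _amount in events:
        counts[key] += 1
        yield key, counts[key]

# Hypothetical (customer, order amount) events.
events = [("alice", 5), ("bob", 50), ("alice", 70)]
print(list(large_orders(events, 50)))
print(list(running_count(events)))
```

The stateless step could be restarted at any point without losing anything, while the stateful one has to preserve its counts across restarts. Managing that state reliably is a large part of what a stream processing library provides.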

Schema Registry

Kafka producers and consumers execute separately and don’t know anything about each other, but they do need to know about and agree upon the format of the data they are working with. This is where schemas come into play, and if you’re using schemas with Kafka, you really should be using the Confluent Schema Registry. Schema Registry provides a way to store and retrieve schemas, as well as a means of versioning your schemas as they evolve. It works well with most Kafka client libraries and can also be accessed directly via HTTP. You can learn more about it with the Schema Registry 101 course on Confluent Developer.
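As a rough illustration of the HTTP access mentioned above, the sketch below builds (but does not send) a request that registers an Avro schema. The schema, the subject name orders-value, and the registry address http://localhost:8081 are assumptions for the example; build_register_request is a hypothetical helper, not part of any client library.

```python
import json
import urllib.request

# Hypothetical Avro schema for the values of an "orders" topic.
order_schema = {
    "type": "record",
    "name": "Order",
    "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "amount", "type": "double"},
        # Giving a newly added field a default lets readers using this
        # schema still handle older records that lack the field.
        {"name": "currency", "type": "string", "default": "USD"},
    ],
}

def build_register_request(base_url, subject, schema):
    """Build a POST request that registers `schema` under `subject`."""
    # The registry expects the schema itself as a JSON-encoded string.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )

register_request = build_register_request(
    "http://localhost:8081", "orders-value", order_schema
)
print(register_request.get_method(), register_request.full_url)
# With a registry running, urllib.request.urlopen(register_request) would send it.
```

Each successful registration of a changed schema under the same subject creates a new version, which is how the registry tracks a schema as it evolves.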

ksqlDB

As we discussed earlier, Kafka Streams is only available on the JVM, but there is another powerful tool for creating event-streaming applications with Kafka, and that is ksqlDB. ksqlDB is an application that runs in its own cluster and allows you to build streaming applications with SQL. ksqlDB supports filtering, aggregation, transformation, and joining of event streams and tables based on Kafka topics. Its REST API allows you to interact with those applications from just about any programming language. For more information, check out ksqlDB 101 or Inside ksqlDB.
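Here is a small sketch of what talking to ksqlDB over REST might look like from Python. It only builds the request rather than sending it. The stream definition, the topic name, and the server address http://localhost:8088 are assumptions for illustration, and build_ksql_request is a hypothetical helper.

```python
import json
import urllib.request

# Hypothetical stream over an existing "orders" topic with JSON values.
statement = """
CREATE STREAM orders (id BIGINT, amount DOUBLE)
  WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');
"""

def build_ksql_request(server_url, ksql):
    """Build a POST request for the ksqlDB server's /ksql statement endpoint."""
    body = json.dumps({"ksql": ksql.strip(), "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server_url}/ksql",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
        method="POST",
    )

ksql_request = build_ksql_request("http://localhost:8088", statement)
print(ksql_request.get_method(), ksql_request.full_url)
# With a ksqlDB server running, urllib.request.urlopen(ksql_request) would submit it.
```

Because the interface is plain HTTP and JSON, the same request can be built from any language with an HTTP client.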

Command line and graphical interfaces

There are many ways to work with Kafka interactively, ranging from the shell scripts that come with Kafka to extensive graphical user interfaces. There isn’t space here to cover them all, but here are a few that are worth checking out.

Command Line

Confluent CLI - https://docs.confluent.io/confluent-cli/current/overview.html

kcat (formerly KafkaCat) - https://github.com/edenhill/kcat

kcctl (CLI for Kafka Connect) - https://github.com/kcctl/kcctl

GUI

Confluent Control Center - https://docs.confluent.io/platform/current/control-center/index.html

Conduktor - https://www.conduktor.io

Kafdrop - https://github.com/obsidiandynamics/kafdrop

AKHQ - https://github.com/tchiotludo/akhq

Strategy

That’s a lot to learn, and it might seem overwhelming, so here’s some advice for how to build your knowledge without burning out.

Warming up

Before jumping into building Kafka applications, depending on your level of experience, it will be helpful to warm up with some introductory material, for example, the courses on Confluent Developer, such as Kafka 101, Kafka Connect 101, and Schema Registry 101. These courses have both video and text content that provide you with a gentle introduction to these technologies. They even include exercises that allow you to get hands-on.

Working out

A great next step is to get more hands-on experience with Kafka using quick-start guides and tutorials, where a problem is presented, and you can use Kafka and your favorite language to solve it. Confluent Developer provides plenty of these. Most of the tutorials are based on Java, but there are also getting-started exercises available for Python, .NET, Go, and other languages.

Building a project

Now that you’ve got some experience working with and solving problems with Kafka, you can continue and accelerate your learning journey by building a project. It can be something for work, a side project, or even a cool demo idea that you can show off at a meetup (more about that in the next section). But whatever project you choose, it should be end-to-end so that you get the broadest possible experience from it.

The reason that a complete project is so much more helpful than exercises or tutorials is that the problems you are trying to solve are your own. This will provide a much stronger context for what you are learning. Context is like a hook on which to hang knowledge. Things learned without context tend to fade quickly. Just ask someone who’s been cramming for an exam.

Another benefit of building a project with Kafka is that you can use it to show what you know by hosting your project in a GitHub repository. One of the advantages that we have in the technology space is that it is much easier to show prospective employers what we are capable of, and one of the great ways to do this is via source code repositories.

Contributing to Apache Kafka

There are many opportunities for involvement in the open source Apache Kafka project. This is a great way to learn as well as to put your learning into practice. Check out the official contributors' guide or watch this video from Kafka Summit 2020 for more information and inspiration.

Certification

Another way to demonstrate your knowledge is certification. While not a silver bullet, when used in conjunction with examples of actual code you’ve written, certification can provide an extra level of comfort to prospective employers. In truth, the more valuable aspect of getting certified is the incentive it provides for learning. Confluent provides certifications for both developers and administrators, along with suggestions for how to prepare for the exams.

Show your work

Chronicling your learning journey by way of a blog can help you to learn faster by better organizing your thoughts. It can also help you to get valuable feedback from those who read it. And it’s a great way to show prospective employers what you’ve learned and your ability to communicate it.

Don’t go it alone

As you go about building your first project or preparing for your certification exam, you will undoubtedly run into questions. You can find some good answers on Google or Stack Overflow, but there is also a large, active, and helpful community of developers, administrators, and just all-around good people around Kafka.

Getting involved in the community will make your learning journey more enjoyable and more productive. It will also provide you with invaluable networking opportunities that can lead to your first job as a Kafka developer.

The Confluent Developer Forum and Confluent Community Slack are two great places to introduce yourself to the community, ask questions, and learn from reading others' questions and answers. You can also subscribe to the Apache Kafka developer and user mailing lists.

But the community is about more than just learning. It’s also about helping, inspiring, collaborating, and building each other up. So, don’t just lurk, or post a question when you’re stuck. Get involved. Try to answer some questions, tell us about what you are working on, or the cool new thing you learned.

Besides the online text venues, another great way to get to know people in the community is at meetups and conferences. The Confluent meetup hub will show you what meetups are coming up. There may be one in your area, or you can join one of the many online meetups from anywhere. As you learn, you may even consider presenting at a meetup. Most of these are recorded, and a link to your presentation could be a great addition to your CV.

Not only do members of the community help each other with technical issues, they often have valuable career advice. Here are some tips from a few Community Catalysts.

Olena Kutsenko is a Senior Developer Advocate.

“Starting to work with a new technology is often tough, especially if it is as complex as Apache Kafka. Here is a list of things that I'd recommend to those who want to become an Apache Kafka developer:

  1. Don't be afraid to ask for help. The community around Apache Kafka is knowledgeable and very friendly. So if after reading the docs and trying different approaches, you're still struggling to find a working solution for your task, don't hesitate to ask a question. Who knows, maybe the perfect solution is not so far from what you have!
  2. Don't get frustrated if the learning process is not as quick as you hoped. The Apache Kafka ecosystem is very wide and it is normal that a single person won't know all the nuances of the system. In fact, it is better to accept the fact that the learning process is never over! That's why it is so vital for us to share knowledge with each other (See the next point ;) ).
  3. Share what you learned with others. This is the best way to solidify your knowledge and get different perspectives.”

Robert Zych is a Data Platform Engineer.

“My 1st tip for anyone (regardless of programming background) interested in using Kafka is to understand when it should and shouldn’t be used. My 2nd tip would be to get/develop experience with a JVM language such as Kotlin, Java, or Scala (in that order). My 3rd tip would be to take courses and CCDAK (Confluent Certified Developer for Apache Kafka). And my 4th tip would be to experiment and build something with Kafka (preferably with helpful friends like you and Neil 🙂)”

Neil Buesing is a Kafka and Kafka Streams expert. He shares a few tips from the perspective of a hiring manager.

  1. “Show interest in Kafka’s improvements (e.g. up to date on some KIPs) seeing someone being current sets them apart from others in that they can be seen as someone that “enjoys Kafka” vs just using Kafka.
  2. Operational Knowledge - I don’t need a developer to know how to manage a cluster, but understanding operational aspects is helpful; e.g. knowing that a compacted topic and how it gets compacted is useful; knowing performance issues when it comes to “commitSync()” after every reading a message — I was at a client that did a commitSync() after EVERY consumed message. This is a sign that Kafka is not well understood.
  3. If you use an additional framework be sure you know what is part of that framework vs what is Apache Kafka (e.g. KafkaTemplate is Spring Kafka not Kafka). — ok, maybe this is just a pet peeve of mine.”
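Neil's point about committing after every message can be illustrated with a toy model. The sketch below is plain Python, not the Kafka consumer API: consume and count_commits are hypothetical helpers, and commit stands in for a blocking call such as commitSync().

```python
def consume(messages, commit, commit_every=1):
    """Process messages, calling commit after every `commit_every` of them."""
    uncommitted = 0
    for offset, _value in enumerate(messages):
        # ... process the message here ...
        uncommitted += 1
        if uncommitted >= commit_every:
            commit(offset + 1)  # commit the next offset to read
            uncommitted = 0
    if uncommitted:
        commit(len(messages))  # commit whatever is left at the end

def count_commits(total, commit_every):
    """Return (number of commit calls, last committed offset)."""
    calls = []
    consume(range(total), calls.append, commit_every)
    return len(calls), calls[-1]

print(count_commits(1000, 1))    # (1000, 1000)
print(count_commits(1000, 100))  # (10, 1000)
```

Both runs end at the same committed offset, but the first makes a hundred times as many blocking calls. The trade-off is that less frequent commits mean more messages may be reprocessed after a crash, so the consumer's processing should tolerate duplicates.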

Along with the Confluent forum and Slack group, you can also find many members of the Kafka community on Twitter, LinkedIn, and other social media outlets. Not only can you find help, inspiration, and potential job leads, but you can also make some great friends. This far-from-complete Twitter list can help you get started meeting some amazing people.

Practical tips

When it comes to the actual process of looking for a job, there’s no substitute for good old-fashioned shoe leather. The more doors you knock on, the better your chances. I know that most shoes are soled with rubber these days, and you rarely walk or knock on doors when applying for jobs, but you get my meaning. Apply for as many opportunities as you can find. Even the ones that don’t turn out to be what you were looking for will provide you with valuable practice. Job hunting, and more specifically, interviewing, is a skill, and like all skills, it requires practice.

Another important point about interviews is to take careful notes of any technical questions that stumped you. Research these areas and, if possible, include what you learn in one of your projects. Remember the value of context.

As for where to look, aside from your personal network, Indeed.com is a good place to start. As of the time of this writing, they are showing over 14,000 job openings that include Kafka as a desired skill. Though its database is not quite as large, Dice.com has also yielded good results for me in the past. Both of these sites allow you to post your CV and specify the types of opportunities you are looking for, but don’t stop at that. Take the initiative to reach out to recruiters and potential employers.

Enjoy the journey

We’ve talked about a lot of things that you can do, and it may seem a bit overwhelming, but don’t look at it as a checklist. Think of it instead as a menu of opportunities for a fun adventure. Learning, experimenting, networking, blogging, presenting, and even interviewing can all be a lot of fun. There will be problems to solve, challenges to overcome, people to meet, and milestones to pass along the way. So enjoy the journey, and keep us posted. We’re rooting for you!
