Monday, January 30, 2023

How To Get Started As an Apache Kafka Developer


Intro: why, then how

So, you’ve decided to pursue a career developing applications with Apache Kafka®. You’ve made a wise decision. Kafka is used by over 80% of Fortune 500 companies, and Kafka development ranks as one of the highest-paying job skills in IT. And besides all the practical stuff, it’s a lot of fun to work with!

Now you’re probably wondering what’s the best way to get started in this new career. In this article, we’ll discuss some tips, strategies, and resources to get you off to a great start on this exciting journey.

Developer vs. administrator

First, you should be clear about what type of Kafka work you’d like to pursue. There are a variety of roles in this space, but they mainly fall into one of two camps: developer or administrator. Since I’m a developer, that will be the main focus of this article, but if you lean more toward the administrator or operator role, you may still be able to glean some insights.

For developers, there’s a range of opportunities, such as event-driven microservices, real-time analytics, data pipelines, stream processing, and more. You can get a good feel for some of the ways to use Kafka by perusing these use cases.

Once you have an idea of the type of work you’d like to do, it’s time to drill into more specifics, like programming languages, industries, and types of companies.

Lost in translation

When we talk about software development, we tend to think in terms of the language we’re most familiar with, but there is Kafka client support for many languages, so I want to be careful not to make assumptions. I mean, you’re probably programming in Haskell, but you might not be 🙂. So we need to consider language options and opportunities. The client libraries that ship with Kafka are written in Java and will work with most JVM languages, but there are also very good libraries available for Python, C/C++, .NET, JavaScript, Go, and others.

The level of support and popularity of these different Kafka clients varies, with Java being the most popular and having the strongest support, both from Confluent and the community (e.g., external libraries, books, tutorials). So, if you don’t already have a preference, Java might be the way to go. A related consideration is who, in your area, is hiring and what language(s) they are using. A search on a job site, like Indeed or Dice, can be helpful there.

Learning resources

Once you’ve decided on a language to focus on, it’s time to start filling those knowledge gaps. Don’t be discouraged by this step. We all have knowledge gaps, and filling them can be very rewarding. First, you’ll want to make sure you have a good understanding of Kafka basics. Fortunately, there are many resources to help you with this. A web search will turn up many great books and other resources. And, of course, Confluent Developer offers interactive courses ranging from introductory (Apache Kafka 101) to advanced (Kafka Internals), full documentation, and other content to help you get started.

The Kafka ecosystem

The Kafka ecosystem is continually growing, but there are some key components that anyone looking to work with Kafka should be familiar with.

Clients

We’ve already mentioned the clients. In whichever language you choose, there will be some code available for producing data to Kafka, consuming data from Kafka, and administering resources and configurations on Kafka brokers. Developers should be proficient with these. Aspiring administrators should be familiar with them, as they will affect how users deal with your Kafka cluster.
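
To give you a feel for the client APIs, here's a minimal sketch of producing a message with the Java client. The broker address and the "greetings" topic are placeholders you'd swap for your own setup:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class HelloProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // placeholder broker address; point this at your own cluster
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            // send a single record to a (hypothetical) "greetings" topic;
            // closing the producer flushes any pending sends
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("greetings", "hello", "Hello, Kafka!"));
            }
        }
    }

The consumer and admin APIs follow the same pattern: a Properties object for configuration and a small, focused client class.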

Kafka Connect

Kafka Connect is a framework built on top of the Kafka Producer and Consumer client libraries. It allows you to integrate external systems, such as databases, analytics engines, and SaaS applications, with Kafka using plugins, called connectors, and some configuration. There are hundreds of connectors available, but the best source of vetted connectors is the Confluent Hub. There you will find over 200 connectors with all the information you need to use them.
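
To show just how configuration-driven Connect is, here's a sketch of a connector configuration using the FileStreamSource connector that ships with Kafka (the connector name, file path, and topic name are made up). You'd POST this JSON to the Connect REST API (by default at http://localhost:8083/connectors) to start streaming lines from a file into a topic:

    {
      "name": "file-source-demo",
      "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo.txt",
        "topic": "demo-lines"
      }
    }

No code required; the connector plugin does the producing for you.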

Kafka Streams

Kafka Streams is a Java library that provides powerful APIs and an easy-to-use DSL for building stateless and stateful event streaming applications. Kafka Streams can only be used with JVM languages, but even developers not working with a JVM language would benefit from understanding how it works and the role it plays. There are similar libraries in other languages, and an understanding of Kafka Streams will help you evaluate them.
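
As a taste of the DSL, here's a minimal, stateless sketch that reads from one topic, transforms each value, and writes to another. The topic names and application ID are made up:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class ShoutingApp {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // read from "raw-text", upper-case each value, write to "shouted-text"
            builder.<String, String>stream("raw-text")
                   .mapValues(value -> value.toUpperCase())
                   .to("shouted-text");

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "shouting-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            new KafkaStreams(builder.build(), props).start();
        }
    }

Stateful operations like aggregations and joins build on this same fluent style.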

Schema Registry

Kafka producers and consumers execute separately and don’t know anything about each other, but they do need to know about and agree upon the format of the data they are working with. This is where schemas come into play, and if you’re using schemas with Kafka, you really should be using the Confluent Schema Registry. Schema Registry provides a way to store and retrieve schemas, as well as a means of versioning your schemas as they evolve. It works well with most Kafka client libraries and can also be accessed directly via HTTP. You can learn more about it with the Schema Registry 101 course on Confluent Developer.
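
Wiring a Java producer up to Schema Registry is mostly a matter of configuration. Here's a minimal sketch, assuming Confluent's Avro serializer is on the classpath and a registry is running on its default port:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;

    public class AvroProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            // Confluent's Avro serializer registers and looks up schemas for you
            props.put("value.serializer",
                    "io.confluent.kafka.serializers.KafkaAvroSerializer");
            // where the serializer finds Schema Registry (8081 is the default port)
            props.put("schema.registry.url", "http://localhost:8081");

            // records sent through this producer are serialized against a registered schema
            KafkaProducer<String, Object> producer = new KafkaProducer<>(props);
            producer.close();
        }
    }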

ksqlDB

As we discussed earlier, Kafka Streams is only available on the JVM, but there is another powerful tool for creating event-streaming applications with Kafka, and that is ksqlDB. ksqlDB is an application that runs in its own cluster and allows you to build streaming applications with SQL. It supports filtering, aggregation, transformation, and joining of event streams and tables based on Kafka topics. Its REST API allows you to interact with those applications from just about any programming language. For more information, check out ksqlDB 101 or Inside ksqlDB.
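
To make that concrete, here's a small sketch of what a ksqlDB application can look like. The topic, stream, and column names are all made up:

    -- declare a stream over an existing Kafka topic
    CREATE STREAM orders (id VARCHAR, amount DOUBLE)
      WITH (KAFKA_TOPIC = 'orders', VALUE_FORMAT = 'JSON');

    -- a persistent query that continuously filters into a new stream (and topic)
    CREATE STREAM big_orders AS
      SELECT * FROM orders
      WHERE amount > 100;

Those two statements give you a continuously running stream processing application, no JVM code required.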

Command line and graphical interfaces

There are many ways to work with Kafka interactively, ranging from the shell scripts that come with Kafka to extensive graphical user interfaces. There isn’t space here to cover them all, but here are a few that are worth checking out.

Command Line

Confluent CLI - https://docs.confluent.io/confluent-cli/current/overview.html

kcat (formerly KafkaCat) - https://github.com/edenhill/kcat

kcctl (CLI for Kafka Connect) - https://github.com/kcctl/kcctl

GUI

Confluent Control Center - https://docs.confluent.io/platform/current/control-center/index.html

Conduktor - https://www.conduktor.io

Kafdrop - https://github.com/obsidiandynamics/kafdrop

AKHQ - https://github.com/tchiotludo/akhq

Strategy

That’s a lot to learn, and it might seem overwhelming, so here’s some advice for how to build your knowledge without burning out.

Warming up

Before jumping into building Kafka applications, depending on your level of experience, it will be helpful to warm up with some introductory material, for example, the courses on Confluent Developer, such as Kafka 101, Kafka Connect 101, and Schema Registry 101. These courses have both video and text content that provide you with a gentle introduction to these technologies. They even include exercises that allow you to get hands-on.

Working out

A great next step is to get more hands-on experience with Kafka using quick-start guides and tutorials, where a problem is presented and you can use Kafka and your favorite language to solve it. Confluent Developer provides plenty of these. Most of the tutorials are based on Java, but there are also getting-started exercises available for Python, .NET, Go, and other languages.

Building a project

Now that you’ve got some experience working with and solving problems with Kafka, you can continue and accelerate your learning journey by building a project. It can be something for work, a side project, or even a cool demo idea that you can show off at a meetup (more about that in the next section). But whatever project you choose, it should be end-to-end so that you get the broadest possible experience from it.

The reason that a complete project is so much more helpful than exercises or tutorials is that the problems you are trying to solve are your own. This will provide a much stronger context for what you are learning. Context is like a hook on which to hang knowledge. Things learned without context tend to fade quickly. Just ask someone who’s been cramming for an exam.

Another benefit of building a project with Kafka is that you can use it to show what you know by hosting your project in a GitHub repository. One of the advantages that we have in the technology space is that it is much easier to show prospective employers what we are capable of, and one of the great ways to do this is via source code repositories.

Contributing to Apache Kafka

There are many opportunities for involvement in the open source Apache Kafka project. This is a great way to learn as well as to put your learning into practice. Check out the official contributors' guide or watch this video from Kafka Summit 2020 for more information and inspiration.

Certification

Another way to demonstrate your knowledge is certification. While not a silver bullet, when used in conjunction with examples of actual code you’ve written, certification can provide an extra level of comfort to prospective employers. In truth, the more valuable aspect of getting certified is the incentive it provides for learning. Confluent provides certifications for both developers and administrators, along with suggestions for how to prepare for the exams.

Show your work

Chronicling your learning journey by way of a blog can help you to learn faster by better organizing your thoughts. It can also help you to get valuable feedback from those who read it. And it’s a great way to show prospective employers what you’ve learned and your ability to communicate it.

Don’t go it alone

As you go about building your first project or preparing for your certification exam, you will undoubtedly run into questions. You can find some good answers on Google or Stack Overflow, but there is also a large, active, and helpful Kafka community of developers, administrators, and just all-around good people.

Getting involved in the community will make your learning journey more enjoyable and more productive. It will also provide you with invaluable networking opportunities that can lead to your first job as a Kafka developer.

The Confluent Developer Forum and Confluent Community Slack are two great places to introduce yourself to the community, ask questions, and learn from reading others' questions and answers. You can also subscribe to the Apache Kafka developer and user mailing lists.

But the community is about more than just learning. It’s also about helping, inspiring, collaborating, and building each other up. So don’t just lurk, only posting a question when you’re stuck. Get involved. Try to answer some questions, tell us about what you are working on, or share the cool new thing you learned.

Besides the online text venues, another great way to get to know people in the community is at meetups and conferences. The Confluent Meetup Hub will show you what meetups are coming up. There may be one in your area, or you can join one of the many online meetups from anywhere. As you learn, you may even consider presenting at a meetup. Most of these are recorded, and a link to your presentation could be a great addition to your CV.

Not only do members of the community help each other with technical issues, they often have valuable career advice. Here are some tips from a few Community Catalysts.

Olena Kutsenko is a Senior Developer Advocate.

“Starting to work with a new technology is often tough, especially if it is as complex as Apache Kafka. Here is a list of things that I'd recommend to those who want to become an Apache Kafka developer:

  1. Don't be afraid to ask for help. The community around Apache Kafka is knowledgeable and very friendly. So if after reading the docs and trying different approaches, you're still struggling to find a working solution for your task, don't hesitate to ask a question. Who knows, maybe the perfect solution is not so far from what you have!
  2. Don't get frustrated if the learning process is not as quick as you hoped. The Apache Kafka ecosystem is very wide and it is normal that a single person won't know all the nuances of the system. In fact, it is better to accept the fact that the learning process is never over! That's why it is so vital for us to share knowledge with each other (See the next point ;) ).
  3. Share what you learned with others. This is the best way to solidify your knowledge and get different perspectives.”

Robert Zych is a Data Platform Engineer.

“My 1st tip for anyone (regardless of programming background) interested in using Kafka is to understand when it should and shouldn’t be used. My 2nd tip would be to get/develop experience with a JVM language such as Kotlin, Java, or Scala (in that order). My 3rd tip would be to take courses and CCDAK (Confluent Certified Developer for Apache Kafka). And my 4th tip would be to experiment and build something with Kafka (preferably with helpful friends like you and Neil 🙂)”

Neil Buesing is a Kafka and Kafka Streams expert. He shares a few tips from the perspective of a hiring manager.

  1. “Show interest in Kafka’s improvements (e.g., staying up to date on some KIPs). Seeing someone being current sets them apart from others, in that they can be seen as someone who “enjoys Kafka” vs. just using Kafka.
  2. Operational knowledge - I don’t need a developer to know how to manage a cluster, but understanding operational aspects is helpful; e.g., knowing what a compacted topic is and how it gets compacted is useful, as is knowing the performance issues that come with calling “commitSync()” after every message — I was at a client that did a commitSync() after EVERY consumed message. This is a sign that Kafka is not well understood.
  3. If you use an additional framework, be sure you know what is part of that framework vs. what is Apache Kafka (e.g., KafkaTemplate is Spring Kafka, not Kafka). — ok, maybe this is just a pet peeve of mine.”

Along with the Confluent forum and Slack group, you can also find many members of the Kafka community on Twitter, LinkedIn, and other social media outlets. Not only can you find help, inspiration, and potential job leads, but you can also make some great friends. This (far from complete) Twitter list can help you get started meeting some amazing people.

Practical Tips

When it comes to the actual process of looking for a job, there’s no substitute for good old-fashioned shoe leather. The more doors you knock on, the better your chances. I know that most shoes are soled with rubber these days, and you rarely walk or knock on doors when applying for jobs, but you get my meaning. Apply for as many opportunities as you can find. Even the ones that don’t turn out to be what you were looking for will provide you with valuable practice. Job hunting, and more specifically, interviewing, is a skill, and like all skills, it requires practice.

Another important point about interviews is to take careful notes of any technical questions that stumped you. Research these areas and, if possible, include what you learn in one of your projects. Remember the value of context.

As far as where to look, aside from your personal network, Indeed.com is a good place to start. As of this writing, it shows over 14,000 job openings that include Kafka as a desired skill. Though not quite as large a database, Dice.com has also yielded good results for me in the past. Both of these sites allow you to post your CV and specify the types of opportunities you are looking for, but don’t stop there. Take the initiative to reach out to recruiters and potential employers.

Enjoy the journey

We’ve talked about a lot of things that you can do, and it may seem a bit overwhelming, but don’t look at it as a checklist; rather, it’s a menu of opportunities for a fun adventure. Learning, experimenting, networking, blogging, presenting, and even interviewing can all be a lot of fun. There will be problems to solve, challenges to overcome, people to meet, and milestones to pass along the way. So enjoy the journey, and keep us posted. We’re rooting for you!

Friday, September 16, 2022

A Tech Conference at a Food Truck Park?!

Last week I had the pleasure of speaking at PyBay Food Truck Edition in San Francisco. It wasn't the first time I'd presented in a tent, but it was the first time doing so in a park surrounded by food trucks! 


It was a great event! The organizers and an army of volunteers took care of everything so attendees, speakers, and sponsors could enjoy their time learning, networking, and eating!

My presentation on building event-driven microservices with Python and Kafka was well-received, with some great follow-up questions. 



Another cool thing about this event was that we were also sponsors, which gave my co-worker Danica Fine and me opportunities to chat with dozens of developers and data engineers and hand out some cool swag!



Sadly, I couldn't attend sessions until near the end, when visitors to our table began to slow down. At that point, I was able to catch an excellent talk on the observer pattern by Aly Sivji.

I was also hoping to catch the testing panel that Andy Knight, Brian Okken, and others were holding, but I had to miss it. At least I was able to get in on a selfie with Andy.



The weather was perfect for an outdoor event in San Francisco. The people, as seems to be the norm in the Python community, were friendly and eager to learn and to help others learn. Overall, it was a great event, and while eleven hours on my feet was a bit rough, I am so grateful for the opportunity and am already looking forward to PyBay 2023!



Wednesday, August 17, 2022

The Joy of Community, an Anecdote

I am very grateful for the amazing people that I've met in the Apache Kafka community. Whether online or in person, my interactions with others in this community have always been rewarding. One of the people that I met through this community is Robert Zych. I "met" Robert at an online Kafka Meetup that we were both attending. Robert was one of the first to turn on his camera and get involved in the discussion. He was very interested in learning more about Kafka, and others in the community, such as Neil Buesing, Matthias Sax, and Anna McDonald, to name a few, were eager to help him. 

As time went on, I was able to get to know Robert better. We even worked together on an interesting side project that he came up with. And Robert became more involved in the community. He went on to write a guest blog post for the Confluent blog, and was even selected as a Confluent Community Catalyst! Then a while back, Robert asked me for my mailing address. About a week later, this lovely tie showed up in the mail.
 
Now, I don't wear ties very often, but I will probably make an exception for this one. Some of you might notice the avocado pattern on this tie, and might tie it to the fact that I am a developer advocate at Confluent. But that's only because you don't know the rest of the story. 

Frequently, at Kafka meetups, we'll start out by asking people where they are connecting from. In the before times, meetups were location-based, but now people join from all over the world. (I like to attend meetups from Australia, so I can find out how tomorrow is looking.) At one of these events, Robert mentioned that he is in Sacramento, California. I grew up in Sacramento and still have family there, so when I mentioned to Robert that I was going to attend a nephew's wedding, he invited me to lunch on the afternoon before the event. He lived near the wedding venue, so it worked out great.

We had a great time meeting in person and catching up. Now I was already dressed for the wedding in slacks and a white dress shirt. No tie, because, well, I don't wear ties. So, I was being very careful, even ordering something that I knew was not likely to drip and make a mess of my white shirt. However, Robert had ordered some delicious-looking guacamole and chips, and I just had to try some.

Of course, on my very first attempt, a blob of said delicious-looking guac fell right on my shirt! There was a small but very noticeable green stain that I couldn't get rid of, and there were only a couple of hours until the wedding. Fortunately, the stain was right in the center of the shirt, where a tie would normally go. So, Robert invited me over to his house, where I was able to not only meet his lovely family but also borrow a tie that perfectly covered the guacamole stain! When I received the package in the mail from Robert and saw the pattern of the tie, I almost fell over laughing. The double meaning was obvious to me, and now it is to you too, because now you know... the rest of the story.


* My sincere apologies to the memory of Paul Harvey.

Monday, August 3, 2020

Using Kafka Streams Interactive Queries to Peek Inside of your KTables

Recently we found a bug in one of our Kafka Streams applications, and as I was looking into it, I found that we had a Stream -> Table left join that was failing. This didn't make sense, as every indication was that the data, with the correct key, should have been in the KTable at the time that the join was attempted.

So, I set out to verify that. It was easy to see what was in the stream, but I was struggling to figure out how to see what was in the table. Using Kafkacat, I could see that the data was in the underlying topic, but I needed to see that in context with the KTable at runtime.

That's when I turned to the helpful geniuses on the Confluent Community Slack group. There, someone suggested that I use an interactive query.

Now, to some, this might be a no-brainer, but I am still somewhat new to Kafka and had never used interactive queries. But there's a first time for everything, so I dug into it.

I guess I shouldn’t have been surprised by how easy it was, but Kafka Streams never ceases to amaze me. The following bit of code is all it took to give me a view inside my KTable:
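
Here's a minimal sketch of that code (the names streamsApp, widgetTable, and Widget are hypothetical stand-ins for my application's own classes); the numbered comments correspond to the line references in the walkthrough below:

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    private void queryKTableStore() {
        // (line 3) get a hold of the running KafkaStreams instance
        KafkaStreams streams = streamsApp.getStreams();
        // (line 4) the name of the KTable's state store
        String storeName = widgetTable.queryableStoreName();
        // (line 5) the store itself, as a ReadOnlyKeyValueStore
        ReadOnlyKeyValueStore<String, Widget> store =
                streams.store(storeName, QueryableStoreTypes.keyValueStore());
        // (line 6) an iterator over everything in the store
        KeyValueIterator<String, Widget> iterator = store.all();
        // (lines 7-10) print each record's key and value
        while (iterator.hasNext()) {
            KeyValue<String, Widget> record = iterator.next();
            System.out.println("key: " + record.key + " Widget: " + record.value);
        }
        // (line 11) close the iterator
        iterator.close();
    }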

Let's walk through this code.

(Line 3) First off, I needed to get a hold of the KafkaStreams instance in order to access the state store.

Since the bit of topology that I’m working on is in a different Java class from the one where the stream is created and launched, I have to make a call to get it.

(Line 4) To access the state store, I needed its name, so I called queryableStoreName() on the KTable.

(Line 5) Now I can get a hold of the state store itself, in the form of a ReadOnlyKeyValueStore, using the KafkaStreams store() method.

(Line 6) To see all of the records in the store, I used a KeyValueIterator that is returned from the store.all() method.

(Lines 7-10) For each record, I print the key and value, and then, on line 11, I close the iterator.

I bundled that all up in a handy method called queryKTableStore().

Now I was able to add a peek() statement, calling this method, to my topology, right before the leftJoin that was failing.
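
In topology terms, that looked something like this (a sketch; widgetStream, widgetTable, and enrich are hypothetical stand-ins for my actual stream, table, and joiner):

    widgetStream
        .peek((key, value) -> queryKTableStore())  // dump the KTable's store just before joining
        .leftJoin(widgetTable, (event, widget) -> enrich(event, widget));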

That gave me output like this:

key: 10001 Widget: {id:10001, name: Winding Widget, price: 299.95}
key: 10002 Widget: {id:10002, name: Whining Widget, price: 199.95}
key: 10003 Widget: {id:10003, name: Wonkey Widget, price: 499.95}

And of course, the key I was trying to join on, 10004, was not in the store, which means that it was not in the KTable. I added another peek() call after the failed join attempt, and now the output was more like this:

key: 10001 Widget: {id:10001, name: Winding Widget, price: 299.95}
key: 10002 Widget: {id:10002, name: Whining Widget, price: 199.95}
key: 10003 Widget: {id:10003, name: Wonkey Widget, price: 499.95}
key: 10004 Widget: {id:10004, name: Wonder Widget, price: 999.95}

Now it's there! Mystery solved! I have a timing problem on my hands... which is another mystery, but one for a different post. For now, I just wanted to point out this simple and powerful feature of Kafka Streams.

Before leaving, I also wanted to point out that the ReadOnlyKeyValueStore is limited to one application instance. In my case, running locally, I only had one instance, but in a distributed environment, things could get more complicated. Also, ReadOnlyKeyValueStore has another method for accessing data by key, if you already know the key: store.get(key) will return the value, if one exists for that key. Of course, there is more you can do, and you can learn more about it in the Developer's Guide.

Wednesday, May 27, 2020

Online Meetups Are Different, But Still A Valuable Resource

I’ve always been a fan of user groups, which are now mostly known as meetups.  In the past I’ve led Java and Groovy user groups, and they were always rewarding experiences.

More recently, I’ve been helping to organize the St. Louis Apache Kafka Meetup, hosted by Confluent.  We only had one in-person meeting before the COVID-19 rules came into play, and we had to convert subsequent meetings to online.

I was pretty bummed, thinking that online meetups would just be like webinars, or maybe recorded conference videos, which are great, but nothing like a live event. Now, after attending over a dozen online meetups around the world, I have to say I was pleasantly surprised.

The Confluent Community team does an excellent job of running these Zoom meetups. At the start of the meetup, everyone can be unmuted, so there is a great time of networking and catching up with friends, old and new. Then the host mutes the audience and the presenter gets started. During the presentation, attendees ask questions in the chat. Some presenters will pause to answer questions along the way, others will answer them at the end, but I have yet to see a question go ignored.

After the presentation, the host allows attendees to unmute again, and the discussions are just what you’d expect with an in-person meetup, except the participants might be in another country!

Another bonus of Confluent’s online meetups is that they are recorded.  You can watch videos of over twenty meetups from the past few months on the Confluent Meetup Hub.  This site is a treasure trove, not just for the recordings, but because it also shows you which meetups are coming up, so you can join them live.

When you do join one of these online meetups, and I’m sure you will, you should consider turning on your video, if possible, and introducing yourself.  The Apache Kafka community is made up of some of the friendliest people I’ve ever met, so I can guarantee that you will be welcomed!  If you continue attending meetups in a particular time zone, you will get to know the regulars and even become one yourself.

So, while I still miss the in-person meetups, and I am looking forward to them returning, I am very grateful for the online meetups as well, and at the risk of being greedy, I am hoping, in the future, that we can have both!

And, speaking of in-person meetups, there are Apache Kafka meetups all over the world.  Find the one closest to you at https://www.confluent.io/community.  I would encourage you to join one (or more) of the meetup groups, so that you will hear when in-person meetups are beginning again.

Wednesday, March 4, 2020

Saint Louis Apache Kafka® Meetup by Confluent

One of the many ways that Confluent supports the developer community is by hosting Meetups around the world. For example, they just had Tim Berglund out in Paris (poor guy) for what looks like a great event!

They also host a meetup right here in St. Louis, and they've given me the great privilege of helping to organize it.

All that to say this: 

Save the date!  On Tuesday, March 24th, we will have two great presenters at the Saint Louis Apache Kafka Meetup!

Mitch Henderson, a Technical Account Manager par excellence with Confluent, will talk about how to make our Kafka installations fault-tolerant, even to the datacenter level.

After that, Neil Buesing, the Director of Real-time Data at Object Partners, will show us how to build a web application using Kafka and Kafka Streams as our database. Prepare to have your mind blown on this one!

This is going to be a packed meeting, but we'll have plenty of pizza and soft drinks on hand in case it runs long.  So, if you're in the St. Louis area, or can get to the St. Louis area (you know you've always wanted to visit), please plan on joining us March 24th, at 6pm.  All of the details can be found on our Meetup page.

Oh, and you can follow us on Twitter too.

Wednesday, February 12, 2020

Confluent KSQL Workshop in Saint Louis

Recently Confluent and World Wide Technology held a hands-on workshop on Stream Processing with KSQL. Nick Dearden, from Confluent, led the training and did an excellent job.  He gave a very clear introduction to the problem space, and the role that KSQL plays.

Then we launched into the hands-on lab.  Wow!  I have never been to a hands-on that was so smooth.

There were 50 students in the room, and each of us had a pre-assigned AWS user account, with which we could ssh into a server running KSQL and MySQL.  There was a data generator running, I believe using the Kafka Connect Datagen connector (though I could be wrong on that).  So, everything was ready to go, and we were all working through the exercises within minutes.

Along the way, if anyone got stuck, Brian Likosar and Cliff Gilmore were on hand to help out.  From what I could see, nobody was stuck for long.

The exercises were simple, yet detailed enough to show some of the cool features of KSQL. I had seen several video demos of KSQL before, but this was my first time trying it out.  It was pretty fun.

For me the highlight—beyond just being in a room with so much Kafka brainpower—was when we ran EXPLAIN on one of the queries we had written, and lo and behold, there was the KStream topology!  I guess I should have figured this, but it was still cool to see.  KSQL is basically a really slick Kafka Streams app.

So, the workshop was fun and informative, and KSQL is a pretty powerful tool, especially for those who are not living in the JVM. But the real takeaway, for me, was that the Kafka Streams API is amazing!