Integrating Kafka into a large enterprise can feel like a long journey. It’s full of wonders but it’s also full of hidden dangers for the unaware traveler! Do you know where you stand in your Apache Kafka journey? Have you just finished your first PoC or are you just one step away from a fully integrated Apache Kafka self-service? It’s time for something where you can map your own journey, see where you are, and not get lost (or have different expectations than your stakeholders).
written by Lukas Zaugg
When we did our first integration project at a customer site years back, we were rather naive about what it means to integrate streaming data into an enterprise world. We took the idea of events as an enterprise’s central nervous system to heart and wanted to help our customers build it. We were surprised to find that the challenges we had to solve were less technical and more organisational, similar to the challenges of adopting a DevOps culture.
Since then we have observed this pattern in nearly every enterprise we’ve worked or talked with. To help us and our customers, we condensed these experiences into a model we call the Apache Kafka Pyramid. Similar to Maslow’s pyramid, it depicts a linear journey through different stages of needs that must be met before self-fulfillment can be achieved.
The Model aka Apache Kafka Pyramid
When you integrate Kafka in your enterprise you will observe certain layers of challenges, which we model as a Maslow-style pyramid below.
The maturity levels range from physiological needs or stable runtime to self-actualization or global impact. The technology, in this case Apache Kafka, plays a smaller and smaller role towards the top. Let’s dive in together and determine the main challenges we observe and how we usually get past them.
Physiological Needs — Apache Kafka in Production
Bring your first use case to production and maintain it
Let’s start with the basics. At the bottom of the Apache Kafka Pyramid is the need to run Apache Kafka in production. That includes having enough resources allocated (like memory, CPU, disk) for your first production-grade Apache Kafka cluster. But it also means having the technical skills to push it through to production.
There are different reasons to enter the Kafka domain, but eventually you want to solve a technical (probably architectural) challenge driven by business needs. It can be the implementation of event-driven systems, event-driven messaging or even an event-driven architecture.
In this stage, the people solving a business case in a DevOps manner are the same ones taking care of Apache Kafka. Very often there is no need yet for advanced security and governance, because access is granted to only a small number of people working on the business case (from development to production).
If you are running mission-critical applications in production with Kafka, you have already mastered this level.
Safety Needs — Apache Kafka Secured
Agree to rules for peaceful living next to each other
Because the business case now running in production shows off the technical possibilities of the so-bright-future, the existing Apache Kafka cluster will get attention from early adopters inside your organisation. If there’s no attention, there are probably other things to fix first (we won’t go into details here).
With attention comes the motivation for other early-adopter teams to use Apache Kafka, and with it the need to open the clusters to a broader audience. Guidelines and rules for a peaceful (and governed) living environment are needed. Questions such as whether to allow the creation of topics with 10,000 partitions or a retention time of -1 (indefinite) have to be discussed.
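To make this concrete, here is a minimal sketch of how such rules could be checked automatically before a topic is created. It is plain Python and does not talk to a cluster. The names `TopicPolicy`, `TopicRequest` and `violations`, as well as the limits of 50 partitions and 7 days, are our assumptions for illustration and not recommendations. The only Kafka fact it relies on is that a retention of -1 means "keep forever".

```python
from dataclasses import dataclass
from typing import List


# Hypothetical guardrails; the concrete limits are illustrative assumptions.
@dataclass(frozen=True)
class TopicPolicy:
    max_partitions: int = 50
    max_retention_ms: int = 7 * 24 * 60 * 60 * 1000  # 7 days
    allow_infinite_retention: bool = False


# Hypothetical shape of a topic request coming from a self-service form.
@dataclass(frozen=True)
class TopicRequest:
    name: str
    partitions: int
    retention_ms: int  # -1 means "retain indefinitely" in Kafka


def violations(request: TopicRequest, policy: TopicPolicy) -> List[str]:
    """Return readable policy violations; an empty list means compliant."""
    problems = []
    if request.partitions < 1:
        problems.append("partitions must be at least 1")
    elif request.partitions > policy.max_partitions:
        problems.append(
            f"{request.partitions} partitions exceed the limit of "
            f"{policy.max_partitions}"
        )
    if request.retention_ms == -1:
        if not policy.allow_infinite_retention:
            problems.append("infinite retention (-1) is not allowed")
    elif request.retention_ms > policy.max_retention_ms:
        problems.append(
            f"retention of {request.retention_ms} ms exceeds the limit of "
            f"{policy.max_retention_ms} ms"
        )
    return problems
```

With these assumed limits, a request for 10,000 partitions and a retention of -1 comes back with two violations, which a Self-Service could show to the requester instead of silently creating the topic.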
Some rules were already in place during “Apache Kafka in production”, but there are tons more, probably also depending on which stakeholder role you ask. Certain rules need to be enforced and not just monitored, like being able to identify consumers or producers. It’s important that applications are isolated from the negative impact of other applications’ misuse or misbehaviour in an Apache Kafka cluster, and that the tradeoffs between convenience and safety are taken into account.
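As an illustration of the "identify consumers or producers" rule, the following sketch lints a client configuration before deployment. `client.id` and `security.protocol` are real Kafka client settings, and clients fall back to `PLAINTEXT` when no protocol is set. The `<team>.<application>` naming convention and the function `check_client_config` are our assumptions. A check like this is only a first line of defence: real enforcement happens on the broker through authentication and ACLs.

```python
import re
from typing import Dict, List

# Assumed naming convention: "<team>.<application>", lowercase, dashes allowed.
CLIENT_ID_PATTERN = re.compile(r"^[a-z0-9-]+\.[a-z0-9-]+$")

# Kafka security protocols that encrypt traffic between client and broker.
ENCRYPTED_PROTOCOLS = {"SSL", "SASL_SSL"}


def check_client_config(config: Dict[str, str]) -> List[str]:
    """Return readable problems found in a Kafka client configuration."""
    problems = []
    client_id = config.get("client.id", "")
    if not CLIENT_ID_PATTERN.match(client_id):
        problems.append("client.id must look like '<team>.<application>'")
    # Kafka clients use PLAINTEXT when security.protocol is not set.
    protocol = config.get("security.protocol", "PLAINTEXT")
    if protocol not in ENCRYPTED_PROTOCOLS:
        problems.append(f"security.protocol {protocol} is not encrypted")
    return problems
```

Running such a check in a deployment pipeline keeps anonymous or unencrypted clients from reaching a shared cluster by accident, without adding paperwork for the teams.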
As a consequence, Apache Kafka inevitably moves out of the business value stream towards the infrastructure and becomes Kafka-as-a-Service (with a dedicated, so-called Kafka Team).
To fulfill the need for security, governance and operational stability, security mechanisms and certain restrictions have to be applied to Apache Kafka (for consumers and producers) and its Self-Service (for users like data engineers). The existing security architecture often misses some pieces for such data pipes (the same goes for any pub/sub technology), so it might need an update as well.
On this level, Apache Kafka is monitored on its own (Kafka-as-a-Service) and it has its own expectations and obligations (SLAs) to producers or consumers. Owners of those producers or consumers are still focused on their applications.
Securing Apache Kafka brings the support of the (security) architects, system operators, teams already using Kafka and people responsible for governance. Chaos can be prevented and risks of data leakage or misuse are mitigated (GDPR included).
Using an Apache Kafka service from a Cloud provider (like Confluent Cloud, Amazon MSK or Aiven for Apache Kafka) can drastically reduce the time to bring business cases to production. But using Cloud services requires you to have an enterprise Cloud strategy in place where an Apache Kafka Cloud service fits in. Understanding Apache Kafka from an architectural view is still a requirement, no matter if you have Apache Kafka On-Prem or in the Cloud.
Belongingness & Love Needs — Topic Love
Get love from your Kafka family and take part in other’s journey
The disadvantage of having Apache Kafka secured is that using it is no longer that easy, and the responsibilities and processes are not clear to outsiders (“To whom does this topic belong?”, “How can I get a new topic?” or “How do I get access to this topic?”).
Certificates, roles, and accounts have to be created or assigned and credentials need lifecycle maintenance. The great pace of the Kafka journey in your enterprise has slowed down and the motivation for other teams and new applications to move to Apache Kafka has decreased because of rules, paperwork, people work, lack of information, and the missing support for the technology from non-early adopters (probably the majority) within the company.
Up to now, the Kafka-as-a-Service has been built around governance, not curation. It’s built with rules in mind, not humans. The gap between the previous stage and this one is what we call the Apache Kafka chasm (something for another blog post, so stay tuned).
So, what can you do to support further adoption? To be honest — it’s primarily an organisational topic.
Additionally, to encourage Kafka adoption on the people level, make sure teams can act on their own, at their own pace and level, though of course still within guardrails. Support curation and foster Apache Kafka families with different motivations. There are applications which process millions of data records in real time and applications for training ML algorithms. Both can live in the Apache Kafka ecosystem, but they have different requirements from a Self-Service and lifecycle management perspective, which need to be taken into account.
That includes convenient ways to manage topics without stepping on anyone’s toes, management of topic ownership, motivation for sharing and transparency about the transport and data, etc.
And keep in mind: if these motivations, ideas and solutions are not shared constantly, it’s like bringing sand to the beach.
Esteem Needs — Data-as-a-Product
Be confident and proud and lead others to do so as well
Now there’s an awesome Apache Kafka Self-Service in place, and a lot of effort was put into Apache Kafka enablement. But still, the data culture hasn’t taken hold yet. Providing data is still seen as a necessity and not as a chance to prove that the data comes with exactly the quality your customers have requested.
Expectations can only be met when owners take care of the data and are therefore proud of what they do. Customers (owners of consumer applications) need confidence in the provided data and will request things like data SLAs or other obligations. You get confidence through proof and you get proof through transparency.
Transparency means seeing what this Kafka topic is all about in terms of stream quality (availability, latency, throughput, retention time, general configuration…), data quality (attribute, row, set or domain integrity — without getting into details here) and lifecycle management (ownership, schema change policy, versioning, support…).
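One way to picture this kind of transparency is a small descriptor that travels with each topic and can be checked for gaps. The sketch below is only an assumption of how that could look: the `TopicDescriptor` class, the `REQUIRED_FACTS` keys and the `missing_facts` function are hypothetical names that simply mirror the three groups listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical descriptor; the three groups mirror the text above.
@dataclass
class TopicDescriptor:
    topic: str
    stream_quality: Dict[str, str] = field(default_factory=dict)
    data_quality: Dict[str, str] = field(default_factory=dict)
    lifecycle: Dict[str, str] = field(default_factory=dict)


# Assumed minimum set of facts an owner should publish per group.
REQUIRED_FACTS = {
    "stream_quality": ["availability", "latency", "throughput", "retention"],
    "data_quality": ["integrity"],
    "lifecycle": ["owner", "schema_change_policy", "versioning", "support"],
}


def missing_facts(descriptor: TopicDescriptor) -> List[str]:
    """List the facts that are not documented yet, as 'group.fact' strings."""
    missing = []
    for group, facts in REQUIRED_FACTS.items():
        documented = getattr(descriptor, group)
        for fact in facts:
            if not documented.get(fact):
                missing.append(f"{group}.{fact}")
    return missing
```

A topic whose descriptor reports no missing facts is not automatically a good data product, but it gives consumers something they can verify, which is where confidence starts.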
Transparency also means knowing more about consumers. An agreement always consists of expectations and obligations on both sides (producer and consumer). Consumers have obligations regarding their behaviour (like keeping up with the throughput, or batch sizes), and they should be aware of the (lifecycle) costs on the producing side when they consume a high-availability, high-throughput, low-latency stream with high data quality. On the other hand, a consumer without any expectations on data quality has no coupling to the data at all and is therefore much less expensive to serve (from a producing point of view) than a consumer with strict expectations about a schema.
At this level, one can speak of Data-as-a-Product, where the main focus is no longer the application and topic, but the data itself. Automation and monitoring are in place along the data flow (where the data value pulls; SLOs and SLAs on data). The governance has moved from protecting the infrastructure towards protecting the data, and therefore data security (and encryption) is getting more attention again.
At this stage, Apache Kafka gets company from other technologies and data serving patterns (e.g. S3 or Hadoop serving static/batched data sets). Apache Kafka is currently one of the best bets for providing streaming and data sourcing endpoints in the Data-as-a-Product world, targeting Data Engineers. Other technologies suited to that purpose include Amazon Kinesis, Google Cloud Pub/Sub, Apache RocketMQ and Apache Pulsar.
Self-actualisation — Data Ecosystem
Take part in global discussions for a better world
When an enterprise reaches the Data-as-a-Product level with Apache Kafka (or any other streaming technology, for that matter), it already knows that there’s more, much more, to tackle. Does your enterprise have subsidiaries? Do you need to get data from a partner company? For subsidiaries, it’s probably solved because the enterprise can define the ruleset and security requirements. But for data exchange between enterprises? Did you go through the whole Apache Kafka journey just to find out that it’s still much easier to use the File Transfer Protocol (FTP) and files with comma-separated values (CSV) to transfer data between those parties (using a shared secret to access the server)? This disillusionment probably already sets in at the “Topic Love” level, but in a much smaller context. Providing CSVs through FTP can also be a Data-as-a-Product, so no worries. But here we’re talking about Apache Kafka and therefore about streaming data, which is a must, eventually.
This means you need to solve hybrid setups and cross-enterprise (or cloud) data exchange. That’s only possible by agreeing on the way transparency, governance, and data exchange are handled; in other words, by agreeing on metadata definitions. That can be done by using a third-party service or global standards (or by waiting for them to be developed, e.g. for access management in a global ecosystem for data). Agreements (SLAs) between producers and consumers are recorded in contracts.
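Once both sides describe a stream with the same metadata, checking an agreement becomes mechanical. The sketch below compares what a producer offers with what a consumer expects. The `StreamTerms` fields and the `unmet_expectations` function are assumptions made for illustration; a real contract would cover far more (schema, access, costs) and would follow whatever definitions the parties have agreed on.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, deliberately small set of agreed metadata for one stream.
@dataclass(frozen=True)
class StreamTerms:
    availability_pct: float  # e.g. 99.9
    max_latency_ms: int      # upper bound on end-to-end latency
    retention_days: int      # how long records stay available


def unmet_expectations(offered: StreamTerms, expected: StreamTerms) -> List[str]:
    """Return the consumer expectations that the producer's offer does not cover."""
    gaps = []
    if offered.availability_pct < expected.availability_pct:
        gaps.append("availability is lower than expected")
    if offered.max_latency_ms > expected.max_latency_ms:
        gaps.append("latency is higher than expected")
    if offered.retention_days < expected.retention_days:
        gaps.append("retention is shorter than expected")
    return gaps
```

An empty result means the offer covers the expectation and the contract can be recorded; anything else is the starting point for a conversation between the two parties.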
The organisational challenge becomes a global one. This stage looks really similar to the third one (Topic Love), but on a much bigger scale. Instead of sharing the how-tos and best practices within your company, exchanging them in groups of organisations on an equal footing is the best route for future collaboration.
Whereas producers and consumers on the lower layers are clearly identifiable as an Apache Kafka producer or consumer, a producer or consumer in a Data-as-a-Product world is no longer the technical implementation but an organisational abstraction (like a team), and at the top of the pyramid the organisational abstraction might be the organisation itself (think of a contract between two organisations).
When you integrate Kafka into your enterprise, you will witness different layers of challenges that resemble the layers of Maslow’s pyramid. The order of the layers is linear, which means you can’t skip any previous level on the way to full maturity.
Depending on your use cases, certain layers will be stretched and others may be less relevant. If something’s not working, it might be that additional groundwork has to be done in the layers below.
How far up do you want to get in your own Apache Kafka Pyramid and why do you want to get there?
Please send us a message if you have a story to share of your own Apache Kafka journey or are interested in mapping the Apache Kafka Pyramid via www.agoora.com, through Twitter, or just simply leave us a comment here.