Sitecore Integration With Kafka

In today's world there are a large number of web properties in every domain, and each of them is trying to impress end-users/customers and build its brand value to become a market leader. In every case, customers are the main interaction point.

End-users/customers come to your web property to find the latest information; if you serve irrelevant, non-customer-centric data, they will not interact with your web property and will never return.

You can encourage customers to return by providing personalized experiences based on their journey and serving content that is up to date rather than obsolete.

Nowadays most sites rely on real-time data in the form of feeds from third-party sources, and for this most organizations depend on scheduled batch jobs because they do not know when the latest data will be available.

To get real-time data into the web property, we can use a distributed messaging system and event streaming platform called Kafka to capture data in real time from the many systems that generate events, for example customer interactions on a web property, monitoring feeds, microservices, cloud services, or applications that send and receive data.

The Sitecore Experience Platform is flexible enough to integrate with external systems, extending Sitecore's capabilities to meet the required business use cases.

In this article, we will explore:

What is Kafka?
Useful information about Kafka
Kafka Architecture
Use cases in which we can use Kafka with Sitecore
Secure Kafka Instance
Prerequisites for using Kafka
How to integrate Sitecore with Kafka?
Sitecore Items
Sitecore Kafka GitHub Solution
What is Kafka? 

In a typical implementation, we use a database or some other data store to capture the response to an event, and later we retrieve the data from that store.

If we need to know which event a given response belongs to, it is difficult to identify without storing the events themselves. An event occurs whenever some action is performed on an object. It is not practical to store every event's details along with its response in a database; instead, we can store them in the form of logs.

In Kafka, data is stored as a log containing state information and an event description, which preserves the order of events. With the help of these logs we can identify which event happened at a specific time. The logs can also be scaled as needed, which is not practical with databases because they require far more resources.

In other words, Apache Kafka is an event streaming platform based on Event-Driven Architecture (EDA), where an event is data representing a change of state, and events are organized into topics. The Kafka streaming platform can stream events in real time from one or more sources to one or more destinations.

Real-world examples of events include a user adding items to a shopping cart, a system inserting records into a database, or a user modifying contact details. Each of these represents an event with a state change, and a notification is sent to the system as a message on a topic that contains information about that specific event.

Topics are stored in a distributed system (with replication) to provide high availability, so the data is persisted across the cluster and can be served from any of the servers in case of a failure.

Useful information about Kafka 
1 Apache Kafka is an open-source event streaming platform
2 It provides extremely high throughput and low latency
3 The entity that stores information about object state in Kafka is called a topic
4 Topics can be stored and consumed by more than one system
5 Topics can be retained for a shorter or longer duration, e.g. a few hours, days, years, or even indefinitely
6 Topics can be relatively small or enormous; there is no hard and fast rule, best practice, or architectural guideline about topic size in Kafka
7 Kafka provides high throughput, horizontal scalability, built-in partitioning, easy and quick replication, and high fault tolerance
8 Kafka's programming model is based on the publish-subscribe pattern
9 With Kafka, publishers send messages to topics, which are named logical channels. A subscriber to a topic receives all the messages published to it
10 Because Kafka is based on event-driven architecture, it is used as an event router, and microservices publish and subscribe to events through it
11 In the Kafka platform, producers and consumers are loosely coupled and rely only on the topic of interest and the message schema
12 Kafka is written in Java

Kafka Architecture 

Kafka is a distributed environment based on Event-Driven Architecture (EDA) and consists of a cluster of servers.

Key building blocks of the Kafka platform:

o Cluster: Kafka runs as a cluster of one or more servers that can span multiple data centers or cloud regions.

o Broker: The servers in the Kafka cluster that perform the storage-layer role are called brokers. Brokers can be scaled manually or automatically as needed.

o Topic: The entity that stores information about object state in Kafka is called a topic, and topics are sent to and from the Kafka platform in the form of messages. Topics can be stored on the Kafka brokers for any time interval and at any size, depending on business need.

o Producer: A client application that appends records to Kafka topics. Multiple producers can write to a single topic on the Kafka brokers.

o Consumer: A client application that subscribes to the Kafka brokers to receive messages. Consumers listen for messages from specific topics.

o Partition: Several partitions can be configured at the Kafka level. Each topic can be divided into multiple partitions, and the partitions are distributed across multiple brokers. Based on the replication factor, the partitions are also replicated between brokers. Among a partition and its replicas, one acts as the "leader" and the others act as "followers." When the leader fails, one of the followers automatically steps up to become the leader. This ensures high fault tolerance and minimal downtime.

o Offset: Each message persisted inside a partition is assigned a numeric offset value. Within each partition, the messages are ordered by offset value and then stored. When a consumer receives a message, it also gets the partition ID and offset value of that message.


In the above diagram:
o Apache ZooKeeper works as the configuration store for Kafka and holds metadata about the Kafka cluster processes. The ZooKeeper component is very important and performs many key tasks, such as leader election, keeping the list of consumers, and storing access control policies for topics.

o A Kafka topic works like a FIFO (first in, first out) queue.

o There are three brokers, three partitions, and a replication factor of 3. The leader partition is marked in green and the followers are marked in orange. One partition is expanded to show how messages are stored in a partition and indexed with an offset value.
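
To make the partition and replication concepts concrete, here is a minimal sketch that creates a topic with three partitions and a replication factor of 3, using the Confluent.Kafka .NET client that is used later in this article; the broker address and topic name are placeholder values.

```csharp
using System.Threading.Tasks;
using Confluent.Kafka;
using Confluent.Kafka.Admin;

// Minimal sketch: create a topic with 3 partitions, each replicated to 3 brokers.
// "kafka-broker:9092" and "sitecore-events" are placeholder values.
var adminConfig = new AdminClientConfig { BootstrapServers = "kafka-broker:9092" };

using var adminClient = new AdminClientBuilder(adminConfig).Build();

await adminClient.CreateTopicsAsync(new[]
{
    new TopicSpecification
    {
        Name = "sitecore-events",
        NumPartitions = 3,      // distributed across the brokers
        ReplicationFactor = 3   // one leader + two followers per partition
    }
});
```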

There are four major APIs in Kafka, namely:  

The Producer API: sends streams of data to topics in the Kafka cluster

The Consumer API: reads streams of data from topics in the Kafka cluster 

The Streams API: transforms streams of data from input topics to output topics 

The Connect API: implements connectors that continually pull data from a source system or application into Kafka, or push data from Kafka into other systems

Use cases in which we can use Kafka with Sitecore:

The Sitecore Experience Platform is a Microsoft-stack-based product and can be extended to integrate with any type of system/API. From my point of view, Kafka is a good fit for Sitecore in enterprise-level or large-organization implementations where more than one system relies on the same set of data, or where you have to push/pull real-time data. Some of the use cases for Sitecore with Kafka are:

o Push user PII data captured in Sitecore to downstream systems or other third-party systems via Kafka

o Read user PII data from Kafka and push it to Sitecore Contacts for a better user experience

o Read a real-time article feed from Kafka and push the records into Sitecore

o Read real-time offers from Kafka and display them in real time on the Sitecore-based web property

o Stream real-time data about (identified) user behavior/analytics from Sitecore to Kafka to ingest log and tracking data for analytics, dashboards, and machine learning

o Read real-time analytics, a 360-degree user view, or recommendations from a data lake and provide a personalized Sitecore user experience

o To support mission-critical Sitecore applications, such as those in the financial or eCommerce domain, push critical event logs and event-sourcing data to Kafka so that downstream systems can process the details and take immediate action

o If Sitecore is the single source of truth for content in an enterprise organization, the Kafka Streams API can be used to store and distribute published content in real time to the various applications and systems that make it available to readers

o In your Sitecore Commerce application, Kafka can be used for financial transactions and business processes in real time, so that you can provide an immediate, low-latency response to the user

o You can use Apache Kafka as a core component if you are targeting near-real-time use cases

In my experience, the use cases above are the most common and are present in most Sitecore-based implementations. It is worth considering Sitecore integration with Kafka in future projects.

Secure Kafka Instance 
A Kafka instance provides many out-of-the-box options to secure both the instance and its content, and it is part of the approved tools list in most organizations:

o Channel-based encryption

o Authentication

o Authorization

Because credentials are sent across the wire, it is recommended to use channel-based encryption.

Prerequisites for using Kafka
Following are the prerequisites I went through to use Kafka:
Changes required at the network level to access the Kafka brokers:

Ensure that the ports used by the Kafka server are not blocked by a firewall.

Kafka brokers are generally exposed on public IPs so that outside users can connect to them, while private IPs are used for internal communication. If you are connecting via the internal network, submit a firewall request to allow your IDs or AD group (preferred) to connect to the Kafka brokers' public IPs. You can get the public IPs by pinging the fully qualified domain name (FQDN)/hostname.

To enable client authentication between the Kafka clients and the Kafka brokers, a key and certificate must be generated for each broker and client in the cluster. The certificates also need to be signed by a certificate authority (CA).

In Kafka we can implement an authentication mechanism using SSL, so that the Kafka brokers and clients communicate with each other over SSL.

During Kafka SSL setup, a certificate is generated for each machine in the cluster whose common name (CN) matches the fully qualified domain name (FQDN) of the server. The client compares the CN with the DNS domain name to ensure that it is indeed connecting to the desired server and not a malicious one.

To connect to the Kafka brokers, the client requires the Kafka broker certificate with a .pem extension. This certificate can be provided by the Kafka admin; store it in a shared location, or in a location your program has access to.

For a Kafka Secure Sockets Layer (SSL) and Simple Authentication and Security Layer (SASL) setup, the SASL credentials (SASL username and password) are specified in the configuration, and the same credentials are required by the client to connect to the Kafka broker.
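
A minimal sketch of that secure client configuration with the Confluent.Kafka .NET client (the host name, credentials, and certificate path below are placeholders supplied by your Kafka admin):

```csharp
using Confluent.Kafka;

// Shared client settings for an SSL/SASL-secured Kafka cluster.
// All values below are placeholders provided by your Kafka admin.
var secureConfig = new ClientConfig
{
    BootstrapServers = "kafka-broker.example.com:9093",
    SecurityProtocol = SecurityProtocol.SaslSsl,
    SaslMechanism = SaslMechanism.Plain,            // or ScramSha256/ScramSha512, per your cluster
    SaslUsername = "<sasl-user>",
    SaslPassword = "<sasl-password>",
    SslCaLocation = @"D:\Certificates\kafka-broker-ca.pem" // broker CA certificate (.pem)
};
```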

How to integrate Sitecore with Kafka? 

We can integrate Sitecore with Kafka using .NET connectors. I will be using Confluent's .NET client library for Apache Kafka, called Confluent.Kafka.

The Confluent.Kafka package can be downloaded and installed through Visual Studio, through the command-line interface, or from nuget.org.

Using Confluent.Kafka, we can push data to Kafka using ProducerBuilder and pull data using ConsumerBuilder.

To push data, we first create a ProducerConfig object containing the required connectivity details:
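
A minimal sketch of such a ProducerConfig (all connectivity values are placeholders; in a real implementation they would come from the Sitecore configuration item described in the Sitecore Items section below):

```csharp
using Confluent.Kafka;

// Producer connectivity details - all values are placeholders.
var producerConfig = new ProducerConfig
{
    BootstrapServers = "kafka-broker.example.com:9093",
    SecurityProtocol = SecurityProtocol.SaslSsl,
    SaslMechanism = SaslMechanism.Plain,
    SaslUsername = "<sasl-user>",
    SaslPassword = "<sasl-password>",
    SslCaLocation = @"D:\Certificates\kafka-broker-ca.pem",
    ClientId = "sitecore-kafka-producer" // hypothetical client id
};
```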

Once the ProducerConfig is ready, we can push data to Kafka using ProducerBuilder:
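
A sketch of the produce call, reusing the producerConfig above; the topic name "sitecore-events" and the message payload are hypothetical:

```csharp
using System;
using System.Threading.Tasks;
using Confluent.Kafka;

// Build a producer from the config above and push one message to a topic.
using var producer = new ProducerBuilder<Null, string>(producerConfig).Build();

try
{
    var result = await producer.ProduceAsync(
        "sitecore-events",
        new Message<Null, string> { Value = "{ \"event\": \"contact-updated\" }" });

    Console.WriteLine($"Delivered to {result.TopicPartitionOffset}");
}
catch (ProduceException<Null, string> ex)
{
    Console.WriteLine($"Delivery failed: {ex.Error.Reason}");
}
```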

To pull data, we create a ConsumerConfig object containing the required connectivity details:
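
A minimal sketch of such a ConsumerConfig (again with placeholder connectivity values; the group id is hypothetical):

```csharp
using Confluent.Kafka;

// Consumer connectivity details - all values are placeholders.
var consumerConfig = new ConsumerConfig
{
    BootstrapServers = "kafka-broker.example.com:9093",
    SecurityProtocol = SecurityProtocol.SaslSsl,
    SaslMechanism = SaslMechanism.Plain,
    SaslUsername = "<sasl-user>",
    SaslPassword = "<sasl-password>",
    SslCaLocation = @"D:\Certificates\kafka-broker-ca.pem",
    GroupId = "sitecore-consumer-group",       // hypothetical consumer group
    AutoOffsetReset = AutoOffsetReset.Earliest // start from the beginning if no committed offset
};
```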

Once the ConsumerConfig is ready, we can pull data from Kafka using ConsumerBuilder:
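
A sketch of the consume loop, reusing the consumerConfig above; the topic name is the same hypothetical "sitecore-events":

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

// Subscribe to the topic and read messages until the token is cancelled.
using var consumer = new ConsumerBuilder<Ignore, string>(consumerConfig).Build();
consumer.Subscribe("sitecore-events");

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30)); // stop after 30s for this sketch
try
{
    while (!cts.IsCancellationRequested)
    {
        var cr = consumer.Consume(cts.Token);
        Console.WriteLine($"Received '{cr.Message.Value}' at {cr.TopicPartitionOffset}");
        // Here the message could be mapped to a Sitecore item or contact update.
    }
}
catch (OperationCanceledException)
{
    // Expected when the cancellation token fires.
}
finally
{
    consumer.Close(); // commit final offsets and leave the group cleanly
}
```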
Sitecore Items 

In the above code blocks, we use both configuration values and dynamic values. For the configuration values, we can create a template and use an item based on it as the data source item for the Sitecore component/rendering, while the dynamic values (the message itself) are passed from code.
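
As an illustration, the connectivity details could be read from such a data source item as shown below; the item path and field names are hypothetical and depend on the template you create:

```csharp
using Confluent.Kafka;
using Sitecore.Data.Items;

// Read Kafka connectivity details from a Sitecore configuration item.
// The item path and field names are hypothetical and depend on your template.
Item kafkaSettings = Sitecore.Context.Database.GetItem("/sitecore/content/Settings/Kafka");

var producerConfig = new ProducerConfig
{
    BootstrapServers = kafkaSettings["Bootstrap Servers"],
    SecurityProtocol = SecurityProtocol.SaslSsl,
    SaslMechanism = SaslMechanism.Plain,
    SaslUsername = kafkaSettings["Sasl Username"],
    SaslPassword = kafkaSettings["Sasl Password"],
    SslCaLocation = kafkaSettings["Certificate Path"]
};
```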

Sitecore Kafka GitHub Solution 

I have created an ASP.NET MVC-based solution that helps validate the Sitecore integration with Kafka.


