Sitecore Integration With Kafka

In today’s world, every domain has a large number of web properties, each trying to impress end users and customers and to build its brand value to become the market leader. In every case, customers are the main point of interaction.

End users and customers come to your web property to find the latest information. If you serve irrelevant, non-customer-centric data, they will not engage with your web property and are unlikely to return.

You can encourage customers to return by providing personalized experiences based on their journey and by serving content that is up to date rather than obsolete.

Nowadays most sites rely on real-time data in the form of feeds from third-party sources. To ingest it, most organizations depend on scheduled batch jobs, because they do not know when the latest data will become available.

To bring real-time data into the web property, we can use Kafka, a distributed messaging system and event streaming platform, to capture data in real time from the many systems that generate events, for example customer interactions on a web property, monitoring details, microservices, cloud services, or applications that send and receive data.

The Sitecore Experience Platform is flexible enough to integrate with external systems, which extends what Sitecore can do to meet the required business use cases.

In this article, we will explore:

What is Kafka?
Useful information about Kafka
Kafka Architecture
Use cases in which we can use Kafka with Sitecore
Secure Kafka Instance
Prerequisite to use the Kafka
How to integrate Sitecore with Kafka?
Sitecore Items
Sitecore Kafka GitHub Solution
What is Kafka? 

In a typical implementation, we use a database or another data source to capture the response to an event and later retrieve the data from it.

If we need to know which event produced a given response, that is hard to identify unless the events themselves are stored. An event occurs when some action is performed on an object. It is often not feasible to store the event details together with every response in a database; instead, we can store them in the form of logs.

Kafka stores events in the form of a log, with state information and an event description, in the order in which they occurred. With the help of these logs, we can identify which event happened at a specific time. These logs can also be scaled as needed, which is harder to achieve with databases, as they require more resources.

In other words, Apache Kafka is an event streaming platform based on event-driven architecture (EDA), where an event is data that represents a change in state, and events are organized into named streams called topics. The Apache Kafka streaming platform can stream events in real time from one or more sources to one or more destinations.

Real-world examples of events include a user adding items to a shopping cart, a system adding items to a database, or a user modifying contact details. Each of these represents a change of state and is sent as a notification to a topic, which holds the information about that kind of event.

These topics are stored in a distributed system (with replication) to provide high availability, so that the data persists and can be served from any of the servers in case of a failure.
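
To make the idea of an ordered event log more concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the concept, not how Kafka stores data: the names Event and EventLog, and the fields they carry, are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """One log entry: what happened, the resulting state, and when."""
    offset: int
    occurred_at: datetime
    description: str
    state: dict


class EventLog:
    """Append-only log: entries are added at the end and never changed."""

    def __init__(self):
        self._events = []

    def append(self, description, state, occurred_at=None):
        event = Event(
            offset=len(self._events),  # position in the log = event order
            occurred_at=occurred_at or datetime.now(timezone.utc),
            description=description,
            state=dict(state),
        )
        self._events.append(event)
        return event

    def events_between(self, start, end):
        """Events with start <= occurred_at < end, in log order."""
        return [e for e in self._events if start <= e.occurred_at < end]

    def replay(self):
        """All events from the beginning, in the order they were appended."""
        return list(self._events)
```

Because nothing is overwritten, the log can answer "which event happened at a specific time" with events_between, and any consumer can rebuild state by calling replay from offset 0.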

Useful information about Kafka 
1 Apache Kafka is an open-source event streaming platform
2 It provides extremely high throughput and low latency
3 Events are stored in named streams called topics, which hold information about changes in object state
4 Topics can be written to and consumed by more than one system
5 Data in topics can be retained for short or long periods, e.g. a few hours, days, or years, or indefinitely
6 Topics can be relatively small or enormous; there is no hard and fast rule, best practice, or architectural guidance on the size of topics in Kafka
7 Kafka provides high throughput, horizontal scalability, built-in partitioning, easy and quick replication, and high fault tolerance
8 Kafka’s programming model is based on the publish-subscribe pattern
9 With Kafka, publishers send messages to topics, which are named logical channels. A subscriber to a topic receives all the messages published to the topic
10 In an event-driven architecture, Kafka is used as an event router, and microservices publish and subscribe to the events
11 In the Kafka platform, producers and consumers are loosely coupled and rely only on the topic of interest and the message schema
12 Kafka is written in Java and Scala
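
The publish-subscribe points above can be sketched with a tiny in-memory bus in Python. This is a simplified illustration, not a Kafka client: the class name InMemoryBus and its methods are made up for this example, and unlike Kafka it delivers messages synchronously and does not persist them.

```python
from collections import defaultdict


class InMemoryBus:
    """Toy publish-subscribe bus keyed by topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver the message to all subscribers; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# The publisher knows only the topic name and the message shape,
# not who is listening: that is the loose coupling described above.
bus = InMemoryBus()
analytics, crm = [], []
bus.subscribe("contact-updated", analytics.append)
bus.subscribe("contact-updated", crm.append)
bus.publish("contact-updated", {"contactId": "c-1", "email": "a@example.com"})
```

Both subscribers receive the same message, and adding a third subscriber later would not require any change to the publisher.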

Kafka Architecture 

Kafka is a distributed system based on event-driven architecture (EDA) and consists of a cluster of servers.

Key building blocks of the Kafka platform:

o Cluster: Kafka runs as a cluster of one or more servers that can span multiple data centers or cloud regions.

o Broker: The servers in the Kafka cluster that form the storage layer are called brokers. Brokers can be scaled as needed or auto-scaled.

o Topic: Topics hold the information about object state, which is sent to and from the Kafka platform in the form of messages. Data can be kept in topics on the Kafka brokers for any time interval and in any size, depending on the business need.

o Producer: Producers are the client applications that append records to Kafka topics. Multiple producers can write to a single topic on the Kafka brokers.

o Consumer: A consumer subscribes to the Kafka brokers to receive messages. Consumers listen for messages from a specific topic.

o Partition: Several partitions can be configured at the Kafka level. Each topic can be divided into multiple partitions, and the partitions are distributed across multiple brokers. Based on the replication factor, the partitions are also replicated across brokers. Among a partition and its replicas, one acts as the “leader” and the others act as “followers.” When the leader fails, one of the followers automatically steps up to be the leader. This ensures high fault tolerance and less downtime.

o Offset: Each message persisted inside a partition is assigned a numeric offset value. In each partition, the messages are ordered by offset and then stored. When a consumer consumes a message, it also gets the partition ID and offset value of the received message.


In the above diagram:
o Apache ZooKeeper works as the configuration store for Kafka and holds the metadata of the Kafka cluster processes. ZooKeeper is a very important component and performs tasks such as leader election and storing the list of consumers and the access control policies for topics. (Newer Kafka releases can instead run in KRaft mode, without ZooKeeper.)

o Within a partition, a Kafka topic works like a FIFO (First In First Out) queue: messages are read in the order in which they were written.

o There are three brokers and three partitions, and the replication factor is 3. The leader partition is marked in green and the followers are marked in orange. One partition is expanded to show how the messages are stored in a partition and indexed with an offset value.
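
The partition, offset, and leader/follower behavior described above can be modeled in a few lines of Python. This is a simplified in-memory sketch under stated assumptions, not how Kafka is implemented: the class names Partition and Topic, the broker names, and the CRC32-based key hashing are illustrative choices (real Kafka clients ship their own partitioners).

```python
import zlib


class Partition:
    """Ordered list of messages; a message's offset is its index."""

    def __init__(self, partition_id, replica_brokers):
        self.partition_id = partition_id
        self.replicas = list(replica_brokers)  # first entry acts as leader
        self.messages = []

    @property
    def leader(self):
        return self.replicas[0]

    def append(self, value):
        offset = len(self.messages)
        self.messages.append(value)
        return offset

    def read_from(self, offset):
        """Messages at or after the given offset, in FIFO order."""
        return self.messages[offset:]

    def fail_broker(self, broker):
        """Drop a failed broker; the next replica becomes the leader."""
        if broker in self.replicas:
            self.replicas.remove(broker)
        if not self.replicas:
            raise RuntimeError("no replica left for partition")


class Topic:
    def __init__(self, name, brokers, num_partitions=3, replication_factor=3):
        if replication_factor > len(brokers):
            raise ValueError("replication factor exceeds number of brokers")
        self.name = name
        self.partitions = []
        for p in range(num_partitions):
            shift = p % len(brokers)
            # Rotate the broker list so leaders are spread across brokers.
            rotated = brokers[shift:] + brokers[:shift]
            self.partitions.append(Partition(p, rotated[:replication_factor]))

    def send(self, key, value):
        """Pick a partition from the key and return (partition_id, offset)."""
        pid = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return pid, self.partitions[pid].append(value)


topic = Topic("orders", ["broker-1", "broker-2", "broker-3"])
first = topic.send("customer-42", "order created")
second = topic.send("customer-42", "order paid")
```

Messages with the same key land in the same partition with increasing offsets, so their relative order is preserved. Calling fail_broker on a partition's current leader promotes the next replica, which mirrors the failover described in the list above.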

There are four major APIs in Kafka, namely:  

The Producer API: sends streams of data to topics in the Kafka cluster

The Consumer API: reads streams of data from topics in the Kafka cluster 

The Streams API: transforms streams of data from input topics to output topics 

The Connect API: implements connectors that continually pull from a source system or application into Kafka, or push from Kafka into other systems

Use cases in which we can use Kafka with Sitecore:

The Sitecore Experience Platform is a Microsoft-stack-based product, and it can be extended to integrate with any type of system or API. In my view, Kafka is a good fit for Sitecore in enterprise-level implementations where more than one system relies on the same set of data, or where you have to push or pull real-time data. Some of the use cases of Sitecore with Kafka can be:

o Push user PII data captured from Sitecore to downstream systems or other third-party systems via Kafka

o Read user PII data from Kafka and push to Sitecore Contacts for better user experience

o Read a real-time feed of articles from Kafka and push the records into Sitecore

o Read real-time offers from Kafka and display them in real time on the Sitecore-based web property

o Stream real-time data about identified users' behavior and analytics from Sitecore to Kafka, to ingest real-time log and tracking data for analytics, dashboards, and machine learning

o Read real-time analytics, a 360-degree user view, or recommendations from a data lake, and provide a personalized Sitecore user experience

o To support mission-critical Sitecore applications, such as those in the financial or eCommerce domains, push critical event logs and event-sourcing data to Kafka so that downstream systems can process the details and take immediate action

o If Sitecore is the single source of truth for content in an enterprise organization, the Kafka Streams API can be used to store and distribute published content in real time to the various applications and systems that make it available to readers

o In your Sitecore Commerce application, Kafka can be used for financial transactions and business processes in real time, so that you can respond to the user immediately with low latency

o You can use Apache Kafka as a core component if you have near-real-time use cases

In my understanding, the above use cases are the most common and appear in most Sitecore-based implementations. It is worth considering Sitecore integration with Kafka in future projects.

Secure Kafka Instance 
A Kafka instance provides several out-of-the-box options for securing both the instance and its content, and Kafka is on the approved tools list in many organizations.

             o Channel based encryption

             o Authentication

             o Authorization

            Because credentials are sent across the wire, it is recommended to use channel-based encryption.

Prerequisite to use the Kafka 
The following are the prerequisites I followed to use Kafka.
Changes made at the network level to access the Kafka broker:

Ensure that the ports that are used by the Kafka server are not blocked by a firewall. 

Kafka brokers are generally exposed on public IPs so that outside users can connect to them, and they use private IPs to communicate internally. If you are connecting from an internal network, submit a firewall request to allow your IDs or (preferably) your AD group to connect to the Kafka broker public IPs. You can find the public IPs by pinging the fully qualified domain name (FQDN) or hostname.
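
Before raising or after closing a firewall request, it can help to check from the client machine whether the broker host resolves and whether its port accepts TCP connections. The following Python sketch uses only the standard library; the function names are made up for this example, and the hostname and port in the usage comment are placeholders (9092 is Kafka's default listener port, but your cluster may use a different one).

```python
import socket


def resolve_host(hostname):
    """Return the sorted, de-duplicated IP addresses for a hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example usage with a placeholder broker address:
# print(resolve_host("kafka-broker.example.com"))
# print(can_connect("kafka-broker.example.com", 9092))
```

A False result only shows that the TCP connection could not be opened from this machine. It does not say whether the cause is the firewall, DNS, or the broker itself, so treat it as a first diagnostic step.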

To enable client authentication between Kafka consumers and Kafka brokers, a key and certificate must be generated for each broker and client in the cluster. The certificates also need to be signed by a certificate authority (CA).

In Kafka, we can implement authentication using SSL, so that the Kafka brokers and clients communicate with each other over SSL.

During Kafka SSL setup, a certificate is generated for each machine in the cluster, with a common name (CN) that matches the fully qualified domain name (FQDN) of the server. The client compares the CN with the DNS domain name to ensure that it is indeed connecting to the desired server, not a malicious one.

To connect to the Kafka brokers, the client requires the Kafka broker certificate in .pem format. The Kafka admin can provide this certificate; store it in a shared location or another location that your program can access.

For a Kafka setup that uses Secure Sockets Layer (SSL) together with Simple Authentication and Security Layer (SASL), the SASL credentials (SASL username and password) are specified in the configuration, and the client needs the same credentials to connect to the Kafka broker.
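
The connection details listed above (broker addresses, the .pem certificate, and the SASL credentials) usually end up in one client configuration object. The Python sketch below builds such a configuration as a plain dictionary using the property names understood by librdkafka-based clients. The function names, the PLAIN mechanism default, and the example hosts are assumptions for illustration; use whatever mechanism and values your Kafka admin provides.

```python
from pathlib import Path


def build_client_config(bootstrap_servers, ca_pem_path, sasl_username,
                        sasl_password, sasl_mechanism="PLAIN",
                        check_ca_file=True):
    """Collect SASL over SSL connection settings into one dictionary."""
    if not bootstrap_servers:
        raise ValueError("at least one bootstrap server is required")
    if check_ca_file and not Path(ca_pem_path).is_file():
        raise FileNotFoundError(f"CA certificate not found: {ca_pem_path}")
    return {
        "bootstrap.servers": ",".join(bootstrap_servers),
        "security.protocol": "SASL_SSL",     # encrypt the channel, then authenticate
        "sasl.mechanisms": sasl_mechanism,   # assumption: PLAIN unless told otherwise
        "sasl.username": sasl_username,
        "sasl.password": sasl_password,
        "ssl.ca.location": str(ca_pem_path),  # broker CA certificate (.pem)
    }


def redacted(config):
    """Copy of the config that is safe to write to logs."""
    return {k: ("***" if k == "sasl.password" else v) for k, v in config.items()}


config = build_client_config(
    ["broker1.example.com:9093", "broker2.example.com:9093"],  # placeholder hosts
    "/certs/kafka-ca.pem",                                     # placeholder path
    "svc-sitecore",
    "change-me",
    check_ca_file=False,  # skipped here only because the path is a placeholder
)
```

Keeping the password out of logs (via redacted) and validating the certificate path early tends to make connection problems easier to diagnose than a failure deep inside the client.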

How to integrate Sitecore with Kafka? 

We can integrate Sitecore with Kafka using a .NET client. I will be using Confluent's .NET client library for Apache Kafka, called Confluent.Kafka.

The Confluent.Kafka package is published on nuget.org and can be installed through Visual Studio or a command-line interface.

Using Confluent.Kafka, we can push data to Kafka using ProducerBuilder and pull data using ConsumerBuilder.

To push data, we first create a ProducerConfig object that contains the required connectivity details:

Once ProducerConfig is ready, we can push data to Kafka using ProducerBuilder:

To pull data, we create a ConsumerConfig object that contains the required connectivity details:

Once ConsumerConfig is ready, we can pull data from Kafka using ConsumerBuilder:
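
As a language-neutral illustration of the same push and pull flow, here is a hedged sketch in Python rather than C#. It is not the solution's code. The two helper functions are hypothetical and work with any producer or consumer object that exposes the produce/flush and subscribe/poll/close methods found on the confluent-kafka Python client, so that package is an assumption and is not imported here.

```python
import json


def publish_message(producer, topic, payload, key=None):
    """Serialize the payload as JSON, send it, and wait for the delivery report."""
    report = {}

    def on_delivery(err, msg):
        # Called by the client once the broker accepts or rejects the message.
        report["error"] = err
        report["message"] = msg

    producer.produce(topic, value=json.dumps(payload).encode("utf-8"),
                     key=key, on_delivery=on_delivery)
    producer.flush()  # block until outstanding messages are delivered
    if report.get("error") is not None:
        raise RuntimeError(f"delivery failed: {report['error']}")
    return report.get("message")


def consume_messages(consumer, topic, max_messages,
                     poll_timeout=1.0, max_empty_polls=5):
    """Read up to max_messages JSON messages as (partition, offset, payload)."""
    consumer.subscribe([topic])
    received, empty_polls = [], 0
    try:
        while len(received) < max_messages and empty_polls < max_empty_polls:
            msg = consumer.poll(poll_timeout)
            if msg is None:  # nothing arrived within the timeout
                empty_polls += 1
                continue
            if msg.error():
                raise RuntimeError(f"consumer error: {msg.error()}")
            received.append((msg.partition(), msg.offset(),
                             json.loads(msg.value())))
    finally:
        consumer.close()  # leave the consumer group cleanly
    return received


# With the confluent-kafka package installed (an assumption), usage could be:
#   from confluent_kafka import Producer
#   publish_message(Producer(config), "articles", {"id": 7, "title": "Hello"})
```

The configuration values (brokers, credentials, topic name) stay outside these helpers, and only the message payload is passed per call, which keeps the connectivity details separate from the dynamic data.
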
Sitecore Items 

The code blocks above use both configuration values and dynamic values. For the configuration values, we can create a template and use an item based on it as the data source item for the Sitecore component/rendering, while the dynamic values (the message only) can be passed from code.

Sitecore Kafka GitHub Solution 

I have created an ASP.NET MVC based solution that helps validate the Sitecore integration with Kafka.

