Sitecore Integration With Kafka
In today’s world there are a large number of web properties in every domain, and each one is trying to impress end users/customers and build its brand value to become a market leader. In every case, customers are the main interaction point.
End users/customers come to your web property to find the latest information; if you provide irrelevant, non-customer-centric data, they will not interact with your web property and will never return.
You can encourage customers to return by providing personalized experiences based on their journey and serving content that is up to date rather than obsolete.
Nowadays most sites rely on real-time data in the form of feeds from third-party sources, and for this most organizations depend on scheduled batch jobs because they do not know when the latest data will be available.
To get real-time data into the web property, we can use a distributed messaging system and event streaming platform called Kafka to capture data in real time from the many systems that generate events, for example customer interactions on a web property, monitoring feeds, microservices, cloud services, or applications that send and receive data.
The Sitecore Experience Platform is flexible enough to integrate with external systems, extending Sitecore’s capabilities to achieve the required business use cases.
In this article, we will explore:
o What is Kafka?
o Useful information about Kafka
o Kafka Architecture
o Use cases in which we can use Kafka with Sitecore
o Secure Kafka Instance
o Prerequisites to use Kafka
o How to integrate Sitecore with Kafka?
o Sitecore Items
o Sitecore Kafka GitHub Solution
In a usual implementation, we use a database or some other data store to capture the response of an event and later retrieve the data from that store.
If we need to know which event a given response belongs to, it is difficult to identify without storing the events themselves. An event occurs when some action is performed on an object. It is often not feasible to store the event details along with the response in a traditional data store; instead, we can store them in the form of logs.
In Kafka, this information is stored in the form of a log containing the state information and event description, which preserves the order of events. With the help of these logs we can identify which event happened at a specific time. The logs can also be scaled as needed, which is harder to achieve with databases that require more resources.
In other words, Apache Kafka is an event streaming platform based on Event-Driven Architecture (EDA), where an event is data that represents a change in state, and events are organized into named channels called topics. The Apache Kafka streaming platform can stream events in real time from one or more sources to one or more destinations.
Real-world examples of events include a user adding items to a shopping cart, a system adding records to a database, or a user modifying contact details. Each of these represents an event with a change of state, and a notification is sent to the system in the form of a topic message containing information about the specific event.
These topics are stored in a distributed system (with replication) to provide high availability, so the data is persisted across servers and can be served from any of them in case of a failure.
1. Apache Kafka is an open-source event streaming platform.
2. It provides extremely high throughput and low latency.
3. The entity that stores information about an object’s state in Kafka is called a topic.
4. Topics can be stored and consumed by more than one system.
5. Topics can be retained for a short or long duration, e.g. a few hours, days, years, hundreds of years, or indefinitely.
6. Topics can be relatively small or enormous; there is no hard and fast rule, best practice, or architectural guideline about topic size in Kafka.
7. Kafka provides high throughput, horizontal scalability, built-in partitioning, easy and quick replication, and high fault tolerance.
8. Kafka’s programming model is based on the publish-subscribe pattern.
9. With Kafka, publishers send messages to topics, which are named logical channels. A subscriber to a topic receives all the messages published to that topic.
10. Because Kafka is based on event-driven architecture, it is used as an event router, and microservices publish and subscribe to events through it.
11. In the Kafka platform, producers and consumers are loosely coupled and rely only on the topic of interest and the message schema.
12. Kafka is written in Java and Scala.
Kafka is a distributed environment based on Event-Driven Architecture (EDA) and consists of a cluster of servers.
Key building blocks of the Kafka platform:
o Cluster: Kafka runs as a cluster of one or more servers that can span multiple data centers or cloud regions.
o Broker: The servers in the Kafka cluster that perform the storage layer role are called brokers. Brokers can be scaled as needed or auto-scaled.
o Topic: The entity that stores information about an object’s state in Kafka is called a topic, and topics are sent to and from the Kafka platform in the form of messages. Topics can be stored on the Kafka brokers for any time interval and at any size, depending on business need.
o Producer: The client applications that append records to Kafka topics. Multiple producers can write to a single topic on the Kafka brokers.
o Consumer: A consumer subscribes to the Kafka brokers to receive messages. Consumers listen for messages from a specific topic.
o Partition: Several partitions can be configured at the Kafka level. Each topic can be divided into multiple partitions, and the partitions are distributed across multiple brokers. Based on the replication factor, the partitions are also replicated between brokers. Among a partition and its replicas, one acts as the “leader” and the others act as “followers.” When the leader fails, one of the followers automatically steps up to be the leader. This ensures high fault tolerance and less downtime.
o Offset: Each message persisted inside a partition is assigned a numeric offset value. Within each partition, the messages are ordered and stored by offset value. When a message is consumed, the consumer also receives the partition ID and offset value of that message.
In the above diagram:
o Apache Zookeeper works as the configuration store for Kafka, holding metadata about the Kafka cluster processes. The Zookeeper component is very important and performs key tasks such as the leader election process, storing the list of consumers, and holding access control policies for topics.
o Kafka topics behave like FIFO (First In, First Out) queues: within each partition, messages are stored and delivered in the order they arrive.
o There are three brokers, three partitions, and the replication factor is 3. The leader partition is marked in green and the followers are marked in orange. One partition is expanded to show how the messages are stored in a partition and indexed with an offset value.
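To make the partition and replication settings concrete, here is a minimal sketch that creates a topic with three partitions and a replication factor of 3 using the AdminClient from Confluent.Kafka, the .NET client library used later in this article. The broker address and topic name are illustrative placeholders, not values from a real environment.

```csharp
using System;
using System.Threading.Tasks;
using Confluent.Kafka;
using Confluent.Kafka.Admin;

public static class TopicSetup
{
    public static async Task CreateTopicAsync()
    {
        // Placeholder broker address; replace with your cluster's bootstrap servers.
        var adminConfig = new AdminClientConfig { BootstrapServers = "kafka-broker:9092" };

        using var adminClient = new AdminClientBuilder(adminConfig).Build();

        // Three partitions spread across brokers, each replicated to three brokers,
        // mirroring the layout described in the diagram above.
        await adminClient.CreateTopicsAsync(new[]
        {
            new TopicSpecification
            {
                Name = "sitecore-events",   // hypothetical topic name
                NumPartitions = 3,
                ReplicationFactor = 3
            }
        });

        Console.WriteLine("Topic created.");
    }
}
```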
There are four major APIs in Kafka, namely:
The Producer API: sends streams of data to topics in the Kafka cluster
The Consumer API: reads streams of data from topics in the Kafka cluster
The Streams API: transforms streams of data from input topics to output topics
The Connect API: implements connectors that continually pull data from a source system or app into Kafka, or push data from Kafka into other systems
The Sitecore Experience Platform is a Microsoft-stack-based product, and it can be extended to integrate with almost any type of system/API. From my point of view, Kafka fits well with Sitecore in enterprise or large-organization implementations where more than one system relies on the same set of data, or where you have to push/pull real-time data. Some use cases for Sitecore with Kafka are:
o Push user PII data captured in Sitecore to downstream or other third-party systems via Kafka
o Read user PII data from Kafka and push it to Sitecore Contacts for a better user experience
o Read a real-time feed of articles from Kafka and push the records into Sitecore
o Read real-time offers from Kafka and display them in real time on the Sitecore-based web property
o Stream real-time data about (identified) user behavior/analytics from Sitecore to Kafka to ingest log and tracking data for analytics, dashboards, and machine learning
o Read real-time analytics, a 360-degree user view, or recommendations from a data lake and provide a personalized Sitecore user experience
o For mission-critical Sitecore applications, such as in the financial or eCommerce domains, push critical event logs and event-sourcing data to Kafka so that downstream systems can process the details and take immediate action
o If Sitecore is the single source of truth for content in an enterprise organization, the Kafka Streams API can be used to store and distribute published content in real time to the various applications and systems that make it available to readers
o In a Sitecore Commerce application, Kafka can be used for financial transactions and business processes in real time, so that you can provide an immediate, low-latency response to the user
o You can use Apache Kafka as a core component whenever you are looking at near-real-time use cases
As per my understanding, the use cases above are the most common and are present in most Sitecore-based implementations. It is worth considering Sitecore integration with Kafka in future projects.
A Kafka instance is typically secured at three levels:
o Channel-based encryption
o Authentication
o Authorization
Since credentials are sent across the wire, it is recommended to use channel-based encryption.
The following are the prerequisites I followed to use Kafka:
Changes at the network level to access the Kafka broker: ensure that the ports used by the Kafka server are not blocked by a firewall.
The Kafka brokers are generally exposed on public IPs so that outside users can connect to them, and they use private IPs for internal communication. If you are connecting via the internal network, you should submit a firewall request to allow your IDs/AD group (preferred) to connect to the Kafka brokers’ public IPs. You can get the public IPs by pinging the fully qualified domain name (FQDN)/hostname.
To enable client authentication between Kafka consumers and Kafka brokers, a key and certificate must be generated for each broker and client in the cluster. The certificates also need to be signed by a certificate authority (CA).
In Kafka we can implement the authentication mechanism using SSL, so that the Kafka brokers and clients communicate with each other over SSL.
During Kafka SSL setup, a certificate is generated for each machine in the cluster with a common name (CN) matching the fully qualified domain name (FQDN) of the server. The client compares the CN with the DNS domain name to ensure that it is indeed connecting to the desired server and not a malicious one.
To connect to the Kafka brokers, the client requires the Kafka broker certificate with a .pem extension. This certificate can be provided by the Kafka admin; store it in a shared location or a location your program can access.
For the Kafka Secure Sockets Layer (SSL) and Simple Authentication and Security Layer (SASL) setup, we specify the SASL credentials (SASL username and password) in the configuration, and the same credentials are required by the client to connect to the Kafka broker.
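As a rough sketch of how these settings map onto the Confluent.Kafka client configuration: the property names below are actual Confluent.Kafka settings, but the broker address, credentials, and certificate path are placeholder assumptions that your Kafka admin would supply.

```csharp
using Confluent.Kafka;

// Shared security settings reused by both producer and consumer configurations.
// All values shown are illustrative placeholders.
var clientConfig = new ClientConfig
{
    BootstrapServers = "kafka-broker:9093",          // SSL/SASL listener of the broker
    SecurityProtocol = SecurityProtocol.SaslSsl,     // encrypt the channel and authenticate via SASL
    SaslMechanism = SaslMechanism.Plain,             // SASL/PLAIN username-password authentication
    SaslUsername = "sitecore-client",                // placeholder SASL username
    SaslPassword = "<secret>",                       // placeholder SASL password
    SslCaLocation = @"C:\certs\kafka-broker-ca.pem"  // broker CA certificate (.pem) from the Kafka admin
};
```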
We can integrate Sitecore with Kafka using .NET connectors. I will be using Confluent’s .NET client library for Apache Kafka, called Confluent.Kafka.
The Confluent.Kafka package can be downloaded and installed through Visual Studio, a command line interface, or from nuget.org.
Using Confluent.Kafka, we can push data to Kafka using a ProducerBuilder and pull data using a ConsumerBuilder.
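For example, assuming the standard NuGet tooling, the package can be added with either of the following commands:

```
# Package Manager Console in Visual Studio
Install-Package Confluent.Kafka

# .NET CLI
dotnet add package Confluent.Kafka
```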
For this, we have to create a ProducerConfig object that contains the required connectivity details:
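A minimal sketch of such a ProducerConfig, assuming the same placeholder broker address, SASL credentials, and certificate path as in the earlier security sketch:

```csharp
using Confluent.Kafka;

// Connectivity details for the producer; in a Sitecore solution these values
// would come from the configuration data source item rather than being hard-coded.
var producerConfig = new ProducerConfig
{
    BootstrapServers = "kafka-broker:9093",
    SecurityProtocol = SecurityProtocol.SaslSsl,
    SaslMechanism = SaslMechanism.Plain,
    SaslUsername = "sitecore-client",               // placeholder credentials
    SaslPassword = "<secret>",
    SslCaLocation = @"C:\certs\kafka-broker-ca.pem" // broker CA certificate (.pem)
};
```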
Once the ProducerConfig is ready, we can push data to Kafka using a ProducerBuilder:
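A minimal sketch of producing a message with the producerConfig above; the topic name and message content are hypothetical:

```csharp
using System;
using System.Threading.Tasks;
using Confluent.Kafka;

public static class KafkaPublisher
{
    // Pushes a single message to the given topic; topic and message would normally
    // come from Sitecore configuration items and code respectively.
    public static async Task PublishAsync(ProducerConfig producerConfig, string topic, string message)
    {
        using var producer = new ProducerBuilder<Null, string>(producerConfig).Build();

        var result = await producer.ProduceAsync(topic, new Message<Null, string> { Value = message });

        Console.WriteLine($"Delivered to {result.TopicPartitionOffset}");
    }
}
```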
Similarly, to pull data we have to create a ConsumerConfig object containing the required connectivity details; once the ConsumerConfig is ready, we can pull data from Kafka using a ConsumerBuilder:
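A minimal sketch of the consumer side, assuming the same placeholder connection details; the consumer group id and topic name are hypothetical:

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

public static class KafkaSubscriber
{
    // Subscribes to a topic and processes messages until cancellation is requested.
    public static void Consume(string topic, CancellationToken cancellationToken)
    {
        var consumerConfig = new ConsumerConfig
        {
            BootstrapServers = "kafka-broker:9093",
            SecurityProtocol = SecurityProtocol.SaslSsl,
            SaslMechanism = SaslMechanism.Plain,
            SaslUsername = "sitecore-client",               // placeholder credentials
            SaslPassword = "<secret>",
            SslCaLocation = @"C:\certs\kafka-broker-ca.pem",
            GroupId = "sitecore-consumer-group",            // hypothetical consumer group
            AutoOffsetReset = AutoOffsetReset.Earliest      // start from the beginning if no committed offset
        };

        using var consumer = new ConsumerBuilder<Ignore, string>(consumerConfig).Build();
        consumer.Subscribe(topic);

        try
        {
            while (true)
            {
                var result = consumer.Consume(cancellationToken);

                // Each consumed message carries its partition and offset, as described earlier.
                Console.WriteLine($"Received '{result.Message.Value}' from {result.TopicPartitionOffset}");
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation requested; fall through and close the consumer cleanly.
        }
        finally
        {
            consumer.Close();
        }
    }
}
```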
In the above code blocks, we are using both configuration values and dynamic values. For the configuration values, we can create a template and use it as the data source item for the Sitecore component/rendering; the dynamic values (the message itself) are passed from code.
I have created an ASP.NET MVC-based solution that helps validate the Sitecore integration with Kafka.