Improve Sitecore Solr Search Performance with Solr High Availability
Content is the heart of any application, and the Sitecore Experience Platform covers the business needs around collecting and sharing that content.
In many organizations and projects, Sitecore is used extensively as a content management system, and in some organizations it is the single source of content.
In both cases, content availability is a major requirement. We can scale our infrastructure to meet it, but there are scenarios where scaling alone does not help.
Let's look at how to improve content availability when Sitecore content is accessed via Solr, with the help of an example.
The Contoso Corporation uses the Sitecore Experience Platform to build its corporate website. The website receives around 60K visits per day and provides resources such as documents, product details and articles that span multiple categories.
The website consumes an article feed from a third-party API and stores it in Sitecore. Digital marketers then review the articles and make the required changes so that the articles become available to end users.
The website has an Article section that displays the article categories. Each category has a landing page with featured articles, along with article detail pages. The website landing page also has several content blocks that show articles as New, Featured, etc.
Content availability is a major non-functional requirement (NFR) for the Contoso website.
The flow related to Article synchronization from third party feed to Solr:
Solution:
The main requirement here is zero downtime for the articles served to end users. To achieve this, we have the following options:
• Sitecore Infrastructure Scalability
• Solr Indexing Scalability
When starting work on any Sitecore implementation, we have to consider all the NFRs of that implementation, because they drive the Sitecore infrastructure sizing. Some of the NFRs related to Sitecore sizing are:
1 | Number of websites hosted on that Sitecore instance |
2 | Number of countries and languages |
3 | Number of visits per month |
4 | Expected no. of concurrent sessions |
5 | Expected yearly increase rate of no. of visitors |
6 | Expected yearly increase rate of concurrent sessions |
7 | Number of concurrent editors working with Sitecore |
8 | What is the expected growth? |
9 | Expected number of editors / CM instances |
10 | Amount of content pages planned |
11 | Is High-Availability required? |
12 | Is geo distribution required? |
13 | Is Zero-Downtime Deployment required? |
Considering these NFRs, we can propose the following Sitecore infrastructure to Contoso Corporation as a starting point; it can be refined with the help of performance and load testing:

With the help of the proposed Sitecore deployment architecture, we ensure that:
• The application will be available to end-user
• The Sitecore roles that the application uses heavily, such as Sitecore Content Delivery and Search, are load balanced with extra servers for high availability.
The Contoso Corporation Sitecore application depends on the content in Solr, so we have to think through which design considerations will keep search free of performance issues. Search is a very important function in a Sitecore implementation and is used by both Sitecore Content Management and Sitecore Content Delivery.
In this case we can consider the following options for setting up Solr with high availability, so that content delivery via Solr is not interrupted:
o Solr Master-Slave Setup for Sitecore Solr Search Indexing
o SolrCloud Setup for Sitecore Solr Search Indexing
o Sitecore Switch Solr Indexes (Solr SwitchOnRebuildSolrSearchIndex Strategy)
Solr Master-Slave is one of the ways to set up Solr, and it relies on Solr replication. Solr replication distributes complete copies of a master index to one or more slave servers.
The Solr Master performs the update operations on the Solr indexes, while the Solr Slaves respond to queries. With this setup we can scale the number of Solr Slaves according to the application load to improve performance.
In the Solr Master-Slave configuration diagram below, the Solr Master handles all write operations that update the indexes, and the updated indexes are replicated to the Solr Slaves once an update has finished (with Solr replication, the slaves poll the master and pull the changes). The Solr Slaves serve the content to the end-user application deployed on the Sitecore Content Delivery servers. The communication between the Solr Master and the Solr Slaves is over HTTP, and the rest of the communication between the Solr Master, Sitecore Content Management (CM), Sitecore Content Delivery (CD) and the Solr Slaves is over HTTPS:
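To make the read/write split concrete, here is a minimal Python sketch of the routing idea: updates always go to the master, while queries are spread round-robin across the slaves. The host names are hypothetical, and in a real Sitecore setup this routing would normally be done through the Sitecore connection settings and a load balancer rather than in code, so treat it as an illustration only.

```python
from itertools import cycle


class MasterSlaveRouter:
    """Picks a Solr endpoint: master for updates, slaves for queries."""

    def __init__(self, master_url, slave_urls):
        if not slave_urls:
            raise ValueError("at least one slave URL is required")
        self.master_url = master_url
        # Round-robin iterator over the slave base URLs.
        self._slaves = cycle(slave_urls)

    def url_for_update(self, core):
        # All index writes target the master.
        return f"{self.master_url}/{core}/update"

    def url_for_query(self, core):
        # Each query goes to the next slave in turn.
        return f"{next(self._slaves)}/{core}/select"


# Hypothetical hosts, for illustration only.
router = MasterSlaveRouter(
    "https://solr-master.example:8983/solr",
    [
        "https://solr-slave1.example:8983/solr",
        "https://solr-slave2.example:8983/solr",
    ],
)
print(router.url_for_update("sitecore_web_index"))
print(router.url_for_query("sitecore_web_index"))
```

Adding a slave to the list is all it takes to spread the query load further, which is the scaling property described above.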
The SolrCloud mode of Solr provides a cluster of Solr nodes with high availability, index replication, fault tolerance and distributed queries, coordinated with the help of ZooKeeper.
Solr and SolrCloud are not different products: Solr is the application, while SolrCloud is a mode of running Solr. The alternative to running Solr in SolrCloud mode is running it in standalone mode.
The Solr Master/Slave model also supports index replication and distributed queries without ZooKeeper, but these activities are not automated. For fault tolerance you have to rely on extra nodes behind a load balancer, plus manual effort: if the master goes down, you need to set it up again and verify the data to avoid data loss during the downtime. In SolrCloud, if a leader node goes down, the election of a new leader starts automatically and requests are diverted to the new leader, without data loss or manual effort.
With the help of tools like SolrJ you can implement load balancing in SolrCloud; with the help of ZooKeeper, SolrJ will not send requests to a node that is down.
It is often simpler to rely on an external load balancer instead of making things complicated.
In the diagram below, Solr and ZooKeeper are installed on the same node for simplicity. To avoid performance issues, it is recommended to use separate hard disks for Solr and ZooKeeper.
For a fault-tolerant SolrCloud setup, the minimum to start with is three instances each of Solr and ZooKeeper.
Based on your application load and best practices, consider running the ZooKeeper instances on separate servers in production environments.
ZooKeeper replicates itself across a set of hosts called an ensemble, which SolrCloud relies on for cluster coordination.
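The reason three is the usual starting point is that a ZooKeeper ensemble needs a majority of its servers to be up in order to keep working. The small Python sketch below works through that arithmetic; the function names are our own and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 5):
    print(f"ensemble={n} quorum={quorum_size(n)} "
          f"tolerated_failures={tolerated_failures(n)}")
# ensemble=1 quorum=1 tolerated_failures=0
# ensemble=2 quorum=2 tolerated_failures=0
# ensemble=3 quorum=2 tolerated_failures=1
# ensemble=5 quorum=3 tolerated_failures=2
```

An ensemble of two tolerates no more failures than a single server, which is why ensembles are normally sized with an odd number of servers.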
Your client can connect directly to any server where ZooKeeper is installed; internally, the ZooKeeper servers are connected to each other. If the ZooKeeper server that a client is connected to goes down, the client connects to a different ZooKeeper server.
Each replica in SolrCloud reports its availability (active or not) to ZooKeeper. If ZooKeeper sees that a leader has gone down, the leader election process is initiated and query requests are redirected to an active Solr node. This is how SolrCloud provides high availability for queries.
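The failover behavior described above can be pictured with a toy model in Python. This is not Solr's real election algorithm: the `ShardState` class and its rule of picking the first live replica are simplifications of our own, meant only to show how liveness information leads to a new leader and to queries skipping the dead node.

```python
class ShardState:
    """Toy model of the liveness info tracked for one shard's replicas."""

    def __init__(self, replicas):
        # Dicts keep insertion order, so the first replica listed comes first.
        self.live = {name: True for name in replicas}
        self.leader = replicas[0]

    def _elect(self):
        # Simplification: the first live replica wins; None if all are down.
        return next((name for name, up in self.live.items() if up), None)

    def mark_down(self, name):
        self.live[name] = False
        if name == self.leader:
            self.leader = self._elect()

    def mark_up(self, name):
        self.live[name] = True
        if self.leader is None:
            self.leader = name

    def query_targets(self):
        # Queries may be served by any replica that is still live.
        return [name for name, up in self.live.items() if up]


shard = ShardState(["solr1", "solr2", "solr3"])
shard.mark_down("solr1")  # the current leader fails
print(shard.leader, shard.query_targets())  # solr2 ['solr2', 'solr3']
```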
In our problem statement, a third-party API is used to sync the articles. The sync takes time because records are inserted and the Sitecore Solr indexes are then rebuilt, which leads to downtime for the content served via Solr.
The earlier sections covered making Solr search highly available, but in the end every option relies on the Solr documents. If the master index has no documents (for example, while it is being rebuilt), that state is replicated to the slave instances and they become empty too. As a result, your Solr indexes return no documents.
Sitecore recommends the Switch on Rebuild option to avoid downtime of Solr indexes; it is a good fit for the Sitecore search indexes where you need high availability.
Sitecore provides Switch on Rebuild support for both standalone Solr and SolrCloud mode through SwitchOnRebuildSolrSearchIndex and SwitchOnRebuildSolrCloudSearchIndex respectively.
Switch on Rebuild support for SolrCloud is available from Sitecore 9.0 Update 2 onwards.
With SwitchOnRebuild the index is rebuilt in a separate core, so while the rebuild is running your application keeps serving content from the other Solr core. This is how we achieve zero downtime while rebuilding the index.
The application always points to the primary index, and the active index details are present in the core.properties file of your primary index core. On the Sitecore side, the search index name is defined in your search-specific configuration files.
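The idea can be sketched with a small in-memory model in Python. It assumes that the switch works like a swap of the two cores under their names once the rebuild has finished; the `SwitchOnRebuildModel` class is our own illustration and is not Sitecore or Solr code.

```python
class SwitchOnRebuildModel:
    """In-memory stand-in for a primary core and a rebuild core."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.cores = {primary: [], secondary: []}

    def search(self):
        # Queries always read whatever currently sits under the primary name.
        return list(self.cores[self.primary])

    def rebuild(self, documents):
        # Start from an empty rebuild core and fill it document by document.
        staged = []
        self.cores[self.secondary] = staged
        for doc in documents:
            staged.append(doc)
        # Only after a complete rebuild are the two cores swapped.
        self.cores[self.primary], self.cores[self.secondary] = (
            self.cores[self.secondary],
            self.cores[self.primary],
        )


index = SwitchOnRebuildModel(
    "sitecore_article_index", "sitecore_article_index_secondary"
)
index.rebuild(["article-1", "article-2"])
print(index.search())  # ['article-1', 'article-2']
```

In this model, if the document feed raises an error halfway through, the swap never happens and `search()` keeps returning the previous documents.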
For better understanding, I have tried to explain the Sitecore SwitchOnRebuildSolrSearchIndex process with the help of a flow diagram:

Below are the steps to set up Sitecore Switch on Rebuild. As an example, we will create a new custom index called sitecore_article_index and a secondary index named sitecore_article_index_secondary.
1 | Log in to the Solr server |
2 | Stop the Solr service |
3 | For the new custom index, take a copy of the sitecore_web_index folder from [Solr Folder]\server\solr\* and rename it to sitecore_article_index |
4 | Open the sitecore_article_index/core.properties file and set the name of the new core as name=sitecore_article_index |
5 | To create the secondary core, repeat steps 3 and 4 and use sitecore_article_index_secondary as the core name |
6 | Log in to the Sitecore instance where the indexing role is deployed; generally this is the ContentManagement role |
7 | Create the Solr index configuration file for the Article index at [WEB ROOT]\App_Config\Include\zzz\z.Sitecore.ContentSearch.Solr.Article.Configuration.config. It is used to include or exclude the fields and templates that need to be indexed: |
8 | Take a copy of the Web index configuration and create the new Sitecore search index configuration for the Sitecore Solr primary index as [WEB ROOT]\App_Config\Include\zz\zz.Sitecore.ContentSearch.Solr.Index.Article.config:
You can also check the following resources on creating custom Sitecore Solr indexes: "How to create a custom SOLR index", "Walkthrough: Creating a custom Sitecore Forms index" and "Custom Sitecore index configuration for Solr implementation". |
9 | In the Sitecore Solr primary index configuration (zz.Sitecore.ContentSearch.Solr.Index.Article.config), the Configuration node references the custom index configuration defaultSolrArticleIndexConfiguration created in step 7. |
10 | Now create the rebuild core configuration for the Article index at [WEB ROOT]\App_Config\Include\zzz\zz.Sitecore.ContentSearch.Solr.Index.Article_Secondary.config: |
11 | In the Sitecore Solr secondary index configuration (zz.Sitecore.ContentSearch.Solr.Index.Article_Secondary.config), the rebuild core is set to the secondary core sitecore_article_index_secondary, and the Configuration node references the custom index configuration defaultSolrArticleIndexConfiguration created in step 7. |
12 | The files created in steps 8 and 10 define the Sitecore Solr primary and secondary indexes with the type SwitchOnRebuildSolrSearchIndex. They should be present only on the Sitecore indexing instance, not on the Sitecore Content Delivery (CD) instances. A CD instance should not have any secondary core configuration or reference; it should have only the primary index configuration, referencing the primary core name, without SwitchOnRebuildSolrSearchIndex and with the index update strategy set to manual. |
13 | Now create the Sitecore Solr primary index configuration file for the Sitecore Content Delivery instance at [WEB ROOT]\App_Config\Include\zz\zz.Sitecore.ContentSearch.Solr.Index.Article.CD.config: |
14 | Save all the configuration files and start the Solr service on the Solr server |
15 | Wait a while, then verify in Solr Admin that all Solr cores have loaded |
16 | Once all Solr cores are loaded, rebuild the index from Sitecore CM |
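The check in step 15 can also be scripted against the Solr CoreAdmin STATUS endpoint instead of the Solr Admin UI. The Python sketch below only builds the URL and inspects a response body, so it runs without a Solr server. The base URL is an assumption, and the sample payload is hand-written and heavily trimmed for illustration; a real response contains many more fields.

```python
import json


def core_status_url(solr_base_url):
    """URL of the CoreAdmin STATUS action, returning JSON."""
    return f"{solr_base_url}/admin/cores?action=STATUS&wt=json"


def missing_cores(status_response, expected_cores):
    """Return the expected cores that the STATUS response does not report."""
    status = json.loads(status_response).get("status", {})
    # A core counts as loaded only if it has a non-empty status entry.
    return [name for name in expected_cores if not status.get(name)]


# Hand-written, trimmed sample payload (illustrative only).
sample = json.dumps({
    "status": {
        "sitecore_article_index": {"name": "sitecore_article_index"},
    }
})

print(core_status_url("https://localhost:8983/solr"))
print(missing_cores(
    sample,
    ["sitecore_article_index", "sitecore_article_index_secondary"],
))  # ['sitecore_article_index_secondary']
```

An empty list means both the primary and the secondary core are reported by Solr, and the index rebuild in step 16 can go ahead.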
Key Considerations:
• With SolrCloud you can replicate individual Sitecore index collections into N replicas on N Solr nodes (https://blog.alpha-solutions.us/wp-content/uploads/2017/10/SolrCloud_5_replicas_all_indexes.png) for high availability
• ZooKeeper keeps its data in memory, which helps it achieve high throughput and low latency
• Sitecore supports standalone Solr, SolrCloud, Azure Search, Coveo and SearchStax as search providers
• In the Solr Master/Slave model, if the Solr Master is down, the Solr Slaves can still respond to queries, but indexing from the source stops, and new content will not be available to end users until the Solr Master is up again
• Both the Solr Master/Slave and SolrCloud modes use Solr replication, but fault tolerance is automatic only in SolrCloud, not in the Solr Master/Slave model
• The SwitchOnRebuildSolrSearchIndex configuration should be present only on the indexing role, which in most cases is Sitecore Content Management
• Sitecore Content Delivery (CD) instances should not have any secondary core configuration or reference. They should have only the primary index configuration, referencing the primary core name, without SwitchOnRebuildSolrSearchIndex and with the manual index update strategy