Solr

Solr is an open-source enterprise search platform written in Java and used to build search applications. It is built on top of the Apache Lucene project, a Java library that provides indexing and search functionality. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration and rich document handling. The Search and Navigation module is one of the most important and most intensively used modules in Hybris. It is based on Apache Solr and is intended purely for search and content delivery, not for persistent data storage.

Hybris provides a built-in Solr server, which is contained in the solrserver extension. Solr is used in Hybris to make search on the website faster. Setting up the infrastructure to host Solr is therefore part of meeting the search and navigation requirements.

Embedded versus Standalone versus Cloud Solr Server

There are three ways to set up a Solr integration with Hybris CX, depending on the version.

  • Embedded - Solr runs inside the same JVM as the SAP Hybris CX process. This mode is suitable for development but is not recommended for production environments, because a crash or fatal error in one can take down both, and Solr would be difficult to set up, monitor and scale individually.

  • Standalone - Standalone mode is the commonly recommended setup for production. Solr runs in its own JVM, which makes it much easier to monitor and to manage for reliability and scalability. A production environment should always use a Solr cluster, which runs multiple Solr instances in standalone mode.

  • Solr Cloud - For SAP Hybris Cloud on public infrastructure, Solr Cloud is the default. Support for Solr Cloud was introduced in SAP Commerce v6.2 as a new way to set up Solr. It complements the standalone cluster mode as a production option for scalability and availability. Solr Cloud leverages Apache ZooKeeper, index sharding and replicas to scale large indexes with ease.

We can use an internal or external standalone Solr server in Hybris with a small amount of configuration.

Hybris OOTB Solr configuration

Out of the box (OOTB), the Hybris Solr setup can be found at hybris/bin/ext-commerce/solrserver/resources/solr/.

The default configuration is as follows:

solrserver.instances.default.autostart=true

solrserver.instances.default.mode=standalone

solrserver.instances.default.hostname=localhost

solrserver.instances.default.port=8983

solrserver.instances.default.memory=512m

Here, autostart=true tells the Solr server to start and stop together with the Hybris platform.
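
The property names follow the pattern solrserver.instances.<instance name>.<setting>. As a rough illustration of that naming scheme (this is not how Hybris itself reads its configuration), the following Python sketch groups such lines by instance name. The function name parse_solr_instances is made up for this example.

```python
from collections import defaultdict

# The default configuration quoted above, as plain properties text.
DEFAULT_PROPERTIES = """\
solrserver.instances.default.autostart=true
solrserver.instances.default.mode=standalone
solrserver.instances.default.hostname=localhost
solrserver.instances.default.port=8983
solrserver.instances.default.memory=512m
"""

PREFIX = "solrserver.instances."


def parse_solr_instances(text):
    """Group solrserver.instances.<name>.<setting>=<value> lines by instance name."""
    instances = defaultdict(dict)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and unrelated properties.
        if not line or line.startswith("#") or not line.startswith(PREFIX):
            continue
        key, _, value = line.partition("=")
        # The first segment after the prefix is the instance name;
        # the rest (which may itself contain dots) is the setting.
        name, _, setting = key[len(PREFIX):].partition(".")
        instances[name][setting] = value.strip()
    return dict(instances)


instances = parse_solr_instances(DEFAULT_PROPERTIES)
default = instances["default"]
print(default["mode"], default["hostname"], default["port"])
```

Grouping by instance name makes it easy to see that "default" and "cloud" are simply two named instances configured with the same set of keys.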

External Standalone Solr Server setup

The Solr server is already configured in standalone mode (mode=standalone). This means a Solr setup is already present in the Hybris suite, which we can use, or we can download and set up our own Solr server. The only thing left is to start and stop it independently of the Hybris instance. To do so, disable autostart for the default Solr instance using the property below.

solrserver.instances.default.autostart=false

Start/stop the Solr server

ant startSolrServer

ant stopSolrServer

We can verify that the server is running by opening the URL below in a browser.

URL : http://localhost:8983/
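
The same check can be scripted. The following is a minimal Python sketch that only tests whether something answers HTTP requests at that URL. The function name solr_is_reachable and the timeout value are choices made for this example, and the sketch assumes plain HTTP without authentication, which may not match every installation.

```python
import urllib.error
import urllib.request


def solr_is_reachable(url="http://localhost:8983/", timeout=3.0):
    """Return True if an HTTP request to the URL gets any response from a server."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status such as 401 or 404.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar problems.
        return False
```

Calling solr_is_reachable() after ant startSolrServer should return True once the server has finished starting, and False again after ant stopSolrServer.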

Cloud Solr Server setup

For Solr Cloud, disable autostart for the default instance and enable it for the cloud instance in project.properties:


solrserver.instances.default.autostart=false

solrserver.instances.cloud.autostart=true

You can override the following default configuration in the local.properties file to suit your needs:

# disables the autostart for the default Solr instance

solrserver.instances.default.autostart=false


solrserver.instances.cloud.autostart=true

solrserver.instances.cloud.mode=cloud

solrserver.instances.cloud.hostname=localhost

solrserver.instances.cloud.port=8983

solrserver.instances.cloud.memory=512m

solrserver.instances.cloud.zk.host=

solrserver.instances.cloud.zk.upconfig=true


Data is stored in the database in normalized form. This means data is divided and linked among multiple tables to improve resource usage and performance. Solr works differently: it stores data as documents instead of tables, and the information is indexed for fast access and search.
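
To make the contrast concrete, here is a small Python sketch that flattens rows from three related tables into a single search document. The tables, column names and field names are invented for illustration and do not reflect the actual Hybris data model or its Solr field naming.

```python
# Hypothetical normalized rows, as they might sit in three related tables.
products = {1: {"code": "CAM-100", "name": "Compact Camera", "category_id": 10}}
categories = {10: {"name": "Cameras"}}
prices = [
    {"product_id": 1, "currency": "USD", "value": 199.0},
    {"product_id": 1, "currency": "EUR", "value": 185.0},
]


def build_document(product_id):
    """Join the related rows for one product into a single flat document."""
    product = products[product_id]
    doc = {
        "id": product["code"],
        "name": product["name"],
        "category": categories[product["category_id"]]["name"],
    }
    # One field per currency, so no join is needed at search time.
    for price in prices:
        if price["product_id"] == product_id:
            doc["price_" + price["currency"].lower()] = price["value"]
    return doc


print(build_document(1))
```

The joins happen once, at indexing time, so a search request only has to read one self-contained document.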

Index

An index consists of documents, which are essentially collections of fields. Compared with a relational database, the data model has the following similarities:

  • An index is roughly the same as a database table.

  • A document is similar to a row of a database table.

  • A field is roughly the same as a column of a database table.

Each field of a document can be indexed, stored, or both. An indexed field is a field which is searchable and sortable. Solr performs text analysis on certain content and on search queries in order to identify similar words and to match synonyms. When a new document is added to Solr, indexed and stored fields are processed in different ways.

  • Indexed fields go through an analysis phase, which breaks the text into words and applies different transformations to them. The results of this analysis phase are saved to the Solr index.

  • The values of stored fields are saved as is.
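
The difference between indexed and stored fields can be illustrated with a toy inverted index in Python. This is a deliberately simplified model, not Solr's implementation: the analysis phase here only lowercases and splits on whitespace, and the class ToyIndex and its methods are hypothetical names.

```python
from collections import defaultdict


def analyze(text):
    """A toy analysis phase: lowercase the text and split it on whitespace."""
    return text.lower().split()


class ToyIndex:
    def __init__(self, indexed_fields, stored_fields):
        self.indexed_fields = set(indexed_fields)
        self.stored_fields = set(stored_fields)
        self.postings = defaultdict(set)  # (field, term) -> ids of matching documents
        self.stored = {}                  # document id -> stored field values

    def add(self, doc_id, doc):
        for field, value in doc.items():
            if field in self.indexed_fields:
                # Indexed fields are analyzed; only the resulting terms are kept.
                for term in analyze(str(value)):
                    self.postings[(field, term)].add(doc_id)
        # Stored fields are saved as is, so they can be returned in results.
        self.stored[doc_id] = {
            f: v for f, v in doc.items() if f in self.stored_fields
        }

    def search(self, field, query):
        """Return the stored fields of documents containing any query term."""
        ids = set()
        for term in analyze(query):
            ids |= self.postings.get((field, term), set())
        return [self.stored[doc_id] for doc_id in sorted(ids)]


index = ToyIndex(indexed_fields=["name", "description"], stored_fields=["code", "name"])
index.add("1", {
    "code": "CAM-100",
    "name": "Compact Camera",
    "description": "Lightweight travel camera",
})
print(index.search("description", "TRAVEL"))
```

In this sketch the description field is searchable but is never returned, because it is indexed and not stored, while code is returned but cannot be searched, because it is stored and not indexed.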


Indexing

A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files and data extracted from database tables. In Hybris, the indexing process starts with exporting data from the SAP Hybris [CX] database to Solr. Solr requires some time for the index operations. Once they complete, the index is replicated to the registered Solr replicas. The index can be built or updated using different supported indexing strategies:

  • Full Indexing: It stops replication, deletes all current documents and rebuilds the index completely from scratch. This operation can take some time to complete for large data sets, so it should be run about once a day. It supports only two commit modes:

    • Direct Mode: It will add or update every document in the index and if the operation fails, previous entries that had been successfully committed will remain available. Index replication is disabled before this operation begins and then resumes later after it finishes.

    • Two-phase Mode: It works atomically and in the case of failure, everything will be rolled back to the initial state. To accomplish this behavior, Solr will create an additional core, which by default uses the same name as the main core, with an added suffix. Such a core will be used to store documents during the indexing operation. Once it's done, a swap operation is performed, and the new index becomes online in the primary node and ready for replication.

  • Update Indexing: This operation does not require replication to stop and is usually executed more frequently, since it runs faster by processing only targeted documents. If needed, hot updates on specific documents can be triggered manually or programmed using the API. Updates can also be partial, covering only specific attributes of a document.

  • Delete Indexing: This simply removes documents from the index. It can sometimes be more efficient to run delete jobs periodically to maintain accurate data and remove unwanted indexed data from Solr. Keep in mind that a full index operation takes care of excluding documents on its own, since it rebuilds the index from scratch.

Partial and Update Indexing

Update indexing performs an indexing operation for a specific document. There are two ways to modify the content of an existing Solr document.

  • Default Update Indexing: This indexing strategy modifies all the attributes of a Solr document. For example, if a document has 20 Solr index properties, then this approach will update all 20 Solr index properties.

  • Partial Indexing: This approach modifies only specific attributes of a Solr document, which is called a "partial update". For example, a partial update can be configured to only update price and stock Solr index properties. During a partial update indexing process, only stock and price properties of the Solr document will be modified. That means the remaining 18 properties will remain unchanged and unprocessed.
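
The two approaches can be contrasted with a short Python sketch that treats the index as a dictionary of documents. The field names price and stock are simplified placeholders, not real Hybris indexed property names. The last function builds a JSON body in the style of Solr's atomic updates, which use a "set" modifier per field; whether Hybris sends exactly this request for a partial update is an assumption here.

```python
import json


def full_update(index, new_doc):
    """Default update: replace every field of the document."""
    index[new_doc["id"]] = dict(new_doc)


def partial_update(index, doc_id, changes):
    """Partial update: touch only the given fields, leave the rest as they are."""
    index[doc_id].update(changes)


def atomic_update_payload(doc_id, changes):
    """Build a Solr atomic-update style JSON body using the "set" modifier."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return json.dumps([doc])


index = {"p1": {"id": "p1", "name": "Compact Camera", "price": 199.0, "stock": 5}}
partial_update(index, "p1", {"price": 179.0, "stock": 3})
print(index["p1"])
print(atomic_update_payload("p1", {"price": 179.0, "stock": 3}))
```

Only the values for price and stock have to be computed and sent in the partial case, which is where the time saving over a default update comes from.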

Value Providers, Value Resolvers, Identity Provider and Results Converter

  • Value provider: SAP Hybris data is indexed in Solr in one of two forms. In the first form, the data requires no change or transformation and is indexed in Solr directly. In the second form, the data requires a change or customization, for example so that filtering can be done with custom logic. Value providers handle the conversion between Hybris database data and Solr document values. There are multiple value providers already available for the most commonly used types, and custom ones can be created if needed. Keep in mind that value providers usually make indexing operations take a little longer.

  • Value Resolver: It is a more efficient replacement for the current value providers. It groups the indexed properties that use the same value provider.

  • Identity provider: It determines how a document is uniquely identified in the Solr index. The out-of-the-box implementation works for most product item type use cases.

  • Results converter: It is similar to the service layer converter concept and transforms the Solr documents in the search results into corresponding DTOs (data transfer objects) to be used in the storefront.
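
The roles of a value provider and a results converter can be sketched in Python as follows. This is a conceptual model only: in Hybris these are Java classes, and every name here (plain_value, discounted_price, ProductData and so on) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProductData:
    """A simple DTO of the kind a storefront might consume."""
    code: str
    name: str
    price: float


def plain_value(product, attribute):
    """First form: the value is indexed unchanged."""
    return product[attribute]


def discounted_price(product, attribute):
    """Second form: custom logic transforms the value before indexing."""
    return round(product[attribute] * (1 - product.get("discount", 0.0)), 2)


# One provider per indexed property (an illustrative mapping).
VALUE_PROVIDERS = {
    "code": plain_value,
    "name": plain_value,
    "price": discounted_price,
}


def to_solr_document(product):
    """Indexing side: run each property through its value provider."""
    return {
        field: provider(product, field)
        for field, provider in VALUE_PROVIDERS.items()
    }


def convert_result(solr_doc):
    """Search side: turn a Solr document into a DTO for the storefront."""
    return ProductData(
        code=solr_doc["code"], name=solr_doc["name"], price=solr_doc["price"]
    )


product = {"code": "CAM-100", "name": "Compact Camera", "price": 200.0, "discount": 0.5}
solr_doc = to_solr_document(product)
print(convert_result(solr_doc))
```

The provider runs at indexing time and the converter at search time, so the storefront never needs to touch the source data to display a result.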