Solr Schema

Solr organizes its data into documents, which consist of fields. The Solr schema defines how field values are stored and queried. It contains information about field types, fields (regular and dynamic), and field analyzers (optional char filters, one tokenizer, and zero or more filters). The schema.xml file holds these definitions, including how fields are analyzed and filtered during indexing and searches. Different field types can contain different types of data. Solr uses the schema.xml file to determine how to build indexes from the input documents, and how to perform index-time and query-time processing.

Field type

Field types define the data types available for field values. You can define string types, numeric types, or new custom types.

For example:

<fieldType name="boolean" class="solr.BoolField" docValues="true" sortMissingLast="true" />
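String and numeric types are declared in the same way. The following is a minimal sketch that assumes the stock Solr classes solr.StrField, solr.IntPointField, and solr.LongPointField; the type names and attribute choices are illustrative and are not taken from a particular schema.

```xml
<!-- Illustrative type names and attributes; adjust to the schema in use -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="int" class="solr.IntPointField" docValues="true" />
<fieldType name="long" class="solr.LongPointField" docValues="true" />
```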

Analyzers

Analyzers reformat field values and queried terms for processing. A text analyzer maps a source string of text to a final list of tokens. This process occurs during indexing and querying. You can have different analyzer chains for indexing and querying, depending on your business needs. For example:

  • The MappingCharFilterFactory replaces special characters according to a mapping file, for example converting letters with diacritic marks to their plain equivalents.

  • The WhitespaceTokenizerFactory splits the string on whitespace into individual words or terms.

  • The StopFilterFactory removes stopwords from the token stream.

  • The LowerCaseFilterFactory changes capital letters to lowercase letters.


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

  • Tokenizers: break field data into tokens.

  • Filters: run after the tokenizer to examine the stream of tokens and either keep them as-is, transform or discard them, or create new ones. A tokenizer and filters can be combined to form a pipeline, or chain, where the output of one becomes the input for the next. A tokenizer followed by a sequence of filters (optionally preceded by char filters) is called an analyzer, and the resulting output of an analyzer is used to match query results or build indexes.
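The example components listed earlier in this section (a mapping char filter, a whitespace tokenizer, a stop filter, and a lowercase filter) can be combined into a single chain. The following is a sketch only: the type name text_folded is hypothetical, and the file names mapping-ISOLatin1Accent.txt and stopwords.txt are assumed to exist in the configuration directory.

```xml
<!-- Hypothetical type name; mapping and stopword file names are assumptions -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- 1. Char filter: rewrites characters before tokenization -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
    <!-- 2. Tokenizer: splits the text on whitespace -->
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- 3. Filters: applied to the token stream in the order listed -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

Because the analyzer element carries no type attribute, the same chain is applied at both index time and query time.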

Field

Every field must declare a unique name and associate it with one of the previously-defined types. For example:


<fields>
  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false" />
  <field name="indexOperationId" type="long" indexed="true" stored="true" multiValued="false" />
</fields>

Dynamic fields

Dynamic fields allow you to index data without declaring each field name explicitly. Instead, the name is declared as a pattern with a wildcard (*) as a prefix or suffix, and any field whose actual name matches the pattern is accepted at runtime. The field type must still be defined. For example:


<dynamicField name="*_boolean" type="boolean" indexed="true" stored="true" />

<dynamicField name="*_boolean_mv" type="boolean" indexed="true" stored="true" multiValued="true" />
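To illustrate how names are matched at runtime, the following sketch shows a document in Solr's XML update format. The field names id, inStock_boolean, and flags_boolean_mv are hypothetical, and the example assumes that an id field is declared elsewhere in the schema.

```xml
<add>
  <doc>
    <!-- Assumed to be an explicitly declared field -->
    <field name="id">product-1</field>
    <!-- Matches the *_boolean pattern -->
    <field name="inStock_boolean">true</field>
    <!-- Matches the *_boolean_mv pattern, which is multi-valued -->
    <field name="flags_boolean_mv">true</field>
    <field name="flags_boolean_mv">false</field>
  </doc>
</add>
```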

Copy fields

Copy fields are used when the content of a source field needs to be copied to, and indexed in, one or more destination fields. The following example copies the autosuggest field to the autosuggest_en destination field:

<field name="autosuggest_en" type="text_spell_en" indexed="true" stored="true" multiValued="true" />

<copyField source="autosuggest" dest="autosuggest_en" />
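The example above does not show the source field. As a sketch, the source could be declared as follows, together with a variant of the same copy rule that uses the optional maxChars attribute to cap the number of characters copied. The string type and the 256 limit are assumptions for illustration; this variant would replace the rule above, not accompany it.

```xml
<!-- Assumed source field; the type "string" is illustrative -->
<field name="autosuggest" type="string" indexed="true" stored="true" multiValued="true" />

<!-- Variant of the rule above: copy at most 256 characters (illustrative limit) -->
<copyField source="autosuggest" dest="autosuggest_en" maxChars="256" />
```

The copy takes the raw value as submitted, before any analysis, so the destination field applies its own analyzer to the copied text.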

Handlers

Handlers allow information to be sent to and retrieved from Solr.

  • The Request Handler processes search requests (select, query, and get). It uses a query parser to interpret search terms and query parameters, and a response writer to format the output as XML, JSON, or another format.

  • The Update Handler receives information from external sources (like a relational database). It transforms the incoming data into a document, then executes the indexing operation.

  • SAP Commerce Cloud adds Synonym and Stopword Handlers on top of these, allowing administration of these features through the Backoffice.
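Unlike types and fields, handlers are configured in solrconfig.xml, not in schema.xml. The following is a minimal sketch of a search request handler, assuming the standard solr.SearchHandler class; the default values shown are illustrative and are not taken from the SAP Commerce Cloud configuration.

```xml
<!-- solrconfig.xml: illustrative defaults, not a shipped configuration -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Echo only the parameters explicitly sent with the request -->
    <str name="echoParams">explicit</str>
    <!-- Response writer: return results as JSON -->
    <str name="wt">json</str>
    <!-- Number of documents returned per request -->
    <int name="rows">10</int>
  </lst>
</requestHandler>
```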