Because of the wide variations between languages, there is not a single, consistent model to follow when defining a field in Solr to properly handle a language. Some languages require their own stemming filters, others require multiple filters to handle different language characteristics (such as normalization of characters, removal of accents, and even custom lowercasing functionality), and some languages require their own tokenizers due to the complexity of parsing the language.
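To make this concrete, here is a toy Python model of an analysis chain as one tokenizer followed by a list of filters, with a different chain per language. It is not Solr's API, and everything in it is a hypothetical simplification. The "English stemmer" is a naive suffix stripper, not Porter or Snowball. The overlapping-bigram tokenizer only gestures at how text written without spaces between words, such as Chinese, is often handled.

```python
import re
import unicodedata
from typing import Callable, Dict, List, Tuple

Tokenizer = Callable[[str], List[str]]
Filter = Callable[[List[str]], List[str]]

def word_tokenize(text: str) -> List[str]:
    # Split on runs of non-word characters (a stand-in for a standard tokenizer).
    return [t for t in re.split(r"\W+", text) if t]

def cjk_bigram_tokenize(text: str) -> List[str]:
    # No spaces between words: emit overlapping character bigrams instead.
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]

def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]

def strip_accents(tokens: List[str]) -> List[str]:
    # Decompose (NFD) and drop combining marks, e.g. "café" -> "cafe".
    return ["".join(c for c in unicodedata.normalize("NFD", t)
                    if not unicodedata.combining(c)) for t in tokens]

def toy_english_stem(tokens: List[str]) -> List[str]:
    # Hypothetical, deliberately naive suffix stripping (NOT a real stemmer).
    out = []
    for t in tokens:
        for suffix in ("ing", "es", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out

# Hypothetical per-language chains: each language gets its own tokenizer/filters.
ANALYZERS: Dict[str, Tuple[Tokenizer, List[Filter]]] = {
    "en": (word_tokenize, [lowercase, toy_english_stem]),
    "fr": (word_tokenize, [lowercase, strip_accents]),
    "zh": (cjk_bigram_tokenize, []),
}

def analyze(text: str, lang: str) -> List[str]:
    tokenizer, filters = ANALYZERS[lang]
    tokens = tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens
```

For example, `analyze("Searching Documents", "en")` returns `["search", "document"]`, `analyze("Café Crème", "fr")` returns `["cafe", "creme"]`, and `analyze("搜索引擎", "zh")` returns `["搜索", "索引", "引擎"]`. Three inputs need three structurally different pipelines.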
Searching content in multiple languages
---------------------------------------
This section covers techniques for searching multilingual content, where different documents may be in different languages, or a single document or even a single field may contain several languages. There are three primary ways to implement these kinds of multilingual search capabilities:
- Create a separate field per language, and spread your query across each of them.
- Use multiple Solr indexes containing the same field name, with each index having the field configured to handle a different language.
- Implement a field type which is natively able to index and search across multiple languages at the same time.
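The first two approaches can be sketched as the query parameters a client might send to Solr. This is a minimal Python sketch. `q`, `defType=edismax`, `qf`, `df`, and `shards` are standard Solr query parameters. The field names (`content_en`, `content_fr`, `content`), core names, and host URLs are hypothetical and assume a schema set up along those lines.

```python
from typing import Dict, List
from urllib.parse import urlencode

def field_per_language_params(user_query: str, langs: List[str]) -> Dict[str, str]:
    """Approach 1: one field per language in a single index.

    Field names content_<lang> are hypothetical; edismax's qf parameter
    spreads the same user query across every listed field.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        "qf": " ".join(f"content_{lang}" for lang in langs),
    }

def core_per_language_params(user_query: str, shard_urls: List[str]) -> Dict[str, str]:
    """Approach 2: the same field name ("content") in several indexes, each
    analyzing it for a different language, queried together as one
    distributed request via the shards parameter (URLs are hypothetical)."""
    return {
        "q": user_query,
        "df": "content",
        "shards": ",".join(shard_urls),
    }

if __name__ == "__main__":
    base = "http://localhost:8983/solr"
    p1 = field_per_language_params("bank", ["en", "es", "fr"])
    print(f"{base}/docs/select?{urlencode(p1)}")
    p2 = core_per_language_params(
        "bank", ["localhost:8983/solr/docs_en", "localhost:8983/solr/docs_fr"])
    print(f"{base}/docs_en/select?{urlencode(p2)}")
```

In the first approach, each per-language field carries its own analysis chain in one schema. In the second, the field name stays the same and each index's schema decides how that field is analyzed.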