waynegraham edited this page Aug 16, 2012 · 1 revision

The SolrSearch plugin indexes English by default, but it also includes configurations for the following languages:

  • Arabic
  • Bulgarian
  • Catalan
  • CJK bigram
  • Czech
  • Danish
  • German
  • Greek
  • Spanish
  • Basque
  • Persian
  • Finnish
  • French
  • Irish
  • Galician
  • Hindi
  • Hungarian
  • Armenian
  • Indonesian
  • Italian
  • Hebrew
  • Japanese (using morphological analysis)
  • Latvian
  • Dutch
  • Norwegian
  • Portuguese
  • Romanian
  • Russian
  • Swedish
  • Thai
  • Turkish

To index a different language, change the type of SolrSearch's fulltext field in schema.xml. Find the line that reads:

<field name="fulltext" type="text_en" indexed="true" stored="false" multiValued="true"/>

Change the type attribute to the field type for your language, which is named in the format 'text_ISOcode', where ISOcode is the language's two-letter ISO 639-1 code. For example, if you have a collection of Irish Gaelic texts, change the line to the following:

<field name="fulltext" type="text_ga" indexed="true" stored="false" multiValued="true"/>
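
The type value has to match the name of a fieldType defined elsewhere in the same schema.xml. As an illustration only, an Irish definition might look roughly like the sketch below. The analyzer chain here is an assumption modeled on Solr's stock example schema, not copied from the SolrSearch plugin, so check your own schema.xml for the actual definition and the exact type name.

```xml
<!-- Illustrative sketch: the real definition in your schema.xml may differ -->
<fieldType name="text_ga" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Irish-specific lowercasing (handles prefixed n- and t-) -->
    <filter class="solr.IrishLowerCaseFilterFactory"/>
    <!-- Reduce words to their stems using the Snowball Irish stemmer -->
    <filter class="solr.SnowballPorterFilterFactory" language="Irish"/>
  </analyzer>
</fieldType>
```

If no fieldType with the name you chose exists in the file, Solr will not be able to load the schema.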

UTF-8 Support

If your texts use a non-Western character set, make sure the servlet container running Solr (e.g. Tomcat or Jetty) is configured to provide UTF-8 support. See "Configuring Tomcat to provide UTF-8 support for Solr" for a Tomcat example.
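
For Tomcat, the usual approach is to set the URIEncoding attribute on the HTTP connector in conf/server.xml. The fragment below is a minimal sketch: the port, protocol, and timeout values are placeholders assumed for illustration, and only the URIEncoding attribute is the relevant change.

```xml
<!-- conf/server.xml: port, protocol and timeout are placeholder values -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"/>
```

Restart Tomcat after editing the file so the connector picks up the new setting.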
