The Java and Alfresco World: Getting started with SOLR 5

Installing SOLR:

Prerequisite: JDK 7 or above must be installed before starting with SOLR 5.

1. Installing On Windows:

Execute the following steps to install SOLR on Windows:

1.1- Download the SOLR distribution

1.2- Go to http://www.apache.org/dyn/closer.lua/lucene/solr/5.5.0

1.3- Select a mirror and download the zip file.

e.g. http://a.mbbsindia.com/lucene/solr/5.5.0/solr-5.5.0.zip

1.4- Extract the zip file in ‘C’ drive, e.g. ‘C:\solr-5.5.0’

2. Installing On Linux:

Execute the following steps to install SOLR on Linux:

2.1- Download the SOLR distribution by executing the command below:

wget http://download.nextag.com/apache/lucene/solr/5.3.0/solr-5.3.0.tgz

2.2- After the download completes, extract the archive. To extract it under ‘/local/’, run the command below from that directory:

tar -zxvf solr-5.3.0.tgz

Starting SOLR:

You can use the commands given below to start SOLR:

solr start

solr start -all

solr start -p <port-number> (e.g. solr start -p 8983)

You can execute the following command to get usage details of solr start:

solr start -help

Verifying SOLR Status:

Execute below given command to verify the status of SOLR:

solr status

It will display following details on command prompt:

Found Solr process 32928 running on port 8983

{

"solr_home":"C:\\solr-5.5.0\\server\\solr",

"version":"5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:22:52",

"startTime":"2016-03-18T14:23:46.737Z",

"uptime":"0 days, 0 hours, 0 minutes, 4 seconds",

"memory":"57.5 MB (%11.7) of 490.7 MB"}

Running Examples:

For simplicity let’s go with OOTB examples. Solr comes with an example directory which contains some sample files we can use.

The available examples are:

· cloud: SolrCloud example

· dih: Data Import Handler (rdbms, mail, rss, tika)

· schemaless : Schema-less example (schema is inferred from data during indexing)

· techproducts : Kitchen sink example providing comprehensive examples of Solr features

Execute the following steps to run an example on windows:

1. Open command prompt and go to ‘C:\solr-5.5.0\bin’ directory

2. Use the following command to run an example:

solr -e [example]

Here [example] is one of the above given examples. Let’s run ‘techproducts’.

C:\solr-5.5.0\bin>solr -e techproducts

You should see Solr execute the steps given below in the terminal:

Creating Solr home directory C:\solr-5.5.0\example\techproducts\solr

Starting up Solr on port 8983 using command:

C:\solr-5.5.0\bin\solr.cmd start -p 8983 -s "C:\solr-5.5.0\example\techproducts\solr"

Waiting up to 30 to see Solr running on port 8983

Started Solr server on port 8983. Happy searching!

Copying configuration to new core instance directory:

C:\solr-5.5.0\example\techproducts\solr\techproducts

Creating new core 'techproducts' using command:

http://localhost:8983/solr/admin/cores?action=CREATE&name=techproducts&instanceDir=techproducts

{

"responseHeader":{

"status":0,

"QTime":3899},

"core":"techproducts"

}

Indexing tech product example docs from C:\solr-5.5.0\example\exampledocs

SimplePostTool version 5.0.0

Posting files to [base] url http://localhost:8983/solr/techproducts/update using content-type application/xml...

POSTing file gb18030-example.xml to [base]

POSTing file hd.xml to [base]

POSTing file ipod_other.xml to [base]

POSTing file ipod_video.xml to [base]

POSTing file manufacturers.xml to [base]

POSTing file mem.xml to [base]

POSTing file money.xml to [base]

POSTing file monitor.xml to [base]

POSTing file monitor2.xml to [base]

POSTing file mp500.xml to [base]

POSTing file sd500.xml to [base]

POSTing file solr.xml to [base]

POSTing file utf8-example.xml to [base]

POSTing file vidcard.xml to [base]

14 files indexed.

COMMITting Solr index changes to http://localhost:8983/solr/techproducts/update

Time spent: 0:00:01.410

Solr techproducts example launched successfully. Direct your Web browser to

http://localhost:8983/solr to visit the Solr Admin UI

Searching documents in techproducts example:

Let's try to retrieve the documents we just added as part of ‘techproducts’. Solr accepts HTTP requests, so you can use your web browser to communicate with Solr:

http://localhost:8983/solr/techproducts/select?q=currency&wt=json

Here 'q' is the query parameter and its value is ‘currency’ (a keyword from money.xml), and 'wt' is the response format and its value is 'json'. Alternatively, we can pass wt=xml to get an XML response.
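The same kind of request URL can also be assembled in code. The following is only a minimal sketch in Python: the helper name build_select_url is hypothetical, and the base URL assumes the local ‘techproducts’ example on port 8983. The optional 'fl' parameter restricts the fields returned in the response.

```python
from urllib.parse import urlencode


def build_select_url(core, query, wt="json", fields=None,
                     base="http://localhost:8983/solr"):
    """Return a /select URL for the given core (hypothetical helper)."""
    params = {"q": query, "wt": wt}
    if fields:
        # 'fl' is the comma separated list of fields to return
        params["fl"] = ",".join(fields)
    # urlencode percent-encodes characters such as ':' and ','
    return f"{base}/{core}/select?{urlencode(params)}"


print(build_select_url("techproducts", "currency"))
print(build_select_url("techproducts", "inStock:true", wt="xml",
                       fields=["name", "id", "manu"]))
```

The printed URLs can be pasted into a browser while the example core is running.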

It returns the following XML result (wt=xml is passed to get the XML response):

<response>

<lst name="responseHeader">

<int name="status">0</int>

<int name="QTime">2</int>

<lst name="params">

<str name="q">currency</str>

<str name="wt">xml</str>

</lst>

</lst><result name="response" numFound="4" start="0">

<doc>

<str name="id">USD</str>

<str name="name">One Dollar</str>

<str name="manu">Bank of America</str>

<str name="manu_id_s">boa</str>

<arr name="cat">

<str>currency</str>

</arr>

<arr name="features">

<str>Coins and notes</str>

</arr>

<str name="price_c">1,USD</str>

<bool name="inStock">true</bool>

<long name="_version_">1528426460326395904</long>

</doc>

<doc>

<str name="id">EUR</str>

<str name="name">One Euro</str>

<str name="manu">European Union</str>

<str name="manu_id_s">eu</str>

<arr name="cat">

<str>currency</str>

</arr>

<arr name="features">

<str>Coins and notes</str>

</arr>

<str name="price_c">1,EUR</str>

<bool name="inStock">true</bool>

<long name="_version_">1528426460333735936</long>

</doc>

<doc>

<str name="id">GBP</str>

<str name="name">One British Pound</str>

<str name="manu">U.K.</str>

<str name="manu_id_s">uk</str>

<arr name="cat">

<str>currency</str>

</arr>

<arr name="features">

<str>Coins and notes</str>

</arr>

<str name="price_c">1,GBP</str>

<bool name="inStock">true</bool>

<long name="_version_">1528426460334784512</long>

</doc>

<doc>

<str name="id">NOK</str>

<str name="name">One Krone</str>

<str name="manu">Bank of Norway</str>

<str name="manu_id_s">nor</str>

<arr name="cat">

<str>currency</str>

</arr>

<arr name="features">

<str>Coins and notes</str>

</arr>

<str name="price_c">1,NOK</str>

<bool name="inStock">true</bool>

<long name="_version_">1528426460335833088</long>

</doc>

</result></response>
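If you want to consume such an XML response in a program, it can be parsed with any XML library. Below is a small sketch in Python using the standard library; the SAMPLE text is a shortened copy of the response above and parse_docs is a hypothetical helper name. Multi-valued <arr> fields are skipped to keep the example short.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response shown above (two docs only)
SAMPLE = """<response>
<lst name="responseHeader"><int name="status">0</int></lst>
<result name="response" numFound="2" start="0">
<doc><str name="id">USD</str><str name="name">One Dollar</str></doc>
<doc><str name="id">EUR</str><str name="name">One Euro</str></doc>
</result></response>"""


def parse_docs(xml_text):
    """Return (numFound, list of {field name: value}) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        # keep single-valued fields only; <arr> elements hold multiple values
        docs.append({el.get("name"): el.text for el in doc if el.tag != "arr"})
    return int(result.get("numFound")), docs


num_found, docs = parse_docs(SAMPLE)
print(num_found, [d["id"] for d in docs])
```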

Now let’s search for all documents having ‘inStock=true’ and return only the name, id and manu fields in the result.

http://localhost:8983/solr/techproducts/select?q=inStock:true&wt=xml&fl=name,id,manu

It will return following result:

<response>

<lst name="responseHeader">

<int name="status">0</int>

<int name="QTime">1</int>

<lst name="params">

<str name="fl">name,id,manu</str>

<str name="q">inStock:true</str>

<str name="wt">xml</str>

</lst>

</lst>

<result name="response" numFound="17" start="0">

<doc>

<str name="id">GB18030TEST</str>

<str name="name">Test with some GB18030 encoded characters</str>

</doc>

<doc>

<str name="id">SP2514N</str>

<str name="name">

Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133

</str>

<str name="manu">Samsung Electronics Co. Ltd.</str>

</doc>

<doc>

<str name="id">6H500F0</str>

<str name="name">

Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300

</str>

<str name="manu">Maxtor Corp.</str>

</doc>

<doc>

<str name="id">MA147LL/A</str>

<str name="name">Apple 60 GB iPod with Video Playback Black</str>

<str name="manu">Apple Computer Inc.</str>

</doc>

<doc>

<str name="id">TWINX2048-3200PRO</str>

<str name="name">

CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail

</str>

<str name="manu">Corsair Microsystems Inc.</str>

</doc>

<doc>

<str name="id">VS1GB400C3</str>

<str name="name">

CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail

</str>

<str name="manu">Corsair Microsystems Inc.</str>

</doc>

<doc>

<str name="id">VDBDB1A16</str>

<str name="name">

A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM

</str>

<str name="manu">A-DATA Technology Inc.</str>

</doc>

<doc>

<str name="id">3007WFP</str>

<str name="name">Dell Widescreen UltraSharp 3007WFP</str>

<str name="manu">Dell, Inc.</str>

</doc>

<doc>

<str name="id">VA902B</str>

<str name="name">ViewSonic VA902B - flat panel display - TFT - 19"</str>

<str name="manu">ViewSonic Corp.</str>

</doc>

<doc>

<str name="id">0579B002</str>

<str name="name">Canon PIXMA MP500 All-In-One Photo Printer</str>

<str name="manu">Canon Inc.</str>

</doc>

</result></response>

Shutting down SOLR:

To shut down Solr, go to the ‘<solr-installation-directory>/bin’ directory and use the commands given below. You can specify the port of a specific Solr instance if you want to shut down only that instance, or use ‘-all’ to shut down all instances.

solr stop -all

solr stop -p <port-number> e.g. solr stop -p 8983

Setting SOLR HOME:

You can set the Solr home directory (the ‘solr.solr.home’ system property) while starting Solr. To find the current location of “solr_home”, start Solr using the “solr start” command and execute “solr status”; it will display the “solr_home” path on the console. By default it is ‘<solr-installation-directory>/server/solr’.

To set a new “solr_home” directory, execute one of the following commands while starting Solr.

solr start -s C:\mycores

solr start -s C:\mycores -p 8983 (if running the instance on a specific port)

This will set the “solr_home” to ‘C:\mycores’. Note that before you set up your Solr home directory, you should create “solr.xml” inside the directory which you want to make solr_home. The solr.xml should contain at least the configuration given below.

<solr>

<solrcloud>

<str name="host">${host:}</str>

<str name="hostContext">${hostContext:solr}</str>

<int name="hostPort">${jetty.port:8984}</int>

</solrcloud>

</solr>

Creating Custom Cores:

You can create your own ‘cores’ in the following ways:

1- Create schema.xml, solrconfig.xml and core.properties under the ‘<solr-installation-directory>/server/solr’ directory.

The core will be loaded automatically when the Solr server starts. In this case it is mandatory to create the configurations inside the ‘<solr-installation-directory>/server/solr’ directory as given above.

· Go to the ‘<solr-installation-directory>/server/solr’ directory. It is the default solr_home directory, where all Solr cores are created by default. You will also notice that there is a ‘solr.xml’ file which comes with OOTB Solr. This ‘solr.xml’ file is for configuring one or more Solr cores, as well as allowing cores to be added, removed, and reloaded via HTTP requests. ‘solr.xml’ contains a ‘<solrcloud/>’ configuration. On startup, Solr searches all subfolders of solr_home for core.properties files and loads the cores they define. You can also create your own configuration in ‘solr.xml’.

We will create our own solr.xml later. For more info see: http://wiki.apache.org/solr/CoreAdmin

· Create a directory, e.g. ‘books_search’. The name of this directory and the name of the ‘core’ should be the same. In other words, create the directory with the same name that you want to use for the Solr ‘core’.

· Go to the newly created directory ‘books_search’, e.g. ‘<solr-installation-directory>/server/solr/books_search’ or ‘C:\solr-5.5.0\server\solr\books_search’.

· Create a directory with the name ‘data’. This directory will be used by Solr to maintain indexes for the data created under your core.

· Create a directory with the name ‘conf’. Here we will have core-specific configurations such as ‘schema.xml’ and ‘solrconfig.xml’.

· Go to the newly created directory ‘conf’ and create ‘solrconfig.xml’.

Visit http://wiki.apache.org/solr/SolrConfigXml for details on configurations.

For now we will keep the configuration given below.

<?xml version='1.0' encoding='UTF-8' ?>

<!-- For more details about configurations options that may appear in

this file, see http://wiki.apache.org/solr/SolrConfigXml-->

<config>

<luceneMatchVersion>5.5.0</luceneMatchVersion>

<!-- 'dataDir' parameter is used to specify an alternate directory to hold all index data other than the default './data' under the Solr home. If replication is in use, this should match the replication configuration. If this directory is not absolute, then it is relative to the directory you're in when you start SOLR.

If you do not specify dataDir then 'data' folder will be automatically created under your core parallel to 'conf' folder.-->

<!-- We need to register query handler in order to query data.

Here 'standard' request handler is a query handler and implicitly mapped on path '/select' -->

<requestHandler name="standard" class="solr.StandardRequestHandler" />

<admin>

<defaultQuery>*:*</defaultQuery>

</admin>

</config>

· Go to the ‘conf’ directory and create ‘schema.xml’. For details on field configuration visit: http://wiki.apache.org/solr/SchemaXml

For now we will keep the following configuration.

<?xml version='1.0' encoding='UTF-8' ?>

<schema name='books_search' version='1.6'>

<types><fieldtype name="string" class="solr.StrField" />

<fieldtype name="long" class="solr.TrieLongField" />

</types>

<fields>

<!-- indexed=true makes a field searchable (and sortable and facetable).

For e.g., if you have a field named test1 with indexed=true, then you can search it like q=test1:foo, where foo is the value you are searching for.

If indexed=false for field test1 then that query will return no results, even if you have a document in Solr with test1's value being foo. -->

<field name="id" type="long" indexed="true" stored="true" required="true" />

<field name="title" type="string" indexed="true" stored="true" required="true" />

<field name="author" type="string" indexed="true" stored="true" required="false"/>

<dynamicField name="*_str" type="string" indexed="true" stored="true" multiValued="true"/>

<field name="fullText" type="string" indexed="true" multiValued="true"/></fields>

<copyField source="*" dest="fullText"/>

<uniqueKey>id</uniqueKey>

</schema>

· Go back to the ‘<solr-installation-directory>/server/solr/books_search’ directory, e.g. ‘C:\solr-5.5.0\server\solr\books_search’.

· Create a ‘core.properties’ file under the ‘books_search’ directory and add the following configuration. Please note that this file should contain at least the ‘name’ property. Other properties take their default values if not provided in core.properties. E.g. if ‘config’ is not provided, SOLR will search for ‘solrconfig.xml’ (the default Solr config file name) in the <core>/conf directory.

#For more details on core.properties visit:

#https://cwiki.apache.org/confluence/display/solr/Defining+core.properties

#Name of the core

name=books_search

#SOLR config xml name. It can be any name, e.g. mysolrconfig.xml, but it must match the file created under 'conf'.

config=solrconfig.xml

#SOLR schema xml name. It can be any name, e.g. myschema.xml, but it must match the file created under 'conf'.

schema=schema.xml

#Tells SOLR to load this core on startup.

loadOnStartup=true

#Data Directory, we can provide data directory here as well.

#dataDir=C:\mycores\solr\books_search\data

· Start the SOLR instance using the ‘solr start -p 8983’ command. It should automatically load the core configuration, and you can see the newly created core in the Admin UI at http://localhost:8983/solr/#/~cores/.

Note: You should either use this way to create cores or use the approaches given below. Otherwise, the “solr.xml” in the default solr_home will override the “solr.xml” used in the approaches below. In other words, the “solr.xml” in the ‘<solr-installation-directory>/server/solr’ directory (e.g. ‘C:\solr-5.5.0\server\solr’) will take precedence.
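The directory layout described in the steps above can also be created with a short script. This is only a sketch under assumptions: scaffold_core is a hypothetical helper, it writes just the ‘name’ and ‘loadOnStartup’ properties, and it does not create ‘solrconfig.xml’ or ‘schema.xml’, which still have to be placed in ‘conf’ as shown above. The demo runs against a temporary directory instead of a real solr_home.

```python
import tempfile
from pathlib import Path


def scaffold_core(solr_home, core_name):
    """Create <solr_home>/<core_name> with conf, data and core.properties."""
    core_dir = Path(solr_home) / core_name
    (core_dir / "conf").mkdir(parents=True, exist_ok=True)  # core configuration
    (core_dir / "data").mkdir(exist_ok=True)                # index data
    # The core name matches the directory name, as recommended above
    (core_dir / "core.properties").write_text(
        f"name={core_name}\nloadOnStartup=true\n", encoding="utf-8")
    return core_dir


# Demo in a throwaway directory; pass your real solr_home instead
with tempfile.TemporaryDirectory() as home:
    core = scaffold_core(home, "books_search")
    print(sorted(p.name for p in core.iterdir()))
```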

2. Create solr.xml, schema.xml and solrconfig.xml outside the Solr installation directory.

· Go to any preferred location on the file system, e.g. the ‘/local/mycores/solr’ or ‘C:\mycores\solr’ (on Windows) directory.

· Set the solr_home to ‘/local/mycores/solr’ or ‘C:\mycores\solr’ (on Windows) and start SOLR.

Use the ‘solr start -s C:\mycores\solr’ command to set solr_home and start SOLR.

· Create ‘solr.xml’ (as discussed above, we can create our own configuration) with the contents given below. This enables automatic discovery of Solr cores within the ‘/local/mycores/solr’ or ‘C:\mycores\solr’ (on Windows) directory.

<?xml version="1.0" encoding="UTF-8" ?>

<!--

This is a sample of a simple "solr.xml" file for configuring one or

more Solr Cores, as well as allowing Cores to be added, removed, and

reloaded via HTTP requests. More information about options available in this configuration file, and Solr Core administration can be found online: http://wiki.apache.org/solr/CoreAdmin

-->

<solr>

<solrcloud/>

</solr>

· Create a directory, e.g. ‘search’. The name of this directory and the name of the ‘core’ should be the same. In other words, create the directory with the same name that you want to use for the Solr ‘core’.

· Go to the newly created directory ‘search’, e.g. ‘/local/mycores/solr/search’ or ‘C:\mycores\solr\search’ (on Windows).

· Create a directory with the name ‘data’. This directory will be used by Solr to maintain indexes for the data created under your core.

· Create a directory with the name ‘conf’. Here we will have core-specific configurations such as ‘schema.xml’ and ‘solrconfig.xml’.

· Go to the newly created directory ‘conf’ and create ‘solrconfig.xml’. Visit http://wiki.apache.org/solr/SolrConfigXml for details on configurations.

For now we will keep the configuration given below.

<?xml version='1.0' encoding='UTF-8' ?>

<config>

<luceneMatchVersion>5.5.0</luceneMatchVersion>

<!-- Please note that the data directory should already exist, otherwise Solr will throw an error -->

<!-- We need to register query handler in order to query data.

Here 'standard' request handler is a query handler and implicitly mapped on path '/select' -->

<requestHandler name="standard" class="solr.StandardRequestHandler" />

<admin>

<defaultQuery>*:*</defaultQuery>

</admin>

</config>

· Go to the ‘conf’ directory and create ‘schema.xml’. For details on field configuration visit: http://wiki.apache.org/solr/SchemaXml

For now we will keep the following configuration.

<?xml version='1.0' encoding='UTF-8' ?>

<schema name='search' version='1.6'>

<types>

<fieldtype name="string" class="solr.StrField" />

<fieldtype name="long" class="solr.TrieLongField" />

</types>

<fields>

<field name="id" type="long" indexed="true" stored="true" required="true" />

<field name="title" type="string" indexed="true" stored="true" required="true" />

<field name="author" type="string" indexed="true" stored="true" required="false"/>

<dynamicField name="*_str" type="string" indexed="true" stored="true" multiValued="true"/>

<field name="fullText" type="string" indexed="true" multiValued="true"/>

</fields>

<copyField source="*" dest="fullText"/>

<uniqueKey>id</uniqueKey></schema>

· Go to http://localhost:8983/solr/#/~cores

· Click on ‘Add Core’.

· Provide the core ‘name’, ‘instanceDir’ (instance directory) and ‘dataDir’ (data directory) values and leave the other fields as they are. Here I am creating a core with the name ‘search’.

· Click on ‘Add Core’; it will create the new core ‘search’. You can view the core details by visiting: http://localhost:8983/solr/#/~cores/search

3. Create solr.xml, schema.xml and solrconfig.xml outside the Solr installation directory, and use the command line.

· Go to any preferred location on the file system, e.g. the ‘/local/mycores/solr’ or ‘C:\mycores\solr’ (on Windows) directory.

· Set the solr_home to ‘/local/mycores/solr’ or ‘C:\mycores\solr’ (on Windows) and start SOLR. Use the ‘solr start -s C:\mycores\solr’ command to set solr_home and start SOLR.

· Create ‘solr.xml’ with the contents given below.

<?xml version="1.0" encoding="UTF-8" ?>

<solr>

<solrcloud/>

</solr>

· Create a directory, e.g. ‘example’. The name of this directory and the name of the core should be the same. In other words, create the directory with the same name that you want to use for the Solr core.

· Go to the newly created directory ‘example’, e.g. ‘/local/mycores/solr/example’ or ‘C:\mycores\solr\example’ (on Windows).

· Create a directory with the name ‘data’. This directory will be used by Solr to maintain indexes for the data created under your core.

· Create a directory with the name ‘conf’. Here we will have core-specific configurations such as ‘schema.xml’ and ‘solrconfig.xml’.

· Go to the newly created directory ‘conf’ and create ‘solrconfig.xml’. Visit http://wiki.apache.org/solr/SolrConfigXml for details on configurations.

For now we will keep the configuration given below.

<?xml version='1.0' encoding='UTF-8' ?>

<config>

<luceneMatchVersion>5.5.0</luceneMatchVersion>

<dataDir>C:\mycores\solr\example\data</dataDir>

<!-- We need to register query handler in order to query data.

Here 'standard' request handler is a query handler and implicitly mapped on path '/select' -->

<requestHandler name="standard" class="solr.StandardRequestHandler" />

<admin>

<defaultQuery>*:*</defaultQuery>

</admin>

</config>

· Go to the ‘conf’ directory and create ‘schema.xml’. For details on field configuration visit: http://wiki.apache.org/solr/SchemaXml

For now we will keep the following configuration.

<?xml version='1.0' encoding='UTF-8' ?>

<schema name='example' version='1.6'>

<types>

<fieldtype name="string" class="solr.StrField" />

<fieldtype name="long" class="solr.TrieLongField" />

</types>

<fields>

<field name="id" type="long" indexed="true" stored="true" required="true" />

<field name="title" type="string" indexed="true" stored="true" required="true" />

<field name="author" type="string" indexed="true" stored="true" required="false"/>

<dynamicField name="*_str" type="string" indexed="true" stored="true" multiValued="true"/>

<field name="fullText" type="string" indexed="true" multiValued="true"/>

</fields>

<copyField source="*" dest="fullText"/>

<uniqueKey>id</uniqueKey>

</schema>

· Open a command prompt, go to ‘<solr-installation-directory>/bin’ and type the command given below.

solr create_core [-c name] [-d confdir] [-p port]

e.g. solr create_core -c example -d C:\mycores\solr\example -p 8983

or (for Linux)

solr create_core -c example -d /local/mycores/solr/example -p 8983

The Solr create command has the following options:

· -c <name> - Name of the core or collection to create (required).

· -d <confdir> - The configuration directory, useful in the SolrCloud mode.

· -n <configName> - The configuration name. This defaults to the same name as the core or collection.

· -p <port> - Port of a local Solr instance to send the create command to; by default the script tries to detect the port by looking for running Solr instances.

· -s <shards> - Number of shards to split a collection into; the default is 1.

· -rf <replicas> - Number of copies of each document in the collection. The default is 1.

· You will see the response below after executing the above command:

Copying configuration to new core instance directory:

C:\solr-5.5.0\server\solr\example

Creating new core 'example' using command:

http://localhost:8983/solr/admin/cores?action=CREATE&name=example&instanceDir=example

{

"responseHeader":{

"status":0,

"QTime":218},

"core":"example"

}

· It will create the new core ‘example’. You can view the core details by visiting: http://localhost:8983/solr/#/~cores/example

4. Create solr.xml, schema.xml and solrconfig.xml outside the Solr installation directory, and use an HTTP REST call.

· Go to any preferred location on the file system, e.g. the ‘/local/mycores/solr’ or ‘C:\mycores\solr’ (on Windows) directory.

· Create ‘solr.xml’ with the contents given below.

<?xml version="1.0" encoding="UTF-8" ?>

<solr>

<solrcloud/>

</solr>

· Create a directory, e.g. ‘example2’. The name of this directory and the name of the core should be the same. In other words, create the directory with the same name that you want to use for the Solr core.

· Go to the newly created directory ‘example2’, e.g. ‘/local/mycores/solr/example2’ or ‘C:\mycores\solr\example2’ (on Windows).

· Create a directory with the name ‘data’. This directory will be used by Solr to maintain indexes for the data created under your core.

· Create a directory with the name ‘conf’. Here we will have core-specific configurations such as ‘schema.xml’ and ‘solrconfig.xml’.

· Go to the newly created directory ‘conf’ and create ‘solrconfig.xml’. Visit http://wiki.apache.org/solr/SolrConfigXml for details on configurations.

For now we will keep the configuration given below.

<?xml version='1.0' encoding='UTF-8' ?>

<config>

<luceneMatchVersion>5.5.0</luceneMatchVersion>

<dataDir>C:\mycores\solr\example2\data</dataDir>

<!-- We need to register query handler in order to query data.

Here 'standard' request handler is a query handler and implicitly mapped on path '/select' -->

<requestHandler name="standard" class="solr.StandardRequestHandler" />

<admin>

<defaultQuery>*:*</defaultQuery>

</admin>

</config>

· Go to the ‘conf’ directory and create ‘schema.xml’. For details on field configuration visit: http://wiki.apache.org/solr/SchemaXml

For now we will keep the following configuration.

<?xml version='1.0' encoding='UTF-8' ?>

<schema name='example2' version='1.6'>

<types>

<fieldtype name="string" class="solr.StrField" />

<fieldtype name="long" class="solr.TrieLongField" />

</types>

<fields>

<field name="id" type="long" indexed="true" stored="true" required="true" />

<field name="title" type="string" indexed="true" stored="true" required="true" />

<field name="author" type="string" indexed="true" stored="true" required="false"/>

<dynamicField name="*_str" type="string" indexed="true" stored="true" multiValued="true"/>

<field name="fullText" type="string" indexed="true" multiValued="true"/>

</fields>

<copyField source="*" dest="fullText"/>

<uniqueKey>id</uniqueKey>

</schema>

· Open a web browser and use the REST URL below to create the core.

http://localhost:8983/solr/admin/cores?action=CREATE&name=example2&instanceDir=C:\mycores\solr\example2&dataDir=C:\mycores\solr\example2\data

· It will create a new core with the name ‘example2’ and will return the response below.

<response>

<lst name="responseHeader">

<int name="status">0</int>

<int name="QTime">162</int>

</lst>

<str name="core">example2</str>

</response>

· You can view the core details by visiting: http://localhost:8983/solr/#/~cores/example2
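When the CREATE URL is built in a program rather than typed into the browser, the Windows paths should be URL-encoded. Below is a minimal Python sketch; the helper name core_create_url is hypothetical, and the parameter names (action, name, instanceDir, dataDir) are the ones used in the URL above.

```python
from urllib.parse import urlencode


def core_create_url(name, instance_dir, data_dir=None,
                    base="http://localhost:8983/solr"):
    """Return the CoreAdmin CREATE URL for a core (hypothetical helper)."""
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    if data_dir:
        params["dataDir"] = data_dir
    # urlencode escapes ':' and '\' so that Windows paths survive in the URL
    return f"{base}/admin/cores?{urlencode(params)}"


print(core_create_url("example2",
                      r"C:\mycores\solr\example2",
                      r"C:\mycores\solr\example2\data"))
```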

Creating documents and indexes:

In Solr, a Document is the unit of search and index. Solr can achieve fast search results because instead of searching the actual text it searches an index.

This is similar to finding pages in a book related to a keyword by scanning the index/glossary at the back of a book, instead of searching every word of every page of the book.

This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages).

Solr stores this index in a directory called “index” in the data directory.
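To make the idea of an inverted index concrete, here is a toy sketch in Python. It only illustrates the word->pages structure described above and is not how Solr or Lucene actually store their index; the sample pages are made up for the illustration.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lower-cased word to the set of page numbers containing it."""
    index = defaultdict(set)
    for page_no, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_no)
    return index


# page -> words (the page-centric structure)
pages = {
    1: "Solr is a search server",
    2: "Lucene powers Solr",
    3: "Search with Lucene",
}

index = build_inverted_index(pages)   # word -> pages (keyword-centric)
print(sorted(index["solr"]))     # [1, 2]
print(sorted(index["search"]))   # [1, 3]
```

Looking up a keyword is now a single dictionary access instead of a scan over every page.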

Schema:

Before adding documents to Solr, you need to specify the schema, represented in a file called schema.xml. Please note that changing the schema after documents have been added to the index is not advisable, since documents that are already indexed generally need to be reindexed for the change to take effect.

The schema declares:

What type of fields there are
Which field should be used as the unique/primary key
Which fields are required
How to index and search each field

Field Types:

In Solr, every field has a type. Examples of basic field types available in Solr include float, long, double, date and text.

Defining a field:

A field in the SOLR schema will look something like this:

<field name="id" type="long" indexed="true" stored="true" multiValued="false"/>

name: Name of the field
type: Field type
indexed: Should this field be added to the inverted index?
stored: Should the original value of this field be stored?
multiValued: Can this field have multiple values?

The indexed and stored attributes are important. "indexed=true" makes a field searchable (and sortable and facetable).

For example, if you have a field named "title" with indexed=true, then you can search it like "q=title:Java", where "Java" is the value you are searching for. If indexed=false for the field "title", then that query will return no results, even if you have a document in Solr whose title is "Java".

Whereas "stored=true" means you can retrieve the field when you search. If you want to explicitly retrieve the value of a field in your query, use the "fl" param in your query, like "fl=title" (the default is fl=*, meaning retrieve all stored fields). The value of title will be returned only if "stored=true" is set for it.

It is also advised not to store every field, because storing fields increases the size of the index, and the larger the index, the slower the search. A larger index requires more disk IO to get to the same amount of data.

What happens when you load documents to SOLR for indexing?

When you add documents to SOLR, the field values go through various transformations, such as lower-casing and stemming (reducing words to their stems), before being added to the index.

This is known as the analysis phase. After the analysis phase, a sequence of tokens is generated and added to the index. Tokens are not the original text; they are the keywords that search queries are matched against. Indexed fields (fields configured with indexed="true" in schema.xml) are the fields that undergo the analysis phase and are added to the index at the end.

Also note that, if a field is not indexed, it cannot be searched on.
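The analysis phase can be pictured with a toy analyzer. The Python sketch below is purely illustrative and is not Solr's analyzer chain: it lower-cases the text, splits it into alphanumeric tokens and applies a deliberately naive "stemming" rule (dropping a trailing "s" from words longer than four characters).

```python
import re


def analyze(text):
    """Toy analysis: lower-case, tokenize, then apply naive stemming."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    stemmed = []
    for tok in tokens:
        # naive rule for illustration only: "books" -> "book"
        if len(tok) > 4 and tok.endswith("s"):
            tok = tok[:-1]
        stemmed.append(tok)
    return stemmed


print(analyze("This is a Java book. Books about Java!"))
# ['this', 'is', 'a', 'java', 'book', 'book', 'about', 'java']
```

Because "book" and "Books" end up as the same token, a search for either form can match both sentences.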

Let’s say that I have created a new core called ‘books’. If you open the Solr admin UI and open the core overview, you will notice that the “numDocs” and “maxDoc” counts are 0. It means no documents are loaded. Now let’s load documents to create indexes. We can load XML, JSON and CSV into SOLR, and Solr will create indexes based on the schema defined in “schema.xml” (refer to the schema definitions explained above).

SOLR accepts XML, JSON and CSV documents in a specified format, which we need to follow when creating files for loading. So let’s see what the XML, JSON and CSV files look like.

Format of XML:

We need to create the input XML for SOLR in the format below. The root element is always <add>, and inside the <add> element you can have multiple <doc> elements. Also note that the input XML should be aligned with the “schema.xml” used while creating the core.

If you compare the sample below with the schema.xml used above while creating cores, you will notice that the field names are the same. We have to use exactly the same field names inside the <doc> element.

<add>

<doc>

<field name="id">101</field>

<field name="title">Java</field>

<field name="author">Balaguruswami</field>

<field name="language_str">English</field>

<field name="language_str">French</field>

<field name="bookstore_str">Hyderabad</field>

<field name="bookstore_str">Bangalore</field>

<field name="fullText">This is a java book.</field>

....

</doc>

<doc>

<field name="id">102</field>

<field name="title">C</field>

<field name="author">Yashwant Kanetkar</field>

<field name="language_str">English</field>

<field name="language_str">French</field>

<field name="bookstore_str">Hyderabad</field>

<field name="bookstore_str">Delhi</field>

<field name="fullText">This is a C book.</field>

...

</doc>

...more <doc> elements

</add>
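Instead of writing the <add> XML by hand, it can be generated from data. The following Python sketch uses the standard library; build_add_xml is a hypothetical helper, and a list value is written as repeated <field> elements, which is how the multi-valued fields appear in the sample above.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build an <add> document from a list of {field name: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # a list becomes one <field> element per value
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


books = [{
    "id": 101,
    "title": "Java",
    "author": "Balaguruswami",
    "language_str": ["English", "French"],
}]
print(build_add_xml(books))
```

ElementTree also escapes characters such as & and < in field values, which is easy to forget when building the XML as plain strings.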

Format of JSON:

We need to create JSON in below given format.

[

{

"id" : "103",

"title" : "The Lightning Thief",

"author" : "Rick Riordan",

"language_str" : ["English", "French"],

"bookstore_str" : ["London", "Paris"]

},

{

"id" : "104",

"title" : "The Sea of Monsters",

"author" : "Rick Riordan",

"language_str" : ["English", "French"],

"bookstore_str" : ["London", "Paris"]

}

…..more json data

]
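A JSON file in this shape can be produced with any JSON library, which avoids hand-editing mistakes such as missing commas or mismatched quotes. Below is a small Python sketch; to_solr_json is a hypothetical helper, the REQUIRED tuple mirrors the required="true" fields of the schema.xml used earlier, and multi-valued fields are written as JSON arrays.

```python
import json

# Fields declared with required="true" in the schema.xml used above
REQUIRED = ("id", "title")


def to_solr_json(docs):
    """Serialize documents as a JSON array after a basic required-field check."""
    for doc in docs:
        missing = [f for f in REQUIRED if f not in doc]
        if missing:
            raise ValueError(f"document is missing required fields: {missing}")
    return json.dumps(docs, indent=2)


books = [
    {"id": "103", "title": "The Lightning Thief", "author": "Rick Riordan",
     "language_str": ["English", "French"],
     "bookstore_str": ["London", "Paris"]},
]
print(to_solr_json(books))
```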

Format of CSV:

Create a CSV file (comma separated, as given below) with column names as per the fields defined in schema.xml. Values that contain a comma must be enclosed in double quotes. See the sample below.

id,title,author,language_str,bookstore_str,fullText

105,A Storm of Swords,George R.R. Martin,English,New york,"A Storm of Swords is the third of seven planned novels in A Song of Ice and Fire, a fantasy series by American author"

106,The Black Company,Glen Cook,English,New york,The Black Company is a series of fantasy novels by author Glen Cook. The series combines elements of epic fantasy and dark fantasy as it follows an elite mercenary unit
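Values that contain commas are the usual source of broken CSV input. The following sketch, which assumes the column names from the sample above and a hypothetical helper named to_csv, uses Python's csv module so that such values are quoted automatically.

```python
import csv
import io

# Column names as used in the CSV sample above
FIELDS = ["id", "title", "author", "language_str", "bookstore_str", "fullText"]


def to_csv(rows):
    """Write rows (dicts keyed by FIELDS) as CSV text, quoting values when needed."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [{
    "id": "105",
    "title": "A Storm of Swords",
    "author": "George R.R. Martin",
    "language_str": "English",
    "bookstore_str": "New york",
    "fullText": "The third novel in A Song of Ice and Fire, a fantasy series",
}]
csv_text = to_csv(rows)
print(csv_text)
```

Only the fullText value is wrapped in double quotes here, because it is the only one that contains a comma.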

Documents can be loaded into SOLR in the following ways:

1- Using command line:

We can load documents into SOLR from the command line.

· Go to <solr-installation-directory>/example/exampledocs, e.g. C:\solr-5.5.0\example\exampledocs (on Windows).

· Find the “post.jar” file and copy it to your data input directory, e.g. “C:\solr_input_data”.

Use the following command to get usage details for “post.jar”. It is a simple Java program that takes input data, parses it and uploads it to SOLR over its HTTP REST services.

java -jar post.jar --help

· Run the command below for XML:

java -Dc=books -Dtype=application/xml -jar post.jar books.xml

java -Dc=books -Dtype=application/xml -jar post.jar *.xml (To upload multiple XML files)

· You will see the following response once the data is loaded.

SimplePostTool version 5.0.0

Posting files to [base] url http://localhost:8983/solr/books/update using content-type application/xml...

POSTing file books.xml to [base]

1 files indexed.

COMMITting Solr index changes to http://localhost:8983/solr/books/update...

Time spent: 0:00:00.669

· Now let’s see the status of core ‘books’ on the SOLR admin UI.

· On the core overview page you can see that the ‘numDocs’ and ‘maxDoc’ counts have increased to 2, since we added two <doc> elements (refer to the sample given above). Other file types such as JSON and CSV can be uploaded in the same way.

· Run the command below for JSON:

java -Dc=books -Dtype=application/json -jar post.jar books.json

java -Dc=books -Dtype=application/json -jar post.jar *.json (To upload multiple JSON docs)

· Run the command below for CSV:

java -Dc=books -Dtype=text/csv -jar post.jar books.csv

java -Dc=books -Dtype=text/csv -jar post.jar *.csv (To upload multiple CSV docs)

· Now if you look at the core details on the admin UI, you will notice that the number of documents has increased to 6, because we added 2 documents using XML, 2 using JSON and 2 using CSV.

2- Using Http REST Client or CURL:

We can load documents into SOLR using the curl command or any REST client such as SoapUI or Postman.

· Open a command prompt and run the curl commands below to load documents into SOLR.

Syntax:

curl -X POST "http://<host>:<port>/solr/<core-name>/update?commit=true" -H 'Content-Type:<MimeType-Of-Document>' -d @<document-name>

For XML:

curl -X POST "http://localhost:8983/solr/books/update?commit=true" -H 'Content-Type:application/xml' -d @books.xml

For JSON:

curl -X POST "http://localhost:8983/solr/books/update?commit=true" -H 'Content-Type:application/json' -d @books.json

For CSV:

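curl -X POST "http://localhost:8983/solr/books/update?commit=true" -H 'Content-Type:text/csv' --data-binary @books.csv

Note: --data-binary is used for CSV because -d strips line breaks from the file, which would merge all the CSV rows into one line.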
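The same POST can be prepared from a program. The sketch below uses only Python's standard urllib.request module and assumes the local SOLR address and the ‘books’ core used in the curl examples; the constant SOLR_BASE and the function build_update_request are names made up for this illustration. It only builds the request, and sending it requires a running SOLR instance.

```python
import urllib.request

# Assumed local SOLR address, matching the curl examples above
SOLR_BASE = "http://localhost:8983/solr"


def build_update_request(core, payload, content_type):
    """Prepare a POST to <core>/update?commit=true with the given body and MIME type."""
    url = f"{SOLR_BASE}/{core}/update?commit=true"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


req = build_update_request(
    "books",
    '<add><doc><field name="id">101</field></doc></add>',
    "application/xml",
)
print(req.get_method(), req.full_url)
# To actually send it (needs a running SOLR): urllib.request.urlopen(req)
```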

3- Using SOLR Admin UI:

We can load documents into SOLR using the admin UI.

· Open the admin UI (http://localhost:8983/solr/#/)

· Go to the ‘Core Selector’ dropdown menu.

· Select the core where you want to upload documents, e.g. books.

· Once you select the core, the menu for that core is displayed.

· Go to the Documents menu to open the document upload screen.

· The screen shows the "/update" request handler, which posts the documents to SOLR in the selected format such as JSON, XML or CSV (see the Document Type dropdown). It is the same as calling "http://localhost:8983/solr/books/update" from an HTTP REST client. Let’s upload a document in JSON format.

· Add a JSON document in the "Documents" text box, select "JSON" as the "Document Type", leave the other fields at their default values and click on “Submit Document”.

· Add XML documents in the "Documents" text box, select "XML" as the "Document Type", leave the other fields at their default values and click on "Submit Document".

· Add a CSV (comma separated text) document in the "Documents" text box, select "CSV" as the "Document Type", leave the other fields at their default values and click on "Submit Document".

There are other ways to load documents into SOLR, such as:

Index binary documents such as Word and PDF with Solr Cell (ExtractingRequestHandler). Solr does not store the PDF file itself. However, it can store the text content extracted from the PDF by a text extractor such as Tika (provided the field is marked as stored in the schema). If you wish to store the PDF file in its entirety, you need to convert it into (for example) a Base64 representation and persist the Base64 string as a stored field. When you access the document, you convert it back from Base64 to PDF.
Use SolrJ for Java or other Solr clients to programmatically create documents to send to Solr.
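The Base64 round trip mentioned above for keeping a whole PDF in a stored field can be sketched as follows. This is only an illustration of the encoding step: the function names and the placeholder bytes are assumptions, and nothing here talks to SOLR.

```python
import base64


def encode_file_bytes(raw):
    """Turn raw file bytes into a Base64 string suitable for a stored text field."""
    return base64.b64encode(raw).decode("ascii")


def decode_file_bytes(stored):
    """Recover the original file bytes from the stored Base64 string."""
    return base64.b64decode(stored)


# Placeholder bytes standing in for the contents of a real PDF file
pdf_bytes = b"%PDF-1.4 placeholder content"
stored_value = encode_file_bytes(pdf_bytes)
restored = decode_file_bytes(stored_value)
print(restored == pdf_bytes)
```

Base64 text is roughly a third larger than the original bytes, so this approach makes the index noticeably bigger for large files.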

Search:

Let’s search the documents we just uploaded.

Open the SOLR admin UI: http://localhost:8983/solr
Select the core (e.g. books) and go to the Query menu. You can also navigate directly to the query console using the following URL: http://localhost:8983/solr/#/books/query

The default search query (q=*:*), i.e. http://localhost:8983/solr/books/select?q=*:*&wt=json, returns all the documents available in the SOLR index.
To get all books that are available in “English” and return only ‘id’, ‘author’ and ‘title’ in the result, the search query looks like:

http://localhost:8983/solr/books/select?q=language_str:English&fl=id,title,author&wt=json&indent=true

Result:

{ "responseHeader":{

"status":0,

"QTime":1},

"response":{"numFound":7,"start":0,"docs":[

{

"id":104,

"title":"The Sea of Monsters",

"author":["Rick Riordan"]},

{

"id":103,

"title":"The Lightning Thief",

"author":["Rick Riordan"]},

{

"id":105,

"title":"A Storm of Swords",

"author":["George R.R. Martin"]},

{

"id":101,

"title":"Java",

"author":["Balaguruswami"]},

{

"id":102,

"title":"C",

"author":["Yashwant"]},

{

"id":107,

"title":"C++",

"author":["Balaguruswami"]},

{

"id":108,

"title":"Phthon",

"author":["XYZ"]}]

}}
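When queries are built in code rather than typed into a browser, the parameters should be URL-encoded. The sketch below assembles the same kind of select URL with Python's urllib.parse; the function name build_select_url and the hard-coded host and port are assumptions for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(core, q, fl=None, **extra):
    """Assemble a /select URL with URL-encoded query parameters."""
    params = {"q": q, "wt": "json", "indent": "true"}
    if fl:
        params["fl"] = ",".join(fl)  # fl takes a comma-separated field list
    params.update(extra)
    # Host and port are assumed to match the local setup used in this post
    return f"http://localhost:8983/solr/{core}/select?{urlencode(params)}"


url = build_select_url("books", "language_str:English", fl=["id", "title", "author"])
print(url)
# Decoding the query string gives back the original parameter values
print(parse_qs(urlsplit(url).query))
```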

Phrase Query:

A phrase query matches multiple terms (words) in sequence.

Get all books whose title has "The Lightning Thief" phrase. Following will be the search query:

http://localhost:8983/solr/books/select?q=title:"The Lightning Thief"&wt=json&indent=true

Boolean Operators:

Lucene supports AND, OR, NOT, "+" (must occur), and "-" (must not occur) as Boolean operators (Note: Boolean operators must be ALL CAPS).

The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.

Get all books which are available in “English” and “French” language and get only ‘id, author and title’ in result. Following will be the search query:

http://localhost:8983/solr/books/select?q=language_str:English AND language_str:French&fl=id,title,author&wt=json&indent=true

The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.

Get all books whose book stores are in “Hyderabad” and are available in “English” or “French” language and get only ‘id, author and title’ in result. Following will be the search query:

http://localhost:8983/solr/books/select?q=bookstore_str:Hyderabad AND (language_str:English OR language_str:French)&fl=id,title,author&wt=json&indent=true

The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol “!” can be used in place of the word NOT.

Get all books whose book stores are in “Hyderabad” and not in “Delhi”. Following will be the search query:

http://localhost:8983/solr/books/select?q=bookstore_str:Hyderabad NOT bookstore_str:Delhi&wt=json&indent=true

Note: The NOT operator cannot be used with just one term.

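The "+" (must occur) operator requires that the term after the "+" symbol exists somewhere in the field of a single document.

Get all books whose author name must contain “Yashwant” and may contain “Balaguruswami”. Following will be the search query:

http://localhost:8983/solr/books/select?q=%2BYashwant Balaguruswami&df=author&wt=json&indent=true

Note: A literal "+" in a URL is decoded as a space, so it is written here as %2B. If the "+" is sent unencoded, the query becomes “Yashwant Balaguruswami” and matches books by either author; a result that lists books by both authors indicates that the "+" was not applied.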

Result:

{

  "responseHeader":{

    "status":0,

    "QTime":2},

  "response":{"numFound":2,"start":0,"docs":[

      {

        "id":101,

        "title":"Java",

        "author":"Balaguruswami"},

      {

        "id":102,

        "title":"C",

        "author":"Yashwant"}]

}}
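Because "+" has a special meaning in URLs, queries that use the must-occur operator are a common place for encoding mistakes. The short sketch below shows, with Python's urllib.parse, what happens to such a query string with and without encoding; the variable names are chosen only for this illustration.

```python
from urllib.parse import quote_plus, unquote_plus

raw_query = "+Yashwant Balaguruswami"

# Properly encoded: "+" becomes %2B and the space becomes "+"
encoded = quote_plus(raw_query)
print(encoded)  # %2BYashwant+Balaguruswami

# What a server decodes if the raw text is placed in the URL unencoded:
# the leading "+" is read as a space, so the must-occur operator is lost
seen_by_server = unquote_plus(raw_query)
print(repr(seen_by_server))  # ' Yashwant Balaguruswami'
```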

The "-" (must not occur) operator excludes documents that contain the term after the "-" symbol.

Get all books whose author name must not contain “Balaguruswami” and may contain “Yashwant”. Following will be the search query:

http://localhost:8983/solr/books/select?q=Yashwant -Balaguruswami&df=author&fl=id,title,author&wt=json&indent=true

Result:

{

  "responseHeader":{

    "status":0,

    "QTime":2},

  "response":{"numFound":1,"start":0,"docs":[

      {

        "id":102,

        "title":"C",

        "author":"Yashwant"}]

}}

Wildcards:

Get all books whose title starts with “The”. Following will be the search query:

http://localhost:8983/solr/books/select?q=title:The*&wt=json&indent=true

Get all books whose title starts with “The” and ends with “Thief”. Following will be the search query:

http://localhost:8983/solr/books/select?q=title:The*Thief&wt=json&indent=true

Note: The classic Lucene query syntax doesn't support using a "*" symbol as the first character of a search term, e.g. "*Thief".

Range search:

Get all books whose ids are between 101 and 104. Following will be the search query:

http://localhost:8983/solr/books/select?q=id:[101 TO 104]&wt=json&indent=true

Sorting:

Get all books which are available in “French” language, get only ‘id, author and title’ in result, sort the result on ‘id’ in ascending order and get only 5 results in response. Following will be the search query:

http://localhost:8983/solr/books/select?q=language_str:French&start=0&rows=5&sort=id asc&fl=id,title,author&wt=json&indent=true

Result:

{

  "responseHeader":{

    "status":0,

    "QTime":2},

  "response":{"numFound":6,"start":0,"docs":[

      {

        "id":101,

        "title":"Java",

        "author":["Balaguruswami"]},

      {

        "id":102,

        "title":"C",

        "author":["Yashwant"]},

      {

        "id":103,

        "title":"The Lightning Thief",

        "author":["Rick Riordan"]},

      {

        "id":104,

        "title":"The Sea of Monsters",

        "author":["Rick Riordan"]},

      {

        "id":107,

        "title":"C++",

        "author":["Balaguruswami"]}]

}}
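The start and rows parameters together implement paging: start is the offset of the first result and rows is the page size. A minimal sketch of computing them for a given page, assuming zero-based page numbers and a hypothetical helper named page_params:

```python
def page_params(page, rows, sort_field="id", order="asc"):
    """Return the start/rows/sort parameters for a zero-based page number."""
    if page < 0 or rows <= 0:
        raise ValueError("page must be >= 0 and rows must be > 0")
    return {"start": page * rows, "rows": rows, "sort": f"{sort_field} {order}"}


print(page_params(0, 5))  # first page: start=0
print(page_params(1, 5))  # second page: start=5
```

With "numFound":6 and rows=5 as in the result above, the second page (start=5) would hold the one remaining document.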

Get all books which are available in “French” language, get only ‘id, author and title’ in result, sort the result on ‘id’ in descending order and get only 10 results in response. Following will be the search query:

http://localhost:8983/solr/books/select?q=language_str:French&start=0&rows=10&sort=id desc&fl=id,title,author&wt=json&indent=true

Default Field (df):

Get all books which are available in “French” language using the default field (df), get only ‘id, author and title’ in result and sort the result on ‘id’ in ascending order. Following will be the search query:

http://localhost:8983/solr/books/select?q=French&sort=id asc&fl=id,title,author&df=language_str&wt=json&indent=true

Notice that the query passes "df=language_str" and "q=French". If "df" is not provided and no default field is configured for the request handler, SOLR raises an exception.

Facets:

Get all books which are available in “French” language, get only ‘id, author and title’ in result, sort the result on ‘id’ in ascending order and get only 10 results in response with facets on id and author fields. Following will be the search query:

http://localhost:8983/solr/books/select?q=language_str:French&sort=id asc&start=0&rows=10&fl=id,title,author&wt=json&indent=true&facet=true&facet.field=id&facet.field=author

Result (see the "facet_counts" section):

{

  "responseHeader":{

    "status":0,

    "QTime":14},

  "response":{"numFound":6,"start":0,"docs":[

      {

        "id":101,

        "title":"Java",

        "author":["Balaguruswami"]},

      {

        "id":102,

        "title":"C",

        "author":["Yashwant"]},

      {

        "id":103,

        "title":"The Lightning Thief",

        "author":["Rick Riordan"]},

      {

        "id":104,

        "title":"The Sea of Monsters",

        "author":["Rick Riordan"]},

      {

        "id":107,

        "title":"C++",

        "author":["Balaguruswami"]},

      {

        "id":108,

        "title":"Phthon",

        "author":["XYZ"]}]

},

  "facet_counts":{

    "facet_queries":{},

    "facet_fields":{

      "id":[

        "101",1,

        "102",1,

        "103",1,

        "104",1,

        "107",1,

        "108",1,

        "105",0],

      "author":[

        "Balaguruswami",2,

        "Rick Riordan",2,

        "XYZ",1,

        "Yashwant Kanetkar",1,

        "George R.R. Martin",0]},

    "facet_dates":{},

    "facet_ranges":{},

    "facet_intervals":{},

    "facet_heatmaps":{}}}

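In the JSON response, each entry under "facet_fields" is a flat list that alternates between a value and its count. A small sketch of turning such a list into a value-to-count mapping, using the author facet from the result above; the helper name facet_pairs is an assumption for this example.

```python
def facet_pairs(flat):
    """Convert ["value", count, "value", count, ...] into a {value: count} dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# The "author" facet list as it appears in the result above
author_facet = [
    "Balaguruswami", 2,
    "Rick Riordan", 2,
    "XYZ", 1,
    "Yashwant Kanetkar", 1,
    "George R.R. Martin", 0,
]
counts = facet_pairs(author_facet)
print(counts["Rick Riordan"])
```

Values with a count of 0, such as "George R.R. Martin" here, exist in the index but do not occur in any document matched by the query.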
Highlighting (hl):

The following parameters control highlighting in a SOLR query:

hl – When set to true, enables highlighted snippets to be generated in the query response.
hl.q – Specifies an overriding query term for highlighting.
hl.fl – Specifies a list of fields to highlight.
hl.simple.pre – Specifies the text that should appear before a highlighted term. Default is "<em>".
hl.simple.post – Specifies the text that should appear after a highlighted term. Default is "</em>".
hl.snippets – Specifies the maximum number of highlighted snippets to generate per field. For example, with a value of 4, SOLR returns at most 4 highlighted snippets for each field.

Get all books which are available in “French” language, get only ‘id, author and title’ in result, sort the result on ‘id’ in ascending order and highlight the results on all matching fields. Following will be the search query:

http://localhost:8983/solr/books/select?q=language_str:French&sort=id asc&fl=id,title,author&wt=json&indent=true&hl=true&hl.fl=*&hl.simple.pre=<b>&hl.simple.post=</b>&hl.snippets=5

Result:

{

  "responseHeader":{

    "status":0,

    "QTime":54},

  "response":{"numFound":4,"start":0,"docs":[

      {

        "id":102,

        "title":"C",

        "author":["Yashwant Kanetkar"]},

      {

        "id":103,

        "title":"The Lightning Thief",

        "author":["Rick Riordan"]},

      {

        "id":104,

        "title":"The Sea of Monsters",

        "author":["Rick Riordan"]},

      {

        "id":108,

        "title":"Phthon",

        "author":["XYZ"]}]

},

  "highlighting":{

    "102":{

      "fullText":["<b>French</b>"],

      "language_str":["<b>French</b>"]},

    "103":{

      "fullText":["<b>French</b>"],

      "language_str":["<b>French</b>"]},

    "104":{

      "fullText":["<b>French</b>"],

      "language_str":["<b>French</b>"]},

    "108":{

      "fullText":["<b>French</b>"],

      "language_str":["<b>French</b>"]}}}

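The "highlighting" section is returned separately from the documents and is keyed by each document's unique key as a string. A client therefore usually has to join the two parts itself. The sketch below shows one way to do that on an already parsed response; the function name attach_highlights and the trimmed sample response are assumptions for this example.

```python
def attach_highlights(solr_response):
    """Return the docs, each with a "highlights" entry from the "highlighting" section."""
    highlighting = solr_response.get("highlighting", {})
    merged = []
    for doc in solr_response["response"]["docs"]:
        # Highlighting keys are strings, so convert the id before the lookup
        snippets = highlighting.get(str(doc["id"]), {})
        merged.append({**doc, "highlights": snippets})
    return merged


# Trimmed-down version of the response shown above
sample = {
    "response": {"numFound": 1, "start": 0, "docs": [
        {"id": 102, "title": "C", "author": ["Yashwant Kanetkar"]}]},
    "highlighting": {"102": {"language_str": ["<b>French</b>"]}},
}
merged_docs = attach_highlights(sample)
print(merged_docs[0]["highlights"])
```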
Refer to following for search handler configuration:

1- http://wiki.apache.org/solr/SearchHandler

Refer to following for more details on queries:

1- http://lucene.apache.org/core/3_5_0/queryparsersyntax.html

2- https://wiki.apache.org/solr/CommonQueryParameters

3- https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters

4- https://wiki.apache.org/solr/SolrQuerySyntax

Refer to following for more details on facets:

1- http://wiki.apache.org/solr/SolrFacetingOverview

2- http://wiki.apache.org/solr/SimpleFacetParameters

Refer to following for more details on highlighting:

1- http://wiki.apache.org/solr/HighlightingParameters

2- https://cwiki.apache.org/confluence/display/solr/Highlighting

Refer to following for more details on configuring tokenizers/analyzers/synonyms/stop words:

1- https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

2- http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

Deleting documents:

You can delete data by sending a POST request with a <delete> command to the update URL, specifying either the value of the document's unique key field or a query that matches multiple documents (be careful with that one!). Since these commands are short, we will specify them right on the command line rather than reference an XML file.

Use the following command to delete the document whose unique key is ‘101’, while leaving it searchable until the update is committed separately:

java -Dc=books -Ddata=args -Dcommit=false -jar post.jar "<delete><id>101</id></delete>"

Use the following command to delete the documents matching the query ‘title:Java’, while leaving them searchable until the update is committed separately:

java -Dc=books -Ddata=args -Dcommit=false -jar post.jar "<delete><query>title:java</query></delete>"

It will print following output on console:

SimplePostTool version 5.0.0

POSTing args to http://localhost:8983/solr/books/update...

Time spent: 0:00:00.039

Because we have specified "commit=false", a search for id:101 or title:Java will still find the document we have deleted.

Use following command to permanently delete a specific document with unique key field ‘101’:

java -Dc=books -Ddata=args -Dcommit=true -jar post.jar "<delete><id>101</id></delete>"

Use the following command to permanently delete the documents matching the query ‘title:Java’:

java -Dc=books -Ddata=args -Dcommit=true -jar post.jar "<delete><query>title:java</query></delete>"

It will print following output on console:

SimplePostTool version 5.0.0

POSTing args to http://localhost:8983/solr/books/update...

COMMITting Solr index changes to http://localhost:8983/solr/books/update...

Time spent: 0:00:00.208

Because we have specified "commit=true", a search for id:101 or title:Java will no longer find the document we have deleted. It is deleted permanently.

You can also delete documents using the curl command:

curl -X POST "http://localhost:8983/solr/books/update?commit=true" -H 'Content-Type:application/xml' -d "<delete><query>author:Balaguruswami</query></delete>"

curl -X POST "http://localhost:8983/solr/books/update?commit=true" -H 'Content-Type:application/xml' -d "<delete><id>101</id></delete>"
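The <delete> messages used above can also be generated in code, which avoids quoting problems on the command line. The following sketch builds both forms with Python's ElementTree; the function names are assumptions for this example, and the resulting strings would be sent to the update URL in the same way as the curl commands.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a <delete> message for a single unique key value."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message for all documents matching a query."""
    root = ET.Element("delete")
    # ElementTree escapes characters such as & and < in the query text
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id(101))
print(delete_by_query("author:Balaguruswami"))
```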

Know about some common search terminologies:

Friday, March 25, 2016
