
Apache Solr 7.5: create a core, alter the schema and query for API data

With reference to previous tutorial:

Core

A single Solr instance, which represents a single Solr index. A core has a different set of configuration files and schema definitions than other cores.

Now let's create a new core. After downloading and installing the files, have a look at the readme file in the solr-7.5.0 directory (on my PC: C:\solr_test\solr-7.5.0\solr-7.5.0):

Stop Solr if it is running in another mode, i.e. cloud.

Great, now back to the readme file.

Getting Started
—————

To start Solr for the first time after installation, simply do:

bin/solr start

Let's check that no core is present:
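The same check can be scripted. This is a minimal sketch, assuming Solr listens on the default http://localhost:8983 and using the CoreAdmin STATUS endpoint; the function names are illustrative and not part of Solr or the original post:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_URL = "http://localhost:8983/solr"  # assumption: default standalone port


def core_status_url(base_url=SOLR_URL):
    """Build the CoreAdmin STATUS URL for a Solr instance."""
    query = urlencode({"action": "STATUS", "wt": "json"})
    return f"{base_url}/admin/cores?{query}"


def core_names(status_response):
    """Return the core names found in a parsed CoreAdmin STATUS response."""
    return sorted(status_response.get("status", {}))


def list_cores(base_url=SOLR_URL):
    """Ask a running Solr instance which cores it currently hosts."""
    with urlopen(core_status_url(base_url)) as resp:
        return core_names(json.load(resp))
```

With Solr running, calling list_cores() on a fresh install should give an empty list, which means no core is present yet.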

Great, next:

This will launch a standalone Solr server in the background of your shell,
listening on port 8983. (Alternatively, you can launch Solr in “cloud” mode,
which allows you to scale out using sharding and replication. To launch Solr
in cloud mode, do:

bin/solr start -cloud)

To see all available options for starting Solr, please do:

bin/solr start -help

After starting Solr, create either a core or collection depending on whether
Solr is running in standalone (core) or SolrCloud mode (collection) by doing:

bin/solr create -c <name>

This will create a collection that uses a data-driven schema which tries to guess
the correct field type when you add documents to the index. To see all available
options for creating a new collection, execute:

bin/solr create -help

We need some help to make a core, config and schema…

You can also create a core in the Admin UI:

To create a core from the command line (cmd), run the commands below, then stop and start Solr to see that the core is there:

bin\solr start / bin\solr stop -all

bin\solr create -c newcore

 

OK, now newcore is stored and its files have been created:

Core properties:

#Written by CorePropertiesLocator
#Mon Oct 29 18:37:26 UTC 2018
name=newcore

Our solrconfig.xml and managed-schema are in the conf folder (solr-7.5.0\server\solr\newcore\conf), and the empty index is in the data folder (solr-7.5.0\server\solr\newcore\data).

Now let's modify the default managed-schema by adding a field named tag, then insert one document with a tag and view it:

 

JSON:

{
"id": 111,
"tag": "Test1"
}
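The field can also be added without the Admin UI. The sketch below posts an add-field command to the core's Schema API. The field type string, the stored and indexed flags, and the helper names are assumptions made for illustration, not taken from this post. With the data-driven default config, Solr may also create an unknown field on its own when the first document arrives, so treat this as one possible route.

```python
SOLR_URL = "http://localhost:8983/solr"  # assumption: default standalone port


def add_field_command(name, field_type="string", stored=True, indexed=True):
    """Build the Schema API payload that adds one field to managed-schema."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,  # assumption: this type exists in the core's schema
            "stored": stored,
            "indexed": indexed,
        }
    }


def add_field(core, name, http=None, base_url=SOLR_URL, **options):
    """POST the add-field command to /solr/<core>/schema and return Solr's reply."""
    if http is None:
        import requests  # third-party HTTP client, imported only when needed
        http = requests
    resp = http.post(
        f"{base_url}/{core}/schema",
        json=add_field_command(name, **options),
        timeout=10,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp.json()
```

With Solr running, add_field("newcore", "tag") would send the command; the http parameter exists only so a stand-in client can be passed in for testing.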

 

 

And here we have our Test1 tag. Now let's look at managed-schema to find the tag field:

And there it is. This is how you create a new core (one index) and alter the managed-schema.

OK, next we have created a Python environment named solrhttp and installed:

certifi==2018.10.15
chardet==3.0.4
idna==2.7
requests==2.20.0
urllib3==1.24

Let's get some data over HTTP using the requests library:
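A minimal sketch of such a request, assuming the core is named newcore as above and Solr runs on the default port. The helper names are illustrative, and the http parameter is only there so a stand-in client can be swapped in for testing:

```python
SOLR_URL = "http://localhost:8983/solr"  # assumption: default standalone port


def select_params(query="*:*", rows=10):
    """Query-string parameters for Solr's /select handler."""
    return {"q": query, "rows": rows, "wt": "json"}


def extract_docs(select_response):
    """Pull the list of matching documents out of a parsed /select response."""
    return select_response["response"]["docs"]


def search(core, query="*:*", http=None, base_url=SOLR_URL):
    """Run a query against one core and return the matching documents."""
    if http is None:
        import requests  # third-party; part of the solrhttp environment
        http = requests
    resp = http.get(
        f"{base_url}/{core}/select",
        params=select_params(query),
        timeout=10,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    return extract_docs(resp.json())
```

With Solr running, search("newcore") returns every stored document (up to rows), and search("newcore", "tag:Test1") narrows the result to matching tags.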

 

Great, the data is there. Now let's add a field to the Solr schema and a new doc, then get the result back in Python.

We will add a desc field and insert the following JSON:

[{
"id": 111,
"tag": "Test1",
"desc": "First test tag"
},
{
"id": 112,
"tag": "Test2",
"desc": "Second test tag"
}
]

Solr will throw an error if you paste the format above into the Admin UI with the /update handler, so we need to remove the enclosing array brackets []:

{
"id": 111,
"tag": "Test1",
"desc": "First test tag"
}
,
{
"id": 112,
"tag": "Test2",
"desc": "Second test tag"
}
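When the documents are sent over HTTP instead of being pasted into the Admin UI, the /update handler accepts the JSON array form directly. A minimal sketch, assuming the default port and the newcore core; commit=true is used so the documents become searchable right away, and the helper names are illustrative:

```python
SOLR_URL = "http://localhost:8983/solr"  # assumption: default standalone port

# The two documents from this post, as a plain Python list.
DOCS = [
    {"id": 111, "tag": "Test1", "desc": "First test tag"},
    {"id": 112, "tag": "Test2", "desc": "Second test tag"},
]


def update_url(core, base_url=SOLR_URL):
    """URL of the update handler for one core."""
    return f"{base_url}/{core}/update"


def index_docs(core, docs, http=None, base_url=SOLR_URL):
    """Send documents to Solr as a JSON array and return the response status."""
    if http is None:
        import requests  # third-party HTTP client, imported only when needed
        http = requests
    resp = http.post(
        update_url(core, base_url),
        params={"commit": "true"},  # commit immediately so queries see the docs
        json=list(docs),
        timeout=10,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp.json()["responseHeader"]["status"]  # 0 means success
```

With Solr running, index_docs("newcore", DOCS) would index both documents in one request.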

 

And here is the result as expected.

 

Set up in production:

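Host Solr as a standalone service (since Solr 5.0 it runs on its bundled Jetty; deployment in Tomcat is no longer supported)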

Use DataImportHandler to feed data

Talk to Solr via HTTP

Parse the JSON response using requests

The best practice is to get only the doc id from Solr and fetch the remaining information from the source system that holds the data.
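That pattern can be sketched as follows. Solr is asked for ids only (fl=id), and the full records are then looked up elsewhere. The canned response and the dictionary standing in for the source system are hypothetical placeholders; in practice the lookup would hit a database or API.

```python
def id_only_params(query, rows=10):
    """Select parameters that ask Solr to return only the id field."""
    return {"q": query, "fl": "id", "rows": rows}


def doc_ids(select_response):
    """Extract the document ids from a parsed /select response."""
    return [doc["id"] for doc in select_response["response"]["docs"]]


def fetch_from_source(ids, source):
    """Look up the full records by id, skipping ids the source does not know."""
    return [source[doc_id] for doc_id in ids if doc_id in source]


# Hypothetical stand-ins: a canned Solr response and a dict playing the source.
canned_response = {
    "response": {"numFound": 2, "start": 0, "docs": [{"id": "111"}, {"id": "112"}]}
}
source_of_truth = {
    "111": {"tag": "Test1", "desc": "First test tag"},
    "112": {"tag": "Test2", "desc": "Second test tag"},
}

records = fetch_from_source(doc_ids(canned_response), source_of_truth)
print(records)
```

Keeping only ids in the search response keeps the index small and leaves the source system as the single place where the full data lives.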
