Index Handlers are Request Handlers designed to add, delete and update documents in the index. In addition to having plugins for importing rich documents using Tika, or from structured data sources using the Data Import Handler, Solr natively supports indexing structured documents in XML, CSV and JSON.
The default in solrconfig.xml is the following UpdateHandler:
<!-- The default high-performance update handler -->
DirectUpdateHandler2 implements an UpdateHandler where documents are added directly to the main Lucene index as opposed to adding to a separate smaller index.
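In a stock solrconfig.xml the entry looks roughly like this (the autoCommit values are from the default config and may differ in your install):

```xml
<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard-commit at most every 15 seconds, without opening a new searcher -->
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```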
From here on there will be a fair bit of code; the Python code can be found on GitHub:
OK, let's index a bunch of data. I have modified the schema a bit:
We now have desc, id, plant and tag fields. We can verify that by running get_fields() in run_schema.py (from here on I will just refer to the method, not the file, in the code on GitHub):
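The repo has the real implementation; as a rough sketch, get_fields() presumably calls the Solr Schema API and lists the field names (the base URL, core name `newcore` and the `schema_fields_url` helper are my assumptions):

```python
import json
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr"
CORE = "newcore"

def schema_fields_url(base=SOLR, core=CORE):
    # Schema API endpoint that lists all defined fields for a core
    return f"{base}/{core}/schema/fields"

def get_fields():
    # Sketch of get_fields(): return the names of all fields in the schema
    with urlopen(schema_fields_url()) as resp:
        return [f["name"] for f in json.load(resp)["fields"]]
```

With the modified schema this should return something like ['desc', 'id', 'plant', 'tag'] plus Solr's built-in fields.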
OK, let's generate some test data. Run genereate_test_data(); it will make a txt file with 2000 rows we can insert.
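A minimal sketch of what the generator could look like; the file name, row layout ("id;desc;plant;tag") and plant values are my assumptions, not necessarily what the repo uses:

```python
import random

def generate_test_data(path="dt_test_data.txt", rows=2000):
    # Write one row per line as "id;desc;plant;tag" so it is easy to
    # split back into documents when indexing.
    plants = ["PlantA", "PlantB", "PlantC"]
    with open(path, "w") as f:
        for i in range(rows):
            f.write(f"{i};test item {i};{random.choice(plants)};tag{i % 10}\n")
    return rows
```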
Right now the index is empty, which we can check with status_core():
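status_core() presumably wraps the Core Admin STATUS call; a sketch under that assumption (the helper name `status_url` and the return shape are mine):

```python
import json
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr"

def status_url(core="newcore", base=SOLR):
    # Core Admin STATUS call, limited to a single core
    return f"{base}/admin/cores?action=STATUS&core={core}&wt=json"

def status_core(core="newcore"):
    # Sketch of status_core(): report document count and index size
    with urlopen(status_url(core)) as resp:
        index = json.load(resp)["status"][core]["index"]
    return index["numDocs"], index.get("size")
```

On the empty core this should report numDocs 0 and a tiny index size.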
OK, let's index the 2000 rows; run index_dt_test_data():
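Roughly, this reads the txt file, turns each row into a JSON document and POSTs the batch to the core's update handler with commit=true. The row layout ("id;desc;plant;tag") and helper names are assumptions:

```python
import json
from urllib.request import Request, urlopen

UPDATE_URL = "http://localhost:8983/solr/newcore/update?commit=true"

def line_to_doc(line):
    # Assumed row layout: "id;desc;plant;tag"
    id_, desc, plant, tag = line.rstrip("\n").split(";")
    return {"id": id_, "desc": desc, "plant": plant, "tag": tag}

def index_dt_test_data(path="dt_test_data.txt"):
    # Sketch of index_dt_test_data(): batch-post all rows as JSON docs
    docs = [line_to_doc(line) for line in open(path)]
    req = Request(UPDATE_URL, data=json.dumps(docs).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return resp.status
```

Posting all 2000 documents in one request and committing once at the end is much faster than committing per document.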
Great, all done. Now let's check Solr: in the Query GUI, set start to 0 and rows to 2000:
Let's also have a look at server\logs\solr.log with BareTail while the 2000 items are added, to see what is happening:
In order to get all the docs we need to alter the HTTP query the same way we did in the Query GUI: http://localhost:8983/solr/newcore/select?q=*:*&rows=2001
I made get_docs_max() with a default of 10000 rows:
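A sketch of what get_docs_max() could look like: a match-all select with the rows parameter raised well above the 2000 documents we just indexed (the `select_params` helper is my own):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SELECT_URL = "http://localhost:8983/solr/newcore/select"

def select_params(rows=10000):
    # Match-all query; rows raised from Solr's default of 10
    return urlencode({"q": "*:*", "rows": rows, "wt": "json"})

def get_docs_max(rows=10000):
    # Sketch of get_docs_max(): fetch up to `rows` documents
    with urlopen(f"{SELECT_URL}?{select_params(rows)}") as resp:
        return json.load(resp)["response"]["docs"]
```

For result sets much larger than this, Solr's cursorMark-based paging is the better tool than a huge rows value.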
Let's run status_core() again to see the change in index size:
Great, we are all done indexing the 2000 items, but let's also try to delete them all!
Run index_remove_all(); the index is easy to rebuild.
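This is presumably a delete-by-query that matches every document, posted to the same update handler; a sketch (the `delete_all_payload` helper is my own name):

```python
import json
from urllib.request import Request, urlopen

UPDATE_URL = "http://localhost:8983/solr/newcore/update?commit=true"

def delete_all_payload():
    # Delete-by-query matching every document in the core
    return json.dumps({"delete": {"query": "*:*"}})

def index_remove_all():
    # Sketch of index_remove_all(): wipe the core and commit
    req = Request(UPDATE_URL, data=delete_all_payload().encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return resp.status
```

After this, status_core() should report numDocs back at 0.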
And then we are done. I will commit this tutorial with the message “Index build with DirectUpdateHandler2 HTTP” so it is easy to get the matching version of the repo.