e-lo [IT Engineer life]

Apache Solr 7.5 Index build with DirectUpdateHandler2 HTTP

Posted on November 3, 2018 (updated November 4, 2018) by espenk

http://lucene.apache.org/solr/guide/7_5/uploading-data-with-index-handlers.html

 

Index Handlers are Request Handlers designed to add, delete and update documents to the index. In addition to having plugins for importing rich documents using Tika or from structured data sources using the Data Import Handler, Solr natively supports indexing structured documents in XML, CSV and JSON.

The default in solrconfig.xml is an UpdateHandler:

<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">

DirectUpdateHandler2 implements an UpdateHandler where documents are added directly to the main Lucene index as opposed to adding to a separate smaller index.

There is a fair bit of code from here on, so the Python code can be found on GitHub:

https://github.com/spawnmarvel/solrhttp

OK, let's index a bunch of data. I have modified the schema a bit:

 

We now have the desc, id, plant and tag fields. We can check them by running get_fields in run_schema.py (from here on I will refer only to the method, not the file, in the code on GitHub):
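For readers without the repo at hand, here is a minimal sketch of how such a field listing could be done against Solr's Schema API. It is not the code from the GitHub repo: the function names, the localhost base URL and the use of urllib are my assumptions, and only the core name newcore comes from this post.

```python
import json
import urllib.request

# Assumptions: Solr runs locally on its default port and the core is "newcore".
SOLR_URL = "http://localhost:8983/solr"
CORE = "newcore"


def schema_fields_url(base_url=SOLR_URL, core=CORE):
    """Build the Schema API URL that lists the fields of a core."""
    return f"{base_url}/{core}/schema/fields?wt=json"


def field_names(schema_response):
    """Return the sorted field names found in a Schema API response body."""
    return sorted(field["name"] for field in schema_response.get("fields", []))


def list_fields(base_url=SOLR_URL, core=CORE):
    """Hypothetical helper: ask a running Solr for its field names."""
    with urllib.request.urlopen(schema_fields_url(base_url, core)) as resp:
        return field_names(json.load(resp))
```

Calling list_fields() needs a running Solr; the two helpers above it can be tried without one.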

OK, let's generate some test data. Run genereate_test_data(); it writes a txt file with 2000 rows that we can insert.
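A rough sketch of what a generator like this might look like. The row layout (semicolon-separated id;tag;plant;desc), the plant names and the function names are all made up for illustration; the real file format is whatever the repo on GitHub uses.

```python
import random

# Hypothetical plant names, only for illustration.
PLANTS = ["plant_a", "plant_b", "plant_c"]


def make_rows(count=2000, seed=42):
    """Return `count` rows in an assumed 'id;tag;plant;desc' layout."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    rows = []
    for i in range(1, count + 1):
        plant = rng.choice(PLANTS)
        rows.append(f"{i};tag_{i:04d};{plant};Test description for tag {i}")
    return rows


def write_rows(path, rows):
    """Write one row per line to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(rows) + "\n")
```

Something like write_rows("test_data.txt", make_rows()) would then produce the 2000-line file (the file name is an assumption too).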

OK, the index is empty for now. We can check that with status_core():
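A status check like this can be built on Solr's CoreAdmin STATUS action. The sketch below is my own guess at the shape of such a helper, not the repo's status_core(); the function names and the localhost URL are assumptions.

```python
import json
import urllib.parse
import urllib.request


def core_status_url(core="newcore", base_url="http://localhost:8983/solr"):
    """Build the CoreAdmin STATUS URL for one core."""
    query = urllib.parse.urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url}/admin/cores?{query}"


def summarize_status(status_response, core="newcore"):
    """Pick the document count and index size out of a STATUS response."""
    index = status_response["status"][core]["index"]
    return {"numDocs": index.get("numDocs"), "sizeInBytes": index.get("sizeInBytes")}


def fetch_status(core="newcore", base_url="http://localhost:8983/solr"):
    """Hypothetical helper: query a running Solr and summarize the core."""
    with urllib.request.urlopen(core_status_url(core, base_url)) as resp:
        return summarize_status(json.load(resp), core)
```

An empty index should report numDocs as 0.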

OK, let's index the 2000 rows by running index_dt_test_data():
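Over HTTP, indexing comes down to a POST of JSON documents to the core's /update handler. Here is a minimal sketch of that idea, assuming each row in the text file is a semicolon-separated line in the order id;tag;plant;desc. That layout and the function names are my assumptions, not the repo's index_dt_test_data().

```python
import json
import urllib.request

# Assumption: local Solr, core "newcore"; commit=true makes the docs visible at once.
UPDATE_URL = "http://localhost:8983/solr/newcore/update?commit=true&wt=json"


def row_to_doc(row):
    """Turn an assumed 'id;tag;plant;desc' line into a Solr document."""
    doc_id, tag, plant, desc = row.strip().split(";", 3)
    return {"id": doc_id, "tag": tag, "plant": plant, "desc": desc}


def build_update_request(docs, url=UPDATE_URL):
    """Build a POST request carrying a JSON array of documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_rows(rows, url=UPDATE_URL):
    """Hypothetical helper: send all rows to a running Solr in one request."""
    request = build_update_request([row_to_doc(row) for row in rows], url)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Sending all 2000 documents in one request with a single commit is usually cheaper than committing per document.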

 

Great, all done. Now let's check Solr: in the Query GUI, set start to 0 and rows to 2000:

 

Great. Let's open server\logs\solr.log in BareTail and add the 2000 items again to see what is happening:

 

To get all the docs over HTTP we need to alter the query the same way we did in the Query GUI: http://localhost:8983/solr/newcore/select?q=*:*&rows=2001

I made get_docs_max() with a default of 10000 rows:
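The idea can be sketched as below: build the /select URL with a rows parameter that defaults to 10000, then pull the docs out of the JSON response. This is an illustration under my own assumptions (function names, urllib, localhost), not the actual get_docs_max() from the repo.

```python
import json
import urllib.parse
import urllib.request


def select_url(rows=10000, query="*:*", core="newcore",
               base_url="http://localhost:8983/solr"):
    """Build a /select URL; `rows` caps how many docs Solr returns."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


def docs_from_response(select_response):
    """Return (numFound, docs) from a /select JSON response."""
    response = select_response["response"]
    return response["numFound"], response["docs"]


def fetch_docs(rows=10000):
    """Hypothetical helper: run the query against a running Solr."""
    with urllib.request.urlopen(select_url(rows)) as resp:
        return docs_from_response(json.load(resp))
```

If numFound is larger than the number of docs returned, the rows value was too small.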

Let's run status_core() to see the change in index size:

Great, all done with indexing 2000 items. Let's also try to delete them all.

Run index_remove_all(); the index is easy to rebuild.
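Deleting everything is a delete-by-query with *:* posted to the /update handler. A minimal sketch, with hypothetical function names and a local Solr assumed; the repo's index_remove_all() may well do it differently.

```python
import json
import urllib.request

# Assumption: local Solr, core "newcore"; commit=true applies the delete at once.
UPDATE_URL = "http://localhost:8983/solr/newcore/update?commit=true"


def delete_all_payload():
    """JSON update command that matches every document in the index."""
    return {"delete": {"query": "*:*"}}


def build_delete_request(url=UPDATE_URL):
    """Build the POST request that carries the delete-by-query command."""
    body = json.dumps(delete_all_payload()).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def remove_all(url=UPDATE_URL):
    """Hypothetical helper: empty the index on a running Solr."""
    with urllib.request.urlopen(build_delete_request(url)) as resp:
        return resp.status
```

A status check afterwards should show zero documents again.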

 

And then we are done. I will commit this tutorial with the message “Index build with DirectUpdateHandler2 HTTP” so it is easy to find this version of the repo.

 

 
