Apache Solr 7.5 (techproducts tutorial) 2

For this tutorial I moved the environment out of VirtualBox.

I installed the same versions of Solr and the JRE, and added their paths to the environment variables.

I also created two .bat files to make starting and stopping the server easier.

Below I go over the steps of Exercise 1 of the official tutorial, "Index Techproducts Example Data":

https://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-1

To launch Solr on Windows, run:

bin\solr.cmd start -e cloud

The first prompt asks how many nodes we want to run. Note the [2] at the end of the last line; that is the default number of nodes. Two is what we want for this example, so you can simply press enter.

The second prompt asks for the port that the first node runs on. Unless you know you have something else running on port 8983 on your machine, accept this default by pressing enter. If something is already using that port, you will be asked to choose another port.

The third prompt asks for the port that the second node runs on. Again, unless you know you have something else running on port 7574 on your machine, accept this default by pressing enter. If something is already using that port, you will be asked to choose another port.

Solr will now initialize itself and start running on those two nodes. The script will print the commands it uses for your reference.

Now we can visit both nodes at http://localhost:8983 and http://localhost:7574, or at whichever ports you specified.
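Instead of opening a browser, the same check can be scripted. The following is only a minimal sketch, assuming the default ports from the prompts above and Solr's system info endpoint (/solr/admin/info/system); the helper names system_info_url and node_is_up are my own and not part of Solr.

```python
import json
import urllib.request

# Default ports from the walkthrough; change these if you picked other ports.
NODE_PORTS = (8983, 7574)


def system_info_url(port, host="localhost"):
    """Build the URL of the system info endpoint for one Solr node."""
    return f"http://{host}:{port}/solr/admin/info/system?wt=json"


def node_is_up(port, host="localhost", timeout=5):
    """Return True if the node answers the request with responseHeader.status == 0."""
    try:
        with urllib.request.urlopen(system_info_url(port, host), timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a body that is not JSON.
        return False
    return body.get("responseHeader", {}).get("status") == 0
```

Once startup has finished, node_is_up(8983) and node_is_up(7574) should both return True; a node that has not started yet simply yields False.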

Solr also creates some folders for us, one per node: example\cloud\node1 and example\cloud\node2 under the Solr installation directory.

Notice that two instances of Solr have started on two nodes. Because we are starting in SolrCloud mode, and did not define any details about an external ZooKeeper cluster, Solr launches its own ZooKeeper and connects both nodes to it.

(ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace of data registers, called znodes, much like a file system. With SolrCloud, your configuration files are kept in ZooKeeper. Source: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ProjectDescription)
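To make the "much like a file system" comparison concrete, here is a toy model of a znode namespace. It is not the ZooKeeper client API, and the paths and contents are illustrative assumptions rather than a dump of what Solr actually stores.

```python
# Toy model only: a dict from znode path to the bytes stored at that znode.
# Unlike a plain directory, every znode can hold data as well as children.
ZNODES = {
    "/configs": b"",
    "/configs/techproducts": b"",
    "/configs/techproducts/solrconfig.xml": b"<config/>",
    "/configs/techproducts/managed-schema": b"<schema/>",
    "/live_nodes": b"",
}


def children(path, znodes=ZNODES):
    """List the names of the direct children of a znode, like listing a directory."""
    prefix = path.rstrip("/") + "/"
    names = set()
    for znode_path in znodes:
        if znode_path.startswith(prefix):
            rest = znode_path[len(prefix):]
            # Keep only direct children, not deeper descendants.
            if rest and "/" not in rest:
                names.add(rest)
    return sorted(names)
```

Here children("/configs/techproducts") lists the two configuration files, in the same way a directory listing would.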

After startup is complete, you’ll be prompted to create a collection to use for indexing data.

Here’s the first place where we’ll deviate from the default options. This tutorial will ask you to index some sample data included with Solr, called the “techproducts” data. Let’s name our collection “techproducts” so it’s easy to differentiate from other collections we’ll create later. Enter techproducts at the prompt and hit enter.

The next prompt asks how many shards you want to split your index into across the two nodes. Choosing “2” (the default) means we will split the index relatively evenly across both nodes, which is a good way to start. Accept the default by hitting enter.

The prompt after that asks how many replicas to create per shard. A replica is a copy of the index that’s used for failover (see also the Solr Glossary definition). Again, the default of “2” is fine to start with, so accept it by hitting enter.
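The two answers multiply: every shard gets its own set of replicas. A small sketch of the arithmetic, assuming the replicas end up spread evenly over the nodes (the function name replica_layout is hypothetical):

```python
def replica_layout(num_shards, replication_factor, num_nodes):
    """Return (total replicas, replicas per node if spread evenly)."""
    total = num_shards * replication_factor
    # Ceiling division: the busiest node holds at most this many replicas.
    per_node = -(-total // num_nodes)
    return total, per_node


# The values chosen in this walkthrough: 2 shards, 2 replicas each, 2 nodes.
total, per_node = replica_layout(2, 2, 2)
print(total, per_node)
```

With the defaults used here that is 4 replicas in total, 2 on each node.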

We’ve reached another point where we will deviate from the default option. Solr has two sample sets of configuration files (called configSets) available out-of-the-box.

A collection must have a configSet, which at a minimum includes the two main configuration files for Solr: the schema file (named either managed-schema or schema.xml), and solrconfig.xml. The question here is which configSet you would like to start with. The _default is a bare-bones option, but note there’s one whose name includes “techproducts”, the same as we named our collection. This configSet is specifically designed to support the sample data we want to use, so enter sample_techproducts_configs at the prompt and hit enter.

At this point, Solr will create the collection and again output to the screen the commands it issues.
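Under the hood, collections are created through the Collections API. The sketch below only builds a CREATE request URL from the answers given above; it does not send anything. The function name and the config name "techproducts" are assumptions on my part, so compare the result with what the script actually prints on your machine.

```python
from urllib.parse import urlencode


def create_collection_url(name, num_shards, replication_factor,
                          config_name, port=8983, host="localhost"):
    """Build a Collections API CREATE URL (the request is not sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Allow each node to host one replica of every shard.
        "maxShardsPerNode": num_shards,
        # Assumed name of the configuration as stored in ZooKeeper.
        "collection.configName": config_name,
    }
    return f"http://{host}:{port}/solr/admin/collections?{urlencode(params)}"


print(create_collection_url("techproducts", 2, 2, "techproducts"))
```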

Congratulations! Solr is ready for data!

You can see that Solr is running by launching the Solr Admin UI in your web browser: http://localhost:8983/solr/

Solr is now running two nodes, one on port 8983 and one on port 7574. One collection, techproducts, has been created automatically with two shards, each with two replicas.
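The same layout can be read from the Collections API with action=CLUSTERSTATUS (for example http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=techproducts&wt=json). The sketch below counts replicas per shard in such a response. SAMPLE_STATUS is a hand-trimmed, hypothetical response with made-up replica names, not real output from my machine.

```python
def replicas_per_shard(status, collection):
    """Map each shard name to its number of replicas in a CLUSTERSTATUS response."""
    shards = status["cluster"]["collections"][collection]["shards"]
    return {name: len(shard["replicas"]) for name, shard in shards.items()}


# Hypothetical, heavily trimmed response; real ones carry many more fields.
SAMPLE_STATUS = {
    "cluster": {
        "collections": {
            "techproducts": {
                "shards": {
                    "shard1": {"replicas": {"core_node_a": {}, "core_node_b": {}}},
                    "shard2": {"replicas": {"core_node_c": {}, "core_node_d": {}}},
                }
            }
        }
    }
}

print(replicas_per_shard(SAMPLE_STATUS, "techproducts"))
```

For the collection created in this walkthrough, each of the two shards should report two replicas.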

Index the Techproducts Data

Your Solr server is up and running, but it doesn’t contain any data yet, so we can’t do any queries.

Continue with tutorial 3, "Index the Techproducts Data".
