Information Retrieval

  • 5 min,  Information Retrieval

    5 min Excel find duplicates without removing them

    How to find duplicate values without removing them in Excel? (extendoffice.com) In Excel, the Remove Duplicates function can help users quickly remove all duplicate values from a list. But sometimes you may only want to find and identify the duplicate values without removing them, as shown in the screenshot below. How can you quickly handle it? Conditional Formatting can quickly highlight the duplicate values in a list. Select the values you want to check for duplicates, then click Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values. In the Duplicate Values dialog that pops up, select the highlighting option you need from the right drop-down list. Click OK. And the…

  • azure,  Information Retrieval,  Python

    Measure your VM with typeperf (Pandas py) perfmon

    Azure disk types: https://docs.microsoft.com/en-us/azure/virtual-machines/disks-types

                      Ultra          Premium SSD    Standard SSD   Standard HDD
    Max disk size     65,536 GiB     32,767 GiB     32,767 GiB     32,767 GiB
    Max throughput    2,000 MB/s     900 MB/s       750 MB/s       500 MB/s
    Max IOPS          160,000        20,000         6,000          2,000

    Premium SSD: Azure premium SSDs deliver high-performance and low-latency disk support for virtual machines (VMs) with input/output (IO)-intensive workloads. […] Premium SSDs can only be used with VM series that are premium storage-compatible.

    Standard SSD: Azure standard SSDs are a cost-effective storage option optimized for workloads that need consistent performance at lower IOPS levels. […] Like standard HDDs, standard SSDs are available on all Azure VMs.

    Standard SSD sizes E1 E2 E3…

  • Information Retrieval,  Networking and Security,  Python

    Python multicasting UDP wireshark

    https://www.wireshark.org/ A multicast address is a logical identifier for a group of hosts in a computer network that are available to process datagrams or frames intended to be multicast for a designated network service. Multicast addressing can be used in the link layer (layer 2 in the OSI model), such as Ethernet multicast, and at the internet layer (layer 3 for OSI) for Internet Protocol Version 4 (IPv4) or Version 6 (IPv6) multicast.   Point-to-point connections handle a lot of communication needs, but passing the same information between many peers becomes challenging as the number of direct connections grows. Sending messages separately to each recipient consumes additional processing time and…
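    The idea of sending one datagram to a group instead of one per recipient can be sketched with Python's standard socket module. This is a minimal sketch, not the code from the post: the group address 224.1.1.1, port 5007, and all function names are assumptions chosen for illustration.

```python
import ipaddress
import socket

# Assumed values for illustration only; they do not come from the post.
GROUP = "224.1.1.1"
PORT = 5007


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value: group address followed by local interface."""
    if not ipaddress.IPv4Address(group).is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    return socket.inet_aton(group) + socket.inet_aton(interface)


def send(message: bytes, group: str = GROUP, port: int = PORT, ttl: int = 1) -> None:
    """Send one datagram to the group; a TTL of 1 keeps it on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(message, (group, port))


def receive(group: str = GROUP, port: int = PORT, bufsize: int = 1024) -> bytes:
    """Join the group, block until one datagram arrives, and return its payload."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(
            socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group)
        )
        data, _sender = sock.recvfrom(bufsize)
        return data
```

    Running receive() in one process and send(b"hello") in another on the same network should produce traffic that can be inspected in Wireshark with a display filter on the assumed port, for example udp.port == 5007.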

  • Information Retrieval,  Python,  Solr

    Apache Solr 7.5 The Standard Query Parser focus searching

    http://lucene.apache.org/solr/guide/7_5/the-standard-query-parser.html

    Solr’s default Query Parser is also known as the “lucene” parser. The key advantage of the standard query parser is that it supports a robust and fairly intuitive syntax allowing you to create a variety of structured queries. The largest disadvantage is that it’s very intolerant of syntax errors, as compared with something like the DisMax query parser which is designed to throw as few errors as possible.

    Standard Query Parser Parameters

    In addition to the Common Query Parameters, Faceting Parameters, Highlighting Parameters, and MoreLikeThis Parameters, the standard query parser supports the parameters described in the table below.

    q: Defines a query using standard query syntax. This parameter is…
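    As a rough illustration of how the q parameter travels to Solr, the sketch below builds a select URL with the standard library. The base URL (a local Solr on port 8983), the core name book_store, the field names, and the helper select_url are assumptions for illustration; df is the standard parser's default-field parameter.

```python
from urllib.parse import urlencode

# Assumed local Solr address; adjust to your own installation.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(core: str, q: str, **params: str) -> str:
    """Return a /select URL with q and any extra parameters URL-encoded."""
    query = {"q": q, **params}
    return f"{SOLR_BASE}/{core}/select?{urlencode(query)}"


# Hypothetical field names (title, text); the schema decides what is valid.
url = select_url("book_store", "title:solr", df="text")
print(url)
```

    Because the standard parser is strict about syntax, encoding the query with urlencode rather than by hand avoids one common source of errors (unescaped colons, quotes, and brackets in the URL).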

  • Github,  Information Retrieval,  Python,  Solr

    Apache Solr 7.5 Index build with DirectUpdateHandler2 HTTP (with delete index)

    http://lucene.apache.org/solr/guide/7_5/uploading-data-with-index-handlers.html Index Handlers are Request Handlers designed to add, delete and update documents to the index. In addition to having plugins for importing rich documents using Tika or from structured data sources using the Data Import Handler, Solr natively supports indexing structured documents in XML, CSV and JSON. The default in solrconfig.xml is an UpdateHandler: <!-- The default high-performance update handler --><updateHandler class="solr.DirectUpdateHandler2"> DirectUpdateHandler2 implements an UpdateHandler where documents are added directly to the main Lucene index as opposed to adding to a separate smaller index. Now there is going to be a bit of code, so the Python code can be found on GitHub: https://github.com/spawnmarvel/solrhttp OK, let's index a…
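    For readers who want a feel for what an index request over HTTP looks like, here is a minimal sketch using only the standard library. It is not the code from the linked repository: the base URL, the core name, the document fields, and the helper names are assumptions for illustration. It relies on Solr's /update handler accepting a JSON array of documents and a delete-by-query command.

```python
import json
from urllib.request import Request

# Assumed local Solr address; adjust to your own installation.
SOLR_BASE = "http://localhost:8983/solr"


def update_request(core: str, payload) -> Request:
    """Build (but do not send) a JSON POST to the core's /update handler."""
    return Request(
        f"{SOLR_BASE}/{core}/update?commit=true",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_documents(core: str, docs: list) -> Request:
    """A JSON array of documents is treated as a batch of adds."""
    return update_request(core, docs)


def delete_all(core: str) -> Request:
    """Delete-by-query with *:* empties the index."""
    return update_request(core, {"delete": {"query": "*:*"}})


# Hypothetical document; field names depend on your schema.
req = add_documents("book_store", [{"id": "1", "title": "Solr in Action"}])
```

    The requests are only constructed here; passing one to urllib.request.urlopen(req) would send it to a running Solr instance.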

  • Information Retrieval,  Python,  Solr

    Apache Solr 7.5 create core, alter schema and query for API data

    With reference to the previous tutorial: Core: a single Solr instance, which represents a single Solr index. A core has a different set of configuration files and schema definitions than other cores. Now let's create a new core. After downloading and installing the files, have a look at the readme file in the solr-7.5.0 directory (on my PC: C:\solr_test\solr-7.5.0\solr-7.5.0): Stop Solr if it is running in another mode, i.e. cloud. Great, now back to the readme file. Getting Started: To start Solr for the first time after installation, simply do: bin/solr start Let's check that no core is present: Great, next: This will launch a standalone Solr server in the background of your…

  • Information Retrieval,  Solr

    Solr Terminology: Cores, Collections & Nodes

    https://doc.lucidworks.com/lucidworks-hdpsearch/2.5/Guide-Solr.html Solr is the popular open source search solution. Solr can index content from many sources and has integration points for Apache Tika to index rich text documents (Office documents, PDFs, etc.), JSON files, CSV files and Solr-specific XML. Cores, Collections and Clusters: Generally speaking, if you use Solr in standalone mode, you have a single core for each index. You can have multiple cores, but they would all be separate indexes. If you use Solr in SolrCloud mode, which is how this documentation suggests you use Solr with Hadoop, you would have…

  • Information Retrieval,  Solr

    Apache Solr 7.5 Build book_store index backup / restore

    https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html If you are worried about data loss, and of course you should be, you need a way to back up your Solr indexes so that you can recover quickly in case of catastrophic failure. Solr provides two approaches to backing up and restoring Solr cores or collections, depending on how you are running Solr. If you run in SolrCloud mode, you will use the Collections API. If you run Solr in standalone mode, you will use the replication handler.   The backup API requires sending a command to the /replication handler to back up the system. You can trigger a back-up with an HTTP command like this (replace “gettingstarted”…
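    For standalone mode, the backup and restore commands are plain HTTP requests to the /replication handler, so the URLs can be built as in the sketch below. The base URL, the core name book_store, the backup name nightly, and the helper replication_url are assumptions for illustration; command, name, backup, restore, and details are parameters and values of the replication handler.

```python
from urllib.parse import urlencode

# Assumed local Solr address; adjust to your own installation.
SOLR_BASE = "http://localhost:8983/solr"


def replication_url(core: str, command: str, **params: str) -> str:
    """Return a /replication URL for the given command and optional parameters."""
    query = {"command": command, **params}
    return f"{SOLR_BASE}/{core}/replication?{urlencode(query)}"


# "nightly" is a hypothetical backup name.
backup_url = replication_url("book_store", "backup", name="nightly")
restore_url = replication_url("book_store", "restore", name="nightly")
status_url = replication_url("book_store", "details")
print(backup_url)
```

    Opening backup_url in a browser or with urllib.request.urlopen triggers the backup on a running instance; the details command can then be polled to see whether it has finished.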

  • Information Retrieval,  Solr

    Apache Solr 7.5 Build book_store index insert/update/delete

    We are starting from: "Your Solr server is up and running, but it doesn’t contain any data yet, so we can’t do any queries", but we will use our own data and create our own index. See: https://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-1   Quick recap: Start cloud mode: bin\solr.cmd start -e cloud We will create a collection called book_store with configuration _default, i.e. following the parameters in tutorial 2, but with a different name and configuration. We start with 1 node on the default port: bin\solr.cmd start -e cloud (if the collection already exists, just type 1 to use the existing one; to stop node 1: bin\solr.cmd stop -p 8983) 1 node, running on default port 8983, shared…

  • Information Retrieval,  Solr

    Apache Solr 7.5 (techproducts tutorial) 2

    In this tutorial I have moved the environment outside VirtualBox. I installed the same version of Solr and JRE, and added the path to the environment. I also created two bat files for easy start and stop of the server. I go over the steps in the tutorial below. Index techproducts example data: https://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-1 To launch Solr: bin\solr.cmd start -e cloud on Windows. The first prompt asks how many nodes we want to run. Note the [2] at the end of the last line; that is the default number of nodes. Two is what we want for this example, so you can simply press enter. This will be the port…
