Skip to content
Menu
e-lo [IT Engineer life]
  • Home
    • Note
  • Database
    • T-SQL
    • SQL Server quick
    • SQL server docs
    • MySql quick sheet
    • Postgre
    • InfluxDB
  • Programming
    • MS Azure Powershell
    • MS Azure Command-Line Interface (CLI) doc
    • Python Docs
    • Python Logging
    • Python-cheat-sheet
    • Git-guide
  • Azure
    • MS Windows virtual machines in Azure
    • MS ARM Docs
    • MS ARM Template Docs
    • MS ARM Functions
    • MS Bicep+ARM
    • MS ARM Tutorial
    • MS Deployment scripts (intern/extern)
    • MS Virtual Network
  • Az-nutshell
    • ms-technology-choices-compute-decision-tree
    • ms-data-store-decision-tree
    • ms-data-explorer
    • ms-storage-explorer
    • ms-azure-sql
    • ms-common-data-services
    • ms-azure-mysql-daas
    • ms-sla
    • az paas
    • az glossary-quicksheet
    • az-test-vm-script-quickguide
  • Linux
    • Top CMD’s
    • Useful CMD Linux
    • ss64 Linux
    • Ubuntu
    • 30 things Ubuntu 18.04
    • Bootable Ubuntu USB
    • LinuxFilesystemTreeOverview
  • Sys Admin
    • System Administrator
    • Sys News
  • Zen
    • Not thinking about anything is Zen
e-lo [IT Engineer life]

Information Retrieval chp1 (Information Retrieval, Manning, Raghavan, Schütze) 2

Posted on September 17, 2018October 1, 2018 by espenk

Ad hoc retrieval

Let us now consider a more realistic scenario, simultaneously using the opportunity to introduce some terminology and notation.

We will refer to the group of documents over which we perform retrieval as the (document) collection . It is sometimes also referred to as a corpus (a body of texts)
Suppose each document is about 1000 words long (2-3 book pages). If we assume an average of 6 bytes per word including spaces and punctuation, then this is a document collection about 6 GB in size. Typically, there might be about M = 500 000 distinct terms in these documents.
Our goal is to develop a system to address the ad hoc retrieval task. This is the most standard IR task. In it, a system aims to provide documents from within the collection that are relevant to an arbitrary user information need, communicated to the system by means of a one-off, user-initiated query. An information need is the topic about which the user desires to know more, and is differentiated from a query , which is what the user conveys to the computer in an attempt to communicate the information need.

To assess the effectiveness of an IR system (i.e., the quality of its search results), a user will usually want to know two key statistics about the system’s returned results for a query:

Precision : What fraction of the returned results are relevant to the information need?
Recall : What fraction of the relevant documents in the collection were returned by the system?

First major concept in information retrieval, the inverted index . The name is actually redundant: an index always maps back from terms to the parts of a document where they occur. Nevertheless, inverted index, or sometimes inverted file , has become the standard term in information retrieval.

TBC….

 

RSS Azure

  • Scale your cloud-native apps and accelerate app modernization with Azure, the best cloud for your apps May 24, 2022

RSS RabbitMQ

  • RabbitMQ 3.8.15 release

RSS Python

  • PEP 691: JSON-based Simple API for Python Package Indexes May 4, 2022

Tags

5 min (26) Ansible (1) ARM (10) azure (40) cmd (3) Django (4) Docker (1) e-lo (2) Flask (2) Github (9) Grafana (2) Information (7) Information Retrieval (13) JAVA (1) kivy (2) Kotlin (6) linux (15) mobile (2) Natural Language Prossesing (NLP) (2) Net.Core (1) Networking and Security (6) OPC (2) PEP8 (1) Philosophy (3) Python (47) Python Networking and Security (5) Reason (2) RMQ (3) Solr (11) Sql (10) Uncategorized (2) VSC (1) Warframe (2) WMVARE (4) Zabbix (7)

Recent Posts

  • 5 min Logic App Storage Table
  • 5 min Logic App PSQL
  • 5 min Logic App
  • 5 MIN Azure Data Explorer
  • TODO Build a Hash Table in Python With TDD Real Python

Archives

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org

Photo by Markus Spiske from Pexels "Matrix"

©2022 e-lo [IT Engineer life] | Powered by WordPress & Superb Themes