- Information retrieval (IR) is the activity of obtaining information system resources relevant to an information need from a collection of information resources.
- Searches can be based on full-text or other content-based indexing.
- Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for metadata that describe data, and for databases of texts, images or sounds.
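To make "full-text indexing" a bit more concrete, here is a minimal sketch of an inverted index, one common way to support full-text search. Everything in it (the function names `tokenize`, `build_index`, `search`, and the toy documents) is made up for illustration and is not part of any library used later in this post.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query token (boolean AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical toy collection, invented for this example.
documents = {
    "doc1": "Nature is the art whereby God hath made and governs the world",
    "doc2": "A corpus is a large and structured set of texts",
    "doc3": "The art of man can make an artificial animal",
}
index = build_index(documents)
```

Calling `search(index, "large corpus")` looks up each word in the index and intersects the resulting sets of document ids, so only documents containing all query words are returned. Real IR systems add ranking, stemming and much more on top of this idea.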
I tried to run the following code, but realized too late that I had forgotten to download the corpora first.
from textblob import TextBlob
blob = TextBlob(MyCommand.elo.data_txt(book))
MyCommand.elo.data_txt(book) returns the text saved in the database, which in this case is the introduction chapter of Leviathan by Thomas Hobbes (1651).
But the command output was:
- For the uninitiated – practical work in Natural Language Processing typically uses large bodies of linguistic data, or corpora.
- In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (nowadays usually electronically stored and processed).
- In corpus linguistics, corpora are used for statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.
- A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus).
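For the missing corpora mentioned earlier, TextBlob ships a downloader module that is normally run from the command line as `python -m textblob.download_corpora`. The sketch below wraps that same command in Python; the helper names `download_command` and `ensure_corpora` are my own, and it assumes TextBlob is installed and the machine has network access.

```python
import subprocess
import sys


def download_command():
    """Build the command that runs TextBlob's corpus downloader module."""
    # sys.executable keeps the download in the same Python environment.
    return [sys.executable, "-m", "textblob.download_corpora"]


def ensure_corpora():
    """Run the downloader; raises CalledProcessError if it fails."""
    subprocess.run(download_command(), check=True)
```

Calling `ensure_corpora()` once before creating the first TextBlob should be enough, since the corpora are stored on disk and reused afterwards.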
In the previous section we extracted individual words; this time we can extract the noun phrases from the TextBlob instead.
Noun phrase extraction is particularly useful when you want to analyze the “who” in a sentence.
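A minimal sketch of the noun phrase step, assuming TextBlob and its corpora are installed. `noun_phrases` is TextBlob's own property; the helpers `extract_noun_phrases` and `most_common_phrases` are names I made up for this example.

```python
from collections import Counter


def extract_noun_phrases(text):
    """Return the noun phrases TextBlob finds in the text."""
    # Imported here so the counting helper below works without TextBlob.
    from textblob import TextBlob
    return list(TextBlob(text).noun_phrases)


def most_common_phrases(phrases, n=5):
    """Count phrases case-insensitively and return the n most frequent."""
    counts = Counter(phrase.lower() for phrase in phrases)
    return counts.most_common(n)
```

With the text returned by MyCommand.elo.data_txt(book), this could be used as `most_common_phrases(extract_noun_phrases(text))` to see which noun phrases dominate the chapter.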