E-lo method txt_summary
So now we have made some progress using math and TextBlob.
We are now able to read text, store it in a database, read it back and apply TextBlob functions to it.
sql_create_raw_text = "create table if not exists raw_txt(id INTEGER PRIMARY KEY AUTOINCREMENT, title text check(length(title) <= 25) NOT NULL, note text NOT NULL, sentences INTEGER, words INTEGER, avg_words_sent INTEGER)"
Here we store the id, the title (at most 25 characters), the text itself, the number of sentences, the total number of words, and the average number of words per sentence (words / sentences).
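As a rough sketch of how a row could end up in this table, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The helper name store_text and the sample values are made up for illustration, and rounding the average to a whole number is my assumption, made only because the column is declared INTEGER.

```python
import sqlite3

# Schema taken from the post; avg_words_sent is declared INTEGER there.
sql_create_raw_text = (
    "create table if not exists raw_txt("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "title text check(length(title) <= 25) NOT NULL, "
    "note text NOT NULL, "
    "sentences INTEGER, words INTEGER, avg_words_sent INTEGER)"
)


def store_text(conn, title, note, sentences, words):
    """Insert one text with its counts and return the new row id (hypothetical helper)."""
    # Assumption: round the average so it fits the INTEGER column.
    avg_words_sent = round(words / sentences) if sentences else 0
    cur = conn.execute(
        "insert into raw_txt(title, note, sentences, words, avg_words_sent) "
        "values (?, ?, ?, ?, ?)",
        (title, note, sentences, words, avg_words_sent),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(sql_create_raw_text)
row_id = store_text(conn, "Example", "One short sentence. And another one here.", 2, 7)
row = conn.execute(
    "select title, sentences, words, avg_words_sent from raw_txt where id = ?",
    (row_id,),
).fetchone()
print(row)
```

The check constraint on title means a title longer than 25 characters is rejected by SQLite at insert time, so the calling code should be ready for that error.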
Plain English also recommends short words. Even if the average sentence length of a document is 15-20 words, readability is not guaranteed. Polysyllabic words are likely to make the meaning of the document difficult to grasp. So we also need a guideline for average word length.
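A guideline for word length needs a number to compare against, so here is a minimal sketch of how the average word length could be measured. The function name average_word_length and the simple regex tokenizer are assumptions for illustration; this is not something the table above stores yet.

```python
import re


def average_word_length(text):
    """Return the mean number of characters per word, or 0.0 for empty text."""
    # Naive tokenizer (an assumption): runs of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


print(average_word_length("The cat sat on the mat."))
```

A text full of polysyllabic words would give a clearly higher value than this kind of short sample sentence.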
We store words / sentences because that tells us how many words the author writes, on average, before ending a sentence with a full stop.
If the author uses many words per sentence, it could be an old text, or the author may be inspired by old texts. So it can tell us something about the time the text was written.
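To make the words / sentences idea concrete, here is a small sketch that counts sentences and words and divides one by the other. TextBlob offers blob.sentences and blob.words for this; the regex version below is only a dependency-free stand-in, and the name sentence_stats is hypothetical.

```python
import re


def sentence_stats(text):
    """Return (sentences, words, average words per sentence) using naive regex splitting."""
    # Split after ., ! or ? followed by whitespace.
    # Abbreviations such as "e.g." will be counted as sentence ends.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z0-9']+", text)
    avg = len(words) / len(sentences) if sentences else 0.0
    return len(sentences), len(words), avg


print(sentence_stats("One short sentence. And another one here."))
```

The counts from a real tokenizer can differ a little from this stand-in, so the numbers should be read as estimates.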
Here is the result for txt_summary:
The red line separates the two different texts.
The summary consists of the nouns in the text, run through TextBlob lemmatize(), e.g. seats becomes seat.
So here we can see what the text is about, the number of sentences, the number of words, and the words per sentence, which gives an idea of when the text was written.
All that without reading the text. It still needs some work, but we are getting there.
The limit for sentence length:
# data holds the stored average words per sentence for one text
avg_word_sentences = data
if avg_word_sentences > 20:
    print("Long sentences, average words per sentence: " + format(avg_word_sentences))
else:
    print("Normal sentences, average words per sentence: " + format(avg_word_sentences))