thoughtlede | 1 year ago
In keyword-based indexing solutions, a document vector is built from "term frequency-inverse document frequency" (TF-IDF) scores. The idea is to pump up the document on the dimensions where it is unique compared to the other documents in the corpus. So when a query is issued with emphasis on a certain dimension, only documents that have higher scores in that dimension are returned.
But the uniqueness in those solutions is based on keywords being used in the document, not concepts.
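To make the keyword-vs-concept point concrete, here is a rough sketch of TF-IDF scoring in plain Python. Everything in it is an assumption for illustration: the names `tfidf_vectors` and `search` are made up, tokenization is a naive whitespace split, and the IDF is the plain unsmoothed log(N / df) variant, so real indexing libraries will weight things differently.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf score} dict per document (docs must be non-empty strings)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # tf = relative frequency in this document; idf = log(N / df), unsmoothed.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Toy corpus (hypothetical): the first two documents are about the same concept.
docs = [
    "the car is fast",
    "the automobile is fast",
    "the bread is fresh",
]
vectors = tfidf_vectors(docs)

def search(term):
    """Indices of documents with a positive score on `term`, highest score first."""
    scored = [(vec.get(term, 0.0), i) for i, vec in enumerate(vectors)]
    return [i for score, i in sorted(scored, reverse=True) if score > 0]

print(search("car"))         # only the document that literally contains "car"
print(search("automobile"))  # only the document that literally contains "automobile"
print(search("the"))         # empty: a term found in every document scores zero
```

In this toy corpus, "car" and "automobile" each match exactly one document even though both documents are about the same thing, and "the" scores zero everywhere because it appears in all three. The uniqueness being rewarded is purely lexical.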
What we need here to eliminate "blandness" is conceptual uniqueness. Maybe TF-IDF is still relevant to get there. Something to think about.