klavinski | 1 year ago
Here is the list of technological problems:
1. When is a page ready to be indexed? Many websites are dynamic.
2. How to find the relevant content? (To avoid indexing noise)
3. How to keep performance acceptable? Computing embeddings on every page is enough to turn a laptop into a small helicopter with its fans. (I used 384 as the embedding dimension. Lower dimensions were too imprecise; higher ones were too compute-heavy.)
4. How to chunk a page? It is not enough to split the content into sentences. You must add context to them.
5. How to rank the results of a search? PageRank is not applicable here.
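To make points 4 and 5 concrete, here is a minimal sketch, not the approach actually used in the project. It assumes that "context" means the page title plus neighbouring sentences, and it swaps in plain cosine similarity between embedding vectors as a stand-in ranking. The names `Chunk`, `chunk_with_context`, `cosine`, and `rank` are hypothetical, and the embedding model itself is left out.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str     # the sentence itself, shown in search results
    context: str  # what would be embedded: title + neighbours + sentence


def split_sentences(text: str) -> list[str]:
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_with_context(title: str, text: str, window: int = 1) -> list[Chunk]:
    # Each sentence becomes one chunk, padded with `window` sentences on
    # each side and prefixed with the page title.
    sentences = split_sentences(text)
    chunks = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        context = f"{title}: " + " ".join(sentences[lo:hi])
        chunks.append(Chunk(text=sentence, context=context))
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, items, top_k=5):
    # `items` is a list of (label, vector) pairs; highest similarity first.
    scored = [(cosine(query_vec, vec), label) for label, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

In use, each `chunk.context` would be passed to whatever embedding model is available (384 dimensions in my case), and the resulting vectors handed to `rank` together with the query vector. A real ranking would probably also want signals such as recency or visit count, which this sketch ignores.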