top | item 40386457

(no title)

klavinski | 1 year ago

I have built a Chrome extension to do this one year ago: [0]

Here is the list of technological problems:

1. When is a page ready to be indexed? Many websites are dynamic.

2. How to find the relevant content? (To avoid indexing noise)

3. How to keep an acceptable performance? Computing embeddings on each page is enough to transform a laptop into a small helicopter with its fans. (I used 384 as the embedding dimension. Below, too imprecise; above, too compute-heavy).

4. How to chunk a page? It is not enough to split the content into sentences. You must add context to them.

5. How to rank the results of a search? PageRank is not applicable here.

[0] https://www.youtube.com/watch?v=GYwJu5Kv-rA

discuss

order

No comments yet.