top | item 36045472

(no title)

Vasyl_R | 2 years ago

Apologies for that. Let me check with my co-founder; it will be there. It would be great to hear your thoughts on our GUI for editing embeddings, joining and splitting chunks, and filtering out punctuation and stop-words with one word. You can have a look at it in the /embedditor repo or on the web at embedditor.ai.


PaulHoule|2 years ago

My take is you need to do more work on the value proposition.

My first take is that I can compute embeddings with one line of Python using sbert.net, and from there it is an automated process: I have a script that generates embeddings for 80,000 documents that runs every day, and I barely think about it.
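For readers who have not seen such a script, here is a minimal sketch of the kind of automated batch job described above. It is not the commenter's actual script. The `toy_embed` function and `embed_corpus` helper are hypothetical names, and the hash-based embedder is only a stand-in so the example runs without downloads; with the sentence-transformers library one would plug in a real model's `encode` call instead, as noted in the comments.

```python
import hashlib
import math
from typing import Callable

Vector = list[float]


def toy_embed(text: str, dim: int = 8) -> Vector:
    """Stand-in embedder (an assumption for illustration, not a real model).

    Hashes each word into one of `dim` buckets and L2-normalises the counts.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def embed_corpus(
    docs: dict[str, str],
    embed: Callable[[str], Vector] = toy_embed,
) -> dict[str, Vector]:
    """Embed every document in the collection with no per-document clicks."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


# With sentence-transformers installed, the embedder could be swapped for
# something like the following (the model name is an assumption):
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   vectors = embed_corpus(docs, embed=lambda t: model.encode(t).tolist())

docs = {
    "a": "vector search over documents",
    "b": "chunking text for retrieval",
}
vectors = embed_corpus(docs)
```

Because the embedder is passed in as a function, the same loop works for 2 documents or 80,000, and it can be run from a daily scheduled job.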

I think of a GUI and I think somebody has to click through 80,000 documents to do the same, and to get the same throughput I’d have to raise venture capital and hire an army of people to go click… click… click… That is, it takes something easy and scalable and makes it difficult and expensive. It makes me think of the text retrieval experiments that Salton did with documents on IBM cards in the 1960s.

I know there is more to it than that. This simple approach is not so simple when you consider chunking and other choices that could make a big difference. But I still think there would be some programming language function that takes a document and gives an embedding, and some kind of suite to determine the parameters of that function (on the level of a document collection, not individual documents) could be quite useful. I also think a lot of people will want something that doesn’t have many knobs to turn.

Vasyl_R|2 years ago

Hello Paul, thanks a lot for your feedback. I really appreciate it, and we'll take it into consideration for our next steps. We saw people struggling with vector search that retrieves only half of the relevant paragraph, just because the text was chunked based on the number of tokens. So our first step is to let users (I'm not talking about people who know Python, NLTK, and LangChain) pre-process their embeddings with a few clicks: adding images and cleaning the text, removing at least punctuation and stop-words. But you're totally right: now we have to think not about pre-processing a single document but about embedding a large set of documents.
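To make the chunking problem concrete, here is a rough sketch in plain Python of the difference between splitting on a fixed token count and packing whole paragraphs, plus a simple punctuation and stop-word filter. It is not Embedditor's implementation. The function names, the whitespace "tokenizer", and the tiny stop-word list are all assumptions made for illustration.

```python
import string

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "to", "in"}


def chunk_by_tokens(text: str, max_tokens: int) -> list[str]:
    """Naive chunking: cut every `max_tokens` whitespace-separated tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def chunk_by_paragraphs(text: str, max_tokens: int) -> list[str]:
    """Pack whole paragraphs into chunks without ever splitting a paragraph.

    A paragraph longer than `max_tokens` becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        if current and count + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def clean(chunk: str) -> str:
    """Remove ASCII punctuation and stop-words from a chunk."""
    stripped = chunk.translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in stripped.split() if w.lower() not in STOP_WORDS)


text = (
    "Vector search is fast.\n\n"
    "Chunking by token count can cut a paragraph in half."
)
token_chunks = chunk_by_tokens(text, 6)
paragraph_chunks = chunk_by_paragraphs(text, 6)
```

With a limit of 6 tokens, the first token-based chunk runs across the paragraph boundary and the second paragraph ends up spread over three chunks, while the paragraph-based version keeps each paragraph intact. Both functions take the same `max_tokens` parameter, so they can be applied across a whole collection rather than one document at a time.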

Really appreciate your time and hope to have your star or see you among our watchers.