top | item 42406289

(no title)

sachou | 1 year ago

what would be your specific requirement?

Right now adding chunk size, model for embedding, what else?

Image is a great challenge with OCR can be solve as you mentioned

discuss

olup|1 year ago

We, as many players, have custom pipelines on embedding. We don't split docs based on chunk size but do semantic chunking and chunk augmentation. We embed everything with two embeddings services to always have a fallback if one provider is not available.

If I were in your shoes I would not think embedding and inserting in a vector store would be my responsibility, especially since there are so many different stores on the market.