pilooch | 7 months ago

Some colleagues and I implemented exactly this six months ago for a French government agency.

It's open source and available here: https://github.com/jolibrain/colette

It's not our primary business, so it's just sitting there and we don't advertise it much, but it works, albeit with some tweaks needed to make it really efficient.

The true genius, though, is that the whole thing can be made fully differentiable, unlocking the ability to fine-tune the visual RAG on targeted datasets.

The layout model can also be customized for fine-grained document understanding.

ted_dunning|7 months ago

You don't have a license at the top level of your repository. That means nobody who takes licensing at all seriously can use your stuff, even just for reference.

pilooch|7 months ago

Good catch, will add it tomorrow. The license is Apache 2.0.

deadbabe|7 months ago

Standard practice now is to just have an LLM read the whole repo and write a new original version in a different language. It’s code laundering.

JSR_FDED|7 months ago

Great, thanks for sharing your code. Could you please add a license so I and others can understand if we're able to use it?

Adityav369|7 months ago

Yeah, the fine-tuning is definitely the best part.

Often the blocker becomes high-quality eval sets (which, I guess, is always the blocker).