item 40767802

(no title)

polshaw | 1 year ago

This is cool. As a practical issue, though (aside from the 280 GB TTF file!), it makes the text incompatible with all other fonts: if you copy and paste your "improved" text, it will no longer say what you thought it did. The font only alters the presentation, not the content. I guess you would have to OCR it to get the content as you see it.

I was wondering why this was never used for a simpler autocorrect, but I guess that's why.

Also, perhaps someone more educated on LLMs could tell me: this wouldn't always be consistent, right? Like "once upon a time _____" wouldn't always output the same thing, yes? If so, even copying and pasting within your own system using the correct font could change the content.
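A toy model of the copy/paste problem, in Python. The substitution rule and the generated words are hypothetical; a real font does this inside its shaping step, not with `str.replace`. It only shows that what gets drawn and what sits in the text buffer can diverge:

```python
# Toy model: the font's shaping step decides which glyphs are drawn,
# but the text buffer (what copy/paste reads) keeps the original codepoints.

def shape(text: str, substitutions: dict) -> str:
    """Return what a reader would *see* after a hypothetical glyph substitution."""
    shown = text
    for typed, drawn in substitutions.items():
        shown = shown.replace(typed, drawn)
    return shown

buffer = "once upon a time!!!"
# Hypothetical substitution: the font draws generated words where "!!!" was typed.
rendered = shape(buffer, {"!!!": " there was a font"})
clipboard = buffer  # copy/paste takes the buffer, not the rendered glyphs

print(rendered)   # once upon a time there was a font
print(clipboard)  # once upon a time!!!
```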

magnat|1 year ago

> if you copy and paste your "improved" text then it will no longer say what you thought it did

It's not a bug, it's a feature: DRM. Your content can now be consumed, but not copied or modified, all without external tools, as long as you embed that TTF somehow.

Which kind of reminds me of the PDF invoices I got from my electricity provider. They looked and printed perfectly fine, but used a weird codepoint mapping that produced complete garbage whenever you tried to copy any text from them. Fun times, especially when pasting the account number into a banking app.
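A rough sketch of how an invoice like that can print correctly but copy as garbage, assuming the generator stored each digit under a shuffled code and the embedded font draws the intended digit for each code. In a real PDF, text extraction relies on the font's ToUnicode CMap; when that mapping is missing or wrong, you get the raw codes back. The shuffle below is made up:

```python
import string

plain = string.digits                  # "0123456789"
stored = "7394015826"                  # hypothetical shuffled codes (a permutation)
font_draws = dict(zip(stored, plain))  # code -> glyph the reader sees

def render(codes: str) -> str:
    """What the printed page shows: the font maps each code to its glyph."""
    return "".join(font_draws.get(c, c) for c in codes)

def encode(text: str) -> str:
    """How the hypothetical generator wrote the digits into the file."""
    to_code = {glyph: code for code, glyph in font_draws.items()}
    return "".join(to_code.get(c, c) for c in text)

account = "12345678"
in_file = encode(account)
print(render(in_file))  # 12345678  (looks right on screen and on paper)
print(in_file)          # 39401582  (what naive copy/paste extracts)
```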

mbb70|1 year ago

This is why pretty much all software that extracts structured data from PDFs throws away the text and just OCRs the page. There are too many tricks with layouts and fonts.

yjftsjthsd-h|1 year ago

Eh, what AI taketh, AI can give; modern OCR has gotten mostly decent. If you're on Windows, you should try the OCR tool in PowerToys (Text Extractor).

nacs|1 year ago

The small model/TTF is only 60 MB.

The 280 GB one you saw is the Llama3-70B model, which is basically ChatGPT level (if not better).
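The 280 GB figure is consistent with storing 70 billion parameters as 32-bit floats. The sketch below does that arithmetic, assuming fp32 weights (4 bytes each) and ignoring headers and tokenizer data; by the same arithmetic, a 60 MB file holds roughly 15 million fp32 parameters:

```python
GB = 1e9  # decimal gigabytes

def weights_size_bytes(n_params: float, bytes_per_param: float) -> float:
    """Raw size of the weights alone (assumption: no headers or tokenizer data)."""
    return n_params * bytes_per_param

print(weights_size_bytes(70e9, 4) / GB)  # 280.0  (fp32)
print(weights_size_bytes(70e9, 2) / GB)  # 140.0  (same model at 16-bit)
print(60e6 / 4)                          # 15000000.0 fp32 parameters in 60 MB
```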

Retr0id|1 year ago

If there's any randomness involved in inference, it ought to be deterministic as long as the same seed is used each time.

furyofantares|1 year ago

Is there even any possibility of using a different seed? I doubt the WASM shaper has access to any source of non-determinism.
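A minimal sketch of both points in Python: with a seeded private PRNG, sampling repeats exactly, and greedy decoding needs no randomness at all. The logits are made up and held fixed across steps, which a real model would not do; this only illustrates determinism, not how the font's shaper actually samples:

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_tokens(logits, steps, seed):
    """Draw `steps` token ids from a fixed distribution using a seeded PRNG."""
    rng = random.Random(seed)  # private generator, no shared global state
    probs = softmax(logits)
    ids = range(len(logits))
    return [rng.choices(ids, weights=probs)[0] for _ in range(steps)]

def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token, no randomness."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical next-token scores
a = sample_tokens(logits, 10, seed=42)
b = sample_tokens(logits, 10, seed=42)
print(a == b)          # True: same seed, same draws
print(greedy(logits))  # 0
```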

NoobSaibot135|1 year ago

> this wouldn't always be consistent right? Like "once upon a time _____" wouldn't always output the same thing, yes?

Would be cool if you could turn the LLM's temperature up or down by pressing keys other than just !!!!, say the number keys 0-9.
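One way the digit-key idea could look, as a sketch: temperature T divides the logits before softmax, so a low T concentrates probability on the top token and a high T flattens the distribution. The key-to-temperature mapping (0.2 to 2.0) and the logits are hypothetical:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax: T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for T -> 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_for_key(key: str) -> float:
    """Hypothetical mapping: digit keys '0'..'9' -> T = 0.2 .. 2.0."""
    return 0.2 * (int(key) + 1)

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical next-token scores
for key in "09":
    t = temperature_for_key(key)
    probs = softmax_with_temperature(logits, t)
    print(key, round(t, 1), [round(p, 3) for p in probs])
```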