top | item 38834679

weijiacheng | 2 years ago

I am one of the SE editors/regular contributors and I did play around with this a bit for a poetry collection: https://groups.google.com/g/standardebooks/c/IUvGLmvZrmM/m/s...

I'm sure someone sufficiently determined and skilled at prompt engineering, and at integrating LLMs into a larger toolset, could come up with something even better. I'm personally very skeptical of LLMs as a technology, but even I have to admit that this was a pretty ideal and unobjectionable use of LLMs.

That being said, though it was a fun experiment, I later found that it was easier (and less wasteful of natural resources) to just do the same thing with a bit of custom markup and a search and replace script.
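For concreteness, a markup-plus-replace approach might look something like the sketch below. The markup conventions here (blank-line stanza breaks, one verse line per text line, plain `<span>`/`<br/>` output) are hypothetical choices for illustration, not SE's actual conventions:

```python
import re

def poem_to_html(poem: str) -> str:
    """Convert a plain-text poem into HTML: each stanza becomes a <p>,
    and each verse line is wrapped in a <span> followed by <br/>."""
    # Stanzas are separated by one or more blank lines
    stanzas = re.split(r"\n\s*\n", poem.strip())
    html_stanzas = []
    for stanza in stanzas:
        lines = [line.strip() for line in stanza.splitlines() if line.strip()]
        # Join verse lines with <br/> so each keeps its line break
        body = "<br/>\n".join(f"<span>{line}</span>" for line in lines)
        html_stanzas.append(f"<p>\n{body}\n</p>")
    return "\n".join(html_stanzas)
```

The point is that once the structure is captured in a predictable plain-text form, a deterministic script does the rest, with no model inference involved.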

duskwuff | 2 years ago

I don't think that's quite what the parent had in mind.

The most natural application of a language model in proofreading is to compute perplexity across the text; if all goes well, errors should be detectable as points of unusually high perplexity. (In principle, this should even be able to spot otherwise undetectable errors like missing words.)
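The idea can be shown with a toy model. The sketch below uses a word-bigram model with add-one smoothing in place of a real language model; surprisal (negative log-probability) spikes at tokens the model finds unexpected, which is where typos or dropped words would surface:

```python
import math
from collections import Counter

def train_bigram_lm(corpus):
    """Train a word-bigram model with add-one smoothing; return a
    surprisal function: -log2 P(word | prev)."""
    unigrams = Counter()
    bigrams = Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def surprisal(prev, word):
        # Add-one smoothing keeps unseen bigrams from having zero probability
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
        return -math.log2(p)

    return surprisal

def flag_high_surprisal(sentence, surprisal, threshold):
    """Return (word, surprisal) pairs whose surprisal exceeds the threshold."""
    tokens = ["<s>"] + sentence.lower().split()
    return [(w, s) for prev, w in zip(tokens, tokens[1:])
            if (s := surprisal(prev, w)) > threshold]
```

A real proofreading tool would swap in a neural LM's per-token log-probabilities, but the triage logic, flagging points of unusually high surprisal for human review, is the same.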

weijiacheng | 2 years ago

I could see how that would be helpful, but at least for my use case I'm more interested in seeing how LLMs integrated with computer vision can speed up transcription. Since a thorough proofread by a human is already baked into the SE production process (and is indeed one of its major selling points), having more automated tools to aid proofreading is nice but doesn't do anything fundamentally different, from my point of view. Whereas if LLMs can be leveraged for transcription, SE producers no longer need to depend on external projects like Project Gutenberg or Wikisource to produce texts (which can take months), or to transcribe texts from OCR results by hand (very tedious and error-prone, believe me, I'm speaking from experience!). It would drastically open up the range of books someone could reasonably produce (in a timely fashion) for SE.
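Since the human proofread stays in the loop either way, one plausible shape for such a pipeline is a review step that diffs raw OCR output against a model-corrected transcription and surfaces only the disagreements. The sketch below stubs out the model call entirely (the "corrected" text is just a second input) and uses the standard library's `difflib` for the word-level comparison:

```python
import difflib

def ocr_correction_diff(ocr_text: str, corrected_text: str):
    """Return (ocr_fragment, corrected_fragment) pairs where a corrected
    transcription disagrees with the raw OCR, for human review."""
    ocr_words = ocr_text.split()
    fixed_words = corrected_text.split()
    sm = difflib.SequenceMatcher(a=ocr_words, b=fixed_words)
    changes = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        # 'equal' regions need no review; everything else is a proposed edit
        if tag != "equal":
            changes.append((" ".join(ocr_words[i1:i2]),
                            " ".join(fixed_words[j1:j2])))
    return changes
```

For example, `ocr_correction_diff("Tbe quick brown f0x jumps", "The quick brown fox jumps")` reduces the reviewer's job to two proposed substitutions rather than a full re-read.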