phaer | 1 year ago

But this doesn't read the file char-by-char; it uses buffering to read the whole thing into a string.

What would you expect? There's no OS API for "read one character", except in, say, ASCII, where 1 byte = 1 code point = 1 character. And it'd be hideously inefficient anyway. So you either loop over reading the next N bytes and decoding every complete character received so far (with some extra complexity around characters that cross chunk boundaries), or you read the whole thing into a single buffer and iterate over the characters. This code does the latter. If this tool doesn't have the ability to respond by asking requirements questions, I'd consider either choice valid.
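For what it's worth, the first option (loop over N-byte chunks and carry any character that straddles a boundary) might look roughly like the sketch below. It assumes the input is UTF-8, and `count_utf8_chars`, `utf8_len` and `MAX_CHUNK` are names I made up for illustration. It only looks at lead bytes, does not validate continuation bytes, and counts code points rather than user-perceived characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MAX_CHUNK = 4096 };  /* hypothetical upper bound on the chunk size */

/* Length of a UTF-8 sequence as announced by its lead byte.
   Returns 1 for ASCII and for invalid bytes, so the caller always advances. */
static size_t utf8_len(unsigned char lead)
{
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Counts UTF-8 sequences in `in`, reading at most `chunk_size` bytes per
   fread() call. Bytes of a sequence that is cut off by the end of a chunk
   are carried over to the front of the buffer for the next round. */
size_t count_utf8_chars(FILE *in, size_t chunk_size)
{
    unsigned char buf[MAX_CHUNK + 3];  /* +3: longest possible carried prefix */
    size_t have = 0;                   /* bytes currently held in buf */
    size_t count = 0;
    size_t got;

    assert(in != NULL);
    assert(chunk_size >= 1 && chunk_size <= MAX_CHUNK);

    while ((got = fread(buf + have, 1, chunk_size, in)) > 0) {
        size_t pos = 0;
        have += got;
        while (pos < have) {
            size_t len = utf8_len(buf[pos]);
            if (pos + len > have)
                break;                 /* sequence crosses the chunk boundary */
            count++;                   /* buf[pos .. pos+len-1] is complete here */
            pos += len;
        }
        have -= pos;
        memmove(buf, buf + pos, have); /* keep the incomplete tail, if any */
    }
    /* Any bytes still in buf are a truncated sequence at EOF; they are ignored. */
    return count;
}
```

Memory use stays bounded by the chunk size no matter how big the file is, at the cost of the carry-over bookkeeping.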

Of course, in real life, I do expect to get requirements questions back from an engineer when I assign a task. Seems more practical than anticipating everything up-front into the perfect specification/prompt. Why shouldn't I expect the same from an LLM-based tool? Are any of them set up to do that?

1letterunixname|1 year ago

There most certainly are getwchar() and fgetwc()/getwc() on anything that's POSIX or C95, so that's more or less everything that's not a vintage antique.

Reading individual UTF-8 code points is a trivial exercise if a byte-wide getchar() is available, and portable C code to do so would be able to run on anything made after 1982. IIRC, they don't teach how to write portable C code in Comp Sci programs anymore, and it's a shame.
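A minimal sketch of that exercise, using nothing but byte-wide getc(). The name `read_utf8_codepoint` and the choice to return U+FFFD for malformed input are my own assumptions, not anything standard, and it is written in C99 style rather than for truly ancient compilers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one UTF-8 encoded code point from `in`.
   Returns the code point, EOF at end of input, or 0xFFFD (the replacement
   character) when the byte sequence is malformed or truncated. */
long read_utf8_codepoint(FILE *in)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const long min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };
    long cp;
    int extra;  /* number of continuation bytes still expected */
    int c;
    int i;

    assert(in != NULL);

    c = getc(in);
    if (c == EOF)
        return EOF;

    if (c < 0x80)
        return c;                                    /* plain ASCII */
    else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
    else
        return 0xFFFD;           /* stray continuation or invalid lead byte */

    for (i = 0; i < extra; i++) {
        c = getc(in);
        if (c == EOF)
            return 0xFFFD;       /* input ended in the middle of a sequence */
        if ((c & 0xC0) != 0x80) {
            ungetc(c, in);       /* not a continuation byte: leave it for the next call */
            return 0xFFFD;
        }
        cp = (cp << 6) | (c & 0x3F);
    }

    /* Reject overlong encodings, surrogates and values beyond U+10FFFF. */
    if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0xFFFD;
    return cp;
}
```

Calling it in a loop until it returns EOF walks the file one code point at a time; stdio does the buffering underneath, so nothing larger than its internal buffer is ever held in memory.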

Never read a file completely into memory at once unless there is zero chance of it being huge; doing so is an obvious DoS vector and a waste of resources.