item 26349764


michaf | 5 years ago

This way you would trade in a null-byte-terminated variable-length string for essentially a null-bit-terminated variable-length number (plus the remaining string). I am not convinced that this would actually be much safer.


selfhoster11 | 5 years ago

Unicode does variable-length bit strings too, so I'm not a visionary or anything. It would be safer for no other reason than that such a pattern could only occur at the start of the string, with zero special handling, while a null could occur anywhere in a zero-terminated string.
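The property being described can be sketched in Python (a hypothetical illustration; `frame`/`unframe` are my names, not anything from the thread): the length prefix uses a continuation bit, so the "terminator" pattern can only appear at the start, and a null byte in the payload is unremarkable.

```python
def frame(payload: bytes) -> bytes:
    """Prefix payload with its length as a base-128 varint:
    high bit set on every length byte except the final one."""
    n = len(payload)
    prefix = bytearray()
    while n >= 0x80:
        prefix.append((n & 0x7F) | 0x80)  # continuation bit set
        n >>= 7
    prefix.append(n)                      # final byte: high bit clear
    return bytes(prefix) + payload

def unframe(data: bytes) -> bytes:
    """Read the varint length, then slice out exactly that many bytes."""
    length = shift = 0
    for i, b in enumerate(data):
        length |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:                  # high bit clear ends the prefix
            start = i + 1
            return data[start:start + length]
    raise ValueError("truncated length prefix")

# An embedded null byte survives the round trip, which a
# zero-terminated representation could not guarantee:
assert unframe(frame(b"ab\x00cd")) == b"ab\x00cd"
```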

pertymcpert | 5 years ago

This is just the LEB128 format, which is commonly used, and I don't think there are any serious problems with it.
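For reference, unsigned LEB128 packs seven payload bits per byte, with the high bit acting as a continuation flag; a byte with the high bit clear terminates the number. A minimal Python sketch (function names are mine):

```python
def uleb128_encode(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def uleb128_decode(data: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    value = shift = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated LEB128 sequence")

# The classic worked example from the DWARF spec: 624485 -> E5 8E 26
assert uleb128_encode(624485) == b"\xe5\x8e\x26"
```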

selfhoster11 | 5 years ago

Interesting. Thank you for sharing this!

ascar | 5 years ago

At least you don't have (obvious) performance problems with it, because you will effectively never need more than 9 (usually 2 or 3) of these bytes.

But sure, on modern 64-bit systems just using a 64-bit integer makes much more sense. On a small embedded 8-bit or 16-bit microcontroller it might make sense.

selfhoster11 | 5 years ago

You are correct; I was trying to show that such a scheme was practical even in the early 1980s, when zero-termination was beginning to dominate. This could well be used on 64-bit systems (just with a larger word size than a byte), though the utility of such a thing is questionable.

konjin | 5 years ago

In a toy language I once wrote, I got around that by encoding binary values as quaternary values, with a ternary system on top of that plus a termination symbol: 11 = 1; 01 = 0; 00 = end; 10 was unused.

Having truly unbounded integers was rather fun. Of course performance was abysmal.
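The scheme described above can be sketched in Python (my reconstruction of the mapping given, under the assumption that digits are stored most-significant first): every bit of the value is widened to a two-bit pair, and the unused-elsewhere pair 00 self-terminates the number, so integers of any size need no length field.

```python
ENC = {1: "11", 0: "01"}   # widen each binary digit to a bit pair
DEC = {"11": 1, "01": 0}   # "10" is deliberately absent: unused pattern

def encode(n: int) -> str:
    """Encode a non-negative integer as a self-terminating bit string."""
    bits = bin(n)[2:]  # most significant bit first (assumption)
    return "".join(ENC[int(b)] for b in bits) + "00"

def decode(s: str) -> tuple[int, int]:
    """Return (value, bits consumed), stopping at the 00 terminator."""
    value = 0
    for i in range(0, len(s), 2):
        pair = s[i:i + 2]
        if pair == "00":
            return value, i + 2
        value = value * 2 + DEC[pair]  # KeyError on the unused "10"
    raise ValueError("missing terminator")

# 5 is binary 101 -> pairs 11 01 11, then the 00 terminator:
assert encode(5) == "11011100"
```

The abysmal performance is visible in the sketch: every payload bit costs two stored bits, before even counting the terminator.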