item 39742481

RaisingSpear | 1 year ago

I don't quite understand what you're saying, but it should be possible to infer the length from the number of bytes received.

Assuming n is an integer:

  * 5n bytes received = 4n bytes data
  * 5n+1 bytes received = [invalid]
  * 5n+2 bytes received = 4n+1 bytes data
  * 5n+3 bytes received = 4n+2 bytes data
  * 5n+4 bytes received = 4n+3 bytes data
This is like modified Base64, which doesn't need any padding.
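The mapping above can be sketched as a small length check. This is a minimal illustration (function name and error handling are mine), assuming a Base85-style scheme where every full 4-byte group becomes 5 characters and a trailing group of k bytes becomes k+1 characters:

```python
def decoded_length(received: int) -> int:
    """Infer the decoded byte count from the number of characters received."""
    groups, rem = divmod(received, 5)
    if rem == 1:
        # a lone trailing character cannot encode any whole byte
        raise ValueError("invalid input length: 5n+1 characters")
    # a full 5-char group yields 4 bytes; a trailing group of k+1 chars yields k bytes
    return 4 * groups + (rem - 1 if rem else 0)
```

For example, decoded_length(10) is 8 and decoded_length(7) is 5, matching the table; decoded_length(6) raises.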

nly | 1 year ago

You never need padding as long as you know how many input characters are missing. My point is that if you encode the single-byte binary input 0x01 as "00001" (big endian) instead of "10000" (little endian), you avoid the temptation for people to trim off the zeroes (leaving "1"). This means your decode() input will always be a multiple of 5 chars by construction.

This comes down to whether there should be 5 valid encodings ("10000", "1000", "100", "10", "1") of a single 0x01 byte, or one. The variable-length encoding of integers in Protocol Buffers has the same malleability problem.
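The malleability can be seen by evaluating the digit string as a little-endian base-85 number. The digit alphabet here is a hypothetical stand-in for a real 85-character one, with '0' and '1' mapping to the digit values 0 and 1:

```python
DIGITS = "0123456789"  # illustrative stand-in for a real 85-character alphabet

def le_value(chars: str) -> int:
    # little endian: the first character is the least significant base-85 digit
    return sum(DIGITS.index(c) * 85**i for i, c in enumerate(chars))

# every zero-trimmed variant decodes to the same integer
values = [le_value(s) for s in ("10000", "1000", "100", "10", "1")]
print(values)  # [1, 1, 1, 1, 1]
```

All five strings decode to the integer 1, which is exactly the ambiguity being described.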

It's also not clear to me why you say a 6-char input is invalid.

jameshart | 1 year ago

In your scheme you can't tell the difference between the single byte binary input 0x01, and the four byte binary input 0x00,0x00,0x00,0x01.

Those are the same if you're treating the binary data as a stream of 32 bit numbers, but not if it's a stream of an arbitrary number of octets.

Your parent is suggesting that if after chunking the input into 5s, your last chunk is "10" you would treat that as 0x01, "100" as 0x00,0x01, "1000" as 0x00,0x00,0x01 and only "10000" as 0x00,0x00,0x00,0x01. That's not four encodings of the same value at all.

Treating "1" (or any single leftover character) as invalid in such a scheme makes sense because a single character can only encode 85 values, from 0x00 to 0x54, which isn't enough to represent even one full byte (256 values).
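That interpretation of trailing chunks can be sketched as follows. Again the digit alphabet is a hypothetical stand-in for a real 85-character one, and the length check is what rules out a single leftover character:

```python
DIGITS = "0123456789"  # illustrative stand-in for a real 85-character alphabet

def decode_chunk(chars: str) -> bytes:
    """Decode a trailing chunk of k+1 characters into k bytes (little-endian digits)."""
    n_bytes = len(chars) - 1
    if not 1 <= n_bytes <= 4:
        raise ValueError("a trailing chunk must be 2 to 5 characters")
    value = sum(DIGITS.index(c) * 85**i for i, c in enumerate(chars))
    if value >= 256**n_bytes:
        raise ValueError("chunk value out of range for its length")
    return value.to_bytes(n_bytes, "big")
```

Under this sketch, decode_chunk("10") gives 0x01, decode_chunk("100") gives 0x00,0x01, decode_chunk("1000") gives 0x00,0x00,0x01, and only decode_chunk("10000") gives 0x00,0x00,0x00,0x01, while decode_chunk("1") raises.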