top | item 32952084

creativemonkeys | 3 years ago

I'm sure someone will come along and explain why I have no idea what I'm talking about, but so far my understanding is that those names exist because of differences in CPU word size. Typically "int" represents the natural word size for the CPU, which matches the register size as well, so 'int plus int' is as fast as addition can run by default, on a variety of CPUs. That's one reason chars and shorts are promoted to ints automatically in C.

Let's say you want to work with numbers and you want your program to run as fast as possible. If you specify the number of bits you want, like i32, then on a 64-bit CPU, where the register holding this value has an extra 32 bits available, the compiler must make sure those extra bits aren't garbage that could influence a subsequent operation (like a signed right shift). So it might be forced to insert an instruction to clear or sign-extend the upper 32 bits, and you end up with 2 instructions for a single operation, meaning that your code now runs slower on that machine.

However, had you used 'int' in your code, the implementation is free to represent those values with whatever width is natural for the target: a 64-bit data type on 64-bit machines, a 32-bit data type on 32-bit machines, and your code would run optimally regardless of the CPU. This of course means it's up to you to make sure that whatever values your program handles fit in a 32-bit data type, and sometimes that's difficult to guarantee.

If you decide to have your cake and eat it too by saying "fine, I'll just select i32 or i64 at compile time with a condition" and you add some aliases, like "word" -> either i32 or i64, "half word" -> either i16 or i32, etc. depending on the target CPU, then congrats, you've just reinvented 'int', 'short', 'long', et al.

Personally, I find it useful to use fixed integer sizes (e.g. int32_t) when writing and reading binary files, so I know exactly how many bytes of data to read when loading the file. But once those values are read, I cast them to (int) so that the rest of the program can use the values optimally regardless of the CPU the program is running on.


nicoburns | 3 years ago

That explains "int", but it doesn't explain short or long or long long. Rust has "usize" for the "int" case, and then fixed sizes for everything else, which works much better. If you want portable software, it's usually more important to know how many bits you have available for your calculation than it is to know how efficiently that calculation will happen.

creativemonkeys | 3 years ago

I suppose short and long have to do with registers being addressable as half-word and double-word, and there are instructions that work on smaller data sizes on both x86 and ARM, but I agree that in today's world you want to know the number of bits. On those weak 4 MHz machines, squeezing out a few extra cycles was often very important.