I never deal with such low level issues, so I don't have to read this, but... reading these posts by antirez is such a joy. He makes this topic so clear and understandable, he doesn't assume much, he doesn't use overly complex explanations, he just "says it like it is" :-)
I fondly remember unaligned access faults "back in the day" with FreeBSD/alpha. We implemented a fixup for applications, but not for the kernel. I seem to recall that even though x86 could deal with unaligned accesses, it caused minor performance problems, so fixing alignment issues on alpha would benefit x86 as well.
Most (definitely not all) of the mis-alignment problems were in the network stack, and were centered around the fact that ethernet headers are 14 bytes, while nearly all other protocols had headers that were a multiple of at least 4 bytes.
I've said it before, and I'll say it again: If I had a time machine, I would not kill Hitler. I'd go back to the 70s and make the ethernet header be 16 bytes long, rather than 14.
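To make the 14-versus-16 point concrete, here is a minimal sketch of the arithmetic, assuming the receive buffer itself starts on a 4-byte boundary. The names `next_header_offset` and `is_aligned` are made up for illustration. The 2-byte pad is the same idea as the `NET_IP_ALIGN` padding used by Linux drivers, not a description of what FreeBSD/alpha did.

```c
#include <assert.h>
#include <stddef.h>

/* Ethernet header: 6 bytes dst MAC + 6 bytes src MAC + 2 bytes EtherType. */
enum { ETH_HEADER_LEN = 14 };

/* Offset of the next protocol header (e.g. IP) when the Ethernet frame
 * starts `pad` bytes into a buffer that is itself 4-byte aligned. */
static size_t next_header_offset(size_t pad)
{
    return pad + ETH_HEADER_LEN;
}

/* Nonzero if `offset` is a multiple of `alignment`. */
static int is_aligned(size_t offset, size_t alignment)
{
    assert(alignment != 0);
    return offset % alignment == 0;
}
```

With no pad the next header sits at offset 14, which is not a multiple of 4, so every 32-bit field in it is misaligned. With a 2-byte pad (or a hypothetical 16-byte Ethernet header) it sits at offset 16 and the problem goes away.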
There is a funny mode on ARM processors (turned on in some images, by default) which causes unaligned reads to silently return bogus data (just increasing a kernel counter).
PowerPC, and really, most non-x86 architectures, do this one way or another.
PowerPC (and POWER) has reasonable hardware support for unaligned memory access, at least for 32-bit data, and if the data is in the data cache. Depending on the processor, the exceptions that reach the OS can be more or less frequent.
ARM v6-A and later (except for some microcontrollers, like Cortex M0/R0, that don't support hardware unaligned access at all and trigger an exception) is similar to the Intel x86 case, where there is hardware support for transparent unaligned memory access (except for SIMD, where x86 can also raise exceptions when a load/store opcode that requires alignment is given an unaligned address).
For software that makes intensive use of non-aligned data access, e.g. data compression algorithms doing string search, PowerPC, ARM v6-A (and later ARM Application processors), new MIPS with HW support for unaligned memory access, and Intel are pretty much the same (i.e. b = *(uint32_t *)(a + 23) will take 1-2 cycles, without needing a memcpy(&b, a + 23, sizeof(uint32_t))).
For SIMD, though, there is no transparent fix, although there are specific opcodes for aligned and unaligned memory access (e.g. load/store, unaligned load/store).
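For reference, a minimal sketch of the portable form of the scalar case above. The helper names `load_u32` and `store_u32` are made up for illustration. The cast version is undefined behavior in standard C when the address is misaligned, even on hardware that handles it fine. Compilers typically turn the fixed-size memcpy into a single load or store on targets with hardware unaligned access, so the cost is usually the same.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value (host byte order) from any byte address. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* no alignment assumption on p */
    return v;
}

/* Write a 32-bit value (host byte order) to any byte address. */
static void store_u32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

Usage would look like `uint32_t b = load_u32(a + 23);` in place of the cast.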
I'm probably the only weirdo that thinks this, but if you support byte-addressing you'd better as well be happy with byte-alignment. Atomics being the only place where it's reasonable to be different.
Which brings me to padding. I wonder what percentage of memory of the average 64-bit user's system is padding? I'm afraid of the answer. The heroes of yesteryear could've coded miracles in the ignored spaces in our data.
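A small sketch of where that padding comes from, using two made-up structs with the same three members in a different order. On a typical x86-64 ABI the first is 24 bytes and the second 16, but the exact sizes are implementation-defined, so the code only checks the relationship between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same members, different order. */
struct loose { char tag; uint64_t value; char flag; };
struct tight { uint64_t value; char tag; char flag; };

/* Bytes of actual member data in either struct. */
#define PAYLOAD_BYTES (sizeof(uint64_t) + 2 * sizeof(char))

/* Bytes in each struct that hold no member data. */
static size_t padding_loose(void) { return sizeof(struct loose) - PAYLOAD_BYTES; }
static size_t padding_tight(void) { return sizeof(struct tight) - PAYLOAD_BYTES; }

/* Putting the widest member first should never make the struct larger here. */
static_assert(sizeof(struct tight) <= sizeof(struct loose),
              "reordered struct should not be larger");
```

Sorting members from widest to narrowest is the usual way to claw some of that space back without giving up alignment.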
> if you support byte-addressing you'd better as well be happy with byte-alignment
All ARM processors do this. The concept is called "natural alignment" and it's pretty common on non-x86. See e.g. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.... . The problem here is that a lot of code written for x86 wants more than that, e.g. byte addressing for non-byte-wide values.
Alignment requirements are and have historically been very common -- you can see them on the PDP-11, the 680x0, and so on. It's only because a few very popular architectures like x86 have had very loose or no alignment requirements that we've ended up with a lot of code that assumes there is no alignment requirement, and this has dragged other architectures down the "we need to support this" path. If your architecture faults on misaligned accesses it's really not hard to deal with -- you have to be doing something a bit odd to even run into the problem usually.
> Redis is adding a “Stream” data type that is specifically suited for streams of data and time series storage, at this point the specification is near complete and work to implement it will start in the next weeks.
This sounds like it could be really exciting. Is there anywhere I can find out more?
Specifically, I've been struggling to find an appropriate backend for HTTP Server-Sent Events. Could this feature help with that?
I'm pretty sure I saw implementations that used the existing publish subscribe mechanism in Redis to handle it and seemed happy with it. I have no personal experience with it though.
Recently I've been doing a lot of low-level work with ARMv7-M microcontrollers (specifically, NXP's Kinetis Cortex-M4 chips) and was quite pleased to find out that they are pretty lenient about unaligned accesses. To quote from the ARM Cortex-M4 Processor Technical Reference Manual:
"Unaligned word or halfword loads or stores add penalty cycles. A byte aligned halfword load or store adds one extra cycle to perform the operation as two bytes. A halfword aligned word load or store adds one extra cycle to perform the operation as two halfwords. A byte-aligned word load or store adds two extra cycles to perform the operation as a byte, a halfword, and a byte. These numbers increase if the memory stalls."
However, multi-word memory instructions (LDRD, STRD, LDM, STM, etc.) always require their arguments to be word-aligned.
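The quoted rules are simple enough to write down as a function. This is only a sketch of the TRM passage as quoted, with a made-up function name. It covers single byte, halfword and word accesses, ignores memory stalls, and treats byte accesses as never incurring a penalty (an assumption, since the quote only talks about halfwords and words).

```c
#include <assert.h>
#include <stdint.h>

/* Extra cycles for one load/store of `size` bytes (1, 2 or 4) at `addr`,
 * following the quoted Cortex-M4 TRM passage. Memory stalls are ignored. */
static unsigned unaligned_penalty_cycles(uint32_t addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);

    if (size == 1)
        return 0;                    /* a byte access cannot be unaligned */

    if (size == 2)
        return (addr & 1u) ? 1 : 0;  /* odd address: done as two bytes */

    /* size == 4 */
    if ((addr & 3u) == 0)
        return 0;                    /* word aligned */
    if ((addr & 3u) == 2)
        return 1;                    /* halfword aligned: two halfwords */
    return 2;                        /* odd address: byte, halfword, byte */
}
```

LDRD, STRD, LDM and STM are deliberately left out, since those need word-aligned addresses rather than paying a penalty.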
In a future project I might be interested in using Redis for queuing jobs, so it's very handy to know early on the main issues I could run into when developing.
Sort of, Rust is supposed to make references to packed structure members unsafe, but currently doesn't. An RFC was accepted to change the behavior but it has not been fully implemented. Here's the tracking issue: https://github.com/rust-lang/rust/issues/27060
Considering dereferencing a pointer after doing some arithmetic on it can only be done within unsafe blocks, I would say you are at least warned about it. But it will happily compile.
I wonder what kind of performance overhead letting the kernel handle unaligned accesses causes, compared with fixing the software to always use aligned accesses?
[+] [-] drej|8 years ago|reply
Thanks!
[+] [-] hellwd|8 years ago|reply
[+] [-] drewg123|8 years ago|reply
[+] [-] IgorPartola|8 years ago|reply
[+] [-] blattimwind|8 years ago|reply
[+] [-] faragon|8 years ago|reply
[+] [-] throwaway000002|8 years ago|reply
[+] [-] wzdd|8 years ago|reply
[+] [-] pm215|8 years ago|reply
[+] [-] MrBuddyCasino|8 years ago|reply
[+] [-] luhn|8 years ago|reply
[+] [-] antirez|8 years ago|reply
[+] [-] yeswecatan|8 years ago|reply
https://www.reddit.com/r/redis/comments/4mmrgr/stream_data_s...
[+] [-] johnny22|8 years ago|reply
[+] [-] msarnoff|8 years ago|reply
[+] [-] type0|8 years ago|reply
[+] [-] JefeChulo|8 years ago|reply
[+] [-] amelius|8 years ago|reply
[+] [-] bbatha|8 years ago|reply
[+] [-] wofo|8 years ago|reply
[+] [-] dis-sys|8 years ago|reply
[+] [-] crncosta|8 years ago|reply
[+] [-] k__|8 years ago|reply
[+] [-] make3|8 years ago|reply
[+] [-] retox|8 years ago|reply
Turn on "show dead comments" and see how many greens are deleted. I screenshot many examples.