damon_dam|1 year ago
An alternative that might be worth looking into is hashing the FS/GS base (the per-thread pointer) into a table index. It will be slower than the well-optimized TLS case, but it lets you opt out of the TLS allocation process altogether. That can be a good thing for a low-level facility like a function tracer.
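A rough sketch of what that hashing could look like, assuming Linux and GCC or Clang. All the names here (get_thread_base, Slot, g_table, slot_for_current_thread) are made up for illustration, and the table size and hash constant are arbitrary choices rather than anything from the post:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical helper: read the per-thread base pointer.
// On x86-64 Linux, %fs:0 holds the FS base itself (the TCB self pointer).
static inline uintptr_t get_thread_base() {
#if defined(__x86_64__) && defined(__linux__)
    uintptr_t base;
    asm volatile("mov %%fs:0, %0" : "=r"(base));
    return base;
#else
    // Assumption: the compiler provides this builtin on the target.
    return reinterpret_cast<uintptr_t>(__builtin_thread_pointer());
#endif
}

// One slot of per-thread data, claimed by storing the owner's thread base.
struct Slot {
    std::atomic<uintptr_t> owner{0};  // 0 means "free"
    uint64_t counter = 0;             // example per-thread payload
};

constexpr unsigned kBits = 8;
constexpr size_t kSlots = size_t{1} << kBits;
static Slot g_table[kSlots];

// Multiplicative hash of the thread base into [0, kSlots).
static inline size_t hash_base(uintptr_t base) {
    return static_cast<size_t>(
        (static_cast<uint64_t>(base) * 0x9E3779B97F4A7C15ull) >> (64 - kBits));
}

// Find (or claim) the calling thread's slot with linear probing.
// Returns nullptr if the table is full. Slots are never released, so a
// later thread that reuses an exited thread's base inherits its slot.
Slot* slot_for_current_thread() {
    const uintptr_t base = get_thread_base();
    assert(base != 0);  // 0 is reserved as the "free" marker
    size_t i = hash_base(base);
    for (size_t probe = 0; probe < kSlots; ++probe, i = (i + 1) & (kSlots - 1)) {
        uintptr_t cur = g_table[i].owner.load(std::memory_order_acquire);
        if (cur == base) return &g_table[i];
        if (cur == 0) {
            uintptr_t expected = 0;
            if (g_table[i].owner.compare_exchange_strong(
                    expected, base, std::memory_order_acq_rel)) {
                return &g_table[i];
            }
            // Another thread claimed this slot first; keep probing.
        }
    }
    return nullptr;
}

// Small check: two live threads end up with different slots.
bool two_threads_get_distinct_slots() {
    Slot* mine = slot_for_current_thread();
    Slot* theirs = nullptr;
    std::thread t([&] { theirs = slot_for_current_thread(); });
    t.join();
    return mine && theirs && mine != theirs && mine == slot_for_current_thread();
}
```

The lookup costs a multiply plus at least one atomic load, which is where the "slower than the well-optimized case" comes from. In exchange nothing here goes through the TLS allocation machinery.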
yosefk|1 year ago
Note that my question is about shared libraries. If the thread_local is linked into an executable, I guess you could indeed save the offset somewhere and then add the value of %fs to it, though if this is a way to work around the constructor issue, I'd prefer to simply not have a constructor. The question is whether this sort of direction can help for thread-local storage allocated by a shared library.
damon_dam|1 year ago
Allocate the variable normally, then compute the offset in one thread (e.g. offset = uintptr_t(&variable) - get_fs()), then access it by adding the offset to FS in any thread (e.g. (vartype *) (offset + get_fs())). The only difference from how it normally works is that you can manually force it to be inlined, sidestepping the codegen problems you described in your post. But if you can avoid those problems by not using constructors instead, that's definitely better.
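Spelled out as a sketch, assuming Linux, GCC or Clang, and a variable that lives in static TLS (e.g. linked into the executable). The names get_thread_base (standing in for get_fs()), tls_counter, g_offset and the other helpers are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for get_fs(): read the per-thread base pointer.
// On x86-64 Linux, %fs:0 holds the FS base itself (the TCB self pointer).
static inline uintptr_t get_thread_base() {
#if defined(__x86_64__) && defined(__linux__)
    uintptr_t base;
    asm volatile("mov %%fs:0, %0" : "=r"(base));
    return base;
#else
    // Assumption: the compiler provides this builtin on the target.
    return reinterpret_cast<uintptr_t>(__builtin_thread_pointer());
#endif
}

// Allocate the variable normally.
thread_local int tls_counter = 0;

// Offset of tls_counter from the thread base, computed once in one thread.
static uintptr_t g_offset = 0;

void init_offset() {
    g_offset = reinterpret_cast<uintptr_t>(&tls_counter) - get_thread_base();
}

// Access from any thread: base + offset. Unsigned wraparound makes this
// work even when the variable sits below the base, as on x86-64.
static inline int* counter_via_offset() {
    return reinterpret_cast<int*>(get_thread_base() + g_offset);
}

// Small check: the offset computed in this thread also locates the
// variable's instance in a freshly started thread.
bool offset_matches_in_other_thread() {
    init_offset();
    assert(counter_via_offset() == &tls_counter);
    bool ok = false;
    std::thread t([&] {
        *counter_via_offset() = 7;  // write through the manual address
        ok = (counter_via_offset() == &tls_counter) && (tls_counter == 7);
    });
    t.join();
    // The write in the other thread must not have touched this thread's copy.
    return ok && tls_counter == 0;
}
```

This relies on the offset being identical in every thread, which holds for static TLS. For a dlopen'ed library using dynamic TLS the block may be allocated separately per thread, so a constant offset should not be assumed there.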
I used "FS/GS" because GS is used instead of FS on some systems for the same purpose.
The shared-library-specific issues are one of the reasons I suggested looking into hashing, e.g. as a fallback when the TLS approach fails.
LegionMammal978|1 year ago
[0] https://godbolt.org/z/o6se3je8v