top | item 43079698

damon_dam | 1 year ago

To the author: If you have a specific situation that the codegen is failing to optimize well (e.g. the ctor cases that you ran into), you can store the variable's offset in a non-TLS global, then manually add the FS/GS base to it. Use inline asm if you need to bypass any init checks.

An alternative that might be worth looking into is just hashing the FS/GS base into a table index. It will be slower than the well-optimized case, but it will let you opt out of the TLS allocation process altogether. This might be a good thing in some cases for a low-level facility like a function tracer.
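A minimal sketch of what such a hashed lookup could look like, assuming x86-64 Linux (where the word at %fs:0 holds the FS base itself) and a fixed-size table larger than the maximum number of threads. The names get_fs, Slot, g_table, slot_for_this_thread and other_thread_slot are hypothetical, not taken from the post. Slots are never released here, so a new thread that reuses an exited thread's FS base would inherit its slot.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical helper: returns the thread pointer (the FS base on x86-64 Linux).
static inline std::uintptr_t get_fs() {
#if defined(__x86_64__) && defined(__linux__)
    std::uintptr_t fs;
    asm("movq %%fs:0, %0" : "=r"(fs));  // the TCB's first word points to itself
    return fs;
#else
    return reinterpret_cast<std::uintptr_t>(__builtin_thread_pointer());
#endif
}

struct Slot {
    std::atomic<std::uintptr_t> key{0};  // 0 means "unclaimed"
    long value = 0;                      // per-thread payload (illustrative)
};

constexpr unsigned kLog2Slots = 10;
constexpr std::size_t kSlots = std::size_t{1} << kLog2Slots;  // assumed > max threads
static Slot g_table[kSlots];

// Fibonacci hashing: multiply, then keep the top kLog2Slots bits.
static inline std::size_t hash_tp(std::uintptr_t tp) {
    const std::uint64_t h = (static_cast<std::uint64_t>(tp) >> 4) * 0x9E3779B97F4A7C15ull;
    return static_cast<std::size_t>(h >> (64 - kLog2Slots));
}

// Finds (or claims) the calling thread's slot by linear probing.
// Returns nullptr if the table is full.
long* slot_for_this_thread() {
    const std::uintptr_t tp = get_fs();
    const std::size_t start = hash_tp(tp);
    for (std::size_t n = 0; n < kSlots; ++n) {
        Slot& s = g_table[(start + n) & (kSlots - 1)];
        std::uintptr_t k = s.key.load(std::memory_order_acquire);
        if (k == 0 &&
            s.key.compare_exchange_strong(k, tp, std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
            return &s.value;  // claimed a fresh slot
        }
        if (k == tp) return &s.value;  // already ours
        // Otherwise the slot belongs to another thread; keep probing.
    }
    return nullptr;
}

// Demo helper: returns the slot a freshly spawned thread ends up with.
long* other_thread_slot() {
    long* p = nullptr;
    std::thread t([&] { p = slot_for_this_thread(); });
    t.join();
    return p;
}
```

No thread_local variable is involved, so nothing here depends on TLS allocation or on `__tls_get_addr()`. The cost is a hash and at least one probe per access.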

yosefk|1 year ago

I have updated the post with all the suggestions people replied with except this one, because I don't understand it. How do I allocate memory for addressing it with FS/GS? Isn't FS the register pointing to the TLS area - then how is the FS-based access you propose different from how TLS normally works? Isn't GS already used for something else on x86? If you could elaborate on this or show some code, I would be very grateful!

Note that my question is about shared libraries. If the thread_local is linked into an executable, I guess you could indeed save the offset somewhere and then add the value of %fs to it, though if this is a way to work around the constructor issue, I prefer to not have a constructor. The question is whether this sort of approach can help for thread-local storage allocated by a shared library.

damon_dam|1 year ago

> How do I allocate memory for addressing it with FS/GS?

Allocate the variable normally, then compute the offset in one thread (e.g. offset = uintptr_t(&variable) - get_fs()), then access it by adding the offset to FS in any thread (e.g. (vartype *) (offset + get_fs())). The only difference from how it normally works is that you can manually force it to be inlined, sidestepping the codegen problems you described in your post. But if you can avoid those problems by not using constructors instead, that's definitely better.
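A minimal sketch of the offset trick just described, assuming x86-64 Linux and a variable that lives in the static TLS block (as it does when linked into the executable), so that its distance from the FS base is the same in every thread. get_fs, counter, init_offset and counter_ptr are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical helper: returns the thread pointer (the FS base on x86-64 Linux).
static inline std::uintptr_t get_fs() {
#if defined(__x86_64__) && defined(__linux__)
    std::uintptr_t fs;
    asm("movq %%fs:0, %0" : "=r"(fs));  // the TCB's first word points to itself
    return fs;
#else
    return reinterpret_cast<std::uintptr_t>(__builtin_thread_pointer());
#endif
}

thread_local int counter = 0;          // the variable, allocated normally
static std::uintptr_t counter_offset;  // ordinary non-TLS global

// Compute the offset once, in any one thread. Unsigned wraparound makes
// this work even though the variable sits below the FS base on x86-64.
void init_offset() {
    counter_offset = reinterpret_cast<std::uintptr_t>(&counter) - get_fs();
}

// Access from any thread: FS base plus the stored offset.
static inline int* counter_ptr() {
    return reinterpret_cast<int*>(get_fs() + counter_offset);
}

// Demo helper: checks that the offset computed elsewhere is valid in a new thread.
bool offset_matches_in_new_thread() {
    bool ok = false;
    std::thread t([&] { ok = (counter_ptr() == &counter); });
    t.join();
    return ok;
}
```

Since counter_ptr() is just a load and an add, it can always be inlined, whatever code the compiler would generate for a plain access to the thread_local.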

I used "FS/GS" because GS is used instead of FS on some systems for the same purpose (e.g. 32-bit x86 Linux uses GS for TLS, while x86-64 Linux uses FS).

The shared library-specific issues are one of the reasons I was suggesting maybe looking into hashing, e.g. perhaps as a fallback solution when the TLS approach fails.

LegionMammal978|1 year ago

I think GP is talking about something like this [0]. You let it call __tls_get_addr() once in a constructor, take the offset from %fs, store it in a static variable, and use that offset directly. (The static variable doesn't need to be atomic, since it's only written to once, at dlopen() time.)
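The linked snippet is not reproduced here. The following is only a rough sketch of the idea as described in this comment, assuming GCC or Clang on x86-64 Linux. TraceState, g_state, cache_state_offset and state_fast are hypothetical names. The sketch also assumes the variable ends up in static TLS (requested here with the initial-exec model), because a TLS block that is allocated dynamically per thread would not sit at the same offset from %fs in every thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the thread pointer (the FS base on x86-64 Linux).
static inline std::uintptr_t get_fs() {
#if defined(__x86_64__) && defined(__linux__)
    std::uintptr_t fs;
    asm("movq %%fs:0, %0" : "=r"(fs));  // the TCB's first word points to itself
    return fs;
#else
    return reinterpret_cast<std::uintptr_t>(__builtin_thread_pointer());
#endif
}

struct TraceState {        // illustrative per-thread state
    std::uint64_t depth;
};

// Assumption: initial-exec keeps the variable at a fixed offset from FS.
__attribute__((tls_model("initial-exec")))
thread_local TraceState g_state;

// Plain static, written once before any other code uses it.
static std::uintptr_t g_state_offset;

// Runs at load time (program start, or dlopen() for a shared library).
__attribute__((constructor))
void cache_state_offset() {
    g_state_offset = reinterpret_cast<std::uintptr_t>(&g_state) - get_fs();
}

// Fast path: no __tls_get_addr() call, just FS base plus the cached offset.
static inline TraceState* state_fast() {
    return reinterpret_cast<TraceState*>(get_fs() + g_state_offset);
}

// Demo helper: the fast path and the normal access agree in this thread.
bool fast_path_matches() {
    return state_fast() == &g_state;
}
```

Here the constructor is a load-time function of the library, not a C++ constructor of the thread_local object, so g_state itself stays trivially initialized.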

[0] https://godbolt.org/z/o6se3je8v