top | item 46532492

(no title)

cronos | 1 month ago

I'm one of the Tailscale engineers who built node state encryption initially (@awly on Github), and who made the call to turn it off by default in 1.92.5.

Another comment in this thread guessed right - this feature is too support intensive. Our original thinking was that a TPM being reset or replaced is always sign of tampering and should result in the client refusing to start or connect. But turns out there are many situations where TPMs are not reliable for non-malicious reasons. Some examples: * https://github.com/tailscale/tailscale/issues/17654 * https://github.com/tailscale/tailscale/issues/18288 * https://github.com/tailscale/tailscale/issues/18302 * plus a number of support tickets

TPMs are a great tool for organizations that have good control of their devices. But the very heterogeneous fleet of devices that Tailscale users have is very difficult to support out of the box. So for now we leave it to security-conscious users and admins to enable, while avoiding unexpected breakage for the broader user base.

We should've provided more of this context in the changelog, apologies!

discuss

order

snailmailman|1 month ago

Those issues are a surprising read. I would expect issues with TPM on old or niche devices, but not Dell XPS laptops, or a variety of VMs. But I guess I'm not entirely sure how my vms handle TPM state, or if they even can.

I'm running nearly all of my personal tailscale instances in containers and VMs. Looking now at the dashboard, it appears this feature really only encrypted things on my primary linux and windows pc, my iphone, and my main linux server's host. None of the VMs+containers i use were able to take advantage of this, nor was my laptop. Although my laptop might be too old.

9x39|1 month ago

Stuff breaks all the time, you just need a bigger sample size.

Overseeing IT admins for corp fleets is part of my gig, and from my experience, we get malfunctioning TPMs on anything consumer - Lenovo, Dell, HP, whatever. I think the incidence is some fraction of a percent, but get a few thousand devices and the chance of eventually experiencing it is high, very high. I can't imagine a vTPM being perfect either, since there isn't a hypervisor out there someone hasn't screwed up a VM on.

slyn|1 month ago

Just had a system board replaced on a device in my org, Dell laptop.

As part of setting up a device in our org we enroll our device in Intune (Microsoft's cloud-based device management tool aka UEM / RMM / MDM / etc). To enroll your device you take a "hardware hash" which's basically TPM attestation and some additional spices and upload it to their admin portal.

After the system board replacement we got errors that the device is in another orgs tenant. This is not unusual (you open a ticket with MS and they typically fix it for you), and really isn't to blame on Dell per se. Why ewaste equipment you can refurbish?

Just adding 5c to the anecdata out there re: TPM as an imperfect solution.

evanjrowley|1 month ago

My eyes have opened up to the pitfalls of TPM recently while upgrading CPUs and BIOS/UEFI versions on various hardware in my home.

VMs typically do not use TPMs, so it is not surprising that the feature was not being used there. One common exception is VMware, which can provide the host's TPM to the VM for a better Windows 11 experience. One caveat is this doesn't work on most Ryzen systems because they implement a CPU-based fTPM that VMware does not accept.

zozbot234|1 month ago

It is in fact surprising that TPMs can be wiped so easily. It makes them almost useless compared to dedicated solutions like physical FIDO keys or smartcards, and does not bode well for hardware-backed Passkeys that would also be inherently reliant on TPM storage.

Macha|1 month ago

I had a Ryzen 3900x on a gigabyte motherboard and the fTPM was just totally unreliable for a pretty mainstream combination. Not fully sure which was to blame there.

At least it was fixed in the 5900x (and _different_ gigabyte motherboard, but from the same lineup) that replaced it.

braiamp|1 month ago

As some kernel developers have said: motherboard manufacturers are really bad making sure stuffs works.

behringer|1 month ago

I'm not sure what makes any of this "surprising". Each ticket reads like "we replaced the computer that tailscale was on, it doesn't work anymore" pikachu face.

Yeah, that was a feature and the exact reason why we use TPMs. I guess it should have been better advertised.

justincormack|1 month ago

VMs don't have TPMs as they are hw devices, although you can run a software TPM (potentially backed by the host TPM) and pass it to them, which you might want to do for this use case.

sydbarrett74|1 month ago

That's an eminently reasonable and logical policy. Thanks for the context.

traceroute66|1 month ago

@cronos

Question:

You link to https://github.com/tailscale/tailscale/issues/17654 where a user states[1]:

"Previous workaround from some comments (TS_ENCRYPT_STATE=false, FLAGS="--encrypt-state=false") didn't help on this problematic Debian 13 host"

And the same user states "I confirm this issue is NOT found anymore with tailscale version 1.92.1".

Could you provide a little extra context to clarify those types of comments which seem to suggest it wasn't state encryption after all ?

[1] https://github.com/tailscale/tailscale/issues/17654#issuecom...

cronos|1 month ago

There are two new-ish features in Tailscale that use TPMs: node state encryption (https://tailscale.com/kb/1596/secure-node-state-storage) and hardware attestation keys.

Hardware key attestation is a yet-unfinished feature that we're building. The idea is to generate a signing key inside of the TPM and use it to send signatures to our control plane and other nodes, proving that it's the same node still. (The difference from node state encryption is that an attacker can still steal the node credentials from memory while they are decrypted at runtime).

We started by always generating hardware attestation keys on first start or loading them from the TPM if they were already generated (which seemed safe enough to do by default). That loading part was causing startup failures in some cases.

To be honest, I didn't get to the bottom of all the reports in that github issue, but this is likely why for some users setting `--encrypt-state=false` didn't help.

dietr1ch|1 month ago

I too thought that the TPM was something to be trusted with a secret until a BIOS upgrade just wiped mine. I'm not relying on TPM again.

johncolanduoni|1 month ago

It was designed mostly for mechanisms where in the event of certain changes (BIOS upgrades, certain other firmware changes, some OS changes) there is a fallback mechanism to unlock the system and reset the key. This is why Windows BitLocker is so insistent about you saving your key somewhere else - if you do a BIOS update and it can’t decrypt, it’ll require your copy of the key and then reset the TPM-encrypted copy with the new BIOS accounted for.

A TPM’s primary function works by hashing things during the boot process, and then telling the TPM to only allow a certain operation if hashes X & Z don’t change. Depending on how the OS/software uses it, a whole host of things that go into that hash can change: BIOS updates being a common one. A hostile BIOS update can compromise the boot process, so some systems will not permit automatic decryption of the boot drive (or similar things) until the user can confirm that they have the key.

nathanlied|1 month ago

Thank you for your openness here - and yes, it would be nice to see this kind of reasoning in the changelog, even if it's tucked a little out of the way! Those of us who care will read it.

Also very welcome is to separate it into a small blogpost providing details, if the situation warrants a longer, more detailed format.

dist-epoch|1 month ago

Your suspicion is correct. I have an AMD AM5 motherboard, and everytime I update it's BIOS it warns me that the fTPM will be reset, and I know it does so because afterwards Bitlocker prompts me to introduce the recovery key since it can't unlock the drive anymore.

AceJohnny2|1 month ago

Thanks! In your change https://github.com/tailscale/tailscale/pull/18336 you mention:

> There's also tailscaled-on-macOS, but it won't have a TPM or Keychain bindings anyway.

Do you mean that on macOS, tailscaled does not and has never leveraged equivalent hardware-attestation functionality from the SEP? (Assuming such functionality is available)

cronos|1 month ago

On macOS we have 3 ways to run Tailscale: https://tailscale.com/kb/1065/macos-variants Two of them have a GUI component and use the Keychain to store their state.

The third one is just the open-source tailscaled binary that you have to compile yourself, and it doesn't talk to the Keychain. It stores a plaintext file on disk like the Linux variant without state encryption. Unlike the GUI variants, this one is not a Swift program that can easily talk to the Keychain API.

pja|1 month ago

A BIOS update to my PC reset the TPM only this week. I did get a warning that Bitlocker keys would be wiped as a result before acting at least.

(I believe this was because it was fixing an AMD TPM exploit - presumably updating the TPM code wipes the TPM storage either deliberately or as an inevitable side effect.)

plagiarist|1 month ago

TPMs are basically storing the hashes of various pieces of software, then deterministically generating a key from those. Since the BIOS software changed, that hash changed, and the key it generates is completely new.

If someone had messed with your BIOS maliciously, that's desirable. Unfortunately you messing with your BIOS intentionally also makes the original key pretty much unrecoverable.

lloeki|1 month ago

Coincidentally this was a feature unknown to me until I performed a SSD migration from one server to another and Tailscale failed to connect because ("of course!" in hindsight) it failed to decrypt whatever.

So not a TPM failure but certainly a gotcha! moment; luckily I had a fallback method to connect to the machine, otherwise in the particular situation I was in I would have been very sorry.

The "whoever needs this will enable it" + support angle makes total sense.

Thaxll|1 month ago

Did you rely on the Google go tpm lib for that?

cronos|1 month ago

Yes, we use github.com/google/go-tpm/tpm2

miki123211|1 month ago

So this is only disabled on platforms that use a TPM, e.g. Linux and Windows? What about Mac OS?

cronos|1 month ago

The macOS client uses the keychain by default, that's not changed here .

zdware|1 month ago

i just started using tailscale and responses like this make me believe in the product. awesome!

keepamovin|1 month ago

Does this mean TS is not FIPS 140-3 now?

tatersolid|1 month ago

It never was FIPS-approved and likely will never be. The wireguard protocol used by Tailscale uses ChaCha20 for encryption which is not FIPS approved.

jkaplowitz|1 month ago

Thank you for explaining that context!