tomstokes | 11 months ago

Two important features I insist on for products I develop:

1. Staged rollout of firmware updates. It’s common practice for apps and software but for some reason it’s less common with firmware. Rolling out to 1% (or less, depending on scale) of devices and waiting a day is cheap insurance. Side note: Build a good relationship with customer service people so you hear about these things immediately.

2. A failsafe firmware reset back to factory state. Some sequence that resets the device completely back to the way it was when it came out of the box, firmware included, as a last resort. In conjunction, your automated tests need to confirm that every factory firmware you’ve ever released can update to the latest firmware.
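A minimal sketch of how the 1% cohort in point 1 could be chosen, assuming each device has a stable ID string and the server decides eligibility. The function names, the bucket count, and the `release` label are illustrative assumptions, not anything described in the comment.

```python
import hashlib


def rollout_bucket(device_id: str, release: str) -> int:
    """Map a device to a stable bucket in [0, 10000) for one release.

    Hashing the release name together with the device ID means a device
    is not stuck in the early cohort for every release (an assumption
    about what is desirable, not a requirement).
    """
    digest = hashlib.sha256(f"{release}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_eligible(device_id: str, release: str, percent: float) -> bool:
    """True if the device falls inside the first `percent` of buckets."""
    return rollout_bucket(device_id, release) < round(percent * 100)
```

Because the bucket is deterministic, raising `percent` from 1 to 5 only adds devices; nobody who already received the update drops out of the cohort.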

EvanAnderson|11 months ago

> A failsafe firmware reset back to factory state.

This doesn't work if your threat model includes denying rollbacks to prevent exploiting bugs in old firmware. I'd love to be able to roll-back firmware on some of my devices to allow me to "jailbreak" them using old firmware.

In some cases your newer firmware may be blowing e-fuses that prevent old firmware from functioning. See the Nintendo Switch, for an example.

To be clear: I think this is anti-consumer and wrong, but manufacturers absolutely do it.

Edit: I also think it should be illegal, by way of consumer regulation. I don't think consumers should have the option to waive their right to manufacturers not damaging hardware they own.

ChuckMcM|11 months ago

This doesn't get enough attention, waaaay too many of these issues are traced back to the vendor trying to "prevent" someone from using their product in a way that they don't like.

xp84|11 months ago

Yup! Depends on what's a higher priority: Preventing catastrophic destruction of the device, OR, "protecting" some IP from ultra-small-scale piracy, even though ultimately anyone bent on piracy will be able to pirate anyway.

Clearly the latter is heavily preferred by most companies.

Szpadel|11 months ago

Even with that "requirement", add a special minimal recovery mode that the bootloader can boot via a special button sequence and that allows some form of flashing signed firmware.

This should be especially trivial when your device has some USB ports.

You can keep all the requirements that only the same or a newer firmware version can be flashed, with all the e-fuse checks.

If you mess up, you can let consumers flash a fix using a regular pendrive.

efitz|11 months ago

Sometimes they do it because it’s contractually required if they want to get access to proprietary standards, for example to allow them to play copy-protected content.

Copyright and patent have morphed into evils that drive anti-consumer and anti-competitive behavior, and have driven a “subscription” model that allows rent seekers to achieve their wildest dreams.

throwawayk7h|11 months ago

This is a good reason for manufacturers not to deny rollbacks, and a good reason not to have e-fuses.

basch|11 months ago

Blow the fuse after it's confirmed working. Or always allow a one-version rollback.

I'm not a fan of firmware lockdowns but I understand other people may value security over moddability.
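A rough model of the two ideas in this comment, combined for illustration: the anti-rollback floor only advances once a version is confirmed working, and it always stays one version behind. Integer version numbers and the `RollbackGuard` name are assumptions; a real device would back the floor with e-fuses or a monotonic counter.

```python
from dataclasses import dataclass


@dataclass
class RollbackGuard:
    """Software stand-in for a monotonic anti-rollback counter."""

    floor: int = 0  # lowest firmware version the bootloader will accept

    def may_boot(self, version: int) -> bool:
        return version >= self.floor

    def confirm_working(self, version: int) -> None:
        """Call only after `version` has proven itself in the field.

        The floor moves to version - 1, so one step back stays possible,
        and it never moves down again.
        """
        self.floor = max(self.floor, version - 1)
```

For example, after `confirm_working(5)` the device still boots version 4 but refuses version 3.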

protocolture|11 months ago

Big part of the UBNT vs Cambium dispute. IIRC UBNT won in court, but just to prevent the Cambium firmware being installed on their hardware the next few firmware versions fixed it so that it can't be easily reverted.

What's worse is that a lot of the affected hardware was near or at EOL anyway, so Cambium was simply helping rescue devices headed for the scrap heap.

water9|11 months ago

Blowing efuses is a destructive action and it should not be legal for a company to destroy parts of your electronic device that you paid for

grumple|11 months ago

I think the correct way to do this is to allow a rollback to the immediately previous working version. Before updating, write current firmware to failsafe data storage, then do the update. Then a firmware reset sends you back to the last good version. I'm pretty sure this is already done by many hardware and software manufacturers, such as me.

nomel|11 months ago

Is that applicable here? We're talking about speakers. For most/low security devices, a firmware rollback, or a firmware-download mode, are fine. In this case, it would probably have prevented millions in losses, with the risk being a...jailbroken speaker?

account42|11 months ago

This practice should simply be illegal or at least make the manufacturer liable for a full refund plus interest. We shouldn't let manufacturers brick devices that we own.

clysm|11 months ago

Yes it does work… with an A/B update system.

Android systems can do this today. After an orderly shutdown of the new software, the device can mark the new image as good and refuse to boot older software.
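A toy state machine for the A/B idea, not Android's actual implementation: new firmware goes to the inactive slot, gets a trial boot, and only a confirmed slot raises the anti-rollback floor. The class names, slot labels, and integer versions are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    version: int
    confirmed: bool = False


@dataclass
class ABDevice:
    active: str = "a"
    slots: dict = field(
        default_factory=lambda: {"a": Slot(1, True), "b": Slot(1, True)}
    )
    min_version: int = 0  # anti-rollback floor, only ever raised

    def inactive(self) -> str:
        return "b" if self.active == "a" else "a"

    def install(self, version: int) -> None:
        """Write firmware to the inactive slot and switch to it on trial."""
        if version < self.min_version:
            raise ValueError("refusing downgrade below anti-rollback floor")
        target = self.inactive()
        self.slots[target] = Slot(version, confirmed=False)
        self.active = target

    def reboot(self) -> None:
        """A slot that was never confirmed loses its turn on the next boot."""
        if not self.slots[self.active].confirmed:
            self.active = self.inactive()

    def mark_good(self) -> None:
        """Confirm the running slot and stop accepting older versions."""
        slot = self.slots[self.active]
        slot.confirmed = True
        self.min_version = max(self.min_version, slot.version)
```

The point of the ordering is that a bad image never reaches `mark_good`, so the previous slot stays bootable until the new one has proven itself.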

0x457|11 months ago

Yes, they do it, but usually in devices where it's basically part of DRM. I don't think engineers put that much thought into the security of soundbars.

croes|11 months ago

But then at least have backup firmware of the one you want to update, so you can go one step back in case of errors.

AlotOfReading|11 months ago

Most companies don't do this because it's not one of their organizational priorities to have reliable updates. The infrastructure is usually custom built and maintained by a couple of folks who have a dozen other responsibilities they're told are more important. Testing is usually limited by hardware availability and release velocity. "One of every board revision we've ever produced" simply isn't available and waiting two days to run through every firmware version before you release updates is a conversational non-starter with the PMs.

There are commercial offerings (like mender.io, never used) that basically specialize in providing rock solid update infrastructure, but that again takes investment and organizational priority that doesn't exist for non-feature code.

boricj|11 months ago

I'm working on embedded systems and I've seen and heard some horror stories just on the device's side. Piles and piles of pre- and post-reboot shell scripts filled with race conditions against the system's services and themselves. When these break, if you're lucky a factory reset is enough to fix the system, if you're unlucky they become field bricks.

I'm trying to buck the trend though and on the new embedded system I'm working on, I've specifically designed the upgrade system to be as reliable as I can make it. It goes something like this:

- The new firmware is downloaded to the secondary application slot.

- Just prior to rebooting, the entire state data of the system is serialized as a document and stored on a flash partition.

- The upgrade flag is set, the system reboots and MCUboot does its thing.

- The new firmware finds out an upgrade happened, clears out all the data partitions, restores from the document and then clears out its partition.

The system is basically sanitized and restored after each upgrade. It's also the same codepath that handles saving and restoring the system's configuration by the end-user as well as settings management. If the document schema is for an older version, run the N-to-N+1 schema upgraders on it prior to applying instead of trying to patch the system in-place. If something goes horribly wrong, flip a jumper to trigger the heavy-duty sanitization that nukes the entire external flash (internal flash only contains the bootloader, primary application slot and factory parameters so it's essentially read-only once the application boots).
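The serialize-then-restore step with N-to-N+1 schema upgraders could look roughly like this. It is only a sketch: JSON as the document format, the field names, and the two upgraders are invented for illustration and are not taken from the project described above.

```python
import json


# Hypothetical upgraders: each lifts a document from schema N to N+1.
def _v1_to_v2(doc: dict) -> dict:
    doc["volume"] = doc.pop("vol", 50)
    return doc


def _v2_to_v3(doc: dict) -> dict:
    doc["network"] = {"hostname": doc.pop("hostname", "device")}
    return doc


UPGRADERS = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_SCHEMA = 3


def save_state(state: dict) -> str:
    """Serialize the whole system state as one versioned document."""
    return json.dumps({**state, "schema": CURRENT_SCHEMA})


def restore_state(document: str) -> dict:
    """Upgrade an older document step by step, then return the state."""
    doc = json.loads(document)
    version = doc.pop("schema")
    if version > CURRENT_SCHEMA:
        raise ValueError("document is newer than this firmware understands")
    while version < CURRENT_SCHEMA:
        doc = UPGRADERS[version](doc)
        version += 1
    return doc
```

Each firmware release then only needs one new upgrader, and the same `restore_state` path serves upgrades, user backups, and settings import.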

It might be hubris, but I hope it's good enough that I'll never see a bricked card that can't be resurrected by a factory reset with this project (assuming no hardware damage, no internal flash corruption and no bricking firmware getting signed with production keys seeping through the cracks despite all the checks in place).

x0x0|11 months ago

Different industry, but I (a long time ago) worked in a place that built scientific instruments.

> "One of every board revision we've ever produced"

The, ah, "special" people we had running engineering didn't even put in the work to let the software query the board rev. We had to play games like running certain motors past a position limit and seeing if there were limit switches there (or not) to guesstimate board revs.

I'm guessing stories like this are common.

ethan_smith|11 months ago

I completely agree with both points and would add a third: design for offline use first (maybe treat every OTA update as if it might be the final version this device ever receives). Products should work perfectly fine without an internet connection, heck that's how they worked until 5-7 years ago. Core features should never depend on cloud services, and updates should be opt-in, not forced.

An offline-first approach respects user autonomy and creates a natural safety net against bad updates. Plus, it means your product keeps working even when servers change or get shut down years later or a nuclear war happens. Sure, connectivity has benefits, but a speaker's main job is playing sound, not phoning home. Building offline-first also forces better engineering decisions about longevity and graceful degradation.

It's so hard to find any offline-first apps/devices nowadays, which is sad to see in a world of algorithms and AI.

This whole situation reminds me of this: https://programmerhumor.io/linux-memes/thats-the-attitude-sa...

the_snooze|11 months ago

But you see, the problem with offline use is the manufacturer can't claw back value in the future. How will you keep shareholders happy if you can't arbitrarily push ads, hobble existing functionality, or impose a new subscription service?

Galxeagle|11 months ago

I get the sense that #2 is viewed as a risk for DRM, given all the work that goes into preventing firmware downgrades to potentially insecure firmware. Specifically thinking of the Nintendo Switch[1] that goes so far as to blow fuses on each firmware upgrade!

[1] https://news.ycombinator.com/item?id=23534793

steveBK123|11 months ago

Sonos completely missed the boat on these two simple concepts as well.

See their new app debacle which coupled a non-reversible firmware update that made the hardware incompatible with the old app.

ymyms|11 months ago

Great points! As an addendum to this, if #2 becomes untenable for whatever reason (such as a vulnerability in the factory firmware image), then this #3 would be good to strive for as well:

3. have a set of conditions to mark the running firmware image as "safe" and have it become the new fallback firmware image for this scenario. That way you can have a recently up-to-date firmware version constantly trailing the new ones

Zenbit_UX|11 months ago

IMO this is a terrible idea for many reasons but the most important of which is: As a consumer I should have the right to have my device revert any b.s. update and get my setup to how it was the day I bought it.

So many companies have begun rolling out updates that make the device I purchased call home before allowing any user functions, and if/when that server goes down my device becomes a brick. This behavior essentially invalidates my ownership of the product and reduces it to a service, provided at will by the manufacturer.

Your idea ensures my device will one day become a brick as soon as the manufacturer decides to mark their update requiring internet check-ins “safe”.

If you think I’m exaggerating check out Louis Rossmann‘s YouTube channel.

bmicraft|11 months ago

Unfortunately, you'd need to weave that all the way through the whole product stack in order not to end up in a state that looks like it's working at first glance but actually isn't doing what it is supposed to: everything running but not showing an image, or everything running except that networking is dead (so no further updates are possible either), or (remote) input devices not responding, etc.

amelius|11 months ago

This is what everybody wants, but almost nobody does. Time to market, etc.

tomstokes|11 months ago

You need to have the firmware equivalent of a platform team.

It's common now for medium and large companies to have some variant of a cloud platform team: People responsible for shared practices, infrastructure, and processes in the cloud.

Smart hardware companies have done the same for decades. You have a firmware platform team that handles things like update protocols, recovery protocols, testing checklists, on-device OTA update architecture, and other critical functions.

When you're a company like Samsung that continuously releases and develops products, this actually shortens your time to market rather than lengthening it. You let each product team focus on the parts of the firmware that make their product valuable and free them from having to roll their own update systems.

drdaeman|11 months ago

It's almost exactly the same thing as purchasing insurance.

If the management folks have personal health insurance, surely they must understand the concept and the need. And this is a much better deal because unlike actual insurance this is more like "invest once, enjoy forever" type of thing. And multi-stage boot chain, recovery partition and staged rollouts are not some rocket science that needs some serious expertise.

Yet, here we go. Humans are not really rational actors after all, and collective humans are even less so.

javchz|11 months ago

I suppose the closest equivalent would be motherboards with dual BIOS.

There if something goes wrong during an update, you always have a backup BIOS with the previous version (not necessarily factory settings). If the system fails to boot, it automatically switches to the backup BIOS and restores the main BIOS to the last working version.

neilv|11 months ago

For this $1500 street price soundbar, I'm wondering whether they consciously decided not to invest in BOM cost or software effort that would help avoid bricking.

I'm not sure I understand various industries' conventions...

While interviewing for a principal engineer job, I was meeting individually with a bunch of team leads and managers, and one engineer asked how would I design firmware updating for the company's product (which was more critical, complex, and expensive than a soundbar).

I assumed they were probably trying to see whether I would throw in some robustness/resilience (not oversimplify it). So I sketched it out, while hitting notes like diffs, downloading and assembling in staging space, imperfect networking, having at least two firmware "slots", backing out upon boot loop or failure soon after boot, gradual deployment to installed base, contrasting with some less-critical consumer product firmware update practices, etc.

(Either that was a bad answer, or they got distracted thinking about something I'd said, because I was getting odd subconscious backchannel cues, and they were unresponsive when I tried to elicit more requirements or guidance about what they were looking for. Maybe there was some standard embedded systems programmer canned answer that I was supposed to recite (analogous to the Web brogrammer 'system design' interview), and they couldn't think of how to nudge me towards the shibboleth without saying it?)

devmor|11 months ago

#2 has been a godsend in the custom/HEDT PC market. Many expensive motherboards now come with a "dual BIOS" system that gives you an older known working image to boot from, in case flashing a new version broke something that can't be easily undone.

shantara|11 months ago

Another amazing feature is the ability to flash a BIOS from an unbootable system. You insert a flash drive with the firmware file into a USB port, press a hardware button and the BIOS gets updated, even without a CPU socketed.

werdnapk|11 months ago

As a user/customer, if I'm part of that 1% with an issue and get the same sort of "canned" response you see on the mentioned thread, I feel like I, as a user, don't matter. I guess the next step is calling customer support and then having the person on the phone make me go through their checklist of things I've already tried and, again, feeling like this is of no use.

I think it usually takes a big rollout for these big companies to actually "hear" their users.

jandrese|11 months ago

The second point is the really important one here. Mistakes happen, having a factory reset that actually works is crucial to avoiding extremely expensive recalls.

I'm reminded of the time a random NPR station accidentally bricked the infotainment systems on thousands of Mazdas and because there was no factory reset feature they had to spend millions replacing head units. That's just bad design.

mytailorisrich|11 months ago

Indeed a golden factory firmware version that will be booted automatically if all else fails and that provides minimum connectivity is crucial.

OtherShrezzing|11 months ago

I wonder if that opens a threat vector from a security point of view? If an attacker knows that the golden firmware has some critical vulnerability which they can exploit easily, they can activate it at will by bricking the device and waiting for it to restart.

tomstokes|11 months ago

> will be booted automatically if all else fails

I prefer to keep the factory firmware reset to a manual process that requires user intervention.

For example, holding down the reset button for 10 seconds after plugging the device in.

In my experience, it's not a good idea to have a device automatically roll back firmware and erase user data after failed boots. These mechanisms get triggered too easily during certain power outages (power comes on then goes off just long enough to cause multiple failed boots) or when users are doing simple things like rearranging their power cables.
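One way to express the "hold reset for 10 seconds after plugging in" rule in code, as a sketch under stated assumptions: the bootloader samples the button at intervals from power-on, and `held_since_power_on` is a hypothetical helper name. Any release, or any power loss that restarts sampling, means the threshold is never reached.

```python
from typing import Iterable, Tuple


def held_since_power_on(
    samples: Iterable[Tuple[float, bool]], hold_seconds: float = 10.0
) -> bool:
    """Decide whether to enter factory reset.

    `samples` is a time-ordered sequence of
    (seconds_since_power_on, button_pressed) readings. The button must
    be down in every reading until `hold_seconds` has elapsed.
    """
    for elapsed, pressed in samples:
        if not pressed:
            return False  # released early: treat as a normal boot
        if elapsed >= hold_seconds:
            return True
    return False  # sampling ended (e.g. no readings) before the threshold
```

Unlike a failed-boot counter, this check keeps no state across power cycles, so flapping mains power cannot accumulate into an unwanted reset.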

devsda|11 months ago

Ability to reset to original out of the box firmware is not only about failsafe. It's also a protection from "bug fixes" taking away features you had out of the box.

I'm still pissed off about LG removing the record-to-disk option from our TV after an upgrade. I only connected it to the internet and upgraded on the assumption that some of those bug fixes resolved a few DLNA issues; otherwise it's always on the internet block list.

liendolucas|11 months ago

The important feature here I would insist on is to let the user decide when to do a firmware update. Not the other way round. That's the way to build a good consumer relationship.

Why on earth does a sound bar need to update its firmware? Why does firmware need to be in a couple of tweeters and a woofer? It should basically output audio from an input source.

ErrantX|11 months ago

Another good one: please always split any security updates from feature changes (and backport the updates per whatever versioning policy you have for those lagging the latest).

After many years of being burned I always delay system-level non-security-related updates at least several days after launch to mitigate the risk.

crazygringo|11 months ago

> 2. A failsafe firmware reset back to factory state.

Do you mean like a physical button? That could work, though I'm not sure I've ever seen it. Holding down power for 10 seconds (or whatever) usually just erases user data, but doesn't reset firmware. Are you aware of any device that does this? But does it require some meta-firmware to roll back the firmware? What if that meta-firmware has a security flaw and needs to be updated? And that update is faulty?

If you're talking about a code sent from your servers to devices to reset, that seems like asking for the impossible. If a firmware update bricks the device, that may very well brick its ability to receive codes at all.

In both situations, it starts to feel like a problem of infinite regress...

JimDabell|11 months ago

Reverting to factory state seems riskier than last known good state. You could run into things like TLS root authorities not being recognised, deprecated cipher suites, etc. Just because that version worked a decade ago, it doesn’t mean it’s compatible with the world today.

tomstokes|11 months ago

> Reverting to factory state seems riskier than last known good state.

Reverting to factory state is the last resort. You don't have users do it unless there is no other good state to return to on the device.

> Just because that version worked a decade ago, it doesn’t mean it’s compatible with the world today.

That's why I said you have to include this in your test procedures.

When you're planning for the long term you can accommodate these things on your servers.

boricj|11 months ago

> 2. A failsafe firmware reset back to factory state. Some sequence that resets the device completely back to the way it was when it came out of the box, firmware included, as a last resort.

That's a nifty mechanism that also allows downgrade attacks, so it has cybersecurity implications that may or may not be acceptable. Furthermore, it might not be practical or even be possible to restore the system to factory condition due to technical reasons.

The team next door allows its systems to downgrade to a previous minor version with a mandatory factory reset. It however refuses downgrading to a previous major version because it implies the bootloader was upgraded or the storage was repartitioned and they really don't want to rollback that.
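The downgrade rule described for the neighbouring team can be written down as a small policy function. This is a sketch only: the two-part version tuple and the returned action names are assumptions, not their actual implementation.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int


def plan_install(current: Version, target: Version) -> str:
    """Return the action to take when asked to install `target`."""
    if target >= current:
        return "install"  # upgrade, or reinstall of the same version
    if target.major < current.major:
        # A major bump may have changed the bootloader or partitioning.
        return "refuse"
    return "downgrade_with_factory_reset"
```

For example, going from 2.3 back to 2.1 is allowed with a mandatory factory reset, while going from 2.0 back to 1.9 is refused.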

account42|11 months ago

Except when it comes to firmware, downgrade "attacks" are not attacks at all but just owners making use of THEIR devices. The real attack is the company trying to retain control over something they have sold.

ashoeafoot|11 months ago

But .. but then they can escape the extortion to a working state..

gorlilla|11 months ago

This is the de facto playbook for one of the Mega-Evil Corp.'s CPE firmware (Gateways, IPTV receivers, etc...).

New firmware is pushed in phases 1%, 5%, 10%, 25%, 50% then full scale.

Each stage has some delay incorporated for acquisition/application and then for telemetry (including support contacts from affected accounts) to determine impact and allow for regression fixes.

The other reason they would phase launches is because of firmware builds being used across multiple CPE models and hardware revisions, where only a small subset of hardware could wind up being problematic, but not discovered until deployment.

When you have millions of devices deployed, even a fraction of devices having an issue can create a shit storm on the support side of things.

It all seems so obvious once you know to think about it.
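The phase gate between those stages might be sketched as follows. The stage percentages come from the comment; the failure-rate threshold, the 24-hour soak time, and the function name are made-up defaults for illustration.

```python
STAGES = [1, 5, 10, 25, 50, 100]  # percent of the fleet per phase


def next_stage(
    current_percent: int,
    updated: int,
    failures: int,
    soak_hours: float,
    *,
    max_failure_rate: float = 0.001,
    min_soak_hours: float = 24.0,
) -> int:
    """Return the next rollout percentage.

    Returns 0 to halt the rollout, the current percentage to keep
    waiting, or the following stage once telemetry looks healthy.
    `failures` would combine device telemetry and support contacts.
    """
    if updated and failures / updated > max_failure_rate:
        return 0  # regression suspected: stop and investigate
    if updated == 0 or soak_hours < min_soak_hours:
        return current_percent  # not enough data or time yet
    index = STAGES.index(current_percent)
    return STAGES[min(index + 1, len(STAGES) - 1)]
```

With these defaults, 1,000 updated devices and no failures after 30 hours moves a 1% rollout to 5%, while 5 failures in 1,000 halts it.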

weinzierl|11 months ago

> "A failsafe firmware reset back to factory state"

A failsafe firmware reset back to a safe and secure state yes. The factory state is not necessarily that, so no.

I think devices should keep a last known good state firmware but keeping a full factory state immutable firmware would be irresponsible for many usecases.

fhd2|11 months ago

What hardware reset typically does, in my experience, is to reinstall the last firmware you installed. Many don't even have the space to keep some original and/or safe image in addition. I'm working on one device where we delete much of the existing system to make space for even downloading a new firmware image. It's wild.

omoikane|11 months ago

> 1. Staged rollout of firmware update

Especially if there is an internal testing stage before actually rolling out to production. It's possible that the users seeing the bricked devices are in fact limited to the initial wave, but the damage is already done.

gblargg|11 months ago

> A failsafe firmware reset back to factory state.

Or perhaps to the very first released firmware version. This way they don't have to support updating from any version to the latest, just from the first one.

greesil|11 months ago

Also a dev or dogfood population of devices used by employees

gwerbret|11 months ago

Both are very reasonable features, of course. Here are (some of) the real-world challenges to their implementation:

#1: Requires competence, and/or management that isn't too focused on velocity and features to listen to their engineers' warnings about exactly the sort of problem being discussed here.

#2: Many firmware updates explicitly and specifically want to strip away features that the hardware shipped with (by introducing DRM, paywalls, etc.), so see the comment about management above.