(no title)
tomstokes|11 months ago
1. Staged rollout of firmware updates. It’s common practice for apps and software, but for some reason it’s less common with firmware. Rolling out to 1% (or less, depending on scale) of devices and waiting a day is cheap insurance. Side note: build a good relationship with customer service people so you hear about these things immediately.
2. A failsafe firmware reset back to factory state. Some sequence that resets the device completely back to the way it was when it came out of the box, firmware included, as a last resort. In conjunction, your automated tests need to confirm that every factory firmware you’ve ever released can update to the latest firmware.
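One common way to pick "the 1%" for a staged rollout is to hash a stable device identifier into a bucket, so a device's membership is deterministic and only grows as the percentage is raised. This is a minimal sketch, assuming every device has a unique ID string; the function names and the choice to salt the hash with the release version are illustrative, not taken from any particular product.

```python
import hashlib


def rollout_bucket(device_id: str, release: str) -> float:
    """Map a device to a stable value in [0, 100) for a given release."""
    # Salting with the release means a different slice of the fleet goes first each time.
    digest = hashlib.sha256(f"{release}:{device_id}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")  # first 8 bytes as an unsigned integer
    return value / 2**64 * 100


def is_in_rollout(device_id: str, release: str, percent: float) -> bool:
    """True if this device should receive `release` at the current rollout percentage."""
    return rollout_bucket(device_id, release) < percent
```

Because the bucket is fixed per device and release, every device that got the update at 1% still has it at 5%, so raising the percentage never flip-flops anyone.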
EvanAnderson|11 months ago
This doesn't work if your threat model includes denying rollbacks to prevent exploiting bugs in old firmware. I'd love to be able to roll back firmware on some of my devices to allow me to "jailbreak" them using old firmware.
In some cases your newer firmware may be blowing e-fuses that prevent old firmware from functioning. See the Nintendo Switch, for an example.
To be clear: I think this is anti-consumer and wrong, but manufacturers absolutely do it.
Edit: I also think it should be illegal, by way of consumer regulation. I don't think consumers should have the option to waive their right not to have manufacturers damage hardware they own.
ChuckMcM|11 months ago
xp84|11 months ago
Clearly the latter is heavily preferred by most companies.
Szpadel|11 months ago
This should be especially trivial when your device has some USB ports.
You can keep all the requirements that only the same or a newer firmware version can be flashed, with all the refusal checks.
If you mess up, you can let consumers flash a fix using a regular pendrive.
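A USB recovery path can keep the same anti-rollback rule as the normal updater. Here is a minimal sketch of the check a bootloader might run over images found on a pendrive, assuming versions are (major, minor, patch) tuples; `signature_ok` is a hypothetical stand-in for real cryptographic signature verification, and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    version: tuple[int, int, int]
    signature_ok: bool  # stand-in for real signature verification


def accept_usb_image(current_version: tuple[int, int, int], image: Image) -> bool:
    """Accept an image only if it is properly signed and not older than what is installed."""
    if not image.signature_ok:
        return False
    # Tuple comparison is lexicographic: (2, 4, 0) > (2, 3, 9).
    return image.version >= current_version


def pick_recovery_image(current_version, images):
    """Choose the newest acceptable image on the drive, or None if nothing qualifies."""
    acceptable = [img for img in images if accept_usb_image(current_version, img)]
    return max(acceptable, key=lambda img: img.version, default=None)
```

Allowing the same version (not just strictly newer) is what lets a user reflash a corrupted install of the current release.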
efitz|11 months ago
Copyright and patent have morphed into evils that drive anti-consumer and anti-competitive behavior, and have driven a “subscription” model that allows rent seekers to achieve their wildest dreams.
throwawayk7h|11 months ago
basch|11 months ago
I'm not a fan of firmware lockdowns, but I understand other people may value security over moddability.
protocolture|11 months ago
What's worse is that a lot of the affected hardware was near or at EOL anyway, so Cambium was simply helping rescue devices headed for the scrap heap.
water9|11 months ago
grumple|11 months ago
nomel|11 months ago
account42|11 months ago
clysm|11 months ago
Android systems can do this today. After an orderly shutdown of the new software, they can mark the new software as good and stop older software from booting.
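The important ordering is that the anti-rollback minimum only advances after the new software has proven itself, so until then the previous image remains a valid fallback. Android's A/B updates with Verified Boot rollback indexes work roughly along these lines; the class below is a hypothetical model of that idea, not the actual AOSP or libavb interface.

```python
class RollbackGuard:
    """Hypothetical anti-rollback counter that only advances once new software is marked good."""

    def __init__(self, stored_index: int):
        # On a real device this would live in tamper-resistant storage.
        self.stored_index = stored_index

    def may_boot(self, image_index: int) -> bool:
        # Images older than the stored minimum are refused.
        return image_index >= self.stored_index

    def mark_good(self, image_index: int) -> None:
        # Called only after the new image has shown it works (e.g. an orderly shutdown).
        # From then on, anything older can no longer boot. The index never moves backwards.
        self.stored_index = max(self.stored_index, image_index)
```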
0x457|11 months ago
croes|11 months ago
AlotOfReading|11 months ago
There are commercial offerings (like mender.io, never used) that basically specialize in providing rock solid update infrastructure, but that again takes investment and organizational priority that doesn't exist for non-feature code.
boricj|11 months ago
I'm trying to buck the trend though and on the new embedded system I'm working on, I've specifically designed the upgrade system to be as reliable as I can make it. It goes something like this:
- The new firmware is downloaded to the secondary application slot.
- Just prior to rebooting, the entire state data of the system is serialized as a document and stored on a flash partition.
- The upgrade flag is set, the system reboots and MCUboot does its thing.
- The new firmware finds out an upgrade happened, clears out all the data partitions, restores from the document and then clears out its partition.
The system is basically sanitized and restored after each upgrade. It's also the same codepath that handles saving and restoring the system's configuration by the end-user as well as settings management. If the document schema is for an older version, run the N-to-N+1 schema upgraders on it prior to applying instead of trying to patch the system in-place. If something goes horribly wrong, flip a jumper to trigger the heavy-duty sanitization that nukes the entire external flash (internal flash only contains the bootloader, primary application slot and factory parameters so it's essentially read-only once the application boots).
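The N-to-N+1 schema upgrader idea can be sketched as a chain of small functions, each of which knows only how to move a document forward by exactly one version. This is a minimal sketch assuming the saved state is a dict-like document carrying a `schema` field; the field names (`vol`, `volume`, `eq_preset`) and the refusal of documents newer than the firmware are hypothetical choices for illustration, not boricj's actual implementation.

```python
from typing import Callable

CURRENT_SCHEMA = 3


def _v1_to_v2(doc: dict) -> dict:
    # Hypothetical change in v2: "vol" was renamed to "volume".
    doc = dict(doc)  # never mutate the caller's document
    doc["volume"] = doc.pop("vol", 50)
    doc["schema"] = 2
    return doc


def _v2_to_v3(doc: dict) -> dict:
    # Hypothetical change in v3: a new field with a default value.
    doc = dict(doc)
    doc.setdefault("eq_preset", "flat")
    doc["schema"] = 3
    return doc


UPGRADERS: dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade_document(doc: dict) -> dict:
    """Apply N-to-N+1 upgraders until the document matches CURRENT_SCHEMA."""
    version = doc.get("schema", 1)
    if version > CURRENT_SCHEMA:
        raise ValueError(f"document schema {version} is newer than this firmware supports")
    while version < CURRENT_SCHEMA:
        doc = UPGRADERS[version](doc)
        version = doc["schema"]
    return doc
```

Each upgrader stays tiny and testable on its own, and a document from any older release reaches the current schema by walking the chain rather than through one-off in-place patches.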
It might be hubris, but I hope it's good enough that I'll never see a bricked card that can't be resurrected by a factory reset with this project (assuming no hardware damage, no internal flash corruption and no bricking firmware getting signed with production keys seeping through the cracks despite all the checks in place).
x0x0|11 months ago
> "One of every board revision we've ever produced"
The, ah, "special" people we had running engineering didn't even put in the work to be capable of the software querying the board rev. We had to play games like running certain motors past a position limit and seeing if there were limit switches there (or not) to guesstimate board revs.
I'm guessing stories like this are common.
ethan_smith|11 months ago
An offline-first approach respects user autonomy and creates a natural safety net against bad updates. Plus, it means your product keeps working even when servers change or get shut down years later or a nuclear war happens. Sure, connectivity has benefits, but a speaker's main job is playing sound, not phoning home. Building offline-first also forces better engineering decisions about longevity and graceful degradation.
It's so hard to find any offline-first apps/devices nowadays, which is sad to see in a world of algorithms and AI.
This whole situation reminds me of this: https://programmerhumor.io/linux-memes/thats-the-attitude-sa...
the_snooze|11 months ago
Galxeagle|11 months ago
https://news.ycombinator.com/item?id=23534793
Tijdreiziger|11 months ago
https://en.wikipedia.org/wiki/EFuse
steveBK123|11 months ago
See their new app debacle, which paired the new app with a non-reversible firmware update that made the hardware incompatible with the old app.
ymyms|11 months ago
3. Have a set of conditions for marking the running firmware image as "safe" so that it becomes the new fallback firmware image for this scenario. That way you always have a fairly recent firmware version trailing the new ones.
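A minimal sketch of such a promotion rule, assuming the device tracks uptime, clean reboots and a local self-test of its core function; the thresholds and field names are hypothetical and would need tuning per product.

```python
from dataclasses import dataclass

MIN_UPTIME_S = 72 * 3600  # hypothetical: three days of uptime on the new image
MIN_CLEAN_REBOOTS = 2     # hypothetical: survived at least two orderly reboots


@dataclass
class HealthReport:
    uptime_s: int
    clean_reboots: int
    core_function_ok: bool  # e.g. a local self-test of the audio path passed


@dataclass
class Slots:
    running: str
    fallback: str


def should_promote(report: HealthReport) -> bool:
    """All conditions must hold before the running image becomes the fallback."""
    return (
        report.uptime_s >= MIN_UPTIME_S
        and report.clean_reboots >= MIN_CLEAN_REBOOTS
        and report.core_function_ok
    )


def maybe_promote(slots: Slots, report: HealthReport) -> Slots:
    """Return updated slot assignments; the fallback trails the running image once it proves itself."""
    if should_promote(report) and slots.fallback != slots.running:
        return Slots(running=slots.running, fallback=slots.running)
    return slots
```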
Zenbit_UX|11 months ago
So many companies have begun rolling out updates that make the device I purchased call home before allowing any user functions, and if/when that server goes down my device becomes a brick. This behavior essentially invalidates my ownership of the product and turns it into a service, provided at will by the manufacturer.
Your idea ensures my device will one day become a brick as soon as the manufacturer decides to mark their update requiring internet check-ins “safe”.
If you think I’m exaggerating, check out Louis Rossmann’s YouTube channel.
bmicraft|11 months ago
amelius|11 months ago
tomstokes|11 months ago
It's common now for medium and large companies to have some variant of a cloud platform team: People responsible for shared practices, infrastructure, and processes in the cloud.
Smart hardware companies have done the same for decades. You have a firmware platform team that handles things like update protocols, recovery protocols, testing checklists, on-device OTA update architecture, and other critical functions.
When you're a company like Samsung that continuously develops and releases products, this actually shortens your time to market rather than lengthening it. You let each product team focus on the parts of the firmware that make their product valuable and free them from having to roll their own update systems.
drdaeman|11 months ago
If the management folks have personal health insurance, surely they must understand the concept and the need. And this is a much better deal, because unlike actual insurance it's more of an "invest once, enjoy forever" kind of thing. And a multi-stage boot chain, a recovery partition and staged rollouts are not rocket science that requires serious expertise.
Yet, here we go. Humans are not really rational actors after all, and collective humans are even less so.
javchz|11 months ago
There, if something goes wrong during an update, you always have a backup BIOS with the previous version (not necessarily factory settings). If the system fails to boot, it automatically switches to the backup BIOS and restores the main BIOS to the last working version.
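The dual-BIOS behavior described above can be modeled in a few lines. This is a simplified, hypothetical model: the two chips are represented as byte strings, and `boots_ok` stands in for whatever check the hardware really uses to decide that an image failed to boot.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DualFlash:
    main: bytes    # normal firmware image, the one that gets updated
    backup: bytes  # last known working image


def boot(flash: DualFlash, boots_ok: Callable[[bytes], bool]) -> str:
    """Return which image booted, repairing main from backup if main fails."""
    if boots_ok(flash.main):
        return "main"
    if not boots_ok(flash.backup):
        return "halt"  # both images bad: nothing left to fall back to
    # Main failed: boot from backup and restore main to the last working version.
    flash.main = flash.backup
    return "backup"
```

After one failed boot the main image has been rewritten from the backup, so the next power-on boots normally again.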
neilv|11 months ago
I'm not sure I understand various industries' conventions...
While interviewing for a principal engineer job, I was meeting individually with a bunch of team leads and managers, and one engineer asked how would I design firmware updating for the company's product (which was more critical, complex, and expensive than a soundbar).
I assumed they were probably trying to see whether I would throw in some robustness/resilience (not oversimplify it). So I sketched it out, while hitting notes like diffs, downloading and assembling in staging space, imperfect networking, having at least two firmware "slots", backing out upon boot loop or failure soon after boot, gradual deployment to installed base, contrasting with some less-critical consumer product firmware update practices, etc.
(Either that was a bad answer, or they got distracted thinking about something I'd said, because I was getting odd subconscious backchannel cues, and they were unresponsive when I tried to elicit more requirements or guidance about what they were looking for. Maybe there was some standard embedded systems programmer canned answer that I was supposed to recite (analogous to the Web brogrammer 'system design' interview), and they couldn't think of how to nudge me towards the shibboleth without saying it?)
devmor|11 months ago
shantara|11 months ago
Tijdreiziger|11 months ago
https://tweakers.net/reviews/10334/het-einde-van-de-high-end... (Dutch)
werdnapk|11 months ago
I think it usually takes a big rollout for these big companies to actually "hear" their users.
jandrese|11 months ago
I'm reminded of the time a random NPR station accidentally bricked the infotainment systems on thousands of Mazdas and because there was no factory reset feature they had to spend millions replacing head units. That's just bad design.
mytailorisrich|11 months ago
OtherShrezzing|11 months ago
tomstokes|11 months ago
I prefer to keep the factory firmware reset to a manual process that requires user intervention.
For example, holding down the reset button for 10 seconds after plugging the device in.
In my experience, it's not a good idea to have a device automatically roll back firmware and erase user data after failed boots. These mechanisms get triggered too easily during certain power outages (power comes on then goes off just long enough to cause multiple failed boots) or when users are doing simple things like rearranging their power cables.
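A manual trigger like "hold reset for 10 seconds after plugging in" avoids the power-flicker problem because a power cycle alone can never produce a long continuous button press. A minimal sketch, assuming the bootloader samples the button as (seconds since power-on, pressed) pairs; on real hardware this would be a GPIO poll, and the function name is hypothetical.

```python
HOLD_SECONDS = 10.0


def factory_reset_requested(samples, hold_s: float = HOLD_SECONDS) -> bool:
    """samples: (t_seconds_since_power_on, pressed) pairs in time order.

    Returns True only if the button was held continuously for at least hold_s.
    """
    held_since = None
    for t, pressed in samples:
        if pressed:
            if held_since is None:
                held_since = t  # start of the current continuous press
            if t - held_since >= hold_s:
                return True
        else:
            held_since = None  # any release restarts the count
    return False
```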
devsda|11 months ago
I'm still pissed off about LG removing the record-to-disk option from our TV after an upgrade. I only connected it to the internet and upgraded because I assumed some of those bug fixes would resolve a few DLNA issues; otherwise it's always on my internet block list.
liendolucas|11 months ago
Why on earth does a sound bar need to update its firmware? Why do a couple of tweeters and a woofer need firmware at all? It should basically output audio from an input source.
ErrantX|11 months ago
After many years of being burned, I always delay system-level, non-security-related updates for at least several days after launch to mitigate the risk.
crazygringo|11 months ago
Do you mean like a physical button? That could work, though I'm not sure I've ever seen it. Holding down power for 10 seconds (or whatever) usually just erases user data, but doesn't reset firmware. Are you aware of any device that does this? But does it require some meta-firmware to roll back the firmware? What if that meta-firmware has a security flaw and needs to be updated? And that update is faulty?
If you're talking about a code sent from your servers to devices to reset, that seems like asking for the impossible. If a firmware update bricks the device, that may very well brick its ability to receive codes at all.
In both situations, it starts to feel like a problem of infinite regress...
JimDabell|11 months ago
tomstokes|11 months ago
Reverting to factory state is the last resort. You don't have users do it unless there is no other good state to return to on the device.
> Just because that version worked a decade ago, it doesn’t mean it’s compatible with the world today.
That's why I said you have to include this in your test procedures.
When you're planning for the long term you can accommodate for these things on your servers.
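Part of that test procedure can be automated as a check over the release history: for every version that ever shipped from the factory, confirm there is some chain of updates that reaches the latest release. This sketch assumes each release records the oldest version that can install it directly (`None` marks an image only ever flashed at the factory); the release data is invented for illustration, and it checks only the update graph, not real devices.

```python
from collections import deque

# Hypothetical release history: version -> oldest version that can install it directly.
RELEASES = {
    (1, 0): None,    # original factory image
    (1, 1): (1, 0),
    (2, 0): (1, 1),  # needs 1.1 installed first
    (2, 1): (1, 0),
    (3, 0): (2, 0),
}


def update_path(start, target, releases):
    """Shortest chain of updates from start to target (breadth-first), or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        for version, min_from in sorted(releases.items()):
            if (version > current and min_from is not None
                    and current >= min_from and version not in seen):
                seen.add(version)
                queue.append(path + [version])
    return None


def unreachable_factory_versions(factory_versions, releases):
    """Factory versions that cannot reach the latest release: these should fail the build."""
    latest = max(releases)
    return [v for v in factory_versions if update_path(v, latest, releases) is None]
```

In CI you would assert that `unreachable_factory_versions` returns an empty list for every factory image you've ever shipped.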
boricj|11 months ago
That's a nifty mechanism that also allows downgrade attacks, so it has cybersecurity implications that may or may not be acceptable. Furthermore, it might not be practical or even be possible to restore the system to factory condition due to technical reasons.
The team next door allows its systems to downgrade to a previous minor version with a mandatory factory reset. It however refuses downgrading to a previous major version because it implies the bootloader was upgraded or the storage was repartitioned and they really don't want to rollback that.
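The policy described for the team next door can be written down as a small decision function. This sketch assumes semantic-style (major, minor, patch) version tuples and reads "previous minor version" as any older minor release within the same major; both are assumptions about their scheme.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ALLOW_WITH_FACTORY_RESET = "allow_with_factory_reset"
    REFUSE = "refuse"


def downgrade_decision(current, target) -> Decision:
    """current and target are (major, minor, patch) tuples."""
    if target >= current:
        return Decision.ALLOW  # not a downgrade at all
    if target[0] < current[0]:
        # Major downgrade: the bootloader or partition layout may have changed.
        return Decision.REFUSE
    # Same major, older minor/patch: permitted, but user data must be wiped.
    return Decision.ALLOW_WITH_FACTORY_RESET
```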
account42|11 months ago
ashoeafoot|11 months ago
gorlilla|11 months ago
New firmware is pushed in phases: 1%, 5%, 10%, 25%, 50%, then full scale.
Each stage has some delay incorporated for acquisition/application and then for telemetry (including support contacts from affected accounts) to determine impact and allow for regression fixes.
The other reason they phase launches is that firmware builds are used across multiple CPE models and hardware revisions, where only a small subset of hardware might turn out to be problematic, and that isn't discovered until deployment.
When you have millions of devices deployed, even a fraction of devices having an issue can create a shit storm on the support side of things.
It all seems so obvious once you know to think about it.
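The phase gating described above can be sketched as a function that decides whether to advance, wait or halt. The stage list comes from the comment; the 48-hour soak time, the 0.2% threshold and counting support contacts as failures are hypothetical simplifications.

```python
STAGES = [1, 5, 10, 25, 50, 100]  # percent of the fleet per phase
MIN_SOAK_HOURS = 48               # hypothetical delay for acquisition/application plus telemetry
MAX_FAILURE_RATE = 0.002          # hypothetical: halt above 0.2% problem reports


def next_stage(current: int, hours_at_stage: float, devices_updated: int,
               failures: int, support_contacts: int):
    """Return the next rollout percentage, the current one to keep waiting, or None to halt."""
    if hours_at_stage < MIN_SOAK_HOURS or devices_updated == 0:
        return current  # not enough time or data to judge this stage yet
    problem_rate = (failures + support_contacts) / devices_updated
    if problem_rate > MAX_FAILURE_RATE:
        return None  # stop the rollout and investigate
    idx = STAGES.index(current)
    return STAGES[min(idx + 1, len(STAGES) - 1)]
```

In practice you'd probably evaluate this per CPE model and hardware revision, since a problem confined to a small subset can hide inside a healthy fleet-wide average.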
weinzierl|11 months ago
A failsafe firmware reset back to a safe and secure state yes. The factory state is not necessarily that, so no.
I think devices should keep a last-known-good firmware, but keeping a full, immutable factory-state firmware would be irresponsible for many use cases.
fhd2|11 months ago
omoikane|11 months ago
Especially if there is an internal testing stage before actually rolling out to production. It's possible that the users seeing the bricked devices are in fact limited to the initial wave, but the damage is already done.
gblargg|11 months ago
Or perhaps to the very first released firmware version. This way they don't have to support updating from any version to the latest, just from the first one.
greesil|11 months ago
gwerbret|11 months ago
#1: Requires competence, and/or management that isn't too focused on velocity and features to listen to their engineers' warnings about exactly the sort of problem being discussed here.
#2: Many firmware updates explicitly and specifically want to strip away features that the hardware shipped with (by introducing DRM, paywalls, etc.), so see the comment about management above.