pankalog|4 months ago
Our OTAU architecture used A/B system updates [1]. The core idea was that the kernel and the (read-only) rootfs partitions each had 2 bootslots in storage, and the OTAU would only ever write to the bootslot that was unused. Hence, if something went wrong, the system would automatically fall back to the previous version just by switching the bootslot in use. Over the many years that architecture was used, I couldn't find a single post-mortem that involved devices being bricked. Something to note is that the rootfs partition was overlaid with a writable partition for persisting state data etc.
Now that was a $two-figure USD device, not a $5/6-figure USD electric SUV. Is this a cost-cutting measure? At those price levels, doubling your NAND size is not even half of a percent of the total cost of the vehicle.
Unless there was a serious issue where the active bootslot corrupted the unused bootslot, I don't see how this could have happened.
It's saddening that car manufacturers are so unserious about the code they're deploying.
AlotOfReading|4 months ago
The big auto OEMs are just as sensitive to absolute BOM cost optimization, regardless of the percentage increases. I don't think this was a bootslot issue though, regardless of the word "bricked". Even as backwards and ill-advised as auto software can be, generally accepted practice is that updates are impossible while the vehicle is in motion. This is usually enforced by systems shared across multiple OEMs through the tier system.
The situation sounds more like a disastrously buggy new firmware.
I wouldn't put either past Stellantis though. The auto industry already scrapes the bottom of the proverbial barrel sometimes, and Stellantis isn't exactly known for their top-of-market compensation.
Rebelgecko|4 months ago
potatolicious|4 months ago
It definitely reduces the risk of updates, but it absolutely doesn't eliminate it.
The A/B updater itself is a surface area - especially if the logic is complex and there are other child devices that are updated at the same time (likely for cars). In that case you're not just coordinating between two independent partitions, you're coordinating between 2 * N partitions, half of which have dependencies on each other.
Also, the key bit of the mechanism is that upon successful boot the new partition is flagged as "good", which causes flags to be set to assert that the update was successful and the backup partition is no longer needed. That logic can (and does) fail - if your failure point occurs after this checkpoint you're hosed still because you're past the point of no return.
Making that worse is that in most systems you want the "it's all good" checkpoint to occur early - you don't want to, for example, wait multiple minutes for all user services to come up. But that also means that if a critical failure happens in said services, you're past the checkpoint.
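A minimal sketch of that checkpoint problem, in Python. The class, field names and try counter are made up for illustration and are not taken from any real bootloader; it only models the "try the new slot, then commit" flow described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootControl:
    """Hypothetical bootloader state; all names here are illustrative."""
    active: str = "A"              # slot last confirmed as good
    pending: Optional[str] = None  # freshly written slot awaiting confirmation
    tries_left: int = 0

    def stage_update(self, max_tries: int = 3) -> None:
        """The updater wrote the unused slot; ask the bootloader to try it."""
        self.pending = "B" if self.active == "A" else "A"
        self.tries_left = max_tries

    def select_slot(self) -> str:
        """Run by the bootloader on every boot."""
        if self.pending is not None:
            if self.tries_left > 0:
                self.tries_left -= 1
                return self.pending
            self.pending = None    # out of tries: abandon the update
        return self.active

    def mark_good(self) -> None:
        """The 'it's all good' checkpoint. After this there is no fallback."""
        if self.pending is not None:
            self.active, self.pending = self.pending, None
            self.tries_left = 0
```

If the new slot never calls `mark_good()`, `select_slot()` returns to the old slot once the tries run out. If it calls `mark_good()` early and a service crashes afterwards, every later boot still picks the new slot, which is the "past the point of no return" case.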
palmotea|4 months ago
Could just be a competence and priorities problem. If it's cost cutting, it feels way more likely that some PM cut some story from a sprint to hit a deadline (and objections were either not raised or ignored), than that they did some engineering analysis and explicitly decided to save $3 per vehicle by cutting the NAND size.
Edit: Actually, I don't think that technique would have helped. The problem wasn't a botched update, but a seriously buggy one. From the OP:
> The buggy update doesn't appear to brick the car immediately. Instead, the failure appears to occur while driving—a far more serious problem.
general1465|4 months ago
That, combined with the general refusal of newer automotive bootloaders to downgrade. You can only go up in version. So even if you have a working version on the second partition, it will never get loaded, because its version is lower than the one you are currently running.
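A rough sketch of how such a downgrade refusal interacts with A/B slots, in Python. The function names and the single integer "rollback floor" are assumptions for illustration; real bootloaders differ in how and when they raise the floor.

```python
from typing import Dict, List


def bootable_slots(slots: Dict[str, int], rollback_floor: int) -> List[str]:
    """Return the slots whose image version is not below the stored floor."""
    return sorted(name for name, version in slots.items()
                  if version >= rollback_floor)


def raise_floor(rollback_floor: int, booted_version: int) -> int:
    """After a confirmed boot, the floor only ever moves up (assumed policy)."""
    return max(rollback_floor, booted_version)
```

With slot A at version 4 and slot B at version 5, both are bootable while the floor is 4. Once version 5 has been confirmed and the floor is raised to 5, slot A is refused even though its image is intact.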
shadowpho|4 months ago
1) Total cost of the vehicle does not matter. What does matter is the operating margin. Half a percent of the total cost of the vehicle will move them from 2% margin to 1.5% margin. (Ford has operating margin of 2% as an example)
In other words, a 0.5% increase in the total cost of the vehicle will reduce their profits by about 25%.
That’s a huge number now! Note also that car manufacturers are in a bad spot because their volumes are fairly low (smartphone = 1M/yr, car = 40k/yr) and have harsher requirements for chips, driving the cost way up.
2) A/B updates are great, but they can still fail or get soft-locked. The code that decides when a slot is marked good and when it is marked bad is especially important.
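The margin arithmetic in point 1 can be checked with a few lines of Python. This is only a sketch: it assumes the whole cost increase comes out of profit with no change in price, and that costs are simply revenue minus the operating margin.

```python
def profit_drop(margin: float, cost_increase: float) -> float:
    """Fraction of profit lost when costs rise and the price stays fixed."""
    revenue = 100.0                      # arbitrary unit, cancels out
    cost = revenue * (1.0 - margin)      # e.g. 98.0 at a 2% margin
    old_profit = revenue - cost
    new_profit = revenue - cost * (1.0 + cost_increase)
    return (old_profit - new_profit) / old_profit
```

`profit_drop(0.02, 0.005)` comes out at roughly 0.245, i.e. about a quarter of the profit, which matches the 2% to 1.5% margin figure above.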
maxerickson|4 months ago
It's also more dynamic than your presentation. They have a little bit of pricing power, so a small increase doesn't all come out of the margin.
jcalvinowens|4 months ago
That's the hard part though.
It's shockingly common in my experience to have an A/B boot setup, but no actual logic in the userspace application to switch back to the other partition if something goes wrong. It's just a defense against somebody pulling the plug during the OTA. It doesn't protect against software bugs at all.
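As a sketch of the missing piece, something like the following Python would have to run in userspace after booting the new slot. The function name and the "commit"/"rollback" return values are hypothetical; how the decision is passed to the bootloader depends entirely on the platform.

```python
from typing import Callable, Iterable


def post_boot_decision(checks: Iterable[Callable[[], bool]]) -> str:
    """Return "commit" only if every health check passes, else "rollback"."""
    for check in checks:
        try:
            ok = check()
        except Exception:
            ok = False       # a crashing check counts as a failure
        if not ok:
            return "rollback"
    return "commit"
```

Without something like this, the new slot gets confirmed just because it booted, no matter how broken the application on it is.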
kijin|4 months ago
It's totally possible that the update corrupted the other bootslot as well. If those blocks aren't off-limits to the updater program, it's just an off-by-one error waiting to happen. Slot 0 or slot 1?
Another possibility is that the updated version booted up just enough not to trigger the automatic fallback, and then got stuck in a loop.
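For the first possibility, a guard of roughly this shape (Python, with a made-up slot size and layout) is what keeps the updater's writes inside the inactive slot. It is a sketch under those assumptions, not how any particular updater works.

```python
SLOT_SIZE = 1024  # bytes per slot; arbitrary illustrative value


def slot_range(index: int) -> range:
    """Byte range of slot 0 or slot 1 in a back-to-back layout (assumed)."""
    if index not in (0, 1):
        raise ValueError("slot index must be 0 or 1")
    start = index * SLOT_SIZE
    return range(start, start + SLOT_SIZE)


def check_write(active: int, offset: int, length: int) -> None:
    """Raise unless [offset, offset + length) lies inside the inactive slot."""
    target = slot_range(1 - active)
    if length <= 0 or offset < target.start or offset + length > target.stop:
        raise PermissionError("write would touch blocks outside the inactive slot")
```

An offset that is off by one byte or one slot is rejected here instead of silently overwriting the running system.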
Telaneo|4 months ago
avidiax|4 months ago
What could easily have happened is that the negotiators didn't include A/B updates in their spec, or they only specced A/B updates at 1GB OTA size.
They do their usual hammering on price, and the head unit or ECU manufacturer gave them some savings by cutting storage space to the bone.
Maybe it was still enough for A/B updates, until the usual software bloat took the updates past the critical limit.
They could still do a safe update by doing an A/B/A update (where B is a shrunken, update-only OS), but that requires development time, and the engineers should already be working on the next vehicle.
thunfischbrot|4 months ago
mikkupikku|4 months ago
[deleted]
apex_sloth|4 months ago
stefan_|4 months ago
(Most computers in a car don't need duplicate partitioning because they can be bootstrapped from a central computer)
zoeysmithe|4 months ago
We just never bothered to develop a new term. Maybe 'soft-bricked?' 'Semi-bricked?' I would like journalists at least to start using more accurate terms, but 'bricked' I imagine gets a lot more engagement and ad impressions, so here we are.
stevenhubertron|4 months ago
upboundspiral|4 months ago
CoastalCoder|4 months ago
I'm curious if failing to do that opens Jeep up to legitimate lawsuits.
jacquesm|4 months ago
ThatMedicIsASpy|4 months ago
monero-xmr|4 months ago
herbturbo|4 months ago
The only American-made vehicle that sold in any volume outside the US was Tesla and that is already over.
zoeysmithe|4 months ago