SanDisk Extreme SSDs keep abruptly failing–firmware fix for only some promised

[+] bb88|2 years ago|reply

I'm thinking maybe the Gamers Nexus approach might work well here, where they buy failed hardware to do an autopsy on it and then publish the results on YouTube, as they recently did for the ASUS high-end motherboards that cook AMD chips.

It gives the media companies access to the failed hardware to do their own autopsy on it, and it saves the users from needing to go through a painful RMA process, complicated by companies not willing to admit fault.

[+] mey|2 years ago|reply

Side note, it wasn't just ASUS motherboards. ASUS just had additional issues of its own and a poor response.

[+] whitemary|2 years ago|reply

The "high end" PC parts market comprises such a horrendous pit of garbage.

The only way to know if anything even works to begin with is to read all the (poorly written) manuals front to back while taking notes, then procure the rest of the parts and rigorously test them yourself within all of their 30-day return windows. And even then you're virtually guaranteed to miss some glaring issue.

Just last week, an obscure forum post from someone who already went through the tech support/RMA gamut saved me from wasting a month + $5K on a build with a motherboard that doesn't support sleep mode, which the manufacturer ASRock doesn't mention anywhere.

[+] miahi|2 years ago|reply

Yeah, sleep is a mess with new hardware (and MS also does not help, with the new OS sleep). I have a "creator"-targeted MB from Gigabyte and it does sleep, but if it wakes up immediately after I put it to sleep (because of a mouse move), it does a series of 7-8 BIOS initializations/restarts and it resets the full BIOS in the process.

[+] userbinator|2 years ago|reply

This is the result of an industry optimising for profit and not longevity. That's why SLC NAND has become almost extinct and priced beyond reason. I don't care how fast or large a storage device is if it isn't reliable.

[+] tpolzer|2 years ago|reply

There are SSDs on the market that use all of their TLC flash as SLC cache, so you can almost use them as SLC drives if you partition them to leave 2/3 empty.

E.g., the ADATA XPG SX8200. Look for whole-drive fill speed benchmarks: if a drive uses all of its flash as cache, the first third of the fill is fast (usually the SLC area is much smaller).

[+] Seattle3503|2 years ago|reply

Is there the equivalent of Backblaze's HD stats for SSDs?

[+] wmf|2 years ago|reply

I don't think longevity or flash has anything to do with it. Consumer devices of any type just can't afford reliable firmware.

[+] scns|2 years ago|reply

Micron showed a pseudo SLC drive.

https://www.anandtech.com/show/18863/micron-updates-data-cen...

[+] Dalewyn|2 years ago|reply

The sellers are but one part of the larger market system at large.

If SLC NAND went extinct, that's because both the sellers and the buyers (read: customers, aka end users) didn't see value in reliability as much as other factors like storage density and price-per-bit.

You, as someone who does want reliability above all else, are an outlier.

[+] short_throw|2 years ago|reply

I'm 2 for 2 with SanDisk SSDs suddenly dying due to the controller going out way short of the drive's expected life.

That was years before this issue but my rule of never buying a SanDisk product again is serving me well.

[+] csdvrx|2 years ago|reply

SanDisk "industrial" SD cards are the only ones that have died on me (and no, they didn't come from Amazon but from a reliable source).

[+] flyinghamster|2 years ago|reply

I've had equally bad luck with SanDisk's MicroSD cards. Samsung cards rarely ever go out to lunch in my Raspberry Pi systems, but I've never had a SanDisk card last more than six months.

As (sometimes six-year-old) Samsung cards get retired, I've gone to... Samsung. This time, I'm getting their cards intended for surveillance cameras and other write-intensive duty.

[+] asmor|2 years ago|reply

I had a phantom issue of my PC not booting (black lit screen) unless it had been cold for about 30 minutes, and it followed me across 2 different builds. I eventually, after going through every other component, figured out it was a bad SanDisk SSD that wasn't even a boot drive: it was not responding to ATA commands, and the MSI BIOS simply didn't have a timeout during early boot disk iteration. It was extra weird because that drive worked fine if it initialized correctly.

I have not bought a WD or SanDisk drive since. I'm still very pissed that I spent days debugging this issue, decided I needed to scrap the entire machine and then still had the issue. Who thinks of a bad drive as a reason you can't even boot into BIOS?!

[+] sockaddr|2 years ago|reply

I’ve also suffered a SanDisk controller failure on an SSD years back. Been doing Samsung ever since.

[+] whitepoplar|2 years ago|reply

Just my own experience, but I have a 1TB SanDisk Extreme V2 which was made several years ago. It routinely gets so hot that it nearly burns my skin to the touch, and when it does get that hot, it randomly unmounts from my computer. Last week, my SuperDuper backup to the drive failed due to underlying corruption, which strangely passed Disk Utility's health checks. I'm so done with this product.

[+] tobilg|2 years ago|reply

My 1TB version doesn't get as warm as yours, but it also randomly unmounts from macOS.

[+] danparsonson|2 years ago|reply

Try a heatsink next time

[+] inconceivable|2 years ago|reply

this is more common than you may think at all levels of the industry. some failures get more publicity than others. all the hand wringing over vendor silicon reliability data and TRIM and RAID topology and i/o pattern optimization doesn't matter one bit when the firmware just decides to delete itself along with all the data for no reason at all.

https://www.anandtech.com/show/15673/dell-hpe-updates-for-40...

this one was a real pain in the ass to deal with. always make testable backups, people. backups are not the same as redundancy. a 1, 2, 3, even 7 or 14 day recovery point is far better than poof it's gone.
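One way to make "testable" concrete is to periodically compare a backup copy against the live data. The following is a minimal Python sketch under stated assumptions: the backup is a plain directory tree mounted next to the source, and `verify_backup` and `sha256_of` are hypothetical helper names, not part of any backup tool. It only detects missing or differing files; it is not a substitute for actually performing a restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: str, backup_dir: str) -> list[str]:
    """Return a list of problems: files missing from the backup or differing in content."""
    source_root = Path(source_dir)
    backup_root = Path(backup_dir)
    problems = []
    for source_file in sorted(source_root.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_root)
        backup_file = backup_root / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"mismatch: {relative}")
    return problems
```

An empty list from `verify_backup("/data", "/mnt/backup/data")` (both paths are placeholders) means every source file has a byte-identical copy in the backup. Files changed since the last backup run will show up as mismatches, so the check is most meaningful right after a backup completes.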

[+] magicalhippo|2 years ago|reply

On a related note, I had an OCZ SSD back in the day which bricked itself all of a sudden. I was running nightly full-disk cloning of my disk via TrueImage, in addition to "realtime" backup of my user directory via Crashplan (back when they were good), so I was back up and running in less than 30 minutes with no important data loss.

I know there are tools like Restic that can do what Crashplan did, but what's the TrueImage equivalent for Linux? I.e., something I can use to clone my primary disk nightly in the background, including boot partition, incrementally with periodic full clones, and that supports resizing partitions (both up and down) in case the replacement disk isn't of equal size?

I know of Clonezilla, but from their front page it can't do incremental, which is a showstopper. With TrueImage each incremental image takes only about 2-3GB per day on average; a full one is 500GB. Clonezilla also seems to only support resizing partitions up, which isn't great, as it means I can't easily use old disks as emergency restore targets like I did when my OCZ died.

I know ZFS root + send/receive is an option but as much as I like ZFS, I'm not comfortable running it on root yet.

[+] flyinghamster|2 years ago|reply

I had an OCZ Vertex brick itself like that as well. Fortunately, it was a ZFS cache drive, so it was just a performance hit, but I never bought an OCZ product again.

[+] tinglymintyfrsh|2 years ago|reply

Realize the marketing and support (likely none) of retail parts like these: manufacturers don't believe enterprise customers will scream at them if some or many of them fail. They can play fast and loose and push the boundaries to get market share. Big-name vloggers and tech reporters may complain.

OTOH, enterprise parts are built and supported towards conservatism and reliability.

There is crossover and a spectrum between the 2, but this case isn't a complete surprise.

[+] lastdong|2 years ago|reply

Bought an NVMe SSD and it failed after 3 months of use. Read about similar occurrences in Amazon reviews. It was a Crucial P3; now I'm reading about SanDisk.

Storage (from well-known brands) used to be the most reliable component. Not sure what is going on, but it feels like quality control is not as good.

[+] zokier|2 years ago|reply

Storage has always been a crapshoot. Deathstars, the Intel 8MB bug, failing Barracudas, SandForce bugs, the list is endless.

[+] JohnBooty|2 years ago|reply

Wow... huh? I don't know about that. Been into personal computers since the 80s and it seems like storage has always failed orders of magnitude more often than anything else.

[+] ilyt|2 years ago|reply

We've even seen Intel enterprise SSDs die like a month after warranty, with 98% life left...

[+] aftbit|2 years ago|reply

Random anecdote but I usually run Samsung (8X0 Pro) or Western Digital (Blue/Black) when I need cheap consumer NVMe drives. Otherwise I run used Intel enterprise SSDs with lots of life. Any component can fail but I have had good luck so far with these (fingers crossed). Of course, take frequent backups of any important storage.

[+] l8rlump|2 years ago|reply

I've had issues with WD Blue recently. macOS intermittently wouldn't boot on a 3-month-old drive. I wonder how much is shared between this product and the SanDisk, given that WD now owns SanDisk?

https://community.wd.com/t/wd-blue-250-gb-sa510-issues/27981...

https://community.wd.com/t/wd-blue-sa510-sata-ssds-critical-...

[+] 7speter|2 years ago|reply

Jeesh, I bought a Crucial drive last year and then the firmware bug thing happened, so instead of buying another Crucial drive, I bought a SanDisk (Ultra SATA) drive, and now this happens. None of the first-party NAND makers seem to be immune from these NAND-killing firmware bugs, not even Samsung.

Actually, I take that back. Maybe Solidigm/SK Hynix?

[+] JohnBooty|2 years ago|reply

Pretty much any brand has had some issues here and there.

If you actually want a durable SSD, your best bet is probably to buy "new old stock" of a model that is 1-3 years old and has proven to be reliable.

Of course, it's not always easy to gauge the reliability of a product based on forum posts. A popular product that is 99.9% reliable will... still have thousands of unhappy people crying on forums if they sell millions of units.
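As a rough back-of-the-envelope check of that claim, here is a minimal Python sketch. The unit counts are illustrative assumptions, not sales data for any real product, and `expected_failures` is a hypothetical helper name.

```python
def expected_failures(units_sold: int, reliability: float) -> int:
    """Return the expected number of failed units, rounded to the nearest whole unit."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return round(units_sold * (1.0 - reliability))


# Illustrative unit counts only (assumed, not real sales figures).
for units in (1_000_000, 3_000_000, 5_000_000):
    failures = expected_failures(units, 0.999)
    print(f"{units:>9,} units at 99.9% reliability -> about {failures:,} failures")
```

Even a 0.1% failure rate scales linearly with volume, so forum complaint counts say more about how many units shipped than about the failure rate itself.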

In the end, you should probably just have some kind of daily (or better) backup system so that it's not a huge deal if your drive kicks the bucket. That probably makes more sense than obsessing over reliability in a world in which we as consumers don't have much insight into actual failure rates.

[+] whitepoplar|2 years ago|reply

I'd read through this thread before purchasing an SK Hynix SSD: https://twitter.com/xenadu02/status/1495693475584557056?s=20

[+] unknown|2 years ago|reply

[deleted]

[+] winrid|2 years ago|reply

I have bad luck with buying products that have Extreme or Fatality in the name...

[+] flyinghamster|2 years ago|reply

Indeed, anything that has a "gamer" vibe to it has become suspect for me. I'm not interested in eking out that last smidgen of speed if it comes at the cost of reliability. With Asus, I'd go for the corporate stable motherboards rather than the gamer models.

[+] PenguinRevolver|2 years ago|reply

Can this happen to 1TB models? Currently using one in the back of my computer 24/7.

[+] varenc|2 years ago|reply

If anyone wants a recommendation for an alternative, I've had a good experience with the Crucial X8 4TB external SSD. Not quite as fast as the SanDisk Extreme Pro but pretty decent. Wouldn't trust it 100% though. (Also after writing ~300GB continuously the write speed falls from ~900MB/s to 90MB/s).

93 comments