top | item 27475240

notaplumber | 4 years ago

> Because flash does not overwrite anything, ever.

This is repeated multiple times in the article, and I refuse to believe it is true. If NVMe/SSDs never overwrote anything, they would quickly run out of available blocks, especially on OSs that don't support TRIM.

cduzz|4 years ago

There's nuance to this; the deletes / overwrites are accomplished by bulk wiping entire blocks.

Rather than change the paint color in a hallway you have to tear down the house and build a new house in the vacant lot next door that's a duplicate of the original, but with the new hallway paint.

To optimize, you keep a bucket of houses to destroy, and a bucket of vacant lots, and whenever a neighborhood has lots of "to be flattened houses" the remaining active houses are copied to a vacant lot and the whole neighborhood is flattened.

So, things get deleted, but not in the way people are used to if they imagine a piece of paper and a pencil and eraser.
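
The "copy the remaining houses, then flatten the neighborhood" step can be sketched roughly like this. It is a toy model only: the `FREE` and `STALE` markers and the `garbage_collect` helper are made-up names for illustration, not anything taken from real firmware.

```python
FREE = None          # hypothetical marker: an erased, writable page
STALE = "<stale>"    # hypothetical marker: a page holding outdated data

def garbage_collect(victim, spare):
    """Copy live pages from `victim` into `spare`, then wipe `victim`."""
    live = [p for p in victim if p is not FREE and p != STALE]
    free_slots = [i for i, p in enumerate(spare) if p is FREE]
    if len(live) > len(free_slots):
        raise ValueError("spare block has too few free pages")
    for slot, page in zip(free_slots, live):
        spare[slot] = page
    # The bulk wipe: every page in the block is erased at once.
    victim[:] = [FREE] * len(victim)
    return len(live)

neighborhood = ["A", STALE, STALE, "D"]   # two houses still in use
vacant_lot = [FREE, FREE, FREE, FREE]
copied = garbage_collect(neighborhood, vacant_lot)
print(copied, neighborhood, vacant_lot)
```

Only the two live pages are copied; the stale ones disappear when the whole block is wiped.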

slaymaker1907|4 years ago

Just to add to the explanation, SSDs are able to do this because they have a layer of indirection akin to virtual memory. This means that what your OS thinks is byte 800000 of the SSD may change its actual physical location on the SSD over time, even in the absence of writes or reads to said location.

This is a very important property of SSDs and is a large reason why log-structured storage is so popular in recent times. The SSD is very fast at appends, but changing data is much slower.
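
A minimal sketch of that indirection, assuming a made-up `TinyFTL` class with an append-only write pointer (real drives are far more involved): the logical address stays the same while its physical home changes on every write.

```python
class TinyFTL:
    """Toy logical-to-physical map; every write lands on a fresh page."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages   # physical pages
        self.l2p = {}                     # logical page -> physical page
        self.next_free = 0                # append-only write pointer

    def write(self, logical, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages; a real drive would garbage-collect")
        self.flash[self.next_free] = data
        self.l2p[logical] = self.next_free   # the old physical page is now stale
        self.next_free += 1

    def read(self, logical):
        return self.flash[self.l2p[logical]]

ftl = TinyFTL(num_pages=8)
ftl.write(5, "v1")
first_home = ftl.l2p[5]
ftl.write(5, "v2")            # same logical page, new physical page
second_home = ftl.l2p[5]
print(first_home, second_home, ftl.read(5))
```

The "overwrite" of logical page 5 is really an append plus a map update, which is why append-heavy workloads suit this design.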

zdw|4 years ago

Inspired by that last sentence, the analogy could be rewritten as:

  - lines on a page
  - pages of paper
  - whole notebooks

and might be easier for people to grok than the earlier houses/paint analogy.

wand3r|4 years ago

I think the explanation is probably sound (I am not that familiar with this), but the analogy gets a bit lost when you talk about buckets of houses and buckets of vacant lots.

Maybe there is a better analogy or paradigm to view this through.

daniellarusso|4 years ago

Spoiler alert - This is the plot to ‘The Prestige’.

eqvinox|4 years ago

It's true and untrue depending on how you look at it. Flash memory only supports changing/"writing" bits in one direction, generally from 1 to 0. Erase, as a separate operation, clears entire sectors back to 1, but is more costly than a write. (Erase block size depends on the technology, but we're talking MB on modern flash AFAIK; stuff from 2010 already had 128kB.)

So, the drives do indeed never "overwrite" data - they mark the block as unused (either when the OS uses TRIM, or when it writes new data [for which it picks an empty block elsewhere]), and put it in a queue to be erased whenever there's time (and energy and heat budget) to do so.
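
As a rough illustration of the one-directional write, here is a sketch that models programming as a bitwise AND and erase as resetting every byte to all ones. The names `program`, `erase_block`, and `ERASED` are invented for this example.

```python
ERASED = 0xFF  # an erased byte has every bit set to 1

def program(cell, value):
    """Programming can only clear bits, so the result is old AND new."""
    return cell & value

def erase_block(block):
    """Erase works on the whole block and sets every bit back to 1."""
    for i in range(len(block)):
        block[i] = ERASED

block = bytearray([ERASED] * 4)
block[0] = program(block[0], 0b10100101)   # erased cell takes the value as-is
block[0] = program(block[0], 0b11110000)   # "overwrite" attempt: no bit can go 0 -> 1
print(bin(block[0]))                       # 0b10100000, not 0b11110000

erase_block(block)                         # only this brings the 1 bits back
```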

Understanding this is also quite important because it can have performance implications, particularly on consumer/low-end devices. Those don't have a whole lot of spare space to work with, so if the entire device is "in use", write performance can take a serious hit when it becomes limited by erase speed.

[Add.: reference for block sizes: https://www.micron.com/support/~/media/74C3F8B1250D4935898DB... - note the PDF creation date on that is 2002(!) and it compares 16kB against 128kB size.]

matheusmoreira|4 years ago

> Understanding this is also quite important because it can have performance implications

Security implications too. The storage device cannot be trusted to securely delete data.

IshKebab|4 years ago

By any reasonable definition they do overwrite data. It's just that they can't overwrite less than a block of data.

tzs|4 years ago

If a logical overwrite only involved bits going from 1 to 0, are any drives smart enough to recognize this and do it as an actual overwrite instead of a copy and erase?
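
For what it's worth, the condition itself is easy to state in code. This sketch only checks whether new data could be reached from old data by clearing bits; `only_clears_bits` is a made-up helper, and it says nothing about whether any drive firmware actually takes this shortcut.

```python
def only_clears_bits(old: bytes, new: bytes) -> bool:
    """True if `new` never needs a 1 where `old` already holds a 0."""
    if len(old) != len(new):
        raise ValueError("buffers must have the same length")
    return all((o & n) == n for o, n in zip(old, new))

print(only_clears_bits(b"\xff\x0f", b"\xf0\x0b"))  # only 1 -> 0 transitions
print(only_clears_bits(b"\x0f", b"\x10"))          # needs a 0 -> 1 transition
```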

isotopp|4 years ago

Flash has a flash translation layer (FTL). It translates logical block addresses (LBA) into physical addresses ("PHY").

Flash can write blocks at a granularity similar to a memory page (cells, around 4-16 KB). It can erase only sets of blocks, at a much larger granularity (around 512-ish cell sized blocks).
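
To put rough numbers on that, here is a small arithmetic sketch using the approximate figures above (512 pages per erase block, 4 to 16 KB pages). The constant and helper names are invented, and real parts vary.

```python
PAGES_PER_ERASE_BLOCK = 512  # the "512-ish" figure from above, taken as exact here

def erase_block_bytes(page_kib):
    """Size of one erase block in bytes for a given page size in KiB."""
    return PAGES_PER_ERASE_BLOCK * page_kib * 1024

def locate(page_index):
    """Map a physical page index to (erase block number, page within block)."""
    return divmod(page_index, PAGES_PER_ERASE_BLOCK)

small = erase_block_bytes(4)    # 4 KiB pages  -> 2 MiB erase blocks
large = erase_block_bytes(16)   # 16 KiB pages -> 8 MiB erase blocks
print(small // 2**20, large // 2**20, locate(1000))
```

So changing a single 4 KiB page ultimately involves an erase unit hundreds of times larger.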

The FTL will try to find free pages to write your data to. In the background, it will also try to move data around to generate unused erase blocks and then erase them.

In flash, seeks are essentially free. That means that it no longer matters if blocks are adjacent. Also, because of the FTL, adjacent LBAs are not necessarily adjacent on the physical layer. And even if you do not rewrite a block, it may be that the garbage collection moves data around at the PHY layer in order to generate completely empty erase blocks.

The net effect is that positioning as seen from the OS no longer matters at all, and that the OS layer has zero control over adjacency and erase at the PHY layer. Rewriting, defragging, or other OS level operations cannot control what happens physically at the flash layer.

TRIM is a "blatant layering violation" in the Linus sense: It tells the disk "hardware" what the OS thinks it no longer needs. TRIM'ed blocks can be given up and will not be kept when the garbage collector tries to free up an erase block.
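
A small sketch of why that helps the garbage collector, assuming a hypothetical `pages_to_copy` helper and a plain set of LBAs the drive still considers valid: after a TRIM, fewer pages need to be relocated before the erase block can be wiped.

```python
def pages_to_copy(erase_block, valid):
    """Return the LBAs in this erase block that GC must relocate before erasing."""
    return [lba for lba in erase_block if lba in valid]

valid = {10, 11, 12, 13}         # LBAs the drive believes hold live data
erase_block = [10, 11, 12, 13]   # LBAs currently stored in one erase block

before = pages_to_copy(erase_block, valid)
valid -= {11, 13}                # the OS TRIMs LBAs 11 and 13 (e.g. a deleted file)
after = pages_to_copy(erase_block, valid)
print(len(before), len(after))
```

Without the TRIM, the drive would keep copying the data at LBAs 11 and 13 around until the OS happened to overwrite those addresses.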

anarazel|4 years ago

> In flash, seeks are essentially free. That means that it no longer matters if blocks are adjacent.

> The net effect is that positioning as seen from the OS no longer matters at all, and that the OS layer has zero control over adjacency and erase at the PHY layer. Rewriting, defragging, or other OS level operations cannot control what happens physically at the flash layer.

I don't agree with this. The "OS visible position" is relevant, because it influences what can realistically be written together (multiple larger IOs targeting consecutive LBAs in close time proximity). And writing data in larger chunks is very important for good performance, particularly in sustained write workloads. And sequential IO (in contrast to small random IOs) does influence how the FTL will lay out the data to some degree.

notaplumber|4 years ago

Thanks for this part, I feel like this was a crucial piece of information I was missing. It also explains my observations about TRIM not being as important as people claim it is: the firmware on modern flash storage seems more than capable of handling this without OS intervention.

throwaway09223|4 years ago

The author clearly explains how this works in the sentence immediately following. "Instead it has internally a thing called flash translation layer (FTL)" ...

notaplumber|4 years ago

I unfortunately skimmed over this; isotopp's explanation helped clear things up in my head.

aidenn0|4 years ago

Perhaps they mean it must erase an entire block before writing any data, unlike a disk that can write a single sector at a time?

dragontamer|4 years ago

The issue is that DDR4 is like that too. Not only the 64 byte cache line, but DDR4 requires a transfer to the sense amplifiers (aka a RAS, row access strobe) before you can read or write.

The RAS command eradicates the entire row, like 1024 bytes or so. This is because the DDR4 cells only have enough charge for one reliable read; after that, the capacitors don't have enough electrons to know if a 0 or 1 was stored.

A row close command returns the data from the sense amps back to the capacitors. Refresh commands renew the 0 or 1 as the capacitor can only hold the data for a few milliseconds.
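
A toy model of the open/close cycle as described here (not of real DDR4 timing or electrical behavior; `ToyDramBank` and its methods are invented for illustration): opening a row moves its contents into the sense amps and leaves the cells empty, and closing the row writes them back.

```python
class ToyDramBank:
    """Simplified bank: one row at a time lives in the sense amplifiers."""

    def __init__(self, rows, row_bytes):
        self.cells = [bytearray(row_bytes) for _ in range(rows)]
        self.open_row = None
        self.sense_amps = None

    def activate(self, row):
        """Row open (RAS): read the row into the sense amps, draining the cells."""
        if self.open_row is not None:
            self.precharge()
        self.sense_amps = bytearray(self.cells[row])
        self.cells[row] = None          # the stored charge is used up by the read
        self.open_row = row

    def access(self, col, value=None):
        """Column access (CAS): works on the sense amps, not on the cells."""
        if self.open_row is None:
            raise RuntimeError("no open row")
        if value is not None:
            self.sense_amps[col] = value
        return self.sense_amps[col]

    def precharge(self):
        """Row close: write the sense amp contents back into the cells."""
        self.cells[self.open_row] = self.sense_amps
        self.open_row = None
        self.sense_amps = None

bank = ToyDramBank(rows=2, row_bytes=4)
bank.activate(0)
bank.access(1, 7)                   # write lands in the sense amps
drained = bank.cells[0] is None     # cells hold nothing while the row is open
bank.activate(1)                    # implicitly closes row 0 first
print(drained, bank.cells[0][1])
```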

------

The CAS latency statistic assumes that the row was already open. It's a measure of the sense amplifiers and not of the actual data.