
What if mass storage were free? (1980)

51 points | dwenzek | 3 years ago | dl.acm.org

48 comments


dwenzek|3 years ago

I just found this quite old paper and it came as a surprise to me to discover that the idea of append-only storage is not 20 years old but more than 40!

The oldest work I was aware of was "The design and implementation of a log-structured file system" (1).

So it was a pleasure to learn that these ideas were already around in the '80s:

- Deletion considered harmful

- A non-deletion strategy using timestamps

- The importance of accessing past data

- A non-deletion strategy can improve both integrity and reliability
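The "non-deletion strategy using timestamps" can be sketched in a few lines. This is a hypothetical `VersionedStore`, not the paper's design: every update appends a `(timestamp, value)` pair, a deletion is just an appended tombstone, and reads can ask for the value as of any past time.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Hypothetical append-only key-value store: nothing is ever overwritten."""

    versions: dict = field(default_factory=dict)  # key -> [(timestamp, value), ...]

    def put(self, key, value, ts):
        history = self.versions.setdefault(key, [])
        # Assumption: timestamps per key come from a monotonic clock.
        if history and ts < history[-1][0]:
            raise ValueError("timestamps must be non-decreasing per key")
        history.append((ts, value))

    def delete(self, key, ts):
        # "Deleting" appends a tombstone (None); older versions stay readable.
        self.put(key, None, ts)

    def get(self, key, as_of=float("inf")):
        history = self.versions.get(key, [])
        # Index just past the last version with timestamp <= as_of.
        i = bisect_right([t for t, _ in history], as_of)
        return history[i - 1][1] if i else None


store = VersionedStore()
store.put("balance", 100, ts=1)
store.put("balance", 80, ts=5)
store.delete("balance", ts=9)
```

After these calls, `store.get("balance")` is `None`, but `store.get("balance", as_of=6)` still returns `80`: access to past data comes for free because the history list only grows.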

(1) https://dl.acm.org/doi/10.1145/146941.146943

codemac|3 years ago

Sadly, 1992 was 31 years ago. The authors pushed for log-structured filesystems in an earlier 1988 paper, Beating the I/O Bottleneck: https://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/5760.html . It was an inspiration for many storage appliances, NetApp probably being a very strong example.

Many people were thinking about these ideas in the 1988-92 timeframe. Tape storage systems are, roughly speaking, append-only, so much of what a log-structured filesystem adds revolves around the better random-read performance of disk drives.
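A rough illustration of that split (sequential appends for writes, random reads via an index) is a tiny log-structured key-value store. The `LogStore` class and its record layout are assumptions for this sketch, closer to a Bitcask-style store than to the full LFS design with segments and cleaning; an in-memory `BytesIO` stands in for the disk.

```python
import io
import struct
from typing import Optional


class LogStore:
    """Hypothetical log-structured store: all writes go to the tail of one log,
    and an in-memory index points at the newest record for each key."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for a file opened in append mode
        self.index = {}          # key -> byte offset of its newest record

    def put(self, key: bytes, value: bytes) -> None:
        offset = self.log.seek(0, io.SEEK_END)  # always write at the tail: sequential I/O
        # Assumed record layout: 4-byte key length, 4-byte value length, key, value.
        self.log.write(struct.pack(">II", len(key), len(value)) + key + value)
        self.index[key] = offset  # old records stay in the log; only the index moves

    def get(self, key: bytes) -> Optional[bytes]:
        offset = self.index.get(key)
        if offset is None:
            return None
        self.log.seek(offset)  # one random read, which disks handle far better than tape
        klen, vlen = struct.unpack(">II", self.log.read(8))
        self.log.seek(klen, io.SEEK_CUR)  # skip over the stored key
        return self.log.read(vlen)


s = LogStore()
s.put(b"a", b"1")
s.put(b"b", b"22")
s.put(b"a", b"333")  # supersedes the first record without overwriting it
```

Overwriting `a` leaves the old bytes in place, which is why real log-structured systems eventually need a cleaner or compactor to reclaim space.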

oakwhiz|3 years ago

A non-deletion strategy should consider including an encryption and key management strategy to enable retroactive secure deletion without impacting availability, reliability, and performance. This seems to be missing from a lot of systems that deal with personal information.
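One common form of this is "crypto-shredding": encrypt each person's records under their own key and delete only the key. The sketch below is hypothetical and uses only the standard library, so its cipher is a toy (HMAC-SHA256 in counter mode XORed with the data, no integrity check); a real system would use a vetted AEAD such as AES-GCM, and the key store itself must support genuine deletion, backups included.

```python
import hashlib
import hmac
import secrets


class ShreddableLog:
    """Hypothetical append-only log with per-subject keys that can be shredded."""

    def __init__(self):
        self.log = []      # append-only: (subject, key_id, nonce, ciphertext)
        self.keys = {}     # key_id -> key; small, mutable, the only thing ever deleted
        self.current = {}  # subject -> key_id used for that subject's new records

    @staticmethod
    def _xor_keystream(key, nonce, data):
        # TOY cipher for illustration only: XOR with HMAC-SHA256(key, nonce || counter).
        stream = bytearray()
        counter = 0
        while len(stream) < len(data):
            stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
            counter += 1
        return bytes(a ^ b for a, b in zip(data, stream))

    def append(self, subject, plaintext: bytes):
        if subject not in self.current:
            key_id = secrets.token_hex(8)
            self.keys[key_id] = secrets.token_bytes(32)  # one random 256-bit key per subject
            self.current[subject] = key_id
        key_id = self.current[subject]
        nonce = secrets.token_bytes(16)  # fresh nonce per record so keystreams never repeat
        ciphertext = self._xor_keystream(self.keys[key_id], nonce, plaintext)
        self.log.append((subject, key_id, nonce, ciphertext))

    def read(self, subject):
        # Records whose key was shredded still exist but are skipped as unreadable.
        return [self._xor_keystream(self.keys[kid], nonce, ct)
                for s, kid, nonce, ct in self.log if s == subject and kid in self.keys]

    def forget(self, subject):
        # Retroactive secure deletion: drop the key, leave the log untouched.
        key_id = self.current.pop(subject, None)
        self.keys.pop(key_id, None)


sl = ShreddableLog()
sl.append("alice", b"hello")
sl.append("bob", b"hi")
sl.append("alice", b"world" * 20)
```

Calling `sl.forget("alice")` makes both of Alice's records unreadable while the log keeps all three entries, so availability and append-only integrity of everyone else's data are unaffected.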

082349872349872|3 years ago

Paper-based accounting was append-only, so I think the idea has always been there, but it was uneconomic on machine-readable media for a long time.

(In particular, "new master = old master + updates" card/tape jobs were append-only in principle but, due to the finite number of tapes, overwrote data in practice.)
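The "new master = old master + updates" job can be sketched as a sequential merge of two key-sorted inputs, with a bounded shelf of tapes showing where the overwriting creeps in. This is a hypothetical illustration assuming unique keys in the master file, transactions sorted by key, and just two operations (`"upsert"` and `"delete"`); the three-generation shelf follows the classic grandfather-father-son rotation.

```python
from collections import deque


def update_master(old_master, transactions):
    """Sequential master-file update. Both inputs are sorted by key, as on tape or
    cards. The old master is only read; a complete new master is written out."""
    new_master, i = [], 0
    for key, op, value in transactions:
        # Copy forward every master record whose key sorts before this transaction.
        while i < len(old_master) and old_master[i][0] < key:
            new_master.append(old_master[i])
            i += 1
        # Bring in the matching master record, if there is one.
        if i < len(old_master) and old_master[i][0] == key:
            new_master.append(old_master[i])
            i += 1
        # Drop the current version of this key; it survives only on the older tape.
        if new_master and new_master[-1][0] == key:
            new_master.pop()
        if op == "upsert":
            new_master.append((key, value))
        # op == "delete": nothing is written, so the record is absent from the new master.
    new_master.extend(old_master[i:])  # copy the remaining untouched records
    return new_master


# Finite tapes: keep three generations (grandfather, father, son). Appending a fourth
# drops the oldest, which in a real shop meant that tape was overwritten.
tapes = deque([[(1, "a"), (3, "c"), (5, "e")]], maxlen=3)
for batch in ([(2, "upsert", "b")], [(3, "delete", None)], [(5, "upsert", "E")]):
    tapes.append(update_master(tapes[-1], batch))
```

Each run is append-only in the sense that no existing tape is modified; the `maxlen=3` shelf is what quietly turns it back into overwriting.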

refset|3 years ago

The topic of dealing with history in databases seems to go most of the way back to the beginning of the field. I'm still hoping a copy of "Bubenko (1977) The Temporal Dimension in Information Modelling" turns up on the web eventually as I'd love to read it.

The 1980 paper you linked is touched on briefly at the beginning of this Strange Loop talk on "Light and Adaptive Indexing for Immutable Databases (2022)": https://www.youtube.com/watch?v=Px-7TlceM5A

srhtftw|3 years ago

The no-overwrite storage architecture of Postgres from 1985 also took advantage of optical write-once read-many (WORM) drives developed in the late 1970s.

kragen|3 years ago

the idea of append-only storage is surely older than Pacioli

mouse_|3 years ago

I'm fairly certain data and records have been sewn into tapestries for thousands of years.

MisterTea|3 years ago

Plan 9 implemented this concept in the cached WORM file server (cwfs), one of the on-disk file systems used in Plan 9. The idea was that you have a disk-based cache and a WORM (write once, read many) dump consisting of optical jukeboxes. Writes to the file system are stored in the cache until it is dumped to the WORM, either manually or on a schedule (hard-coded to run at 2 a.m. every night). http://man.9front.org/4/cwfs

The idea was to reduce the cost of storage by moving long-term data off costly hard disks and onto cheap magneto-optical disks which, like CDs, could be stored in an automated jukebox. Write all the data you want to the cache, then commit it to the WORM. As the WORM fills, you just buy another disk and put it in the jukebox. The history(1) command then gives you a file's history as a set of paths you can bind over another path, so you can use an old version of a file instead of copying it. It's really a file system for programmers. http://doc.cat-v.org/plan_9/4th_edition/papers/fs/
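A toy model of the cache-plus-dump arrangement might look like the following. `CachedWorm`, its methods, and the YYYYMMDD date strings are hypothetical; only the `/n/dump/YYYY/MMDD/...` path shape is loosely modeled on Plan 9's dump tree, and the real cwfs works on disk blocks rather than whole files.

```python
class CachedWorm:
    """Toy model of a cache + WORM file server: writes land in a mutable disk cache,
    and dump() freezes the whole cache as a new snapshot on the write-once store."""

    def __init__(self):
        self.cache = {}  # path -> contents; the fast, costly, mutable tier
        self.worm = []   # (date, snapshot) pairs; only ever appended to

    def write(self, path, data):
        self.cache[path] = data

    def dump(self, date):
        # cwfs does this on a schedule; here the caller passes a YYYYMMDD date string.
        self.worm.append((date, dict(self.cache)))  # copy, so later writes can't alter it

    def read_dump(self, date, path):
        for d, snapshot in self.worm:
            if d == date:
                return snapshot[path]
        raise KeyError(date)

    def history(self, path):
        """Dump paths where `path` changed, loosely modeled on /n/dump/YYYY/MMDD/..."""
        changes, last = [], None
        for date, snapshot in self.worm:
            if path in snapshot and snapshot[path] != last:
                changes.append(f"/n/dump/{date[:4]}/{date[4:]}{path}")
                last = snapshot[path]
        return changes


fs = CachedWorm()
fs.write("/usr/glenda/notes", "v1")
fs.dump("20230214")
fs.dump("20230215")  # nothing changed, so history() won't list this day
fs.write("/usr/glenda/notes", "v2")
fs.dump("20230216")
```

Here `fs.history("/usr/glenda/notes")` lists only the 0214 and 0216 dumps, and the cache can keep changing without ever touching what has already been dumped.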

This idea was expanded on with Venti/Fossil, which allow you to build file systems from arbitrary Venti data sets. http://doc.cat-v.org/plan_9/4th_edition/papers/venti/
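The core of Venti is a write-once store whose blocks are addressed by the SHA-1 hash ("score") of their contents, so identical blocks are stored only once. The sketch below is a hypothetical simplification: `BlockStore` and `store_file` are illustrative names, the block size is arbitrary, and a flat list of scores stands in for Venti's hash trees of pointer blocks.

```python
import hashlib


class BlockStore:
    """Hypothetical Venti-style store: blocks are addressed by the SHA-1 of their
    contents, written once, and never overwritten or deleted."""

    def __init__(self):
        self.blocks = {}  # score (hex SHA-1) -> block bytes

    def write(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # Identical blocks share one score, so duplicates are stored only once.
        self.blocks.setdefault(score, data)
        return score

    def read(self, score: str) -> bytes:
        return self.blocks[score]


def store_file(store, data, block_size=8192):
    """Split data into fixed-size blocks and return their scores in order."""
    return [store.write(data[i:i + block_size]) for i in range(0, len(data), block_size)]


vs = BlockStore()
scores = store_file(vs, b"x" * 20000)
```

The 20,000-byte file yields three scores but only two stored blocks, since its first two 8 KiB blocks are identical; a later file sharing those blocks would add nothing new for them.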

Hooray_Darakian|3 years ago

> Optical discs promise to come one to two orders of magnitude closer to the limiting case of free mass storage than ever before. Other features of optical discs include improved reliability and a single technology for both on-line and archival storage with a long shelf life. Because of these features and because of (not in spite of) their non-deletion limitation, it is argued that optical discs fit the requirements of database systems better than magnetic discs and tapes.

Wild view from where we sit today, but CDs held roughly 650-700MB when they appeared in 1982, while Seagate launched a 5MB hard drive in 1980. So it was not entirely absurd to think that `just don't delete things` could be the way of the future. We sort of adopted `just don't delete things` anyway, though not for RDBMSs.

Thanks for sharing!

PaulDavisThe1st|3 years ago

1988: Schlumberger Cambridge Research takes possession of a new 1MB drive to be added to its VAXcluster. The drive is the size of ... a small refrigerator. It was quite a day!

ocal5|3 years ago

Isn't that the way of _Glacier_?

tpmx|3 years ago

We now (2023) live in a time where storing years of text and even audio is essentially free. Storing years of video is still actually costly.

By the way, you need about 12 TB for one year of video streamed at 3 Mbit/s, so it's certainly doable, but not cheap.
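The 12 TB figure checks out; a quick calculation, assuming a constant bitrate, a 365-day year, and decimal terabytes (10^12 bytes):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, ignoring leap years


def video_storage_tb(bitrate_mbit_s: float, years: float) -> float:
    """Storage for a continuous constant-bitrate stream, in decimal TB."""
    total_bits = bitrate_mbit_s * 1e6 * SECONDS_PER_YEAR * years
    return total_bits / 8 / 1e12  # bits -> bytes -> terabytes


one_year = video_storage_tb(3, 1)     # about 11.8 TB
thirty_years = video_storage_tb(3, 30)  # about 354.8 TB
```

Since the cost is linear in duration, a multi-decade stream at the same bitrate lands in the hundreds of terabytes.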

tpmx|3 years ago

I'm still thinking of recording a 30 year video stream of my backyard just for the sake of it. :)

pclmulqdq|3 years ago

It's interesting that we have almost started to live in this world. I have a half-written blog post on this phenomenon but I guess I'm 45 years too late.

Interestingly, Google and Facebook seem to have basically done it right with their exascale filesystems. The same goes for object stores.

usrusr|3 years ago

Reminds me of the various "what if all memory were non-volatile" discussions that made the rounds when Intel Optane entered the stage. It's a bit like the inverse of this, but the caveats might turn out similar. In one case you'd still want a well-defined resettable area; in the other you'd still want to avoid dealing with arbitrarily long addresses, which would at some point become as bad as seek times, even if seek times in the stricter sense hypothetically did not exist.

bob1029|3 years ago

If mass storage were free, then everything would be append-only by default. There would be no excuse to not do this.

A major benefit of append-only is that your writes are always sequential, which is ideal for almost any storage medium, especially magnetic disk or tape. Combine append-only with batching of transactions (e.g., across 1-10 milliseconds at a time), and you can write multiple txns per disk I/O operation (assuming txn size < storage block size).
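The batching described here is usually called group commit. Below is a hypothetical single-threaded sketch: `GroupCommitLog`, the 5 ms window, and the 4096-byte block size are assumptions, a `bytearray` stands in for the log file, and `io_ops` counts one write per flushed batch.

```python
import time

BLOCK_SIZE = 4096  # assumed storage block size in bytes


class GroupCommitLog:
    """Append-only log with group commit: transactions arriving within a short window
    are written together as one sequential append instead of one I/O each."""

    def __init__(self, window_s=0.005):
        self.window_s = window_s  # batching window, e.g. 5 ms
        self.pending = []         # transactions waiting for the next flush
        self.pending_bytes = 0
        self.batch_started = None
        self.log = bytearray()    # stands in for the on-disk log file
        self.io_ops = 0           # device writes issued so far

    def commit(self, txn: bytes, now=None):
        now = time.monotonic() if now is None else now
        # Flush first if this txn would overflow the current block.
        if self.pending and self.pending_bytes + len(txn) > BLOCK_SIZE:
            self.flush()
        if self.batch_started is None:
            self.batch_started = now
        self.pending.append(txn)
        self.pending_bytes += len(txn)
        # Flush once the window has elapsed. A real system would also flush from a timer
        # and acknowledge each commit only after its batch is durable (e.g. after fsync).
        if now - self.batch_started >= self.window_s:
            self.flush()

    def flush(self):
        if self.pending:
            self.log += b"".join(self.pending)  # one sequential append for the batch
            self.io_ops += 1
            self.pending, self.pending_bytes, self.batch_started = [], 0, None


g = GroupCommitLog(window_s=0.005)
for t in (0.0, 0.001, 0.002, 0.003, 0.01):
    g.commit(b"x" * 100, now=t)
```

All five 100-byte transactions end up in a single write, since the window only closes at the last one.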

DrSAR|3 years ago

Isn't information retrieval potentially more costly if you have to search through a larger sea of useless blocks? That's certainly the case in my garage and house, where I can now store far more than I could at earlier stages of my life.

kragen|3 years ago

what if you accidentally wrote your private key, photos of your nude boyfriend, or evidence of a crime to your mass storage?

fmajid|3 years ago

Leo Szilard framed Maxwell's Demon in terms of information, and the resolution later worked out by Landauer and Bennett is that the act of deleting data the demon must perform is the thermodynamically limiting factor. Deleting selective data efficiently is in fact one of the greatest challenges in large production databases, and in an era of increasing privacy restrictions like GDPR's right to erasure, an increasing challenge for database operators.
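The thermodynamic cost of erasure mentioned here is usually quantified as Landauer's bound: erasing one bit dissipates at least k_B·T·ln 2 of energy. A short calculation (the 300 K temperature is an assumed room-temperature value) shows it is a theoretical floor far below what real hardware spends, so the practical difficulty of deletion in databases comes from elsewhere.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact in the 2019 SI


def landauer_limit_joules(bits: float, temperature_k: float = 300.0) -> float:
    """Minimum energy to erase `bits` bits at the given temperature: bits * k_B * T * ln 2."""
    return bits * BOLTZMANN * temperature_k * math.log(2)


per_bit = landauer_limit_joules(1)          # about 2.87e-21 J
one_terabyte = landauer_limit_joules(8e12)  # about 2.3e-8 J for 10**12 bytes
```
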