
How much is one terabyte of data?

46 points | Judyrabbit | 2 years ago | github.com

46 comments

[+] SushiHippie|2 years ago|reply
> The high-speed SSD reads 300 megabytes of data per second

Aren't normal SSDs now at 500-600 MB/s read?

And then there are NVMe SSDs, which can read at up to 7 GB/s.
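For scale, a rough sketch of how long a full 1 TB sequential read would take at each of these speeds (the 550 MB/s and 7 GB/s figures are the ballpark numbers from this thread, not measurements):

```python
# Back-of-envelope: time to read a full terabyte at the speeds mentioned above.
TB = 1_000_000_000_000  # 1 TB in bytes (decimal)

speeds = {
    "article's SSD (300 MB/s)": 300_000_000,
    "typical SATA SSD (550 MB/s)": 550_000_000,
    "fast NVMe SSD (7 GB/s)": 7_000_000_000,
}

for name, bytes_per_sec in speeds.items():
    minutes = TB / bytes_per_sec / 60
    print(f"{name}: {minutes:.1f} minutes")  # ~55.6, ~30.3, ~2.4 minutes
```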

[+] danking00|2 years ago|reply
It'd be interesting to see a peak-sequential-bandwidth by cost-per-gigabyte plot. The number I keep in my head is 500 MiB/s, but you're right that there are much faster drives out there [1]. Of the public clouds: Google's "Local SSD" claims ~12,000 MiB/s but they're ephemeral and you need 12 TiB of disks to hit that bandwidth [2][4]. AWS has these io2 SSDs which claim 4,000 MiB/s [3].

On the other points of the article, even if you had a huge disk array plugged into the machine, how many cores can you also plug into that computer? I suppose there will always be a (healthy, productive) race here between the vertical scaling of GPUs + NVMe SSDs and the horizontal scaling of CPUs and blob storage.

EDIT: formatting.

[1] First Google result is Tom's hardware: https://www.tomshardware.com/features/ssd-benchmarks-hierarc...

[2] https://cloud.google.com/compute/docs/disks/local-ssd#nvme_l...

[3] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/provisio...

[4] The ephemerality has two downsides. First, you have to get the data onto that local SSD from some other, probably slower, storage system (I haven't benchmarked GCS lately, but that's probably your best bet for quickly downloading a bunch of data?). Second, you need to use non-spot instances which are 3-6x the price.

[+] thworp|2 years ago|reply
This tripped me up as well. While modern NAND SSDs do slow down to about a third of the advertised speeds on non-sequential reads/writes, that is still 2-3 GB/s. The "normal" (SATA) SSDs are just limited by SATA's link speed, they could easily get to similar rates.
[+] thrwwycbr|2 years ago|reply
"Up to xyz MB/s"

This very detail is why the performance numbers printed on USB flash drives are useless; real-world use oftentimes sees 1/100th of the advertised figures.

Any transfer bigger than the sector size (4K) usually runs far below those numbers.

[+] causi|2 years ago|reply
I've never seen an SSD as slow as 300MB/s. My first SSD from a decade ago was over 500MB/s sequential read.
[+] timenova|2 years ago|reply
I'm pretty sure they mean HDDs, since the article says "hard disk" in each paragraph below.
[+] HPsquared|2 years ago|reply
1 million is 100^3 (i.e. a cube 1 metre across, contains a million 1 cm cubes)

1 billion is 1000^3 (1 m cube contains a billion 1 mm cubes)

1 trillion is 10,000^3 (I can't visualize 0.1 mm, so increase the scale: the big cube is now 10 m on a side, made of 1 mm parts)

So the million and billion you could have on your desk made of parts that you can see and handle; the trillion is just a bit too big for that, you'd need a big room with a high ceiling.

Edit: and, of course, you can also see a million in 2D: a sheet of grid paper 1 m across with 1 mm grid squares. Or, more practically, try counting the pixels on a 720p monitor; there are 921,600 pixels on that screen. Use a checkerboard pattern.
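The arithmetic above can be sanity-checked in a few lines:

```python
# Sanity-check the cube visualizations: an N-per-side cube holds N**3 unit cells.
assert 100**3 == 1_000_000          # 1 m cube of 1 cm cells -> a million
assert 1_000**3 == 1_000_000_000    # 1 m cube of 1 mm cells -> a billion
assert 10_000**3 == 10**12          # 10 m cube of 1 mm cells -> a trillion

# And the 2D example: a 720p screen is just shy of a million pixels.
assert 1280 * 720 == 921_600
print("all checks pass")
```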

[+] hnuser123456|2 years ago|reply
0.1mm is about the thickness of a sheet of paper.
[+] favourable|2 years ago|reply
https://en.m.wikipedia.org/wiki/Zettabyte_Era

> The Zettabyte Era or Zettabyte Zone[1] is a period of human and computer science history that started in the mid-2010s. The precise starting date depends on whether it is defined as when the global IP traffic first exceeded one zettabyte, which happened in 2016, or when the amount of digital data in the world first exceeded a zettabyte, which happened in 2012. A zettabyte is a multiple of the unit byte that measures digital storage, and it is equivalent to 1,000,000,000,000,000,000,000 (10²¹) bytes.

> According to Cisco Systems, an American multinational technology conglomerate, the global IP traffic achieved an estimated 1.2 zettabytes (an average of 96 exabytes (EB) per month) in 2016. Global IP traffic refers to all digital data that passes over an IP network which includes, but is not limited to, the public Internet. The largest contributing factor to the growth of IP traffic comes from video traffic (including online streaming services like Netflix and YouTube).

[+] ykonstant|2 years ago|reply
A couple of AAA games ٩(ఠ益ఠ)۶
[+] causi|2 years ago|reply
Really wish games would give you a "cleanup" option in the settings. If I'm only going to use the Ultra textures it could delete all the others, and likewise if someone is only going to use Medium, etc.
[+] xhrpost|2 years ago|reply
I think it should also be pointed out that a lot of DBs have a lot of redundant data due to indexing. When your DB is small, an index seems negligible, but once you get into millions or even billions of rows, you see firsthand just how much space indexes can take up.
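A hedged back-of-envelope for how that adds up; the per-entry sizes below are illustrative assumptions, not measurements from any particular database:

```python
# Illustrative (assumed) B-tree sizing: each index entry stores a key plus
# pointer/bookkeeping overhead; real numbers vary by engine and key type.
rows = 1_000_000_000            # a billion rows
bytes_per_entry = 8 + 16        # 8-byte key + ~16 bytes overhead (assumption)

one_index_gb = rows * bytes_per_entry / 1e9
print(f"one index: ~{one_index_gb:.0f} GB")          # ~24 GB
print(f"five indexes: ~{5 * one_index_gb:.0f} GB")   # ~120 GB of redundant data
```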
[+] JackSlateur|2 years ago|reply

  But for a city wide or even some state wide institutions, it (300rps) is really a big number.

Well, I laughed. It is common for monitoring infrastructure to poll at the minute range, with dozens of probes per server. 300 rps is really not much for a whole company.
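A quick sketch of how fast monitoring alone reaches that rate, using hypothetical fleet numbers:

```python
# Hypothetical fleet: minute-interval polling, dozens of probes per server.
servers = 1_000
probes_per_server = 30
poll_interval_s = 60

rps = servers * probes_per_server / poll_interval_s
print(f"~{rps:.0f} requests/second from monitoring alone")  # ~500 rps
```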
[+] itslennysfault|2 years ago|reply
Interesting read, but it could REALLY use some visualization to drive the point home.
[+] eulenteufel|2 years ago|reply
The first thing that comes to my mind is a quantum state of 36 qubits represented in 64-bit complex floats.
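The arithmetic behind that, assuming 64-bit floats for each of the real and imaginary parts (16 bytes per amplitude):

```python
# A 36-qubit state vector has 2**36 complex amplitudes. With 64-bit floats
# for the real and imaginary components, each amplitude takes 16 bytes:
amplitudes = 2**36
total_bytes = amplitudes * 16
assert total_bytes == 2**40  # exactly 1 TiB
print(total_bytes, "bytes = 1 TiB")
```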
[+] forinti|2 years ago|reply
The first thing I thought of was a pile of 3 million 360KB floppies.
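That pile checks out:

```python
# 3 million floppies at 360 KB each:
total_bytes = 3_000_000 * 360_000
print(total_bytes)  # 1,080,000,000,000 bytes, i.e. about 1.08 TB
```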
[+] kenm47|2 years ago|reply
1 TB of searchable data is $25/mo if you do it right.
[+] textread|2 years ago|reply
I am interested in the back of the envelope calculations you did to come to this conclusion. Would you please elaborate if possible?

I know of an early stage YC startup that has a 6TB Postgres DB. Would it be fair to say that the DB hosting (neglecting replica, engineering time) can be done at $150/month?
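Taking the parent's $25/TB-month figure at face value and assuming it scales linearly (my assumption, not a quote):

```python
# Hypothetical linear scaling of the parent comment's $25/TB-month figure.
cost_per_tb_month = 25   # USD, as claimed above
db_size_tb = 6           # the startup's Postgres DB

monthly_cost = cost_per_tb_month * db_size_tb
print(f"${monthly_cost}/month")  # $150/month, matching the question's figure
```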

[+] dadzilla|2 years ago|reply
From Groq using Llama 2 70b: To calculate the volume of space required to contain the world's population, we need to first convert the area calculated earlier to square feet.

Area (square feet) = 600 square miles × 5,280 square feet/square mile = 3,160,000,000 square feet

Next, we need to assume a height at which the population can comfortably stand. Let's assume an average height of 5 feet.

Volume = Area (square feet) × Height (feet) Volume = 3,160,000,000 square feet × 5 feet Volume = 15,800,000,000 cubic feet

Now, we need to convert the volume from cubic feet to cubic miles. There are 1,476,333,333 cubic feet in a cubic mile, so:

Volume (cubic miles) = Volume (cubic feet) ÷ 1,476,333,333 Volume (cubic miles) = 15,800,000,000 ÷ 1,476,333,333 Volume (cubic miles) = 10.7 cubic miles

Therefore, to contain the entire world population, we would need a volume of approximately 10.7 cubic miles, assuming an average height of 5 feet and a density similar to that of a solid object.

Please note that this calculation is purely theoretical and doesn't take into account factors like personal space, comfort, and actual population density.

[+] gcr|2 years ago|reply
Please don't just paste AI-generated summaries into a low-effort comment. I don't know why your summary only picked up on the first half of the first paragraph, but estimating global population volume has nothing to do with the article.

Also, even taking the assumptions at face value, your model's calculations are wrong by several orders of magnitude. There are not 5,280 square feet in one square mile and 600*5280 is not 3,160,000,000.
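For the record, redoing the arithmetic with correct unit conversions, while keeping the model's own assumptions (600-square-mile footprint, 5-foot height):

```python
# Redo the model's calculation with the correct conversion factors.
sq_ft_per_sq_mile = 5280**2              # 27,878,400 -- not 5,280
area_sq_ft = 600 * sq_ft_per_sq_mile     # ~1.67e10 -- not 3,160,000,000
volume_cu_ft = area_sq_ft * 5            # 5 feet tall
cu_ft_per_cu_mile = 5280**3              # 147,197,952,000 -- not 1,476,333,333
volume_cu_miles = volume_cu_ft / cu_ft_per_cu_mile
print(f"{volume_cu_miles:.2f} cubic miles")  # ~0.57, not 10.7
```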

[+] cyclotron3k|2 years ago|reply
If you weren't concerned with preserving life, you could take the current population of Earth, multiply by the average weight of a human (69kg according to Wolfram Alpha's sources), assume that we have the same density as water, and find that we could all fit in a 0.5km³ box.
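A quick check of that estimate, assuming a population of 8 billion (my assumption; the rest follows the comment):

```python
# World population as a water-density mass, per the comment's assumptions.
population = 8_000_000_000
avg_mass_kg = 69
water_density = 1_000                # kg per cubic metre

volume_m3 = population * avg_mass_kg / water_density
side_km = volume_m3 ** (1 / 3) / 1_000
print(f"{volume_m3 / 1e9:.2f} km^3, a cube ~{side_km:.2f} km on a side")
```

That comes out to roughly 0.55 km³, consistent with the ~0.5 km³ box claimed above.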