top | item 47013168

(no title)

repeekad | 15 days ago

I once asked one of the original YouTube infra engineers, “Will you ever need to delete the long tail of videos no one watches?”

They said it didn’t matter, because the volume of new data flowing in was growing so fast that the old data was just a drop in the bucket.

MagicMoonlight|15 days ago

Now that they can harvest it all for AI training, that decision was the cheapest and greatest thing they ever did.

Imagine trying to pay for all that content. Nobody on earth would be able or willing to supply it.

paulryanrogers|15 days ago

PeerTube is a thing. I like to think that, without centralized players like YT, P2P-supported federation might have gained a better foothold.

lysp|14 days ago

I seem to recall reading that the HD variations may get removed leaving only 480p or lower for older unwatched videos.

The original upload would likely still be stored, but not available for viewing.

Nevermark|14 days ago

That would be an odd thing to do. HD is low resolution already, and 480 is noticeably worse.

If they really wanted to compress, take out every other frame, and regenerate those frames with a neural decoder. But I don't know why that would be worth the effort for a stable number of low res files either.

wasmainiac|15 days ago

I wonder if that still holds true? The volume of videos increases exponentially, especially with AI slop. I wonder if at some point they will have to limit the storage per user, with a paid model if you surpass that limit. Many people who upload many videos, I guess, make some form of income off YouTube, so it wouldn’t be that big of a deal.

weird-eye-issue|15 days ago

What they said only holds true as long as the growth continues: each year brings so many more new videos than the previous year that the old volume doesn't matter as much. So the question is whether it will hold true in the long term, not whether it holds today.
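The "drop in the bucket" argument can be made concrete with a small sketch. It assumes, purely for illustration, that yearly upload volume grows by a fixed factor each year; the growth rates below are hypothetical and not YouTube figures. The function name `old_data_share` is made up for this example.

```python
def old_data_share(growth: float, total_years: int, older_than: int) -> float:
    """Fraction of all stored data uploaded more than `older_than` years ago.

    Assumption: year i (0 = first year) contributes (1 + growth) ** i units
    of uploads, i.e. constant year-over-year growth.
    """
    yearly = [(1 + growth) ** i for i in range(total_years)]
    # Years that count as "old": everything except the most recent `older_than` years.
    old_years = max(total_years - older_than, 0)
    return sum(yearly[:old_years]) / sum(yearly)


# Hypothetical growth rates: flat, 30% per year, doubling every year.
for g in (0.0, 0.3, 1.0):
    share = old_data_share(g, total_years=20, older_than=5)
    print(f"growth {g:.0%}: data older than 5 years is {share:.1%} of storage")
```

With flat uploads, old data dominates the total; the faster uploads grow, the smaller the old share becomes. If growth ever levels off, the old share climbs back up, which is the long-term question raised here.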

pogue|15 days ago

I assume it's an economics issue. As long as they keep making more money off the uploads than the storage costs, it works out for them.

pwdisswordfishy|14 days ago

> The volume of videos increases exponentially

Source?

jl6|15 days ago

One day, it will matter. Not even Google can escape the consequences of infinite growth. Kryder's Law is over. We cannot rely on storage getting cheaper faster than we can fill it, and orgs cannot rely on being able to extract more value from data than it costs to store it. Every other org knows this already. The only difference with Google is that they have used their ad cash generator to postpone their reality check moment.

One day, somebody is going to be tasked with deciding what gets deleted. It won't be pretty. Old and unloved video will fade into JPEG noise as the compression ratio gets progressively cranked, until all that remains is a textual prompt designed to feed an AI model that can regenerate a facsimile of the original.

asah|15 days ago

You can see how Google rolls with how they deleted old Gmail accounts - years of notice, lots of warnings, etc. They finally started deletions recently, and I haven't heard a whimper from anyone (yet).

dyauspitr|15 days ago

It depends. At the roughly 2 PB of new data they get per day, that’s about 10 sq ft of physical rack space per day. Each data center is something like 500,000 sq ft, so each data center can hold on the order of 120 years of YouTube uploads. They’re not going to have to restrict uploads anytime soon.
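A quick back-of-envelope check of that estimate, using only the rough figures quoted in the comment. It assumes the entire floor area is usable rack space and that the daily upload volume stays constant, both of which are simplifications. Taken at face value the division comes out nearer 137 years, in the same ballpark as the figure quoted.

```python
# Rough figures from the comment above; none of these are official numbers.
RACK_SQFT_PER_DAY = 10        # sq ft of rack space consumed per day of uploads
DATA_CENTER_SQFT = 500_000    # floor area of one data center
DAYS_PER_YEAR = 365


def years_of_uploads(floor_sqft: float, sqft_per_day: float) -> float:
    """Years of uploads that fit in the floor area at a constant daily rate."""
    return floor_sqft / sqft_per_day / DAYS_PER_YEAR


years = years_of_uploads(DATA_CENTER_SQFT, RACK_SQFT_PER_DAY)
print(f"One data center holds about {years:.0f} years of uploads")
```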

jongjong|15 days ago

Oh. I noticed in an AI music generation service I use that old pieces were severely degraded to the point that they were crackling really bad... And I remember thinking that it's a good thing I downloaded an mp3 of my favorites. I confirmed that the quality is very different by listening to the downloaded recording with the hosted version side-by-side.

ntoskrnl_exe|15 days ago

Wouldn't it also be a performance nightmare?

The energy bill for scanning through the terabytes of metadata would be comparable to that of several months of AI training, not to mention the time it would take. Then deleting a few million random 360p videos and putting MrBeast in their place would result in insane fragmentation of the new files.

It might really just be cheaper to keep buying new HDDs.

stogot|15 days ago

S3 allows delete and is efficient here. I’m sure Google can figure it out.

They allow search by timestamp, so I’m sure YouTube can write an algorithm to find videos with <=1 view.

dev1ycan|15 days ago

This is why they removed searching for older videos (by specific time) and why their search pushes certain algorithmic videos. Other, older videos, when found by direct link, are on long-term storage and take a while to start loading.

moffkalast|15 days ago

Besides, with their search deteriorating to the point where a direct video title doesn't result in a match, nobody can see those videos anyway and they don't have to cache them.