Andrews54757 | 5 months ago

Nsig/sig - Special tokens which must be passed to API calls, generated by code in base.js (player code). This is what has broken for yt-dlp and other third-party clients. Instead of extracting the code that generates those tokens (e.g., with regular expressions) as we used to, we now need to run the whole base.js player code to get them, because the relevant code is spread throughout the player.
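To make the older regex approach concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the toy player snippet, the `helper` and `decode` names, and both functions. Real base.js is far larger and obfuscated, and this is not how yt-dlp actually implements extraction.

```python
import re
from typing import Optional

# Toy stand-in for a player script (hypothetical; real base.js looks nothing like this).
TOY_PLAYER_JS = """
var helper={rv:function(a){a.reverse()},sp:function(a,b){a.splice(0,b)}};
var decode=function(a){a=a.split("");helper.rv(a);helper.sp(a,2);return a.join("")};
"""


def find_function_body(js: str, name: str) -> Optional[str]:
    """Return the body of `var <name>=function(x){...};` using a regex, or None."""
    pattern = r"var\s+" + re.escape(name) + r"\s*=\s*function\(\w+\)\{(.*?)\};"
    match = re.search(pattern, js, re.S)
    return match.group(1) if match else None


def apply_toy_transform(body: str, token: str) -> str:
    """Replay the two toy helper calls (reverse, drop-prefix) found in the body."""
    chars = list(token)
    for op, arg in re.findall(r"helper\.(\w+)\(a(?:,(\d+))?\)", body):
        if op == "rv":
            chars.reverse()
        elif op == "sp":
            del chars[: int(arg)]
    return "".join(chars)


body = find_function_body(TOY_PLAYER_JS, "decode")
if body is not None:
    print(apply_toy_transform(body, "abcdef"))
```

This only works while the transform sits in one recognizable function. Once the logic is scattered across the player, as described above, a pattern like this has nothing stable to match, which is why running the whole script becomes necessary.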

PoToken - Proof-of-origin token, which Google has lately been enforcing for all clients; without it, video requests fail with a 403. On Android it uses DroidGuard; on iOS it uses built-in app integrity APIs. For the web, it requires that you run a snippet of JavaScript code (the challenge) in the browser to prove that you are not a bot. Previously, you needed an external tool to generate these PoTokens, but with the Deno change yt-dlp should be capable of producing them by itself in the near future.

SABR - Server-side adaptive bitrate streaming, used alongside Google's UMP protocol to give the server more control over buffering, based on data from the client about the current playback position, buffered ranges, and more. This technology is also used for server-side ad injection. Work is still being done to make third-party clients work with it (it sometimes works, sometimes doesn't).
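As a rough mental model of "the server decides what to send based on client state", here is a toy Python sketch. It is not the real SABR or UMP wire format; the `ClientState` fields, the function names, and the segment and look-ahead sizes are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClientState:
    """Hypothetical playback report sent by the client."""
    playhead_s: float                    # current playback position in seconds
    buffered: List[Tuple[float, float]]  # (start, end) ranges already buffered


def is_buffered(state: ClientState, start: float, end: float) -> bool:
    """True if [start, end] lies entirely inside one buffered range."""
    return any(b0 <= start and end <= b1 for b0, b1 in state.buffered)


def segments_to_send(state: ClientState, segment_s: float = 5.0,
                     ahead_s: float = 20.0, duration_s: float = 60.0) -> List[int]:
    """Server-side choice: unbuffered segment indices within the look-ahead window."""
    horizon = min(state.playhead_s + ahead_s, duration_s)
    index = int(state.playhead_s // segment_s)
    wanted = []
    while index * segment_s < horizon:
        start = index * segment_s
        end = min(start + segment_s, duration_s)
        if not is_buffered(state, start, end):
            wanted.append(index)
        index += 1
    return wanted


state = ClientState(playhead_s=12.0, buffered=[(10.0, 20.0)])
print(segments_to_send(state))
```

The point of the sketch is the direction of control: the client reports where it is and what it has, and the server picks the next pieces, which is also what lets it splice in other content such as ads.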

Nsig/sig extraction example:

- https://github.com/yt-dlp/yt-dlp/blob/4429fd0450a3fbd5e89573...

- https://github.com/yt-dlp/yt-dlp/blob/4429fd0450a3fbd5e89573...

PoToken generation:

- https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide

- https://github.com/LuanRT/BgUtils

SABR:

- https://github.com/LuanRT/googlevideo

EDIT2: Added more links to specific code examples/guides

ACCount37|5 months ago

If you ever wondered why the likes of Google and Cloudflare want to restrict the web to a few signed, integrity-checked browser implementations?

Now you know.

jasode|5 months ago

>If you ever wondered why the likes of Google and Cloudflare want to restrict the web

I disagree with the framing of "us vs them".

It's actually "us vs us". It's not just us plebeians vs FAANG giants. The small-time independent publishers and creators also want to restrict the web because they don't want their content "stolen". They want to interact with real humans instead of bots. The following are manifestations of the same fear:

- small-time websites adding Anubis proof-of-work

- owners of popular Discord channels turning on the setting for phone # verification as a requirement for joining

- web blogs wanting to put a "toll gate" (maybe utilize Cloudflare or other service) to somehow make OpenAI and others pay for the content

We're long past the days of colleagues and peers on ARPANET and NSFNET sharing info for free on university computers. Now everybody on the globe wants to try to make a dollar, and likewise, they feel dollars are being stolen from them.

mtrovo|5 months ago

I don't know, it's really hard to blame them. In a way, the next couple of years are going to be a battle to balance easy access to info with compensation for content creators.

The web as we knew it before ChatGPT was built around the idea that humans have to scavenge for information, and while they're doing that, you can show them ads. In that world, content didn't need to be too protected because you were making up for it in eyeballs anyway.

With AI, that model is breaking down. We're seeing a shift towards bot traffic rather than human traffic, and information can be accessed far more effectively and, most importantly, without ad impressions. So, it makes total sense for them to be more protective about who has access to their content and to make sure people are actually paying for it, be it with ad views or some other form of agreement.

th0ma5|5 months ago

Weird, people talking about small-time creators wanting DRM. I've never seen that... Usually they'd be hounding for any attention? I don't know why multiple accounts are seemingly independently bringing this concept up, but maybe it is an attempt to muddy the waters?

supriyo-biswas|5 months ago

At least for YouTube, viewbotting is very much a thing, which undermines trust in the platform. Even if we were to remove Google ads from the equation, there’s nothing preventing someone from crafting a channel with millions of bot-generated views and comments in order to get paid sponsor placements, etc.

The reasons are similar for Cloudflare, but their stances are a bit too DRMish for my tastes. I guess someone could draw the lines differently.

eek2121|5 months ago

The fact you shoved Cloudflare in there shows your ignorance of the actual problems and solutions offered.

codedokode|5 months ago

There could be valid reasons for fighting downloaders, for example:

- AI companies scraping YT for training data without paying YT, let alone creators. Imagine how much data YT has.

- YT competitors in other countries scraping YT to copy videos, especially in countries where YT is blocked. Some such companies offer a "move all my videos from YT" function to encourage bloggers to migrate.

gjsman-1000|5 months ago

Everything trends towards centralization on a long enough period.

I laugh at people who think ActivityPub or Mastodon or BlueSky will save us. We already had that, it was called e-mail, look what happened once everyone started using it.

If we couldn't stop the centralization effects that occurred on e-mail, any attempt to stop centralization in general is honestly a utopian fool's errand. Regulation is easier.

Aperocky|5 months ago

And barely a few days after Google did it, the fix is in.

Amazing how they simply couldn't win - you deliver content to client, the content goes to the client. Could be the largest corporation of the world and we still have yt-dlp.

That's why all of them wanted proprietary walled gardens where they would be able to control the client too - so you get to watch the ads or pay up.

dylan604|5 months ago

> For the web it requires that you run a snippet of javascript code (the challenge) in the browser to prove that you are not a bot.

How does this prove you are not a bot? How does this code not work in a headless Chromium if it's just client-side JS?

Andrews54757|5 months ago

Good question! Indeed, you can run the challenge code using headless Chromium and it will function [1]. They are constantly updating the challenge, however, and may add additional checks in the future. I suppose Google wants to make it more expensive overall to scrape YouTube to deter the most egregious bots.

[1] https://github.com/LuanRT/BgUtils

Beretta_Vexee|5 months ago

Once JavaScript is running, it can perform complex fingerprinting operations that are difficult to circumvent effectively.

I have a little experience with Selenium headless on Facebook. Facebook tests fonts, SVG rendering, CSS support, screen resolution, clock and geographical settings, and hundreds of other things that give it a very good idea of whether it's a normal client or Selenium headless. Since it picks a certain number of checks more or less at random and they can modify the JS each time it loads, it is very, very complicated to simulate.

Facebook and Instagram know this and allow it below a certain limit because it is more about bot protection than content protection.

This is the case when you have a real web browser running in the background. Here we are talking about standalone software written in Python.
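The "random subset of checks" idea can be sketched in a few lines of Python. This is a toy model only: the check names are loosely taken from the list above, each check is reduced to a pass/fail boolean, and none of it reflects how Facebook or anyone else actually implements fingerprinting.

```python
import random
from typing import Dict, List

# Hypothetical check names; real fingerprinting uses many more signals.
ALL_CHECKS = ["fonts", "svg_rendering", "css_support",
              "screen_resolution", "clock", "geo_settings"]


def pick_checks(rng: random.Random, count: int) -> List[str]:
    """Server side: choose a random subset of checks for this page load."""
    return rng.sample(ALL_CHECKS, count)


def looks_human(profile: Dict[str, bool], checks: List[str]) -> bool:
    """A client passes only if every sampled check looks like a normal browser."""
    return all(profile.get(name, False) for name in checks)


def detection_rate(profile: Dict[str, bool], count: int,
                   trials: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated page loads on which the profile gets flagged."""
    rng = random.Random(seed)
    flagged = sum(not looks_human(profile, pick_checks(rng, count))
                  for _ in range(trials))
    return flagged / trials


real_browser = {name: True for name in ALL_CHECKS}
# Assumed headless profile that gets two of the six checks wrong.
headless = dict(real_browser, fonts=False, svg_rendering=False)

print(detection_rate(real_browser, count=2))
print(detection_rate(headless, count=2))
```

Because the client cannot know in advance which checks will be sampled, an automated client has to get all of them right on every load, while the server only has to run a few each time.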