Andrews54757 | 5 months ago
PoToken - Proof of origin token, which Google has lately been enforcing for all clients; without it, video requests fail with a 403. On Android it uses DroidGuard; on iOS, it uses built-in app integrity APIs. For the web, it requires that you run a snippet of JavaScript code (the challenge) in the browser to prove that you are not a bot. Previously, you needed an external tool to generate these PoTokens, but with the Deno change yt-dlp should be capable of producing these tokens by itself in the near future.
SABR - Server-side adaptive bitrate streaming, used alongside Google's UMP protocol to give the server more control over buffering, based on data from the client about the current playback position, buffered ranges, and more. This technology is also used to do server-side ad injection. Work is still being done to make third-party clients work with this technology (sometimes works, sometimes doesn't).
Nsig/sig extraction example:
- https://github.com/yt-dlp/yt-dlp/blob/4429fd0450a3fbd5e89573...
- https://github.com/yt-dlp/yt-dlp/blob/4429fd0450a3fbd5e89573...
PoToken generation:
- https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide
- https://github.com/LuanRT/BgUtils
SABR:
- https://github.com/LuanRT/googlevideo
EDIT2: Added more links to specific code examples/guides
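To make the SABR description above a bit more concrete, here is a toy sketch of the general idea: the client reports its playback position and buffered ranges, and the server decides what to send next. This is not the real SABR or UMP wire format; `ClientState`, `next_segment_start`, `should_send_more`, and the 30-second target are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClientState:
    """What the client reports on each request (hypothetical fields)."""
    playback_position: float  # seconds
    # (start, end) pairs in seconds that the client already holds
    buffered_ranges: list[tuple[float, float]] = field(default_factory=list)


def next_segment_start(state: ClientState) -> float:
    """Return where the contiguous buffer ahead of the playhead ends."""
    pos = state.playback_position
    for start, end in sorted(state.buffered_ranges):
        # If the current position falls inside this range, jump to its end
        # and keep going, so adjacent ranges are chained together.
        if start <= pos < end:
            pos = end
    return pos


def should_send_more(state: ClientState, target_buffer: float = 30.0) -> bool:
    """Server-side decision: keep sending until the client is far enough ahead."""
    ahead = next_segment_start(state) - state.playback_position
    return ahead < target_buffer


state = ClientState(playback_position=12.0,
                    buffered_ranges=[(0.0, 20.0), (20.0, 40.0), (50.0, 60.0)])
print(next_segment_start(state), should_send_more(state))
```

The point of the sketch is only that the decision lives on the server: the client supplies state, and the server chooses what (and whether) to deliver, which is also what makes things like server-side ad injection possible.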
ACCount37|5 months ago
Now you know.
jasode|5 months ago
I disagree with the framing of "us vs them".
It's actually "us vs us". It's not just us plebeians vs FAANG giants. The small-time independent publishers and creators also want to restrict the web because they don't want their content "stolen". They want to interact with real humans instead of bots. The following are manifestations of the same fear:
- small-time websites adding Anubis proof-of-work
- owners of popular Discord channels turning on the setting for phone # verification as a requirement for joining
- web blogs wanting to put a "toll gate" (maybe utilize Cloudflare or other service) to somehow make OpenAI and others pay for the content
We're long past the days of colleagues and peers of ARPANET and NSFNET sharing info for free on university computers. Now everybody on the globe wants to try to make a dollar, and likewise, they feel dollars are being stolen from them.
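For readers unfamiliar with the proof-of-work gating mentioned in the list above, here is a minimal hashcash-style sketch. It is not Anubis's actual code or protocol; the challenge string, the "leading zero hex digits" difficulty rule, and the function names `solve` and `verify` are assumptions chosen for illustration.

```python
import hashlib
from itertools import count


def _digest(challenge: str, nonce: int) -> str:
    """SHA-256 of the challenge concatenated with the nonce, as hex."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce whose hash has
    `difficulty` leading zero hex digits. Cost grows ~16x per digit."""
    prefix = "0" * difficulty
    for nonce in count():
        if _digest(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


nonce = solve("example-challenge", 3)
print(nonce, verify("example-challenge", nonce, 3))
```

The asymmetry is the whole trick: one visitor barely notices the cost, while a scraper hitting millions of pages pays it millions of times.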
mtrovo|5 months ago
The web as we knew it before ChatGPT was built around the idea that humans have to scavenge for information, and while they're doing that, you can show them ads. In that world, content didn't need to be too protected because you were making up for it in eyeballs anyway.
With AI, that model is breaking down. We're seeing a shift towards bot traffic rather than human traffic, and information can be accessed far more effectively and, most importantly, without ad impressions. So, it makes total sense for them to be more protective about who has access to their content and to make sure people are actually paying for it, be it with ad views or some other form of agreement.
th0ma5|5 months ago
supriyo-biswas|5 months ago
The reasons are similar for Cloudflare, but their stances are a bit too DRMish for my tastes. I guess someone could draw the lines differently.
eek2121|5 months ago
codedokode|5 months ago
- AI companies scraping YT for training data without paying YT, let alone creators. Imagine how much data YT has.
- YT competitors in other countries scraping YT to copy videos, especially in countries where YT is blocked. Some such companies have a "move all my videos from YT" function to encourage bloggers to migrate.
gjsman-1000|5 months ago
I laugh at people who think ActivityPub or Mastodon or BlueSky will save us. We already had that, it was called e-mail, look what happened once everyone started using it.
If we couldn't stop the centralization effects that occurred on e-mail, any attempt to stop centralization in general is honestly a utopian fool's errand. Regulation is easier.
Aperocky|5 months ago
Amazing how they simply couldn't win: you deliver content to the client, the content goes to the client. They could be the largest corporation in the world and we still have yt-dlp.
That's why all of them wanted proprietary walled gardens where they would be able to control the client too - so you get to watch the ads or pay up.
dylan604|5 months ago
How does this prove you are not a bot? How does this code not work in a headless Chromium if it's just client-side JS?
Andrews54757|5 months ago
[1] https://github.com/LuanRT/BgUtils
Beretta_Vexee|5 months ago
I have a little experience with Selenium headless on Facebook. Facebook tests fonts, SVG rendering, CSS support, screen resolution, clock and geographical settings, and hundreds of other things that give it a very good idea of whether it's a normal client or Selenium headless. Since it picks a certain number of checks more or less at random and they can modify the JS each time it loads, it is very, very complicated to simulate.
Facebook and Instagram know this and allow it below a certain limit because it is more about bot protection than content protection.
This is the case when you have a real web browser running in the background. Here we are talking about standalone software written in Python.
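A rough way to see why randomly chosen checks are so hard to fake: if the server samples a few checks from a large pool on every load, a client that spoofs only part of the pool gets caught most of the time. The sketch below is not Facebook's actual logic; the pool contents are taken loosely from the list above, and `pick_checks`, `passes`, and `pass_probability` are hypothetical names.

```python
import random
from math import comb

# Hypothetical pool; a real one would have hundreds of entries.
CHECK_POOL = ["fonts", "svg_rendering", "css_support",
              "screen_resolution", "clock", "geo_settings"]


def pick_checks(rng: random.Random, k: int) -> list[str]:
    """Server side: choose k distinct checks for this page load."""
    return rng.sample(CHECK_POOL, k)


def passes(spoofed: set[str], selected: list[str]) -> bool:
    """A fake client passes only if every selected check is one it spoofs."""
    return all(check in spoofed for check in selected)


def pass_probability(n_spoofed: int, k: int,
                     pool_size: int = len(CHECK_POOL)) -> float:
    """Chance that all k sampled checks fall within the spoofed subset."""
    return comb(n_spoofed, k) / comb(pool_size, k)


rng = random.Random()
selected = pick_checks(rng, 3)
print(selected, passes({"fonts", "clock", "css_support", "geo_settings"}, selected))
print(pass_probability(4, 3))
```

With this toy pool, spoofing 4 of the 6 checks survives a 3-check sample only 4 times out of 20, and the odds collapse quickly as the pool grows or the checks change between loads.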