pilif | 6 months ago
Yes. Obviously dumb, but also nearly 100% successful at the current point in time.
And it is likely to stay successful, because the unprotected internet still provides dumb crawlers with enough information that it's not financially worth it to even vibe-code a workaround.
Or in other words: Anubis may be dumb, but the average crawler that completely exhausts some site's resources is even dumber.
And so it all works out.
Which leaves the question: how dumb was it really, if it works so well and keeps on working?
account42 | 6 months ago
Only if you don't care about negatively affecting real users.
pilif | 6 months ago
I’m not convinced that makes sense.
Now ideally you would have the resources to serve all users and all the AI bots without performance degradation, but for some projects that’s not feasible.
In the end it’s all a compromise.
kldg | 6 months ago
Regarding the authentication mentioned elsewhere: passing cookies is no big deal.
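
A minimal sketch of what "passing cookies" means for a scraper, in Go: a cookie jar keeps whatever cookie the challenge flow sets and replays it on later requests. The target URL here is a placeholder, and the cookie name mentioned in the comments is an assumption.

    package main

    import (
        "fmt"
        "net/http"
        "net/http/cookiejar"
    )

    func main() {
        // The jar stores whatever cookie the challenge flow sets
        // (e.g. Anubis's auth cookie) and replays it automatically.
        jar, err := cookiejar.New(nil)
        if err != nil {
            panic(err)
        }
        client := &http.Client{Jar: jar}

        // Hypothetical target; once the challenge has been passed,
        // every further request rides on the stored cookie.
        resp, err := client.Get("https://example.org/protected-page")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println("status:", resp.Status)
    }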
pilif | 6 months ago
https://dukespace.lib.duke.edu/server/api/core/bitstreams/81...
And of all the high-profile projects implementing it, like the LKML archives, none have backed down yet, so I’m assuming the initial improvement in the numbers has held up, or it would have been removed by now.
snickerdoodle12 | 6 months ago
If you want to save some $$$, you can spend about 30 minutes making a cracker like the one in the article. Just make it multi-threaded and add a queue, and boom, your scraper nodes can go back to their cheap configuration. Or, since these are AI orgs we're talking about, write a GPU cracker and laugh as it solves challenges far faster than any user could. (A sketch of the multi-threaded version follows below.)
Custom solutions aren't worth it for individual sites, but Anubis is widespread enough that it has become worth it.
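
A sketch of that multi-threaded cracker in Go, assuming the commonly described Anubis scheme: find a nonce such that SHA-256(challenge + nonce) starts with a given number of zero hex digits. The challenge string and difficulty below are made up for illustration.

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "runtime"
        "strconv"
        "strings"
    )

    // solve spreads the nonce search across all CPU cores. Each worker
    // strides through the nonce space so no candidate is tested twice;
    // the first valid nonce wins. (Exact challenge format is an assumption.)
    func solve(challenge string, difficulty int) int {
        workers := runtime.NumCPU()
        prefix := strings.Repeat("0", difficulty)
        found := make(chan int, 1)

        for w := 0; w < workers; w++ {
            go func(start int) {
                for n := start; ; n += workers {
                    sum := sha256.Sum256([]byte(challenge + strconv.Itoa(n)))
                    if strings.HasPrefix(hex.EncodeToString(sum[:]), prefix) {
                        select {
                        case found <- n: // first winner reports
                        default: // a result is already in; just exit
                        }
                        return
                    }
                }
            }(w)
        }
        return <-found // remaining workers are abandoned at process exit
    }

    func main() {
        fmt.Println("nonce:", solve("example-challenge", 4))
    }

The queue mentioned in the comment would sit in front of solve(), feeding challenges from the scraper nodes; a GPU version would batch candidate nonces per kernel launch instead of striding per core.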