(no title)
zargath | 8 months ago
Anybody know why these web crawling/bot standards are not evolving? I believe robots.txt was invented in 1994 (thx ChatGPT). People have tried with sitemaps, RSS and IndexNow, but it's like huge $$ organizations are depending on HelloWorld.bas tech to control their entire platform.
I want to spin up endpoints/MCP/etc. and let intelligent bots communicate with my services. Let them ask for access, ask for content, pay for content, etc. I want to offer solutions for bots to consume my content, instead of having to choose between full or no access.
I am all for AI, but please try to do better. Right now the internet is about to be eaten up by stupid bot farms and served into chat screens. They don't want to refer back to their source, and when they do, it's with insane error rates.
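To make the "full or no access" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot names, paths, and the `allowed` helper are made up for illustration; the only thing taken from the thread is robots.txt itself. As far as I can tell, the most a rule can express is yes or no per path prefix, with nothing for "ask first" or "pay first":

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot gets everything except /private/,
# every other bot is shut out completely.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, path: str) -> bool:
    """Return True if robots.txt lets this user agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("ExampleBot", "/articles/1"),
        ("ExampleBot", "/private/x"),
        ("OtherBot", "/articles/1"),
    ]:
        print(agent, path, allowed(agent, path))
```

The answer is always a plain boolean, and it is only advisory: a crawler that ignores the file is not stopped by anything here.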
stereolambda|8 months ago
Not to pick on you, but I find it quicker to open a new tab and do "!w robots.txt" (for search engines supporting the bang notation) or "wiki robots.txt"<click> (for Google I guess). The answer is right there, no need to explain to an LLM what I want or verify the result [1].
[1] Ok, Wikipedia can be wrong, but at least it is a commonly accessible source of wrong I can point people to if they call me out. Plus my predictive model of Wikipedia wrongness gives me pretty low likelihood for something like this, while for ChatGPT it is more random.
reaperducer|8 months ago
Thought of and discussed as a possibility in 1994.
Proposed as a standard in 2019.
Adopted as a standard in 2022.
Thanks, IETF.
Dylan16807|8 months ago
TechDebtDevin|8 months ago
This is clearly the first step in cf building out a marketplace where they will attempt (and fail) to be the middleman in a useless market between crawlers and publishers.
zargath|8 months ago