r_singh | 4 months ago
I run an e-commerce-specific scraping API that helps developers access SERP, PDP, and reviews data. I've noticed the web already operates on unwritten norms: certain traffic patterns and techniques are tolerated, while others clearly aren't. Most sites handle reasonable, well-behaved crawlers just fine.
Platforms claim ownership of UGC and public data through dark patterns and narrative control. The current guidelines exist for the suppliers' convenience, and in several cases absolutely fundamental web services run by the largest companies in the world themselves breach those guidelines (including services funded by the fund running this site). We need standards that treat public data as a shared resource with predictable, ethical access for everyone, not just for those with scale or lobbying power.
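For what it's worth, the "well-behaved crawler" baseline above can be made concrete: check robots.txt before fetching and honor any Crawl-delay. Here is a minimal sketch using only Python's standard library; the bot name and the robots rules are hypothetical, and a real crawler would fetch the site's live /robots.txt instead of parsing a string:

```python
from urllib import robotparser

# Hypothetical robots.txt for a shop; parsed from a string so the
# example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="MyShopBot"):
    """True if robots.txt permits this agent to fetch the URL."""
    return parser.can_fetch(agent, url)

# A polite crawler would also sleep parser.crawl_delay(agent) seconds
# between requests to the same host.
```

The point is that compliance costs one stdlib import and a few lines, which is roughly what separates tolerated traffic from the kind site operators block on sight.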
karlshea | 4 months ago
Not everyone has the budget for unlimited bandwidth and compute, and in several of my clients' cases bot traffic has been >95% of all traffic.
People running these bots with AI/VC capital are just script kiddies who forgot that not every site is a boatload of app servers behind Cloudflare.
r_singh | 4 months ago
It would be great if there were reliable ways to distinguish good bots from bad ones — many actually improve discoverability and sales. I see this with affiliate shopping sites that depend on e-commerce data, though that impact is hard to trace directly.
The bad actors are the ones cloning sites or using data for manipulation and propaganda.
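There actually is a documented mechanism for part of this: the major search engines (Google, Bing) publish forward-confirmed reverse DNS as the way to verify their crawlers. A sketch, assuming the documented domain suffixes below; the function name is illustrative, and the lookups are injectable so the logic can be shown without live DNS:

```python
import socket

# Domain suffixes the search engines document for their crawlers.
ALLOWED_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}

def verify_crawler(ip, claimed_agent,
                   reverse_lookup=lambda ip: socket.gethostbyaddr(ip)[0],
                   forward_lookup=lambda host: socket.gethostbyname(host)):
    """Forward-confirmed reverse DNS: the IP's PTR record must land in
    the crawler's documented domain, AND that hostname must resolve
    forward to the same IP (so a spoofed PTR record isn't enough)."""
    suffixes = ALLOWED_SUFFIXES.get(claimed_agent)
    if not suffixes:
        return False
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False
    if not host.endswith(suffixes):
        return False
    try:
        return forward_lookup(host) == ip
    except OSError:
        return False
```

This only covers crawlers whose operators publish verification domains, which is exactly the gap: a good-faith affiliate or shopping bot has no equivalent standard way to prove who it is.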