windsignaling | 1 year ago
On one hand, I get the annoying "Verify" box every time I use ChatGPT (and now, due to its popularity, DeepSeek as well).
On the other hand, without Cloudflare I'd be seeing thousands of junk requests and hacking attempts every day, people attempting credit card fraud, etc.
I honestly don't know what the solution is.
rozap|1 year ago
The thing is that these tools are generally used to further entrench the power that monopolies, duopolies, and cartels already have. Example: I've built an app that compares grocery prices as you make a shopping list, and you would not believe the extent that grocers go to to make price comparison difficult. This thing doesn't make thousands or even hundreds of requests - maybe a few dozen over the course of a day. What I thought would be a quick little project has turned out to be wildly adversarial. But now spite-driven development is a factor, so I will press on.
It will always be a cat and mouse game, but we're at a point where the cat has a 46 billion dollar market cap and handles a huge portion of traffic on the internet.
jeroenhd|1 year ago
They ignored robots.txt (they claimed not to, but I blacklisted them there and they didn't stop) and started randomly generating image paths. At some point /img/123.png became /img/123.png?a=123 or whatever, and they just kept adding parameters and subpaths for no good reason. Nginx dutifully ignored the extra parameters and kept sending the same image files over and over again, wasting everyone's time and bandwidth.
I was able to block these bots by just blocking the entire IP range at the firewall level (for Huawei I had to block all of China Telecom and later a huge range owned by Tencent for similar reasons).
I have lost all faith in scrapers. I've written my own scrapers too, but almost all of the scrapers I've come across are nefarious. Some scour the internet searching for personal data to sell, some look for websites to throw hack attempts at to brute-force bug bounty programs, and others are just scraping for more AI content. Until the scraping industry starts behaving, I can't feel bad for people blocking these things, even if they hurt small search engines.
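For contrast with the behavior described above, here is a minimal sketch of what a well-behaved crawler does before fetching: honor robots.txt and avoid re-fetching the same resource under junk query strings. It uses Python's standard `urllib.robotparser`; the function names, the `ExampleBot` user agent, and the assumption that query strings never change the content (true for static images like the `/img/123.png` case, not true in general) are all illustrative.

```python
from urllib import robotparser
from urllib.parse import urlsplit, urlunsplit


def make_robots_checker(robots_txt: str, user_agent: str):
    """Return a predicate that says whether user_agent may fetch a URL."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return lambda url: parser.can_fetch(user_agent, url)


def canonical_url(url: str) -> str:
    """Drop the query string and fragment (assumes they don't change the content)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def plan_fetches(urls, is_allowed):
    """Deduplicate URLs by canonical form and skip anything robots.txt disallows."""
    seen = set()
    plan = []
    for url in urls:
        key = canonical_url(url)
        if key in seen or not is_allowed(key):
            continue
        seen.add(key)
        plan.append(key)
    return plan
```

With a robots.txt that disallows `/img/` for `ExampleBot`, a list containing `/about`, `/about?a=1`, `/img/1.png`, and `/img/1.png?a=2` collapses to a single fetch of `/about`.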
makeitdouble|1 year ago
This usually includes people making a near-real-time, perfect copy of your site and serving that copy for scams, for middle-manning transactions, or for straight-up fraud.
Having a clear category of "good bots" from verified or accepted companies would help in these cases. Cloudflare has such a system, I think, but then a new search engine would have to go to each and every platform provider to make deals, and that also sounds impossible.
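One common way to tell a real "good bot" from an impostor that merely copies its user-agent string is forward-confirmed reverse DNS: look up the PTR record for the client IP, check that the hostname belongs to the crawler operator's domain, then resolve that hostname and confirm it maps back to the same IP. Google, for example, documents this procedure for verifying Googlebot. The sketch below assumes the operator publishes a list of hostname suffixes; `search.example` is a placeholder, and the resolvers are injectable so the check can be tested without network access. Note that `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
import socket
from typing import Callable, Iterable, List


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    return socket.gethostbyname_ex(host)[2]  # IPv4 addresses only


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse_lookup: Callable[[str], str] = _reverse,
    forward_lookup: Callable[[str], List[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    # The PTR name must sit under one of the operator's domains.
    if not any(hostname == s or hostname.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        # A PTR record alone can be set by whoever controls the IP; the forward lookup must agree.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

This doesn't solve the "make deals with every platform" problem, but it is the kind of check that lets a site accept a published list of crawler domains without trusting user-agent strings.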
OptionOfT|1 year ago
It's gonna get even worse. Walmart & Kroger are implementing digital price tags, so whatever you see on the website will probably (purposefully?) be out of date by the time you get to the store.
Stores don't want you to compare.
ohcmon|1 year ago
to11mtm|1 year ago
I used to work at a company that did auto inspections (e.g., if you turned in a lease, traded in a used car, sold to a private party, etc.).
Because of that, we had a server that held 'condition reports', as well as the images attached to those condition reports.
Mind you, sometimes condition reports had to be revised. Maybe a photo was bad, maybe the photos were in the wrong order, etc.
It was a perfect storm:
- Image caching was all in-memory
- If an image didn't exist, the server would error with a 500
- IIS was set up such that too many errors caused the application pool to recycle
- Some scraper was working off a dataset that contained an image that did not exist (ironically, the dataset was 'corrected' within an hour or so).
- The scraper, instead of eventually 'moving on' would keep retrying the URL.
It was the only time that org had an 'anyone who thinks they can help solve please attend' meeting at the IT level.
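The scraper's failure mode in that list, retrying a failing URL forever, is avoidable with a retry budget and exponential backoff. A minimal sketch, assuming a hypothetical `fetch` callable that returns an HTTP status code (injected, along with `sleep`, so it can be tested offline); the retryable status set and the three-attempt limit are illustrative choices, not a standard.

```python
import time
from typing import Callable

# Statuses treated as transient. Anything else (success or a permanent
# error such as 404) is returned immediately without retrying.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def fetch_with_backoff(
    fetch: Callable[[str], int],
    url: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch(url) at most max_attempts times, doubling the delay between
    attempts, then return the last status so the caller can move on."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status = fetch(url)
        if status not in RETRYABLE_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status  # still failing: give up on this URL instead of hammering it
```

Against a server that answers a missing image with a 500, this makes three requests with 1 s and 2 s pauses and then stops; a server that answered with 404 instead would get exactly one request.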
> and you would not believe the extent that grocers go to to make price comparison difficult. This thing doesn't make thousands or even hundreds of requests - maybe a few dozen over the course of a day.
Very true. I'm reminded of Oren Eini's tale of building an app to compare grocery prices in Israel, where the government apparently mandated that supermarket chains publish their prices [0]. On top of the mandated data sharing apparently landing on the wrong side of the over/under for formatting, there's the constant issue of 'incomparabilities'.
And it's weird, because it immediately triggered memories of how, 20-ish years ago, one of the most accessible Best Buys was across the street from a Circuit City, but good luck price matching, because the stores all happened to sell barely different laptops/desktops (e.g., more storage but a lower-grade CPU) so that nobody really had to price match.
[0] - https://ayende.com/blog/170978/the-business-process-of-compa...
tempodox|1 year ago
gjsman-1000|1 year ago
Robots went out of control, whether malicious, AI scrapers, or the Clearview surveillance kind; users learned not to trust random websites; SEO spam ruined search, the only thing that made a decentralized internet navigable; nation-state attacks became a common occurrence; people prefer a few websites that do everything (Facebook becoming an eBay competitor). Even if it were possible to set rules banning Clearview or AI training, no nation outside your own would follow them, which even becomes a national security problem (are you sure, Taiwan, that China hasn't profiled everyone on your social media platforms by now?).
There is no solution. The dream itself was not sustainable. The only options are either a global moratorium of understanding that everyone respectfully follows (wishful thinking, never happening), or splinternetting into national internets with different rules and strong firewalls (a deal with the devil, and still an admission that the vision failed).
stevenAthompson|1 year ago
To make matters worse, I suspect that not even a splinternet can save it. It needs a new foundation, preferably one that wasn't largely designed before security was a thing.
Federation is probably a good start, but it should be federated well below the application layer.
supportengineer|1 year ago
benatkin|1 year ago
Aeolun|1 year ago
inetknght|1 year ago
Yup!
> I honestly don't know what the solution is.
Force law enforcement to enforce the laws.
Or else, block the countries that don't combat fraud. That means... China? Hey, isn't there a "trade war" being "started"? It sure would be fortunate if China (and certain other fraud-friendly countries around Asia/Pacific) were blocked from the rest of the Internet until/unless they provide enforcement and/or compensation for their fraudulent use of technology.
marginalia_nu|1 year ago
jeroenhd|1 year ago
jacobr1|1 year ago
RIMR|1 year ago
EVa5I7bHFq9mnYK|1 year ago
il-b|1 year ago
BytesAndGears|1 year ago
It works so well and is very secure. You get to the checkout page on a website and click a link. If you’re on your phone, it opens your banking app directly. If you’re on desktop, it shows a QR code that does the same.
When your bank app opens, it says “would you like to make this €28 payment to Business X?” And you click either yes or no on the app. You never even need to enter a card in the website!
You can also send money to other people instantly the same way, so it’s perfect for something like buying a used item from someone else.
Plus the whole IBAN system which makes it all possible!
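Part of what makes the IBAN system robust is its check digits: an IBAN is valid only if, after moving the first four characters to the end and replacing letters with numbers (A=10 ... Z=35), the resulting integer leaves a remainder of 1 when divided by 97 (the ISO 7064 MOD 97-10 scheme), which catches any single mistyped character. The sketch below does only that generic check; it does not validate country-specific lengths or whether the account exists.

```python
def iban_is_valid(iban: str) -> bool:
    """Check an IBAN's basic structure and its MOD 97-10 check digits."""
    s = iban.replace(" ", "").upper()
    # Overall length bounds across countries: 15 to 34 ASCII alphanumerics.
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    # 2-letter country code followed by 2 check digits.
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    # Move the first four characters to the end, then map A=10 ... Z=35.
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

For example, the commonly cited sample IBAN `GB82 WEST 1234 5698 7654 32` passes, while changing its last digit to 3 fails.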
carlosjobim|1 year ago
kobalsky|1 year ago
this is wrong.
if someone can use your site, they can use stolen cards on it, and cloudflare won't stop bots doing this.
cloudflare only raises the cost of doing it; it may make scraping a million product pages unprofitable, but that doesn't apply to cc fraud yet.
hecanjog|1 year ago
bragr|1 year ago
It stops "card testing", where someone has bought or stolen a large number of cards and needs to verify which are still good. The usual technique is to cycle through all the cards on a smaller site selling something cheap (a $3 ebook, for example). The problem is that a high volume of fraud in a short time span will often get the merchant account or payment gateway account shut down, cutting off legitimate sales.
As a consumer, you should also be suspicious of a mysterious low value charge on your card because it could be the prelude to much larger charges.
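A common first-line defense against the card-testing pattern described above is a velocity check: flag any single source that tries many distinct cards in a short time. A minimal sketch, assuming `card_fingerprint` is some non-reversible token for the card (for instance, a fingerprint provided by your payment processor, never the raw card number) and that more than 3 cards per 10 minutes is the threshold; both are illustrative. It keeps everything in memory, so a real deployment would need shared storage and would usually combine this with other signals.

```python
from collections import defaultdict, deque


class CardTestingDetector:
    """Flag a source (IP, session, account...) that tries too many distinct
    cards within a sliding time window."""

    def __init__(self, max_distinct_cards: int = 3, window_seconds: float = 600.0):
        self.max_distinct_cards = max_distinct_cards
        self.window_seconds = window_seconds
        # source -> deque of (timestamp, card_fingerprint), oldest first
        self._events = defaultdict(deque)

    def record_attempt(self, source: str, card_fingerprint: str, now: float) -> bool:
        """Record one payment attempt; return True if the source looks like it is card testing."""
        events = self._events[source]
        events.append((now, card_fingerprint))
        # Forget attempts that have fallen out of the window.
        while events and now - events[0][0] > self.window_seconds:
            events.popleft()
        distinct_cards = {card for _, card in events}
        return len(distinct_cards) > self.max_distinct_cards
```

Counting distinct cards rather than raw attempts means a legitimate customer retrying the same card after a typo isn't flagged, while a bot cycling through a list is.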
markisus|1 year ago
porty|1 year ago
Aachen|1 year ago
I host many webpages and this is exactly it. Anyone is welcome to use the websites I host. There is no CDN; your TLS session terminates at the endpoint (end-to-end encryption). It may be a bit slower for pages with static assets if you're coming from outside of Europe, but the pages are light anyway (no 2 MB JavaScript blobs).
lynndotpy|1 year ago
The solution is good security; Cloudflare only cuts down on the noise. I'm watching junk requests and hacking attempts flow through to my sites as we speak.
lynndotpy|1 year ago
Cloudflare cuts down on the noise, but it also helps prevent scrapers and people who resell your site wholesale, and cutting down on the noise also cuts down on the cost of network requests.
It can also help where security is lax. You should have measures against credential stuffing, but if you don't, Cloudflare might prevent some of your users from being hacked. That isn't good enough, but it's better than no mitigation at all.
I don't use Cloudflare personally, but I won't dismiss it wholesale. I understand why people use it.
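One mitigation for credential stuffing that doesn't depend on a CDN is refusing passwords that already appear in breach corpora. Have I Been Pwned's Pwned Passwords range API supports this with k-anonymity: you send only the first five hex characters of the password's SHA-1 (to `https://api.pwnedpasswords.com/range/<prefix>`) and compare the returned `SUFFIX:COUNT` lines locally. The sketch below shows only the hashing and response parsing; the HTTP request, and what count you treat as "too common", are left as assumptions for the reader's own stack.

```python
import hashlib
from typing import Tuple


def pwned_range_query(password: str) -> Tuple[str, str]:
    """Split the password's SHA-1 into the 5-char prefix sent to the range
    API and the 35-char suffix that never leaves your server."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(suffix: str, range_body: str) -> int:
    """Find our suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0
```

For the password `password`, the prefix sent over the wire is `5BAA6`; the server never learns which of the returned suffixes, if any, you were checking.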
carlosjobim|1 year ago
That sounds like the solution, that sounds like good security.
boomboomsubban|1 year ago
Though annoying, it's tolerable. It seemed like a fair solution. Blocking doesn't.
chaoskitty|1 year ago
Bots are a fact of life. Secure your site properly, follow good practices, set up notifications for important things, log stuff, but don't look at the logs unless you have a reason to look at the logs.
Having run web servers forever, this is simply normal. What's not normal is blindly trusting a megacorporation to make my logs quiet. What're they doing? Who are they blocking? What guidelines do they use? Nobody, except them, knows.
It's why I self-host email. Sure, you might feel safe because most people use Gmail or Outlook, and therefore if there are problems, you can point the finger at them, but what if you want to discuss spam? Or have technical discussions about Trojans and viruses? Or you need to be 100% absolutely certain that email related to specific events is delivered, with no exceptions? You can't do that with Gmail / Outlook, because they have filters that you can't see and you can't control.
buyucu|1 year ago
buyucu|1 year ago
grayhatter|1 year ago
well, for starters, if you're using cloudflare to block otherwise benign traffic, just because you're worried about some made... up....
> On the other hand, without Cloudflare I'd be seeing thousands of junk requests and hacking attempts everyday, people attempting credit card fraud, etc.
well damn, if you're using it because otherwise you'd be exposing your users to active credit card fraud... I guess the original suggestion to only ban traffic once you find it to be abusive, and then only by subnet, doesn't really apply to you.
I wanna suggest using this as an excuse to learn how not to be a twat (the direction cf is moving in more and more), where for most sites 20% of the work will get you 80% of the results... but with cc fraud, your adversaries are already on the more advanced side, and it becomes a lot harder to prevent it rather than catch and stop it after the fact.
Balancing the pervasive fearmongering with sensible rules is hard. Not because it's actually hard, but because that's the point of the FUD: to create the perception of a problem where there isn't one. With a few exceptions, a WAF doesn't provide meaningful benefits. It only serves to lower the number of log entries; it rarely ever reduces the actual risk.
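The "only ban traffic once you find it to be abusive, and then only by subnet" approach is simple to sketch with Python's standard `ipaddress` module. The /24 (IPv4) and /64 (IPv6) defaults are assumptions: IPv6 clients usually control at least a whole /64, so banning a single IPv6 address achieves little, but how wide to go is a judgment call, and the provider-wide blocks described earlier in the thread are far larger.

```python
import ipaddress
from typing import Set, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class SubnetBanList:
    """Ban the whole subnet around an address once it has proven abusive."""

    def __init__(self, v4_prefix: int = 24, v6_prefix: int = 64):
        self.v4_prefix = v4_prefix
        self.v6_prefix = v6_prefix
        self._networks: Set[Network] = set()

    def ban(self, ip: str) -> Network:
        addr = ipaddress.ip_address(ip)
        prefix = self.v4_prefix if addr.version == 4 else self.v6_prefix
        # strict=False accepts a host address and returns its enclosing network.
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        self._networks.add(net)
        return net

    def is_banned(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        # Membership is simply False when the IP versions differ.
        return any(addr in net for net in self._networks)
```

Banning `203.0.113.5` blocks `203.0.113.0/24` and nothing else, so the rest of the internet keeps working; the linear scan is fine for a short list, and a larger one would want a prefix tree or the firewall's own set types.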
ludjer|1 year ago
jillyboel|1 year ago