top | item 42955454


windsignaling | 1 year ago

As a website owner and VPN user I see both sides of this.

On one hand, I get the annoying "Verify" box every time I use ChatGPT (and now, due to its popularity, DeepSeek as well).

On the other hand, without Cloudflare I'd be seeing thousands of junk requests and hacking attempts every day, people attempting credit card fraud, etc.

I honestly don't know what the solution is.


rozap|1 year ago

What is a "junk" request? Is it hammering an expensive endpoint 5000 times per second, or just somebody using your website in a way you don't like? I've also been on both sides of it (on-call at 3am getting dos'd is no fun), but I think the danger here is that we've gotten to a point where a new google can't realistically be created.

The thing is that these tools are generally used to further entrench power that monopolies, duopolies, and cartels already have. Example: I've built an app that compares grocery prices as you make a shopping list, and you would not believe the extent that grocers go to to make price comparison difficult. This thing doesn't make thousands or even hundreds of requests - maybe a few dozen over the course of a day. What I thought would be a quick little project has turned out to be wildly adversarial. But now spite driven development is a factor so I will press on.

It will always be a cat and mouse game, but we're at a point where the cat has a 46 billion dollar market cap and handles a huge portion of traffic on the internet.

jeroenhd|1 year ago

I've had such bots on my server. Some Chinese Huawei bot as well as an American one.

They ignored robots.txt (claimed not to, but I blacklisted them there and they didn't stop) and started randomly generating image paths. At some point /img/123.png became /img/123.png?a=123 or whatever, and they just kept adding parameters and subpaths for no good reason. Nginx dutifully ignored the extra parameters and kept sending the same image files over and over again, wasting everyone's time and bandwidth.

I was able to block these bots by just blocking the entire IP range at the firewall level (for Huawei I had to block all of China Telecom and later a huge range owned by Tencent for similar reasons).
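The same range-level ban can be sketched without a firewall appliance. A minimal Python version using the stdlib `ipaddress` module; the CIDR ranges here are documentation placeholders, not the actual China Telecom or Tencent allocations:

```python
import ipaddress

# Hypothetical ban list of CIDR ranges (RFC 5737 documentation
# addresses used as placeholders, not real provider allocations).
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client falls inside any banned range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

In practice you would push the same ranges into nftables or your cloud firewall rather than check them per request in the application.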

I have lost all faith in scrapers. I've written my own scrapers too, but almost all of the scrapers I've come across are nefarious. Some scour the internet searching for personal data to sell, some look for websites to send hack attempts at to brute force bug bounty programs, others are just scraping for more AI content. Until the scraping industry starts behaving, I can't feel bad for people blocking these things even if they hurt small search engines.

makeitdouble|1 year ago

> somebody using your website in a way you don't like?

This usually includes people making a near-realtime updated perfect copy of your site and serving that copy for either scam or middle-manning transactions or straight fraud.

Having a clear category of "good bots" from verified or accepted companies would help for these cases. Cloudflare has such a system, I think, but then a new search engine would have to go to each and every platform provider to make deals, and that also sounds impossible.
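Some crawlers can already be verified without any central registry: Google, for instance, documents a double-DNS check (reverse-resolve the crawler's IP, confirm the hostname is under an expected domain, then forward-resolve it back to the same IP). A sketch with the resolvers passed in as functions so it can be exercised without live DNS; the live wiring in the trailing comment uses the stdlib `socket` calls:

```python
from typing import Callable, Iterable

def verify_crawler(
    ip: str,
    reverse_lookup: Callable[[str], str],
    forward_lookup: Callable[[str], Iterable[str]],
    allowed_suffixes: tuple = (".googlebot.com", ".google.com"),
) -> bool:
    """Double-DNS check: the PTR record must land in an allowed
    domain, and that hostname's A records must include the IP."""
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False
    if not host.endswith(allowed_suffixes):
        return False
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False

# Against live DNS, plug in the stdlib resolvers:
#   import socket
#   verify_crawler(ip,
#                  lambda i: socket.gethostbyaddr(i)[0],
#                  lambda h: socket.gethostbyname_ex(h)[2])
```

The forward re-resolution step is what stops anyone from simply setting a fake `googlebot.com` PTR record on their own IP space.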

OptionOfT|1 year ago

> and you would not believe the extent that grocers go to to make price comparison difficult. This thing doesn't make thousands or even hundreds of requests - maybe a few dozen over the course of a day.

It's gonna get even worse. Walmart & Kroger are implementing digital price tags, so whatever you see on the website will probably (purposefully?) be out of date by the time you get to the store.

Stores don't want you to compare.

ohcmon|1 year ago

Actually, I think creating a Google alternative has never been as doable as it is today.

to11mtm|1 year ago

I'll give a fun example from the past.

I used to work at a company that did auto inspections. (E.g. if you turned a lease in, did a trade-in on a used car, private party, etc.)

Because of that, we had a server that contained 'condition reports', as well as the images that went through those condition reports.

Mind you, sometimes condition reports had to be revised. Maybe a photo was bad, maybe the photos were in the wrong order, etc.

It was a perfect storm:

- The Image caching was all inmem

- If an image didn't exist, the server would error with a 500

- IIS was set up such that too many errors caused a recycle

- Some scraper was working off a dataset (that ironically was 'corrected' in an hour or so) but contained an image that did not exist.

- The scraper, instead of eventually 'moving on' would keep retrying the URL.

It was the only time that org had an 'anyone who thinks they can help solve please attend' meeting at the IT level.

> and you would not believe the extent that grocers go to to make price comparison difficult. This thing doesn't make thousands or even hundreds of requests - maybe a few dozen over the course of a day.

Very true. I'm reminded of Oren Eini's tale of building an app to compare grocery prices in Israel, where the government apparently mandated that supermarket chains publish their prices [0]. On top of even that data-sharing mandate appearing to hit the wrong over/under for formatting, there's the constant issue of 'incomparabilities'.

And it's weird, because it immediately triggered memories of how, 20-ish years ago, one of the most accessible Best Buys was across the street from a Circuit City, but good luck price matching, because the stores all happened to sell barely different laptops/desktops (e.g. up the storage but use a lower-grade CPU) so that nobody really had to price match.

[0] - https://ayende.com/blog/170978/the-business-process-of-compa...

tempodox|1 year ago

+1 for spite-driven development.

gjsman-1000|1 year ago

Simple: We need to acknowledge that the vision of a decentralized internet as it was implemented was a complete failure, is dying, and will probably never return.

Robots went out of control, whether malicious, AI scrapers, or the Clearview surveillance kind; users learned not to trust random websites; SEO spam ruined search, the only thing that made a decentralized internet navigable; nation-state attacks became a common occurrence; and people prefer a few websites that do everything (Facebook becoming an eBay competitor). Even if it were possible to set rules banning Clearview or AI training, no nation outside your own will follow them; an issue which even becomes a national security problem (are you sure, Taiwan, that China hasn't profiled everyone on your social media platforms by now?)

There is no solution. The dream itself was not sustainable. The only solution is either a global memorandum of understanding which everyone respectfully follows (wishful thinking, never happening); or splinternetting into national internets with different rules and strong firewalls (which is a deal with the devil, and still admitting the vision failed).

stevenAthompson|1 year ago

I hate that you're right.

To make matters worse, I suspect that not even a splinternet can save it. It needs a new foundation, preferably one that wasn't largely designed before security was a thing.

Federation is probably a good start, but it should be federated well below the application layer.

supportengineer|1 year ago

A walled garden where a real, vetted human being is responsible for each network device. It wouldn't scale, but it could work locally.

benatkin|1 year ago

Luckily the decentralization community has always been decentralized. There are plenty of decentralized networks to support.

Aeolun|1 year ago

The great firewall, but in reverse.

inetknght|1 year ago

> On the other hand, without Cloudflare I'd be seeing thousands of junk requests and hacking attempts everyday, people attempting credit card fraud, etc.

Yup!

> I honestly don't know what the solution is.

Force law enforcement to enforce the laws.

Or else, block the countries that don't combat fraud. That means... China? Hey, isn't there a "trade war" being "started"? It sure would be fortunate if China (and certain other fraud-friendly countries around Asia/Pacific) were blocked from the rest of the Internet until/unless they provide enforcement and/or compensation for their fraudulent use of technology.

marginalia_nu|1 year ago

A lot of this traffic is bouncing all over the world before it reaches your server. Almost always via at least one botnet. Finding the source of the traffic is pretty hopeless.

jeroenhd|1 year ago

A lot of the fake browser traffic I'm seeing is coming from American data centres. China plays a major part, but if we're going by bot traffic, America will end up on the ban list pretty quickly.

jacobr1|1 year ago

Slightly more complicated, because a ton of the abuse comes from IPs located in Western countries, explicitly to evade fraud and abuse detection. Now, you can go after the Western owners of those systems (and all the big ones do have large abuse teams to handle reports), but enforcement has a much higher latency. To be effective you would need a much more aggressive system: stronger KYC, and changes in laws to allow for less due process and more "guilty by default" systems in which you then need to prove your innocence.

RIMR|1 year ago

A wild take only possible if you don't understand how the Internet works.

EVa5I7bHFq9mnYK|1 year ago

Credit card fraud exists because credit card companies can't (or won't) implement elementary security measures. There should be a requirement to confirm every online payment, but many sites today require just a CC number + date + code + ZIP, with no additional confirmation. I can't call that anything other than complicity in the crime.

il-b|1 year ago

Lost sales due to 2FA are greater than losses due to refunds.

BytesAndGears|1 year ago

Something like iDeal, which is a payment processing system in the Netherlands.

It works so well and is very secure. You get to the checkout page on a website, click a link. If you’re on your phone, it hotlinks to open your banking app. If you’re on desktop, it shows a QR code which does the same.

When your bank app opens, it says “would you like to make this €28 payment to Business X?” And you click either yes or no on the app. You never even need to enter a card in the website!

You can also send money to other people instantly the same way, so it’s perfect for something like buying a used item from someone else.

Plus the whole IBAN system which makes it all possible!

carlosjobim|1 year ago

What kind of fraud protection does iDeal have for customers?

kobalsky|1 year ago

> people attempting credit card fraud

this is wrong.

If someone can use your site, they can use stolen cards, and bots doing this will not be stopped by Cloudflare.

Cloudflare only raises the cost of doing it; it may make scraping a million product pages unprofitable, but that doesn't apply to CC fraud yet.

hecanjog|1 year ago

They might be talking about people who are trying to automate the testing of hundreds of stolen credit cards with small purchases to see if they are still working. This is basically why we ended up using Cloudflare at work.

bragr|1 year ago

>that doesn't apply to cc fraud yet

It stops "card testing", where someone has bought or stolen a large number of cards and needs to verify which are still good. The usual technique is to cycle through all the cards on a smaller site selling something cheap (a $3 ebook, for example). The problem is that the high volume of fraud in a short time span will often get the merchant account or payment gateway account shut down, cutting off legitimate sales.
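The pattern described (many low-value charge attempts from one source in a short window) is straightforward to flag. A toy sliding-window detector; the thresholds and the idea of keying on source IP are illustrative, not anyone's production rules:

```python
from collections import deque

class CardTestingDetector:
    """Flag a source that makes many low-value charge attempts
    inside a short time window (all thresholds illustrative)."""

    def __init__(self, max_attempts=5, window_s=60, low_value=5.0):
        self.max_attempts = max_attempts
        self.window_s = window_s
        self.low_value = low_value
        self.attempts = {}  # source ip -> deque of attempt timestamps

    def record(self, ip: str, amount: float, now: float) -> bool:
        """Record a charge attempt; return True if this source now
        looks like it is card testing."""
        if amount > self.low_value:
            return False  # only small charges count toward the pattern
        q = self.attempts.setdefault(ip, deque())
        q.append(now)
        # Drop attempts that have aged out of the window.
        while q and now - q[0] > self.window_s:
            q.popleft()
        return len(q) >= self.max_attempts
```

A real system would key on more than IP (card BIN, device fingerprint, merchant), since testers rotate through proxies.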

As a consumer, you should also be suspicious of a mysterious low value charge on your card because it could be the prelude to much larger charges.

markisus|1 year ago

If I were hosting a web page, I would want it to be able to reach as many people as possible. So in choosing between CDNs, I would choose the one that provides greater browser compatibility, all other things equal. So in principle, the incentives are there for Cloudflare to fix the issue. But the size of the incentive may be the problem. Not too many customers are complaining about these non-mainstream browsers.

porty|1 year ago

In that case you can turn off / not turn on the WAF feature(s) of Cloudflare - it's optional and configured by the webmaster.

Aachen|1 year ago

> If I were hosting a web page, I would want it to be able to reach as many people as possible. So in choosing between CDNs

I host many webpages and this is exactly it. Anyone is welcome to use the websites I host. There is no CDN; your TLS session terminates at the endpoint (end-to-end encryption). It may be a bit slower for pages with static assets if you're coming from outside of Europe, but the pages are light anyway (no 2 MB JavaScript blobs).

lynndotpy|1 year ago

> On the other hand, without Cloudflare I'd be seeing thousands of junk requests and hacking attempts every day, people attempting credit card fraud, etc.

> I honestly don't know what the solution is.

The solution is good security-- Cloudflare only cuts down on the noise. I'm watching junk requests and hacking attempts flow through to my sites as we speak.

lynndotpy|1 year ago

Whoops-- this was a draft I didn't intend to post in this state. I must have fatfingered the "reply" button somehow. Alas, too late to edit or delete now.

Cloudflare cuts down on the noise, but it also does the work of deterring scrapers and people who re-sell your site wholesale, and cutting down on the noise also means cutting down on the cost of network requests.

It can also help where security is lax. You should have measures against credential stuffing, but if you don't, Cloudflare might prevent (some of) your users from being hacked. Which isn't good enough, but is better than no mitigation at all.
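The kind of in-house measure being described can be quite small. A minimal per-account lockout against credential stuffing; the thresholds are illustrative, and a production version would also throttle per source IP and add backoff:

```python
class LoginThrottle:
    """Lock an account after repeated failed logins (a minimal
    credential-stuffing mitigation; thresholds are illustrative)."""

    def __init__(self, max_failures=5, lockout_s=300):
        self.max_failures = max_failures
        self.lockout_s = lockout_s
        self.failures = {}      # account -> consecutive failure count
        self.locked_until = {}  # account -> unlock timestamp

    def allow_attempt(self, account: str, now: float) -> bool:
        """True if the account is not currently locked out."""
        return now >= self.locked_until.get(account, 0.0)

    def record_failure(self, account: str, now: float) -> None:
        n = self.failures.get(account, 0) + 1
        self.failures[account] = n
        if n >= self.max_failures:
            self.locked_until[account] = now + self.lockout_s

    def record_success(self, account: str) -> None:
        # A good login clears the failure history.
        self.failures.pop(account, None)
        self.locked_until.pop(account, None)
```

The clock is passed in explicitly (`now`) rather than read from `time.time()`, which keeps the logic deterministic and testable.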

I don't use Cloudflare personally, but I won't dismiss it wholesale. I understand why people use it.

carlosjobim|1 year ago

>Cloudflare only cuts down on the noise.

That sounds like the solution, that sounds like good security.

boomboomsubban|1 year ago

>On one hand, I get the annoying "Verify" box every time I use ChatGPT (and now due its popularity, DeepSeek as well).

Though annoying, it's tolerable. It seemed like a fair solution. Blocking doesn't.

chaoskitty|1 year ago

Simple: Don't look at the logs.

Bots are a fact of life. Secure your site properly, follow good practices, set up notifications for important things, log stuff, but don't look at the logs unless you have a reason to look at the logs.

Having run web servers forever, this is simply normal. What's not normal is blindly trusting a megacorporation to make my logs quiet. What're they doing? Who are they blocking? What guidelines do they use? Nobody, except them, knows.

It's why I self-host email. Sure, you might feel safe because most people use Gmail or Outlook, and therefore if there are problems, you can point the finger at them, but what if you want to discuss spam? Or have technical discussions about Trojans and viruses? Or you need to be 100% absolutely certain that email related to specific events is delivered, with no exceptions? You can't do that with Gmail / Outlook, because they have filters that you can't see and you can't control.

buyucu|1 year ago

My VPN/Fileserver VPS is not behind Cloudflare, and I haven't had any trouble for years. Only the SSH port is accessible from outside (which is probably not even necessary), with password login disabled. I use fail2ban and a few other extra layers of security.
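fail2ban's core idea (count failed logins per source IP in the auth log, ban anything over a threshold) fits in a few lines. A sketch matching the common OpenSSH "Failed password" log line; the threshold is illustrative and real fail2ban also handles time windows and unbanning:

```python
import re
from collections import Counter

# Matches the standard OpenSSH failure line, e.g.
# "sshd[123]: Failed password for root from 203.0.113.5 port 22 ssh2"
FAILED_RE = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")

def ips_to_ban(log_lines, threshold=3):
    """Return the set of source IPs with at least `threshold`
    failed password attempts in the given log lines."""
    hits = Counter()
    for line in log_lines:
        m = FAILED_RE.search(line)
        if m:
            hits[m.group(1)] += 1
    return {ip for ip, n in hits.items() if n >= threshold}
```

The output set would then be fed to the firewall (the part fail2ban automates with its actions).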

buyucu|1 year ago

Credit cards are an ancient insecure technology that needs to go away. There are systems in Europe like iDEAL that are much more 21st century appropriate.

grayhatter|1 year ago

> I honestly don't know what the solution is.

well, for starters, if you're using cloudflare to block otherwise benign traffic, just because you're worried about some made... up....

> On the other hand, without Cloudflare I'd be seeing thousands of junk requests and hacking attempts everyday, people attempting credit card fraud, etc.

well damn, if you're using it because otherwise you'd be exposing your users to active credit card fraud... I guess the original suggestion to only ban traffic once you find it to be abusive, and then only by subnet, doesn't really apply for you.

I wanna suggest using this as an excuse to learn how not to be a twat (the direction CF is moving towards more and more), where for most sites 20% of the work will get you 80% of the results... but dealing with CC fraud, your adversaries are already on the more advanced side, and that becomes a lot harder to prevent... rather than catch and stop after the fact.

Balancing the pervasive fear mongering with sensible rules is hard. Not because it's actually hard, but because that's the point of the FUD. To create the perception of a problem where there isn't one. With a few exceptions, a WAF doesn't provide meaningful benefits. It only serves to lower the number of log entries, it rarely ever reduces the actual risk.

ludjer|1 year ago

I used to work at one of the top 1000 most-visited websites, and we had massive bot issues where 60% of our traffic was bots; we had to implement solutions similar to Cloudflare to reduce them. Also, with the rise of AI, it's become even more important, since a lot of AI data scraping companies do not respect robots.txt.

jillyboel|1 year ago

accept reality and design your API so it's not a problem
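One concrete way to design the API so scrapers aren't a problem is to rate-limit rather than block outright. A classic token bucket, sketched with an injected clock so the refill logic is easy to test; the rate and burst values are illustrative:

```python
class TokenBucket:
    """Token-bucket rate limiter: each request spends a token,
    and tokens refill at a fixed rate up to a burst capacity."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)  # start full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` may proceed."""
        # Refill based on elapsed time, capped at the burst capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Keeping one bucket per API key (or per IP) lets polite, low-volume clients like a price-comparison script through while still capping anyone hammering an expensive endpoint.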