item 29119890

biosed|4 years ago

I used to lead Sys Eng for a FTSE 100 company. Our data was valuable, but only for a short amount of time. We were constantly scraped, which cost us in hosting etc. We even saw competitors use our figures (good ones used them to offset their prices, bad ones just used them straight). As the article suggests, we couldn't block mobile operator IPs; some had over 100k customers behind them. Forcing users to log in did little, as the scrapers just created accounts. We had a few approaches that minimised the scraping:

Rate Limiting by login,

Limiting data to known workflows ...
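A minimal sketch of what per-login rate limiting could look like, as a token bucket keyed on the account rather than the IP (the class and parameters here are hypothetical, not the commenter's actual system):

```python
import time

class LoginRateLimiter:
    """Token bucket per login: each account earns `rate` requests/second,
    up to a burst of `capacity` requests."""

    def __init__(self, rate=1.0, capacity=10):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}  # login -> (tokens_remaining, last_refill_time)

    def allow(self, login, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(login, (self.capacity, now))
        # Refill tokens for the time elapsed since the last request.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[login] = (tokens - 1, now)
            return True
        self.buckets[login] = (tokens, now)
        return False
```

Keying on the login rather than the IP sidesteps the mobile-operator problem above: one NATed IP with 100k customers behind it never trips the limit, while a single scraping account does.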

But our most fruitful effort was when we removed limits and started giving "bad" data. By bad I mean altering the price up or down by a small percentage. This hit them in the pocket but again, it wasn't a silver bullet. If the customer made a transaction on the altered figure, we informed them and took it at the correct price.
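The poisoning step could be as simple as a deterministic per-client skew, so the same suspected scraper always sees the same consistent (but wrong) figure. A sketch under that assumption, with hypothetical names and a made-up 3% bound:

```python
import hashlib

def poisoned_price(true_price_pence, client_id, max_pct=3.0):
    """Skew a price up or down by at most max_pct percent, keyed on the
    client id so repeated requests return the same altered figure."""
    digest = hashlib.sha256(client_id.encode()).digest()
    # Map the first two digest bytes (0..65535) onto [-max_pct, +max_pct].
    raw = int.from_bytes(digest[:2], "big")
    skew_pct = (raw / 65535 * 2 - 1) * max_pct
    return round(true_price_pence * (1 + skew_pct / 100))
```

Determinism matters here: randomising per request would let a scraper detect the poisoning by fetching the same item twice and comparing.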

It's a cool problem to tackle but it is just an arms race.


rootusrootus|4 years ago

I know a guy at Nike that had to deal with a similar problem. As I recall, they basically gave in -- instead of trying to fight the scrapers, they built them an API so they'd quit trashing the performance of the retail site with all the scraping.

matheusmoreira|4 years ago

Yes. That's exactly what everyone should do.

chadwittman|4 years ago

The real Jedi move

gonzo41|4 years ago

I think there's an opportunity for a new JS framework to have something like a randomly generated DOM that always displays the page and its elements the same to a human but constantly breaks paths for computers.

Like displaying a table with semantic elements, then with divs, then using an iframe with CSS grid and floating the values over the top.
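The core of the idea can be sketched server-side: render each data row with freshly randomised class names and a random wrapper depth, so CSS and XPath selectors differ on every page load while the visible output is unchanged. This is an illustrative sketch, not an existing framework; all names are made up:

```python
import random
import secrets

def randomized_row(cells, rng=None):
    """Render one data row with per-response random class names and a
    random number of wrapper <div>s. Selectors like div.price or
    /div[2]/span[1] break between responses; the text a human sees
    stays the same."""
    rng = rng or random.Random(secrets.randbits(64))

    def cls():
        return "c" + secrets.token_hex(4)  # e.g. "c3fa91b02"

    html = "".join(f'<span class="{cls()}">{c}</span>' for c in cells)
    # Wrap in 1-3 nested divs with throwaway class names.
    for _ in range(rng.randint(1, 3)):
        html = f'<div class="{cls()}">{html}</div>'
    return html
```

A scraper can still key on the text content itself, of course, which is the "arms race" point made elsewhere in the thread.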

This almost seems like a problem for AI to solve.

endymi0n|4 years ago

> It's a cool problem to tackle but it is just an arms race.

Plus, it's one you're going to lose. I was once asked at an All-Hands why we don't defend ourselves against bots even more vigorously.

My answer was: "Because I don't know how to build a publicly available website that I could not scrape myself if I really wanted to."

wolverine876|4 years ago

> But our most fruitful effort was when we removed limits and started giving "bad" data. By bad I mean altering the price up or down by a small percentage. ... If the customer made a transaction on the altered figure, we informed them and took it at the correct price.

Is that legal? It would be a big blow to trust if I was the customer, but that's without knowing what you were selling and in what market.

killingtime74|4 years ago

It’s legal if it’s in the contract. It’s standard for contracts to allow for mistakes and confirmation of prices.

ransom1538|4 years ago

I love the honeypot approach. Put tons of valuable-looking hrefs on the page that are invisible (via CSS) that the scraper would find. Then just rate limit that IP address and randomize the data coming back. Profit.
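A minimal sketch of that honeypot idea, with hypothetical paths and helpers (no real framework assumed): emit hidden links no human would follow, then flag any IP that fetches one.

```python
import secrets

HONEYPOT_PATHS = set()
flagged_ips = set()

def honeypot_link():
    """Emit an invisible link a human never clicks; remember its path."""
    path = f"/deals/{secrets.token_hex(8)}"
    HONEYPOT_PATHS.add(path)
    return (f'<a href="{path}" style="display:none" '
            f'tabindex="-1">special offer</a>')

def is_flagged(path, ip):
    """Flag any IP that fetches a honeypot path. Flagged IPs would then
    be rate limited or served randomized data."""
    if path in HONEYPOT_PATHS:
        flagged_ips.add(ip)
    return ip in flagged_ips
```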

histriosum|4 years ago

I think this falls into the "arms race" trap, though. If you can make an href invisible via CSS, then the scraper can certainly be written to understand CSS, and thus filter out the invisible hrefs.