Authentication and the Have I Been Pwned API

209 points| Rels | 6 years ago |troyhunt.com | reply

123 comments

[+] zaroth|6 years ago|reply
All this seems to be hinting more than ever, that the time to provide these results directly and exclusively to the email address being queried is approaching.

Why is this API being abused? Because it provides valuable information—which took a significant amount of effort to curate—about an email address.

The list of services which have lost my (hashed or not) password at some point ever in the past eventually turns into a list of every service I’ve ever subscribed to.

Whether or not it’s possible to scrape that information together, is it really something that should be available to pull over an API for a million emails a month?

Note this is very different information than the password breach count, which gives you an approximate count of how many times a given password has been breached, and works as a proxy for password strength without disclosing any PII.

[+] diminoten|6 years ago|reply
Sorry, but the cat is out of the bag. HIBP is evening the playing field, making the data less valuable to those who have the skills to collect it.

It's the same thing as responsible/full disclosure; by making this information available to anyone (publish a vulnerability), you greatly reduce the power of those who have the skills to collect it anyway (the person who found the 0day).

So yes, this information needs to be available, or it'll only be some people who have it, not none, and those few people who do have it will be 10x stronger than they are now.

This is the old Antisec debate all over again, let's skip to the part where we end up agreeing generally that disclosure is better, okay? No need to relive 2009 or whatever.

[+] jtbayly|6 years ago|reply
You’ve convinced me. I didn’t know anybody could look up my info. I only want it for myself.

Only thing is, there are a couple of old email addresses I used to use that I don’t have access to anymore. I guess I just need to shrug at that at this point.

[+] massaman_yams|6 years ago|reply
Bulk emailing notifications to all affected addresses would be a deliverability nightmare, and would require manual intervention at most ISPs to prevent these messages from being blocked, which said ISPs may or may not be willing to do.

Just think of the number of clueless users who would mark such a notification as spam, and the number of old, dead addresses, some of which are now spamtraps.

edit: clarify bulk vs. individual notifications

[+] theandrewbailey|6 years ago|reply
> Making an authenticated call is a piece of cake, you just add an hibp-api-key header as follows:

> GET https://haveibeenpwned.com/api/v3/breachedaccount/test@examp...

> hibp-api-key: [your key]

Wouldn't the standard Authorization: Bearer <key> header be more compliant?
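
For anyone who wants to see the quoted request in code, here is a minimal Python sketch that builds it without sending it. The `hibp-api-key` header name and the `/api/v3/breachedaccount/` path come from the quote above; the function name and its arguments are assumptions made for illustration, and the account is passed in by the caller rather than guessed from the truncated URL.

```python
import urllib.parse
import urllib.request

# Path taken from the quoted example; the account is appended to it.
API_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_breach_request(account: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a GET request carrying the custom key header."""
    # Percent-encode the account so characters such as "@" are safe in the path.
    url = API_BASE + urllib.parse.quote(account, safe="")
    request = urllib.request.Request(url, method="GET")
    # Custom header from the quoted post, used instead of "Authorization: Bearer".
    request.add_header("hibp-api-key", api_key)
    return request
```

Passing the result to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can be read and tested without a key.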

[+] floatingatoll|6 years ago|reply
See also elsethread about "not a token" — but, also:

> There's a couple of these and they're largely due to me trying to make sure I get this feature out as early as possible and continue to run things on a shoestring cost wise

Using the Authorization header can cause significant problems with both clients and servers, and also might unintentionally permit browsers to directly query the server if they can be convinced to provide a bearer token.

Using a custom HTTP header sidesteps both client and server issues altogether and closes the door on browsers direct-querying the API, which could be considered a positive by the site operator.

[+] jawns|6 years ago|reply
I wish the post made more clear, ideally right at the top, that the new fee applies only to third-party apps that access the HIBP API, not to end users whose email addresses are being checked against the API. You have to read through the post a bit before that becomes clear.

Individual users who just want to figure out whether they've been pwned will not have to pony up the cash. They can still visit https://haveibeenpwned.com and get that information for free.

[+] pixelbath|6 years ago|reply
Perhaps it could be made more clear, but from the post I thought it was very apparent he was only talking about API abuse; most of the introductory text was concerning rate-limiting.
[+] nathan_f77|6 years ago|reply
It would also be great to emphasize that this only applies to the HIBP API, and the Pwned Passwords API will still be free. (It's mentioned about half-way through the article.)
[+] badrabbit|6 years ago|reply
Domain-wide breach searches for a domain you control still appear to work for free as well.
[+] JoshTriplett|6 years ago|reply
> Late last year after seeing a similar pattern with a well-known hosting provider, I reached out to them to try and better understand what was going on. I provided a bunch of IP addresses which they promptly investigated and reported back to me on

I'd love to know how to get a hosting provider to actually answer such requests. (I hope the answer isn't just "be high profile". I'm hoping the answer is more like "know the right people to contact or the right phrasing to get through first-line support".)

I've reached out to hosting providers before, providing clear logs of malicious activity, and either gotten no answer, or occasionally gotten a rote "prove it came from us" that would trivially have been answered by actually reading the logs.

(Examples of such logs include SSH brute-forcing attempts, HTTP logs showing attempts to exploit web-app security holes, and spam headers showing the IP that contacted my provider's mail server.)

I've mostly stopped even trying, due to the near-zero response rate.

In an ideal world, I'd love to see reports like this lead to "we can confirm and we've shut down outbound traffic from that system until it gets fixed".

[+] novaleaf|6 years ago|reply
I feel his pain.

I run a SaaS with what I think is a pretty generous free tier (PhantomJsCloud dot com), and yeah, I have numerous people from all over the world doing their best to shit all over it:

- switching IP addresses every request to circumvent "demo user" rate limiting

- creating upwards of 100 fake accounts to get free credits ($0.05/day each account)

- embedding API calls into their webpages so their users' IP addresses are used for "demo user" credits

- API driven credit cards and hijinks around that.

- using url shorteners to circumvent blacklisted domains

I'm not sure if it's a case of people being incapable of paying with credit cards, or if their ethics just allow stealing anything that's not bolted down.

I don't mind people signing up with a burner email address, but unfortunately most of these abusers do too. I am going to be banning all throwaway email accounts soon. And if that doesn't work (which it probably won't) I'm going to have to kill my free tier.

[+] peterwwillis|6 years ago|reply
Can you do what the big cloud providers do, and demand a "real" phone number be verified for sign-up? Not impossible to beat, but more costly. Or maybe there's a market for paying customers somewhere between your free and paid tiers?
[+] ksahin|6 years ago|reply
"After 4 and a bit years, by far and away the most popular method with an uptake of more than 90% is versioning via the URL. So that's all V3 supports. I don't care about the philosophical arguments to the contrary, I care about working software and in this case, the people have well and truly spoken. I don't want to have to maintain code and provide support for something people barely use when there's a perfectly viable alternative."

Well said!

[+] mehrdadn|6 years ago|reply
Funny thing is here I am wondering why he didn't pass a query parameter instead of altering the path or adding a header to version the API... does anyone know? It has the advantage of being clickable while not implying the resource is different.
[+] elamje|6 years ago|reply
I wonder if this actually has more to do with trying to sell HIBP than with abuse. He just announced that he was selling HIBP a month or two ago. Presumably, if he can get people to pay a nominal fee now for access to the API, it makes HIBP much more valuable to a potential acquirer. If you can prove people are willing to pay $.01/month for a subscription, you can assume (as a potential acquirer) that they would pay $.02/month in the future. It's much harder to sell something that is completely free because of the risk that monetization completely fails later.

In previous blog posts he mentions that he gets 99.x% cache hits on Cloudflare, then also has a cache on his Azure service. He is sponsored by Cloudflare and Microsoft and doesn’t pay for the service unless something has changed since a few months ago. If that is still true, I don’t fully buy that he is actually spending money on Microsoft api hits as the post claims.

But, I like Troy and HIBP, so maybe I’m just too much of a skeptic :-)

[+] skybrian|6 years ago|reply
Very understandable, and also yet another example of why we can't have nice services on the Internet. Traffic from bad actors pushes anyone offering an API in a similar direction, or discontinuing it altogether.
[+] birdman3131|6 years ago|reply
I find it ironic that a site dedicated to seeing if you have been compromised has no method of changing your API key if it is compromised.
[+] incidentnormal|6 years ago|reply
Even though he explained why (it is likely a forthcoming feature), I did enjoy this comment.
[+] londons_explore|6 years ago|reply
Who bruteforce scrapes the HIBP API across many IP addresses when they could just download the original leaked username & password databases?

There's even a torrent file of all of them that I won't link here...

[+] sleavey|6 years ago|reply
Maybe spammers check if an email address is legitimate by checking HIBP. A pretty significant fraction of legitimate email addresses probably do show up in at least one list.
[+] rolltiide|6 years ago|reply
Torrent file of ALL leaks?

I usually only see some

And when people ask about a latest leak, others disingenuously reply “just check YOUR email on HIBP what kind of person needs the database”

[+] abathur|6 years ago|reply
The compromised servers might be doing some primary work to which these queries are incidental, rather than for the purpose of scraping the database.

In such a case, the API may be saving them from needing to build infrastructure to accumulate the database and either distribute slices of the data or host their own API for their distributed software to use.

While the database may be valuable, they'd still have to invest a lot of time and some amount of money, face the same need to secure their API against exploitation by others, leave a stronger footprint leading back to themselves, and have to depend on a service that is more likely to get flagged as a sure sign of suspicious activity than HIBP...

[+] floatingatoll|6 years ago|reply
Why download anything when you can simply query a public endpoint for free?
[+] yjftsjthsd-h|6 years ago|reply
Obvious next concern: Will bad actors just scrape the website? Putting authentication and payments in front of that rather defeats the entire point, and without that you're back to rate limiting which is exactly what has just been declared as a failed approach.
[+] abathur|6 years ago|reply
Probably.

But you can justify a significantly more restrictive rate limit for a website form intended for individual mortal humans to check their own personal email addresses for breaches.

The API has to support request frequencies for legitimate usage that are obviously exploitable at a sufficiently small scale to attract a few exploiters...
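
To make the contrast concrete, here is a rough token-bucket sketch in Python, with one bucket per client and different settings for a human-facing form versus an API. This is not how HIBP implements its limits; the class names and all the numbers are hypothetical, chosen only to illustrate the idea.

```python
class TokenBucket:
    """Token bucket: bursts up to `capacity`, refilled at `rate_per_second`."""

    def __init__(self, capacity: float, rate_per_second: float) -> None:
        self.capacity = capacity
        self.rate_per_second = rate_per_second
        self.tokens = capacity
        self.last_seen = None  # time of the previous request, if any

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) may proceed."""
        if self.last_seen is not None:
            # Refill in proportion to the elapsed time, capped at capacity.
            elapsed = max(0.0, now - self.last_seen)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client key (an IP address, an API key, ...)."""

    def __init__(self, capacity: float, rate_per_second: float) -> None:
        self.capacity = capacity
        self.rate_per_second = rate_per_second
        self.buckets = {}

    def allow(self, client: str, now: float) -> bool:
        bucket = self.buckets.setdefault(
            client, TokenBucket(self.capacity, self.rate_per_second)
        )
        return bucket.allow(now)


# Hypothetical numbers: a strict limit for the human-facing form,
# a looser one for API clients.
web_form = PerClientLimiter(capacity=2, rate_per_second=0.1)
api = PerClientLimiter(capacity=10, rate_per_second=1.0)
```

The weak point is the same one the article describes: if the client key is an IP address, someone with many addresses simply gets many buckets.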

[+] ec109685|6 years ago|reply
Or scrape websites that provide a proxy to the API (e.g. the cloudflare worker he described).
[+] lightedman|6 years ago|reply
"Will bad actors just scrape the website?"

That's already been happening. Many simply use HIBP as a starting point to pwning someone's online accounts. Now, Troy is just going to attempt to really profit off of the actions of those bad actors.

[+] zxcvbn4038|6 years ago|reply
Adding authentication so you know who is using your service is reasonable, but not sure why author is complaining about 1.2M requests per day, that is only 14 requests per second on average.
[+] floatingatoll|6 years ago|reply
They consider those requests to be "bad actors". It's not necessarily about the volume of traffic, it's that they are compromised VPSes configured to perform unknown malicious activity that takes advantage of a free endpoint in support of unknown malicious intent. See also "Why do bad actors abuse this endpoint?" discussion elsethread: https://news.ycombinator.com/item?id=20480230
[+] mtmail|6 years ago|reply
Near the top of the article it says peak 14k per minute (233 per second) and it sounds like demand is ever growing.
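
For reference, the two figures being compared work out as follows. This is just the arithmetic from the numbers quoted in this subthread, with variable names of my own choosing.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# 1.2M requests per day, averaged over the whole day.
average_per_second = 1_200_000 / SECONDS_PER_DAY
# Peak of 14k requests per minute.
peak_per_second = 14_000 / 60

print(f"average: {average_per_second:.1f} req/s")
print(f"peak:    {peak_per_second:.1f} req/s")
```

So both comments are right: roughly 14 per second averaged over a day, and roughly 233 per second at the peak.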
[+] w8rbt|6 years ago|reply
I obtain the SHA1 hashes published by HIBP, load them into a bloom filter and use that for checks. It's super fast (constant time lookups) and avoids a network dependency/third party service. Here's working Go code:

https://github.com/w8rbt/bp

Edit: This is solely for password vetting during account creation and password reset (which will remain free/no-cost in the API).
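
For readers who don't want to read the Go, the idea can be sketched in a few lines of Python. This is not a port of the linked repository: the double-hashing scheme, the sizes, and the assumed `HASH:COUNT` line format of the downloaded file are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest
        # (double hashing); an illustrative choice, not the linked code's.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def sha1_hex(password: str) -> bytes:
    """Uppercase hex SHA-1 of the password, as ASCII bytes."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper().encode("ascii")


def load_hashes(bloom: BloomFilter, lines) -> None:
    """Add each hash from lines assumed to look like 'HASH:COUNT'."""
    for line in lines:
        hash_part = line.strip().split(":", 1)[0]
        if hash_part:
            bloom.add(hash_part.upper().encode("ascii"))
```

A check at signup is then `sha1_hex(candidate) in bloom`. A hit can occasionally be a false positive, which for password vetting only means rejecting a password that was actually fine.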

[+] sucrose|6 years ago|reply
Why are bad actors abusing the API? What benefit does it give them to just be able to check for leaked data on e-mail addresses? Especially when it doesn't actually provide the leaked data...
[+] HereBeBeasties|6 years ago|reply
Doesn't take much imagination to find a use.

Assume I find Anna's email address as part of a breach somewhere.

Hello Anna,

We value transparency and honesty highly at $p0wn3d_company. To that end, we're sorry to have to tell you that our systems were compromised by an unknown hacker recently. Although we believe that no personal data has been stolen, we are working with Government agencies and expert security consultants to determine the full extent of the breach.

As a precaution we are asking our customers to change their passwords, which you can do by clicking on >this link here to a website that looks like ours but is actually owned by a hacker<.

Etc.

[+] birdman3131|6 years ago|reply
AFAIK, from looking myself up on the website before, it tells you which breaches to go hunt down for the actual info. Knowing they need to go hunt down SpecificWebsite.com's March 2017 breach is way more specific than trying to have a database of all breaches.
[+] geddy|6 years ago|reply
Perhaps they hammer it inefficiently or simply too often, possibly without even realizing it?
[+] sroussey|6 years ago|reply
Makes sense. I was writing an email to Troy that he can post about how to set custom user agent in Electron and Cordova, as the defaults fail. Guess it won’t be needed.
[+] Aeolun|6 years ago|reply
I don’t use this API myself, so it doesn’t really affect me, but this somehow feels like one of the last purely good things was lost.
[+] Daviey|6 years ago|reply
Next step, premium access without rate limit?
[+] w3rhn2j34oh5o|6 years ago|reply

[deleted]

[+] mfkp|6 years ago|reply
It does cost money to run a service like this. He's historically had sponsors, but you can't expect someone to run a high traffic service for free forever.
[+] penagwin|6 years ago|reply
He gives a cost breakdown showing that he's almost guaranteed to lose money on it. Azure is charging him $3.50 per 1 million calls to rate-limit/charge people for using the API. He's charging $3.50. Consider that Stripe will be taking another 35 cents or so... let's just say if this was a monetization method, it's not a very good one.
[+] sucrose|6 years ago|reply
I don't see it as him monetizing the stolen data, but the mere existence of it.
[+] DINKDINK|6 years ago|reply
All the ways congestion controls are implemented on the web lead to a cognitively infantilizing UX, privacy violations, and even "skynet" enabling[1] (hyperbolic but nothing stopping it from happening).

"Are you really human? What's: 3 x 9"

"Can you click on images of buses?, hmmmm don't believe you're human still, can you click images of stores, hmmm now bikes, hmmm now vehicles, oh I didn't mean all vehicles I just meant autos and not motorcycles, here quick copy this token, oh it expired? Too bad. How about you click on images of buses for me..."

"Sorry, browsers that protect your privacy and location aren't allowed. We only allow users who are willing to deanonymize themselves."

"Well we all know /those people/ who come /that place/ are antisocial users"

"Here's your IP addresses back. Oh yeah, sorry about blacklisting them"

This is a comment about the meta issue Troy faces. If costs are Rube Goldberg'ed to create a facade of "free", the service isn't actually free (even if user data isn't being sold). For example, a median-wage (10e3 USD/year) world worker spending 20 seconds solving a captcha has an opportunity cost of 0.03 USD[2]. Furthermore, having to solve congestion issues by requiring closed, inaccessible, poorly programmable payment systems (credit cards) sucks too. Additionally, a congestion goal like "I'd rather low-demand users have free access and high-demand users have expensive access" isn't met by a flat rate (which a "keep it low cost" mantra creates an incentive to keep low). There is market demand for: if your demands on my service are x, I'll give you back the $3.50, but if you consume y resources, you have to pay z.

Wouldn't it be great if there was a way machines could own money, send it over a layer-2 network, that was open, cheaper than credit cards, faster than L1 bitcoin, and get your money refunded if you didn't demand excessive server resources, all while not using game-able "good users come from here" privacy violating algos?

This is why micropayments using layer-2 bitcoin on the Lightning Network have significant, latent economic-coordination implications. Micropayments aren't about paying for 1/1000 of a peanut. They're about obviating all the engineering, social, and product costs of dealing with marginal-value versus marginal-cost issues. BAD: The marginal cost of anti-DoS countermeasures can always be above the marginal value of deploying them ("listen folks, it costs too much to keep this service running, we'll have to shut it down"). UNSTOPPABLE: If a price is put on service requests (Services on Demand)[3], the marginal value will never be below the marginal cost ("I can keep this AED locator map service running because I know a spamming request will incur costs above my production costs").

In a future where L2 Bitcoin payment/Lightning client infrastructure is prevalent, gone will be the days of annoying, productivity-draining captchas and attribute-discriminating access. Troy could charge a 0.01 USD "bond" payment per request (which he could give back quickly and costlessly to a low-demand user). That means the 14e3/min requests sustained for 3 hours would have required a payment of about $25,000 USD from the high-demand user[4].

0.01USD refundable payment for honest users.

$25,000 USD penalty for high-demand "spammer"

[1] https://i.redd.it/pb5nggw3rulz.jpg

[2] 20/60/60 * 5

[3] https://medium.com/@soddiraju/the-not-so-micro-potential-for...

[4] 14e3 * .01 * 60 * 3
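
Footnotes [2] and [4] can be checked with a few lines of Python. The variable names are mine, and reading the `5` in footnote [2] as an hourly wage in USD (roughly 10e3 USD/year) is an assumption.

```python
# Footnote [2]: opportunity cost of one captcha.
HOURLY_WAGE_USD = 5      # assumed meaning of the "5" in footnote [2]
CAPTCHA_SECONDS = 20
captcha_cost = CAPTCHA_SECONDS / 60 / 60 * HOURLY_WAGE_USD

# Footnote [4]: bond paid by a client sustaining the peak rate for 3 hours.
BOND_USD = 0.01
PEAK_REQUESTS_PER_MINUTE = 14e3
DURATION_HOURS = 3
spammer_cost = PEAK_REQUESTS_PER_MINUTE * BOND_USD * 60 * DURATION_HOURS

print(f"captcha opportunity cost: {captcha_cost:.3f} USD")
print(f"bond for 3 hours at peak: {spammer_cost:,.0f} USD")
```

The second figure comes to 25,200 USD, which the comment rounds to $25,000.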

[+] skybrian|6 years ago|reply
That would only solve paying for services if you are an amoral service provider and don't care where the money really comes from as long as you get paid.

It doesn't do anything for people who don't want their services used by bad actors, which is increasingly the case these days - see all the people concerned about privacy and how big tech companies use their data. It's not going to help for anything social where you are trying to promote pro-social usage and discourage anti-social usage, however you define it.

Those concerns inevitably lead to things like "know your customer" and supply-chain policing. You can still build nice services, but not anonymous ones.

The issues are pretty much the same as TOR. Some people are willing to run TOR nodes because the good outweighs the bad, others get squeamish about child pornography and say: no thanks.

And that's why it's an API. If the "have I been owned" database were harmless and there were no concerns about bad actors, it would be a torrent, not a service.

[+] gen3|6 years ago|reply
Why do something that is so complicated and time-consuming to implement when charging $3.50 is good enough? It's easier for him, as he can use already-made tools, and it's easier for me because I don't have to add all this extra overhead (and money) to a project. It's just $3.50 and a header.