
Race conditions on Facebook, DigitalOcean and others (fixed)

294 points | franjkovic | 11 years ago | josipfranjkovic.blogspot.com

88 comments

[+] ejcx|11 years ago|reply
I actually fixed the issue that was reported to LastPass.

I could be mistaken, but I believe he reported the security issue through our regular support channel (instead of our security channel), which is why it took three days to see. From the time I saw it, I fixed it, with the patch going live within an hour or two.

When I DID see it and tried it myself with a quick shell script that curled and backgrounded the same request a bunch of times, I just kind of chuckled. It was a good bug. Josip is top notch.

[+] franjkovic|11 years ago|reply
Thanks! I reported the bug to the security@ email, and one of your team members replied the same day (January 6th). Either way, good job on fixing this really fast. I wish more teams were as responsive as yours.
[+] homakov|11 years ago|reply
> When I DID see it, tried it myself with a quick shell script that curled and backgrounded the same request a bunch of times, I just kind of chuckled. It was a good bug

That's the problem with OWASP: a developer from a big company sees a race condition for the first time and is surprised

[+] monksy|11 years ago|reply
BTW: I just subscribed to LastPass a few days ago. I'm pretty happy with the service.
[+] MichaelGG|11 years ago|reply
We should see lots more of these if people embrace eventual consistency instead of "slow" ACID transactions. And interestingly, the larger a system's scale, the more likely that globally consistent operations are too expensive to enable in general, and developers will overlook cases where they must implement some locking or double-checking.
[+] partisan|11 years ago|reply
I would have thought that the opposite would be true: a CQRS/event-sourcing system with eventual consistency would allow you to avoid posting duplicates to your database:

1. The user submits X requests within a second.

2. The system puts each request in a command queue that synchronizes the commands by coupon code, for example.

3. A command is popped off the queue, and an event is generated and saved saying the coupon was redeemed.

4. The next command is picked up and all prior events are applied before processing. At this point, the command is no longer valid, so you reject it and send an event saying that an attempt was made to redeem an already-redeemed coupon.

5. Do the same for subsequent requests.

To me, this approach is safer and easier to reason about. You have a log of the fact that someone made the attempt so you can report on this. Not sure you get that benefit from a stored procedure and a transaction unless you build it in and then increase the running time of the open transaction.
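The queue-based flow above can be sketched in a few lines of Python. This is a toy, not a real CQRS implementation: the single worker thread, queue, event names, and coupon code are all illustrative stand-ins.

```python
import queue
import threading

commands = queue.Queue()   # serializes all redeem commands (step 2)
events = []                # append-only event log
redeemed = set()           # read model rebuilt from events

def worker():
    while True:
        cmd = commands.get()
        if cmd is None:            # sentinel: stop the worker
            break
        user, code = cmd
        # All earlier events have already been applied (step 4),
        # so this check cannot race with another command:
        if code in redeemed:
            events.append(("redeem_rejected", user, code))
        else:
            events.append(("coupon_redeemed", user, code))
            redeemed.add(code)

t = threading.Thread(target=worker)
t.start()
for _ in range(5):                 # five near-simultaneous attempts (step 1)
    commands.put(("alice", "SAVE20"))
commands.put(None)
t.join()

# Exactly one command produces a coupon_redeemed event; the other four
# are recorded as rejected attempts, giving you the audit log mentioned above.
```

The key property is that a single consumer applies commands in order, so the validity check always sees every prior event.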

[+] pyvpx|11 years ago|reply
when did eventual consistency equate to race conditions, or even increased susceptibility to race conditions? I don't follow. could you explain your reasoning further?
[+] unclesaamm|11 years ago|reply
Wow, it seems like there is room here for a 3rd party vendor to implement promo code handling as a service, and to do it right once and for all.
[+] d_luaz|11 years ago|reply
No bounty for bug report? Should at least have a nominal fee of $100 (else no one would bother to report it).
[+] reagan83|11 years ago|reply
The economics of bug bounty programs can create misaligned incentives. Because the overhead of validating and communicating around bug reports isn't zero, and submitting is free, the proportion of non-bug reports can grow out of balance.

In most systems the reward is zero, so you can infer if a person has taken the time to submit a bug report it is because he/she is invested in seeing it fixed.

Context: I work at a decent sized company in SV on this type of problem.

[+] squiguy7|11 years ago|reply
I agree. If I had my own company I would surely provide some incentive for bugs found in the product, whether monetary, a free membership, etc. I think it's important to acknowledge that all software systems are imperfect.
[+] Kiro|11 years ago|reply
I'm a novice but would like to know how these issues can arise. What kind of backend setup is needed for it to be a problem? What is happening when a race condition occurs in these examples?
[+] spdy|11 years ago|reply
It's actually quite simple. As an example, for the promo code the flow looks like this:

1. Code sent.

2. Check if valid.

3. Redeem code.

4. Invalidate code.

Now if I send 10 requests at the same time with the same code, maybe 4-6 will get past the check in step 2.

And your window of opportunity is the time it takes to go from 3 to 4. Sometimes certain tasks are put in an async queue, there is a slight delay to your database server, or you need to wait for db replication to kick in.

Normally there is no extra check of how many times the code has already been used.
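The check-then-redeem window described above can be reproduced with an in-memory toy in Python. The sleep stands in for database or queue latency, and the code name is made up; real requests race across web server processes rather than threads, but the shape is the same.

```python
import threading
import time

valid_codes = {"PROMO10"}     # hypothetical single-use promo code
redemptions = []              # every "successful" redemption lands here

def redeem(code):
    if code in valid_codes:            # step 2: check if valid
        time.sleep(0.1)                # simulated latency: the race window
        redemptions.append(code)       # step 3: redeem
        valid_codes.discard(code)      # step 4: invalidate

threads = [threading.Thread(target=redeem, args=("PROMO10",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(redemptions))   # almost always > 1: several requests passed the check
```

Every thread passes the validity check before the first one invalidates the code, so the "single-use" coupon is redeemed many times.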

[+] mike_hearn|11 years ago|reply
>I'm a novice but would like to know how these issues can arise.

The problem is concurrency. Whenever you have multiple things happening at once, you have concurrency, and programming concurrent systems is always really hard.

Unfortunately the software industry has never really got a grip on this problem and there are lots of developers who have never really studied multi-threading at all. That's a problem, because it's something that takes a lot of practice and you have to just incorporate it into the way you think. After a while you do get a sixth sense for race conditions, but you'll still write racy code from time to time anyway. It's just tricky to get it right 100% of the time.

spdy has already outlined what is happening here, but this problem is something that is covered in literally any introductory course to database systems or multi-threaded programming. If you have two threads (or processes) in flight simultaneously that are reading and writing to a shared data store, then you need some kind of mutual exclusion. That can mean a lock:

1. Request for /reviews/add is received.

2. Database lock on the page reviews table is acquired.

3. Check if the user has already posted a review. If so, release the lock and abort (any good framework will release locks for you automatically if you throw an exception).

4. Add review to table.

5. Release lock.

At the point where the lock is acquired if another web server is in the middle of the operation, conceptually speaking the first one will stop and wait for the table to become available.
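The five steps can be sketched with an in-process lock. Here `threading.Lock` stands in for the database lock and a set stands in for the reviews table; all names are illustrative, and a real web app would take the lock in the database, not in the process.

```python
import threading

reviews_lock = threading.Lock()
reviews = set()                  # (user_id, page_id) pairs already reviewed

def add_review(user_id, page_id):
    with reviews_lock:                       # step 2: acquire the lock;
        # released automatically on return or exception, like a good framework
        if (user_id, page_id) in reviews:    # step 3: already posted?
            return False                     # abort
        reviews.add((user_id, page_id))      # step 4: add review
        return True                          # step 5: lock released on exit

# Ten concurrent attempts by the same user to review the same page:
threads = [threading.Thread(target=add_review, args=(1, 42)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(reviews))   # 1: the lock serializes the check and the insert
```

Because the check and the insert happen under the same lock, no second thread can observe the "not yet reviewed" state once the first one has.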

Real implementations don't actually "stop and wait" - that would be too slow. They use database transactions instead, where both web server processes/threads proceed optimistically, and at the end the database will undo one of the changes if it detects a conflict... but you can imagine it as being like stop and wait.

Of course once you have concurrency, you have all the joy that comes with it like various kinds of deadlock.

It's funny, a few days ago I was letting my mind wander and ended up thinking about web apps for some reason. Oh yes, now I remember, I was thinking about concurrency strategies in a software library I maintain and how to explain them to people. And then I was thinking how hard multi-threading is and how many people are selling snake-oil silver bullets for it, and started to wonder how many web apps had race conditions in them. And then I just discarded the thought as one of no consequence and got on with my day, haha :) Perhaps I should have tried to earn a bit of money this way instead.

[+] emmab|11 years ago|reply
It would be cool if there was a browser addon that let you submit a form N times in parallel.
[+] ejcx|11 years ago|reply
I do a lot of AppSec-related things and I actually use mostly Chrome dev tools and the command line instead of Burp and other tools. The way I reproduced the bug when it was reported was by using the "Copy as cURL" feature in Chrome, and then using it as follows:

    for i in `seq 1 16`; do
        curl ... &    # command copied from Chrome dev tools; & backgrounds it
    done
    wait              # wait for all the parallel requests to finish
[+] andersonmvd|11 years ago|reply
More interesting than the bounty itself is understanding which defense works best at scale and the nitty-gritty details of these kinds of attacks. Intuitively I think we just need to avoid inconsistencies between the Time of Check (TOC) and Time of Use (TOU), so verifying the existence of a discount coupon while inserting it in one query should do the trick (INSERT INTO coupons (...) SELECT (...) WHERE NOT EXISTS (SELECT 1 FROM coupons WHERE (...))) instead of increasing the time between TOC and TOU, e.g. one query to check if the coupon exists and a second one to insert it. Besides that, I am wondering if I am missing something: is this really a problem limited to the application layer, or are databases unable to prevent such attacks? I think I am right regarding the app protection, but let's see what people have to say :)
[+] pilif|11 years ago|reply
In many databases, your suggested "where not exists" subquery might not actually protect you; it might just make the window to hit the race much smaller. What can happen is that your database evaluates the subquery and the rest of the WHERE, another transaction commits in between, and only then does the INSERT part of your query run.

There are no guarantees in the SQL standard that queries with subqueries should be atomic.

The only truly safe way to protect yourself is to fix the schema in a way that you can make use of unique indexes. Those are guaranteed to be unique no matter what.
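The unique-index guarantee can be sketched with Python's sqlite3 module (the schema and names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE redemptions (
                    user_id     INTEGER,
                    coupon_code TEXT,
                    UNIQUE (user_id, coupon_code))""")  # the index is the guard

def redeem(user_id, code):
    try:
        with conn:  # transaction: committed on success, rolled back on error
            conn.execute("INSERT INTO redemptions VALUES (?, ?)",
                         (user_id, code))
        return True
    except sqlite3.IntegrityError:
        # A concurrent request already inserted this pair; the database
        # rejected the duplicate no matter how the requests interleaved.
        return False

print(redeem(1, "PROMO10"))  # True
print(redeem(1, "PROMO10"))  # False
```

Unlike an application-level check, the uniqueness constraint is enforced by the database itself at insert time, so there is no check/use gap for an attacker to race.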

[+] Osiris|11 years ago|reply
This could be an issue with a clustered database where requests are load-balanced across multiple masters and, due to replication latency, part of the cluster may not yet be consistent with the rest. Though for someone like DigitalOcean I'd be surprised if this was the case.
[+] underwater|11 years ago|reply
Not every database is powered by SQL. Add to that sharding, caching, and cross-data-center traffic, and the problem becomes non-trivial very quickly.
[+] zhoutong|11 years ago|reply
Or just use a UNIQUE INDEX.
[+] rblatz|11 years ago|reply
You are making a lot of assumptions about architecture. Not everything is backed by a simple SQL database. Some sites use event stores and read-only data stores to provide data access/storage. Other sites have unique architectures that make this specific issue a lot harder to mitigate than a simple unique constraint.
[+] hobarrera|11 years ago|reply
I'd never heard of "WHERE NOT EXISTS", but my first approach would be to make the pair (user_id, coupon_code) unique, so that only one insertion can succeed, and only do any further processing if that transaction does not fail.
[+] inportb|11 years ago|reply
So the review bug was a security issue but the username bug wasn't? I wonder what else the review bug affected.
[+] franjkovic|11 years ago|reply
I think they did not reward me because you cannot really hurt anyone by having multiple usernames.
[+] georgerobinson|11 years ago|reply
Can anyone comment on how the author flooded HTTP requests to the endpoint URLs? Did he use developer tools in his browser and execute his own JavaScript, or use CURL in a tight loop with the cookie and CSRF token from his browser session?
[+] gislifb|11 years ago|reply
Without knowing exactly how he did it I assume this is possible by doing a POST with cURL inside a loop or with parallel.

You can then get the exact request by using Chrome developer-tools. (Find the POST-request in the network-tab, right-click and select copy as cURL)

[+] Rafert|11 years ago|reply
I have reported the same issue with Digital Ocean (security) in November 2014, and they told me I was using the wrong address and that they forwarded it to the proper team. I triggered it by accident, using the same GitHub code twice, and I (or the DO staffer) didn't realize it was a race condition. I never heard back but they let me keep the balance :)
[+] numair|11 years ago|reply
I would be really interested to know how various forms of this bug get resolved. This is a problem that, on its surface, seems easy to fix, but isn't. Especially if you've designed your architecture for real-time-ness and global redundancy. Google's servers with atomic clocks come to mind...
[+] ekimekim|11 years ago|reply
cynical answer: I've seen a lot of races get "fixed" by adding a sleep() or similar

less cynical answer: Commonly you already have some kind of means to handle races - locking, transactions, some other variety of extra check - and the fix for newly discovered races is "oh, I didn't realise that could happen. add lock"

[+] jbkkd|11 years ago|reply
Now that race condition bugs have been widely exposed, I have a feeling we'll start seeing more of these "attacks" in the near future. They are relatively easy to execute and don't raise a high suspicion.
[+] tomcam|11 years ago|reply
Now please fix race conditions everywhere else, like Baltimore.
[+] yesmade|11 years ago|reply
$3k for the facebook review bug. that's a little bit too much

- update

thanks for the downvotes guys. keep up the good work

[+] franjkovic|11 years ago|reply
The bounty actually surprised me, too. I expected between $1000 and $2000. That is one of the reasons I like reporting bugs to Facebook - they pay really well, and critical bugs are fixed really fast (<1 day).

One time they paid me $5000 for a bug I never could have found, but they did internally based on my low severity report. (http://josipfranjkovic.blogspot.com/2013/11/facebook-bug-bou...)

[+] totony|11 years ago|reply
This bug actually seems quite critical imo, defeats the purpose of a feature and permits abuse/cheating
[+] mikeash|11 years ago|reply
Who are you to say that it's "too much", when it's their money and they can spend it as they wish?
[+] Gigablah|11 years ago|reply
Instead of questioning why others are getting so much, question why you're getting so little.