top | item 9784247

Fighting spam with Haskell

357 points | vamega | 10 years ago | code.facebook.com

93 comments

[+] LukeHoersten|10 years ago|reply
For those not involved in the Haskell community, Simon Marlow worked full time on the GHC compiler, and specifically the run-time system, for many years. Along with Simon Peyton Jones, he's huge in the Haskell world. Marlow also wrote the excellent "Parallel and Concurrent Programming in Haskell" book.

Facebook also employs Bryan O'Sullivan, an epic Haskell library writer (Aeson, Attoparsec, Text, Vector, and on and on http://hackage.haskell.org/user/BryanOSullivan). Bryan also co-authored the "Real World Haskell" book.

So Facebook has hired two prolific Haskellers and probably others I don't know about.

[+] Periodic|10 years ago|reply
It really shows what an investment they are willing to make in the language. They aren't just training up internal engineers, they're bringing in some of the best in the community.

I'm hugely excited that Facebook is making this investment and giving some of the developments back to the community. There are many smaller companies that would be interested in Haskell but don't have the resources or expertise to tackle some of these complicated problems.

Facebook is blazing the trail for production Haskell and the rest of us can follow.

[+] LukeHoersten|10 years ago|reply
A side note: the Haskell runtime is written in C, so Simon Marlow is also one of the best low-level C hackers I've ever met.
[+] gghh|10 years ago|reply
All correct, but I don't think Bryan O'Sullivan does much Haskell these days. He's the manager of some sort of "Developer Productivity" team. The Hack programming language came out of his team [1] (though he wrote a blog post to counter all the media coverage and say "my people did it, not me!" [2]); he spoke about his activity at the last F8 [3]. But he does teach Haskell at Facebook after work :) [4]

[1] https://twitter.com/bos31337/status/446679462835273728 [2] http://www.serpentine.com/blog/2014/03/28/where-credit-belon... [3] https://www.youtube.com/watch?v=X0VH78ye4yY [4] https://twitter.com/bos31337/status/476536457000415232

EDIT: spelling

[+] dasmoth|10 years ago|reply
Damn you FaceBook.

I dislike the underlying premise, the adverts, and (especially) the "real names" policy.

But... between great bits of Open Source like React, cool infrastructure projects like this, and a technical culture which seems a whole lot more open than many other big companies, it's getting kind-of hard to go on hating. Walk back a bit from the obsession with open plan offices, and I might just cave...

[+] r0naa|10 years ago|reply
> But... between great bits of Open Source like React, cool infrastructure projects like this, and a technical culture which seems a whole lot more open than many other big companies, it's getting kind-of hard to go on hating. Walk back a bit from the obsession with open plan offices, and I might just cave...

Wholeheartedly agree with your comment; Facebook's engineering blog is truly humbling.

Not a day goes by that I don't try to think of something that would make the Internet more decentralized, anonymous, and secure while also being extremely profitable. The latter is crucial: capitalism would make the change viral, and profitability would let me attract talent.

In short, make something that people want AND that promotes values I believe in.

[+] Bahamut|10 years ago|reply
Their commitment to open source is truly amazing - I am currently in the process of interviewing with them, and am increasingly in awe of the company.

Surprisingly, their massive open office is the quietest office I have been in. It is a really nice workplace.

[+] zak_mc_kracken|10 years ago|reply
> it's getting kind-of hard to go on hating.

Don't worry, most of their code base is still written in PHP.

[+] HugoDaniel|10 years ago|reply
This is an amazing effort: implementing ApplicativeDo and using Haxl for automatic batching and concurrency, doing code hot-swapping in a compiled language, developing per-request automatic memoization, finding an Aeson performance bug, translating between C++ and Haskell to do partial marshalling of data, implementing allocation limits for GHC threads, creating a shared C++ library to bundle the C++ dependencies in GHCi for interactive coding, killing two GHC bugs, and more... and in the end producing a reliable, scalable solution.

ouch!
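The automatic batching mentioned above falls out of Haxl's Applicative instance: fetches composed applicatively can be collected and issued in a single round against the data source. Here is a toy sketch of that idea (NOT the real Haxl API; `Fetch`, `get`, and `runFetch` are invented names for illustration):

```haskell
import qualified Data.Map as Map

-- A Fetch records the keys it needs plus a continuation from
-- fetched results to a value.
data Fetch a = Fetch [String] (Map.Map String Int -> a)

instance Functor Fetch where
  fmap f (Fetch ks k) = Fetch ks (f . k)

instance Applicative Fetch where
  pure x = Fetch [] (const x)
  -- Composing independent fetches merges their key sets, so both sides
  -- can be satisfied by one batched round. This <*> shape is also what
  -- ApplicativeDo desugars independent do-statements into.
  Fetch ks1 f <*> Fetch ks2 x = Fetch (ks1 ++ ks2) (\m -> f m (x m))

get :: String -> Fetch Int
get key = Fetch [key] (Map.findWithDefault 0 key)

-- The "data source": answer every requested key in a single round.
runFetch :: Map.Map String Int -> Fetch a -> (Int, a)  -- (rounds, result)
runFetch db (Fetch ks k) =
  (1, k (Map.fromList [(key, Map.findWithDefault 0 key db) | key <- ks]))

-- Two independent fetches, composed applicatively: one round, not two.
example :: Fetch Int
example = (+) <$> get "friends" <*> get "posts"
```

The real Haxl interleaves batching with monadic binds and actual I/O, but the core trick is the same: applicative structure exposes which fetches are independent.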

[+] spopejoy|10 years ago|reply
It's definitely not just a "We're really happy with Haskell here at FB" post. This probably explains the trollish comments elsewhere to the tune of "Haskell without Simon+bos /= scalable".

It is a little intimidating, though. I think the perception is "Haskell is hard enough, AND I have to be a C++/GHC internals expert too?" The hard truth is you have to be this level of expert at something to achieve robustness at industrial scale.

Simon's comment in this thread hints at some of the power of Haskell alone:

> The "automatic" bit is that we insert the code that consults the map so the programmer doesn't have to write it. The map itself is already invisible, because it's inside the monad. So the overall effect is a form of automatic memoization. [1]

This pretty much sums up appdev in Haskell. Engineer (or employ) amazing abstractions with no funny business (hello Spring!) and no loss in safety or expressiveness.

I'm really happy this team exists, this is the kind of work that will take Haskell into the mainstream. Kudos!

[1] https://news.ycombinator.com/item?id=9787523

[+] seddona|10 years ago|reply
Thanks for the overview Simon, great to hear about the use of Haskell at scale. At CircuitHub we use Haskell to build our entire web app, Haskell is great for most tasks these days.
[+] Lewisham|10 years ago|reply
Did you use a particular framework? I've been greatly enjoying my 4th attempt at learning Haskell (I think only now am I seasoned enough to really get it), and my general proof that I know something and can use it usefully is to develop a web app (as all apps are now anyways). However, I didn't find anything which looked like it had majority community buy-in unlike Rails/Sinatra/Flask et al.
[+] j_m_b|10 years ago|reply
> We implemented automatic memoization of top-level computations using a source-to-source translator. This is particularly beneficial in our use-case where multiple policies can refer to the same shared value, and we want to compute it only once. Note, this is per-request memoization rather than global memoization, which lazy evaluation already provides.

I would like to know more about this. What is a request exactly? An API call? If so, when an existing policy is changed, do the memoization tables have to change as well? How are the memoization tables shared? If this is running on a cluster, I would imagine that lookups in a memoization table could be a bottleneck to performance.

[+] simonmar|10 years ago|reply
For example, let's say that one of the things you want to compute is the number of friends of the current user. This value is used all over the codebase, but it only makes sense in the context of the current request (because every request has a different idea of "the current user"). So this is a memoized value, even though in the language it looks like a top-level expression.

Memoization only stores results during a request. It starts empty at the beginning of the request and is discarded at the end, and it is not shared with any other requests. It's just a map that's passed around (inside the monad) during a request.
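A minimal sketch of that shape (all names here are made up for illustration; the real system threads the table through its monad and inserts the lookups automatically via the source-to-source translator):

```haskell
import Data.IORef
import qualified Data.Map as Map

-- Per-request memo table: created at request start, discarded at the end.
type MemoTable = IORef (Map.Map String Int)

-- Consult the table before running the computation; cache the result.
memoize :: MemoTable -> String -> IO Int -> IO Int
memoize tbl key compute = do
  cached <- Map.lookup key <$> readIORef tbl
  case cached of
    Just v  -> return v                       -- already computed this request
    Nothing -> do
      v <- compute
      modifyIORef' tbl (Map.insert key v)
      return v

-- Each request gets a fresh, private table; nothing is shared across
-- requests, and the table dies with the request.
handleRequest :: IO Int -> IO (Int, Int)
handleRequest numFriends = do
  tbl <- newIORef Map.empty
  a <- memoize tbl "numFriends" numFriends
  b <- memoize tbl "numFriends" numFriends    -- second use: table hit
  return (a, b)
```

The "number of friends of the current user" example fits this exactly: referenced all over the codebase, computed at most once per request.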

[+] fokz|10 years ago|reply
I am under the impression that a large part of the engineering effort at established companies goes into porting existing components to a language deemed more appropriate for the task.

Is it plain impossible to pick the best-fit language without first implementing a solution and fleshing out the requirements and challenges specific to the problem space? Or do the problems evolve fast enough that no matter how well you design the system, it will need to be replaced every few years?

[+] lmm|10 years ago|reply
It's rarely "just" a port. It's usually because the new language has better characteristics. These days porting can be a very gradual, as-needed process - partly because of Thrift, another Facebook tool!

One of the big parts of becoming a professional for me was accepting that code has a lifecycle; code is written to make the business money at the time, but it's entirely normal for it to change and die as time goes on.

(That said, you should just write everything in Haskell and then you won't have these problems. When was the last time you saw a company port code away from Haskell?)

[+] bpyne|10 years ago|reply
It's tricky picking a "best fit language" when you're not sure at project inception what you're fitting it to. Say, for instance, that you're developing v. 1.0 of an application in a new application space. Your CEO catches wind that another company is looking to enter that same space. She is also adjusting requirements to find a business model that works. You would choose a development stack that allows rapid prototyping and refactoring.

Once you win user-share in the space the application starts having problems with scalability or debugging becomes an issue or a myriad of other problems that come with success. Now you need to consider writing parts (or possibly the whole application) in a development stack that allows greater scalability or has built-in semantics for core functionality in your application, thus allowing less reliance on 3rd party libraries, etc.

The other factor is that the team who authors an application may have a totally different skill set from the team who takes over the mature application. The authors may have been VB developers with good SQL skills. Your current team has more people who have dealt with Haskell in large-scale applications.

There is some parallel to home-building. A homeowner does a room conversion for more space, more privacy, etc. The builders of the home didn't install tracks and sliding-panel walls for better room configuration. At the time, they didn't see a need. The homeowner now has a need, so he goes through a much more expensive (relatively speaking) building effort.

[+] liviu-|10 years ago|reply
Could be simply that time uncovers better solutions to the same problems, which triggers the need for porting.
[+] Lewisham|10 years ago|reply
Bit of Column A, bit of Column B.

All software tends toward unmaintainability, even if you don't touch it (libraries change, deprecate, etc.). Given enough time, the problems you are experiencing (say, memory allocation in C) get solved by something else, such as garbage collection, and the time it takes to solve the headache in what you've got becomes longer than just rewriting in the new language.

[+] reagency|10 years ago|reply
Plan to migrate to a new implementation every time scale grows 10-100x. Scale changes the relative weights of the concerns in the tradeoff.
[+] wz1000|10 years ago|reply
How does the hot-swapping work? The only way I had seen of making this happen is what xmonad does. I'm assuming this is radically different from that.
[+] mmarx|10 years ago|reply
What xmonad does actually isn't hot-swapping, but exec'ing the new code and simply passing the current state along. Instead, they are using GHC's runtime linker to load/unload modules dynamically, in the same fashion as GHCi does.
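For contrast, the xmonad-style "restart" is roughly this shape (a sketch, not xmonad's actual code; `saveState`, `restart`, the state-file path, and the `--resume` flag are all made-up names). Note the running code is never patched in place, unlike the runtime-linker approach:

```haskell
import System.Posix.Process (executeFile)

-- Persist whatever state the next incarnation will need on startup.
saveState :: FilePath -> String -> IO ()
saveState = writeFile

-- Write the state out, then replace this process image with the freshly
-- (re)compiled binary, which re-reads the state when it starts up.
restart :: FilePath -> FilePath -> String -> IO a
restart newBinary stateFile st = do
  saveState stateFile st
  executeFile newBinary False ["--resume", stateFile] Nothing
```

GHC's runtime linker instead loads new object code into the running process and swaps references over, so in-flight requests can keep running against the old code while new requests use the new code.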
[+] agentgt|10 years ago|reply
I was wondering that myself. Maybe they will release code examples of how that works. I'm only slightly familiar with the Haskell ecosystem, but I have also wondered if it has a plugin-like system (e.g. OSGi for Java).
[+] VeejayRampay|10 years ago|reply
And two GHC bugs fixed along the way. Well done everyone.
[+] reagency|10 years ago|reply
This is impressive, and in line with Haskell's philosophy to "avoid success at all costs".

As a mere mortal programmer who knows a little Haskell, my takeaway is that if you want to run Haskell at web scale for a large userbase, you need the language's primary compiler author to help build the application and to modify the Haskell compiler to make it performant. And you also need your team led by a 20-year-veteran Haskell expert who is one of the language's handful of luminaries and wrote a plurality of its main libraries. What are the rest of us to do, who aren't at Facebook?

[+] DasIch|10 years ago|reply
This is an incredibly stupid takeaway. Facebook can hire the best software engineers and computer scientists in the world, so of course they hire such people and these people build stuff at Facebook.

That doesn't mean other people can't use Haskell to build web applications.

[+] wz1000|10 years ago|reply
> Haskell's philosophy to "avoid success at all costs".

How? I think you are misunderstanding the philosophy. As SPJ once explained, it's 'avoid (success at all costs)', not '(avoid success) at all costs'. It basically means: don't compromise on your principles in order to achieve success.

[+] e12e|10 years ago|reply
> web scale for a a large userbase

Yes, it might take some effort to scale things across 1.4 billion users. Now, in what language wouldn't that be a challenge?

I suppose if you're one of the "many" other companies that are approaching 2 billion active users/month, you might need to put some thought into your systems too...

[+] vezzy-fnord|10 years ago|reply
> Loading and unloading code currently uses GHC's built-in runtime linker, although in principle, we could use the system dynamic linker.

That they could use the system dynamic linker makes me think they're using some form of relatively basic dlopen/dlsym/call-method procedure, or something along those lines. That's fine, though the use of "hotswapping" evokes the image of some more elaborate DSU mechanism.

[+] themeekforgotpw|10 years ago|reply
Does this system also detect propaganda operations?
[+] jjawssd|10 years ago|reply
No, it generates them /sarcasm
[+] edwinnathaniel|10 years ago|reply
I learned a few things from this post outside the usual "technical" explanations:

1. They have CORE Haskell contributors on their payroll to deliver this type of project (what this means is that no... Haskell isn't any better than other languages, it's just that they have people who know Haskell very, very deeply, down to the compiler level...)

2. An in-house custom language eventually does not scale (its EOL is much, much shorter than other programming languages'), so plan for that :)

[+] Tehnix|10 years ago|reply
In the rare cases I would ever downvote something, this would be the case.

This is just complete snark that misunderstands the general thesis of the article (intentionally?).

Having a core language writer on your team by no means implies that the language in itself is unusable without the designer himself. Your comment explicitly mentions this, and you could really generalise this to any language at all.

The fact that they hired a Haskell contributor over a PHP or Ruby contributor is perhaps something you should consider. Or, you know, you could read the article, where they answer some of these basic assumptions.

[+] dasil003|10 years ago|reply
> (what this mean is that no... Haskell isn't any better than other language, it's just that they have people who know Haskell very very deep to the compiler level...)

This is extremely presumptuous for a parenthetical. If this is what you believe, it merits a comment all to itself, because I don't see how it follows in the least. To rephrase what you're claiming: if a language has a flaw, then it cannot be any better than any other language.

[+] Scarbutt|10 years ago|reply
What you should ask yourself then is, why did they hire core Haskell contributors?
[+] tome|10 years ago|reply
And Google and Dropbox hired Guido van Rossum ...
[+] saosebastiao|10 years ago|reply
Anybody know what sort of policy resolution algorithms are used? Is this based on Rete, or home grown?
[+] mark_l_watson|10 years ago|reply
I don't know if Rete is appropriate. Rete optimizes for a large number of rules and a small amount of data. I used to hack Rete based OPS5 a lot, including adding support for multiple data worlds. If you like Common Lisp, then the OPS5 code base is fun to hack.
[+] covi|10 years ago|reply
In the throughput graph, why does Haxl perform worse in the 3 most common request types?
[+] simonmar|10 years ago|reply
The graph is sorted by performance, with the worst performing (not necessarily the most common) on the left. We've also done more profiling and optimisation since we took those measurements.

FXL employed some tricks that were sometimes beneficial, but often weren't - for example it memoized much more aggressively than we do in Haskell. Mostly that's a loss, but just occasionally it's a win. When a profile shows up one of these cases, we can squash it by fixing the original code.

What matters most is overall throughput for the typical workload, and we win comfortably there.

[+] Periodic|10 years ago|reply
I don't think the graph says which request types are more or less common. The ordering could be arbitrary.

It would be interesting to know why some are slower. Perhaps they require very little processing and so the time becomes dominated by FFI transformations? It would be nice to know!

[+] nlake44|10 years ago|reply
Facebook censors so much and swaps out links. I can't even post a link to https://scientificamerica.com. Their SSL cert is weak and should not be trusted. It is swapped out for phishing scams.
[+] reagency|10 years ago|reply
That URL is a typosquatting domain-parking site, so....
[+] TazeTSchnitzel|10 years ago|reply
Facebook swaps out links to hide the referrer. It's for privacy.
[+] giancarlostoro|10 years ago|reply
It seems like Facebook is the one company developing a lot of the very interesting and useful tools for developers.
[+] pekk|10 years ago|reply
That is unnecessarily negative toward the efforts of literally everyone else.