The work sounds very cool (and they are hiring), but (only) a factor of 4 speedup over Python is (to repeat a phrase from elsewhere today) like boasting that you're the tallest midget ;o)
It's important to note that this particular job is largely bound by (a) I/O and (b) format serialization. Both Python's BSON and JSON libraries are mature and have their critical sections written in C, so a 4x speedup is still noteworthy. The Haskell version, on the other hand, is pure Haskell.
Agreed. Even where you can optimize the hot code in C, Python is no speed demon. Cassandra's Java stress test can push out about 10x as many ops/s as the Python one, even though the Thrift C extension for Python is quite good.
I'd love to point people to this when trying to convey some of Haskell's advantages. To make it more compelling, could you expand a bit on the downsides and any obstacles you encountered?
The thing I'm unsure about is how difficult it would be for (very) talented developers to just jump in. We have really talented developers, and everyone is super time-constrained, so many are wary of diving into a language as different as Haskell. Was it hard for your developers to figure Haskell out? Did your previous use of Scala help? How long did it take them to get into Scala?
I would say the two real barriers to writing effective Haskell projects are (a) "getting" monads and (b) understanding the implications of laziness, especially with regard to space leaks and unconsumed thunks. Everything else isn't that big a deal.
It's all much easier to digest, though, even for "really talented developers", if they have some experience with another functional language first. OCaml is a nice stepping stone before digging into the abstractions involved in understanding Haskell's powerful type system. Scala is good too, but having the object stuff mixed in can lead you to rely on patterns that aren't going to be available in a non-OOP language. I think the Scheme/Clojure path isn't bad either, but it's probably ideal to spend some time in the statically typed wing of the functional universe before going to Haskell.
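To make the laziness barrier concrete, here is a minimal sketch. The function names are illustrative, not from the project discussed here. A lazy left fold accumulates a chain of unevaluated thunks before doing any arithmetic, while `foldl'` from `Data.List` forces the accumulator at each step. The second pair shows a subtler leak. `foldl'` only evaluates to weak head normal form, so a tuple accumulator can still hide thunks unless its fields are forced explicitly.

```haskell
import Data.List (foldl')

-- Lazy left fold: conceptually builds (((0 + 1) + 2) + ...) as thunks
-- before adding anything. GHC's strictness analysis may rescue this
-- under -O, but that is an optimization you shouldn't rely on.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- Subtle leak: foldl' forces the pair only to WHNF, so the running
-- sum and count inside it remain growing chains of thunks.
meanLeaky :: [Double] -> Double
meanLeaky xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (acc, len) x = (acc + x, len + 1)

-- Fix: force both fields with seq before building the new pair.
meanStrict :: [Double] -> Double
meanStrict xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (acc, len) x =
      let acc' = acc + x
          len' = len + 1
      in acc' `seq` len' `seq` (acc', len')

main :: IO ()
main = do
  print (sumStrict [1 .. 1000000])
  print (meanStrict [1 .. 1000000])
```

Bang patterns (the `BangPatterns` extension) are a common, terser alternative to the explicit `seq` calls.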
From personal experience: I didn't make much progress in Haskell until I stopped using Scala. The problem is that Scala allows you to mix and match different paradigms and if you come from a mostly-imperative/OO background, you tend to use Scala as an OO language with some functional constructs.
To learn to program in a purely functional style, it's best to jump into Haskell cold turkey, since you will have to learn to think in FP.
When learning Haskell, I found optimization in a lazy world the most difficult part. I still often have trouble predicting how efficient a particular piece of code will be. The complexity of monads is somewhat overstated, though it doesn't help that some tutorials make something big and esoteric out of them. A monad is nothing more than a type class that specifies how to combine computations that produce some 'boxed' value.
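As a rough illustration of the "type class for combining computations" description, here is a hand-rolled, `Maybe`-like type with its own `Functor`, `Applicative`, and `Monad` instances. The names `Result`, `safeDiv`, and `compute` are made up for this sketch. In real code you would just use the Prelude's `Maybe`, which behaves the same way.

```haskell
-- A hypothetical "boxed value" that may or may not hold a result.
data Result a = Failure | Success a
  deriving (Show, Eq)

instance Functor Result where
  fmap _ Failure     = Failure
  fmap f (Success x) = Success (f x)

instance Applicative Result where
  pure = Success
  Failure   <*> _ = Failure
  Success f <*> r = fmap f r

-- The Monad instance is the "how to combine" rule: if the first
-- computation failed, stop; otherwise feed its value to the next one.
instance Monad Result where
  Failure   >>= _ = Failure
  Success x >>= k = k x

safeDiv :: Int -> Int -> Result Int
safeDiv _ 0 = Failure
safeDiv a b = Success (a `div` b)

-- do-notation is sugar for a chain of >>= calls.
compute :: Int -> Int -> Int -> Result Int
compute a b c = do
  x <- safeDiv a b
  y <- safeDiv x c
  return (x + y)

main :: IO ()
main = do
  print (compute 100 5 2) -- both divisions succeed
  print (compute 100 0 2) -- first division fails, so the chain stops
```

All of the interesting logic lives in `>>=`, which decides whether and how the next step runs. With a different instance (lists, `IO`, parsers), the same `do` block shape takes on a different meaning.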
The author is mostly right about Haskell's use cases, but saying simply "systems" is a bit misleading, because certain performance characteristics of lazy programs make them bad choices for some systems programs. Any kind of real-time system, for example, can suffer unpredictable performance in critical sections, which is pretty undesirable.
Not to argue with the example, but Python's garbage collection disqualifies it for real-time systems as well. In fact, I'm having a hard time finding a "systems" task for which Python (as a language) is qualified but Haskell is not.
While I agree with you that Haskell (or, really, any GC'd language) is unsuitable for real-time systems, I disagree that my statement about its excellent suitability for systems programming in general is misleading. In my experience, many, many domains (read: most) that get called "systems programming" have nothing to do with hard or soft real-time requirements.
Now, if I had stated that all conceivable systems programming domains are addressable with Haskell, that would have indeed been foolish.
Are the logs being read from disk? In my experience, Python is highly optimized for reading (possibly compressed) files from disk. If your infrastructure keeps logs in memory, Python loses this advantage and has to compete on computational performance, where Haskell has the edge. This matters for those of us who grind logs on disk and might be considering a language switch.
I'd be interested in hearing more about how the author is using the resulting data set. Doing extractions at event-generation time can be very useful if you know what you're after in advance, but not so good for ad hoc analysis.
Any reason why you didn't use Hadoop for this, then run batch jobs to extract summaries?
Yeah, the whole pipeline is actually far more multifaceted than this summary suggests. This stage just persists the events into a consolidated transaction log. Secondary processes then scan these transaction logs in batch and distribute the data into various databases for system, business, and user analytics. I can't go into too much detail there, but the actual digesting and reporting side is more involved.
andrewcooke | 15 years ago
jamwt | 15 years ago
jbellis | 15 years ago
/still a Python fan
Peaker | 15 years ago
jamwt | 15 years ago
microtonal | 15 years ago
Locke1689 | 15 years ago
dons | 15 years ago
Haskell as an EDSL for generating hard real time, however, is very viable: http://corp.galois.com/blog/2010/9/22/copilot-a-dsl-for-moni...
awj | 15 years ago
jamwt | 15 years ago
ynniv | 15 years ago
enneff | 15 years ago
What you're probably observing is Python's slow code generation being masked by the inherent slowness of I/O.
jamwt | 15 years ago
kordless | 15 years ago
jamwt | 15 years ago
aristus | 15 years ago
http://tartarus.org/james/diary/2008/06/17/widefinder-final-...