
Introducing HipHip (Array): Fast and flexible numerical computation in Clojure

93 points | trevoragilbert | 12 years ago | blog.getprismatic.com

41 comments

[+] adrianm|12 years ago|reply
I don't believe this is the right approach, but I do appreciate the problem you're trying to solve here. In my opinion, Clojure libraries shouldn't be trying to reinvent the wheel. If your goal is to expose a better interface for vector arithmetic in Clojure, write a library that does that really well.

But if your primary concern is performance, please don't roll your own vector or matrix "native" interface. You will certainly never come close in speed to what has come before (the many BLAS implementations, et al.). Also, it's just a lot of work that is basically keeping you from working on the higher-order problems out there that we desperately need to tackle.

If your goal is more "Clojurey" syntax, then just spend a day or two wrapping the functions you want over a tried and tested numerics implementation. Additionally, there is likely a pre-existing Java wrapper that does just that for whatever you need, since Java is still beloved by university professors, a key demographic for fast math libraries.

On the other hand, I think Vertigo ( github: https://github.com/ztellman/vertigo ) is taking a very interesting approach to the Clojure->Native problem, which I believe might be of use to any library wanting to bring performant numerics to Clojure. Unfortunately, ztellman has deprecated his OpenGL and OpenCL libraries, but I think that Vertigo in combination with OpenCL and the kernels courtesy of clMAGMA would be fantastic.

[+] w01fe|12 years ago|reply
> If your goal is more "Clojurey" syntax then just spend a day or two wrapping the functions you want over a tried and tested numerics implementation.

This is exactly what we're trying to do: provide some Clojure macros that give nicer syntax for interacting with Java arrays with high performance. We're explicitly not introducing a new vector type.

Most of the work here wasn't in the wrapping -- hiphip itself consists of very little code -- but in figuring out what's fast and what's not, documenting this, and making it easy to do things the fast way.

[+] Mikera|12 years ago|reply
You've just pretty much described the motivation for core.matrix: it's an API that wraps various other back end vector/matrix libraries (including JBlas etc.) with a nice, standard Clojure API.
[+] vemv|12 years ago|reply

    All arithmetic operations on these boxed objects are significantly
    slower than on their primitive counterparts. This implementation also
    creates an unnecessary intermediate sequence (the result of the map),
    rather than just summing the numbers directly.
Clojure's Reducers framework might address the described issues in the future when, in Rich's words, "those IFn.LLL, DDD etc primitive-taking function interfaces spring to life". For now, they only solve the intermediate-collections part of the problem.
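The boxing cost quoted above is easy to see at the JVM level. Here's a rough plain-Java sketch (class and method names are just illustrative, not hiphip code): summing a `Double[]` unboxes every element, while a `double[]` loop stays primitive throughout.

```java
// Sketch: the same sum over boxed vs. primitive doubles.
public class BoxedVsPrimitive {
    static double sumBoxed(Double[] xs) {
        double total = 0.0;
        for (Double x : xs) {
            total += x;  // implicit unboxing on every iteration
        }
        return total;
    }

    static double sumPrimitive(double[] xs) {
        double total = 0.0;
        for (double x : xs) {
            total += x;  // stays in a primitive register
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Double[] boxed = new Double[n];
        double[] prim = new double[n];
        for (int i = 0; i < n; i++) {
            boxed[i] = (double) i;  // typically allocates a Double per element
            prim[i] = i;
        }
        System.out.println(sumBoxed(boxed) == sumPrimitive(prim)); // true
    }
}
```

Both versions add in the same order, so they produce identical results; only the per-element unboxing (and the allocation when the boxed array is filled) differs.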
[+] w01fe|12 years ago|reply
We're also anxiously awaiting this -- it seems with gvecs and reducers and primitive fns the pieces are all there, we just need the glue to put them all together. Unfortunately, for now I think we're stuck with arrays, and we're trying to make the most of it :)
[+] w01fe|12 years ago|reply
One of the authors here. We're excited to hear what you think of hiphip, and will be around all day to read feedback and answer questions.
[+] flakk|12 years ago|reply
Another author here (Emil). It's been a pleasure and a great learning experience working with Prismatic (and Climate) on this. Hopefully it'll show that, given enough macros and coffee, all problems are shallow, or something to that effect.
[+] stevoski|12 years ago|reply
Love the name. That sort of pun brings a smile to my face.
[+] peatmoss|12 years ago|reply
Looks awesome! One data issue I've seen go relatively unaddressed in the Clojure community is the serialization of big matrices and arrays.

There's a start on a Clojure HDF5 implementation (HDF5 is a container format common in scientific circles), but it's a long way from done. https://github.com/clojure-numerics/clj-hdf5 I'm not the author, but I am the negligent steward.

I'd love it if someone smarter / better at Clojure than me were interested in helping to think about useful, idiomatic high-level abstractions on top of this high-performance data store.

PyTables does a great job of making gobs of HDF5 data easy to work with for analysts; I'm just too novice at Clojure/FP to know what a reasonable analogue for Clojure would be.

[+] prospero|12 years ago|reply
Without knowing anything about hdf5 specifically, Vertigo [1] will let you treat a memory-mapped file (or a piece of one) as a normal Clojure data structure, as long as the element types are fixed-layout.

[1] https://github.com/ztellman/vertigo

[+] Mikera|12 years ago|reply
I've been thinking about adding some IO features to core.matrix. Haven't got round to it yet, but linking this with hdf5 format could be very useful.
[+] 51Cards|12 years ago|reply
Love the project and all; this would be very helpful!

I have to comment on the name as well... brilliant. Kudos for something creative that has already stuck firmly in my mind.

[+] aria|12 years ago|reply
Have any icon ideas? Was thinking ["hip","hip"] or hiphip[]
[+] netshade|12 years ago|reply
Cool library; I can imagine how moving away from boxing/unboxing can be a huge boost for them.

I've been looking for something that gave SIMD intrinsics to Java programmers - does anyone know if such a thing exists? Could be a nice addition to this lib.

[+] fiatmoney|12 years ago|reply
You can't, unless you write it as native code, put your data in direct NIO buffers, and go through the JNI dance.
[+] Historiopode|12 years ago|reply
What brought you to develop this library rather than relying on Incanter/Colt? The scope of HipHip seems different, of course, but there is enough of an overlap to warrant the question.
[+] aria|12 years ago|reply
I could be wrong, but I don't think Incanter has any Clojure-native means of generic operations over arrays at all.
[+] mjw|12 years ago|reply
Did you guys look at the core.matrix API?

https://github.com/mikera/matrix-api

[+] w01fe|12 years ago|reply
We did, and we've been talking to the developers about a potential future collaboration. Our goals are really complementary; hiphip is about getting your code into the inner loop of Java bytecode (not just a set of canned operations), whereas core.matrix is about abstractions for a fixed set of operations across different matrix types. There may eventually be overlap, if core.matrix gets into compiling expressions into new operation types, which sounds like something they're interested in.
[+] fiatmoney|12 years ago|reply
One thing I've found is that with macros, it can actually be easier to write performant primitive-reliant code. Still not up to Common Lisp standards, but much better than, e.g., having to use a scripting language to generate all the primitive specializations of your data structure, as Trove and Fastutil do.
[+] aria|12 years ago|reply
Indeed, the core logic of HipHip is the same for all primitive types and macros generate type-hinted versions for each primitive type.
[+] tick113|12 years ago|reply
Having written my own naive Clojure dot product, I can definitely appreciate what you guys have done!
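
For comparison, the loop I was trying to match is essentially the primitive-array version below; a rough plain-Java sketch of the kind of inner loop these macros aim to generate (names are illustrative only):

```java
// Sketch of a primitive dot product: no boxing, no intermediate sequence.
public class DotProduct {
    static double dot(double[] xs, double[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double total = 0.0;
        for (int i = 0; i < xs.length; i++) {
            total += xs[i] * ys[i];  // pure primitive arithmetic
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[]{1, 2, 3}, new double[]{4, 5, 6})); // 32.0
    }
}
```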

Any plans to attack sparse vectors? Performance on the sparse vector operations I wrote was poor, but, being new to Clojure, my implementation wasn't great.

[+] w01fe|12 years ago|reply
We have sparse vector code built on hiphip that's slated for open-source release down the road (once we get the resources to polish it) -- stay tuned!
[+] Mikera|12 years ago|reply
vectorz-clj has sparse vector support... it's a bit of a hidden feature at the moment (you'll have to use Java interop to instantiate a SparseIndexedVector), but it works and is pretty fast for many operations.
[+] bryansum|12 years ago|reply
FYI: your link to the GitHub project halfway down the article is broken.
[+] w01fe|12 years ago|reply
Oops, thanks for letting us know! Fixed now.
[+] wavesounds|12 years ago|reply
I just got it: 'Hip Hip Hurray'... haha :-)
[+] zinxq|12 years ago|reply
As an interesting aside, you can nearly double the speed of your Java loop by unrolling it a few times (at least it did that for me on JDK 7).
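Something like this sketch: 4-way unrolling with independent accumulators, so the additions aren't one long dependency chain. (Caveat: reordering floating-point additions can change the result slightly versus a sequential sum.)

```java
// Sketch: 4-way unrolled sum with independent accumulators,
// letting the CPU overlap the additions.
public class UnrolledSum {
    static double sumUnrolled(double[] xs) {
        double a = 0, b = 0, c = 0, d = 0;
        int i = 0;
        int limit = xs.length - 3;
        for (; i < limit; i += 4) {
            a += xs[i];
            b += xs[i + 1];
            c += xs[i + 2];
            d += xs[i + 3];
        }
        for (; i < xs.length; i++) {  // leftover elements
            a += xs[i];
        }
        return a + b + c + d;
    }

    public static void main(String[] args) {
        double[] xs = new double[10];
        for (int i = 0; i < xs.length; i++) xs[i] = i + 1;
        System.out.println(sumUnrolled(xs)); // 55.0
    }
}
```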
[+] w01fe|12 years ago|reply
Huh, cool! I kinda assumed the JIT already took care of this sort of low-hanging fruit; we'll test this out, and if it works we'll include it in the next version of hiphip.