I do not believe this is the right approach to the problem, but I do appreciate the problem you're trying to solve here. In my opinion, Clojure libraries shouldn't be trying to reinvent the wheel. If your goal is to expose a better interface for vector arithmetic in Clojure, write a library that does that really well.
But if your primary concern is performance, please don't roll your own vector or matrix "native" interface. You will certainly never come close in speed to what has come before (BLAS implementations galore, et al). Also it's just a lot of work that is basically keeping you from working on the higher order problems out there that we desperately need to tackle.
If your goal is more "Clojurey" syntax then just spend a day or two wrapping the functions you want over a tried and tested numerics implementation. Additionally, there is likely a pre-existing Java wrapper which does just that for whatever you need considering that Java is still beloved by university professors, a key demographic for fast math libraries.
On the other hand, I think Vertigo ( github: https://github.com/ztellman/vertigo ) is taking a very interesting approach to the Clojure->Native problem, which I believe might be of use to any library wanting to bring performant numerics to Clojure. Unfortunately, ztellman has deprecated his OpenGL and OpenCL libraries, but I think that Vertigo in combination with OpenCL and the kernels courtesy of clMAGMA would be fantastic.
> If your goal is more "Clojurey" syntax then just spend a
> day or two wrapping the functions you want over a tried
> and tested numerics implementation.
This is exactly what we're trying to do: provide some Clojure macros that give nicer syntax for interacting with Java arrays with high performance. We're explicitly not introducing a new vector type.
Most of the work here wasn't in the wrapping -- hiphip itself consists of very little code -- but in figuring out what's fast and what's not, documenting this, and making it easy to do things the fast way.
You've just pretty much described the motivation for core.matrix: it's an API that wraps various other back end vector/matrix libraries (including JBlas etc.) with a nice, standard Clojure API.
> All arithmetic operations on these boxed objects are
> significantly slower than on their primitive counterparts.
> This implementation also creates an unnecessary intermediate sequence
> (the result of the map), rather than just summing
> the numbers directly.
Clojure's Reducers framework might address the described issues in a future when, in Rich's words, "those IFn.LLL, DDD etc primitive-taking function interfaces spring to life". For now, they only solve the intermediate-collections part of the problem.
We're also anxiously awaiting this -- it seems with gvecs and reducers and primitive fns the pieces are all there, we just need the glue to put them all together. Unfortunately, for now I think we're stuck with arrays, and we're trying to make the most of it :)
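To make the two costs mentioned above concrete (boxed arithmetic and an intermediate sequence), here is a minimal Java sketch of a dot product written both ways. It is only an illustration of the general idea, not hiphip's or the Reducers framework's actual code; the class and method names (`DotProductSketch`, `dotBoxed`, `dotPrimitive`) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from hiphip or any other library.
final class DotProductSketch {

    // Boxed version: every element is a java.lang.Double, and the products
    // are collected into an intermediate list before being summed.
    static double dotBoxed(List<Double> xs, List<Double> ys) {
        List<Double> products = new ArrayList<>(xs.size());
        for (int i = 0; i < xs.size(); i++) {
            products.add(xs.get(i) * ys.get(i)); // unbox, multiply, box again
        }
        double sum = 0.0;
        for (Double p : products) {
            sum += p; // unbox once more
        }
        return sum;
    }

    // Primitive version: a single pass over double[] arrays with no boxing
    // and no intermediate collection.
    static double dotPrimitive(double[] xs, double[] ys) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i] * ys[i];
        }
        return sum;
    }
}
```

Both methods return the same value; the difference is in how much allocation and unboxing happens per element, which is the part a primitive-array loop avoids.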
Another author here (Emil). It's been a pleasure and a great learning experience working with Prismatic (and Climate) on this. Hopefully it'll show that, given enough macros and coffee, all problems are shallow, or something to that effect.
I've been a big follower of your entire team, specifically on their talents in machine learning and NLP (e.g. http://nlp.stanford.edu/jrfinkel/papers/jrfinkel-thesis.pdf). The fact that you also use Clojure and give back so much is icing on the cake. Thank you!
Looks awesome! One data issue I've seen go relatively unaddressed in the Clojure community is the serialization of big matrices and arrays.
There's a start on a clojure hdf5 (hdf5 is a container format common in scientific circles) implementation, but it's a long ways from done. https://github.com/clojure-numerics/clj-hdf5 I'm not the author, but I am the negligent steward.
I'd love it if someone smarter / better at Clojure than me was interested in helping to think about useful, idiomatic high-level abstractions on top of this high-performance data store.
PyTables does a great job of making gobs of hdf5 data easy to work with for analysts--I'm just too novice at Clojure/FP to know what is a reasonable analogue for Clojure.
Without knowing anything about hdf5 specifically, Vertigo [1] will let you treat a memory-mapped file (or a piece of one) as a normal Clojure data structure, as long as the element types are fixed-layout.
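As a rough idea of what "treating a memory-mapped file as a data structure" means at the JVM level, here is a small sketch using plain Java NIO. It does not use Vertigo or HDF5 and says nothing about their APIs; it assumes a file that contains nothing but fixed-size big-endian doubles, and the names `MappedDoublesSketch`, `writeDoubles`, `mapDoubles` and `sum` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration using only java.nio; not Vertigo's implementation.
final class MappedDoublesSketch {

    // Writes the values to a temporary file as raw 8-byte doubles
    // (big-endian, the ByteBuffer default) and returns its path.
    static Path writeDoubles(double[] values) {
        try {
            Path file = Files.createTempFile("doubles", ".bin");
            file.toFile().deleteOnExit();
            ByteBuffer bytes = ByteBuffer.allocate(values.length * Double.BYTES);
            bytes.asDoubleBuffer().put(values);
            Files.write(file, bytes.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the whole file read-only and exposes it as a DoubleBuffer view.
    // The mapping stays valid after the channel is closed.
    static DoubleBuffer mapDoubles(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asDoubleBuffer();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sums the mapped doubles by absolute index, without copying them to the heap.
    static double sum(DoubleBuffer buf) {
        double total = 0.0;
        for (int i = 0; i < buf.limit(); i++) {
            total += buf.get(i);
        }
        return total;
    }
}
```

The fixed-layout requirement is what makes this work: element `i` lives at byte offset `i * 8`, so indexing needs no parsing step.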
Cool library, can imagine how moving away from boxing / unboxing can be a huge boost for them.
I've been looking for something that gave SIMD intrinsics to Java programmers - does anyone know if such a thing exists? Could be a nice addition to this lib.
What brought you to develop this library rather than relying on Incanter/Colt?
The scope of HipHip seems different, of course, but there is enough of an overlap to warrant the question.
We did, and we've been talking to the developers about a potential future collaboration. Our goals are really complementary; hiphip is about getting your code into the inner loop of Java bytecode (not just a set of canned operations), whereas core.matrix is about abstractions for a fixed set of operations across different matrix types. There may eventually be overlap, if core.matrix gets into compiling expressions into new operation types, which sounds like something they're interested in.
One thing I've found is that with macros, it can actually be easier to write performant primitive-reliant code. Still not up to Common Lisp standards, but much better than, e.g., having to use a scripting language to generate all the primitive specializations of your data structure, like Trove and Fastutil do.
Having written my own naive Clojure dot product, I can definitely appreciate what you guys have done!
Any plans to attack sparse vectors? Performance on the sparse vector operations I wrote was poor, but being new to Clojure it wasn't a great implementation.
vectorz-clj has sparse vector support... it's a bit of a hidden feature at the moment (you'll have to use Java interop to instantiate a SparseIndexedVector) but it works and is pretty fast for many operations.
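For anyone curious what a reasonably fast sparse dot product looks like over primitive arrays, here is a minimal sketch. It assumes a simple representation (ascending `int[]` indices plus a parallel `double[]` of values) chosen for this example; it is not the vectorz `SparseIndexedVector` API, and `SparseDotSketch`, `dot` and `dotDense` are hypothetical names.

```java
// Hypothetical illustration; the index/value layout is an assumption of this sketch.
final class SparseDotSketch {

    // Sparse x sparse: walk both ascending index arrays in a single merge-style
    // pass and multiply only where the indices match.
    static double dot(int[] idxA, double[] valA, int[] idxB, double[] valB) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < idxA.length && j < idxB.length) {
            if (idxA[i] == idxB[j]) {
                sum += valA[i] * valB[j];
                i++;
                j++;
            } else if (idxA[i] < idxB[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    // Sparse x dense: only the stored entries of the sparse vector contribute.
    static double dotDense(int[] idx, double[] val, double[] dense) {
        double sum = 0.0;
        for (int k = 0; k < idx.length; k++) {
            sum += val[k] * dense[idx[k]];
        }
        return sum;
    }
}
```

The work is proportional to the number of stored entries rather than the vector length, and nothing is boxed along the way.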
Huh, cool! I kinda assumed the JIT already took care of this sort of low-hanging fruit. We'll test this out and, if it works, include it in the next version of hiphip.
adrianm | 12 years ago
w01fe | 12 years ago
Mikera | 12 years ago
vemv | 12 years ago
w01fe | 12 years ago
w01fe | 12 years ago
flakk | 12 years ago
stevoski | 12 years ago
hashtree | 12 years ago
peatmoss | 12 years ago
prospero | 12 years ago
[1] https://github.com/ztellman/vertigo
Mikera | 12 years ago
51Cards | 12 years ago
I have to comment on the name as well... brilliant. Kudos for something creative that has already stuck firmly in my mind.
aria | 12 years ago
netshade | 12 years ago
fiatmoney | 12 years ago
Historiopode | 12 years ago
aria | 12 years ago
mjw | 12 years ago
https://github.com/mikera/matrix-api
w01fe | 12 years ago
fiatmoney | 12 years ago
aria | 12 years ago
tick113 | 12 years ago
w01fe | 12 years ago
Mikera | 12 years ago
bryansum | 12 years ago
w01fe | 12 years ago
wavesounds | 12 years ago
zinxq | 12 years ago
w01fe | 12 years ago