top | item 16406933

dlwh | 8 years ago

Main author here.

Breeze has breeze-viz, which is very basic but at the time there wasn't anything else. I highly endorse using something else. I personally like http://sameersingh.org/scalaplot/

They're under the same aegis basically because they're all mine. ScalaNLP started out as really being just NLP, but it scope-crept. That said, Epic is a library for structured prediction first and foremost, and one of the main applications of structured prediction is NLP.

Breeze is basically like SciPy and large chunks of it power Epic. It's really the only thing that doesn't belong in the namespace.

gravypod|8 years ago

I'm glad you're bringing something like this to the JVM/Scala-ecosystem.

There are some things that I've been interested in asking for in a high-level scientific computing library. If you're planning on continuing your visualization library, can you please come up with some solution for layout specification? Whenever I'm plotting something and I've spent 30 minutes getting all of the data in order, the last thing I want to do is fight with the plotting library's label positions because they overlap. Or if I say "Let me take this plot, add some more stacked subplots, and show different categories", I don't want my labels to be perfect but my scatters to be given a 10x10 pixel box to draw into.

On the HPC/numerical computing side of things, have you looked into implicit GPU operation types? Something that would let you queue up operations that can be run on a parallel computing system. Basically, you describe complex operations with the high-level object's normal operations. The objects aren't actually calculating anything, they just organize a GPU kernel in the background. As the final stage you can turn the queued operations into a compiled function:

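    // Pseudocode: a and b are lazy placeholders, nothing is computed yet.
    gpumat a(3, 5);
    gpumat b(5, 3);
    gpumat gpu_op_queue = (a * b) + (a * b) * 5;

    // Compile the queued operations into a kernel, then run it on real data.
    function(a, b) operation = gpu_op_queue.compile();
    mat output = operation(some_3x5, some_5x3);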

In the backend you'd hopefully be able to create your own types like 'cpumat', 'computerclustermat', or 'gpuclustermat'.
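To make the idea above concrete, here is a minimal sketch of the "queue up operations, compile later" pattern with a plain CPU backend standing in for a GPU one. Every name in it (`Mat`, `Node`, `LazyMat`, `placeholder`, `compile_cpu`) is hypothetical and invented for illustration; none of this is Breeze's API or any real library's, and a real backend would generate a kernel rather than walk the graph recursively.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Dense row-major matrix. Hypothetical stand-in for the thread's `mat`.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;
    Mat(std::size_t r, std::size_t c, double fill = 0.0)
        : rows(r), cols(c), data(r * c, fill) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// One node of the expression graph. Creating a node does no arithmetic.
struct Node {
    enum class Kind { Input, MatMul, Add, Scale };
    Kind kind = Kind::Input;
    std::size_t rows = 0, cols = 0;
    std::size_t input_index = 0;  // used by Input nodes only
    double factor = 1.0;          // used by Scale nodes only
    std::shared_ptr<const Node> lhs, rhs;
};
using NodePtr = std::shared_ptr<const Node>;

// Lazy handle (the thread's `gpumat`): it only records the graph.
struct LazyMat {
    NodePtr node;
    std::size_t rows() const { return node->rows; }
    std::size_t cols() const { return node->cols; }
};

// Declare the index-th runtime argument with a fixed shape.
LazyMat placeholder(std::size_t index, std::size_t rows, std::size_t cols) {
    auto n = std::make_shared<Node>();
    n->rows = rows; n->cols = cols; n->input_index = index;
    return LazyMat{n};
}

LazyMat make_op(Node::Kind kind, std::size_t rows, std::size_t cols,
                NodePtr lhs, NodePtr rhs = nullptr, double factor = 1.0) {
    auto n = std::make_shared<Node>();
    n->kind = kind; n->rows = rows; n->cols = cols;
    n->lhs = std::move(lhs); n->rhs = std::move(rhs); n->factor = factor;
    return LazyMat{n};
}

// The operators check shapes and extend the graph; they compute nothing.
LazyMat operator*(const LazyMat& x, const LazyMat& y) {
    assert(x.cols() == y.rows());
    return make_op(Node::Kind::MatMul, x.rows(), y.cols(), x.node, y.node);
}
LazyMat operator+(const LazyMat& x, const LazyMat& y) {
    assert(x.rows() == y.rows() && x.cols() == y.cols());
    return make_op(Node::Kind::Add, x.rows(), x.cols(), x.node, y.node);
}
LazyMat operator*(const LazyMat& x, double k) {
    return make_op(Node::Kind::Scale, x.rows(), x.cols(), x.node, nullptr, k);
}

// CPU "backend": evaluates the graph recursively on concrete inputs.
Mat eval_cpu(const Node& n, const std::vector<Mat>& inputs) {
    switch (n.kind) {
    case Node::Kind::Input: {
        const Mat& m = inputs.at(n.input_index);
        assert(m.rows == n.rows && m.cols == n.cols);
        return m;
    }
    case Node::Kind::MatMul: {
        Mat l = eval_cpu(*n.lhs, inputs);
        Mat r = eval_cpu(*n.rhs, inputs);
        Mat out(n.rows, n.cols);
        for (std::size_t i = 0; i < l.rows; ++i)
            for (std::size_t j = 0; j < r.cols; ++j)
                for (std::size_t k = 0; k < l.cols; ++k)
                    out.at(i, j) += l.at(i, k) * r.at(k, j);
        return out;
    }
    case Node::Kind::Add: {
        Mat l = eval_cpu(*n.lhs, inputs);
        Mat r = eval_cpu(*n.rhs, inputs);
        for (std::size_t i = 0; i < l.data.size(); ++i) l.data[i] += r.data[i];
        return l;
    }
    case Node::Kind::Scale: {
        Mat l = eval_cpu(*n.lhs, inputs);
        for (double& v : l.data) v *= n.factor;
        return l;
    }
    }
    return Mat(0, 0);  // unreachable; silences missing-return warnings
}

// "Compile" step: freeze the graph into a callable taking the real inputs.
std::function<Mat(const std::vector<Mat>&)> compile_cpu(const LazyMat& expr) {
    NodePtr root = expr.node;
    return [root](const std::vector<Mat>& inputs) { return eval_cpu(*root, inputs); };
}
```

With this sketch, the example above would read `auto a = placeholder(0, 3, 5); auto b = placeholder(1, 5, 3); auto op = compile_cpu((a * b) + (a * b) * 5);`, and `op({some_3x5, some_5x3})` would then do the actual work. Because the graph is plain data, a different `compile_*` function could in principle target a GPU or a cluster without changing the user-facing expressions. Note that this naive evaluator computes `a * b` twice; a real graph compiler would be expected to share that subexpression.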

If you had some easy way to generically express extremely parallel numerical operations, an abstract way of implementing high-performance back-ends that take those operations and compile them to GPU kernels, and a visualization engine that doesn't feel like it's from the 80s then your library will really take off.

Personally I feel GPU-optimization and fighting with visualization libraries are the two biggest pain points in scientific computing.

dlwh|8 years ago

Thanks for the questions.

I am very unlikely to take on visualization. I don't acutely need it for what I do, and I am some-but-not-nearly-enough interested in visualization for its own sake. I started to read about the grammar of graphics stuff at one point and decided it was too far down the rabbit hole.

I have looked more into GPU stuff, and agree that specifying a compute graph (and then implicitly optimizing it) is more likely to be the future. FWIW, this is basically what XLA (from TensorFlow) and whatever it was FB announced on Friday are doing.

I wrote my thoughts up recently on the Breeze mailing list here: https://groups.google.com/forum/#!topic/scala-breeze/_hEFpnI...

I'm starting to think it through, but I'm not sure I have time for that either :(. A 4-month-old and a startup take up a lot of time.