top | item 11671787

Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine

161 points | j_juggernaut | 9 years ago | github.com

53 comments


waleedka|9 years ago

At a glance:

  - Only supports fully connected layers for now. No convnets or RNNs.

  - Requires a GPU. No option to run on CPU, not even for development. 

  - Setup instructions for Ubuntu only. No Mac or Windows.

  - Uses JSON to define the network architecture, which limits what you can build.

  - Takes in data in NetCDF format only.

  - Very little documentation.

  - The name is bad. I'm not going to remember how to spell DSSTNE.
It seems like a very early proof of concept. I wouldn't expect it to be useful to most people at this point. Built-in support for sparse vectors is interesting, but not a strong selling point by itself. I hope Amazon continues to develop it. Or, even better, contribute to one of the existing more mature frameworks.
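To make the JSON point concrete, here is a sketch (in Python, with hypothetical field names, not DSSTNE's actual schema) of what a declarative network definition looks like:

```python
import json

# Hypothetical JSON network definition in the spirit of DSSTNE's config.
# Field names here are illustrative only, not the engine's real schema.
net = {
    "Name": "recommender",
    "Layers": [
        {"Name": "Input",  "Kind": "Input",  "N": 50000, "Sparse": True},
        {"Name": "Hidden", "Kind": "Hidden", "N": 1024,  "Activation": "Sigmoid"},
        {"Name": "Output", "Kind": "Output", "N": 50000, "Sparse": True},
    ],
}

config = json.dumps(net, indent=2)
```

The limitation the parent comment describes falls out of this shape: a declarative format can only express constructs the engine already knows how to parse, whereas a programmatic API (Python, C++) can compose arbitrary graphs.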

scottlegrand|9 years ago

It's more than that, and it's in use in production at Amazon. 8 TitanX GPUs can contain networks with up to 6 billion weights. As Geoffrey Hinton once said:

"My belief is that we’re not going to get human-level abilities until we have systems that have the same number of parameters in them as the brain."

And you're right that it's a specialized framework/engine. But IMO making it more general-purpose is mostly a matter of cutting and pasting the right cuDNN code, or we could double down on its emphasis on sparse data. Amazon OSSed this partly, I suspect, to see what people want here.
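For scale, a quick back-of-envelope check on the 6-billion-weight claim, assuming 32-bit weights (an assumption; the precision isn't stated here):

```python
# Back-of-envelope check on "8 TitanX GPUs can contain networks with up to
# 6 billion weights", assuming fp32 (4 bytes per weight) -- an assumption.
weights = 6_000_000_000
bytes_per_weight = 4            # fp32
titanx_mem = 12 * 1024**3       # 12 GB per TitanX
total_mem = 8 * titanx_mem      # 8 GPUs

weight_bytes = weights * bytes_per_weight   # 24 GB of raw weights
fraction = weight_bytes / total_mem
print(round(fraction, 2))  # 0.23
```

So the weights alone fill roughly a quarter of the aggregate GPU memory; the rest goes to gradients, optimizer state, and activations, which is exactly why a single 12 GB card can't hold such a network and model-parallel spanning matters.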

taneq|9 years ago

Why does JSON limit what you can build? Or do you just mean it only supports certain architectures because there are no options to specify other ones in JSON?

incepted|9 years ago

> It seems like a very early proof of concept.

Agreed, it looks like a rushed response to TensorFlow.

throwaway6497|9 years ago

Amazon is turning over a new leaf. They had stopped publishing to any major conferences after their last significant paper, Dynamo.

My perception of Amazon is that they take everything from open-source but don't actively give back. Amazon and open-source never went hand-in-hand. Making their deep learning frameworks open-source is cool. Kudos to the team which managed to do this. I am sure internally, it must have been a huge struggle to get the approval from execs.

[Edit: Grammar]

throwaway6497|9 years ago

For a second, a thought crossed my mind that Amazon is actively trying to change its external perception after the NY times article and is trying to cozy up to developers. I found this on Glassdoor. Apparently, it will take a long time for them to make their culture less toxic.

===From Glassdoor===

Cons

====

The management process is abusive, and I'm currently a manager. I've seen too much "behind the wall" and hate how our individual performers can be treated. You are forced to ride people and stack-rank employees... I've been forced to give good employees bad overall ratings because of politics and stack ranking.

Advice to Management

Don't pretend that the recent NY Times article was all about "isolated incidents". The culture IS abusive and it WILL backfire once the stock value starts to drop. I'm an 8-year veteran and I no longer recommend former peers to interview with Amazon.

== [Edit: Formatted to make it clear what was pulled from Glassdoor]

manigandham|9 years ago

> take everything from open-source but don't actively give back

There's nothing wrong with this. There's no contract when using open-source and this is probably how 99% of people interact with it.

barnacle_bill|9 years ago

You really think Amazon has something to contribute? A popular thing to do at Amazon is to take a complex open-source package, wrap it in a web server, and announce your team has launched a revolutionary new PAAS. Or take someone else's web service and build a new web service on top of it with minimal new features and more restrictions. Then announce it and hope for Jeff visibility.

ktamura|9 years ago

First TensorFlow and now this. Tensor is quickly becoming a mathematical-term-that-sounds-familiar-to-developers-but-most-don't-actually-know-what-it-is.

Another example is topology =)

vardhanw|9 years ago

When I entered college after high school* in India (around 1990), I was enamored of its library (my school didn't have one), and I was a math enthusiast (I had also placed in a few state-level math talent competitions). After being introduced to vectors (in math and physics) I chanced upon tensors, and they seemed interesting. I found some good books in the catalog and asked the librarian to issue one. He just refused to lend it to me, saying it was a topic for "higher level/senior studies" (BSc/MSc). Unfortunately, at the time I could not get any other source for it, so it stayed sufficiently off my radar that I never managed to get back to it. Surprisingly, looking back, it was never covered even in my engineering curriculum, probably because it was (and is) considered higher mathematics without much engineering application. I did come across it while scanning through the relativity literature, but never attempted to understand it in depth. Now seems to be the time to do it!

* College or 11th std in India is the same as 11th grade of high school in the US.

rdtsc|9 years ago

Another one is isomorphic. Anything that sounds sciency or mathy will be adopted. There is no other way ;-)

qrian|9 years ago

But can we say we know what vectors are, though? As far as I know, tensors are derived from vectors, and I would imagine most programmers don't know what vectors are in a mathematical sense.

orm|9 years ago

functor is another one.

ecesena|9 years ago

field, group...

scottlegrand|9 years ago

Lead author of DSSTNE here...

1. DSSTNE was designed two years ago specifically for product recommendations from Amazon's catalog. At that time, there was no TensorFlow, only Theano and Torch. DSSTNE differentiated from these two frameworks by optimizing for sparse data and multi-GPU spanning neural networks. What it's not currently is another framework for running AlexNet/VGG/GoogleNet etc, but about 500 lines of code plus cuDNN could change that if the demand exists. Implementing Krizhevsky's one weird trick is mostly trivial since the harder model parallel part has already been written.

2. DSSTNE does not yet explicitly support RNNs, but it does support shared weights, and that's more than enough to build an unrolled RNN. We tried a few, in fact. cuDNN 5 can be used to add LSTM support in a couple hundred lines of code. But since (I believe) the LSTM in cuDNN is a black box, it cannot be spread across multiple GPUs. Not too hard to write from the ground up, though.

3. There are a huge number of collaborators and people behind the scenes that made this happen. I'd love to acknowledge them openly, but I'm not sure they want their names known.

4. Say what you want about Amazon, and they're not perfect, but they let us build this from the ground up and have now given it away. Google, OTOH, hired me away from NVIDIA in 2011 (another one of those offers I couldn't refuse), blind-allocated me into search, and would not let me work with GPUs, despite my being one of the founding members of NVIDIA's CUDA team, because they had not yet seen GPUs as useful. I didn't stay there long. DSSTNE is 100% fresh code, warts and all, and I thank Amazon both for letting me work on a project like this and for OSSing the code.

5. NetCDF is a nice efficient format for big data files. What other formats would you suggest we support here?

6. I was boarding a plane when they finally released this. I will be benchmarking it in the next few days. TLDR spoilers: near-perfect scaling for hidden layers with 1000 or so hidden units per GPU in use, and effectively free sparse input layers because both activation and weight gradient calculation have custom sparse kernels.

7. The JSON format made sense in 2014, but IMO what this engine needs now is a TensorFlow graph importer. Since the engine builds networks from a rather simple underlying C struct, this isn't particularly hard, but it does require supporting some additional functionality to be 100% compatible.

8. I left Amazon 4 months ago after getting an offer I couldn't refuse. I was the sole GPU coder on this project. I can count the number of people I'd trust with an engine like this on two hands, and most of them are already building deep learning engines elsewhere. I'm happy to add whatever functionality is desired here. CNN and RNN support seem like two good first steps, and the spec already accounts for this.

9. Ditto for a Python interface, easily implemented IMO through the Python C/C++ extension mechanism: https://docs.python.org/2/extending/extending.html

Anyway, it's late, and it's turned out to be a fantastic day to see the project on which I spent nearly two years go OSS.
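The shared-weights approach to RNNs mentioned in point 2 can be pictured with a toy sketch: unrolling turns the recurrence into a stack of fully connected steps that all reuse the same weights (plain illustrative Python, with scalars standing in for the weight matrices):

```python
import math

# Minimal sketch of an unrolled RNN: the same weights (Wx, Wh) are reused
# at every time step -- that reuse is all "shared weights" means here.
def step(x_t, h_prev, Wx, Wh):
    # h_t = tanh(Wx * x_t + Wh * h_prev), scalar toy version of the matrices
    return math.tanh(Wx * x_t + Wh * h_prev)

def unrolled_rnn(xs, Wx=0.5, Wh=0.9, h0=0.0):
    h = h0
    for x_t in xs:                 # each unrolled step is one layer that
        h = step(x_t, h, Wx, Wh)   # shares Wx and Wh with every other step
    return h

h_final = unrolled_rnn([1.0, 0.0, 1.0])
```

In an engine that only knows feed-forward layers, the unroll depth is fixed ahead of time, but the parameter count stays constant regardless of sequence length, which is the property that makes this trick work.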

shoyer|9 years ago

Thanks for sharing your story!

Let me comment on file formats as someone familiar with both netCDF and deep learning.

I agree that netCDF is a sane binary file format for this application. It's designed for efficient serialization of large arrays of numbers. One downside is that netCDF does not support streaming without writing the data to intermediate files on disk.

Keep in mind that netCDF v4 is itself just a thin wrapper around HDF5. Given that your input format is basically a custom file format written in netCDF, I would have just used HDF5 directly. The API is about as convenient, and this would skip one layer of indirection.

The native file format for TensorFlow is its own custom TFRecords file format, but it also supports a number of other file formats. TFRecords is much simpler technology than NetCDF/HDF5. It's basically just a bunch of serialized protocol buffers [1]. About all you can do with a TFRecords file is pull out examples -- it doesn't support the fancy multi-dimensional indexing or hierarchical structure of netCDF/HDF5. But that's also most of what you need for building machine learning models, and it's quite straightforward to read/write them in a streaming fashion, which makes it a natural fit for technologies like map-reduce.

[1] https://www.tensorflow.org/versions/r0.8/api_docs/python/pyt...
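The length-prefixed record idea behind TFRecords can be sketched in a few lines (simplified: the real TFRecord framing additionally stores masked CRC32C checksums of the length and the data):

```python
import io
import struct

# Simplified sketch of a TFRecords-style stream: each record is just a
# length prefix followed by opaque bytes. (Real TFRecords also carry
# CRC32C checksums; this sketch omits them.)
def write_records(stream, records):
    for payload in records:
        stream.write(struct.pack("<Q", len(payload)))  # 8-byte LE length
        stream.write(payload)

def read_records(stream):
    while True:
        header = stream.read(8)
        if not header:            # clean EOF between records
            return
        (length,) = struct.unpack("<Q", header)
        yield stream.read(length)

buf = io.BytesIO()
write_records(buf, [b"example-1", b"example-2"])
buf.seek(0)
recs = list(read_records(buf))
```

Because a reader only ever needs the next 8 + N bytes, the format streams naturally with no index or seeking, which is the map-reduce-friendly property described above.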

xiphias|9 years ago

Where do you work now? It's interesting to hear what offer you couldn't refuse after being at so many places.

jbandela1|9 years ago

Deep Learning systems are becoming C++11's halo projects. Here are some deep learning libraries from the Internet Big 4.

Amazon DSSTNE - https://github.com/amznlabs/amazon-dsstne

Google TensorFlow - https://github.com/tensorflow/tensorflow/

Microsoft CNTK - https://github.com/Microsoft/CNTK/

Facebook fbcunn - https://github.com/facebook/fbcunn/

They all utilize C++11 or later. Just as Hadoop pushed Java in the big-data/map-reduce realm, I think these libraries will push C++11 in the Deep Learning realm.

vr3690|9 years ago

I get that the acronym is easy to pronounce via the suggested word, but why not just use that word (destiny) as the name instead of the acronym? So much easier to read and write. They could explain the name's origin in Readme.md.

abtinf|9 years ago

"Destiny" would also be ungooglable.

nate_martin|9 years ago

Maybe someone who works on deep learning could comment on what this provides vs other open source systems like theano, tensorflow, torch, etc.

curuinor|9 years ago

They claim it's twice as fast as TensorFlow, which is not blow-you-out-of-the-water (compare that to the ~50x speedup from GPUs in most places), but it's a solid speedup.

It's easily parallelizable on GPUs, or so the claim goes.

Its configuration language is much, much shorter than caffe's, but upon inspection it looks like the configuration language is also much less flexible than caffe's, and they implemented a damn sight less stuff. No recurrent anything, for example, no LSTM, none of the gating stuff you would need if you were doing LSTM, no residual-net stuff, just off the top of my head.

The docs also look much, much less complete in comparison to TF and Theano and things. Note that the dropout probability is given in the user docs, but the actual documentation for the dropout feature is hidden away inside the repo.

The important thing, however, is that they claim a significant improvement when training on extraordinarily sparse datasets, like recommender systems and the like. It seems very specialized for that exact purpose: witness that it only accepts NetCDF-format data, which is common enough in climatology-land but less common in machine-learning-land proper.
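The sparse specialization is easy to picture: with a catalog-sized, mostly-zero input, a dense matrix-vector product wastes nearly all its work on zeros, while a sparse kernel touches only the active indices. A toy sketch (illustrative Python, not DSSTNE's actual kernels):

```python
# Toy sketch of a sparse input layer: instead of multiplying a dense weight
# matrix by a mostly-zero vector, sum only the weight rows for the few
# active (nonzero) inputs. Illustrative only, not DSSTNE's kernels.
def dense_forward(W, x):
    # W: list of rows (one per input feature), x: dense input vector
    n_out = len(W[0])
    y = [0.0] * n_out
    for i, xi in enumerate(x):          # visits every input, zeros included
        for j in range(n_out):
            y[j] += xi * W[i][j]
    return y

def sparse_forward(W, active):
    # active: indices of inputs equal to 1.0 (e.g. items a user bought)
    n_out = len(W[0])
    y = [0.0] * n_out
    for i in active:                    # touch only the nonzero rows
        for j in range(n_out):
            y[j] += W[i][j]
    return y

W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
x = [1.0, 0.0, 0.0, 1.0]                # 2 of 4 catalog items active
assert sparse_forward(W, [0, 3]) == dense_forward(W, x)
```

With a real catalog (millions of items, a handful active per user), the sparse path does orders of magnitude less work, which is where the claimed speedup on recommender-style data comes from.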

The test coverage... To a first approximation, there is no test coverage. It seems quite research project-y.

romerocesar|9 years ago

One important difference is model-parallel training. From the FAQ:

DSSTNE instead uses “model-parallel training”, where each layer of the network is split across the available GPUs so each operation just runs faster. Model-parallel training is harder to implement, but it doesn’t come with the same speed/accuracy trade-offs of data-parallel training.

https://github.com/amznlabs/amazon-dsstne/blob/master/FAQ.md
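A toy sketch of what that means: in model-parallel training, one layer's units are split across devices, each holding only a slice of the weight matrix, so a single forward pass is computed cooperatively (illustrative Python, with lists standing in for GPUs):

```python
# Toy sketch of model-parallel layer splitting as the FAQ describes it:
# each "GPU" holds a slice of the layer's weight matrix and computes
# only its share of the output units.
def forward_slice(W_slice, x):
    # One device's share of the layer: a dot product per owned output unit.
    return [sum(w_ij * x_i for w_ij, x_i in zip(col, x)) for col in W_slice]

x = [1.0, 2.0]
# Full layer: 4 output units, stored as one weight column per unit.
W_cols = [[1, 0], [0, 1], [1, 1], [2, 0]]
gpu0, gpu1 = W_cols[:2], W_cols[2:]   # split the units across two devices

# Each device computes its slice; concatenating gives the full layer output.
y = forward_slice(gpu0, x) + forward_slice(gpu1, x)
```

Data-parallel training, by contrast, replicates the whole model on every device and averages gradients across replicas; that is where the speed/accuracy trade-offs the FAQ alludes to (effectively larger batch sizes, synchronization cost) come from.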

gidim|9 years ago

They claim to perform much better on sparse data sets: "DSSTNE is much faster than any other DL package (2.1x compared to Tensorflow in 1 g2.8xlarge) for problems involving sparse data". It also has good support for distributing computation over multiple GPUs. Theano, for example, can't do anything like that. On the other hand, using JSON to design my models sounds much worse than using a programming language.