(no title)

And what were the results of these experiments? For example, what error rate can you reach with the smallest network architecture you tried?

vishvananda|5 years ago

Unfortunately, I don't remember the exact numbers, but I think it was a couple of percentage points worse than what we were able to get with the large models.

jonath_laurent|5 years ago

This is interesting, thanks! Is there anything else you can tell me about the results of your experiments with small networks? I am really interested in this.

For example: did you notice that increasing or decreasing network size required significant changes in other hyperparameters? Do small networks learn faster at the beginning of training before they start to plateau?