Hm, suppose for argument's sake that feeding a batch of data through some moderately large FF architecture takes on the order of 100ms (I realise this depends on a lot of parameters, but it seems reasonable for many tasks / networks).

Now suppose instead you have a CTM that spends 10ms on the standard FF axes, and then multiplies that out by 10 internal “ticks” / recurrent steps, for roughly the same 100ms in total?
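
To make the arithmetic concrete, here is a rough sketch in Python. The function name `batch_cost_ms` is made up for illustration, and it assumes cost scales linearly with the number of ticks (the ticks run one after another), which real hardware may not respect exactly.

```python
def batch_cost_ms(per_pass_ms: float, ticks: int = 1) -> float:
    """Hypothetical cost model: time for one batch is the cost of a
    single pass multiplied by the number of sequential internal ticks."""
    return per_pass_ms * ticks


# Plain feed-forward net: one pass at ~100ms (contrived number from above).
ff_cost = batch_cost_ms(100.0)

# CTM-style net: ~10ms per pass on the FF axes, repeated over 10 ticks.
ctm_cost = batch_cost_ms(10.0, ticks=10)

# Under this toy model both architectures cost the same per batch.
same_budget = ff_cost == ctm_cost
```

Under these assumptions the two setups land on the same per-batch budget, which is all the comparison here relies on.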

The exact numbers are contrived, but my point is: couldn’t we conceivably search over that second arch just as easily?

It just boils down to whether the inductive bias of building in some explicit time axis is actually worthwhile, right?
