

We're doing something less expensive: essentially, when a branch is encountered, its contribution to the overall gradient is computed from certain statistics of the branch condition and its derivatives.
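
As a rough illustration of the general idea only (this is not DiscoGrad's actual estimator or API), here is a minimal Python sketch. It assumes the input is perturbed by Gaussian noise, that the branch condition is roughly linear around the mean, and that both branch bodies are constants. Under those assumptions the branch condition is itself Gaussian, so the probability of taking the branch, and its derivative, follow from the condition's value and derivative alone. The names `smoothed_branch`, `cond`, and `dcond` are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def smoothed_branch(mu, sigma, cond, dcond, then_val, else_val):
    """Expected value and d/dmu of `then_val if cond(x) < 0 else else_val`
    for x ~ N(mu, sigma^2).

    Assumption (illustrative): cond is linearized around mu, so
    cond(x) ~ N(cond(mu), (dcond(mu) * sigma)^2).
    """
    c = cond(mu)              # mean of the branch condition
    dc = dcond(mu)            # derivative of the condition at mu
    s = abs(dc) * sigma       # standard deviation of the condition
    if s == 0.0:
        # Condition does not depend on x: plain branch, zero gradient.
        return (then_val if c < 0 else else_val), 0.0
    z = -c / s
    p_then = normal_cdf(z)    # P(cond(x) < 0)
    value = p_then * then_val + (1.0 - p_then) * else_val
    # dP/dmu = pdf(z) * dz/dmu, with dz/dmu = -dc / s (s held fixed).
    dp_then = normal_pdf(z) * (-dc / s)
    grad = dp_then * (then_val - else_val)
    return value, grad


# A step function at x = 1: returns 0 if x - 1 < 0, else 1.
value, grad = smoothed_branch(
    mu=1.0, sigma=0.5,
    cond=lambda x: x - 1.0, dcond=lambda x: 1.0,
    then_val=0.0, else_val=1.0,
)
```

The plain step function has zero derivative almost everywhere, while the smoothed version yields a nonzero gradient near the branch point, which is what makes gradient descent across the branch possible.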

We mention neural networks because DiscoGrad lets you combine branching programs with neural networks (via Torch) and jointly train/optimize them.
