Locally != independently. The training process is recursive, so each layer's parameters depend on the output of the layer before it, and the layers still interact. From the paper:
"The parameters of the entire SARM are solved recursively. The current ARM’s parameters are calculated using the output from the previous ARM. Then, the output of the current ARM is fed into the subsequent ARM (or the classifier, if the current one is the last ARM), as its input."
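To make the recursion in that quote concrete, here is a minimal sketch of layer-by-layer fitting. It is not the paper's ARM solver: `fit_layer`, `apply_layer`, and `fit_stack` are hypothetical names, and the "local solver" is just a per-feature mean followed by a ReLU, chosen as a stand-in so the example stays self-contained. The only thing it is meant to show is that each layer is fit from the previous layer's output.

```python
def fit_layer(inputs):
    """Hypothetical local solver (stand-in, not the paper's): per-feature mean."""
    n = len(inputs)
    return [sum(col) / n for col in zip(*inputs)]


def apply_layer(params, inputs):
    """Center each feature with the fitted mean, then apply a ReLU."""
    return [[max(0.0, x - m) for x, m in zip(row, params)] for row in inputs]


def fit_stack(data, depth):
    """Fit layers one at a time; each fit sees only the previous layer's output."""
    layers, current = [], data
    for _ in range(depth):
        params = fit_layer(current)             # solved locally from the previous output
        layers.append(params)
        current = apply_layer(params, current)  # becomes the next layer's input
    return layers, current


# Toy data, made up for illustration only.
data = [[0.0, 4.0], [2.0, 0.0], [4.0, 2.0]]
layers, features = fit_stack(data, depth=2)
```

Each call to `fit_layer` is local in the sense that it looks at one layer's input only, but the second layer's parameters differ from what you would get by fitting it on the raw data, because its input has already passed through the first layer.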
orangutango|9 years ago