datastoat | 1 year ago

I like Bayesian inference for few-parameter models where I have solid grounds for choosing my priors. For neural networks, I like to ask people "what's your prior for ReLU versus LeakyReLU versus sigmoid?" and I've never gotten a convincing answer.

stormfather | 1 year ago

I choose LeakyReLU vs ReLU depending on whether it's an odd day of the week, with LeakyReLU slightly favored on odd days because it's aesthetically nicer that gradients propagate through negative inputs, though I can't discern a difference. I choose sigmoid if I want to waste compute to remind myself that it converges slowly due to vanishing gradients at extreme activation levels. So it's empiricism retroactively justified by some mathematical common sense that lets me feel good about the choices. Kind of like aerodynamics.
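
A quick sketch of that vanishing-gradient point (my own illustration in numpy, not part of the original comment): the derivative of sigmoid collapses toward zero at extreme pre-activations, while ReLU and LeakyReLU keep a constant gradient on at least one side.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def activation_grads(x, alpha=0.01):
        """Analytic gradients of the three activations at pre-activation x."""
        s = sigmoid(x)
        return {
            "sigmoid": s * (1.0 - s),              # peaks at 0.25, -> 0 as |x| grows
            "ReLU": 1.0 if x > 0 else 0.0,         # dead for negative inputs
            "LeakyReLU": 1.0 if x > 0 else alpha,  # alpha keeps negatives trainable
        }

    for x in (-10.0, -1.0, 1.0, 10.0):
        print(x, activation_grads(x))
    # sigmoid'(+-10) ~ 4.5e-5, which is the saturation the comment complains about.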

duvenaud | 1 year ago

I agree choosing priors is hard, but choosing ReLU versus LeakyReLU versus sigmoid seems like a problem with using neural nets in general, not Bayesian neural nets in particular. Am I misunderstanding?

pkoird | 1 year ago

Kolmogorov-Arnold nets might have an answer for you!

dccsillag | 1 year ago

Ah, Kolmogorov-Arnold Networks. Perhaps the only model I have ever tried that fairly often managed to get AUCs below 0.5 in my tabular ML benchmarks. It even managed a frankly disturbing 0.33 on a task where pretty much any other method (including linear regression, IIRC) would get >= 0.99!
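
For what it's worth, an AUC that far below 0.5 means the scores rank negatives above positives so consistently that simply negating them would give 1 - AUC ~ 0.67. A minimal sketch on synthetic data (my own, not the commenter's benchmark):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=1000)
    # Scores deliberately anti-correlated with the label, plus noise.
    scores = -y + rng.normal(scale=1.0, size=1000)

    print(roc_auc_score(y, scores))   # well below 0.5
    print(roc_auc_score(y, -scores))  # the two always sum to 1.0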

jwuphysics | 1 year ago

Could you say a bit more about how that happened?

salty_biscuits | 1 year ago

I'm sure there is a way of interpreting a ReLU as a sparsity prior on the layer.
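
One concrete version of that (my own sketch, assuming the standard proximal-operator reading, not something the commenter wrote): a shifted ReLU is the MAP estimate of a nonnegative latent under a Gaussian likelihood and an exponential (L1-style, sparsity-inducing) prior, since ReLU(x - lam) solves argmin_{z >= 0} 0.5*(z - x)^2 + lam*z.

    import numpy as np

    def prox_l1_nonneg(x, lam):
        """Brute-force argmin over z >= 0 of 0.5*(z - x)**2 + lam*z (grid search)."""
        z = np.linspace(0.0, 5.0, 500001)
        return z[np.argmin(0.5 * (z - x) ** 2 + lam * z)]

    for x in (-2.0, 0.3, 1.5):
        lam = 0.5
        # The numerical minimizer matches the closed form ReLU(x - lam).
        print(x, prox_l1_nonneg(x, lam), max(x - lam, 0.0))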