You may be right about that, but it also depends on your requirements. Ludwig gives you a lot of options for those tricks, for instance gradient clipping, regularizers, or learning rate and batch size scheduling. Those things are usually useful for squeezing out that extra 3% of performance, and even in those cases, having them already implemented is an advantage. My personal experience is that in many cases the first step, getting to 80% of the final performance, is enough to convince someone of the value of what you are doing, and then you can spend time improving on it later. In this regard, Ludwig gets you from 0 to 80% really quickly.