ljvmiranda | 3 years ago
I am curious if there is a sample threshold where it's worth exploring deep learning approaches to tabular data. I wonder if there are other considerations (e.g., inference speed, explainability, etc.).
Tenoke | 3 years ago
Not especially, but there are tasks where DL models occasionally seem to outperform by a little. If you really want to milk out extra accuracy, it can be worth trying a DL model; if it performs as well as or better than the GBM, you can ensemble the two or replace the GBM outright, though it's rarely worth it. If you read the Kaggle winner write-ups for tabular competitions, most use GBMs, or an ensemble for a tiny boost over a GBM alone.
Assuming limited time to work on the problem, you'd almost always want to focus on further feature engineering first and likely some hyperparameter tuning second.
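The averaging ensemble mentioned above can be sketched roughly like this, using scikit-learn stand-ins (a `GradientBoostingClassifier` for the GBM and an `MLPClassifier` for the DL model; the synthetic data and all hyperparameters are illustrative, not anyone's actual setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic tabular data as a stand-in for a real dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
nn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                   random_state=0).fit(X_tr, y_tr)

# Simple ensemble: average the two models' predicted class probabilities.
proba = (gbm.predict_proba(X_te) + nn.predict_proba(X_te)) / 2
pred = proba.argmax(axis=1)

for name, p in [("gbm", gbm.predict(X_te)),
                ("nn", nn.predict(X_te)),
                ("ensemble", pred)]:
    print(f"{name}: {accuracy_score(y_te, p):.3f}")
```

Whether the ensemble beats the GBM alone depends on the dataset; as the comment says, the gain is usually tiny.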
michaelscott | 3 years ago
What was really interesting was when the dataset had more than 6k or so samples: the deep learning model was suddenly much more accurate, and by a wide gap. At roughly the 10k mark, the DL model was easily outperforming the tree model.
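The kind of sample-size comparison described above could be run like this (an illustrative sketch with scikit-learn stand-ins and synthetic data; the 6k/10k crossover is specific to that commenter's dataset and will not reproduce here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Compare a tree ensemble vs a small NN at increasing sample sizes.
results = {}
for n in (1000, 6000, 10000):
    X, y = make_classification(n_samples=n, n_features=20, random_state=0)
    tree = RandomForestClassifier(random_state=0)
    nn = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
    results[n] = (cross_val_score(tree, X, y, cv=3).mean(),
                  cross_val_score(nn, X, y, cv=3).mean())
    print(f"n={n}: tree={results[n][0]:.3f} nn={results[n][1]:.3f}")
```

On real tabular data the crossover point (if any) is problem-dependent, which is why measuring it per dataset, as above, is the only reliable check.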