top | item 32340525

coffee_am | 3 years ago

I can't explain it, but I help maintain TensorFlow Decision Forests [1] and Yggdrasil Decision Forests [2], and in an AutoML system at work that trains models on lots of different users' data, decision forest models get selected as best (after AutoML tries various model types and hyperparameters) somewhere between 20% and 40% of the time, consistently. It's pretty interesting. The other model types considered are NNs, linear models (with automatic feature-cross generation), and a couple of other variations.

[1] https://github.com/tensorflow/decision-forests [2] https://github.com/google/yggdrasil-decision-forests
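The selection process described above (try several model families, keep whichever scores best) can be sketched roughly like this. This is an illustrative toy using scikit-learn, not the actual AutoML system or the TF-DF/YDF APIs; the dataset, candidate models, and validation split are all assumptions for demonstration.

```python
# Toy sketch of "try several model families, keep the best by validation
# score" -- NOT the actual AutoML system from the comment above.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in dataset; in the real system each user brings their own data.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate model families (a real AutoML sweep would also search
# hyperparameters within each family).
candidates = {
    "decision_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "linear": LogisticRegression(max_iter=5000),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_va, y_va)  # validation accuracy

best = max(scores, key=scores.get)
print(f"best model family: {best} (accuracy {scores[best]:.3f})")
```

In the comment's setting, tallying which family wins across many such runs on unrelated datasets is what yields the ~20-40% share for decision forests.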

onasta | 3 years ago

Super interesting! Do you know what kind of data it's usually used for? And in the remaining 60% to 80%, do NNs account for a large portion of the best models?

Bonus question: are the stats you're mentioning publicly available?

coffee_am | 3 years ago

Sadly (but correctly) nothing is public; no one ever sees any data, since it's a service. Pure FNN (feedforward NN) models, if I recall correctly, also win ~30% to 40% of the time.

Since the service doesn't work for all types of data, and folks who are experts in ML would probably do their own hyperparameter tuning and custom models, this biases the types of datasets that compete.

But this share has been consistent over many months of various unrelated datasets, I believe.