TabPFN is an amazing innovation. But there are some crucial differences in model capabilities that make it hard for a fair comparison.
TabPFN can only operate on a single small table. But real-world datasets are actually multi-table and to make accurate prediction you need to capture signal from multiple tables (for example, customers, products, purchases).
So, the comparison to TabPFN would be unfair as it would only use data from a single table and that would lead to bad performance of TabPFN.
Interesting timing, they have recently reached out to my $dayjob. We will be probably be running a workshop on our (massive) dataset with them. I'd like to evaluate the performance of a couple of analytical models we've manually built against whatever this model can do based on some prompts. Exciting times!
I'll suspect it'll be more like the next little thing. Most of don't interact that much with structured data, so the applications will be very specific.
However, the algo-trading crowd, will likely be very interested in this. They deal with structured data all day and it would surprise me if most of them don't already have things like this working in their networks. They seem to be very secretive, though, so we're not gonna hear much.
So suppose I've got a database of behavioral and neuroimaging data from a research study on autism. Is this something that can be used to predict diagnosis from the other data fields?
Yes, I think this would work. For example, you'd organize the data into 3 tables: patients, behaviors and images. The patients table would have a partially filled-out "diagnosis" column. The model would then predict diagnosis of not-yet-diagnosed patients based on the patterns in data fields of previously diagnosed patients.
interesting! Super cool idea to augment software built with traditional DBs
I had some thoughts [1] around a concept similar to this a while ago, although it was much less refined. My thinking was around whether or not we could have a neural net remember a relational database schema, and be able to be queried for facts it knows, and facts it might predict.
This seems like a much more sensical (and actualised) stab at this kinda concept.
Hey! I'm one of the engineers who worked on this project.
These are all problems that KumoRFM is able to solve given that you have the right relational data of course! So e.g. for predicting restaurant table availability you would need at least an occupancy table which records how many seats were available historically and you can predict its future entries.
But you can also add more relevant data without joining into a single table, so you can add a restaurants table, a holiday-calendar table, weather patterns, etc. and KumoRFM should take it all into account when predicting.
tinyoli|9 months ago
profjure|9 months ago
TabPFN can only operate on a single small table. But real-world datasets are actually multi-table and to make accurate prediction you need to capture signal from multiple tables (for example, customers, products, purchases).
So, the comparison to TabPFN would be unfair as it would only use data from a single table and that would lead to bad performance of TabPFN.
simplesort|9 months ago
He seemed like a good guy and got the sense that he was destined to do something big
stuartjohnson12|9 months ago
I'm also guessing at some point he will probably read this comment, so hey Vid! See you at the next VRSA meetup!
hustwindmaple1|9 months ago
nsbk|9 months ago
bookworm123|9 months ago
perbu|9 months ago
However, the algo-trading crowd, will likely be very interested in this. They deal with structured data all day and it would surprise me if most of them don't already have things like this working in their networks. They seem to be very secretive, though, so we're not gonna hear much.
hbarka|9 months ago
SubiculumCode|9 months ago
profjure|9 months ago
dcrimp|9 months ago
I had some thoughts [1] around a concept similar to this a while ago, although it was much less refined. My thinking was around whether or not we could have a neural net remember a relational database schema, and be able to be queried for facts it knows, and facts it might predict.
This seems like a much more sensical (and actualised) stab at this kinda concept.
[1]: dancrimp.nz/2024/11/01/semantic-db/
EGreg|9 months ago
autorinalagist|9 months ago
These are all problems that KumoRFM is able to solve given that you have the right relational data of course! So e.g. for predicting restaurant table availability you would need at least an occupancy table which records how many seats were available historically and you can predict its future entries.
But you can also add more relevant data without joining into a single table, so you can add a restaurants table, a holiday-calendar table, weather patterns, etc. and KumoRFM should take it all into account when predicting.
Rohitcss|9 months ago