Sadly, the paper benchmarks on datasets that:
- are known to be pretty useless
- contain mistakes
- can be misleading under a naive F1-score measure (to be fair, they write "we looked at the F1-Score, under which both partial and full anomaly detection are considered correct identification", so this may be mitigated, but it's not clear)
See https://kdd-milets.github.io/milets2021/slides/Irrational%20...
So it's hard to take any benchmark from the paper seriously.
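To make the F1 concern concrete: one lenient convention in this literature is "point adjustment", where flagging any single point of a true anomaly segment credits the whole segment. Whether the paper uses exactly this scheme is unclear; this is a sketch of my own, not the paper's code:

```python
import numpy as np

def point_adjusted_f1(y_true, y_pred):
    """F1 after 'point adjustment': if any point inside a true anomaly
    segment is flagged, every point of that segment counts as detected.
    This is the lenient convention criticized in the linked slides."""
    y_true = np.asarray(y_true, dtype=bool)
    adj = np.asarray(y_pred, dtype=bool).copy()
    # Find contiguous true-anomaly segments via edges of the 0/1 signal.
    edges = np.flatnonzero(np.diff(np.r_[0, y_true.astype(int), 0]))
    for start, end in zip(edges[::2], edges[1::2]):
        if adj[start:end].any():
            adj[start:end] = True  # credit the whole segment
    tp = np.sum(adj & y_true)
    fp = np.sum(adj & ~y_true)
    fn = np.sum(~adj & y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# One 10-point anomaly; the detector flags a single point of it:
y_true = [0] * 20 + [1] * 10 + [0] * 20
y_pred = [0] * 20 + [1] + [0] * 9 + [0] * 20
print(point_adjusted_f1(y_true, y_pred))  # -> 1.0 despite 1/10 coverage
```

A detector covering 10% of the anomaly scores a perfect 1.0, which is why naive F1 numbers on these benchmarks need scrutiny.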
The paper also ignores any recent work (post-2018) on univariate time series anomaly detection in the matrix profile space (e.g. MADRID).
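For readers unfamiliar with that line of work, a brute-force matrix profile fits in a few lines; this is an illustrative O(n²) sketch, not MADRID itself (which searches many subsequence lengths efficiently):

```python
import numpy as np

def matrix_profile(ts, m):
    """Brute-force matrix profile: for each length-m subsequence, the
    z-normalized Euclidean distance to its nearest non-overlapping
    neighbor. The largest value marks the top "discord" (anomaly
    candidate). STOMP-family methods and MADRID's multi-length search
    are what you'd use in practice."""
    ts = np.asarray(ts, dtype=float)
    subs = np.lib.stride_tricks.sliding_window_view(ts, m)
    n = len(subs)
    mu = subs.mean(axis=1, keepdims=True)
    sd = subs.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    z = (subs - mu) / sd
    profile = np.empty(n)
    excl = m // 2  # exclusion zone: skip trivial self-matches
    for i in range(n):
        d = np.linalg.norm(z - z[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf
        profile[i] = d.min()
    return profile

rng = np.random.default_rng(0)
ts = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1 * rng.standard_normal(1000)
ts[500:520] += 3.0  # inject an anomaly
prof = matrix_profile(ts, m=50)
print(int(np.argmax(prof)))  # discord lands on a window overlapping the bump
```

No training at all, no LLM, and it runs in well under a second on this series.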
The "practicality of usage" and conclusion sections are pretty correct though: it's expensive, slow, and zero-shot is worthless if other methods can train and infer in orders of magnitude less time.
It would have been interesting to see how the DETECTOR method performs when the LLM forecaster is replaced with standard forecasting (e.g. an auto-ETS, ideally one robust to anomalies in the training data). The natural follow-up to this article is to remove the LLM altogether.
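A standard-forecaster version of Detector might look like the sketch below. Hand-rolled simple exponential smoothing stands in for a proper auto-ETS (e.g. statsmodels' ETSModel or statsforecast's AutoETS), and a median/MAD threshold keeps the cutoff robust to the anomalies themselves; none of this is from the paper.

```python
import numpy as np

def ses_residuals(ts, alpha=0.3):
    """One-step-ahead residuals from simple exponential smoothing,
    standing in for a real auto-ETS. The detection logic is identical
    whatever model produces the forecasts."""
    ts = np.asarray(ts, dtype=float)
    level, preds = ts[0], np.empty(len(ts))
    for t, x in enumerate(ts):
        preds[t] = level                      # forecast before seeing x
        level = alpha * x + (1 - alpha) * level
    return ts - preds

def detect(ts, alpha=0.3, k=4.0):
    """Flag points whose forecast residual exceeds k robust sigmas."""
    r = ses_residuals(ts, alpha)
    med = np.median(r)
    sigma = 1.4826 * np.median(np.abs(r - med))  # MAD -> sigma under normality
    if sigma == 0:
        sigma = 1.0
    return np.flatnonzero(np.abs(r - med) > k * sigma)

rng = np.random.default_rng(1)
ts = 10 + np.cumsum(0.1 * rng.standard_normal(500))  # smooth-ish signal
ts[250] += 5.0                                       # inject a spike
print(detect(ts))  # the spike (and its immediate aftermath) gets flagged
```

Swapping `ses_residuals` for LLM forecasts is exactly the Detector framing, which is what makes the head-to-head comparison feel like the obvious missing experiment.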
> For the second approach, called Detector, they use the LLM as a forecaster to predict the next value from a time series. The researchers compare the predicted value to the actual value. A large discrepancy suggests that the real value is likely an anomaly.
Unless there's more to it in the actual paper, this is how just about every anomaly detection technique already works. You fit a model of the distribution of data under normal-enough circumstances, and an observation is an "anomaly" if it seems very improbable (based on your model), or is otherwise extreme if your model isn't explicitly probabilistic.
So yes, this technique would be great if you removed the LLM: it's already the industry standard framework.
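That standard framework really is a few lines. A minimal sketch (mine, not the paper's) with a plain Gaussian as the "model of normal":

```python
import numpy as np

def anomaly_scores(train, test):
    """The generic recipe: fit a model of "normal" data, then score new
    points by how improbable the model finds them. The model here is a
    univariate Gaussian; an AR model, a KDE, or an LLM forecaster slots
    into the same framework without changing anything else."""
    mu, sd = float(np.mean(train)), float(np.std(train))
    z = (np.asarray(test, dtype=float) - mu) / sd
    # Negative log-likelihood: higher = more surprising = more anomalous.
    return 0.5 * z ** 2 + np.log(sd) + 0.5 * np.log(2 * np.pi)

rng = np.random.default_rng(2)
train = rng.normal(0.0, 1.0, 5000)
scores = anomaly_scores(train, [0.1, -0.5, 8.0])
print(int(scores.argmax()))  # -> 2: the extreme point is the anomaly
```

The only thing the paper changes is which box produces the "expected" value; the surrounding machinery is the industry standard.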
There's nothing wrong conceptually with trying to plug in a transformer model here. The problem is the presumption that a "large" pre-trained transformer model can actually work effectively on arbitrary time series.
As opposed to sending crowds of people to interact with the system? We are even more complex and less benchmarked/tested than LLMs. At least the LLM is a known quantity: we know its limitations, and we can evaluate and compensate for its shortcomings.
Some people trivialise things in bad faith to score internet points on Hacker News, I guess.
The status quo, per the article, was already using “deep learning” models to perform anomaly detection. We are already talking about complex, black-box systems.
The stated advantage was that they didn't require deployment-specific training: you can just throw a pre-trained LLM at the problem and get early-stage detection without paying for, or waiting on, training a custom ML model.
The article is quite open about the fact that the LLM approach doesn’t beat the state of the art in terms of accuracy.
I once had to build a complex NN-based system for this exact task. We also pitted it against a consultant building a basic XGBoost classifier working on a series of sliding windows over the sensor data. XGBoost won. It was also faster to train and faster to run.
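The sliding-window setup described above is straightforward to reproduce; this is an illustrative featurization sketch (not the consultant's actual pipeline — the feature choices here are my own):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_features(sensor, m=32):
    """Turn a 1-D sensor stream into one feature row per window, the
    kind of tabular input a gradient-boosted classifier (XGBoost in the
    anecdote) consumes. Labels would come from known fault intervals."""
    win = sliding_window_view(np.asarray(sensor, dtype=float), m)
    # Simple summary statistics per window; real setups often add
    # spectral features as well.
    return np.column_stack([
        win.mean(axis=1),
        win.std(axis=1),
        win.max(axis=1) - win.min(axis=1),          # peak-to-peak
        np.abs(np.diff(win, axis=1)).mean(axis=1),  # roughness
    ])

X = windowed_features(np.sin(np.linspace(0, 30, 500)), m=32)
print(X.shape)  # one row per window, one column per feature
```

Each row of `X` plus a fault label then goes straight into something like `xgboost.XGBClassifier`, which trains in seconds on this scale of data.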
I'd still love to see how a VLM would do with sensor data, as it's often quite easy for humans to spot anomalies visually, especially if you're allowed to do comparison overlays.
This exercise seems to highlight the need for a cheap-to-use, all-purpose framework for an efficient function approximator (since bespoke ML is too costly).
They are trying LLMs for this purpose, but maybe the structure of an optimal architecture should be studied instead.
Whether through analogy or an actual underlying isomorphism between the mechanisms underpinning language and other domains, I don’t see a reason LLMs can’t occasionally have insights into non-language problems.
Is it better than other methods? No. Is it efficient? Absolutely not.
I don’t work with LLMs, but I think a lot of HN users are prematurely skeptical of the potential low-hanging fruit across many domains that can be explored with these new, convenient, but invariably suboptimal tools.
Yes, those systems can understand a bit of math. As long as they have memory and can compare values, it should work.
But it's weird that they would seriously try that given that https://github.com/NX-AI/xlstm already exists. I mean, it's cool to know that it works, but I don't get why they would invest any more time in improving the results.
I read this as basically finding a language representation of time series data that an LLM would understand better than raw records. I'm guessing it tokenizes much like any other text. Perhaps I misread, though.
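For what it's worth, one common serialization scheme from the LLM-forecasting literature (rescale, fix precision, join as text) looks like the sketch below; I'm not claiming this is the paper's exact scheme:

```python
def serialize(ts, decimals=2):
    """Render a numeric series as a compact text string an LLM can
    tokenize: min-max rescale to [0, 100], fixed precision,
    comma-separated. One common convention, not necessarily this
    paper's exact one."""
    lo, hi = min(ts), max(ts)
    if hi == lo:
        hi = lo + 1.0  # avoid division by zero on constant series
    scaled = [(x - lo) / (hi - lo) * 100 for x in ts]
    return ",".join(f"{x:.{decimals}f}" for x in scaled)

print(serialize([10.0, 10.5, 11.0, 30.0]))  # -> "0.00,2.50,5.00,100.00"
```

The point is that the series becomes ordinary text, so it tokenizes exactly like any other string the model has seen.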
This is the modern-day equivalent of "every signal is an image" after the initial success of deep learning in image classification. It is just academia chasing the hype train...
> However, they wanted to develop a technique that avoids fine-tuning, a process in which engineers retrain a general-purpose LLM on a small amount of task-specific data to make it an expert at one task.
This method does not avoid fine-tuning. It just offloads the task to somebody else (i.e., to the LLM).
I'll buy the promise of the approach when the authors can show that they can vastly outperform an AR time series model or the simple techniques mentioned in the linked article.
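That AR baseline is itself only a few lines; a minimal least-squares sketch (my own, for illustration):

```python
import numpy as np

def fit_ar(ts, p=3):
    """Least-squares AR(p): x_t ~ c + a_1*x_{t-1} + ... + a_p*x_{t-p}.
    Returns coefficients (intercept first) and one-step residuals,
    which would drive an anomaly score just like Detector's."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts)
    # Column i holds the lag-(i+1) values aligned with targets ts[p:].
    X = np.column_stack([ts[p - i - 1:n - i - 1] for i in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, ts[p:], rcond=None)
    return coef, ts[p:] - X @ coef

# Recover the coefficients of a synthetic AR(2) process:
rng = np.random.default_rng(3)
x = [0.0, 0.0, 0.0]
for _ in range(500):
    x.append(0.6 * x[-1] - 0.2 * x[-2] + 0.1 * rng.standard_normal())
coef, resid = fit_ar(x, p=3)
print(np.round(coef, 2))  # intercept ~0, lags ~ (0.6, -0.2, 0)
```

This fits in microseconds and is the kind of cheap baseline the LLM approach would need to vastly outperform to justify its cost.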
cyrilou242 | 1 year ago
nerdponx | 1 year ago
ericpauley | 1 year ago
Welcome to scientific papers in 2024…
glutamate | 1 year ago
mensetmanusman | 1 year ago
visarga | 1 year ago
cqqxo4zV46cp | 1 year ago
It’s like you didn’t click the link.
iandanforth | 1 year ago
mdp2021 | 1 year ago
qtwhat | 1 year ago
lmpdev | 1 year ago
nerdponx | 1 year ago
viraptor | 1 year ago
jsemrau | 1 year ago
unknown | 1 year ago
[deleted]
beardyw | 1 year ago
What? Which the model then tokenizes? I am struggling to make sense of this.
ubercore | 1 year ago
beryilma | 1 year ago
beryilma | 1 year ago
bamboozled | 1 year ago