This blog post[1] from the developer helped me understand this better. It's linked on the GitHub page, but I wanted to share the link directly for clarity.
Astonishingly, we can process 20 years’ worth of data, sampled every five minutes, in less than 20 seconds.
I'd love to hear more details about how this is used in production. For example, if you have an anomaly that occurs only twice in a long dataset, the two occurrences would match each other, get a low matrix profile value, and be considered just as normal as a pattern that recurs thousands of times, correct?
I would normally think of an anomaly as a point in a low-density region of space, but Matrix Profile seems to use a stricter definition: a point with a large distance to its nearest neighbour. Is that fair?
I'm also interested in your process for setting the subsequence length parameter. Do you have to already know something about the expected length of an anomaly/motif, or do you sweep over multiple values?
How does this tie into alerting? Do you set a threshold on the matrix profile value that would fire an automated alert? Or is this used more as an offline tool to explore the dataset?
Minor nitpick: on the Target blog post, Prof. Keogh's name is misspelled (as "Keough").
The nice thing about how the Matrix Profile is built is that you can slice up different regions of time to focus on your use case. To build the MP you start with an NxN matrix (technically (N-m+1)x(N-m+1)) that holds the distance between every pair of length-m subsequences, then take the smallest distance in each row. However, we've found that first "updating" the NxN matrix lets you do analyses like your two-anomaly example.
In that case, you'd introduce a parameter "w" that specifies the boundary between two matching points counting as a pattern versus enough time having elapsed that they should be considered two separate anomalies. In the NxN matrix, for the ith row you'd then set every value outside the range [i-w, i+w] to infinity. The resulting Matrix Profile would then account for your situation.
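A minimal NumPy sketch of that banding idea (the function name, the brute-force O(n^2) distance-matrix construction, and the m//2 exclusion zone are my illustrative choices; real Matrix Profile implementations use much faster algorithms than materializing the full matrix):

```python
import numpy as np

def banded_matrix_profile(ts, m, w):
    """Matrix profile where matches farther than w steps apart are
    disallowed, so a pattern that only recurs far away still shows
    up as anomalous. Brute-force O(n^2) illustration only."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    # z-normalized length-m subsequences
    S = np.array([ts[i:i + m] for i in range(n)])
    S = (S - S.mean(axis=1, keepdims=True)) / (S.std(axis=1, keepdims=True) + 1e-12)
    # full pairwise distance matrix
    D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    lag = np.abs(np.arange(n)[None, :] - np.arange(n)[:, None])
    D[lag < m // 2] = np.inf   # exclusion zone: skip trivial self-matches
    D[lag > w] = np.inf        # the "w" band: distant matches don't count
    return D.min(axis=1)       # nearest permitted neighbour per row
```

With a very large w this reduces to the ordinary Matrix Profile; with a smaller w, two identical anomalies far apart can no longer "explain" each other, so both stand out.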
Due to the algorithm's speed we often sweep over multiple values, but we try to use domain knowledge where we can. As for alerting: we sometimes have labeled data we can calibrate the threshold against, but often it's a matter of customer trial and error.
It is possible that two occurrences of the same motif can overlap.
And
It is possible that two different motifs can overlap.
Let's look at both cases, using string analogues. We will start with the second case, using an example from John Cleese…
“…itself…and hence the very meaning of life itselfish bastard, I'll kick him… selfish…”
Here there is a motif “itself” and there is a motif “selfish”. Note that one occurrence of each motif appears overlapping in “itselfish”.
---
Now for the first case:
“….soihsehihrhewCOMICOMICireoqiwwherhqwe…”
Here we have a motif “COMIC” occurring twice, but the two occurrences share a letter, the central ‘C’. We can allow motifs to share more letters, but they cannot share ALL letters; that would be a trivial match.
The matrix profile has a simple parameter (the exclusion zone) that lets you control how much overlap you want to allow.
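Keogh's COMIC example can be sketched in a few lines; here a `max_overlap` parameter plays the role of the exclusion zone (the function and its Hamming-distance scoring are my toy string analogue, not the real z-normalized Euclidean distance on subsequences):

```python
def closest_pair(s, m, max_overlap):
    """Toy string analogue of motif discovery: find the closest pair of
    length-m substrings (Hamming distance), allowing the pair to share
    at most max_overlap characters. max_overlap == 0 forbids any
    overlap; larger values relax the exclusion zone."""
    best = (len(s) + 1, None)
    n = len(s) - m + 1
    for i in range(n):
        for j in range(i + 1, n):
            if m - (j - i) > max_overlap:  # too much overlap: trivial match
                continue
            d = sum(a != b for a, b in zip(s[i:i + m], s[j:j + m]))
            if d < best[0]:
                best = (d, (i, j))
    return best

s = "soihsehihrhewCOMICOMICireoqiwwherhqwe"
best = closest_pair(s, 5, max_overlap=1)
# → (0, (13, 17)): the two COMIC occurrences, sharing only the middle 'C'
```

With `max_overlap=0` that overlapping pair is excluded and the best remaining pair has a nonzero distance.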
I probably was a bit imprecise. What I want to know is whether there is a way to apply this to data where events may be superimposed and overlapping, meaning that you only observe the sum of the events, for example if one wants to analyze a changing electric or magnetic field.
Nevertheless, the points you mentioned are something I hadn't thought about at first; interesting once again.
Thanks for making this available! As someone who also has some TS analysis to do, I appreciate the fact that the code is available and in Python!
I am curious: are you affiliated with the UCR people? What's your opinion on Keogh's claims of the matrix profile making many TS problems easy or trivial?
I'm not affiliated with UCR, though I am a product of the UC system :)
I agree with Keogh that Matrix Profile can help solve a very wide range of problems, but you usually have to go a little bit deeper than just calculating the topline Matrix Profile. A good example of this is that if you calculate the Matrix Profile for something with daily seasonality (say, in-store retail sales), you'll see the same daily pattern in the Matrix Profile. The straightforward fix for this is to normalize by time window (say, only compare the Matrix Profile at the same time each day).
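A minimal sketch of that normalize-by-time-window idea (the function name and the z-score form are my assumptions; the point is just to score each Matrix Profile value only against values at the same phase of the day):

```python
import numpy as np

def seasonal_mp_score(mp, period):
    """Z-score each Matrix Profile value against the values at the same
    phase of the period (e.g. the same time of day). Illustrative
    sketch; `period` is the number of samples per day."""
    mp = np.asarray(mp, dtype=float)
    out = np.empty_like(mp)
    for phase in range(period):
        vals = mp[phase::period]
        out[phase::period] = (vals - vals.mean()) / (vals.std() + 1e-12)
    return out

# Toy daily-seasonal MP with period 4: phase 3 is always high, so it is
# not anomalous; a bump at index 9 stands out only after normalization.
mp = np.tile([0.2, 1.0, 0.2, 3.0], 5)
mp[9] += 2.0
scores = seasonal_mp_score(mp, period=4)
```

The recurring high value at phase 3 normalizes to zero, while the one-off bump at index 9 gets the top score.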
Slight beginner question pertaining to the anomaly detection with STAMPI example: How exactly do the graphs showcase a "detection" by the Matrix Profile?
While the signal graph is clearly out of bounds (100% above the last upper bound), the relative Matrix Profile's "spike in value" fits perfectly within the bounds of that graph.
Yeah, this is actually a good example of why it's important to add a bit on top of the raw Matrix Profile. The point is anomalous with respect to the pattern preceding it (the "sawtooth"), so in this case one needs to consider the whole Matrix Profile. It's a good callout: the graph isn't a complete anomaly detection system; it demonstrates how a single anomalous point can impact the Matrix Profile value.
It definitely can be (I believe one of the academic papers covers that), but we haven't implemented anything yet. It's definitely on the to-do list.
cwal37|7 years ago
That sounds quite promising.
[1] https://tech.target.com/2018/12/11/matrix-profile.html
kiv6|7 years ago
gdpq11|7 years ago
jmmcd|7 years ago
boltzmannbrain|7 years ago
marmaduke|7 years ago
It also seems like it requires some normalization of the data; this should be counted as an effective parameter of the method.
In any case, it'd be useful to fit a GLM with these profiles against bug or outage reports, log rates, etc.
eamonnkeogh|7 years ago
HMH|7 years ago
Topolomancer|7 years ago
gdpq11|7 years ago
bra-ket|7 years ago
thenaturalist|7 years ago
gdpq11|7 years ago
slamstacken|7 years ago
aouyang2|7 years ago
gdpq11|7 years ago
HMH|7 years ago
So what about superpositions of events, i.e. two motifs overlapping? Does anyone have any thoughts on that?
I guess this just means a lot more sweeping of observed patterns over the timeseries, and thus being somewhat slow.
filleokus|7 years ago
gdpq11|7 years ago