top | item 18824724

kiv6 | 7 years ago

Thanks for publishing!

I'd love to hear more details about how this is used in production. For example, if you have an anomaly that occurs only twice in a long dataset, the two anomalies would match each other, giving them a low matrix profile value, so they'd be considered just as normal as a pattern that recurs thousands of times - correct?

I would normally think of an anomaly as a point in a low-density region of space, but the Matrix Profile seems to have a stricter definition: a point that has a large distance to its nearest neighbour - is that fair?

I'm also interested in your process for setting the subsequence length parameter. Do you have to already know something about the expected length of an anomaly/motif, or do you sweep over multiple values?

How does this tie into alerting? Do you set a threshold on the matrix profile value that would fire an automated alert? Or is this used more as an offline tool to explore the dataset?

Minor nitpick: on the Target blog post, Prof. Keogh's name is spelled wrong (as Keough)

gdpq11 | 7 years ago

The nice thing about how the Matrix Profile is built is that you can slice up different regions of time to focus on your use case. To build the MP you start with an NxN matrix that lists the distance between every pair of subsequences (technically (N-m+1) x (N-m+1), for subsequence length m), then find the overall closest distance for each row. However, we've found that first "updating" the NxN matrix lets you do analyses like your two-anomaly example.
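
To make that construction concrete, here is a minimal sketch: build the pairwise subsequence distance matrix, mask the trivial self-matches near the diagonal, and take the row-wise minimum. This is the naive O(n^2·m) version, not the fast STAMP/STOMP algorithms from the papers, and all names are illustrative:

```python
import numpy as np

def naive_matrix_profile(ts, m):
    """Naive Matrix Profile: z-normalized Euclidean distance between every
    pair of length-m subsequences, then the closest match per row.
    Illustration only -- assumes no constant (zero-variance) subsequences."""
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)], dtype=float)
    # z-normalize each subsequence so matching is shape-based
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    # the (N-m+1) x (N-m+1) distance matrix
    dists = np.sqrt(((subs[:, None, :] - subs[None, :, :]) ** 2).sum(axis=2))
    # a subsequence trivially matches itself and its immediate neighbours,
    # so exclude a small zone around the diagonal
    excl = max(1, m // 4)
    for i in range(n):
        dists[i, max(0, i - excl):i + excl + 1] = np.inf
    return dists.min(axis=1)  # high values = anomalies, low values = motifs
```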

In that case, you'd introduce a parameter "w" that specifies the boundary between two matching points counting as a pattern and, once enough time has elapsed between them, counting as two separate anomalies. In the NxN matrix, for the ith row you'd then set every value outside the [i-w, i+w] band to infinity. That way, the resulting Matrix Profile accounts for your situation.
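
A sketch of that masking step, assuming you already have the full subsequence distance matrix in hand (the function and parameter names are mine, not from any particular library):

```python
import numpy as np

def windowed_matrix_profile(dists, w, m):
    """Restrict each row's nearest-neighbour search to indices within +/- w,
    so that two similar subsequences separated by more than w still score as
    anomalies rather than as a recurring pattern. `dists` is the full
    (N-m+1) x (N-m+1) subsequence distance matrix."""
    n = dists.shape[0]
    masked = dists.copy()
    excl = max(1, m // 4)  # still exclude trivial self-matches
    for i in range(n):
        masked[i, :max(0, i - w)] = np.inf   # matches too far in the past
        masked[i, i + w + 1:] = np.inf       # matches too far in the future
        masked[i, max(0, i - excl):i + excl + 1] = np.inf
    return masked.min(axis=1)
```

With a small w, two far-apart occurrences of the same shape no longer see each other, so both get a high profile value and surface as anomalies.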

Due to the algorithm's speed we do often sweep over multiple values, but we try to use domain knowledge where we can. As for alerting, we sometimes have labeled data that we can calibrate the threshold against, but often it's a matter of customer trial and error.
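
Such a sweep can be as simple as recomputing the profile for each candidate length and comparing the peaks; dividing by sqrt(2m) is one simple way to make scores from different lengths roughly comparable (a sketch with illustrative names, using a naive profile computation rather than the fast algorithms):

```python
import numpy as np

def mp_naive(ts, m):
    # naive z-normalized Matrix Profile (use fast STAMP/STOMP in practice)
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)], dtype=float)
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    d = np.sqrt(((subs[:, None, :] - subs[None, :, :]) ** 2).sum(axis=2))
    excl = max(1, m // 4)
    for i in range(n):
        d[i, max(0, i - excl):i + excl + 1] = np.inf
    return d.min(axis=1)

def sweep_lengths(ts, lengths):
    """Peak anomaly score per candidate subsequence length m, scaled by
    sqrt(2m) so scores at different lengths are roughly comparable."""
    return {m: mp_naive(ts, m).max() / np.sqrt(2 * m) for m in lengths}
```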

jamesb93 | 7 years ago

You say that the algorithm is fast, and the literature certainly points to this too, but I tried the Python implementation (linked here) on some audio datasets. One second of audio at reasonable quality is 44,100 data points, and it was taking minutes to process that much data.

I tried an R implementation that was multi-threaded and a lot faster, but even so the algorithm took ages when testing lots of different window sizes and datasets.