The author of the paper here. The cubic time and quadratic space complexity is with respect to the number of dimensions (n), not the sequential time steps. The time and space complexity w.r.t. the number of time steps (L) are both linear, specifically O(Ln^3) time and O(Ln^2) space. The method gives larger speed up with longer sequence (>1k time steps) but with relatively small number of dimensions (<= 64 dimensions from our experiments).
cs702|2 years ago
Your work looks more interesting to me now, even though cubic time and quadratic space in the number of dimensions are still a significant drag.
Consider: State-of-the-art models often work with dimensions 2-3 orders of magnitude greater than 64. For example, LLaMA 2 models operate on visible and hidden states on 4096 and 11008 dimensions, respectively.
Anyway, thank you again! I'm adding your paper to my reading list.
mfkasim1|2 years ago
I agree with the cubic time and quadratic space is a big limitation for now and I'm looking for ways to make them linear (or close to linear).
nalzok|2 years ago
mfkasim1|2 years ago
mfkasim1|2 years ago