top | item 37641435


mfkasim1 | 2 years ago

The author of the paper here. The cubic time and quadratic space complexity is with respect to the number of dimensions (n), not the number of sequential time steps. The time and space complexity w.r.t. the number of time steps (L) are both linear: O(Ln^3) time and O(Ln^2) space. The method gives larger speed-ups for longer sequences (>1k time steps) with a relatively small number of dimensions (<=64 dimensions in our experiments).
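A toy cost model of those asymptotics (my own sketch, not the paper's actual algorithm): if each of the L steps factors and solves one n-by-n linear system and keeps one n-by-n matrix around, the totals come out to O(Ln^3) time and O(Ln^2) space, matching the figures quoted above.

```python
import numpy as np

def toy_scaling(L, n, seed=0):
    """Toy loop with the quoted asymptotics: each of the L steps
    solves an n-by-n system (O(n^3) time) and stores one n-by-n
    matrix (O(n^2) space), i.e. O(L n^3) time and O(L n^2) space
    in total. This is only a cost model, not the paper's method."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    mats = []                                # O(L n^2) storage overall
    for _ in range(L):
        # Shifted random matrix so the solve is well conditioned.
        J = rng.standard_normal((n, n)) + n * np.eye(n)
        x = np.linalg.solve(J, x)            # O(n^3) per step
        mats.append(J)
    return x, mats

x, mats = toy_scaling(L=8, n=4)
```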


cs702|2 years ago

Thank you for posting on HN and clarifying that. I was so off-the-mark! I'm used to the convention in most deep-learning papers of using N for time steps and D for number of dimensions.

Your work looks more interesting to me now, even though cubic time and quadratic space in the number of dimensions are still a significant drag.

Consider: state-of-the-art models often work with dimensions 2-3 orders of magnitude greater than 64. For example, LLaMA 2 models operate on visible and hidden states of 4096 and 11008 dimensions, respectively.
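A quick back-of-envelope on that gap (my arithmetic, not from the paper): with O(n^3) time and O(n^2) space, going from n = 64 to LLaMA 2's 4096-dimensional hidden state multiplies the per-step cost by (4096/64)^3 and the memory by (4096/64)^2.

```python
# Per-step cost ratios when scaling n from 64 to 4096,
# assuming the O(n^3) time / O(n^2) space complexities above.
time_ratio = (4096 / 64) ** 3   # (64)^3 = 262144x more work
space_ratio = (4096 / 64) ** 2  # (64)^2 = 4096x more memory
print(time_ratio, space_ratio)
```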

Anyway, thank you again! I'm adding your paper to my reading list.

mfkasim1|2 years ago

Yes, sorry for that. I'm coming from physics, so L for sequence length and n for the number of dimensions make more sense :D

I agree that the cubic time and quadratic space are a big limitation for now, and I'm looking for ways to make them linear (or close to linear).

nalzok|2 years ago

Can you share a link to your code repository? I want to play with your algorithm and observe the speedup on my own. Also I'm a big fan of JAX!

mfkasim1|2 years ago

We're preparing (i.e. cleaning up) the code for the repo. We'll update you when we release it.