I didn't see in the paper why these methods are not used in practice. There are parallel implementations of Strassen's algorithm out there (I have written one). Yes, they are faster, but they do not guarantee numerical stability, and they are significantly harder to parallelize than block-cyclic matrix multiplication.
> There are parallel implementations of strassens algorithm out there
Do you have one off the top of your head you could link the code to? It doesn't have to be your implementation specifically; I just want to read the code for a parallel implementation.
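For anyone who just wants to see the shape of it: here is a minimal, serial Python sketch of Strassen's recursion (assuming square matrices whose size is a power of two; illustrative only, not a tuned parallel implementation). The seven subproducts M1..M7 are mutually independent, which is exactly where a parallel version would fork work:

```python
# Minimal serial Strassen for square matrices of power-of-two size.
# Not tuned, not numerically hardened -- just the recursion itself.

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split each matrix into four h x h quadrants.
    A11 = [row[:h] for row in A[:h]]; A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]; A22 = [row[h:] for row in A[h:]]
    B11 = [row[:h] for row in B[:h]]; B12 = [row[h:] for row in B[:h]]
    B21 = [row[:h] for row in B[h:]]; B22 = [row[h:] for row in B[h:]]
    # Seven products instead of the naive eight; each is independent,
    # so a parallel implementation can compute them concurrently.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    # Stitch the quadrants back together.
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

The extra additions and subtractions are where both the constant factor and the stability loss mentioned above come from.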
It seems that not very many mathematicians are working on optimizing this problem, yet matrix multiplication is used every day by many people and companies.
Did we already exhaust the economies of scale on this one?
There has been an enormous amount of effort placed into optimizing matrix multiplication.
The key here is that these methods have impractically huge constant factors: anything beyond Strassen would require matrices larger than we could conceivably use before it became more effective.
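A back-of-the-envelope count shows why, even for Strassen itself. Using textbook constants (naive: 2n^3 - n^2 flops; Strassen: seven half-size products plus 18 half-size matrix additions per level), the pure recursion only overtakes the naive count around n of roughly a thousand:

```python
# Rough flop counts for naive multiplication versus pure Strassen
# recursion. Constants vary by variant (Winograd's form uses 15
# additions instead of 18); this is only meant to show the crossover
# is nowhere near small n.

def naive_flops(n):
    return 2 * n**3 - n**2          # n^3 mults + n^2*(n-1) adds

def strassen_flops(n):
    if n == 1:
        return 1                     # one scalar multiply
    return 7 * strassen_flops(n // 2) + 18 * (n // 2) ** 2

for n in [2 ** k for k in range(1, 11)]:
    print(n, naive_flops(n), strassen_flops(n))
```

Practical Strassen implementations cut over to a tuned naive kernel below some threshold, which moves the crossover down; the galactic algorithms have constants that push it past any matrix we could ever store.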
This comment also seems to suggest that legions of scientists and engineers aren't trying to optimize matrix multiplication. They are. Here are a few directions to pay attention to:
1. Specialized matrix multiplications for specific sizes and with reduced precision.
For example, Intel's libxsmm, methods for multiplying with reduced precision on [CGT]PU.
2. Structured matrix multiplication
Circulant and FFT-like matrices can be multiplied in linearithmic time. If you can set your problem up to use these instead (see Choromanski's Orthogonal Random Features [2016] or Smola's Deep-Fried Convnets [2015]/Fastfood [2014]), you can dramatically improve the speed and memory efficiency of your application.
3. Hardware-specific optimizations.
Optimizing algorithms by cache efficiency, using FMA (fused multiply-addition) and SIMD instructions, and prefetching all play a big part in improving these multiplications. I'm commenting specifically on CPUs, but similar issues and methods exist on all hardware backends.
In summary, many people are working very hard on improving matrix multiplication. Lower asymptotic complexity algorithms are not the answer for this due to the absurd constant factors they require.
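To make point 2 concrete: an n x n circulant matrix is diagonalized by the DFT, so a circulant matrix-vector product is just a circular convolution and costs O(n log n) instead of O(n^2). A self-contained sketch (radix-2 FFT, so n must be a power of two; illustrative rather than production code):

```python
import cmath

# A circulant matrix is determined by its first column c, with
# C[i][j] = c[(i - j) mod n]. Its eigenvalues are FFT(c), so
# C @ x = IFFT(FFT(c) * FFT(x)) -- O(n log n) instead of O(n^2).

def fft(a, inverse=False):
    """Minimal recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if inverse else -1
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by vector x."""
    n = len(c)
    fc, fx = fft(c), fft(x)
    y = fft([a * b for a, b in zip(fc, fx)], inverse=True)
    return [v.real / n for v in y]   # scale the inverse transform
```

Structured-matrix methods like Fastfood work by composing a few transforms of exactly this kind in place of a dense weight matrix.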
Well, most matrix multiplications tend to involve small matrices (e.g. 4x4), and the time complexity of multiplying 4x4 matrices is O(1). These are actively optimised in the practical sense: hand-written assembly and new hardware.
If one has to multiply very large matrices, then the goal is usually not so much to optimise matrix multiplication in general as to optimise multiplication for the subset of matrices one is interested in. In these situations one cares about, e.g., density, symmetry, and choice of basis. There is a lot of information in a large random matrix, and typically less in an interesting large matrix, so the goal is to skip the work that would be required by all the entropy one does not have.
The algorithms with the best asymptotic complexity aren't used in practice because they are slow for the n we care about. In practice optimising matrix multiplication comes down to making efficient use of SIMD, multiple cores and caches.
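The cache part of that, at least, is easy to sketch. Real GEMMs work on tiles so that each tile of A, B, and C stays resident in cache while it is reused; Python can only illustrate the access pattern, since the actual win comes from doing this in C/assembly with SIMD and FMA:

```python
# Cache-blocked (tiled) matrix multiplication -- the loop structure
# optimized GEMMs use. Each block x block tile is reused many times
# while it sits in cache, instead of streaming whole rows/columns.

def blocked_matmul(A, B, block=2):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Multiply one tile of A by one tile of B,
                # accumulating into the matching tile of C.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = A[i][k]
                        for j in range(jj, min(jj + block, n)):
                            C[i][j] += aik * B[k][j]
    return C
```

The block size is tuned to the cache hierarchy of the target machine; production libraries layer register blocking and vectorized microkernels underneath the same idea.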
In addition to all the other reasons given (large constant, etc.), these faster algorithms are also often less stable numerically, as another poster alluded to above. See, e.g., https://en.wikipedia.org/wiki/Strassen_algorithm .
At least in my limited experience, matrix multiplication optimization is some combination of a) so specific to the particular matrices being used that the optimization amounts to finding "tricks" to reduce the problem (dgemm to sgemm, or sgemm to syrk, etc.) and thus is not generally useful, or b) so highly tied to IP that general optimizations (be they mathematical or advances in GPU or CPU low-level code) are kept secret.
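One concrete, non-secret example of the "tricks" in (a): C = A @ A.T is symmetric, so computing only the upper triangle and mirroring it does roughly half the multiply-adds of a general gemm. This is the kind of saving BLAS exposes as syrk (a sketch; real syrk also handles scaling factors and triangular storage):

```python
# A @ A.T is symmetric, so only the upper triangle needs computing;
# the lower triangle is filled in by mirroring. Roughly half the
# work of a general matrix multiply.

def syrk(A):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):                 # upper triangle only
            s = sum(a * b for a, b in zip(A[i], A[j]))
            C[i][j] = s
            C[j][i] = s                       # mirror for free
    return C
```

Gram matrices, covariance matrices, and normal equations all hit this case, which is why the trick shows up so often in practice.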
The OP's WordPress install appears to be infected with malicious adware; every time I load the site I get quickly redirected to a random one of a number of scummy pages which hijack the back button, have autoplaying sound, display false error messages, and attempt to ask for browser-level permissions.
Put up some nicely formatted text, like Google's ad results, and clearly mark it as an ad. WHY the fuck do ad companies have to do insanely annoying, borderline unethical things? Why is this industry so rotten to the core?
dan-robertson | 7 years ago:
I would, however, be impressed if the exponent were constructively proven to be 2.