A similar issue has come up for codes that I write. Among other things, I write low-level mathematical optimization codes that need fast linear algebra to run effectively. While there's a lot of emphasis on BLAS/LAPACK, those libraries work on dense linear algebra. In the sparse world, there are fewer good options. For things like sparse QR and Cholesky, the two fastest codes that I know about are out of SuiteSparse and Intel MKL. I've not tried it, but the SuiteSparse routines will probably work fine on ARM chips; however, they're dual-licensed GPL/commercial and the commercial license is incredibly expensive. MKL has faster routines and is completely free, but it won't work on ARM. (Note: it works fantastically well on AMD chips.) Anyway, it's not that I can't make my codes work on the new Apple chips, but I'd have to explain to my commercial clients that there's another $50-100k upcharge due to the architecture change and licensing costs due to GPL restrictions. That's a lot to stomach.
Accelerate is highly performant on Apple hardware (the current Intel arch). I expect Apple to ensure the same for their M-series CPUs, potentially even taking advantage of the tensor and GPGPU capabilities available in the SoC.
> I'd have to explain to my commercial clients that there's another $50-100k upcharge due to the architecture change and licensing costs due to GPL restrictions.
Your complaint is kind of strange. You're blaming "GPL restrictions" but the cost is for a commercial license.
Have you tried PETSc?
It does sparse (and dense) LU and Cholesky, plus a wide variety of Krylov methods with preconditioners.
It can be compiled to use MKL, MUMPS, or SuiteSparse if available, but also has its own implementations. So you could easily use it as a wrapper to give you freedom to write code that you could compile on many targets with varying degree of library support.
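As a sketch of what that looks like (the exact flags and option names are from memory and should be checked against the PETSc documentation; `my_app` is a hypothetical PETSc-based executable), you can configure PETSc with several sparse direct solver backends and then pick one at runtime:

```shell
# Build PETSc with multiple sparse direct solver backends available.
# (Illustrative; --download-mumps also needs ScaLAPACK, and the MKL
# path assumes $MKLROOT points at an Intel MKL installation.)
./configure --download-suitesparse \
            --download-scalapack --download-mumps \
            --with-blaslapack-dir=$MKLROOT

# The same application binary can then switch factorization packages
# from the command line, e.g. a direct Cholesky solve:
./my_app -ksp_type preonly -pc_type cholesky -pc_factor_mat_solver_type mumps
./my_app -ksp_type preonly -pc_type cholesky -pc_factor_mat_solver_type cholmod
```

That runtime-option mechanism is what makes it practical to ship one code that uses whatever solver stack happens to be licensed and available on a given target.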
I've made the point that GCC and free linear algebra are infinitely faster than MKL on the platforms of interest (taking the geometric mean over x86_64, aarch64, and ppc64le, since MKL runs on only one of them), while still having similar performance on x86_64.
I thought MKL used SuiteSparse, or is that just MATLAB?
Would their workflow allow just keeping a server on hand to do the number crunching, and still getting to use Apple Silicon on a relatively thin client?
> The ARM architecture floating point units (VFP, NEON) support RunFast mode, which includes flush-to-zero and default NaN. The latter means that payload of NaN operands is not propagated, all result NaNs have the default payload, so in R, even NA * 1 is NaN. Luckily, RunFast mode can be disabled, and when it is, the NaN payload propagation is friendlier to R NAs than with Intel SSE (NaN + NA is NA). We have therefore updated R to disable RunFast mode on ARM on startup, which resolved all the issues observed.
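For the curious, R's NA_real_ really is just a NaN with a distinguished payload: its low 32 bits are 1954 (see R's arithmetic.c), which is exactly what default-NaN mode destroys. A small Python sketch of the encoding (illustrative only; whether the payload survives arithmetic depends on the FPU):

```python
import math
import struct

# R encodes NA_real_ as a NaN whose low 32 bits are 1954 (0x7A2):
# high word 0x7FF00000, low word 0x000007A2.
NA_BITS = 0x7FF00000000007A2
na = struct.unpack("<d", struct.pack("<Q", NA_BITS))[0]

assert math.isnan(na)                # NA is an ordinary NaN to the hardware...
assert NA_BITS & 0xFFFFFFFF == 1954  # ...with R's distinguished payload

# Whether arithmetic preserves that payload is up to the FPU.  On x86 SSE
# (and on ARM with default-NaN mode disabled) the low word survives, so R
# can still recognize the result as NA.  In ARM's RunFast/default-NaN mode
# every result NaN gets the default payload, and the 1954 is lost.
result = na * 1.0
bits = struct.unpack("<Q", struct.pack("<d", result))[0]
print(hex(bits))  # payload propagation is platform-dependent
```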
Hmm. ELF object files for Arm can represent this with build attributes [1]:
Tag_ABI_FP_denormal (=20), uleb128
  0  The user built this code knowing that denormal numbers might be flushed to (+) zero
  1  The user permitted this code to depend on IEEE 754 denormal numbers
  2  The user permitted this code to depend on the sign of a flushed-to-zero number being preserved in the sign of 0

Tag_ABI_FP_number_model (=23), uleb128
  0  The user intended that this code should not use floating point numbers
  1  The user permitted this code to use IEEE 754 format normal numbers only
  2  The user permitted numbers, infinities, and one quiet NaN (see [RTABI32])
  3  The user permitted this code to use all the IEEE 754-defined FP encodings
Seems like their code should be tagged Tag_ABI_FP_denormal = 1, Tag_ABI_FP_number_model = 3 if it were an ELF .o, .so, or executable, in which case <waves hands> some other part of the toolchain or system would automatically configure the floating point unit to provide the required behavior.
I wonder what happens if you `dlopen` a shared object that wants stricter behavior than the current executable and loaded shared objects. Does it somehow coordinate changing the state for all existing threads?
Does that second setting imply that those NaNs need to be propagated? If not, then those settings aren't great. Sure, there are lots of chips where denormal behavior and NaN preservation are the same setting, but those could and probably should be split up in the future.
I am probably the last person to talk about the differences between Fortran versions, but isn't the linked compiler for Fortran 2003 and 2008, whereas R needs a Fortran 90 compiler?
R doesn't even work that well on Intel, at least in Ubuntu. Recompiling the package with AVX support often leads to a 30% performance increase on modern CPUs.
IMO the R base package should dynlink different shared libraries for different processors since vector extensions are mostly tailored to the kind of floating point numerical work that R does.
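For anyone who wants to try the recompile themselves: assuming you build packages from source, the usual place for stronger optimization flags is ~/.R/Makevars, which R's package build machinery picks up; `-march=native` enables AVX and friends on CPUs that have them (flags here are illustrative):

```make
# ~/.R/Makevars -- applied when R builds packages from source.
# -march=native lets the compiler use AVX/AVX2 etc. on the build machine;
# binaries built this way are not portable to older CPUs.
CFLAGS   = -O3 -march=native
CXXFLAGS = -O3 -march=native
FFLAGS   = -O3 -march=native
```

After that, installing a package with `type = "source"` rebuilds it with those flags.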
As a data scientist who is proficient in both Python and R ecosystems, in my opinion R/tidyverse is substantially better for ad hoc EDA and data visualization.
However, Python is better for nearly everything else in the field (namely, working with nontabular data, external APIs, deep learning, and productionization).
However, to me R appears like a little better Swiss Army Knife to do initial analysis. ggplot2, tidyverse, ...
R is far superior for interactive exploration/analysis and report writing. However Python is far superior if you are writing a program that does other things too.
My rule of thumb is that if a Python program is 70% or more Numpy/Pandas/Matplotlib etc., then it should be R. Whereas if an R program does comparatively little analysis and a lot of logic and integration, it should be Python. No one size fits all.
think of it like shell scripting for statistics, although not nearly as limited as bash is compared to other programming languages.
it works best if it's used semi-interactively, as a glue language between statistical packages which may be written in other languages. or to write simple "batch" scripts that basically just run a bunch of procedures in a row.
RStudio makes the whole experience much nicer in terms of plotting, and RMarkdown is great for preparing documents.
of course like shell scripting you can write fairly complicated programs in it, and sometimes people do, but due to backwards compatibility and weird design choices meant to make interactive use easier, programming "in the large" can get weird.
the analogy works for Python too -- it is definitely reasonable to use Python for shell scripting, but using Python interactively to pipe things from one program to another is slightly more frustrating than doing it in the shell, although might be preferred due to its other advantages.
In my experience I've seen R used in more exploratory/ad hoc analysis and algorithm development by "non-developers" (statisticians, scientists, etc., usually without performance considerations), and that code is then turned into production code by the dev team using Python or C or something more performant or maintainable.
I work with people who mostly have a background in the social sciences or humanities and who work in R pretty much every day. They don't see themselves as programmers, and Python is complete gibberish to them, while R just makes sense. When I meet people from other companies in roughly the same space (I work in healthcare doing data analysis), it's mostly the same. I actually meet more people who use SAS/SPSS than Python.
For data analysis, R is in my opinion better than Python. It's when you have to integrate it in existing workflows that Python quickly becomes a better choice.
For what it is worth (not at all clear), TIOBE ranked R as the 9th most popular programming language in the world this month: https://www.tiobe.com/tiobe-index/. For comparison, Python is ranked number 2.
Very popular in academia, moderately popular in industry when it comes to data science/analysis. In any case, very powerful, though Python certainly has numerous advantages over it.
I reckon they'd finally have to get R working natively on the new chip. I don't foresee Apple offering the fat binary support in the long term. It's probably only an intermediate solution for the transitional period. Also, does it mean the native version of R will finally work on the iPad? I know Apple doesn't allow compilers, but there are a few examples like Pythonista and Apple's own Swift Playgrounds. It'd be cool to get RStudio on the iPad.
Just to be clear, PPC-Intel fat/universal binaries are still supported even on Big Sur; the PPC portion is just ignored. I don't expect Intel-Arm binaries to go away any time soon.
I believe what you're really thinking of is Rosetta though. That, indeed, is sadly unlikely to be around forever. We have history as an indication of that.
When Apple transitioned from PowerPC to Intel the fat binary support (Rosetta) lasted 3 OS updates or about 3 years. Definitely won't be a super long term thing, but there's plenty of time I guess.
Does anybody know what the status is of Accelerate [1]? Is it implemented for Apple Silicon? Is it optimized for it? To me it seems very few people use this framework.
Similarly, MATLAB is not initially available for Apple Silicon natively; they are preparing an update to let MATLAB run under Rosetta 2 instead, until the development cycle of the native version completes.
Hi everyone, did any of you try out R or SPSS on a new M1 MacBook? Do either of these work fine under Rosetta 2? I suppose neither has a native ARM version yet.
In addition, did anyone try CorelDraw as well?
I am asking these questions because I think a lot of us working in data science have second thoughts about moving to ARM, at least for the next year or so...
It sounds like R's design decision to use a non-standard NaN value to represent NA is an obscenely bad one. Wasn't it obvious that this would become a problem someday?
I'm curious: do people in numerical specialties say "codes" (instead of "code")? I don't often hear it that way, but I'm not in that specialty.
It will probably be ported though, if there's a demand...
Does Mach-O have a similar mechanism?
[1] https://github.com/ARM-software/abi-aa/blob/master/addenda32...
NA - Not available
NaN - Not a number
See this short and concise article on the differences: https://jameshoward.us/2016/07/18/nan-versus-na-r/
Edit: actually, it looks like I'm wrong; emghost appears to know more about this than me.
I started learning it because I want to attempt some projects on Kaggle. Most people use Pandas, Seaborn, etc., which I will also use.
Any help leveling up would be appreciated.
It's about knowing which tool to use.
Millions and millions of users who have no idea what this blog post is technically about (but it's interesting nonetheless).
Very popular. To the point of even having quite a lot of Microsoft support, lots of books, etc.
Previous benchmarks[0] show that the overhead on Intel Macbooks for the Docker Linux VM is quite low for scientific computing.
Would the x86 emulation hurt performance substantially or is there some other issue with this approach?
[0]: https://lemire.me/blog/2020/06/19/computational-overhead-due...
[1] https://developer.apple.com/documentation/accelerate