lausiant's comments

lausiant | 8 years ago | on: Why isn't everything normally distributed?

I could probably be characterized as a social scientist, at least a behavioral scientist.

What you're saying is probably part of it, although in my experience that criticism can be leveled as much, if not more, at wet-lab-type biologists who eschew all but the most minimal stats.

With the social sciences, though, there's another phenomenon at play: the phenomena under study are often so abstract that there's no good theoretical reason to assume any particular distribution. And if that's the case, because the normal is the entropy-maximizing distribution (for a given mean and variance), you're actually better off assuming it rather than some other distribution. You could also use nonparametric stats, but those have their own advantages and disadvantages.
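The maxent claim is easy to check numerically. A minimal sketch with SciPy, comparing the differential entropy of three unit-variance distributions (the particular rivals, Laplace and uniform, are my choice for illustration):

```python
import numpy as np
from scipy.stats import norm, laplace, uniform

# Three distributions, all with mean 0 and variance 1.
h_norm = norm(loc=0, scale=1).entropy()
h_lap = laplace(loc=0, scale=1 / np.sqrt(2)).entropy()          # var = 2b^2 = 1
h_unif = uniform(loc=-np.sqrt(3), scale=2 * np.sqrt(3)).entropy()  # var = 1

print(h_norm, h_lap, h_unif)

# The normal wins: it maximizes entropy subject to fixed mean and variance.
assert h_norm > h_lap > h_unif
```

Any other unit-variance distribution you swap in will land below the normal's 0.5 * ln(2*pi*e).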

Bias-variance dilemma and all that.

The truth is, it's hard to beat the normal even when it's wrong. And if you subscribe to the inferential philosophy that every model is wrong, you're better off being conservatively wrong, which implies a normal.

I'm not saying everything should be assumed to be normal. But unless (1) things are obviously super non-normal, or (2) you have some very strongly justified model that produces a non-normal distribution, you're probably best off using a normal if you're going to go parametric. And I think the conditions for defaulting to a normal are met much more often than we like to admit.

The normal distribution is kind of over-maligned, I think. I started my stats career being enamoured of rigorously nonparametric stats, and still am (esp. exact tests, bootstrapping/permutation-based inference, and empirical likelihood), but have grown to strongly appreciate normal distributions (or whatever maxent distribution is appropriate).
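The permutation-based inference mentioned above fits in a few lines of NumPy. Everything here, the data, sample sizes, and the mean-difference statistic, is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical samples with a modest true difference in means.
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(0.5, 1.0, size=30)

observed = y.mean() - x.mean()
pooled = np.concatenate([x, y])

# Permutation test: shuffle the group labels, recompute the statistic,
# and count how often a shuffled difference is at least as extreme.
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[30:].mean() - perm[:30].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = (count + 1) / (n_perm + 1)  # add-one so p is never exactly 0
print(p_value)
```

No distributional assumption on the data enters anywhere; the null distribution is built entirely from relabelings of the observed values.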

lausiant | 8 years ago | on: Back to the Future: Lisp as a Base for a Statistical Computing System [pdf]

This thread is fascinating to me as someone who learned stats as an undergrad using Lisp, then learned S, then learned about and started using R, and read all of these forewarnings by Ihaka and Tierney.

The current era is both exciting and dispiriting from this perspective. It seems like there's a lot of traction in this area with languages like Julia, OCaml, and Nim, to name just a few, which is wonderful. The discussions here are great in this regard. However, it's somewhat frustrating that warnings like the linked piece, which have been around for a long time, seem to have been ignored. My personal experience, too, is that the claims of recent languages to "C-like" speed may be overstated; my sense is that the slowest toy benchmarks are accurate reflections of the bottlenecks that will slow down a large program. For small programs, these languages are C-like; as program/library length increases, you start to approach Python or R speeds, which makes me wonder whether it's better to just use C to begin with.

Relying on wrapped C, as in NumPy, is also misleading, because eventually you bump up against a part of the program that can't be pushed down into lower-level code.
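In practice, the "can't push it down" problem often looks like a data-dependent loop. A toy sketch (the recurrence is my own example, chosen just to make the point):

```python
import numpy as np

x = np.random.default_rng(0).normal(size=100_000)

# Elementwise work stays in NumPy's compiled core: one call, C speed.
scaled = 0.9 * x

# But a data-dependent recurrence like exponential smoothing,
#   y[i] = 0.9 * y[i-1] + x[i],
# has no single NumPy primitive, so the loop falls back to the
# interpreter and dominates the runtime of an otherwise "C-speed" program.
y = np.empty_like(x)
y[0] = x[0]
for i in range(1, len(x)):
    y[i] = 0.9 * y[i - 1] + x[i]
```

Each iteration depends on the previous result, so no amount of vectorizing rescues it; the usual escape hatches (Cython, Numba, or dropping to C) amount to conceding the point.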

I often wish that x-lisp-stat had taken off rather than R. I love the syntax of both, but in retrospect, it seems like there was a fork in the path of numerical computing: one branch toward a more "native" approach, the other toward a model where a high-level language is used to interface with a low-level language, to abstract away some of the complexity. I understand the rationale for both, but I kind of feel like the latter approach, which has become dominant, isn't really sustainable. Moreover, all of this has happened while I've watched C++ become impressively abstracted; if things like Eigen had been around earlier, I'm not sure Python or R would have ascended as much as they have.

The "new" issue that seems to be arising all the time in these discussions is parallel GC models and implementations. Not sure where this will all lead. If it's lisp I'm going to spit out my drink.
