owlbite | 4 months ago

What I suspect he really means is that FORTRAN lays out its arrays column-major, whilst C chose row-major. Historically most math software was written in the former, including the de facto standard BLAS and LAPACK APIs used for most linear algebra. Mixing and matching memory layouts is a recipe for confusion and bugs, so "mathematicians" (which I'll read as people writing a lot of non-ML matrix-related code) tend to prefer to stick with column-major.
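To make the difference concrete, here is a minimal sketch of how the same 2x3 matrix maps onto a flat buffer under each convention. The function names are made up for illustration; only the two offset formulas matter.

```python
def row_major_offset(i, j, n_rows, n_cols):
    """Flat offset of 0-based element (i, j) in a row-major (C-style) buffer."""
    return i * n_cols + j


def col_major_offset(i, j, n_rows, n_cols):
    """Flat offset of 0-based element (i, j) in a column-major (Fortran-style) buffer."""
    return i + j * n_rows


# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored both ways.
row_major = [1, 2, 3, 4, 5, 6]  # rows are contiguous
col_major = [1, 4, 2, 5, 3, 6]  # columns are contiguous

# Element (1, 2) is 6 in either layout when the matching formula is used.
in_rows = row_major[row_major_offset(1, 2, 2, 3)]
in_cols = col_major[col_major_offset(1, 2, 2, 3)]

# Mixing them up reads a valid but wrong element: (1, 0) should be 4.
mixed_up = col_major[row_major_offset(1, 0, 2, 3)]
```

The mixed-up read does not crash, which is exactly why layout mismatches tend to show up as wrong numbers rather than as errors.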

Of course things have moved on since then and a lot of software these days is written in languages that inherited their array ordering from C, leading to much fun and confusion.

The other gotcha with a lot of these APIs is of course 0- vs 1-based array indexing.
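A small sketch of the indexing gotcha, assuming a column-major buffer with leading dimension `ld` (the number of rows between consecutive columns in memory). `fortran_offset` is a hypothetical helper name, not part of any library.

```python
def fortran_offset(i, j, ld):
    """0-based flat offset of the 1-based, column-major element A(i, j)."""
    return (i - 1) + (j - 1) * ld


# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] in column-major order, so ld = 2.
col_major = [1, 4, 2, 5, 3, 6]

first = col_major[fortran_offset(1, 1, 2)]  # A(1, 1), top-left element
last = col_major[fortran_offset(2, 3, 2)]   # A(2, 3), bottom-right element
```

Forgetting the `- 1` shifts every access by one row and one column, which again gives plausible-looking but wrong values rather than an obvious failure.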

Const-me|4 months ago

> is written in languages that inherited their array ordering from C

It’s not just C. Modern GPU hardware only supports row-major memory layout for 2D and 3D textures (ignoring specialized layouts like swizzling and block compression, but none of those are column-major either). Modern image and video codecs only support row-major layout for bitmaps.

bee_rider|4 months ago

The MKL BLAS/LAPACK implementation also provides the “cblas” interface, which explicitly accepts an argument for row or column ordering. (I’m sure most BLAS implementations do; I’m just familiar with MKL. BLIS seems quite willing to provide additional interfaces, so I bet they provide it as well.)

Internally the matrix is tiled out anyway (for gemm at least) so column vs row ordering is probably a little less important nowadays (which isn’t to say it never matters).
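As a rough illustration of why tiling blunts the layout question: once a tile has been copied ("packed") into a small contiguous buffer, the inner kernel no longer sees the original ordering. The sketch below is not how MKL or BLIS actually pack; `pack_tile` and its stride parameters are assumptions made for the example.

```python
def pack_tile(buf, row_stride, col_stride, r0, c0, tile_rows, tile_cols):
    """Copy a tile into a new contiguous row-major list, whatever the source layout.

    row_stride / col_stride are the flat-buffer steps for moving one row / one
    column, so (n_cols, 1) describes row-major and (1, n_rows) column-major.
    """
    return [
        buf[(r0 + i) * row_stride + (c0 + j) * col_stride]
        for i in range(tile_rows)
        for j in range(tile_cols)
    ]


# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored both ways.
row_major = [1, 2, 3, 4, 5, 6]
col_major = [1, 4, 2, 5, 3, 6]

# Pack the 2x2 tile starting at row 0, column 1 from each layout.
tile_from_rows = pack_tile(row_major, 3, 1, 0, 1, 2, 2)
tile_from_cols = pack_tile(col_major, 1, 2, 0, 1, 2, 2)
```

Both calls produce the same packed tile, so the layout only affects the cost of the packing step, not the kernel that consumes it.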

owlbite|4 months ago

Oh yes, from an actual implementation POV you can just apply some transpose and ordering transforms to convert from row-major to column-major or vice versa. cblas is pretty universal, though I don't think any LAPACK C API ever gained as wide support for non-column-major usage (and LAPACK actually has some routines where you can't just pull transpose tricks for the transformation).
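For matrix multiply, the transpose trick can be sketched like this. A row-major m x k buffer is element-for-element the same memory as a column-major k x m buffer holding the transpose, so a column-major routine can serve row-major callers by computing C^T = B^T A^T with the operands swapped. The naive `gemm_col_major` below is a stand-in written for this example, not a real BLAS call, and as noted above the trick does not carry over to every LAPACK routine.

```python
def gemm_col_major(m, n, k, a, lda, b, ldb):
    """Naive C = A @ B with A (m x k), B (k x n) and C (m x n) all column-major."""
    c = [0.0] * (m * n)
    for j in range(n):
        for p in range(k):
            for i in range(m):
                c[i + j * m] += a[i + p * lda] * b[p + j * ldb]
    return c


def gemm_row_major(m, n, k, a, b):
    """C = A @ B for row-major inputs, using only the column-major routine.

    Read as column-major, b holds B^T (n x k, leading dimension n) and a holds
    A^T (k x m, leading dimension k). Their product is C^T in column-major
    order, which is the same buffer as C in row-major order.
    """
    return gemm_col_major(n, m, k, b, n, a, k)


# A is 3x2 and B is 2x2, both row-major.
a = [1, 2,
     3, 4,
     5, 6]
b = [1, 2,
     3, 4]
c = gemm_row_major(3, 2, 2, a, b)  # row-major 3x2 result
```

No data is moved or transposed here; only the argument order and the dimensions passed to the column-major routine change.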

Certain layouts have performance advantages for certain operations on certain microarchitectures due to data access patterns (especially for level 2 BLAS), but that's largely irrelevant to historical discussion of the API's evolution.