top | item 14329167

Writing Performance Sensitive OCaml Code

131 points| sndean | 9 years ago |janestreet.github.io | reply

58 comments

[+] gsg|9 years ago|reply

This is about a decade old, I think. With a substantial optimisation suite added to ocamlopt since, some of it will be out of date. The advice on inlining in particular is obsolete.

[+] harrisi|9 years ago|reply

Hm, could you say why you think that? Looking at the github history it's hard to tell. The earliest history I can find for this page is from 2013[0], but it seems to be pretty clearly a transition and not the original writing of it. However, it's the only thing I can find in terms of aging the document.

[0]: https://github.com/janestreet/janestreet.github.com/commit/7...

[+] tempodox|9 years ago|reply

Even so, it still contains lots of helpful explanations and starting points for those who haven't learned to eat assembly for breakfast yet. Current OCaml docs and own exploration should do the rest.

[+] pkhagah|9 years ago|reply

> The OCaml garbage collector is a modern hybrid generational/incremental collector which outperforms hand-allocation in most cases. Unlike the Java GC, which gives GCs a bad name, the OCaml GC doesn't allocate huge amounts of memory at start-up, nor does it appear to have arbitrary fixed limits that need to be overridden by hand.

From the linked article. Can anyone confirm or deny this? I have hard time believing this statement.

[+] gsb|9 years ago|reply

As far as it goes I think it is a pretty fair statement. However it is important to know of other decisions in the language which make it easier for the GC. Particularly, the use of tagged values and a GIL. Tagging makes it simpler to distinguish heap values from other values, and the GIL means that the GC does not need to operate concurrently.

[+] jblow|9 years ago|reply

It is simply not true unless you have a pathological idea of "hand-allocation" (which to be fair, some programmers do program like).

Let me put it this way ... all "garbage collection is fast" claims are saying the following thing:

"It is faster for the programmer to destroy information about his program's memory use (by not putting that information into the program), and to have the runtime system dynamically rediscover that information via a constantly-running global search and then use what it gleans to somehow be fast, than it is for the programmer to just exploit the information that he already knows."

It sure sounds like nonsense to me.

[+] qznc|9 years ago|reply

Java gave GCs a bad name? It made them mainstream!

[+] mcguire|9 years ago|reply

Let's break it down. Hopefully someone can fill in the parts I don't know.

> The OCaml garbage collector is a modern hybrid generational/incremental collector which outperforms hand-allocation in most cases.

Ocaml is mostly functional and likes to allocate many short lived objects. With enough memory, a moving collector is very good at handling that load.

> Unlike the Java GC, which gives GCs a bad name, the OCaml GC doesn't allocate huge amounts of memory at start-up, nor does it appear to have arbitrary fixed limits that need to be overridden by hand.

Java's GC (-s; there's a bunch) use a heap with a fixed maximum size and an initial allocation. It has also received much more research attention over the decades. It also has a lot of knobs to fiddle with.

I've run production Java web app servers with multi-gig heaps where almost all requests were handled in the young generation. The knobs and visibility were very nice.

Ocaml doesn't use a fixed size heap, so it can conceivably take over all of memory. It also doesn't have all of the knobs. But it works pretty well.

[+] joshmarlow|9 years ago|reply

Jane Street are big players in the OCaml community (they build/maintain what's arguably the defacto standard library for the language - Jane Street Core) and apparently make a lot of money being good at what they do; so I'm inclined to believe them.

[+] AlphaSite|9 years ago|reply

It may be the but from a couple of talks I've gotten they go to pretty extreme lengths to maintain that performance.

[+] Cieplak|9 years ago|reply

@pron probably has the answer to this

[+] searealist|9 years ago|reply

It's nonsense, the only way to make a GC fast is to use many times as much memory as manual management (about 6x to reach parity[1]).

[1] https://people.cs.umass.edu/~emery/pubs/04-17.pdf

[+] mcguire|9 years ago|reply

"In the file called test_asm.ml the function f

    let f a = a+1

"The generated asm looks like the following:"

      camlTest_asm__f_33700:
      .L119:
          addq    $2, %rax
          ret

You sure about that? $2?

[+] brilee|9 years ago|reply

From the article:

Unlike floats, ints are represented as immediates, i.e., they are not allocated on the heap. To tag ints OCaml steals the LSB bit of an int and sets it to 1 always. This means that we get 63 bits of precision in OCaml ints and any int value n will appear in the assembly as 2*n+1 (the LSB is always 1 and the bits of the number shifted one place in).

[+] mcguire|9 years ago|reply

"Unlike floats, ints are represented as immediates, i.e., they are not allocated on the heap. To tag ints OCaml steals the LSB bit of an int and sets it to 1 always. This means that we get 63 bits of precision in OCaml ints and any int value n will appear in the assembly as 2n+1 (the LSB is always 1 and the bits of the number shifted one place in).*"

Crap. Forgot about the tag bit. :-)

[+] gcc_programmer|9 years ago|reply

In the first code snippet, they havr add $2, %rax but the function actually adds 1 to its argument.

[+] thelema314|9 years ago|reply

This is correct; OCaml integers are tagged with their LSB being 1 to indicate they're not a pointer. This means that the integer n is stored as 2n+1, and adding 1 requires modifying the stored value by 2.

[+] unknown|9 years ago|reply

[deleted]

[+] jackmott|9 years ago|reply

anyone that can explain or refer to explanations of why ocaml boxes floats?

[+] rbehrends|9 years ago|reply

OCaml boxes pretty much everything that isn't an integer or a reference (though if you're using arrays/records of floats, those don't suffer an additional level of boxing).

The result is that any type can be represented in a single machine word, making it easy to support polymorphism without specializing code (allowing, e.g., for shared generics).

Keep in mind that OCaml's backend was designed decades ago when machines were much smaller, at a time when C++ was still reluctant to add generics with specialization because of code bloat.

[+] tempodox|9 years ago|reply

OCaml GC uses bit 0 of a machine word as a flag that the address in that word was allocated under GC. Hence, integers only have 63 bits for the value, and bit 0 saying that this is not a GC-able address. IEEE doubles, however, need all 64 bits. You can't chop off bit 0 for a different use.

[+] thedufer|9 years ago|reply

The section "OCaml Blocks and Values" at https://realworldocaml.org/v1/en/html/memory-representation-... is a good explanation.

[+] mookid11|9 years ago|reply

To allow IEEE compliant float with tagged pointers.

However, note that floats are not boxed in the OCaml's arrays. Boxing floats is a pain because it makes the price of defining subroutines taking float arguments high.

[+] unknown|9 years ago|reply

[deleted]

[+] chimtim|9 years ago|reply

re-write it in C++