(no title)
smarks | 1 year ago
One of the issues we discussed is trying to do this with Java arrays or collections. As he observes, Java arrays have a maximum size of 2^31 - 1 elements, because they're indexed by a signed 32-bit int. In practice, the Hotspot JVM has a maximum array size of 2^31 - 2 elements. (I'm not entirely sure why this is. I'd guess that it's because the various GC implementations have a maximum memory block size of 2^31 words, and two words are consumed by the object header.)
The author considered trying to store 10B elements in a Java Collection, but there are some roadblocks that make this either difficult or insurmountable.
A Java ArrayList stores all its elements in a single array; thus it has the same limitation as Java's arrays. This is a pretty hard limit.
A Java LinkedList can actually store more than 2^31 elements. It's a non-intrusive linked list, so each element has a separate list node, which is another Java object. Each list node object consumes 40 bytes (including the object header and fields) which means that storing two billion elements in a LinkedList has 80GB of overhead, not including the actual data elements. (And ten billion elements will require 400GB of memory for overhead.) This will probably work if you have enough memory, but this amount of overhead seems prohibitive.
The author instead used the `fastutil` library to work around these limitations.
Pretty impressive that Pharo Smalltalk is able to do this.
No comments yet.