jp1016 | 8 days ago
The big-endian naming convention (source_index, target_index instead of index_source, index_target) is also interesting. It means related variables sort together lexicographically, which helps with grep and IDE autocomplete. Small thing but it adds up when you're reading unfamiliar code.
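A minimal Python sketch of that sorting benefit (variable names here are made up for illustration):

```python
# Big-endian naming puts the broad category first, so plain
# lexicographic ordering groups related variables together.
names = ["source_index", "target_offset", "source_offset", "target_index"]
print(sorted(names))
# → ['source_index', 'source_offset', 'target_index', 'target_offset']
```

The same property is what makes a prefix search (`source_*` in grep, or typing `source_` in an IDE) surface the whole family at once.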
One thing I'd add: this convention is especially valuable during code review. When every variable that represents a byte quantity ends in _size and every item count ends in _count, a reviewer can spot dimensional mismatches almost mechanically without having to load the full algorithm into their head.
Shish2k | 8 days ago
At that point I'd rather make them separate data types, and have the compiler spot mismatches actually-mechanically o.o
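A sketch of that idea in Python, using hypothetical `ByteSize` and `ItemCount` wrapper types (a static checker such as mypy would catch the mismatch before runtime; the explicit checks below catch it at runtime):

```python
from dataclasses import dataclass

# Hypothetical wrapper types: each dimension gets its own type, so
# mixing them is a type error rather than a silent bug.
@dataclass(frozen=True)
class ByteSize:
    value: int

    def __add__(self, other: "ByteSize") -> "ByteSize":
        if not isinstance(other, ByteSize):
            raise TypeError("can only add ByteSize to ByteSize")
        return ByteSize(self.value + other.value)

@dataclass(frozen=True)
class ItemCount:
    value: int

    def __add__(self, other: "ItemCount") -> "ItemCount":
        if not isinstance(other, ItemCount):
            raise TypeError("can only add ItemCount to ItemCount")
        return ItemCount(self.value + other.value)

total = ByteSize(4096) + ByteSize(512)  # fine: same dimension
# ByteSize(4096) + ItemCount(3) raises TypeError — the mismatch is
# caught actually-mechanically instead of by a reviewer's eye.
```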
layla5alive | 8 days ago
u32AllocSumSizeInBytes
f32AllocAvgSizeInKBytes
It's easy to code review the interaction between these: the valid code almost writes itself from the names, and many operations will just obviously read as wrong in ways they may not with names like num_allocs, total_alloc, and avg_alloc, or num, total, avg.
I find it useful to use statically typed variable names. My mental syntax checker benefits from the extra type metadata being embedded at each usage, reducing mental indirections/my reliance on the larger context and enabling me to quickly detect many types of mistakes via more efficient local reasoning.
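A quick illustration of that local reasoning (hypothetical values, names in the commenter's style): with units and types embedded, the correct derivation of the average reads naturally, and a unit mistake stands out at the use site.

```python
# Hypothetical allocator stats, named in the commenter's style.
u32AllocCount = 128
u32AllocSumSizeInBytes = 262144

# The valid code almost writes itself from the names: bytes / count
# gives bytes per allocation, and / 1024 converts bytes to kilobytes.
f32AllocAvgSizeInKBytes = u32AllocSumSizeInBytes / u32AllocCount / 1024.0
print(f32AllocAvgSizeInKBytes)  # → 2.0

# By contrast, a line like the one below reads as wrong on sight:
# the name promises KBytes but the expression never leaves bytes.
# f32AllocAvgSizeInKBytes = u32AllocSumSizeInBytes / u32AllocCount
```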
layer8 | 8 days ago
I would call it “English naming” [0], it’s just more readable to start with, in an anglophone environment.
[0] as opposed to “naming, English”, I suppose ;)
zahlman | 8 days ago
For that matter, many languages, especially "object-oriented" ones, treat heterogeneous containers as the default. They might not even offer native containers that can store everything inline in a single contiguous allocation, except perhaps for strings. In which case, "number of bytes" is itself ambiguous; are you including the indirected objects or not?
"Count" is also overloaded — it commonly means, and I normally only understand it to mean, the number of elements in a collection meeting some condition. Hence the `.count` method of Python sequences, as well as the jargon "population count" referring to the number of set bits in an integer. Today, Python's integers have both a `.bit_count` and a `.bit_length`, and it's obvious what both of them do; calling either `.bit_size` would be confusing in my mental framework, and a contradiction in terms in the OP's.
I would disagree that even C's `strlen` refers to byte size. C comes from a pre-Unicode world; the type is called `char` because that was naively considered sufficient at the time to represent a text character. (Unicode is still in that sense naive, but it at least allows for systems that are acutely aware of the distinction between "characters" and graphemes.) But notice: C's "strings" aren't proper objects; they're null-terminated sequences, i.e. their length is signaled in-band. So that metadata is also just part of the data, in a single allocation with no indirection; the "size" of a string could only reasonably be interpreted to include that null terminator. Yet the result of `strlen` excludes it! Further, if `strlen` is used on a string that was placed within some allocated buffer, it knows nothing about that buffer.
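The gap between the allocation's size and `strlen`'s result is easy to see from Python via ctypes (a small sketch using a C-style buffer):

```python
import ctypes

# create_string_buffer appends the C null terminator, so the
# allocation is one byte larger than the strlen-style length.
buf = ctypes.create_string_buffer(b"hi")
print(ctypes.sizeof(buf))  # → 3: allocation size, terminator included
print(len(buf.value))      # → 2: length up to (excluding) the null byte
```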
(Similarly, Rust `str::len` is properly named by this scheme. It gives the number of valid 1-byte-sized elements in a collection, not the byte size of the buffer they're stored within. It's still ambiguous in a sense, but that's because of the convention of using UTF-8 to create an abstraction of "character" elements of non-uniform size. This kind of ambiguity is properly resolved either with iterators, like the `Chars` iterator in Rust, or with views.)
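Python strings show the same ambiguity from the other direction: `len` counts code points, while the UTF-8 encoding has a different byte size (a minimal sketch; "é" here is the single precomposed code point U+00E9):

```python
s = "h\u00e9llo"  # "é" is one code point but two UTF-8 bytes
print(len(s))                   # → 5: number of code points
print(len(s.encode("utf-8")))   # → 6: byte size of the UTF-8 encoding
```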
Also consider: C has a `sizeof` operator, influencing Python's `.__sizeof__()` methods. That's because the concept of "size" equally makes sense for non-sequences; neither "count" nor "length" does. So of course "length" cannot mean what the author calls "size".
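In Python that surfaces as `sys.getsizeof` (which calls `__sizeof__`), and it applies to non-sequences just as well. A sketch; exact byte counts vary by platform and interpreter version, so none are asserted:

```python
import sys

# "Size" makes sense for any object, sequence or not; "length" does not.
print(sys.getsizeof(0))     # some positive number of bytes
print(sys.getsizeof(3.14))  # a float has a size but no length
# len(3.14) raises TypeError: a float has no element count.
```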