Seriously, it all depends on whether you're counting the items themselves (1-based) or the spaces between them (0-based).
The former uses natural numbers, while the latter uses non-negative integers.
For instance, when dealing with memory words, do you address the word itself or its starting location (the first byte)?
The same consideration applies to coordinate systems: are you positioning yourself at the center of a pixel or at the pixel's origin?
Almost 2 decades ago I designed a hierarchical ID system (similar to OIDs[1]). First I made it 0-based for each index in the path. After a while I realized that I also needed to represent invalid/dummy IDs, so I used -1 for that. It was ugly, so I made it 1-based, and any 0 index made the entire ID invalid.
Zero is a natural number. It is in the axioms of Peano arithmetic, and any other definition is just teachers choosing a taxonomy that best fits their lesson.
It's funny because pi is the joke compromise between 0 and tau, the circumference of the unit circle, and the period length of the trigonometric functions.
Actually, the duality arises from counting (there can be 0 items) and ordering (there is only a 1st item), conceptually. Which is why the year 2000 can and cannot be the start of the 3rd millennium, for instance.
That makes sense for the ID system or a database, but for arrays in a language I still prefer starting at 0. It makes frame buffers easier to index.
The center of the debate is that outside of pure mathematics numbers and number systems can only be signifiers for some physical or conceptual object. It is the signified object that determines the meaning of the number and the semantics of the mathematics.
I totally disagree, but it's only my opinion and probably not scientific at all.
From a logical point of view I think it's totally unnatural to start at 1. You have 10 different "chars" available in the decimal system. Starting at 1 mostly leads to counting up to 10. But 10 is already the next "iteration". How do you explain to a kid who's learning arithmetic that the decimal system is based on 10 digits while you always refer to the list from 1 to 10?
Python supports a third way: start at -1. And if you think about it a little (but not too much) then there's some real appeal to it in C. If you allocate an array of length n and store its length and rewrite the pointer with (*a+=n)=n, then a[0] is the length, a[-1] is the first element (etc) and you free(a-a[0]) when you're done. As a nice side effect, certain tracing garbage collectors will never touch arrays stored in this manner.
Upshot: if you take the above seriously (proposed by Kelly Boothby), the median proposed by Kelly-Bootle returns to the only sensible choice: zero.
He was right. If the first fencepost is centered at x=0 and the second at x=1, and you want to give the rail in-between some identifier that corresponds to its position (as opposed to giving it a UUID or calling it "Sam" or something), 0.5 makes perfect sense.
In computer programming we often only need the position of the gap to the left, though, so calling it "the rail that starts at x=0" works. Calling it "the rail that ends at x=1" is alright, I guess, if that's what you really want, but leads to more minus ones when you have to sum collections of things.
Perhaps ideally we'd change English to count the "first" entry in a sequence as the "zeroth" item, but the path dependency and the effort required to do that is rather large to say the least.
At least we're not stuck with the Roman "inclusive counting" system that included one extra number in ranges* so that e.g. weeks have "8" days and Sunday is two days before Monday since Monday is itself included in the count.
> At least we're not stuck with the Roman "inclusive counting" system that included one extra number in ranges* so that e.g. weeks have "8" days and Sunday is two days before Monday since Monday is itself included in the count.
Yes, we are. C gives pointers one past the end of an array meaningful semantics.
That's in the standard. You can compare them and operate on them but not de-reference them.
Amusingly, you're not allowed to go one off the end at the beginning of a C or C++ array. (Although Numerical Recipes in C did it to mimic FORTRAN indices.) So reverse iterators in C++ are not like forward iterators. They're off by 1.
Note that 'first' and 'second' are not etymologically related to one or two, but to 'foremost'. Therefore, it would make sense to use this sequence of ordinals:
In terms of another thread the item is the "rail" between the "fence posts". The address of the 'first' item starts at 0, but it isn't complete until you've reached the 1.
Where is the first item? Slot 0. How much space does one item take up* (ignoring administrative overheads)? The first and only item takes up 1 space.
The 1980s were not a particularly enlightened time for programming language design; and Dijkstra's opinions seem to carry extra weight mainly because his name has a certain shock and awe factor.
It isn't usual for me to agree with the mathematical convention for notations, but the 1st element of a sequence being denoted with a "1" just seems obviously superior. I'm sure there is a culture that counts their first finger as 0 and I expect they're mocked mercilessly for it by all their neighbours. I've been programming for too long to appreciate it myself, but always assumed it traces back to memory offsets in an array rather than any principled stance because 0-counting sequences represents a crazy choice.
I've heard the statement "Let's just see if starting with 0 or 1 makes the equations and explanations prettier" quite a few times. For example, a sequence <x, f(x), f(f(x)), ...> is easier to look at if a_0 has f applied 0 times, a_1 has f applied 1 time, and so on.
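A minimal sketch of that sequence, assuming a hypothetical helper named `iterate` (not from the thread): with 0-based numbering, the index of each term is simply the number of times f has been applied.

```python
from typing import Callable


def iterate(f: Callable[[int], int], x: int, n: int) -> int:
    """Return a_n, i.e. f applied n times to x (so a_0 is x itself)."""
    for _ in range(n):
        x = f(x)
    return x


def double(value: int) -> int:
    # Hypothetical example function, chosen only for illustration.
    return 2 * value


# a_0, a_1, a_2, a_3 for x = 3
terms = [iterate(double, 3, n) for n in range(4)]
```

With 1-based numbering the same term would be "f applied n - 1 times", which is the kind of extra offset the comment is talking about.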
0-based indexing aligns better with how memory actually works, and is therefore more performant, all things being equal.
Assuming `a` is the address of the beginning of the array, the 0-based indexing on the left is equivalent to the memory access on the right (I'm using C syntax here):
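The side-by-side comparison did not survive in this copy of the thread. As a rough stand-in (Python rather than C, with addresses modelled as plain integers and all names hypothetical), the address arithmetic being described looks something like this:

```python
def address_zero_based(base: int, index: int, element_size: int) -> int:
    """Byte address of element `index` when the first element has index 0."""
    return base + index * element_size


def address_one_based(base: int, index: int, element_size: int) -> int:
    """Byte address of element `index` when the first element has index 1."""
    # The extra subtraction shifts the index back to an offset from the start.
    return base + (index - 1) * element_size


BASE = 0x1000      # hypothetical start address of the array
ELEMENT_SIZE = 4   # hypothetical element width in bytes

first_zero_based = address_zero_based(BASE, 0, ELEMENT_SIZE)
first_one_based = address_one_based(BASE, 1, ELEMENT_SIZE)
```

Both calls land on the same address; the 1-based version just needs one more subtraction to get there.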
> The 1980s were not a particularly enlightened time for programming language design; and Dijkstra's opinions seem to carry extra weight mainly because his name has a certain shock and awe factor.
Zero based indexing had nothing to do with Dijkstra's opinion but the practical realities of hardware, memory addressing and assembly programming.
> I'm sure there is a culture that counts their first finger as 0
Not a one because zero as a concept was discovered many millennia after humans began counting.
For math too, 0-based indexing is superior. When taking sub-matrices (blocks), with 1-based indexing you have to deal with + 1 and - 1 terms for the element indices. E.g. the third size-4 block of a 16x16 matrix begins at (3-1)*4+1 in 1-based indexing, at 2*4 in 0-based indexing (where the 2 is naturally the 0-indexed block index).
Also, the origin is at 0, not at 1. If you begin at 1, you've already moved some distance away from the origin at the start.
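The block arithmetic above can be sketched as follows (function names are made up for illustration). Both functions point at the same element, the ninth one, but only the 1-based version needs the -1/+1 pair.

```python
def block_start_zero_based(block: int, block_size: int) -> int:
    """First index of block number `block`, counting blocks and elements from 0."""
    return block * block_size


def block_start_one_based(block: int, block_size: int) -> int:
    """First index of block number `block`, counting blocks and elements from 1."""
    return (block - 1) * block_size + 1


# Third size-4 block: block index 2 in 0-based terms, block number 3 in 1-based terms.
start_zero = block_start_zero_based(2, 4)
start_one = block_start_one_based(3, 4)
```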
Pretty much any algorithm that involves mul/div/mod operations on array indexes will naturally use 0-based indexes (i.e. if using 1-based indexes they will have to be converted to/from 0-based to make the math work).
To me this is a far more compelling argument for 0-based indexes than anything I've seen in favor of 1-based indexes.
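One common instance of this, sketched under the assumption of a row-major grid with a hypothetical width of 4: converting between a flat index and a (row, column) pair. The 0-based versions are plain `divmod` and multiply-add; the 1-based versions have to shift to 0-based and back.

```python
WIDTH = 4  # hypothetical row length of a flattened grid


def split_zero_based(flat: int) -> tuple[int, int]:
    """Flat index -> (row, col), everything counted from 0."""
    return divmod(flat, WIDTH)


def join_zero_based(row: int, col: int) -> int:
    """(row, col) -> flat index, everything counted from 0."""
    return row * WIDTH + col


def split_one_based(flat: int) -> tuple[int, int]:
    """Flat index -> (row, col), everything counted from 1."""
    row, col = divmod(flat - 1, WIDTH)  # convert to 0-based for the division
    return row + 1, col + 1             # and back to 1-based for the result


def join_one_based(row: int, col: int) -> int:
    """(row, col) -> flat index, everything counted from 1."""
    return (row - 1) * WIDTH + col
```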
Both are fine, IMO. In a context where array indexing is pointer plus offset, zero indexing makes a lot of sense, but in a higher level language either is fine. I worked in Smalltalk for a while, which is one-indexed, and sometimes it made things easier and sometimes it was a bit inconvenient. It evens out in the end. Besides, in a high level language, manually futzing around with indexing is frequently a code smell; I feel you generally want to use higher level constructs in most cases.
I've always appreciated Ada's approach to arrays. You can create array types and specify both the type of the values and of the index. If zero based makes sense for your use, use that, if something else makes sense use that.
e.g.
type Index is range 1 .. 5;
type My_Int_Array is
array (Index) of My_Int;
It made life pretty nice when working in SPARK if you defined suitable range types for indexes. The proof steps were generally much easier and frequently automatically handled.
Many BASIC dialects had this too, which could make some code a bit easier to read e.g.
DIM X(5 TO 10) AS INTEGER
I recall in one program I made the array indices (-1 TO 0) so I could alternate indexing them with the NOT operator (in QuickBASIC there were only bitwise logical operators).
On the other hand, if you receive an unconstrained array argument (such as S : String, which is an array (Positive range <>) of Character underneath), you are expected to access its elements like this:
S (S'First), S (S'First + 1), S (S'First + 2), …, S (S'Last)
If you write S (1) etc. instead, the code is less general and will only work for subarrays that start at the first element of the underlying array.
So effectively, indexing is zero-based for most code.
I think lower..higher index ranges for arrays were used in Algol-68, PL/1, and Pascal long before Ada.
At least in standard Pascal, arrays with different index ranges were of different, incompatible types, so it was hard to write reusable code like sort or binary search. The solution was either parameterized types or proprietary language extensions.
I found it devastating that there are no distinct agreed-upon words denoting zero- and one-based addressing. Initially I thought that the word "index" clearly denotes zero-base, and for one-base there is "order", "position", "rank" or some other word, but after rather painful and humiliating research I stood corrected. ("Index" is really used in both meanings, and without prior knowledge of the context, there is really no way to tell what base it refers to.)
So to be clear, we have to tediously specify "z e r o - b a s e d " or "o n e - b a s e d" every single time to avoid confusion. (Is there a chance for creating some new, terser consensus here?)
I don't think "index" by itself should imply any starting value. After all, many indices start at higher numbers, and then you'd have to invent words for 2-based, 3-based and so on as well.
This article is one of my pet peeves. It always shows up in discussions as "proof" that 0 indexing is superior, but it hides under the carpet all the cases where it is not. For instance, backwards iteration needs a "-1" and breaks with unsigned ints.
Always beware the word should. I agree with Dijkstra's logic in the context that he presents it, but there are other contexts where I don't think it applies.
Personally, I find that in compiler writing, which is the only programming I do these days, the only things I use indexes for are line numbers and character offsets into strings. Calling the first character the zeroth character is ridiculous to me, so I just store a leading 0 byte in all strings and then can use one based indexing with no performance hit. Alternatively, since I am the compiler writer, I could just internally store the pointer to the string - 1 to avoid the < 1 byte per string average overhead (I also machine word align strings so often the leading zero doesn't affect the string size in memory).
If you are often directly working with array indices, you are likely doing low level programming. It is worth asking if the task at hand requires that, or if you would be better off using higher level constructs and/or a higher level language. Low level details ideally should not leak into the application level.
Not only is this preference not restricted to compiler programming, but it's not even restricted to programming.
Try to count 4 seconds, if you start at 1 you messed up.
Babies start at 0 years old. Etc..
I do agree it's a convention though. Months and years start at 1, but especially for years, only intervals are meaningful, so it doesn't really matter what zero is (even though christ is totally king)
1-based numbering is nonsense. How many years old are you when you’re born?
I notice almost all defenses of 1-based indexing are purely based on arbitrary cultural factors or historical conventions, e.g. “that’s how it’s done in math”, rather than logical arguments.
You have lived zero full years and are in the first year of your life. In most (but not all) countries the former is considered "your age".
That's consistent with both zero-based and one-based indexing. Both agree on cardinal numbers (an array [1, 2] has length 2), just not on ordinal numbers (whether the 1 in that array is the "first" or "zeroth" element).
> I notice almost all defenses of 1-based indexing are purely based on arbitrary cultural factors or historical conventions, e.g. “that’s how it’s done in math”, rather than logical arguments.
I think it's largely a matter of taste in either direction. But, I'd raise this challenge:
If you're unfamiliar with Python (zero-based, half-open ranges exclusive of end), that's taking a slice from index 3 to index 1, backwards (step -1). How quickly can you intuit what it'll print?
Personally I feel like I have to go through a few steps of reasoning to reach the right answer - even despite having almost exclusively used languages with 0-based indexing. If Python were instead to use MATLAB-style indexing (one-based, inclusive ranges), I could immediately say ['C', 'B', 'A'].
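The snippet being discussed is missing from this copy of the thread. A hypothetical reconstruction, assuming a list of letters starting at 'A' and a made-up helper that emulates one-based inclusive ranges, would be:

```python
letters = ["A", "B", "C", "D", "E"]  # assumed data, not from the original comment

# Python: start at index 3, step backwards, stop before index 1.
python_result = letters[3:1:-1]


def inclusive_one_based_slice(items, first, last, step=1):
    """Emulate a one-based slice that includes both endpoints."""
    # Extend the stop by one step direction so that `last` itself is included.
    stop = last + (1 if step > 0 else -1)
    return [items[position - 1] for position in range(first, stop, step)]


one_based_result = inclusive_one_based_slice(letters, 3, 1, -1)
```

Under these assumptions `python_result` is `['D', 'C']`, while the inclusive one-based version gives `['C', 'B', 'A']`.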
Consider the following from the Irish Constitution:
> 12.4.1° Gach saoránach ag a bhfuil cúig bliana tríochad slán, is intofa chun oifig an Uachtaráin é.
and the official translation to English:
> 12.4.1° Every citizen who has reached his thirty-fifth year of age is eligible for election to the office of President.
For those unfortunate few who do not understand Irish, that version says "Every citizen who is at least thirty-five years old", whereas the translation should in principle (arguably) allow a thirty-four-year-old.
Luckily the Irish version takes precedence legally. A constitutional amendment which would have lowered the minimum age to 21 and resolved the discrepancy was inexplicably rejected by the electorate in 2015.
The question arises when people get confused between a cut and span. These two are opposite concepts, and they make up a continuum, and they define each other.
So, it depends on what you understand as "numbering". If it is about counting objects, the phrase "first object" refers to the existence of a non-zero number of objects. This shows why the first one can't be called zero, as zero is not equal to non-zero.
If the numbering is about a continuous scale such as a tape measure, then the graduations can start with zero. But still the first meter refers to one meter, not zero meters.
It looks silly when people number their book chapters starting with zero. They have no clue whether the chapter refers to a span or a cut. Sure, they can call the top surface of their book cover zero, though. But still they can't number a page as zero.
The use of a zero index for memory locations comes from the possible magnetic states of an array of bits. Each such state is a kind of cut, not a span. It's like a graduation on a tape measure, or a milestone on the side of the road. So it can start with zero.
So, if you are counting markers or separators, with zero magnitude, you can start at zero. And when you count spans or things of non-zero magnitude, you start at one. If you count apples, start at one. If you count spaces between apples start at zero.
I see people bringing up arrays, and an array index is represented by a number, you can do math on it, but it's not a regular number for counting a sequence of items. It's a unique reference to a location in the memory, and it's dangerous to treat an array index like it's just any old number.
Behold, the really stupid things you can do in Javascript:
let myArr = [];
let index = 0;
myArr[--index] = 5;
console.log(myArr.length); // 0
console.log(myArr[index]); // 5
I recently picked up Lua for a toy project and I've got to say that decades of training with 0-based indexes make it hard for me to write correct Lua code on the first try.
I suppose 1-based index is more logical, but decades of programming languages choosing 0-based index is hard to ignore.
> decades of programming languages choosing 0-based index is hard to ignore
Yes - many file formats also work with zero-based indices, not to mention the hardware itself.
One-based indexing is particularly problematic in Lua, because the language is designed to interoperate closely with C - so you're frequently switching between indexing schemes.
Interestingly, it also poses great challenges for LLMs. GPT-4 can translate Perl into Python almost flawlessly, but its Lua is full of off-by-one errors.
I have a similar experience with Python's negative indexing. In Python, you can access elements counting from the back by using negative numbers. But for this, they start with 1, not 0. Which is inconsistent, as normal forward indexing starts at 0. I guess it comes from reducing n.length-1 to -1, but it's still kinda annoying to have two different indexing systems at work.
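For what it's worth, here is a small sketch of how the two schemes line up, using a made-up list: index -k names the same element as index n - k, which is also what you get by reducing the index modulo the length.

```python
items = ["a", "b", "c", "d"]  # hypothetical example data
n = len(items)

# -1 is the last element, -n is the first: items[-k] is items[n - k].
matches_offset_from_end = all(items[-k] == items[n - k] for k in range(1, n + 1))

# Every valid index, negative or not, agrees with its value modulo n.
# (Python's % always returns a non-negative result for a positive modulus.)
matches_modulo = all(items[i] == items[i % n] for i in range(-n, n))
```

Seen this way, negative indexes are the ordinary 0-based indexes shifted by the length, rather than a separate 1-based scheme, though it is fair to say it does not feel that way when counting from the back.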
I don't use Lua as much anymore, but there were a few years where I used Lua and C++ both daily and very quickly you can easily handle both zero and one-indexing, even while switching between languages frequently. As with most things it's just practice.
It is easy to miss that his argument boils down to that zero-based is “nicer” in a specific select case. The paper is written in the style of a mathematical proof but hinges on a completely subjective opinion.
>> when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N.
What about the range 0 < i ≤ N which starts with 1? Why only use ≤ on the lower end of the range? This zero-based vs one-based tends to come up in programming and mathematics, and both are used in both areas. Isn't it obvious that there is no universally correct way to index things?
I believe the main argument (from the OP) is that you have to specify the range with two bounds, and that it is common to want a 0 (assuming a 0-based indexing world), and so in order to refer to a range that includes index 0 you'll need to use a number that is not in the set of valid indexes to define the bound.
I would note that the argument is weakened when you look at the later bound, since you have the same problem there, it's just more subtle and less commonly encountered -- yet it routinely creates security bugs!
It's because we don't work with integers, we work with fixed-size intervals within the set of integers (usually a power-of-two number of consecutive integers). So `for (i = 0; i < 256; i++)` is just weird when you're using 8-bit integers: your upper range is an inexpressible value, and could easily be compiled down to `for (i = 0; i < 0; i++)` with 8-bit modular arithmetic, e.g. if you did `uint8_t upper = 256; for (uint8_t i = 0; i < upper; i++)`. That case is simple, but it gets nastier when you are trying to properly check for overflow in advance and the actual upper value is computed. `if (n >= LIMIT) { return error; }` doesn't work if your LIMIT is based on the representable range. Nor does `if (n * elementSize >= LIMIT) { return error; }`. Even doing `limit = LIMIT / elementSize; if (n >= limit) { return error; }` requires doing the `LIMIT / elementSize` intermediate calculation in larger-width numbers. (In addition to the off-by-one if LIMIT is not evenly divisible by elementSize.)
So when dealing with overflow checks, 0 ≤ i ≤ N may be better. Well, a little better. `for (i = 0; i <= LIMIT; i++)` could easily be an infinite loop if LIMIT is the largest in-domain value. You want `i = 0; while (true) do { ...stuff...; if (i == LIMIT) break; i++; }` and at that point, you've lost all simple correspondence with mathematical ranges.
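To make the two loop shapes concrete, here is a rough simulation in Python, where an 8-bit unsigned value is modelled by masking with 0xFF. The helper names are hypothetical, and this only imitates the wraparound behaviour rather than what any particular C compiler emits.

```python
def wrap_u8(value: int) -> int:
    """Keep only the low 8 bits, like storing into a uint8_t."""
    return value & 0xFF


def iterations_half_open(upper: int) -> int:
    """Model of: uint8_t upper = ...; for (uint8_t i = 0; i < upper; i++)."""
    upper = wrap_u8(upper)  # 256 does not fit in 8 bits and becomes 0
    count = 0
    i = 0
    while i < upper:
        count += 1
        i = wrap_u8(i + 1)
    return count


def iterations_closed(limit: int) -> int:
    """Model of: i = 0; loop { body; if (i == limit) break; i++; }."""
    limit = wrap_u8(limit)
    count = 0
    i = 0
    while True:
        count += 1  # the loop body runs before the exit test
        if i == limit:
            break
        i = wrap_u8(i + 1)
    return count
```

In this model the half-open loop with an upper bound of 256 runs zero times, while the test-in-the-middle loop with a limit of 255 visits all 256 values and still terminates.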
> Isn't it obvious that there is no universally correct way to index things?
I don't know about "obvious", but I agree that there is no universally correct way to index things.
"The above has been triggered by a recent incident, when, in an emotional outburst, one of my mathematical colleagues at the University —not a computing scientist— accused a number of younger computing scientists of "pedantry" because —as they do by habit— they started numbering at zero. "
Same in Germany, just that we usually call it ground floor instead of 0th floor.
You could argue it's a bit of a translation error. The French and German words for floor are referring to ways to add platforms above ground. Either by referring to walls, wooden columns or floor joists. Over the course of language evolution those words have both broadened and specialized, referring to building levels in general. But the way they are counted still reflects that they originally refer to levels built above ground. The English "floor" on the other hand counts the number of levels that are ground-like, which naturally starts at the actual ground.
Zero means nothing (not that it has no importance :-) but that it symbolises the void). So the symbol 0 could just as well be a single space or any other predetermined symbol. So, it is not a number and should not be used like one (pun intended).
Perhaps we can extend this to everyday language? Taylor Swift had a number zero hit, one's company, two's a crowd, I won the race and came in at number zero, and so on?
I appreciate Dijkstra's arguments, but the fact remains that no non-technical user is ever going to jibe with a zero-indexed system, no matter the technical merits.
Languages aimed at casual audiences (e.g. scripting languages like Lua) should maybe just provide two different ways of indexing into arrays: an `offset` method that's zero-indexed, and an `item` method that's one-indexed. Let users pick, in a way that's mildly less confusing than languages that let you override the behavior of the indexing operator (an operator which really doesn't particularly need to exist in a world where iterators are commonplace).
Dijkstra's objective was to make programming into an intellectually respectable, rigorous branch of mathematics, a motivation he mentions obliquely here. He was, generally speaking, opposed to non-technical programmers and languages aimed at casual audiences, such as BASIC and APL; they were at best irrelevant to his goal and at worst (I suspect, though he never said) threats to its credibility.
Starting from zero saves memory.
If I have a variable used as an index for an array of 256 elements, starting from 0 allows me to store it in a single byte. If I start from 1, I need two bytes, effectively doubling the memory usage—an unnecessary 100% increase.
Now, multiply this inefficiency across every instance where programs encounter a similar situation.
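The arithmetic behind that claim can be checked with a short sketch (the function name is made up): the largest 0-based index into 256 elements is 255, which fits in 8 bits, while the largest 1-based index is 256, which needs a ninth bit if it is stored as-is.

```python
def bits_for_largest_index(largest_index: int) -> int:
    """Number of bits needed to store every index from 0 up to `largest_index`."""
    return max(1, largest_index.bit_length())


ELEMENTS = 256

zero_based_bits = bits_for_largest_index(ELEMENTS - 1)  # indices 0..255
one_based_bits = bits_for_largest_index(ELEMENTS)       # indices 1..256
```

On byte-addressed storage, 9 bits rounds up to two bytes, which is where the doubling in the comment comes from.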
This discrepancy appears in physics too. It's common to use 1,2,3 for spatial indices, but when you reach enlightenment and think in terms of spacetime you add a zero index and not a four.
That's only because you insist on explicitly mentioning the last element, which you can only do when the sequence is finite and non-empty (more generally, when it is indexed by a successor ordinal). So your choice of notation is not only inelegant, it cannot even express all possible sequences.
Hard ask considering there's effectively 2 Americas: only the scientific one using scalable units like mg/g/kg and cm/m/km, everyone else using randomized trash... ft, mile, yard, inch, pound....
I know I'll get downvoted to Hell for this, but I have a mental list of traits poor programmers have, and one of them is "Excessively complains about 1 based indexing".
nivertech|11 months ago
---
1. https://en.wikipedia.org/wiki/Object_identifier
umanwizard|11 months ago
jonahx|11 months ago
I've never heard the distinction stated this way. It's clarifying.
jacksnipe|11 months ago
bmacho|11 months ago
pfortuny|11 months ago
01HNNWZ0MV43FF|11 months ago
littlestymaar|11 months ago
In fact “natural numbers” is ambiguous, as it can both contain zero or exclude it depending on who uses it.
See https://en.m.wikipedia.org/wiki/Natural_number
throwway120385|11 months ago
y42|11 months ago
pansa2|11 months ago
boothby|11 months ago
TOGoS|11 months ago
kps|11 months ago
Aransentin|11 months ago
* https://en.wikipedia.org/wiki/Counting#Inclusive_counting
davidgay|11 months ago
French (and likely other Latin languages?) are not quite so lucky. "En 8" means in a week, "une quinzaine" (from 15) means two weeks...
01HNNWZ0MV43FF|11 months ago
Animats|11 months ago
[1] https://devblogs.microsoft.com/oldnewthing/20211112-00/?p=10...
zajio1am|11 months ago
first, second, twoth, third, fourth, ...
or shortened:
0st, 1nd, 2th, 3th, 4th ...
mjevans|11 months ago
windward|11 months ago
spookie|11 months ago
roenxi|11 months ago
tetha|11 months ago
branko_d|11 months ago
Assuming `a` is the address of the beginning of the array, the 0-based indexing on the left is equivalent to the memory access on the right (I'm using C syntax here):
For 1-based indexing: This extra "-1" costs some performance (though it can be optimized away in some cases).
pwdisswordfishz|11 months ago
So you claim this is just an appeal to authority and as a rebuttal you give appeal to emotion without being an authority at all?
> the 1st element of a sequence being denoted with a "1" just seems obviously superior
> I'm sure there is a culture that counts their first finger as 0 and I expect they're mocked mercilessly for it by all their neighbours
> 0-counting sequences represents a crazy choice
5G chess move.
personalaccount|11 months ago
Aardwolf|11 months ago
sham1|11 months ago
I'll concede that it's not all that significant as a difference, but at least IMO it's nicer.
Also could argue that modular arithmetic and zero-based indexing makes more sense for negative indexing.
xandrius|11 months ago
silotis|11 months ago
pwdisswordfishz|11 months ago
arnsholt|11 months ago
noneeeed|11 months ago
tdeck|11 months ago
fweimer|11 months ago
nivertech|11 months ago
ninalanyon|11 months ago
myfonj|11 months ago
pansa2|11 months ago
aero142|11 months ago
xandrius|11 months ago
- Offset: 0-based
- Index: 1-based
Rygian|11 months ago
I can suggest "z8d" and "o7d" otherwise. (/jk)
weinzierl|11 months ago
otikik|11 months ago
scoofy|11 months ago
lupire|11 months ago
ufo|11 months ago
6yyyyyy|11 months ago
norir|11 months ago
TZubiri|11 months ago
krukah|11 months ago
Start from 0 if you are counting boundaries (fenceposts, memory addresses)
Start from 1 if you are counting spaces (pages in a book, ordinals)
Floors are a case where both make intuitive sense, which is maybe how we ended up with European vs American floor numbering.
IshKebab|11 months ago
* Start from 0 if you are indexing. I.e. you are identifying an item or its position.
* Start from 1 if you are counting. I.e. you are saying how many items there are.
It doesn't matter what it is. I don't know why you think pages in a book are somehow different to memory addresses.
yazantapuz|11 months ago
laurentlb|11 months ago
personalityson|11 months ago
The mark on your tape measure corresponds to the total sum/amount.
If you count from zero, the number of elements no longer corresponds to the size/length. So already here you deviate from your tape principle.
You have one whole element when your tape measure shows 1, not zero.
kalb_almas|11 months ago
umanwizard|11 months ago
I notice almost all defenses of 1-based indexing are purely based on arbitrary cultural factors or historical conventions, e.g. “that’s how it’s done in math”, rather than logical arguments.
Ukv|11 months ago
You have lived zero full years and are in the first year of your life. In most (but not all) countries the former is considered "your age".
That's consistent with both zero-based and one-based indexing. Both agree on cardinal numbers (an array [1, 2] has length 2), just not on ordinal numbers (whether the 1 in that array is the "first" or "zeroth" element).
> I notice almost all defenses of 1-based indexing are purely based on arbitrary cultural factors or historical conventions, e.g. “that’s how it’s done in math”, rather than logical arguments.
I think it's largely a matter of taste in either direction. But, I'd raise this challenge:
If you're unfamiliar with Python (zero-based, half-open ranges exclusive of end), that's taking a slice from index 3 to index 1, backwards (step -1). How quickly can you intuit what it'll print? Personally I feel like I have to go through a few steps of reasoning to reach the right answer, despite having almost exclusively used languages with 0-based indexing. If Python were instead to use MATLAB-style indexing (one-based, inclusive ranges), I could immediately say ['C', 'B', 'A'].
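The snippet this challenge refers to is not preserved in the text above. A minimal sketch of what it might have looked like, assuming a hypothetical five-letter list (`letters` and the helper `one_based_inclusive` are illustrative names, not from the original comment):

```python
letters = ['A', 'B', 'C', 'D', 'E']  # hypothetical data, not from the comment

# Python semantics: zero-based, end index excluded.
python_result = letters[3:1:-1]


def one_based_inclusive(seq, start, stop, step=1):
    """Emulate a MATLAB-style seq(start:step:stop): one-based, both ends included."""
    if step == 0:
        raise ValueError("step must not be zero")
    out = []
    i = start
    while (step > 0 and i <= stop) or (step < 0 and i >= stop):
        out.append(seq[i - 1])  # translate the one-based position to Python's offset
        i += step
    return out


matlab_style_result = one_based_inclusive(letters, 3, 1, -1)

print(python_result)        # ['D', 'C']
print(matlab_style_result)  # ['C', 'B', 'A']
```

Under these assumptions the two conventions select different elements from the same "3 to 1, backwards" request, which is the source of the hesitation described.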
Y_Y|11 months ago
Consider the following from the Irish Constitution:
> 12.4.1° Gach saoránach ag a bhfuil cúig bliana tríochad slán, is intofa chun oifig an Uachtaráin é.
and the official translation to English:
> 12.4.1° Every citizen who has reached his thirty-fifth year of age is eligible for election to the office of President.
For those unfortunate few who do not understand Irish, that version says "Every citizen who is at least thirty-five years old", whereas the translation should in principle (arguably) allow a thirty-four-year-old.
Luckily the Irish version takes precedence legally. A constitutional amendment which would have lowered the minimum age to 21 and resolved the discrepancy was inexplicably rejected by the electorate in 2015.
nkrisc|11 months ago
Typically 3 months shy of 1 year, so about 0.75.
hans_castorp|11 months ago
pasc1878|11 months ago
personalityson|11 months ago
If it helps: you have one whole array element only if one year has passed
What are you trying to do: do you want to know where each element starts, or do you want to measure the total sum/accumulated amount?
sunflowerfly|11 months ago
zkmon|11 months ago
So, it depends on what you understand by "numbering". If it is about counting objects, the phrase "first object" implies the existence of a non-zero number of objects. This shows why the first one can't be called zero, as zero is not equal to non-zero.
If the numbering is about a continuous scale such as a tape measure, then the graduations can start with zero. But still the first meter refers to one meter, not zero meters.
It looks silly when people number their book chapters starting from zero. They have no clue whether the chapter refers to a span or a cut. Sure, they can call the top surface of their book cover zero, though. But still they can't number a page as zero.
The use of a zero index for memory locations comes from the possible magnetic states of an array of bits. Each such state is a kind of cut, not a span. It's like a graduation on the tape measure, or a milestone on the side of the road. So it can start with zero.
So, if you are counting markers or separators, with zero magnitude, you can start at zero. And when you count spans or things of non-zero magnitude, you start at one. If you count apples, start at one. If you count spaces between apples, start at zero.
calibas|11 months ago
Behold, the really stupid things you can do in JavaScript:
IshKebab|11 months ago
Indexing should clearly start from 0. It leads to far more elegant code and lower risk of off-by-one mistakes.
barotalomey|11 months ago
I recently picked up Lua for a toy project and I've got to say that decades of training with 0-based indexes make it hard for me to write correct Lua code on the first try.
I suppose 1-based indexing is more logical, but decades of programming languages choosing 0-based indexing are hard to ignore.
pansa2|11 months ago
Yes - many file formats also work with zero-based indices, not to mention the hardware itself.
One-based indexing is particularly problematic in Lua, because the language is designed to interoperate closely with C - so you're frequently switching between indexing schemes.
kragen|11 months ago
booleandilemma|11 months ago
ginko|11 months ago
I wouldn't say it's more logical. More intuitive perhaps.
slightwinder|11 months ago
dxuh|11 months ago
umanwizard|11 months ago
Why?
nikolayasdf123|11 months ago
[0,1,2) + [2,3,4) = [0,1,2,3,4)
meanwhile
[0,1,2] + [2,3,4] = [0,1,2,2,3,4] — this double counting is just ugly
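The composition property above can be checked with a small sketch, assuming the bracket notation is read as "the integers from 0 up to, but not including, 2" and so on. The helper `closed_range` is a hypothetical name introduced only for the comparison.

```python
def closed_range(lo, hi):
    """Integers from lo to hi with both ends included."""
    return list(range(lo, hi + 1))


# Half-open ranges: the end of one is the start of the next, nothing repeats.
half_open_joined = list(range(0, 2)) + list(range(2, 4))

# Closed ranges sharing an endpoint: the shared value appears twice.
closed_joined = closed_range(0, 2) + closed_range(2, 4)

print(half_open_joined)  # [0, 1, 2, 3]
print(closed_joined)     # [0, 1, 2, 2, 3, 4]
```

Python's built-in `range` is half-open, so adjacent ranges join cleanly without any adjustment at the seam.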
bazoom42|11 months ago
phkahler|11 months ago
What about the range 0 < i ≤ N which starts with 1? Why only use ≤ on the lower end of the range? This zero-based vs one-based tends to come up in programming and mathematics, and both are used in both areas. Isn't it obvious that there is no universally correct way to index things?
sfink|11 months ago
I would note that the argument is weakened when you look at the later bound, since you have the same problem there, it's just more subtle and less commonly encountered -- yet it routinely creates security bugs!
It's because we don't work with integers, we work with fixed-size intervals within the set of integers (usually a power-of-two number of consecutive integers). So `for (i = 0; i < 256; i++)` is just weird when you're using 8-bit integers: your upper bound is an inexpressible value, and could easily be compiled down to `for (i = 0; i < 0; i++)` once the bound wraps to 8 bits, e.g. if you did `uint8_t upper = 256; for (uint8_t i = 0; i < upper; i++)`. That case is simple, but it gets nastier when you are trying to properly check for overflow in advance and the actual upper value is computed. `if (n >= LIMIT) { return error; }` doesn't work if your LIMIT is based on the representable range. Nor does `if (n * elementSize >= LIMIT) { return error; }`. Even doing `limit = LIMIT / elementSize; if (n >= limit) { return error; }` requires doing the `LIMIT / elementSize` intermediate calculation in larger-width numbers. (In addition to the off-by-one if LIMIT is not evenly divisible by elementSize.)
So when dealing with overflow checks, 0 ≤ i ≤ N may be better. Well, a little better. `for (i = 0; i <= LIMIT; i++)` could easily be an infinite loop if LIMIT is the largest in-domain value. You want `i = 0; while (true) { ...stuff...; if (i == LIMIT) break; i++; }` and at that point, you've lost all simple correspondence with mathematical ranges.
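The two loop shapes discussed above can be simulated in a few lines. This is a rough sketch in Python that imitates 8-bit unsigned wraparound by masking with `0xFF`; it is not C, and the names `u8`, `count_exclusive` and `count_inclusive` are hypothetical.

```python
MASK = 0xFF  # assumption: model an 8-bit unsigned integer by masking


def u8(value):
    """Wrap a Python int to the 8-bit unsigned range 0..255."""
    return value & MASK


def count_exclusive(upper):
    """Iterations of `for (i = 0; i < upper; i++)` when upper and i are 8-bit."""
    upper = u8(upper)  # 256 cannot be represented and wraps to 0
    count = 0
    i = 0
    while i < upper:
        count += 1
        i = u8(i + 1)
    return count


def count_inclusive(limit):
    """Iterations of `i = 0; while (true) { ...; if (i == limit) break; i++; }`."""
    limit = u8(limit)
    count = 0
    i = 0
    while True:
        count += 1      # the loop body runs before the exit test
        if i == limit:
            break
        i = u8(i + 1)
    return count


print(count_exclusive(256))  # 0: the bound wrapped to zero, so the body never runs
print(count_exclusive(255))  # 255: values 0..254
print(count_inclusive(255))  # 256: values 0..255, and the loop still terminates
```

The inclusive form covers the whole 8-bit domain, but only by testing for the last value inside the body before incrementing.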
> Isn't it obvious that there is no universally correct way to index things?
I don't know about "obvious", but I agree that there is no universally correct way to index things.
jmount|11 months ago
sim7c00|11 months ago
"The above has been triggered by a recent incident, when, in an emotional outburst, one of my mathematical colleagues at the University —not a computing scientist— accused a number of younger computing scientists of "pedantry" because —as they do by habit— they started numbering at zero. "
:')
AndrewSwift|11 months ago
You see zero in elevators all the time.
wongarsu|11 months ago
You could argue it's a bit of a translation error. The French and German words for floor refer to ways of adding platforms above ground, whether by walls, wooden columns or floor joists. Over the course of language evolution those words have both broadened and specialized, coming to refer to building levels in general. But the way they are counted still reflects that they originally referred to levels built above ground. The English "floor", on the other hand, counts the number of levels that are ground-like, which naturally starts at the actual ground.
tsoukase|11 months ago
effdee|11 months ago
https://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF
tim333|11 months ago
1970-01-01|11 months ago
I have Ar apples for sale.
Only $A.J each!
kibwen|11 months ago
Languages aimed at casual audiences (e.g. scripting languages like Lua) should maybe just provide two different ways of indexing into arrays: an `offset` method that's zero-indexed, and an `item` method that's one-indexed. Let users pick, in a way that's mildly less confusing than languages that let you override the behavior of the indexing operator (an operator which really doesn't particularly need to exist in a world where iterators are commonplace).
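A minimal sketch of the two-accessor idea, assuming a hypothetical wrapper class named `DualIndexList`; only the method names `offset` and `item` come from the comment, and nothing here is an existing Lua or Python API.

```python
class DualIndexList:
    """Sequence wrapper with an explicit zero-based and one-based accessor."""

    def __init__(self, values):
        self._values = list(values)

    def offset(self, n):
        """Zero-based access: distance from the start of the sequence."""
        if not 0 <= n < len(self._values):
            raise IndexError(f"offset {n} out of range")
        return self._values[n]

    def item(self, n):
        """One-based access: the n-th item, counting from 1."""
        if not 1 <= n <= len(self._values):
            raise IndexError(f"item {n} out of range")
        return self._values[n - 1]


fruits = DualIndexList(["apple", "banana", "cherry"])
print(fruits.offset(0))  # apple
print(fruits.item(1))    # apple
print(fruits.item(3))    # cherry
```

Because each call names its convention, a reader never has to guess which base is in use, and `item(0)` fails loudly instead of silently wrapping around.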
kragen|11 months ago
umanwizard|11 months ago
anigbrowl|11 months ago
- statements dreamed up by the utterly deranged.
They have played us for absolute fools.
djmips|11 months ago
sfink|11 months ago
But thank you for reminding me that I am zero centuries old. More decades old than I would like, but zero centuries.
pmarreck|11 months ago
frhack|11 months ago
hulitu|11 months ago
Computer memory.
golol|11 months ago
Y_Y|11 months ago
This discrepancy appears in physics too. It's common to use 1,2,3 for spatial indices, but when you reach enlightenment and think in terms of spacetime you add a zero index and not a four.
pwdisswordfishz|11 months ago
nikolayasdf123|11 months ago
but good point: remembering academic linear algebra, seeing 0..n-1 in sigma/sum notation would not be convenient
geenat|11 months ago
neves|11 months ago
It settles the discussion of array numbering. F*ck off Visual Basic, MS Javascript, and all the languages that said you should start with 1.
personalityson|11 months ago
ks2048|11 months ago
shrubble|11 months ago
personalityson|11 months ago
BeetleB|11 months ago