This is time efficient* but rather wasteful of space.
The best way to save space is to use a Bloom Filter.
If we capture all the even numbers, that would sadly only give us "Definitely not Even" or "Maybe Even".
But for just the cost of doubling our space, we can use two Bloom filters!
So we can construct one bloom filter capturing even numbers, and another bloom filter capturing odd numbers.
Now we have "Definitely not Even" and "Maybe Even" but also "Definitely not Odd" and "Maybe Odd".
In this manner, we can use the "evens" filter to find the odd numbers and the "odds" filter to find the even numbers.
Having done this, we'll be left with just a handful of unlucky numbers that are recorded as both "Maybe even" and "Maybe odd". These will surely be few enough in number that we can special case these in our if/else block.
The filters as a first-pass will save gigabytes of memory!
> But for just the cost of doubling our space, we can use two Bloom filters!
We can optimize the hash function to make it more space efficient.
Instead of using remainders to locate filter positions, we can use a mersenne prime number mask (like say 31), but in this case I have a feeling the best hash function to use would be to mask with (2^1)-1.
You are absolutely write! Let me write a Go program to implement this idea. The bloom filters will take approximately 5gb (for a 1% error rate) and take a few minutes to populate on a modern Macbook Pro.
Ah, yes, exactly the pointless diversion I needed for my lunch break. For science: generating a C# switch statement for similar purposes took 7 minutes on similar-ish hardware, but the resulting 99.2GB file could not be opened or compiled ('Stream is too long'), which was slightly disappointing.
Optimization efforts included increasing the internal buffer size of the StreamWriter used to create the source code: this reduced the runtime to around 6 minutes, as well as running a non-debug build, as it was observed that the poor Visual Studio metrics gathering process was contributing significantly to disk activity as well, but that ultimately didn't matter much. So, ehhm, yes, good job on that I guess?
Isn't the obvious thing to generate different classes for different ranges over the input? Then the classes could be loaded lazily.
And if you then make the ranges tree-shaped you get logarithmic complexity, which massively cuts down the O(n) of the rather naive chained `if` statements.
I haven't found any authoritative source, but I strongly suspect that the .NET bytecode format has 32-bit limits all over the place. Maaaybe you could break up the code into functions less than 1 GB in size and then chain them together.
I wonder if you could generate it via a Roslyn incremental source generator instead of as a file to bypass this limit. I'm guessing not, but it does sound like fun.
Gemini took 4 seconds to answer this prompt: "Here is a number 4200020010101. Think deeply about it and tell me if it is not or or not even."
So if you're concerned with privacy issues, you can run the assembly version proposed in the article locally and be well within the same order of performance.
Let's thank the author of the article for providing a decent alternative to Google.
ah, but the license is not that good we can't reproduce his code.
> any value over 2^31 seems to give random results.
Wow he really lucked out... On his way to perfecting a fully functioning and performant Even/Odd Detector, he stumbled upon a fully functioning and performant Coin Flip Simulator!
I took an ASIC design class in college, unfortunately with a heavy course load that didn't allow me to focus on it. For our final project we were given a numbered dictionary and asked to design a chip that would accept the characters on a 7 bit interface (ASCII), one character per clock cycle and output the dictionary number on an output interface but I can't remember how wide. We were graded on the size of the resulting ASIC and how many clock cycles it took from the last character in to the number on the output.
I started designing my modules, a ROM, a register with a ROM pointer, etc, etc, writing the Verilog and working out the clock sync between modules. Then I got 'lazy' and wrote a trie tree like implementation in Java, and have it spit out the whole tree in Verilog. It worked and just one clock cycle after the last letter my number would output. Fastest in the class! Also the most number of gates in the class. Surprised I got a 90 grade given I didn't use any of the advanced ASIC design the class taught. The TA didn't know what the hell they were looking at.
Yep! Something a bit counterintuitive on circuit design is that dedicated transistors will always beat reusing existing components. If we do reuse existing components like ALUs, multipliers, or state machines, we save on chip area but pay the penalty in clock cycles. Your approach was the extreme version of this tradeoff. You essentially unrolled the entire dictionary lookup into pure combinatorial logic (well, with registers for the input characters). One clock cycle latency because you weren't doing any sequential searching, comparing, or state machine transitions just racing electrons through logic gates.
This could be obviously done with much less code: Just add "if"s for all even number, and at the end just return "odd" if none of the evens matched. 50% less code!
Or even simpler: If it's 0, return "even". If not, do a recursive call to n-1, if that equals "even", return "odd", otherwise return "even".
But the best way is probably to just use a library. Yes, 500MB of additional dependencies, but then it's a one-liner.
> Now, this is a time-memory tradeoff, but my time on this earth is limited so I decided to meta-program the if statements using a programmer program in a different programming language.
Yeah... I come here to talk about that. Should have been
for i in range(0, 2**8, 2):
print(" if (number == "+str(i)+")")
print(" printf(\"even\\n\");")
print(" if (number == "+str(i + 1)+")")
print(" printf(\"odd\\n\");")
or
for i in range(0, 2**8, 2):
print(f""" if (number == {i})
puts("even");
if (number == {i + 1})
puts("odd");""")
I think we can improve this. Just make a microservice who generates the code on the fly and streams it to the compiler. Then you also just have to create the necessary code and don't waste the SSD with unused code-paths.
I recently asked a Qwen model to write me some code to remove errant spaces ("c he es e" instead of "cheese") in OCR'd text. It proceeded to write 'if' blocks for every combo of all English words and all possible errant spaces. I did not wait for it to finish...
If the author is available for consulting I have this bag of rice I need cooked. Should be around 30,000 grains, each needs about 1mL of water and 2m on the stove. Will pay $10 (2025 dollars)
I see this is 2023… the article refs GPT even then. Can’t believe it’s already that much time gone by, still seems like “last years big news”
I was gonna comment “this is what I really like to see on HN”. Then I saw the date and was sad that we’re having to dip into the history box to find fun/interesting articles more often of late it seems.
Anyone else interested in a temporary moratorium on all things LLM? We could have GPT-free-Wednesday, or something like that :)
> Anyone else interested in a temporary moratorium on all things LLM? We could have GPT-free-Wednesday, or something like that :)
I would be interested in a permanent moratorium, personally. There's no interesting content to be had in the various LLM articles that litter HN these days. Or failing a topic ban, at least give a way to filter it for those of us who are sick of hearing about AI hype.
With data-driven programming, I would have expected an SQL table containing all the precomputed results. Unless you carelessly add an index, it has the same asymptotic performance!
I know it's silly, but I just want to fix his first version with the minimum possible changes;
/* Copyright 2023. All unauthorized distribution of this source code
will be persecuted to the fullest extent of the law*/
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main(int argc, char* argv[])
{
uint8_t number = argc>1 ? argv[1][strlen(argv[1])-1]-'0' : printf("Usage: odd-or-even number\n");
if (number == 0)
printf("even\n");
if (number == 1)
printf("odd\n");
if (number == 2)
printf("even\n");
if (number == 3)
printf("odd\n");
if (number == 4)
printf("even\n");
if (number == 5)
printf("odd\n");
if (number == 6)
printf("even\n");
if (number == 7)
printf("odd\n");
if (number == 8)
printf("even\n");
if (number == 9)
printf("odd\n");
if (number == 10)
printf("even\n");
}
This way it basically works. It's a shame that it doesn't call out a non numeric argument but that's about the only problem. It relies on a trick, printf() returns the number of characters printed, so the error message string needs to be longer than 10.
Thanks for making me doubt myself & googling who that guy who made python was again, because surely "van der Gussom" isn't a normal Dutch name. Well played.
> As a side note, the program is amazingly performant. For small numbers the results are instantaneous and for the large number close to the 2^32 limit the result is still returned in around 10 seconds.
This reminds me of when I learned to program on my casio calculator.
There was a function to detect a key press which would return a number identifying the pressed key.
I needed to map that number to the letter printed on the key to print it on the screen.
I don't remember whether there was no hashmap data structure or I just didn't know about it, but I implemented it with a serie of if.
The problem with that solution is that while mapping A was fast, Z was very slow because it was at the end of the list.
That is how I discovered divide and conquer/ dichotomy with if branches.
I just had flash backs to a previous job where I was brought in to optimize another teams builds since they were now taking minutes instead of seconds.
I tracked it down to a folder with thousands of C++ files called things like uint_to_int.cc and inch_to_cm.cc and cm_to_m.cc. Basically the developer in charge of writing the conversion library took our typed units library and autogenerated a C++ file for every possible conversion the application might need to make.
Every time we added a new typed unit it would create another couple of dozen files to be compiled.
> I decided to implement this in the C programming language as it’s by far the fastest language on the planet to this day (thanks to the visionary genius Dennis Richie)
Am I lost?
Aren't the compiler/linker responsible for fast code, not the language itself?
There are language issues as well. 99% of C programs are valid C++, and if you compile with a C++ compiler instead of a C++ compiler will be slightly faster! C++ ha a stronger type system and so once in a while (very rarely) those C programs compile but give incorrect results since C++ allowed the optimizer to make an assumption that wasn't true. Fortran is often even faster because the language allows for even more assumptions. I don't know where Rust fits in here (Rust is hurt today because the better optimizes are designed for C++ and so don't take advantage of extra assumptions Rust allows - it was designed to allow different assumptions from C++ and likely could be better would a ground up optimizer but that would take a large team a decade+ to write: expensive)
Most of the difference in speed is the optimizer/linker. Assuming a fair competition the difference between Ada, C, C++, D, Fortran, Rust, Zig (and many others I can't think of) is very rarely even 1% in any real world situation. Of course it isn't hard add pessimization to make any language lose, sometimes accidentally, so fair competitions are very hard to find/make.
> I decided to use the slowest language on the planet, Python (thanks to the visionary genius of Ross van der Gussom).
given the article, it's fair to assume the author was joking around
that being said, the way the language is used and its ecosystem do contribute to the executable's efficiency. yet, given C's frugality, or the proximity between its instructions and the executed ones, it's not unfair to say that "C is fast"
Both, usually. A language's semantics can limit how much a compiler can speed up the language. Python, for example, is extremely difficult to make fast due to the fact that almost everything has the semantics of a hashmap lookup. C, in comparison, has relatively little in it that can't be mapped fairly straightforwardly to assembly, and then most of it can be mapped in a more difficult way to faster assembly.
> I saw from the SSD was around 800 MB/s (which doesn’t really make sense as that should give execution speeds at 40+ seconds, but computers are magical so who knows what is going on).
If anyone knows what’s actually going on, please do tell.
Infact, some really performant code such as glMatrix.js do not use any for loops for matrix math, just to allow the javascript engine to optimize the code as much as possible.
> How did I do this? Well I jumped online, using a mix of my early life experience coding emulators and hacking and looked into the x86(-64) architecture manuals to figure out the correct opcodes and format for each instruction. … Just kidding, that’s horrible. I asked ChatGPT
Ok but if you do want to play with writing binary code manually I recommend Casey Muratori's performance course
I really enjoy posts like this one because I always learn something. Sure the whole program can be written using bitwise comparison or modulus, but that's not the point. The thing I learned is how to map that into address space! That's a cool trick!
A much cooler approach would have been to generate the ASM from the same program, rather than generate a file from python and load that file from C++. The multi-stage build and filesystem are completely unnecessary.
The technique actually has a lot of practical applications, so it's useful to have a C++ library that helps you with generating amd64 machine code.
The interesting part isn’t the if-statement count but how quickly the compiler and branch predictor erase the differences. It’s a nice demo of why “manual cleverness” rarely beats modern toolchains.
I would also like to praise the visionary genius Ross van der Gussom, without whom this wonderful achievement in software engineering would not have been possible!
This reminds me of my personal "prime number" grabber research https://github.com/tigranbs/prime-numbers I needed to create the unique graph nodes and assign prime numbers, and to make things efficient, I thought, why not just download the list of known prime numbers instead of generating them one by one. So I did and compiled everything with a single Go binary. Ehh, good old days with a nice feith in making "the best" crappy software out there.
printf("%d is %s\n", n, last_binary_bit(n) == 0 ? "even" : "odd");
and the rest is trivial:
int last_binary_bit(int n) {
if (n == 0) return 0;
if (n == 1) return 1;
if (n == 2) return 0;
...
}
Come to think of it, with a little fancy math you could divide and conquer:
int last_binary_bit(int n) {
// Handle the easy cases.
if (n == 0) return 0;
if (n == 1) return 1;
// Number may be large. Divide and conquer. It doesn't matter where we split it,
// so use a randomized algorithm because those are fast.
for (;;) {
int r = random();
if (r < n) {
// Smaller numbers are easier.
int smaller1 = r;
int smaller2 = n - 4;
int bit1 = last_binary_bit(smaller1);
int bit2 = last_binary_bit(smaller2);
// Fancy math: even + even is even, even + odd is odd, etc.
if (bit1 == 0 && bit2 == 0) return 0;
if (bit1 == 0 && bit2 == 1) return 1;
if (bit1 == 1 && bit2 == 0) return 1;
if (bit1 == 1 && bit2 == 1) return 0;
}
}
}
> As a side note, the program is amazingly performant. For small numbers the results are instantaneous and for the large number close to the 2^32 limit the result is still returned in around 10 seconds.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
if (argc < 2) {
fprintf(stderr, "Usage: %s <string>\n", argv[0]);
return 1;
}
char *s = argv[1];
int i;
/* find the end of the string */
for (i = 0; s[i] != '\0'; ++i)
;
/* make sure the string wasn't empty */
if (i == 0) {
fprintf(stderr, "Error: empty string\n");
return 1;
}
/* last character is at s[i - 1] */
char d = s[i - 1];
if (d == '0')
printf("even\n");
if (d == '1')
printf("odd\n");
if (d == '2')
printf("even\n");
if (d == '3')
printf("odd\n");
if (d == '4')
printf("even\n");
if (d == '5')
printf("odd\n");
if (d == '6')
printf("even\n");
if (d == '7')
printf("odd\n");
if (d == '8')
printf("even\n");
if (d == '9')
printf("odd\n");
return 0;
}
xnorswap|2 months ago
The best way to save space is to use a Bloom Filter.
If we capture all the even numbers, that would sadly only give us "Definitely not Even" or "Maybe Even".
But for just the cost of doubling our space, we can use two Bloom filters!
So we can construct one bloom filter capturing even numbers, and another bloom filter capturing odd numbers.
Now we have "Definitely not Even" and "Maybe Even" but also "Definitely not Odd" and "Maybe Odd".
In this manner, we can use the "evens" filter to find the odd numbers and the "odds" filter to find the even numbers.
Having done this, we'll be left with just a handful of unlucky numbers that are recorded as both "Maybe even" and "Maybe odd". These will surely be few enough in number that we can special case these in our if/else block.
The filters as a first-pass will save gigabytes of memory!
gopalv|2 months ago
We can optimize the hash function to make it more space efficient.
Instead of using remainders to locate filter positions, we can use a mersenne prime number mask (like say 31), but in this case I have a feeling the best hash function to use would be to mask with (2^1)-1.
alexjurkiewicz|2 months ago
https://gist.github.com/alexjurkiewicz/1abf05f16fd98aabf380c...
nixpulvis|2 months ago
It's a joke post with some interesting bits and details.
ZeroConcerns|2 months ago
Optimization efforts included increasing the internal buffer size of the StreamWriter used to create the source code: this reduced the runtime to around 6 minutes, as well as running a non-debug build, as it was observed that the poor Visual Studio metrics gathering process was contributing significantly to disk activity as well, but that ultimately didn't matter much. So, ehhm, yes, good job on that I guess?
afandian|2 months ago
And if you then make the ranges tree-shaped you get logarithmic complexity, which massively cuts down the O(n) of the rather naive chained `if` statements.
jiggawatts|2 months ago
I haven't found any authoritative source, but I strongly suspect that the .NET bytecode format has 32-bit limits all over the place. Maaaybe you could break up the code into functions less than 1 GB in size and then chain them together.
opticfluorine|2 months ago
szundi|2 months ago
mft_|2 months ago
thih9|2 months ago
[1]: https://aphyr.com/posts/342-typing-the-technical-interview
[2]: https://www.richard-towers.com/2023/03/11/typescripting-the-...
wiz21c|2 months ago
So if you're concerned with privacy issues, you can run the assembly version proposed in the article locally and be well within the same order of performance.
Let's thank the author of the article for providing a decent alternative to Google.
ah, but the license is not that good we can't reproduce his code.
brap|2 months ago
I think I just experienced a segfault
classified|2 months ago
Did you want to test the LLM's grammatical comprehension?
moffkalast|2 months ago
kayge|2 months ago
Wow he really lucked out... On his way to perfecting a fully functioning and performant Even/Odd Detector, he stumbled upon a fully functioning and performant Coin Flip Simulator!
thedougd|2 months ago
I started designing my modules, a ROM, a register with a ROM pointer, etc, etc, writing the Verilog and working out the clock sync between modules. Then I got 'lazy' and wrote a trie tree like implementation in Java, and have it spit out the whole tree in Verilog. It worked and just one clock cycle after the last letter my number would output. Fastest in the class! Also the most number of gates in the class. Surprised I got a 90 grade given I didn't use any of the advanced ASIC design the class taught. The TA didn't know what the hell they were looking at.
weli|2 months ago
blauditore|2 months ago
Or even simpler: If it's 0, return "even". If not, do a recursive call to n-1, if that equals "even", return "odd", otherwise return "even".
But the best way is probably to just use a library. Yes, 500MB of additional dependencies, but then it's a one-liner.
majkinetor|2 months ago
layer8|2 months ago
xg15|2 months ago
_ache_|2 months ago
PurpleRamen|2 months ago
cowsandmilk|2 months ago
catigula|2 months ago
layer8|2 months ago
_kst_|2 months ago
niccl|2 months ago
mring33621|2 months ago
orzig|2 months ago
franciscop|2 months ago
travisgriggs|2 months ago
I was gonna comment “this is what I really like to see on HN”. Then I saw the date and was sad that we’re having to dip into the history box to find fun/interesting articles more often of late it seems.
Anyone else interested in a temporary moratorium on all things LLM? We could have GPT-free-Wednesday, or something like that :)
virgilp|2 months ago
bigstrat2003|2 months ago
I would be interested in a permanent moratorium, personally. There's no interesting content to be had in the various LLM articles that litter HN these days. Or failing a topic ban, at least give a way to filter it for those of us who are sick of hearing about AI hype.
LPisGood|2 months ago
We don’t _have_ to. You could start a blog and display the indomitable human spirit.
oneeyedpigeon|2 months ago
layer8|2 months ago
projektfu|2 months ago
https://en.wikipedia.org/wiki/IBM_1620#Anecdotes
Zambyte|2 months ago
enopod_|2 months ago
billforsternz|2 months ago
pcthrowaway|2 months ago
Or is the performance considered worse because it becomes O(n) (where n < MAX_UINT) vs. constant time ( O(MAX_UINT) )
jagged-chisel|2 months ago
willguest|2 months ago
processing the whole number is absurd
thewisenerd|2 months ago
https://news.ycombinator.com/item?id=38790597
4B If Statements (469 comments)
unwind|2 months ago
isoprophlex|2 months ago
Thanks for making me doubt myself & googling who that guy who made python was again, because surely "van der Gussom" isn't a normal Dutch name. Well played.
mcny|2 months ago
https://news.ycombinator.com/item?id=38790754
SiempreViernes|2 months ago
Amazing!
tanseydavid|2 months ago
fainpul|2 months ago
Save space.
Ditto. Sure, this overflows the stack, but you look cool doing it. Save time. Just buy more RAM.bobbylarrybobby|2 months ago
lubujackson|2 months ago
isEven is a performant, hand-compiled evenness checker for any 32 bit integer. A single file import that does one job and one job only!
not-so-darkstar|2 months ago
rossdavidh|2 months ago
robotguy|2 months ago
"Is it odd or Even?"
"YES"
avandecreme|2 months ago
There was a function to detect a key press which would return a number identifying the pressed key.
I needed to map that number to the letter printed on the key to print it on the screen. I don't remember whether there was no hashmap data structure or I just didn't know about it, but I implemented it with a serie of if.
The problem with that solution is that while mapping A was fast, Z was very slow because it was at the end of the list. That is how I discovered divide and conquer/ dichotomy with if branches.
Ensorceled|2 months ago
I tracked it down to a folder with thousands of C++ files called things like uint_to_int.cc and inch_to_cm.cc and cm_to_m.cc. Basically the developer in charge of writing the conversion library took our typed units library and autogenerated a C++ file for every possible conversion the application might need to make.
Every time we added a new typed unit it would create another couple of dozen files to be compiled.
KeplerBoy|2 months ago
bspammer|2 months ago
> Lets compile the code, disabling optimizations with /Od to make sure that the pesky compiler doesn’t interfere with our algorithm
encom|2 months ago
oneeyedpigeon|2 months ago
And that weird flag is because it's a windows compiler: cl.exe, not gcc.
dorianmariecom|2 months ago
sedatk|2 months ago
rplnt|2 months ago
whynotmaybe|2 months ago
Am I lost? Aren't the compiler/linker responsible for fast code, not the language itself?
bluGill|2 months ago
Most of the difference in speed is the optimizer/linker. Assuming a fair competition the difference between Ada, C, C++, D, Fortran, Rust, Zig (and many others I can't think of) is very rarely even 1% in any real world situation. Of course it isn't hard add pessimization to make any language lose, sometimes accidentally, so fair competitions are very hard to find/make.
Then again, the article was clearly sarcastic.
mbivert|2 months ago
given the article, it's fair to assume the author was joking around
that being said, the way the language is used and its ecosystem do contribute to the executable's efficiency. yet, given C's frugality, or the proximity between its instructions and the executed ones, it's not unfair to say that "C is fast"
rcxdude|2 months ago
reedf1|2 months ago
croes|2 months ago
philippta|2 months ago
If anyone knows what’s actually going on, please do tell.
ricardo81|2 months ago
hellzbellz123|2 months ago
zkmon|2 months ago
https://github.com/toji/gl-matrix/blob/master/dist/gl-matrix...
SatvikBeri|2 months ago
Ok but if you do want to play with writing binary code manually I recommend Casey Muratori's performance course
mal10c|2 months ago
mgaunard|2 months ago
The technique actually has a lot of practical applications, so it's useful to have a C++ library that helps you with generating amd64 machine code.
1a527dd5|2 months ago
unknown|2 months ago
[deleted]
runtimepanic|2 months ago
DeathArrow|2 months ago
klaff|2 months ago
ku1ik|2 months ago
unknown|2 months ago
[deleted]
riwsky|2 months ago
quux|2 months ago
layer8|2 months ago
Moreover, interviewers will be smitten by the hand-optimized assembly code.
almosthere|2 months ago
tigranbs|2 months ago
tomaskafka|2 months ago
Sulf1re|2 months ago
sfink|2 months ago
nonethewiser|2 months ago
taylorallred|2 months ago
wvbdmp|2 months ago
d--b|2 months ago
wiz21c|2 months ago
asgs|2 months ago
shevy-java|2 months ago
ks2048|2 months ago
Exuma|2 months ago
pcthrowaway|2 months ago
Lol
tornadofart|2 months ago
nmilo|2 months ago
ajsnigrutin|2 months ago
Many gigabytes saved!
/s
ITniggah|2 months ago
[deleted]
d-lisp|2 months ago
else
imtringued|2 months ago
./check_digit 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999