This is a fun puzzle. Here's my take on it. I stopped reading the article after the introduction so I could attack the problem fresh, so hopefully this isn't the same as what the author did.
The trick is working backwards. We want to quickly count the number of occurrences of each letter in the list of number words. Knowing how to do that, we just have to do a linear scan over the cumulative counts for A, B, C, ... to find the interval into which the query index fits. If it helps prime your intuition, my inspiration is counting sort with the counts evaluated partly analytically.
Number words can be divided into a fixed set of recurring components: one, two, three, four, ..., ten, eleven, twelve, thirteen, ..., twenty, thirty, ..., hundred, thousand, million, billion. Build an inverse index that maps each letter to the set of all component words that contain the letter, paired with an occurrence count. For example, the inverse index would map T to {(two, 1), (three, 1), (ten, 1), (twelve, 1), (thirteen, 2), ...}.
With this in hand, the problem is reduced to counting the number of each component word in the string.
Let's consider that subproblem for the component 'nine' ('nineteen' and 'ninety' are separate components) for numbers with up to n digits. Numbers with the digit pattern xxx...xxx9 each contribute one occurrence of 'nine', and there are 10^(n-1) such numbers ('... nine'). Likewise each xxx...xxx9xx contributes an occurrence, and there are again 10^(n-1) such numbers ('... nine hundred ...'), the pattern xxx...xxx9xxx has 10^(n-1) members ('... nine thousand ...'), etc.
This same analysis applies to all ones components (one, two, three, ..., nine), and an analogous analysis applies to other components.
I have a ruby solution that takes 20s to run on a machine that's about 5 years old.
I just brute-force generate the strings from 1-999999, as well as "million-prefix" strings of the form "(1-999)million". These are then sorted in the same array. Also compute the sum of the length of the strings from 1-999999 while we're at it.
Then, run through the sorted array, keeping a running total of the length. Each time you hit a "million-prefix" string, it's easy to compute the total length of all the numbers that start with the prefix - it's just the sum of the string lengths from 1-999999 plus 999999*prefix-length.
If this subtotal doesn't push you past the 51B limit, then keep going. If it does, then run through the 1-999999 numbers only, adding each one to the total individually, until you reach the magic 51B mark.
The practical upshot is that you can skip millions of numbers at a time.
Cool! A very nice read.. thanks for sharing. I actually looked at that puzzle but decided strawberry fields was more interesting. My solution comes up with reasonably efficient answers within a minute or so, even on a 1000h eeepc :) I don't have a nice writeup like you but the README contains a brief explanation (in concordance with the rules for the puzzle): https://github.com/BruceJillis/Strawberry-Fields I decided to post this because you mention constraints and my solution is written up in a constraint logic programming language (eclipse), it might be interesting to translate your solution and compare.
Ken and Dylan's solution is brilliant, and readable, but has a lot of magic. Don't be surprised if you read (and run!) their solution end-to-end, understanding each piece, but still don't believe that the whole thing works!
(Note, the solution is in 4 parts, but Part 4 is not in the table of contents.)
First, understand that it relies on Haskell's built-in (implicit/invisible) theorem-prover, lazines, and purity to work. It is not possible to directly translate their code into (much uglier) code in C/Python/Java and expect is to work.
Second, while they tersely explain all the math they use, like "automatic (formal) differentiation", "seminearring", it really helps if you are already familiar with these concepts from a college "modern algebra" class, in order to maintain the intuition in your head to follow the exposition.
Third, a lot of the magic of Haskell isn't explicitly explained in their blog post, such as:
0. Where is the, um, "execution" ("main" function)?
When not using the IO monad or other explicitly sequential operations, Haskell programs do not have any instructions (loops or command statement), only declarations and definitions/assigments. The only "instruction" in the program is the command to print the value of a function on the console. Haskell's built-in call-by-need need-resolution engine then does all the work of deciding which functions need to be called in which order to compute and print the one top-level function.
1. why differentiation works to solve this problem (Part 2). (The reason is not hard to see if you play with the functions, but is hard to see just by reading: The concrete examples of string lists are all in section 1, but differentiation is defined in section 2, and they don't show any examples of differentiation applied to concrete lists of strings.)
2. Why summing over vast trees is efficient. (Part 2 and Part 3)
Each node in the tree contains a reference to (basically) a small function, describing how to combine its branches. That function is collapsed to a value the first time it's value is needed (Haskell's lazy call-by-need semantics).
Note that the tree is extremely redundant! It contains something like (but not quite) log (n) distinct nodes in a tree of n total nodes representing n integeres. The use of indirection, which is implicit in Haskell (like Java!), means that by computing the a sum over part of the tree, we get many other computations for free, even though the code superficially appears to copying values.
Haskell's purity (and therefore, equivalence between functions and their results), means that each function call (including arguments is only ever evaluated more than once, without any explicit visible coding of that evaluation+memoization (not like Java).
3. Why sorting is efficient. The reason is basically the same as why summing is efficient (above).
couldn't you just use "sort" to sort the file on disk and then offset into the file to find the answer? i would expect sort to work fine for files larger than memory (and if it doesn't i bet there is some tool that does - merge sort with tapes is not so old...)
[edit: you'd need to add, then strip, carriage returns between numbers, i guess]
[+] [-] psykotic|14 years ago|reply
The trick is working backwards. We want to quickly count the number of occurrences of each letter in the list of number words. Knowing how to do that, we just have to do a linear scan over the cumulative counts for A, B, C, ... to find the interval into which the query index fits. If it helps prime your intuition, my inspiration is counting sort with the counts evaluated partly analytically.
Number words can be divided into a fixed set of recurring components: one, two, three, four, ..., ten, eleven, twelve, thirteen, ..., twenty, thirty, ..., hundred, thousand, million, billion. Build an inverse index that maps each letter to the set of all component words that contain the letter, paired with an occurrence count. For example, the inverse index would map T to {(two, 1), (three, 1), (ten, 1), (twelve, 1), (thirteen, 2), ...}.
With this in hand, the problem is reduced to counting the number of each component word in the string.
Let's consider that subproblem for the component 'nine' ('nineteen' and 'ninety' are separate components) for numbers with up to n digits. Numbers with the digit pattern xxx...xxx9 each contribute one occurrence of 'nine', and there are 10^(n-1) such numbers ('... nine'). Likewise each xxx...xxx9xx contributes an occurrence, and there are again 10^(n-1) such numbers ('... nine hundred ...'), the pattern xxx...xxx9xxx has 10^(n-1) members ('... nine thousand ...'), etc.
This same analysis applies to all ones components (one, two, three, ..., nine), and an analogous analysis applies to other components.
[+] [-] strags|14 years ago|reply
I just brute-force generate the strings from 1-999999, as well as "million-prefix" strings of the form "(1-999)million". These are then sorted in the same array. Also compute the sum of the length of the strings from 1-999999 while we're at it.
Then, run through the sorted array, keeping a running total of the length. Each time you hit a "million-prefix" string, it's easy to compute the total length of all the numbers that start with the prefix - it's just the sum of the string lengths from 1-999999 plus 999999*prefix-length.
If this subtotal doesn't push you past the 51B limit, then keep going. If it does, then run through the 1-999999 numbers only, adding each one to the total individually, until you reach the magic 51B mark.
The practical upshot is that you can skip millions of numbers at a time.
Ugly code here: http://codepuppies.com/ben/ita/t2.txt
[+] [-] BruceJillis|14 years ago|reply
[+] [-] mukyu|14 years ago|reply
[+] [-] darkane|14 years ago|reply
[+] [-] lurker14|14 years ago|reply
(Note, the solution is in 4 parts, but Part 4 is not in the table of contents.)
First, understand that it relies on Haskell's built-in (implicit/invisible) theorem-prover, lazines, and purity to work. It is not possible to directly translate their code into (much uglier) code in C/Python/Java and expect is to work.
Second, while they tersely explain all the math they use, like "automatic (formal) differentiation", "seminearring", it really helps if you are already familiar with these concepts from a college "modern algebra" class, in order to maintain the intuition in your head to follow the exposition.
Third, a lot of the magic of Haskell isn't explicitly explained in their blog post, such as:
0. Where is the, um, "execution" ("main" function)? When not using the IO monad or other explicitly sequential operations, Haskell programs do not have any instructions (loops or command statement), only declarations and definitions/assigments. The only "instruction" in the program is the command to print the value of a function on the console. Haskell's built-in call-by-need need-resolution engine then does all the work of deciding which functions need to be called in which order to compute and print the one top-level function.
1. why differentiation works to solve this problem (Part 2). (The reason is not hard to see if you play with the functions, but is hard to see just by reading: The concrete examples of string lists are all in section 1, but differentiation is defined in section 2, and they don't show any examples of differentiation applied to concrete lists of strings.)
2. Why summing over vast trees is efficient. (Part 2 and Part 3) Each node in the tree contains a reference to (basically) a small function, describing how to combine its branches. That function is collapsed to a value the first time it's value is needed (Haskell's lazy call-by-need semantics).
Note that the tree is extremely redundant! It contains something like (but not quite) log (n) distinct nodes in a tree of n total nodes representing n integeres. The use of indirection, which is implicit in Haskell (like Java!), means that by computing the a sum over part of the tree, we get many other computations for free, even though the code superficially appears to copying values.
Haskell's purity (and therefore, equivalence between functions and their results), means that each function call (including arguments is only ever evaluated more than once, without any explicit visible coding of that evaluation+memoization (not like Java).
3. Why sorting is efficient. The reason is basically the same as why summing is efficient (above).
[+] [-] andrewcooke|14 years ago|reply
[edit: you'd need to add, then strip, carriage returns between numbers, i guess]
[+] [-] lurker14|14 years ago|reply
[+] [-] anonymoushn|14 years ago|reply
[+] [-] lurker14|14 years ago|reply
[+] [-] danielharan|14 years ago|reply
Using Gauss's insight, Ruby can solve 10,000 times faster than a naive C solution.
I solved this a while back, while high on drugs my dentist gave me. Blog post is down, but code is here: http://refactormycode.com/codes/8-integer-puzzle
[+] [-] lurker14|14 years ago|reply
ThousandRange: def child_values @thousands * 1_000_000 + 499500 end
MillionRange: