I've been working on an alternative implementation of this, but using tau instead. I'm not sure my calculations are correct, but I think it offers a 2x speedup over this method.
It's still in stealth mode but I'm hoping to unveil it sometime in late June.
I think it is fair to say my gene sequence is somewhere in there. It is also fair to say that a version of my gene sequence with alterations, where my stomach processes Chipotle burritos with less... fanfare, is somewhere in there too. So it sounds like I need to come up with some kind of Cas9-ish process for finding this better burrito-processing version of my gene sequence inside pi. The better version of my 'life of pi' for burritos.
> pingfs is a filesystem where the data is stored only in the Internet itself,
> as ICMP Echo packets (pings) travelling from you to remote servers and
> back again.
So... if your Internet connection goes down, you lose all your data?
I wonder if it might be easier to start with an immutable tree of typical office documents. Then the filesystem just stores the diff. E.g. you save a memo document and it's pretty much the same as one of the standard memos. Think of the savings.
IIRC, since there are infinitely many digits in Pi, and Pi never settles into an infinitely repeating finite sequence (it is irrational, so its digits are not eventually periodic), it contains infinitely many distinct finite sequences. Whether it contains every finite sequence is a stronger property; that would require Pi to be normal, which is conjectured but not proven.
Granted, the index might be larger to represent than the actual sequence you're indexing, but...
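That intuition checks out with a back-of-the-envelope estimate (a toy sketch, assuming pi's digits behave like a uniform random stream, which matches empirical statistics even though pi's normality is unproven): a specific n-digit sequence first appears around offset 10^n on average, and writing that offset down takes about n digits.

```python
import math

# Toy estimate: in a uniform random digit stream, a given n-digit
# sequence first appears around offset 10**n on average, and an offset
# of that size takes about n digits to write down. So the "index" is,
# on average, at least as long as the data it replaces.
for n in (1, 2, 4, 8, 16):
    expected_offset = 10 ** n
    offset_digits = math.ceil(math.log10(expected_offset + 1))
    assert offset_digits >= n
```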
Someone needs to alert the White House about this. Terrorists implementing piFS will have access to all the secrets of the US.
The movie industry should just shut down now. Any two people implementing piFS will have already shared all possible movie files, including future, unreleased films.
Your medical records are now available to anyone with time and patience.
The horror!
(Seriously, we should see if we can wind up Spicer about that first one and see if he explodes).
> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found. In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.
So basically like a regular filesystem, except now you look up each byte every time you intend to use it?
This is a reference to a common situation that arises anywhere on the internet that compression is discussed. There is a segment of people who seem to find it personally, mathematically, or even perhaps morally offensive that no algorithm can exist that takes all possible inputs and is guaranteed to emit something compressed by at least one bit. This seems to be true despite the fact that the proof of this statement is very, very simple; it can be sketched on a literal post-it note, and you could easily convince an open-minded 10-year-old of its truth.
In those situations, you can bet money that some compression scheme will be proposed that fits into at least one of the two categories "did not check to see if it actually compresses data" or "works by hiding bits where you forgot to count, but they still count".
"I'll just give an offset into pi" is in the first category; it does, technically, work, in the sense that it can validly represent the original data, but it turns out that in order to represent the offsets into pi it takes more bits than the content had in the first place. (You can get lucky every once in a while, and store 14159265 as "0/8", but the vast, vast majority of sequences get expanded.) Everyone proposes this one. It's almost funny how often it gets proposed, given that it is actually a very slow method for making your data bigger. Don't hold your breath for your Fields Medal for that one.[1]
An example of the second case is "I'll just store the first few bytes in the filename of the file and then 'forget' that I have to count the filenames in the size of the compressed content". Vigorous, month-long debates about whether or not something "counts" as part of the content will follow. These can be resolved by pointing out that the challenge is basically to deliver a stream of bytes over the network to a computer that has only the reconstruction algorithm on it and must rebuild all the content from that stream, but this rarely satisfies Our Hero who has discovered perfect compression.
I add the word "moral" not just as a jab, but as the only way I have to explain how these people react to things like pointing out "if that worked, I could compress all files to one bit/one byte" or "I can't actually reconstruct the original content with that information". They get offended.
[1]: Just for fun, let's implement the ideal "index into a sequence" compression algorithm. This is not written to proof standards, but to "give the reader the flavor of what is going on" standards. (Technically it assumes the proof, in an attempt to illuminate it from another direction.) The problem with pi is that it tends to repeat a lot. Let's chunk up 8 bits into a byte, since that's pretty common, and let's build the ideal sequence of bytes that we can index into easily. For completeness, let's just index the 8-bit bytes in one sequence:
00000000000000010000001000000011
and so on, 0, 1, 2, etc, up to 255. The most efficient way to index this list is to break it up along the 8-bit boundaries, because you'll find (if you try it) that any more clever attempts to exploit overlapping numbers will end up at best matching that performance and at worst taking more bits. Therefore, we can index into that list to get a 0 as... 0. And the index for 1 is... uhhh... 1. And 143 is... 143. And so on.
You'll find this approach, extended into as many consecutive bits in a row as you please, will wildly outperform pi as a sequence to index into to get values. Because this is, of course, just the identity transform, and pi in practice does way worse than that. This transform is also wildly more efficient than the pi indexing operation, having computational complexity O(zero), if you'll pardon me the abuse of notation.
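That footnote's construction can be sketched in a few lines (a toy illustration of the identity-transform point, not code from the comment):

```python
# Build the "ideal" sequence from the footnote: every 8-bit byte value,
# in order, concatenated into one bit string.
sequence = "".join(format(b, "08b") for b in range(256))

def chunk_at(index):
    # Read the 8-bit chunk at the given chunk index (8-bit boundaries).
    return sequence[8 * index : 8 * index + 8]

# The "index" of a byte value is just the value itself: the identity
# transform, which any pi-offset scheme can at best hope to match.
assert chunk_at(0) == "00000000"
assert chunk_at(1) == "00000001"
assert chunk_at(143) == format(143, "08b")
```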
This brings to mind a practical question - what's the largest valid file that's been located in the digits of pi? Has anyone yet found a valid jpg, say? Or gif? PDF? I suppose a text file would be easiest.
They were talking about Pi day on the radio. "Today is the day the whole world celebrates Pi day". No, it really doesn't. It's just the US, the only country in the world that uses the MM/DD/YYYY order.
If you could convert all algorithms to constant-time operations, you could simulate the universe at any point in the future or the past and just read the bits out of memory from simulated future RAM, as long as a correct algorithm was used to read and write the bits to disk.
The pigeonhole principle means that compression can never be effective on average. For every input that can be compressed, there must be others that are expanded, such that the average compression ratio is at best 1.
Useful compression algorithms take advantage of the fact that inputs aren't evenly distributed. Real-world data tends to have repetition, self-similarity, or predictability, which real-world algorithms use to produce smaller output.
It's unlikely that πfs would provide useful compression for any real data, since it would have to line up entirely by random chance.
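The pigeonhole count is easy to verify directly (a toy check, using 8-bit inputs as an example):

```python
# Count the pigeonholes: 2**n possible n-bit inputs, versus all bit
# strings of length 0 through n-1 that could serve as shorter outputs.
n = 8
inputs = 2 ** n                                  # 256 possible bytes
shorter_outputs = sum(2 ** k for k in range(n))  # 1+2+4+...+128 = 255
assert shorter_outputs == inputs - 1

# A lossless encoder that shrank every input would need an injective
# map from 256 inputs into 255 outputs -- impossible, so at least one
# input must map to something of equal or greater length.
```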
If computation is instantaneous, then you could recurse down to a single pointer, which points to a pair of pointers, which each point to another pair, and so on until you have an arbitrary amount of your data. Since computation is instantaneous, it is instantaneous to compute this first pointer for the contents of any hard disk. And since computation is instantaneous, rebuilding the files on the disk from that pointer is instantaneous.
You mean, if there was an oracle for instant pi-index lookup? All finite-length strings can be expressed as natural numbers, so the pi-index is just a mapping from that number to another, pi: N -> N, with no guarantee of any more efficiency. Over the set of all finite-length strings, I do not think this would provide an advantage. But it might be fun.
> As long as you know the index into π of your file and its length, it's a simple task to extract the file using the Bailey–Borwein–Plouffe formula. Similarly, you can use the formula to initially find the index of your file.

> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found. In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.
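For the curious, the Bailey–Borwein–Plouffe digit-extraction trick the README mentions really does work, with the caveat that it produces hexadecimal digits of π rather than decimal ones. A minimal sketch using the standard BBP series:

```python
def pi_hex_digit(n):
    """Hexadecimal digit of pi at position n (0-indexed, after the
    point), via the Bailey-Borwein-Plouffe digit-extraction formula."""
    def series(j):
        # Fractional part of sum_k 16**(n-k) / (8k+j): head terms via
        # modular exponentiation, plus a short convergent tail.
        s = 0.0
        for k in range(n + 1):
            s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
        k, t = n + 1, 0.0
        while True:
            term = 16.0 ** (n - k) / (8 * k + j)
            if term < 1e-17:
                return s + t
            t += term
            k += 1

    x = (4 * series(1) - 2 * series(4) - series(5) - series(6)) % 1.0
    return int(16 * x)

# pi = 3.243F6A88... in hexadecimal
assert [pi_hex_digit(i) for i in range(4)] == [0x2, 0x4, 0x3, 0xF]
```

Note that the work per digit grows with the position n, so extracting a file this way gets slower the deeper into π its index lies, even before accounting for the search to find that index in the first place.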
I'm not entirely convinced my files would be present in those smaller chunks. You would need a copy of π going way past the 5 trillion digit world record. Even then it's not guaranteed the bytes needed would be there.
I work in bioinformatics. I remember writing a rough implementation of this for the human genome during my PhD as a way to compress sequence information. Turns out it doesn't perform better than gzip.
I bet that the index will be larger than the data for most cases... but then you could store the index of the index... of the index, and the number of times this goes on.
lol, this reminds me of the time when I took the numerical calculus class at uni and came up with the best compression algorithm ever: compress a file by using the bytes as points, so the only thing you need to recreate the file is to calculate the Newton polynomial for that set of points. Of course, that also requires the metadata for the Newton polynomial, and since you don't want a lossy algorithm, you need to make sure all points are represented.
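That scheme can be sketched concretely (a hypothetical reconstruction, not the commenter's actual code), and the sketch shows exactly where the savings evaporate: n byte-points yield n divided-difference coefficients, and the coefficients are rationals rather than single bytes.

```python
from fractions import Fraction

def newton_coefficients(values):
    # Divided differences over x = 0, 1, 2, ...; exact rational
    # arithmetic so reconstruction is lossless.
    c = [Fraction(v) for v in values]
    for j in range(1, len(c)):
        for i in range(len(c) - 1, j - 1, -1):
            c[i] = (c[i] - c[i - 1]) / j  # x[i] - x[i-j] == j here
    return c

def evaluate(coeffs, x):
    # Horner-style evaluation of the Newton form at integer x.
    result = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        result = result * (x - i) + coeffs[i]
    return result

data = [3, 1, 4, 1, 5, 9, 2, 6]  # "file" bytes as points (i, b_i)
coeffs = newton_coefficients(data)
# Lossless: the polynomial interpolates every byte exactly...
assert [evaluate(coeffs, i) for i in range(len(data))] == data
# ...but there are exactly as many coefficients as bytes.
assert len(coeffs) == len(data)
```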
I'm sure that you can do that before noon of the first of next month.
"At present it contains all possible pages of 3200 characters, about 104677 books." (https://libraryofbabel.info/About.html)
Copy-paste error caused a dropped exponent. It's 10^4677 books.
The library, by the way, is quite big. You couldn't fit it all in our universe --- in fact, you'd need about 10^4594 universes.
https://en.wikipedia.org/wiki/Illegal_number
Pi is not proven to be normal, but it is suspected to be.
To anyone confused: look at today's date!
Your compression ratio would approach infinity.
Thus, storing and maintaining the offset into π becomes more expensive than storing the data elsewhere.
You would not need to store the entire number.
https://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%9...
A really great literary read btw.