jabowery's comments

jabowery | 1 year ago | on: Panic at the Job Market

Replace the 16th Amendment with a single tax on net assets at the interest rate on government debt, assessed at their liquidation value ... and use the revenue to privatize government with a citizen's dividend.

https://ota.polyonymo.us/others-papers/NetAssetTax_Bowery.tx...

When we got a law passed to privatize space launch services back in 1990

https://www.youtube.com/watch?v=boLdXiLJZoY

we were in the midst of a quasi-depression, so I decided to address the problem of private capitalization of technology with the aforelinked proposal.

jabowery | 1 year ago | on: Brain overgrowth dictates autism severity, new research suggests

Don't count on it. My experience is that funding goes to those who are not serious about autism epidemiology. Back in the mid-1990s, I was at a startup in Silicon Valley with about 100 employees where, over a period of a few years, 5 of the employees had children diagnosed with autism severe enough that they were barely verbal at best. This struck me as a great opportunity to discover the cause, so I contacted a Berkeley epidemiologist who had been funded to do autism research. His comment was simply, "Yes, we know that these microclusters exist," and that was that. No follow-up.

jabowery | 1 year ago | on: Compiling with Constraints

This is reminiscent of an argument I had with the Mercury Prolog guys regarding "typing" in logic programming. My point boils down to this:

Any predicate can be considered a constraint, and types are constraints. While it may be reasonable to have syntactic sugar for type declarations that is transformed, at compile time, into predicates, it is unreasonable to lard a completely different kind of semantics on top of an already adequate semantics such as first-order logic.

https://groups.google.com/g/comp.lang.prolog/c/8yJxmY-jbG0/m...
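The point can be sketched outside of logic programming as well; here is a hypothetical Python illustration (the helper names are invented for this sketch) of a "type" being nothing more than a unary predicate, with type declarations desugaring into the same constraint machinery as any other predicate:

```python
# Hypothetical sketch (names invented for illustration): a "type" is just
# a unary predicate, so a type declaration can desugar at compile time
# into the same constraint machinery as any other predicate.
def nat(x):                   # the "type" nat/1 as a plain predicate
    return isinstance(x, int) and x >= 0

def prime(x):                 # an arbitrary predicate works identically
    return nat(x) and x > 1 and all(x % d for d in range(2, x))

def check(pred, x):           # what a desugared type declaration becomes
    if not pred(x):
        raise TypeError(f"{x!r} violates constraint {pred.__name__}")
    return x

check(nat, 7)                 # a conventional "type check"
check(prime, 7)               # the same machinery for any constraint
```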

jabowery | 2 years ago | on: Claude 3 model family

Dear Claude 3, please provide the shortest python program you can think of that outputs this string of binary digits: 0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111

Claude 3 (as Double AI coding assistant): print('0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111')
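For what it's worth, the requested string appears to be nothing more than the integers 0 through 31 written as 5-bit binary codes and concatenated, so a genuinely short program exists (a sketch of one, not Claude's output):

```python
# The 160-bit string is the 5-bit binary codes for 0..31 concatenated,
# so a generator is far shorter than print()-ing the literal.
print("".join(format(i, "05b") for i in range(32)))
```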

jabowery | 2 years ago | on: Learning Theory from First Principles [pdf]

Learning theory is the attempt to formalize natural science up to decision. Natural science's unstated assumption is that a sufficiently sophisticated algorithmic world model can be used to predict future observations from past observations. Since this is the same assumption Solomonoff made in his proof of inductive inference, you have to start there: with Turing-complete coding rather than Rissanen's so-called "universal" coding.

It's ok* to depart from that starting point in creating subtheories, but if you don't start there you'll end up with garbage like the last 50 years of confusion over what "The Minimum Description Length Principle" really means.

*It is, however, _not_ "ok" if what you are trying to do is come up with causal models. You can't get away from Turing-complete codes if you're trying to model dynamical systems, even though dynamical systems can be thought of as finite state machines with very large numbers of states. To make optimally compact codes you need Turing-complete semantics executing on a finite state machine that just so happens to have a really large but finite number of flip-flops, or some other directed cyclic graph of universal gates (e.g. NOR, NAND).
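A toy Python illustration of why Turing-complete codes matter (an illustrative sketch, not from the comment): for highly structured data, a short generating program is a far more compact description than the data itself.

```python
# Illustrative sketch: for highly structured data, a short generating
# program (a Turing-complete code) is a far more compact description
# than the data itself, or than any fixed-width literal encoding of it.
data = "01" * 10_000                 # 20,000 symbols of structure
program = '"01" * 10_000'            # a tiny program that regenerates them
assert eval(program) == data         # exact, lossless reconstruction
print(len(program), "characters of program vs", len(data), "of data")
```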

jabowery | 2 years ago | on: John Walker, founder of Autodesk, has died

"In order to implement its universal transclusion and DRM (yes, Xanadu had a scheme for DRM and micropayments to creators), Xanadu had to be centralized."

Fallback positions from the idealized "roadmap" are what happens when VCs get involved with a system that offers that Zero To One advantage -- but you have to have a One to offer the VCs, which Memex didn't. The question then becomes how much of your roadmap can be recovered or, perhaps more to the point, how much you even _want_ to recover in light of ground-truth experience. At present there is a lot of potential for Information Centric Networking that would have been more likely realized in a Ship-Dumbed-Down-Decentralized-Xanadu1994 alternate universe than it is likely to be realized now.

jabowery | 2 years ago | on: John Walker, founder of Autodesk, has died

1994: In the next room from me at Memex Corp., poor Keith Henson was draped over a chair (due to a bad back) working, alone, on the C++ Xanadu code to debug garbage collection among other things, because the original Smalltalk source had been lost. Memex Corp. was early enough in the development of HTTP's lock-in network effects that its acquisition of Xanadu _might_ yet have turned the tide. Why had the Smalltalk code been lost? Well, all I can tell you is that, from my work with Roger (starting in 1996 on a rocket engine), my understanding of events differs from that reported in Wired (and most others including, to some extent, Roger himself) and involves some pretty, shall we say, "bad behavior" on the part of certain parties that were more than a little partial to C++. Since this is hearsay, I won't go into more depth stating things "as fact". But it is pretty clear to me that, against the effort and investment put into making HTML, JS, etc. de facto standards, Memex's acquisition of the Xanadu rights (and potential willingness to open up the Xanadu protocols and implementation) at that critical juncture was fatally hampered by the C++-only handicap suffered by the Xanadu source.

Why didn't I step in and help poor Keith? Ever heard of Croquet's TeaTime?

https://dl.acm.org/doi/abs/10.1145/1094855.1094861

I was in a position to resurrect at least _that_ much of the original work I'd done at Viewtron Corp. of America based on David P. Reed's PhD thesis, and Reed was just down the street from us at Interval Research at the time, which rather tempted me away from helping Keith, even if I'd been authorized to do so, which I wasn't.

jabowery | 2 years ago | on: Learning Universal Predictors

That's what all "information criteria for model selection" are about. The difference is that Algorithmic Information is the only such information criterion that has been proven (by Solomonoff) optimal under the assumptions of natural science.

jabowery | 2 years ago | on: Learning Universal Predictors

As the guy who suggested to Marcus a lossless compression prize to replace the Turing Test, I've got to confess that all this pedantic sophistry "critiquing" algorithmic information is there for a good reason. In the immortal words of Mel Brooks: "We've got to protect our phony baloney jobs, gentlemen!"

https://youtu.be/bpJNmkB36nE

There is actually more at stake here than machine learning. This gets to the root of "bias" in the scientific method. Imagine what horrors, what risks, what chaos would be ours if a truly objective information criterion for causal model selection were to exist! Why, virtually every "sociologist" would be hauled to Hume's Guillotine in a Reign of Terror!

https://github.com/jabowery/HumesGuillotine

But to be clear, Marcus and I have a disagreement about the pragmatics of such an approach to dispute processing in the natural sciences. He believes, for example, that the dispute over climate change should be handled by the standard processes in place in academia. My approach differs, based on my hard-won experience with reforming institutional incentives:

https://jimbowery.blogspot.com/2018/04/necessity-and-incenti...

When it comes to multi-trillion-dollar scientific questions, the conflicts of interest become so intense that you really need to apply a gold standard for objectivity, and that is a single number: how big is your executable archive of the data in evidence?

While I understand that the machine learning world looms as a rival to "unbiased" academic research, it nevertheless remains true that even in this emerging "marketplace of ideas" there is no formal definition of "bias" that disciplines discourse and thereby guides development at the institutional, let alone technical, level. Everyone is weighing in with fuzzy notions of "bias" that betray intense motivations, when there has been, for over 50 years, a very clear and present mathematical definition.

jabowery | 2 years ago | on: The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest

This recent paper provides one of two rigorous measures (with code) of machine intelligence that could be used to discipline discourse about "AGI". The other, also with code, is a seminal paper by Shane Legg and Joel Veness, "An Approximation of the Universal Intelligence Measure".

jabowery | 2 years ago | on: Getting Lossless Compression Adopted for Rigorous LLM Benchmarking

The increasing recognition that "Language Modeling Is Compression" https://arxiv.org/pdf/2309.10668.pdf has not yet been accompanied by recognition that lossless compression is the most principled unsupervised loss function for world models in general, including foundation language models in particular.

Take, for instance, the unprincipled definition of "parameter count", not only in the LLM scaling-law literature but also in the zoo of what statisticians call "Information Criteria for Model Selection". https://en.wikipedia.org/wiki/Model_selection#Criteria

The reductio ad absurdum of "parameter count" is arithmetic coding where an entire dataset can be encoded as a single "parameter" of arbitrary precision.
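That reductio can be sketched in a few lines of Python using exact rationals (an illustration of the point, not anyone's actual coder): an entire bit string round-trips through a single arbitrary-precision "parameter".

```python
from fractions import Fraction

# Sketch of the reductio (illustrative): an entire bit string can be
# packed into ONE arbitrary-precision rational "parameter" and recovered
# exactly, so raw parameter count says nothing about model complexity.
def encode(bits):
    s = bits + "1"                   # trailing 1 marks the end of the data
    return Fraction(int(s, 2), 2 ** len(s))

def decode(theta):
    out = []
    while theta:                     # extract the binary-fraction digits
        theta *= 2
        bit = int(theta >= 1)
        theta -= bit
        out.append(str(bit))
    return "".join(out[:-1])         # drop the sentinel bit

data = "0000000001000100001100100001"
assert decode(encode(data)) == data  # one "parameter" stores it all
```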

By contrast, the algorithmic bit of information (whether part of an executable instruction or program literal) is an unambiguous quantity up to the choice of instruction set. If you want to quibble about that instruction set choice, take it up with John Tromp https://tromp.github.io/cl/cl.html because what I'm about to propose obviates that along with a lot of other "arguments".

Since any executable archive of any kind of data can serve as a model of the world generating that data, it follows that any executable archive of any text corpus can serve as a language model with a rigorous "parameter count". Therefore, a procedure which runs LLM benchmarks against any such executable archive as a language model contributes a uniquely rigorous data point to the literature on LLM scaling laws.

So, what I'm proposing is that authors of lossless compression algorithms consider adding a command-line option that, at the end of decompression, saves the state of the decompression process in a file that can be read back in and executed as a language model -- with the full understanding that these language models will perform very poorly on the vast majority of LLM benchmarks. The point is not to produce high quality language models. The point is to increase rigor in the research community by providing some initial data points that exemplify the approach.
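As a crude illustration of the compressor-as-language-model idea (a hypothetical sketch, not the proposal's actual mechanism: zlib's compressed size stands in for code length, and code length ~ -log2 P):

```python
import zlib

# Hypothetical sketch: any lossless compressor induces a (crude) language
# model, since code length approximates -log2 P. Here the compressed size
# of context + candidate stands in for the candidate's code length.
def next_char_probs(context, alphabet="abcdefghijklmnopqrstuvwxyz "):
    cost = {c: len(zlib.compress((context + c).encode(), 9)) for c in alphabet}
    # shorter code => higher probability; normalize 2^(-bits) over candidates
    weights = {c: 2.0 ** (-8 * n) for c, n in cost.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

probs = next_char_probs("the quick brown fox jumps over the lazy do")
print(sorted(probs, key=probs.get, reverse=True)[:3])
```

Byte-granularity compressed sizes make this a very blunt instrument; the saved decompressor state proposed above would presumably expose the underlying coder's actual conditional probabilities instead.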

jabowery | 2 years ago | on: Bayesians moving from defense to offense

It's always struck me as rather strange that, since the motive for creating any kind of model is to calculate predictions, and the most general kind of calculation is algorithmic, people use anything but algorithmic probability as the gold standard against which other approaches are compared. The "problems" with algorithmic probability (uncomputability, UTM "choice", etc.) seem to be "the dog ate my homework" excuses. No scientific model is required to prove itself the best of all possible models relative to a given set of observations in order to be considered the best current model relative to those observations. And no "UTM" chosen on the basis of the observations to be modeled is reasonably considered anything but post-hoc theorizing.