fmap's comments

fmap | 5 months ago | on: Modular Manifolds

Apologies if this came across the wrong way. I really do want to know what the novel contributions of the post are, because the author implies that something about what they're doing is solving previously open problems:

> I figured out how to solve manifold Muon in the square case late last year, but I was unable to solve the full rectangular case and thus posed the problem as an open problem on the Modula docs. Jianlin Su solved the problem this summer

It sounds like the generalisation of projected gradient decent to "Muon" is what they're focusing on, but the derivation is all about the retraction map on the Stiefel manifold? I think I'm missing some background here.

fmap | 5 months ago | on: Modular Manifolds

Isn't this an old idea? E.g., here's a textbook on optimization algorithms for matrix manifolds https://press.princeton.edu/absil and here's a library that implements this in python for the Stiefel manifold that's the subject of this blog post: https://pymanopt.org/docs/stable/manifolds.html#module-pyman...

What is novel about the approach in the blog post? Serious question, I really can't tell after reading the post.

fmap | 9 months ago | on: Deep Learning Is Applied Topology

The reason deep learning is alchemy is that none of these deep theories have predictive ability.

Essentially all practical models are discovered by trial and error and then "explained" after the fact. In many papers you read a few paragraphs of derivation followed by a simpler formulation that "works better in practice". E.g., diffusion models: here's how to invert the forward diffusion process, but actually we don't use this, because gradient descent on the inverse log likelihood works better. For bonus points the paper might come up with an impressive name for the simple thing.

In most other fields you would not get away with this. Your reviewers would point this out and you'd have to reformulate the paper as an experience report, perhaps with a section about "preliminary progress towards theoretical understanding". If your theory doesn't match what you do in practice - and indeed many random approaches will kind of work (!) - then it's not a good theory.

fmap | 10 months ago | on: GenAI-Accelerated TLA+ Challenge

The usual way to check whether a definition is correct is to prove properties about it that you think should hold. TLA+ has good support for this, both with model checking as well as simple proofs.

fmap | 11 months ago | on: AI 2027

I'm personally very excited about the progress in interactive theorem proving. Before the current crop of deep learning heuristics there was no generally useful higher-order automated theorem proving system. Automated theorem proving could prove individual statements based on existing lemmas, but that only works in extremely restricted settings (propositional or first-order logic). The problem is that in order to apply a statement of the form "for all functions with this list of properties, ..." you need to come up with a function that's related to what you're trying to prove. This is equivalent to coming up with new lemmas and definitions, which is the actually challenging part of doing mathematics or verification.

There has finally been progress here, which is why you see high-profile publications from, e.g., Deepmind about solving IMO problems in Lean. This is exciting, because if you're working in a system like Coq or Lean your progress is monotone. Everything you prove actually follows from the definitions you put in. This is in stark contrast to, e.g., using LLMs for programming, where you end up with a tower of bugs and half-working code if you don't constantly supervise the output.

---

But well, the degree of excitement is my own bias. From other people I spoke to recently: - Risk-assessment diagnostics in medicine. There are a bunch of tests that are expensive and complex to run and need a specialist to evaluate. Deep learning is increasingly used to make it possible to do risk assessments with cheaper automated tests for a large population and have specialists focus on actual high-risk cases. Progress is slow for various reasons, but it has a lot of potential. - Weather forecasting uses a sparse set of inputs: atmospheric data from planes, weather baloons, measurements at ground stations, etc. This data is then aggregated with relatively stupid models to get the initial conditions to run a weather simulation. Deep learning is improving this part, but while there has been some encouraging initial progress this needs to be better integrated with existing simulations (purely deep learning based approaches are apparently a lot worse at predicting extreme weather events). Those simulations are expensive, they're running on some of the largest supercomputers in the world, which is why progress is slow.

fmap | 11 months ago | on: AI 2027

> The idea is, if we don't think of anything more efficient, we'll at least be able to simulate a cat, and then an idiot, and then Einstein, and then something smarter. And since we almost certainly will think of something more efficient than "simulate a human brain", we should expect superintelligence to come much sooner.

The problem with this argument is that it's assuming that we're on a linear track to more and more intelligent machines. What we have with LLMs isn't this kind of general intelligence.

We have multi-paragraph autocomplete that's matching existing texts more and more closely. The resulting models are great priors for any kind of language processing and have simple reasoning capabilities in so far as those are present in the source texts. Using RLHF to make the resulting models useful for specific tasks is a real achievement, but doesn't change how the training works or what the original training objective was.

So let's say we continue along this trajectory and we finally have a model that can faithfully reproduce and identify every word sequence in its training data and its training data includes every word ever written up to that point. Where do we go from here?

Do you want to argue that it's possible that there is a clever way to create AGI that has nothing to do with the way current models work and that we should be wary of this possibility? That's a much weaker argument than the one in the article. The article extrapolates from current capabilities - while ignoring where those capabilities come from.

> And, even if you think A or B are unlikely, doesn't it make sense to just consider the possibility that they're true, and think about how we'd know and what we could do in response, to prevent C or D?

This is essentially https://plato.stanford.edu/entries/pascal-wager/

It might make sense to consider, but it doesn't make sense to invest non-trivial resources.

This isn't the part that bothers me at all. I know people who got grants from, e.g., Miri to work on research in logic. If anything, this is a great way to fund some academic research that isn't getting much attention otherwise.

The real issue is that people are raising ridiculous amounts of money by claiming that the current advances in AI will lead to some science fiction future. When this future does not materialize it will negatively affect funding for all work in the field.

And that's a problem, because there is great work going on right now and not all of it is going to be immediately useful.

fmap | 11 months ago | on: AI 2027

> even if this doesn’t lead to AGI, at the very least it’s likely the final “warning shot” we’ll get before it’s suddenly and irreversibly here.

I agree that it's good science fiction, but this is still taking it too seriously. All of these "projections" are generalizing from fictional evidence - to borrow a term that's popular in communities that push these ideas.

Long before we had deep learning there were people like Nick Bostrom who were pushing this intelligence explosion narrative. The arguments back then went something like this: "Machines will be able to simulate brains at higher and higher fidelity. Someday we will have a machine simulate a cat, then the village idiot, but then the difference between the village idiot and Einstein is much less than the difference between a cat and the village idiot. Therefore accelerating growth[...]" The fictional part here is the whole brain simulation part, or, for that matter, any sort of biological analogue. This isn't how LLMs work.

We never got a machine as smart as a cat. We got multi-paragraph autocomplete as "smart" as the average person on the internet. Now, after some more years of work, we have multi-paragraph autocomplete that's as "smart" as a smart person on the internet. This is an imperfect analogy, but the point is that there is no indication that this process is self-improving. In fact, it's the opposite. All the scaling laws we have show that progress slows down as you add more resources. There is no evidence or argument for exponential growth. Whenever a new technology is first put into production (and receives massive investments) there is an initial period of rapid gains. That's not surprising. There are always low-hanging fruit.

We got some new, genuinely useful tools over the last few years, but this narrative that AGI is just around the corner needs to die. It is science fiction and leads people to make bad decisions based on fictional evidence. I'm personally frustrated whenever this comes up, because there are exciting applications which will end up underfunded after the current AI bubble bursts...

fmap | 1 year ago | on: OpenAI’s board, paraphrased: ‘All we need is unimaginable sums of money’

Adding to this, inference is getting cheaper and more efficient all the time. The investment bubble is probably the biggest reason why inference hardware is so expensive at the moment and why startups in this sector are only targeting large scale applications.

Once this bubble bursts, local inference will become even more affordable than it already is. There is no way that there will be a moat around running models as a service.

---

Similarly, there probably won't be a "data moat". The whole point of large foundation models is that they are great priors. You need relatively few examples to fine tune an LLM or diffusion model to get it to do what you want. So long as someone releases up to date foundation models there is no moat here either.

fmap | 1 year ago | on: AI solves International Math Olympiad problems at silver medal level

Theorem proving can be formulated as a game, see e.g., https://plato.stanford.edu/entries/logic-games/ and interactive theorem provers can verify that a proof is correct (and related sub problems, such as that a lemma application is valid).

Conceptually, if you're trying to show a conjunction, then it's the other player's turn and they ask you for a proof of a particular case. If you're trying to show a disjunction then it's your turn and you're picking a case. "Forall" is a potentially infinite conjunction, "exists" is a potentially infinite disjunction.

In classical logic this collapses somewhat, but the point is that this is still a search problem of the same kind. If you want to feel this for yourself, try out some proofs in lean or coq. :)

fmap | 1 year ago | on: AI solves International Math Olympiad problems at silver medal level

Agreed, this is a big step forward. Geometry problems are in a different class, since you can translate them into systems of polynomial equations and use well known computer algebra algorithms to solve them.

By contrast, this kind of open ended formalization is something where progress used to be extremely slow and incremental. I worked in an adjacent field 5 years ago and I cannot stress enough that these results are simply out of reach for traditional automated reasoning techniques.

Real automatic theorem proving is also useful for a lot more than pure mathematics. For example, it's simple to write out an axiomatic semantics for a small programming language in lean and pose a question of the form "show that there exists a program which satisfies this specification".

If this approach scales it'll be far more important than any other ML application that has come out in the last few years.

fmap | 1 year ago | on: The Point of the Banach Tarski Theorem (2015)

You don't get new theorems if you remove assumptions. Rather, you get the ability to add different assumptions.

The Banach-Tarski paradox shows that classical set theory makes the wrong assumptions to intrinsically model measure theory and probability.

There are other systems which don't suffer from this paradox and hence don't need all the machinery of sigma algebras and measurable sets.

I wish there was a good accessible book/article/blog post about this, but as is you'd have to Google point-free topology or topos of probability (there are several).

fmap | 1 year ago | on: Show HN: High-frequency trading and market-making backtesting tool with examples

This is NYSE specific: https://www.nyse.com/article/parity-priority-explainer

fmap | 1 year ago | on: Grothendieck’s use of equality

> it doesn't distinguish between "isomorphism" and "canonical isomorphism"

You can distinguish these concepts in (higher )category theory, where isomorphisms are morphisms in groupoids and canonical isomorphisms are contractible spaces of morphisms. These sound like complicated concepts, but in HoTT you can discover the same phenomena as paths in types (i.e., equality) and a simple definition of contractibility which looks and works almost exactly like unique existence.

> In my opinion, Lean made a number pragmatic choices (impredicative Prop, non-classical logic including the axiom of global choice, uniqueness of identity proofs, etc.) that enable the practical formalization of mathematics as actually done by the vast majority of mathematicians who don't research logic, type theory, or category theory.

The authors of Lean are brilliant and I'm extremely happy that more mathematicians are looking into formalisations and recognizing the additional challenges that this brings. At the same time, it's a little bit depressing that we had finally gotten a good answer to many (not all!) of these additional challenges only to then retreat back to familiar ground.

Anyway, there were several responses to my original comment and instead of answering each individually, I'm just going to point to the article itself. The big example from section 5 is that of localisations of R-algebras. Here is how this whole discussion changes in HoTT:

1) We have a path R[1/f][1/g] = R[1/fg], therefore the original theorem is applicable in the case that the paper mentions.

2) The statements in terms of "an arbitrary localisation" and "for all particular localisations" are equivalent.

3) ...and this is essentially because in HoTT there is a way to define the localisation of an R-algebra at a multiplicative subset. This is a higher-inductive type and the problematic aspects of the definition in classical mathematics stem from the fact that this is not (automatically) a Set. A higher-inductive definition is a definition by a universal property, yet you can work with this in the same way as you would with a concrete construction and the theorems that the article mentions are provable.

---

There is much more that can be said here and it's not all positive. The only point I want to make is that everybody who ever formalised anything substantial is well aware of the problem the article talks about: you need to pick the correct definitions to formalise, you can't just translate a random math textbook and expect it to work. Usually, you need to pick the correct generalisation of a statement which actually applies in all the cases where you need to use it.

Type theory is actually especially difficult here, because equality in type theory lacks many of the nice properties that you would expect it to have, on top of issues around isomorphisms and equality that are left vague in many textbooks/papers/etc. HoTT really does solve this issue. It presents a host of new questions and it's almost certain that we haven't found the best presentation of these ideas yet, but even the presentation we have solves the problems that the author talks about in this article.

fmap | 1 year ago | on: Grothendieck’s use of equality

For this particular question we have a great answer in the form of homotopy type theory. It handles all the complications that the author mentions. This is one of the reasons type theorists were excited about HoTT in the first place.

The only problem is that the author is working in Lean and apparently* dismissive of non-classical type theory. So now we're back to lamenting that equality in type theory is broken...

*) I'm going by second hand accounts here, please correct me if you think this statement is too strong.

fmap | 1 year ago | on: What does Alan Kay think about LLMs?

Others have already mentioned this, but Alan Kay spent his entire career experimenting with approaches to radically simplify computing. The progress reports from vpri give a good overview of a recent project that has since wrapped up, e.g.: https://tinlizzie.org/VPRIPapers/tr2012001_steps.pdf

fmap | 2 years ago | on: Statement on AI Risk

I agree completely. I was just speculating on why there is so much discussion about marketing ploys in this comment section.

fmap | 2 years ago | on: Statement on AI Risk

This particular statement really doesn't seem like a marketing ploy. It is difficult to disagree with the potential political and societal impacts of large language models as outlined here: https://www.safe.ai/ai-risk

These are, for the most part, obvious applications of a technology that exists right now but is not widely available yet.

The problem with every discussion around this issue is that there are other statements on "the existential risk of AI" out there that are either marketing ploys or science fiction. It doesn't help that some of the proposed "solutions" are clear attempts at regulatory capture.

This muddles the waters enough that it's difficult to have a productive discussion on how we could mitigate the real risk of, e.g., AI generated disinformation campaigns.

fmap | 2 years ago | on: 50 years in filesystems: towards 2004 – LFS

I don't think many people are arguing against having arbitrary byte arrays for storage and using application specific serialization formats. The real problem with file systems, imo, is that they present a leaky abstraction over something that's internally very complex. Any single operation might look simple, but as soon as you start combining operations you're going to have a bad time with edge cases.

For example, let's say you need to ensure that your writes actually ends up on disk: https://stackoverflow.com/questions/12990180/what-does-it-ta...

The typical file abstraction introduces buffering on top of this, which adds additional edge cases: https://stackoverflow.com/questions/42434872/writing-program...

If you want ordered writes, then you have to handle this through some kind of journalling or at the application level with something like this: https://pages.cs.wisc.edu/~vijayc/papers/UWCS-TR-2012-1709.p...

And that's only something that's visible at the application level. There are plenty of similar edge cases below the surface.

Even if all of this works correctly, you still have to remember that very few systems give any guarantees about what data ends up on disk. So you often end up using checksums and/or error correcting codes in your files.

Finally, all this is really only talking about single files and write operations. As soon as you need to coordinate operations between multiple files (e.g., because you're using directories) things become more complicated. If you then want to abuse the file system to do something other than read and write arrays of bytes you will have to deal with operations that are even more broken, e.g., file locking: http://0pointer.de/blog/projects/locking.html

---

It's not an accident that you are using other services to store your files. For example, S3 handles a lot of the complexity around durable storage for you, but at a massive cost in latency compared to what the underlying hardware is capable of.

Similarly, application programmers often end up using embedded databases, for exactly the same reason and with exactly the same problem.

This is a shame, because your file system has to solve many of the same problems internally anyway. Metadata is usually guaranteed to be consistent and this is implemented through some sort of journaling system. It's just that the file system abstraction does not expose any of this and necessitates a lot of duplicate complexity at the application level.

---

Edit: After re-reading the grandparent comment, it sounds like they are arguing against the "array of bytes" model. I agree that this is usually not what you want at the application level, but it's less clear how to build a different abstraction that can be introduced incrementally. Without incremental adoption such a solution just won't work.

fmap | 3 years ago | on: The dirty secret of mathematics: We make it up as we go along (2018)

Open any book on differential geometry and compare the treatment of differentiation with the needlessly index heavy treatment in any undergraduate calculus textbook.

The point is that we treat the differential of a real valued function as a function/vector/matrix for historical reasons. The simpler perspective that always works is that the differential of a function is the best linear approximation of the function at a given point. But for historical reasons most math textbooks restrict themselves to "first order functions" and avoid, e.g., functions returning functions.

This also leads to ridiculous notational problems when dealing with higher order functions, like integration and all kinds of integral transforms.

fmap | 6 years ago | on: Chris Lattner to Lead SiFive Platform Engineering Team

Wild speculation, but my guess is that both will continue to be developed.

As far as I understand, the MLIR project predates its machine learning application and was originally intended as a new IR for clang. In that capacity it makes a lot of sense. MLIR is also currently experimental in Tensorflow, although I have no idea how mature the implementation is.

Similarly, there has been significant investment into Swift for Tensorflow, so it's probably here to stay. On the other hand, from a language design perspective Swift is not a particularly good choice for automatic differentiation and translation into Tensorflow graphs (imperative, exposing many details of the underlying machine, etc.). Without a lot of investment into this project it might just be overtaken by a better engineered competitor, or more likely, fail to gain sufficient mind-share over the "good enough" python solution that already exists.

page 1