justinpombrio's comments

justinpombrio | 1 year ago | on: LLMs know more than they show: On the intrinsic representation of hallucinations

> If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

This isn't true.

You're conflating whether a model (that hasn't been fine-tuned) would complete "the capital of Connecticut is ___" with "Moscow", and whether that model contains a bit labeling that fact as "false". (It's not actually stored as a bit, but you get the idea.)

Some sentences that a model learns could be classified as "trivia", and the model learns this category from sentences like "Who needs to know that octopuses have three hearts, that's just trivia". Other sentences a model learns could be classified as "false", and the model learns this category from sentences like "2 + 2 isn't 5". Whether a sentence is "false" isn't particularly important to the model, any more than whether it's "trivia", but it will learn those categories.

There's a pattern to "false" sentences. For example, even if there's no training data directly saying that "the capital of Connecticut is Moscow" is false, there are a lot of other sentences like "Moscow is in Russia" and "Moscow is really far from CT" and "people in Moscow speak Russian", that all together follow the statistical pattern of "false" sentences, so a model could categorize "Moscow is the capital of Connecticut" as "false" even if it's never directly told so.

justinpombrio | 1 year ago | on: When are two proofs essentially the same? (2007)

You're handwaving, but I think there is a middle ground in this proof:

Sum(i=1..n, i)

= Sum(i=1..n/2, i) + Sum(i=1..n/2, n+1-i)

= Sum(i=1..n/2, n+1)
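Each line of that chain can be checked mechanically for even n. This is a small illustrative sketch, not part of the original comment; `check_pairing` is a made-up name, and like the derivation it only handles even n.

```python
def check_pairing(n):
    """Check that every line of the pairing argument gives the same value, for even n."""
    assert n % 2 == 0, "the pairing as written assumes n is even"
    half = n // 2
    total = sum(range(1, n + 1))                      # Sum(i=1..n, i)
    paired = (sum(range(1, half + 1))                 # Sum(i=1..n/2, i)
              + sum(n + 1 - i for i in range(1, half + 1)))  # + Sum(i=1..n/2, n+1-i)
    collapsed = sum(n + 1 for _ in range(1, half + 1))       # Sum(i=1..n/2, n+1)
    return total == paired == collapsed == half * (n + 1)

print(all(check_pairing(n) for n in range(2, 101, 2)))  # True
```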

I'm still interested in the general question of whether some proofs have big gaps between them. The more complex the proofs, the more obvious this would be; my examples are unfortunately simple. Something like proving the fundamental theorem of algebra using Rouché's theorem (complex analysis) vs. field theory. But I don't know enough math to compare those.

justinpombrio | 1 year ago | on: When are two proofs essentially the same? (2007)

If two programs are equivalent, you can typically show that they're equivalent with a sequence of small refactorings. Replace `x + x` with `2 * x`. Inline that function call. Etc.

Can you do that with these two proofs? What's a proof that's halfway in between the two?

If you can get from one proof to the other with small "refactorings", then I agree that they're fundamentally the same. If you can't (if there's an insurmountable gap that you need to leap across to transform one into the other), then I'd call them fundamentally different. If you insist that two proofs are "essentially the same thing" despite having this uncrossable gap between them, then I suspect you're defining "essentially the same" to mean "proves the same thing", which is a stupid definition because it makes all proofs of a given theorem the same by fiat, and avoids the interesting question.

justinpombrio | 1 year ago | on: When are two proofs essentially the same? (2007)

Some proofs that aren't "essentially the same":

1. Prove that the interior angles of a triangle sum to 180 degrees.

First proof: draw a line parallel to one of the triangle's sides passing through its opposite vertex. There are three angles on one side of this line, and they obviously add to 180 degrees because it's a line. One of the three angles is directly one of the triangle's interior angles; the other two can be shown to be equal to the triangle's other two interior angles. (Try drawing it out.)

Second proof: start at one side of the triangle and walk around it. By the time you return to where you started, you must have turned 360 degrees. Thus the sum of the exterior angles is 360 degrees. Each interior angle is 180 minus the corresponding exterior angle, and there are three of them, so calling the interior angles A, B, C and the exterior angles A', B', C' we have A'+B'+C' = 360 implies (180-A) + (180-B) + (180-C) = 360 implies 540 - A - B - C = 360 implies 180 = A + B + C.
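The walking-around argument can be checked numerically. This is a minimal sketch, not part of the original comment: `turning_angles` is a hypothetical helper that assumes the vertices are listed counterclockwise (so every turn is a left turn), and the triangle coordinates are arbitrary.

```python
import math

def turning_angles(points):
    """Exterior (turning) angles in degrees when walking the vertices in order.
    Assumes a convex polygon listed counterclockwise, so each turn is a left turn."""
    n = len(points)
    turns = []
    for i in range(n):
        ax, ay = points[i - 1]           # previous vertex
        bx, by = points[i]               # current vertex, where we turn
        cx, cy = points[(i + 1) % n]     # next vertex
        heading_in = math.atan2(by - ay, bx - ax)
        heading_out = math.atan2(cy - by, cx - bx)
        # Change of heading, normalized into [0, 360)
        turns.append(math.degrees(heading_out - heading_in) % 360)
    return turns

triangle = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]  # counterclockwise
exterior = turning_angles(triangle)              # A', B', C'
interior = [180 - t for t in exterior]           # A, B, C
print(sum(exterior))  # approximately 360, up to floating-point rounding
print(sum(interior))  # approximately 180, up to floating-point rounding
```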

2. Prove that the sum of the first N numbers is N(N+1)/2.

First proof: sum the first and last number to get 1 + N, then the second and second-to-last to get 2 + (N-1) = 1 + N, repeating until you get to the middle. There are N/2 such pairs, giving a total of (1 + N)N/2. (This assumes an even number of terms; consider the odd case too.)

Second proof: proceed by induction. For the base case, it's true for N=1 because 1*2/2 = 1. For the inductive case, suppose it's true for N-1. Then 1 + 2 + ... + N-1 + N = (1 + 2 + ... + N-1) + N = N(N-1)/2 + N = N(N-1)/2 + 2N/2 = N(N+1)/2.
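Both proofs can be spot-checked in a few lines, including the odd case the first proof leaves as an exercise: with N odd there are (N-1)/2 pairs plus an unpaired middle term (N+1)/2, and (N-1)(N+1)/2 + (N+1)/2 = N(N+1)/2. This is an illustrative sketch with hypothetical helper names; checking finitely many N is of course not a proof.

```python
def pairing_sum(n):
    """First proof: pair 1 with n, 2 with n-1, ...; each pair sums to n + 1.
    If n is odd, the middle term (n + 1) / 2 is left unpaired."""
    total = (n // 2) * (n + 1)
    if n % 2 == 1:
        total += (n + 1) // 2  # the unpaired middle element
    return total

def closed_form(n):
    return n * (n + 1) // 2

for n in range(1, 200):
    # Both the pairing argument and the direct sum agree with N(N+1)/2
    assert pairing_sum(n) == sum(range(1, n + 1)) == closed_form(n)
    if n > 1:
        # Second proof's inductive step: (1 + ... + (N-1)) + N == N(N+1)/2
        assert closed_form(n - 1) + n == closed_form(n)
print("ok")
```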

justinpombrio | 1 year ago | on: FTC announces "click-to-cancel" rule making it easier to cancel subscriptions

Unsubscribe links are a fantastic regulation, but there is a workaround. I must have received at least a dozen emails from Brown after graduating despite unsubscribing from every email they sent.

The trouble is they're endlessly creative about the lists they put you on. I'd get one email from "Alumni Connections" and then another from "Faculty Spotlight" and then another from "Global Outreach" and then another from "Event Invitations, 2023 series". I'm making those names up because I forget exactly what they were called, but you get the idea. I hope this was in violation of the regulation: surely you can't invent a new mailing list that didn't use to exist, add me to it, and require me to unsubscribe from it individually.

They finally stopped after I sent them an angry email.

justinpombrio | 1 year ago | on: Exploring Typst, a new typesetting system similar to LaTeX

You can do that in a couple of different ways in Typst. First, if the user passes content into the template, then it's the user's content that ultimately gets to choose its styling. That is, there are three places that a style can be set:

1. In the content that the user passes to the template

2. In the template itself

3. By the user, outside the template

They take priority in that order.

OTOH, if the template really wants control, it can take optional styling arguments with defaults, and do as it likes with them. And if it wants content from the user that the user doesn't get to style, it can take that content as a string.

It's a fantastic system, so far as I've seen.

justinpombrio | 1 year ago | on: Exploring Typst, a new typesetting system similar to LaTeX

I imagine you're projecting how LaTeX works onto Typst, though despite years of use and a PhD in PL I never really figured out how LaTeX works, so I'm not certain.

I don't think Typst has a lot of global state to get corrupted. Like, if one package defines a variable `foo` and another package defines a variable `foo`, and you use both of them (and don't try to import `foo` from both), it's not like those `foo`s are going to conflict with each other. Is that the sort of issue that LaTeX packages run into?

Likewise, you don't modify typesetting in Typst by modifying global state like you do in LaTeX. You use `set` and `show`, which are locally scoped. You never need to, like, set the font size, then write some stuff, then remember to set it back. You just put a `set text(size: ...)` rule in a block around precisely the stuff you want to be bigger.

justinpombrio | 1 year ago | on: Enum class improvements for C++17, C++20 and C++23

That's true, but unrelated to the common usage of the word "expressiveness" when talking about programming languages. There's often a tradeoff between expressiveness and the thing you're talking about, which I'll call clarity. For example, macros increase expressiveness but decrease clarity. In Racket (or other lisps), you can define a macro `(my-let x 17 (+ x 1)) -> 18`. This is expressive, as most other languages don't let you define binding constructs so easily, but also bad for clarity because `my-let` doesn't look different from a regular function, so it's hard to tell at a glance that it's doing something a function could never do.

justinpombrio | 1 year ago | on: My Favorite Algorithm: Linear Time Median Finding (2018)

I wasn't saying that you could get within 1% of the true median, I was saying you could find an element in the 49th to 51st percentile. In your example, the 49th percentile would be -90 and the 51st percentile would be 90, so the guarantee is that you would (with very high probability) find a number between -90 and 90.

That's a good point, though, that the 49th percentile and 51st percentile can be arbitrarily far from the median.

justinpombrio | 1 year ago | on: My Favorite Algorithm: Linear Time Median Finding (2018)

Did you actually need to find the true median of billions of values? Or would finding a value between 49.9% and 50.1% suffice? Because the latter is much easier: sample 10,000 elements uniformly at random and take their median.

(I made the number 10,000 up, but you could do some statistics to figure out how many samples would be needed for a given level of confidence, and I don't think it would be prohibitively large.)
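One way to do the statistics, as a sketch: sample uniformly at random with replacement, and use a Hoeffding bound to pick the sample size. The helper names and the 99% confidence level are assumptions for illustration. With ε = 0.001 (49.9% to 50.1%) and δ = 0.01 the bound asks for roughly 2.65 million samples; with ε = 0.01 (49% to 51%) it is about 26,500. Hoeffding is conservative, so fewer samples usually suffice in practice.

```python
import math
import random

def hoeffding_sample_size(eps, delta):
    """Samples needed so that, with probability at least 1 - delta, the sample median
    lies between the (0.5 - eps) and (0.5 + eps) quantiles.
    Uses Hoeffding's bound on each tail plus a union bound: 2 * exp(-2 n eps^2) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def approx_median(data, num_samples, rng=random):
    """Median of a uniform random sample drawn with replacement from data."""
    sample = sorted(rng.choice(data) for _ in range(num_samples))
    return sample[len(sample) // 2]

# Usage: a million values, aiming for the 49th-51st percentile with 99% confidence.
rng = random.Random(0)
data = list(range(1_000_000))
n = hoeffding_sample_size(eps=0.01, delta=0.01)
estimate = approx_median(data, n, rng)
print(n, estimate)
```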

justinpombrio | 1 year ago | on: Schrödinger's cat among biology's pigeons: 75 years of What Is Life?

On the contrary, I've heard that entropy is a useful lens through which to understand the biological processes behind life. We're used to thinking of living things as processing energy: ingesting high-energy food, distributing it, making it into forms that do useful work (e.g. ATP). Apparently (I don't know the details, but this was from a biologist) it's also useful to view it as processing entropy: ingesting low-entropy food, distributing it, making it useful, excreting high-entropy waste and heat. A gradient from low-entropy inputs to high-entropy outputs.