
Legendre transform, better explained (2017)

126 points| harperlee | 2 years ago |blog.jessriedel.com | reply

29 comments

[+] crdrost|2 years ago|reply
I, too, spent a long time staring at expressions like

    “half-invert p(x, v) to get v(x, p) s.t. 

    p(x, v(x, q)) = q

    then the Legendre transform is 

    H(x, p) = p v(x, p) – L(x, v(x, p))”
And I did come to one of the same conclusions as this article, which is that if we're talking pure mathematics, these “thermodynamic” expressions like (∂L/∂v)_x, (∂L/∂p)_x are very easy to get confused by, and in fact you should just say “the derivative of the function with respect to its first argument holding the other arguments constant” and therefore introduce different functions which compute the same value under different symbols, say

    Λ(x, p) = L(x, v(x, p))
    ∂₂Λ = ∂₂L ∂₂v
so that you're not scratching your head about “why is the derivative of L with respect to v showing up here, v is now a function isn't it?”
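
Spelled out with explicit evaluation points, and assuming (as in Lagrangian mechanics) that p(x, v) is defined as ∂₂L(x, v), the same chain rule shows why the transform hands back v. This is only a sketch in the slot notation used above, where ∂₂ means the derivative with respect to the second argument:

```latex
\begin{aligned}
\Lambda(x, p) &= L\bigl(x, v(x, p)\bigr), \\
(\partial_2 \Lambda)(x, p) &= (\partial_2 L)\bigl(x, v(x, p)\bigr)\,(\partial_2 v)(x, p)
  = p\,(\partial_2 v)(x, p), \\
H(x, p) &= p\,v(x, p) - \Lambda(x, p), \\
(\partial_2 H)(x, p) &= v(x, p) + p\,(\partial_2 v)(x, p) - (\partial_2 \Lambda)(x, p) = v(x, p).
\end{aligned}
```

The second line uses the half-inversion p(x, v(x, p)) = p, so the two terms containing ∂₂v cancel in the last line.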

The formulation of the first derivatives as inverse functions is new to me but makes sense.

However, I do think that we do even worse with linear algebra. I believe I could walk up to any college senior in physics and they wouldn't know that “the determinant is the product of the eigenvalues,” but this should be as well-known as “the mitochondria are the powerhouse of the cell.” I think this is because we introduce a complicated way to calculate determinants and then we use determinants to calculate the eigenvalues?

[+] prof-dr-ir|2 years ago|reply
Agreed, the way thermodynamics is often taught is such a mess.

My personal and controversial [0] take is that the free energy should really be seen as the Legendre transform of the entropy, not of the energy.

I know it is ultimately semantics, but this viewpoint makes the passage from the micro-canonical to the canonical ensemble so much nicer. In particular, the saddle point approximation for the canonical partition function makes it natural that the ensembles are equivalent in the thermodynamic limit... through a Legendre transform!
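
A rough sketch of that saddle-point step, under assumptions that are mine rather than the blog's: units with k_B = 1, a smooth concave micro-canonical entropy S(E) = ln Ω(E), and the thermodynamic limit so that the integral is dominated by its maximum:

```latex
\begin{aligned}
Z(\beta) &= \int dE\, e^{S(E) - \beta E}
  \approx \exp\Bigl[\max_E \bigl(S(E) - \beta E\bigr)\Bigr], \\
-\beta F(\beta) &= \ln Z(\beta) \approx \max_E \bigl(S(E) - \beta E\bigr),
  \qquad \text{with the maximum attained where } S'(E) = \beta = 1/T.
\end{aligned}
```

In this form, −βF is (up to sign conventions) the Legendre transform of S with respect to E, and the stationarity condition is the relation between E and T shared by both ensembles.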

Bonus corollary: the statement mentioned in the blog about the derivatives being each other's inverses is just saying that T(E) and E(T) in respectively the micro-canonical and the canonical ensemble define the same relation between E and T.

[0] Proof of controversiality: even Wikipedia disagrees with me here, see https://en.wikipedia.org/wiki/Thermodynamic_free_energy

[+] mydogcanpurr|2 years ago|reply
> I think this is because we introduce a complicated way to calculate determinants and then we use determinants to calculate the eigenvalues?

Yes, the determinant should be taught and defined as the volume of the parallelepiped in n dimensions defined by the columns of the given square matrix. This perspective makes it immediately obvious that the eigenvalues scale the parallelepiped in each of its dimensions (a basis of eigenvectors makes it even simpler). Of course the volume (determinant) must be the product of these scaling factors (eigenvalues)! Since algebra is so convenient for solving problems, this geometric intuition is often an afterthought if it's even taught at all.
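
A small sanity check of this picture in two dimensions, as a sketch only. The matrix `A` and its eigenpairs are hand-picked for illustration, and the helper names `apply` and `signed_area` are made up here:

```python
def apply(A, v):
    """Multiply a 2x2 matrix (given as a list of rows) by a 2-vector."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])


def signed_area(a, b):
    """Signed area of the parallelogram spanned by 2-vectors a and b."""
    return a[0] * b[1] - a[1] * b[0]


# Illustrative matrix, chosen so that the eigenvalues are integers.
A = [[4, 1],
     [2, 3]]

# Hand-picked eigenpairs (lambda, v); the loop checks that A v = lambda v.
eigenpairs = [(2, (1, -2)), (5, (1, 1))]
for lam, v in eigenpairs:
    assert apply(A, v) == (lam * v[0], lam * v[1])

# Determinant as the signed area spanned by the columns of A.
col1 = (A[0][0], A[1][0])
col2 = (A[0][1], A[1][1])
det_A = signed_area(col1, col2)

# Product of the eigenvalues.
eigen_product = eigenpairs[0][0] * eigenpairs[1][0]

# A stretches the parallelogram spanned by the eigenvectors by lambda_1 * lambda_2.
v1, v2 = eigenpairs[0][1], eigenpairs[1][1]
area_before = signed_area(v1, v2)
area_after = signed_area(apply(A, v1), apply(A, v2))

print(det_A, eigen_product, area_after / area_before)
```

All three printed numbers agree: the area spanned by the columns, the product of the eigenvalues, and the factor by which the eigenvector parallelogram is scaled.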

[+] shiandow|2 years ago|reply
I know programmers like to blame mathematicians for writing functions with lots of one letter variable names, but it's the physicists who insist on doing so without defining any of them.

You want to know what V is? It's clearly the potential, we've defined it six papers ago! Oh, you were wondering what its type was? Well, it's usually a scalar field. No, don't write the parameter as t, that changes the whole meaning!

[+] archgoon|2 years ago|reply
> I believe I could walk up to any college senior in physics and they wouldn't know that “the determinant is the product of the eigenvalues,”

Unless things have gotten significantly worse in physics education in the past decade, I'd be happy to take the other side of that bet.

I will also be willing to bet they could prove it.

The problem you'd have with physicists is convincing them that there are matrices that aren't diagonalizable.

[+] ijustlovemath|2 years ago|reply
The abuse of differentials in explanations like this reminds me of this classic and insightful Math Stack Exchange answer: https://math.stackexchange.com/questions/3266639/notation-fo...
[+] dang|2 years ago|reply
Thanks! That is enlightening. It made me realize that the confusion I always felt was actually in the notation all along.

That link was discussed here btw:

On Leibniz Notation - https://news.ycombinator.com/item?id=39064174 - Jan 2024 (95 comments)

[+] maxminminmax|2 years ago|reply
It’s ironic to see the link to that post, which implicitly assumes that functions are defined on R^n, under a post about the Legendre transform, whose point is that functions are defined on state spaces of systems, and only become represented by functions on R^n once we parameterize the state spaces by state variables. So the value of f at a given point doesn’t depend on what your favorite letter (aka state variable) is, but the value of f’ certainly does. And the Legendre transform, as is actually explained, albeit cryptically, in Goldstein, comes from the fact that we have a 2d state space (the phase space of a 1d system with config variable y, velocity variable x, and momentum variable u) on which we have non-linearly related variables x and u.

In the Legendre transform, what we have (the y variable is a red herring, and I will ignore it; everything happens "pointwise in y") is a curve in the u-x plane, which we lift to u-x-z space in two ways. That is, we find functions f and g defined for the points on that curve such that: 1) if the curve is parametrized by x, so that f is a function of x, then df/dx = u; 2) if the curve is parametrized by u, so that g is a function of u, then dg/du = x. (Why do we want this? Presumably because when x is velocity and f is energy, u is momentum, and we want g to have the same property going back. And yes, there are conditions under which one can parametrize a curve by one of the coordinates, either locally or globally; one such is that u is a monotone increasing function of x, which corresponds to convexity of f.) Of course now "derivatives of f and g are inverse" is tautological.

If we already know f(x), but know neither u nor g, we could set u = df/dx and try to compute g. Or we could do it the way Goldstein does it: dg/du = x, so dg = x du (this is an ODE), and integrating it "by parts" gives g = int x du = xu - int u dx = xu - f.
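
As a concrete check of this recipe, here is a minimal sketch with an illustrative convex function f(x) = x^4/4, restricted to x > 0 so that u = x^3 can be inverted. The choice of f and the helper names are mine, not Goldstein's:

```python
def f(x):
    """Illustrative convex function (an arbitrary choice for this sketch)."""
    return x ** 4 / 4


def u_of_x(x):
    """u = df/dx for the f above."""
    return x ** 3


def x_of_u(u):
    """Inverse of u_of_x, valid for u > 0."""
    return u ** (1 / 3)


def g(u):
    """Legendre transform g = x*u - f, evaluated at x = x(u)."""
    x = x_of_u(u)
    return x * u - f(x)


def derivative(fn, t, h=1e-6):
    """Central finite-difference approximation of fn'(t)."""
    return (fn(t + h) - fn(t - h)) / (2 * h)


x0 = 1.5
u0 = u_of_x(x0)

df_dx = derivative(f, x0)   # should reproduce u0
dg_du = derivative(g, u0)   # should reproduce x0

print(u0, df_dx)
print(x0, dg_du)
```

For this f the transform has the closed form g(u) = (3/4) u^(4/3), and the two printed pairs agree to within finite-difference error: differentiating f gives u, and differentiating g gives x back.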

(In advanced speak, the u-x curve is a Lagrangian in the u-x plane, which is symplectic, as every sum of a vector space and its dual is; the functions f and g correspond to lifts of this Lagrangian to Legendrians based on the choice of "canonical" 1-forms udx and xdu, respectively, so that df - udx = 0 and dg - xdu = 0.)

[+] abetusk|2 years ago|reply
Thank you so much for this link. I was having trouble following some of the notation that came up with automatic differentiation and I think this clears it up.
[+] ykonstant|2 years ago|reply
Very nice article, kudos to the author. The inverse relation between Jacobians generalizes to a duality statement via the symplectic structure of the configuration space; the section https://en.wikipedia.org/wiki/Hamiltonian_mechanics#From_sym... on Wikipedia has some details. This symplectic duality is my preferred way of looking at Hamiltonian-Lagrangian transitions.
[+] LolWolf|2 years ago|reply
It’s neat! To be fair, as a physicist, I did not understand the Legendre transform essentially until taking convex optimization (where it is known as the Fenchel conjugate).

Many sources, but all of them are reasonable and give a constructive definition that actually explains what it does: we can characterize a function either by its graph, or its supporting hyperplanes (when it is a closed, convex function).
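
For a hands-on version of that definition, here is a minimal numerical sketch. It assumes the standard formula f*(p) = sup_x (p x - f(x)) and approximates the supremum by a maximum over a finite grid; the test function f(x) = x^2/2 and the grid are arbitrary choices:

```python
def f(x):
    """Illustrative closed convex function; its conjugate is p**2 / 2."""
    return x ** 2 / 2


# Finite grid standing in for the real line (an approximation).
xs = [i / 100 for i in range(-500, 501)]


def conjugate(fn, p, grid):
    """Grid approximation of the Fenchel conjugate sup_x (p*x - fn(x))."""
    return max(p * x - fn(x) for x in grid)


p = 1.2
f_star = conjugate(f, p, xs)

# Supporting-line property: the line x -> p*x - f_star never rises above f.
gap = min(f(x) - (p * x - f_star) for x in xs)

print(f_star, p ** 2 / 2, gap)
```

Each slope p picks out the highest line of that slope lying under the graph, and −f*(p) is its intercept. Knowing f*(p) for every p therefore describes the function through its supporting lines rather than its graph.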

While the observation is almost silly, it has very deep consequences for different characterizations of problems and other constructions!

[+] ericdfoley|2 years ago|reply
Helliwell & Sahakian Modern Classical Mechanics at least seems to do a much better job of explaining the Legendre transform than Goldstein, but it still never mentions the convexity requirement on f.

I feel like understanding the general convex conjugate and then seeing the Legendre transform as a special case is almost more intuitive.

[+] doppioandante|2 years ago|reply
Wow, I've been looking for a meaningful definition of the Legendre transform for ages, thanks for writing this up
[+] SpaceManNabs|2 years ago|reply
How beautiful that this blog post was made years after I was struggling in thermodynamics to understand these transforms. Now if someone could make a post for the Laplace transform with the audience being people familiar with the Fourier transform.
[+] kevindamm|2 years ago|reply
Really enjoyed this post, it is both understandable and revelatory. I had some introduction to the concepts here from calculus and physics classes but my mathematics interests are more along the branches of Abstract Algebra than Analysis so I didn't expect to enjoy it much, and I wonder if I would have hung on for as long if it didn't have the poor exposition first (and the promise of a better presentation).

I wonder if more math-related material would be more engaging overall if it were given in this "look how confusing, now wait, look at it this way" format. Perhaps replacing the first part with a demonstration instead of mocking established representations. But maybe there is something to the "you're not alone, this way of looking at it is confusing and hand-wavy" approach even if done deliberately, just to give comfort to students making sense of a concept for the first time. Especially with math, I think many people would be more eager to learn it if that initial uncomfortable and confusing stage is considered normal for everyone.

Also, side question, is the content of this post considered Tropical Mathematics?

[+] mgaunard|2 years ago|reply
Not nearly as good as the explanation of the Fourier transform we had the other day.
[+] lupire|2 years ago|reply
Legendre transform moves a ruler (tangent line) along a convex function, measuring how much "lag" the function accumulated while "accelerating" up to its velocity at a certain moment, relative to having constant velocity for its entire history.

The larger the function's 2nd derivative is, the smaller the transform value is. And vice versa. Since the transform is written in terms of the original function's derivative, not its "x value", the derivative of the transform is inversely proportional to the derivative of the function.

?

[+] bigbacaloa|2 years ago|reply
I found this explanation quite bad. Poorly motivated and making a priori regularity assumptions that are not necessary. The quoted explanation by Arnold is much better.