Thermodynamics
I want to learn thermodynamics. I have decided that I will have understood thermodynamics when I am able to coherently answer the following questions:
- The maximum entropy principle is supposed to be an epistemic thing, like a restatement of Bayes’ theorem — why, then, do real systems also “maximize entropy” (e.g. gases following the Boltzmann distribution)? Does “the universe” perform Bayesian inference too? (A numerical sketch of the maximum-entropy derivation of the Boltzmann distribution follows this list.)
- “So if I know the exact positions and momenta of every particle in this pot of boiling water, its temperature is zero?” “Yep!” “… and yet, when I touch it, it scalds.” Does my finger have a mind of its own, with less information than I do?
- Do systems maximize entropy or is entropy just non-decreasing? Liouville’s theorem and coarse-graining only imply the latter …
- How does observation not decrease entropy — it literally gives you new information!
- this is the more general version of “if boiling water turns into cold water + electricity, has entropy decreased?”
- I guess the answer is to say: when you’re observing it, it’s not a closed system; the combined entropy of you + the system is still increasing from an external perspective. But IDK, what if you just view light rays coming from the system?
- There seems to be some link between thermodynamics and semantics/model theory.
- What exactly is the thermodynamics/bounded-rationality (Ortega & Braun), active-inference, Bayesian-mechanics stuff?
- I can intuitively understand “systems perform approximate Bayesian inference, which is bounded rationality; also systems minimize free energy”. Is that what it is?
- One thing I don’t understand is: a reversible computer needn’t dissipate any heat, right? (think: computing the digits of pi with frictionless colliding blocks). So how does thermodynamics say anything about logically non-omniscient things at all?
- I’ve seen comparisons between free energy minimization and no free lunch (e.g. by Yudkowsky). I would like to pump my intuition about no-free-lunch to free energy minimization.
- What exactly is thermodynamics, in a nutshell? I know it’s something “epistemic” – its place is among statistics, information theory, computability theory, logic, decision theory, games and economics – rather than mechanics, relativity, field theory and chemistry … but what is it, in one line? I want to say something like “the bridge between physics and agency”, or “the science of how the universe computes”.
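(Re the first question: here is a minimal numerical sketch, my own illustration rather than anything from the references below, of the epistemic half of the story: maximizing Shannon entropy subject to a mean-energy constraint recovers the Boltzmann weights \(e^{-\beta E_i}/Z\). The energy levels and target mean energy are arbitrary choices.)

```python
# Sketch: maximum entropy + a mean-energy constraint => Boltzmann weights.
import numpy as np
from scipy.optimize import minimize, brentq

E = np.array([0.0, 1.0, 2.0, 3.0])  # energy levels (arbitrary units)
E_mean = 1.2                        # the constraint: <E> = 1.2

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)      # avoid log(0)
    return np.sum(p * np.log(p))    # minimizing this maximizes H(p)

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},  # normalization
    {"type": "eq", "fun": lambda p: p @ E - E_mean},   # mean energy
]
p0 = np.full(len(E), 1.0 / len(E))
res = minimize(neg_entropy, p0, bounds=[(0, 1)] * len(E),
               constraints=constraints)

# Analytic answer: Boltzmann weights, with beta tuned to hit E_mean.
excess = lambda beta: np.exp(-beta * E) @ E / np.exp(-beta * E).sum() - E_mean
beta = brentq(excess, -50, 50)
p_boltzmann = np.exp(-beta * E) / np.exp(-beta * E).sum()

print(res.x)        # numerical max-ent distribution
print(p_boltzmann)  # analytic Boltzmann weights; the two agree
```

The puzzle in the question is then why empirical frequencies in an actual gas should match this purely epistemic construction at all.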
1. References marked by [IMPORTANCE]
- Yuxi Liu’s blog
- Ortega and Braun’s papers
- [75] Second Law of Thermodynamics by Eliezer Yudkowsky
- [75] An equilibrium of no free energy by Eliezer Yudkowsky
- [80] Free energy principle Wikipedia page
- [50] Richard Ngo’s tweet and replies thereof
- [50] Demystifying the second law of thermodynamics by Eigil Rischel
- [20] Clarifying the free energy principle (with quotes)
- [50] Physical foundations of Landauer’s principle by Michael P. Frank
- [80] Bayesian mechanics
- [40] Bayesian mechanics - map/territory
- [60] Shore & Johnson (1980): Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy
- [50] Feynman lectures on Computation
2. Why does physics think?
2.1. Ensembles and ideal gases
2.2. Dependence on initial conditions
2.3. Bayesian mechanics
3. Fun: Homomorphisms and the second law
One informal way to think of homomorphisms in math is that they are maps that do not “create information out of thin air”. Isomorphisms further do not destroy information. The terminal object (e.g. the trivial group, the singleton topological space, or the trivial vector space) is the “highest-entropy state”: all distinctions disappear there, and reaching it is heat death.
- Take, for instance, the group homomorphism \(\phi:\mathbb{Z}^+\to\mathbb{Z}_{4}^+\). Before \(\phi\) was applied, “1” and “5” were distinguished: 2 + 3 = 5 was correct, but 2 + 3 = 1 was wrong. Upon applying this homomorphism, this information disappears — however, no new information has been created, that is: no true indistinctions (equalities) have become false. (See the code sketch after this list.)
- Similarly in topology, “indistinction” is “arbitrary closeness”. Wiggle-room (aka “open sets”) is information; it cannot be created from nothing. If a set or sequence gets arbitrarily close to a point, it will remain arbitrarily close to that point after any continuous transformation.
- There is no information-theoretical formalization of “indistinction” on these structures, because this notion is more general than information theory. In the category of measurable spaces, two points in the sample space are indistinct if they are not distinguished by any measurable set — and measurable functions are not allowed to create measurable sets out of nothing.
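(A toy check of the first bullet, with ranges and test cases that are my own illustrative choices:)

```python
# Sketch: the quotient homomorphism phi: Z -> Z_4 destroys distinctions
# but never creates new ones.
phi = lambda x: x % 4            # the homomorphism
add4 = lambda a, b: (a + b) % 4  # addition in Z_4

# Homomorphism property: phi commutes with addition (checked on a finite range).
assert all(phi(a + b) == add4(phi(a), phi(b))
           for a in range(-20, 20) for b in range(-20, 20))

# Information is destroyed: 1 != 5 in Z, but phi identifies them, so the
# once-false equation "2 + 3 = 1" becomes true downstream.
assert 2 + 3 != 1 and add4(phi(2), phi(3)) == phi(1)

# No information is created: equations true in Z stay true in Z_4;
# e.g. "2 + 3 = 5" maps to a true equation about residues.
assert add4(phi(2), phi(3)) == phi(5)
```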
(there is also an alternate, maybe dual/opposite analogy I can make based on presentations — here, the highest-entropy state is the “free object”, e.g. a discrete topological space or free group, and each constraint (e.g. \(a^5=1\)) is information — morphisms are “observations”. In this picture we see knowledge as encoded by identities rather than distinctions — we may express our knowledge as a presentation like \(\langle X_1,\dots X_n\mid X_3=4,X_2-X_1=2\rangle\), and morphisms cannot be concretely understood as functions on sets but rather show a tree of possible outcomes, like maybe you believe in Everett branches or whatever.)
In general, if you postulate:
- … you live on some object in a category
- … time-evolution is governed by some automorphism \(H\)
- … you, the observer, have beliefs about your universe and keep forgetting some information (you “coarse-grain” the phase space) — i.e. your subjective phase space is also an object in that category, which undergoes homomorphisms
Then the second law is just a tautology. The second law we all know and love comes from taking the universe to be a symplectic manifold, and time-evolution to be governed by symplectomorphisms. And the point of Liouville’s theorem is really to clarify/physically motivate what the Jaynesian “uniform prior” should be. Here is some more stuff, from Yuxi Liu’s statistical mechanics article:
> In almost all cases, we use the uniform prior over phase space. This is how Gibbs did it, and he didn’t really justify it other than saying that it just works, and suggesting it has something to do with Liouville’s theorem. Now with a century of hindsight, we know that it works because of quantum mechanics: We should use the uniform prior over phase space, because phase space volume has a natural unit of measurement: \(h^N\), where \(h\) is Planck’s constant, and \(2N\) is the dimension of phase space. As Planck’s constant is a universal constant, independent of where we are in phase space, we should weight all of the phase space equally, resulting in a uniform prior.
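The “tautology” can also be watched numerically. Below is a minimal sketch (my own illustration: the Arnold cat map stands in for the symplectomorphism, a 16×16 grid for the observer’s coarse-graining). The dynamics is exactly area-preserving, i.e. fine-grained (Liouville) volume is conserved, yet the coarse-grained Shannon entropy climbs to its maximum \(\log 16^2 \approx 5.545\).

```python
# Sketch: measure-preserving dynamics + a coarse-graining observer
# => coarse-grained entropy rises to its maximum.
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 0.1, size=(100_000, 2))  # low-entropy initial blob
BINS = 16                                        # the observer's resolution

def cat_map(x):
    # Arnold cat map on the unit torus: (q, p) -> (q + p, q + 2p) mod 1.
    # Its Jacobian has determinant 1, so it is exactly area-preserving.
    q, p = x[:, 0], x[:, 1]
    return np.stack([(q + p) % 1.0, (q + 2 * p) % 1.0], axis=1)

def coarse_entropy(x):
    hist, _, _ = np.histogram2d(x[:, 0], x[:, 1], bins=BINS,
                                range=[[0, 1], [0, 1]])
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

for t in range(10):
    print(t, round(coarse_entropy(pts), 3))  # climbs toward log(256) ~ 5.545
    pts = cat_map(pts)
```

Swap in any other measure-preserving map and any other grid resolution; only the mixing rate changes.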