Coalitional agency thoughts

Making public some stuff I wrote in an email because it has some interesting ideas.

1. General directions I have in mind.

1.1. Collectives of agents can be modeled as agents when they trade

Take two agents A and B with action sets \(X_A\) and \(X_B\) and utility functions \(U_A(x_A, x_B)\), \(U_B(x_A, x_B)\). If each agent individually maximizes its own utility function, their chosen actions \((x_A^*,x_B^*)\) will generally be some Nash equilibrium of the game, but it may not be possible to interpret this pair as the action of a "super-agent".
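
To make the point concrete, here is a toy illustration (my own example, not from the email): in a Prisoner's Dilemma the unique Nash equilibrium is Pareto-dominated, so the pair of individually optimal actions cannot be read as one agent maximizing any utility that is increasing in both players' payoffs.

```python
from itertools import product

ACTIONS = ["C", "D"]

# Payoffs (U_A(x_A, x_B), U_B(x_A, x_B)) for a standard Prisoner's Dilemma.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def best_response_A(x_B):
    return max(ACTIONS, key=lambda a: PAYOFF[(a, x_B)][0])

def best_response_B(x_A):
    return max(ACTIONS, key=lambda b: PAYOFF[(x_A, b)][1])

def nash_equilibria():
    """Profiles where each action is a best response to the other."""
    return [
        (a, b)
        for a, b in product(ACTIONS, ACTIONS)
        if a == best_response_A(b) and b == best_response_B(a)
    ]

def pareto_optimal(profile):
    """No other profile makes both agents weakly better off (one strictly)."""
    u = PAYOFF[profile]
    return not any(
        v[0] >= u[0] and v[1] >= u[1] and v != u
        for v in PAYOFF.values()
    )

print(nash_equilibria())            # [('D', 'D')]
print(pareto_optimal(("D", "D")))   # False: (C, C) dominates it
```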

There are two ways to interpret the words “the action of a super-agent”:

  • the action that maximizes some "total utility function", where this total utility function has some sensible properties (like being increasing in each agent's utility)
  • an action that is Pareto-optimal: if the two agents could co-ordinate on a joint action, there is no other joint action that would be better for both of them.

In fact it is a basic result in welfare economics that these two definitions are (under some assumptions) equivalent. Furthermore, by the fundamental theorems of welfare economics, trade achieves Pareto-optimality; i.e. trade allows the collective to be treated as a super-agent.
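
One direction of that equivalence can be checked by brute force (my sketch, with made-up utility functions): on a finite joint action grid, any maximizer of a weighted total utility \(w_A U_A + w_B U_B\) with strictly positive weights is Pareto-optimal.

```python
from itertools import product

GRID = [i / 10 for i in range(11)]            # each agent picks a point in [0, 1]

def U_A(xa, xb):
    return xa * xb                            # A likes joint effort

def U_B(xa, xb):
    return xb - (xb - xa) ** 2                # B likes xb high but close to xa

PROFILES = list(product(GRID, GRID))
PAYOFFS = {p: (U_A(*p), U_B(*p)) for p in PROFILES}

def pareto_optimal(p):
    ua, ub = PAYOFFS[p]
    return not any(
        va >= ua and vb >= ub and (va, vb) != (ua, ub)
        for (va, vb) in PAYOFFS.values()
    )

# Every maximizer of a positively weighted total utility is Pareto-optimal.
for wa, wb in [(0.2, 0.8), (0.5, 0.5), (0.9, 0.1)]:
    best = max(PROFILES, key=lambda p: wa * PAYOFFS[p][0] + wb * PAYOFFS[p][1])
    assert pareto_optimal(best), (wa, wb, best)
print("every weighted-sum maximizer found was Pareto-optimal")
```

(The converse direction, that every Pareto-optimal point maximizes *some* weighted sum, needs a convexity assumption; that is where the "under some assumptions" caveat bites.)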

More precisely, by "trade" we mean trade under: perfect information, zero transaction costs, and (I guess the last one is only relevant in economics) a competitive market structure. I guess it also depends on perfect property rights/perfect assurance, which is implicitly assumed under zero transaction costs.

1.1.1. Some questions I can explore:

  • Is there a thermodynamic analog of this? For instance, transaction costs are always analogized to "friction": does it make sense to say that a combined system doesn't minimize free energy if there is some "friction" preventing this? I guess not, because "free energy" already takes that into account; IDK, I need to study thermodynamics better. More generally, how can we generalize this to settings where "trade" isn't an explicitly defined thing?
  • Maybe construct a dynamic version of this (i.e. rather than assuming Nash equilibrium)
  • Maybe all of this has something to do with the paper The Reward Hypothesis is False, where they show that an MORL problem can only be expressed as a regular RL problem if the goal is to maximize a linear sum of rewards. If I'm thinking correctly, this implies that regular RL will train a coherent agent (or induce coherence)?
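
A toy version of that expressivity point (my own construction, in the spirit of the cited paper rather than taken from it): with vector rewards, the arm that maximizes the nonlinear objective \(\min(r_1, r_2)\) need not maximize *any* linear scalarization \(w_1 r_1 + w_2 r_2\) with \(w \ge 0\), so that multi-objective goal has no equivalent scalar-reward formulation of the usual kind.

```python
# Three "arms" with two-dimensional rewards (hypothetical numbers).
ARMS = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.45, 0.45)}

# Under the nonlinear objective min(r1, r2), arm C is optimal.
min_best = max(ARMS, key=lambda a: min(ARMS[a]))
print(min_best)                               # 'C'

# Sweep linear weights w1 in [0, 1], w2 = 1 - w1: C never wins, since
# its linear value is always 0.45 while max(w1, w2) >= 0.5.
linear_winners = set()
for i in range(101):
    w1 = i / 100
    w2 = 1 - w1
    winner = max(ARMS, key=lambda a: w1 * ARMS[a][0] + w2 * ARMS[a][1])
    linear_winners.add(winner)

print(linear_winners)                         # {'A', 'B'} (never 'C')
```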

1.2. property rights stuff

We discussed this a bit over our call, but: "trade aligns incentives" requires two basic assumptions: (a) there is no information asymmetry; (b) there is perfect assurance, equivalently property rights/contract enforcement.
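
A minimal sketch of how assumption (a) failing breaks "trade aligns incentives" is Akerlof's market for lemons (a standard textbook model, my own parameterization): quality \(q\) is uniform on \([0,1]\), sellers value a car at \(q\), buyers at \(1.5q\), so under full information every trade is mutually beneficial; but when only sellers observe \(q\), the market unravels to no trade.

```python
def expected_quality_on_offer(price):
    """Only sellers with q <= price sell; q is uniform, so the average
    quality of cars actually offered at this price is price / 2."""
    return min(price, 1.0) / 2

def buyer_value(price):
    """What a buyer expects a car offered at `price` to be worth to them."""
    return 1.5 * expected_quality_on_offer(price)

# Iterate: buyers only pay what the offered cars are worth to them,
# which lowers the price, which lowers the quality on offer, and so on.
price = 1.0
for _ in range(50):
    price = buyer_value(price)      # = 0.75 * price, shrinking each round

print(price < 1e-6)                 # True: the market has unravelled
```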

(In general I prefer this formulation to "outer and inner alignment". The former assumption corresponds to outer alignment; the relationship of the latter to inner alignment is more subtle, but the idea is that there would be no "inner alignment problem" if your reward mechanism reliably persisted forever, whereas once the AI gets smart enough it can just completely upend your reward mechanism.)

For instance the second fundamental theorem of welfare economics states a correspondence between “initial endowments” (which is a special case of “property rights allocations”) and “aggregate utility functions”.
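
A concrete instance of that correspondence (my own example, using the standard two-agent Cobb-Douglas exchange economy; the function names are mine): the competitive equilibrium from any endowment coincides with the maximizer of a wealth-weighted sum of utilities (Negishi weights), so changing the endowment changes the weights, i.e. selects a different aggregate utility function.

```python
def equilibrium(a1, a2, e1, e2):
    """Agent i has u_i = a_i*ln(x) + (1-a_i)*ln(y) and endowment e_i = (x_i, y_i).
    Returns (price of x in units of y, wealths, equilibrium allocations).
    Market clearing for x gives the closed-form price below."""
    X = e1[0] + e2[0]
    p = (a1 * e1[1] + a2 * e2[1]) / (X - a1 * e1[0] - a2 * e2[0])
    w1 = p * e1[0] + e1[1]
    w2 = p * e2[0] + e2[1]
    demand1 = (a1 * w1 / p, (1 - a1) * w1)
    demand2 = (a2 * w2 / p, (1 - a2) * w2)
    return p, (w1, w2), (demand1, demand2)

def planner(a1, a2, lam, totals):
    """Maximizer of lam1*u1 + lam2*u2 over feasible allocations
    (closed form for log utilities, from the first-order conditions)."""
    X, Y = totals
    l1, l2 = lam
    x1 = l1 * a1 * X / (l1 * a1 + l2 * a2)
    y1 = l1 * (1 - a1) * Y / (l1 * (1 - a1) + l2 * (1 - a2))
    return (x1, y1), (X - x1, Y - y1)

a1, a2 = 0.3, 0.7
e1, e2 = (4.0, 1.0), (2.0, 5.0)      # arbitrary endowments
totals = (6.0, 6.0)

p, wealth, ce = equilibrium(a1, a2, e1, e2)
po = planner(a1, a2, wealth, totals)  # Negishi weights = equilibrium wealths
print(ce)
print(po)  # same allocation: the equilibrium maximizes this weighted sum
```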

I don’t know if right now I have a more precise idea than “property rights are probably worth studying; e.g. we should at least think about how to economically model settings where agents can violate rights/the initial rights allocations aren’t stable”.

I wrote some stuff about this as an undergrad: https://arxiv.org/pdf/2107.09651 — I don’t know if it’s very useful, but I did discover that there is interesting mathematical structure there, which I generally take as a sign of usefulness.

And as always, another question is how “property rights” translates into a thermodynamic context.

1.3. simply continuing work on my existing market-based RL project

i.e. on the attached paper. The key questions to work on are:

  • how can we make market-based agents more efficient — either by introducing "parent programs" or figuring out how to backpropagate into the space of agents
  • how can we get "modularity of state" in an arbitrary RL setting?
  • proving that market-based agents maximize e.g. infinite-horizon average reward in the long run
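
For readers without the attached paper, the flavor of "market-based" credit assignment can be sketched with the classic bucket-brigade mechanism from Holland's classifier systems (this is NOT the attached paper's mechanism, just the textbook ancestor of the idea): sub-agents bid for control, the winner pays its bid to the previous winner, and external reward goes to the current winner, so value flows backward along the chain of control like a price.

```python
import random

random.seed(0)
BID_FRACTION = 0.1
# One sub-agent per (state, action) pair, each with a "strength" (its capital).
strength = {(s, a): 1.0 for s in (0, 1) for a in ("go", "stop")}

def run_episode(epsilon=0.2):
    """Two-state chain: 'go' from state 0 leads to state 1, 'go' from
    state 1 pays reward 10; 'stop' ends the episode with nothing."""
    s, prev = 0, None
    while True:
        actions = ["go", "stop"]
        if random.random() < epsilon:
            a = random.choice(actions)                 # exploration
        else:
            a = max(actions, key=lambda a: strength[(s, a)])
        bid = BID_FRACTION * strength[(s, a)]
        strength[(s, a)] -= bid                        # pay to take control
        if prev is not None:
            strength[prev] += bid                      # pay the predecessor
        prev = (s, a)
        if a == "stop":
            return
        if s == 1:
            strength[prev] += 10.0                     # external reward
            return
        s = 1

for _ in range(500):
    run_episode()

# Credit has flowed backward: the profitable chain of control outbids "stop".
assert strength[(1, "go")] > strength[(1, "stop")]
assert strength[(0, "go")] > strength[(0, "stop")]
print(strength)
```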

2. Re: your point about economics neglecting dynamics and computational complexity

Some essays and surveys on this are linked in this Stack Exchange post and in this survey by Alfred Norman, but yeah — a proper general theory of this is lacking. Two thoughts I have:

This also applies to thermodynamics: everything is stated in terms of equilibrium states, not the dynamics of getting there. In both cases the dynamics is linked to "learning". But I do wonder: maybe in thermodynamics there is some physics describing the exact time evolution, perhaps a variational principle, that can be transferred over to economics?

Maybe the “conditional on algorithmic information” intuition underlying bounded rationality also applies to time?

Author: manyu

Created: 2025-06-24 Tue 17:25