The perfect rolling sphere of alignment
1. Alignment as Pareto-efficiency
Consider the following toy example.
Agent \(\alpha\) has action set \(X_\alpha\) and utility function \(U_{\alpha}(x_\alpha,x_\beta)\); likewise for agent \(\beta\). We consider these agents “aligned” if they can be modeled as a single super-agent. What does this mean? There are two ways to express this:
- The agents choose actions \((x_\alpha^*,x_\beta^*)\) such that no other pair of actions they could reach through co-ordination would be better for both of them.
- The agents choose actions \((x_\alpha^*,x_\beta^*)\) which maximize an “aggregate utility function” (a utility function which satisfies some sensible properties, like being increasing in both agents’ utilities).
… in other words, the agents act in a way that is Pareto-efficient.
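To make the two definitions concrete for finite action sets, here is a minimal Python sketch. The function names `pareto_efficient` and `maximize_aggregate` are hypothetical, the sum is just one choice of aggregate, and the prisoner's-dilemma payoffs are an illustrative assumption rather than part of the example above. Efficiency is checked in the sense used in the first bullet: no alternative profile is strictly better for both agents.

```python
from itertools import product

# Illustrative assumption: prisoner's-dilemma payoffs (u_alpha, u_beta)
# for each action profile (x_alpha, x_beta).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ["C", "D"]

def U_alpha(x_alpha, x_beta):
    return PAYOFFS[(x_alpha, x_beta)][0]

def U_beta(x_alpha, x_beta):
    return PAYOFFS[(x_alpha, x_beta)][1]

def pareto_efficient(X_alpha, X_beta, u_a, u_b):
    """Profiles for which no other profile is strictly better for both agents."""
    profiles = list(product(X_alpha, X_beta))
    payoffs = {x: (u_a(*x), u_b(*x)) for x in profiles}
    return [
        x for x in profiles
        if not any(va > payoffs[x][0] and vb > payoffs[x][1]
                   for va, vb in payoffs.values())
    ]

def maximize_aggregate(X_alpha, X_beta, u_a, u_b, W=lambda a, b: a + b):
    """A profile maximizing an aggregate W (assumed increasing in both utilities)."""
    return max(product(X_alpha, X_beta), key=lambda x: W(u_a(*x), u_b(*x)))

efficient = pareto_efficient(ACTIONS, ACTIONS, U_alpha, U_beta)
best = maximize_aggregate(ACTIONS, ACTIONS, U_alpha, U_beta)
print(efficient)  # [('C', 'C'), ('C', 'D'), ('D', 'C')]
print(best)       # ('C', 'C')
```

With these payoffs the sum-maximizer \((C,C)\) is one of the efficient profiles, while \((D,D)\), where the two agents end up if each acts separately, is not.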
In particular, alignment can be achieved by trade.
For instance, suppose we have a good \(G\) with total quantity \(q\). Let \(X_{\alpha}=[0,q]\) be the quantity of \(G\) consumed by \(\alpha\), with \(\beta\) always consuming the remainder \(q-x_{\alpha}\) (so \(\beta\) has no decision to make), and let \(U_{\alpha}(x_{\alpha})=\log(x_\alpha)\) and \(U_\beta(x_\alpha)=\log(q-x_\alpha)\). In the absence of trade, \(\alpha\) is the only actor, and the Nash equilibrium is \(x_\alpha=q\), which gives \(\alpha\) a utility of \(\log q\) and \(\beta\) a utility of \(\log 0=-\infty\).
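A quick numerical check of the no-trade outcome, as a sketch: it assumes a hypothetical \(q=10\), discretizes \(\alpha\)'s feasible choices \(0\le x_\alpha\le q\) into a grid, and treats \(\log 0\) as \(-\infty\). The helper names are illustrative only.

```python
import math

def log_or_neg_inf(x: float) -> float:
    # log utility, with log(0) taken to be -infinity
    return math.log(x) if x > 0 else -math.inf

def U_alpha(x_alpha: float, q: float) -> float:
    return log_or_neg_inf(x_alpha)

def U_beta(x_alpha: float, q: float) -> float:
    # beta consumes whatever alpha leaves
    return log_or_neg_inf(q - x_alpha)

q = 10.0  # hypothetical total quantity of G
grid = [q * i / 1000 for i in range(1001)]  # discretized choices 0 <= x_alpha <= q

# Without trade, alpha alone picks x_alpha to maximize its own utility.
x_no_trade = max(grid, key=lambda x: U_alpha(x, q))
print(x_no_trade, U_alpha(x_no_trade, q), U_beta(x_no_trade, q))
```

Because \(\log\) is increasing, \(\alpha\) takes everything (\(x_\alpha=q\)), getting \(\log q\) and leaving \(\beta\) at \(-\infty\).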
However, we might instead consider the following “expanded game”, in which \(\beta\) also chooses its own consumption \(x_\beta\), and where:
- There is a third agent \(\mu\) whose action is the price \(p\) of \(G\): it charges each agent \(p\) per unit of \(G\) consumed. Its goal is to minimize disequilibrium (the gap between supply and total demand), i.e. \(U_\mu=-|q-x_\alpha-x_\beta|\).
- The consumers' utilities become \(U_{\alpha}=\log(x_\alpha)-px_\alpha\) and \(U_{\beta}=\log(x_\beta)-px_\beta\).
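Each consumer's best response to a price \(p\) solves \(1/x-p=0\), i.e. \(x=1/p\), so total demand is \(2/p\), and \(\mu\) attains its maximum \(U_\mu=0\) exactly when \(2/p=q\). The Nash equilibrium is therefore \((x_\alpha,x_\beta,p)=(q/2,q/2,2/q)\). The resulting allocation \(x_\alpha=x_\beta=q/2\) is also the one that maximizes the sum of the original utilities, \(\log(x_\alpha)+\log(q-x_\alpha)\).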
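The equilibrium can be checked numerically. This sketch assumes a hypothetical \(q=10\), uses the closed-form best response \(x=1/p\), and, as one reading of \(\mu\)'s role, has \(\mu\) look for the price at which total demand equals \(q\) by bisection; the solver is a choice made here, not something the text specifies.

```python
import math

q = 10.0  # hypothetical total quantity of G

def best_response(p: float) -> float:
    # argmax over x > 0 of log(x) - p*x: the first-order condition 1/x - p = 0 gives x = 1/p
    return 1.0 / p

def excess_demand(p: float) -> float:
    # total demand from alpha and beta minus supply q; mu's utility is -|excess_demand(p)|
    return best_response(p) + best_response(p) - q

def clearing_price(lo: float = 1e-9, hi: float = 1e9, iters: int = 200) -> float:
    # bisection: excess demand is strictly decreasing in p, positive at lo and negative at hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if excess_demand(mid) > 0:
            lo = mid  # demand exceeds supply, so the price is too low
        else:
            hi = mid
    return (lo + hi) / 2

p_star = clearing_price()
x_alpha = x_beta = best_response(p_star)
print(p_star, x_alpha, x_beta)  # approximately 0.2 5.0 5.0, i.e. (q/2, q/2, 2/q)

# The same split maximizes the sum of the original utilities log(x) + log(q - x).
grid = [q * i / 1000 for i in range(1, 1000)]
x_best = max(grid, key=lambda x: math.log(x) + math.log(q - x))
print(x_best)  # 5.0
```

Bisection is used only because excess demand is monotone in \(p\); here the closed form \(p=2/q\) is of course available directly.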
It is also possible to view all of the above as a constrained optimization problem, with prices playing the role of Lagrange multipliers.
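Spelling out the Lagrangian version of the same example: maximize the sum of the consumers' log utilities subject to the supply constraint, and the multiplier \(\lambda\) on that constraint comes out equal to the equilibrium price.

```latex
\max_{x_\alpha,\,x_\beta}\ \log x_\alpha+\log x_\beta
\quad\text{s.t.}\quad x_\alpha+x_\beta=q

\mathcal{L}=\log x_\alpha+\log x_\beta-\lambda\,(x_\alpha+x_\beta-q)

\frac{\partial\mathcal{L}}{\partial x_\alpha}=\frac{1}{x_\alpha}-\lambda=0,
\qquad
\frac{\partial\mathcal{L}}{\partial x_\beta}=\frac{1}{x_\beta}-\lambda=0
\;\Longrightarrow\;
x_\alpha=x_\beta=\frac{1}{\lambda}=\frac{q}{2},
\qquad
\lambda=\frac{2}{q}=p
```

Each agent's part of \(\mathcal{L}\), namely \(\log(x_i)-\lambda x_i\), is exactly its price-inclusive utility with \(p=\lambda\), which is why price-taking by each agent separately reproduces the joint optimum here.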