The perfect rolling sphere of alignment

1. Alignment as Pareto-efficiency
2. scratch

1. Alignment as Pareto-efficiency

Say agent \(\alpha\) has action set \(X_\alpha\) and utility function \(U_{\alpha}(x_\alpha,x_\beta)\), and likewise for agent \(\beta\). We consider these agents “aligned” if they can be modeled as a single super-agent. What does this mean? There are two ways to express this:

The agents choose actions \((x_\alpha^*,x_\beta^*)\) such that no other pair of actions they could reach through co-ordination would be better for both of them.
The agents choose actions \((x_\alpha^*,x_\beta^*)\) which maximize an “aggregate utility function” (a utility function which satisfies some sensible properties, like being increasing in both agents’ utilities).

… in other words, the agents act Pareto-efficiently.
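To make the two formulations concrete, here is a minimal Python sketch over a finite set of joint outcomes. The outcome names and utility numbers are hypothetical, `dominates`, `pareto_efficient` and `maximize_aggregate` are illustrative helpers, and the aggregate is assumed to be a positively weighted sum, which is one simple function that is increasing in both agents’ utilities.

```python
# Pareto-efficiency over a finite set of joint outcomes, plus selection by a
# weighted-sum "aggregate utility". All outcome data here is hypothetical.


def dominates(u, v):
    """True if u Pareto-dominates v: nobody worse off, somebody strictly better."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_efficient(outcomes):
    """Names of outcomes (name -> (U_alpha, U_beta)) that no other outcome dominates."""
    return [
        name
        for name, u in outcomes.items()
        if not any(dominates(v, u) for other, v in outcomes.items() if other != name)
    ]


def maximize_aggregate(outcomes, weights):
    """Outcome maximizing sum_i w_i * U_i, which is increasing in each U_i if all w_i > 0."""
    return max(outcomes, key=lambda name: sum(w * u for w, u in zip(weights, outcomes[name])))


# Hypothetical joint action profiles and the (U_alpha, U_beta) each produces.
outcomes = {
    "uncoordinated": (1.0, 1.0),
    "cooperate": (3.0, 3.0),
    "alpha_favoured": (4.0, 0.5),
    "wasteful": (2.0, 0.5),
}

print(sorted(pareto_efficient(outcomes)))        # ['alpha_favoured', 'cooperate']
print(maximize_aggregate(outcomes, (1, 1)))      # cooperate
print(maximize_aggregate(outcomes, (1, 0.1)))    # alpha_favoured
```

With strictly positive weights the maximizer can never be Pareto-dominated (a dominating outcome would have a strictly larger weighted sum), so changing the weights only moves the choice along the same efficient frontier.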

In particular, alignment can be achieved by trade.

Consider the following toy example.

Suppose \(X_\alpha=\mathbb{R}_{\geq 0}^{G}\), representing the “quantities of each good \(G=\{g_1,\dots,g_n\}\) that \(\alpha\) supplies”, and \(X_\beta=\{0\}\). \(\beta\), who receives these goods, has preferences over them, e.g. \(U_\beta(x_{\alpha}) = \sum_i\lambda_i\log(x_\alpha^{g_i})\), whereas \(\alpha\) only bears the cost of producing them, \(U_\alpha(x_\alpha)=-\mathbf{p}_{\mathrm{in}}\cdot x_{\alpha}\), where \(\mathbf{p}_{\mathrm{in}}\) are the (positive) input costs. Then of course the only Nash equilibrium is \(x_\alpha=(0,\dots, 0)\), which leaves \(\beta\) with utility \(-\infty\).
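A small numerical check of the toy example, assuming two goods with hypothetical input costs `P_IN` and preference weights `LAM` (neither is given in the text), and searching a small integer grid of supply bundles rather than the full non-negative action space.

```python
import math

# Toy supplier example with two goods. P_IN and LAM are hypothetical values.
P_IN = (2.0, 1.0)  # per-unit input costs paid by alpha (all positive)
LAM = (0.5, 0.5)   # beta's preference weights lambda_i


def u_alpha(x):
    """alpha only bears production costs: U_alpha = -p_in . x."""
    return -sum(p * xi for p, xi in zip(P_IN, x))


def u_beta(x):
    """beta values the goods: sum_i lambda_i * log(x_i), or -inf if any good is missing."""
    if any(xi <= 0 for xi in x):
        return -math.inf
    return sum(lam * math.log(xi) for lam, xi in zip(LAM, x))


# beta has no real choice, so alpha's best response is the Nash equilibrium.
grid = [(a, b) for a in range(4) for b in range(4)]
x_star = max(grid, key=u_alpha)
print(x_star, u_beta(x_star))   # (0, 0) -inf

# Every other bundle costs alpha something, so (0, 0) is also Pareto-efficient.
print(all(u_alpha(x) < u_alpha(x_star) for x in grid if x != x_star))   # True
```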

(To be clear, this is also Pareto-efficient, since any positive supply leaves \(\alpha\) strictly worse off. Indeed, the case where one agent’s (here \(\beta\)’s) wishes are totally ignored is also one possible alignment: the “degenerate case” of alignment, if you will. The point will be that … TODO)

But now suppose \(\beta\) can bid for the goods.

2. scratch

Suppose we have a good \(G\), with \(X^0_{\alpha}=[0,Q)\) the “quantity of \(G\) that \(\alpha\) tries to consume”, and likewise \(X^0_{\beta}=[0,Q)\) for \(\beta\). Each agent gets \(\log(1+x)\) utility from its own consumption \(x\); if \(x_\alpha+x_\beta>Q\), the two agents go to war, which costs them \(10^{99}\) utility.
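The scratch game can be explored on a discretized action set. This sketch assumes a hypothetical \(Q=4\) and restricts each agent to the integer points of \([0,Q)\); it enumerates pure-strategy Nash equilibria by checking that each agent’s action is a best response to the other’s.

```python
import math

Q = 4                # hypothetical total quantity of G
ACTIONS = range(Q)   # integer points of [0, Q)
WAR_COST = 1e99      # the 10^99 utility lost to war


def utility(own, other):
    """log(1 + own consumption), or the war penalty if total demand exceeds Q."""
    if own + other > Q:
        return -WAR_COST
    return math.log(1 + own)


def best_responses(other):
    """All actions maximizing utility against the other agent's action (the game is symmetric)."""
    best = max(utility(a, other) for a in ACTIONS)
    return {a for a in ACTIONS if utility(a, other) == best}


# Pure-strategy Nash equilibria: each action is a best response to the other.
equilibria = [
    (xa, xb)
    for xa in ACTIONS
    for xb in ACTIONS
    if xa in best_responses(xb) and xb in best_responses(xa)
]
print(equilibria)   # [(1, 3), (2, 2), (3, 1)]
```

On this grid every equilibrium lies exactly on \(x_\alpha+x_\beta=Q\): nobody goes to war and nothing is left unclaimed, but the game by itself does not single out one split.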

For instance, suppose we have a good \(G\), with \(X_{\alpha}=[0,q]\) the “quantity of \(G\) consumed” by \(\alpha\) (with the quantity consumed by \(\beta\) always set to \(q-x_{\alpha}\), so \(\beta\) has no real choice and \(X_{\beta}=\{0\}\)), and \(U_{\alpha}(x_{\alpha})=\log(x_\alpha)\), \(U_\beta(x_\alpha)=\log(q-x_\alpha)\), where \(q\) is the total quantity of \(G\). In the absence of trade, where \(\alpha\) is the only actor, the Nash equilibrium is \(x_\alpha=q\), which gives \(\alpha\) a utility of \(\log q\) and \(\beta\) a utility of \(-\infty\).
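A quick check of the no-trade outcome, assuming a hypothetical \(q=10\) and a grid of 101 evenly spaced candidate values of \(x_\alpha\) on \([0,q]\). Since \(\log 0\) is undefined, the hypothetical helper `log_or_neg_inf` treats it as \(-\infty\).

```python
import math

q = 10.0                                   # hypothetical total quantity of G
grid = [q * k / 100 for k in range(101)]   # candidate values of x_alpha in [0, q]


def log_or_neg_inf(z):
    """log(z), extended so that log(0) = -inf."""
    return math.log(z) if z > 0 else -math.inf


# Without trade, alpha alone chooses x_alpha to maximize its own log(x_alpha).
x_alpha = max(grid, key=log_or_neg_inf)
u_alpha = log_or_neg_inf(x_alpha)          # log q
u_beta = log_or_neg_inf(q - x_alpha)       # log(0), i.e. -inf
print(x_alpha, u_alpha, u_beta)
```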

However, we might consider the following “expanded game”, where:

There is a third agent \(\mu\) whose action is “the price of \(G\)”: it charges agents a price \(p\) per unit of \(G\) they consume. Its goal is to minimize disequilibrium, i.e. \(U_\mu=-|q-x_\alpha-x_\beta|\).
\(\alpha\) and \(\beta\) now each choose their own consumption and pay for it, so \(U_{\alpha}=\log(x_\alpha)-px_\alpha\) and \(U_{\beta}=\log(x_\beta)-px_\beta\).

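Each of \(\alpha\) and \(\beta\) best-responds to a price \(p\) by consuming \(1/p\). If \(\mu\) sets the price anticipating these demands, it clears the market with \(2/p=q\), so the equilibrium is \((x_\alpha,x_\beta,p)=(q/2,q/2,2/q)\). This allocation also maximizes \(U_\alpha+U_\beta\) over all feasible splits \(x_\alpha+x_\beta=q\) (the payments then total the constant \(pq\), so this is the same as maximizing \(\log(x_\alpha)+\log(q-x_\alpha)\)).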
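The text does not say how \(\mu\) arrives at its price, so this sketch swaps in a simple tâtonnement rule: raise \(p\) when total demand exceeds \(q\) and lower it otherwise, with \(\alpha\) and \(\beta\) each responding with the demand \(1/p\). The quantity \(q=10\), the starting price, the step size and the iteration count are hypothetical; in this sketch the step is kept below \((2/q)^2\) so the iteration settles at the clearing price instead of overshooting it.

```python
import math

q = 10.0  # hypothetical total quantity of G


def demand(p):
    """Best response of alpha (and, symmetrically, beta) to price p: argmax_x log(x) - p*x = 1/p."""
    return 1.0 / p


def find_price(q, p=1.0, step=0.01, iters=2000):
    """Hypothetical tatonnement rule for mu: move p in proportion to excess demand."""
    for _ in range(iters):
        excess = 2 * demand(p) - q         # x_alpha + x_beta - q
        p = max(p + step * excess, 1e-9)   # guard against a non-positive price
    return p


p = find_price(q)
x = demand(p)
print(p, x)   # close to 2/q = 0.2 and q/2 = 5.0

# The split also maximizes the original aggregate log(x_alpha) + log(q - x_alpha).
grid = [q * k / 1000 for k in range(1, 1000)]
best = max(grid, key=lambda xa: math.log(xa) + math.log(q - xa))
print(best)   # 5.0
```

Any rule that drives excess demand to zero would serve; this one is chosen only because it mirrors a price that rises under shortage and falls under surplus.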

It is also possible to think of all of the above as a constrained optimization problem with prices as Lagrange multipliers, etc.
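For the two-agent example, the Lagrange-multiplier view can be written out directly: maximize the aggregate \(\log(x_\alpha)+\log(x_\beta)\) subject to the supply constraint \(x_\alpha+x_\beta=q\).

\[
\mathcal{L}(x_\alpha,x_\beta,\lambda)=\log(x_\alpha)+\log(x_\beta)-\lambda\,(x_\alpha+x_\beta-q)
\]
\[
\frac{\partial\mathcal{L}}{\partial x_\alpha}=\frac{1}{x_\alpha}-\lambda=0,\qquad
\frac{\partial\mathcal{L}}{\partial x_\beta}=\frac{1}{x_\beta}-\lambda=0
\;\Longrightarrow\; x_\alpha=x_\beta=\frac{1}{\lambda}
\]
\[
x_\alpha+x_\beta=q \;\Longrightarrow\; \frac{2}{\lambda}=q \;\Longrightarrow\; \lambda=\frac{2}{q}=p,\qquad x_\alpha=x_\beta=\frac{q}{2}
\]

Each agent’s first-order condition \(1/x=\lambda\) is exactly its demand condition \(1/x=p\) under \(U=\log(x)-px\), which is the sense in which the price plays the role of the multiplier.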
