agents simultaneously learning through backprop
compute prices via backprop instead of having traders set them
differentiating wrt the agent space
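a minimal sketch of one possible reading of the two lines above. it assumes (not from the notes) that each agent is a scalar predictor, that the agents are combined linearly with weights w_i (the "agent space"), and that the shared loss is squared error. an agent's "price" is then -dL/dw_i, read straight off the gradient instead of being discovered by traders. all function names are hypothetical.

```python
# Hypothetical sketch: agent "prices" as gradients of a shared loss with
# respect to each agent's mixing weight. Assumptions (not from the notes):
# scalar agent outputs f_i, linear mixing y = sum_i w_i * f_i, and
# squared-error loss L = (y - target)^2.


def combined_prediction(weights, agent_outputs):
    """Weighted sum of agent outputs: y = sum_i w_i * f_i."""
    return sum(w * f for w, f in zip(weights, agent_outputs))


def agent_prices(weights, agent_outputs, target):
    """Return price_i = -dL/dw_i for L = (y - target)^2.

    Since dL/dw_i = 2 * (y - target) * f_i, a positive price means that
    raising agent i's weight would lower the loss (a "buy" signal).
    """
    residual = combined_prediction(weights, agent_outputs) - target
    return [-2.0 * residual * f for f in agent_outputs]


def step(weights, agent_outputs, target, lr=0.1):
    """One gradient step: shift weight toward agents with positive price."""
    prices = agent_prices(weights, agent_outputs, target)
    return [w + lr * p for w, p in zip(weights, prices)]
```

with weights [0.5, 0.5], outputs [2.0, -1.0] and target 1.5, `agent_prices` returns [4.0, -2.0]: agent 0 is underweighted, agent 1 overweighted, and one `step` moves the combined prediction onto the target.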
hybrid setup where search is only called when necessary
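a minimal sketch of the hybrid idea, assuming (not from the notes) that "necessary" means the cheap learned policy is not confident, measured here as its top action probability falling below a fixed threshold. `policy`, `search` and `threshold` are hypothetical placeholders for whatever model and search procedure get used.

```python
from typing import Any, Callable, Dict, Tuple

Action = str


def hybrid_act(
    state: Any,
    policy: Callable[[Any], Dict[Action, float]],
    search: Callable[[Any], Action],
    threshold: float = 0.8,
) -> Tuple[Action, bool]:
    """Return (action, used_search).

    Hypothetical gating rule: trust the fast policy when its most likely
    action has probability >= threshold, otherwise call the expensive search.
    """
    probs = policy(state)
    best_action, best_prob = max(probs.items(), key=lambda kv: kv[1])
    if best_prob >= threshold:
        return best_action, False  # confident: skip search
    return search(state), True  # unsure: fall back to search
```

the threshold trades compute for accuracy: raising it calls search more often. a learned gate or a value-uncertainty estimate would be natural replacements for the max-probability rule.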
emulate markets, but via reward estimation instead of losing bets outright. try this with o1-style reasoning models first. this should also make the comparison between markets and backprop clearer, esp. how markets allow arbitrary jumps where backprop only takes local gradient steps
decentralized framework