This video discusses the potential for an "intelligence explosion" in AI, focusing on self-improving AI and the implications of DeepMind's AlphaZero. Wes Roth interviews Demis Hassabis, co-founder of DeepMind, to explore the convergence of different AI approaches and the possibility of rapid advances in AI capabilities.
The transcript doesn't specify which evolutionary programming techniques were used in DeepMind's work; it only mentions that they were paired with the latest foundation models.
AlphaGo was trained on human game data, learning to replicate human strategies. AlphaGo Zero, by contrast, started from a blank slate and learned entirely by playing against itself (self-play), surpassing AlphaGo's performance within 36 hours and beating it 100-0 by the 72-hour mark.
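To make the self-play idea concrete, here is a minimal sketch using a toy game of Nim instead of Go. The game, hyperparameters, and tabular value update are illustrative assumptions, not DeepMind's actual method; AlphaGo Zero pairs deep neural networks with Monte Carlo tree search.

```python
import random

# A minimal self-play sketch on Nim: take 1-3 stones per turn; whoever
# takes the last stone wins. The agent plays both sides and updates a
# value table toward the final game outcome (a Monte Carlo target).

PILE = 15       # starting pile size (toy setting)
EPSILON = 0.2   # exploration rate
ALPHA = 0.1     # learning rate

# value[(pile, move)] = estimated win probability for the player making `move`
value = {}

def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def choose(pile):
    moves = legal_moves(pile)
    if random.random() < EPSILON:
        return random.choice(moves)
    return max(moves, key=lambda m: value.get((pile, m), 0.5))

for episode in range(50_000):
    pile, history, player = PILE, [], 0
    while pile > 0:
        move = choose(pile)
        history.append((player, pile, move))
        pile -= move
        player ^= 1
    winner = history[-1][0]  # whoever took the last stone wins
    # Update every decision made during the game toward the final outcome.
    for who, state, move in history:
        target = 1.0 if who == winner else 0.0
        old = value.get((state, move), 0.5)
        value[(state, move)] = old + ALPHA * (target - old)

# The known optimal strategy is to leave the opponent a multiple of 4,
# i.e. take (pile % 4) stones whenever that is a legal move. After
# training, the greedy policy should recover this without any human data.
print({p: max(legal_moves(p), key=lambda m: value.get((p, m), 0.5))
       for p in range(1, PILE + 1)})
```

The key point mirrored from the video: no human games are involved; the only training signal is who won, which is exactly the blank-slate setup that let AlphaGo Zero overtake its human-trained predecessor.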
The "Absolute Reasoner" approach uses two models: a "proposer" that suggests coding problems or questions, and a "solver" that attempts to solve them. This creates a self-play loop where the solver's improvements inform the proposer to create more challenging problems, leading to iterative self-improvement in coding abilities.
The video highlights a shift in AI training. Initially, the focus was on massive pre-training compute; the speaker suggests the next major wave will be a large increase in reinforcement-learning (RL) compute. In the chart presented, the projected RL compute dwarfs pre-training compute, signaling a prioritization of reinforcement learning for improving AI performance. This is expected to drive substantial progress, especially in areas like coding, where self-improvement through reinforcement learning has already shown promise.