AI in the Tokamak: Controlling Fusion Plasma
Reinforcement learning now drives magnetic plasma control on real tokamaks. What DeepMind and TCV proved, and where ML sits on the path to fusion.
A tokamak confines a fusion plasma — a churning, hundred-million-degree fluid of ionized hydrogen — inside a magnetic cage. The cage is not static. The plasma is unstable on millisecond timescales, and holding its shape, position, and current demands continuous feedback to dozens of electromagnet coils. For decades that feedback was hand-engineered: each coil got its own controller, painstakingly tuned by plasma physicists for one specific configuration. In 2022, a team from DeepMind and the Swiss Plasma Center at EPFL showed that a single neural network, trained with reinforcement learning, could do the job end to end. It is one of the more convincing demonstrations that learned control can run a hard, fast, expensive physical system without falling apart.

Why plasma control is genuinely hard
Start with the timescales. A tokamak plasma drifts and deforms in milliseconds. The Tokamak à Configuration Variable (TCV) in Lausanne, the testbed for the DeepMind work, runs its magnetic control loop at 10 kHz: a fresh set of voltage commands to every power supply every 100 microseconds. There is no room to pause and think. A controller that misses its deadline does not degrade gracefully; the plasma touches the wall, the discharge terminates, and you have wasted a shot that took hours to set up.
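To make the budget concrete, here is a minimal sketch of the arithmetic and of a per-step deadline check. The `run_step` helper and its `clock` parameter are hypothetical names introduced for illustration; a real plant control system runs on dedicated real-time hardware, not in Python.

```python
import time

CONTROL_RATE_HZ = 10_000            # loop rate quoted in the text
PERIOD_S = 1.0 / CONTROL_RATE_HZ    # 1e-4 s, i.e. 100 microseconds per step


def run_step(controller, measurements, budget_s=PERIOD_S, clock=time.perf_counter):
    """Run one control step and report whether it finished inside the budget.

    `controller` is any callable mapping measurements to commands.
    `clock` is injectable so the check can be exercised with a fake timer.
    """
    start = clock()
    command = controller(measurements)
    elapsed = clock() - start
    return command, elapsed, elapsed <= budget_s


if __name__ == "__main__":
    # A trivial stand-in controller: one zero command per input value.
    command, elapsed, on_time = run_step(lambda m: [0.0] * len(m), [0.1, 0.2])
    print(f"budget {PERIOD_S * 1e6:.0f} us, used {elapsed * 1e6:.1f} us, on time: {on_time}")
```

The check only detects an overrun after the fact. What happens next (hold the last command, hand over to a backup controller, terminate the shot) is a plant-specific decision that this sketch leaves out.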
Then there is the coupling. TCV has 19 magnetic control coils, and they do not act independently. Push current into one and the field everywhere shifts. The plasma’s own current redistributes the field again. Shape, vertical position, and plasma current are entangled, and a vertically elongated plasma is actively unstable — left alone, it accelerates into the wall. Classical control handles this with a stack of single-input controllers plus an outer estimator that reconstructs the plasma boundary from magnetic sensors, all linearized around a nominal operating point. It works, but every new plasma shape is a new tuning project, and the linearization breaks down exactly where the interesting physics lives: highly shaped, high-performance configurations near the stability limit.
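The vertical instability is easy to see in a toy model. The sketch below is a deliberately crude one-variable stand-in, not a TCV model: it assumes the vertical displacement obeys dz/dt = growth_rate * z + u, with a hypothetical growth rate of 200 per second and a proportional feedback u = -gain * z applied at the 100-microsecond step from the text. It ignores the coil coupling entirely.

```python
def simulate_vertical_drift(gain, growth_rate=200.0, z0=1e-3, dt=1e-4, steps=200):
    """Integrate dz/dt = growth_rate * z + u with feedback u = -gain * z.

    All numbers are illustrative assumptions:
      growth_rate  open-loop instability growth rate (1/s)
      z0           initial vertical displacement (m)
      dt           control period (s), 100 microseconds
      steps        number of control periods to simulate (200 steps = 20 ms)
    Returns the displacement after the final step.
    """
    z = z0
    for _ in range(steps):
        u = -gain * z                      # proportional feedback on position
        z += dt * (growth_rate * z + u)    # explicit Euler step
    return z


if __name__ == "__main__":
    print("no feedback:  ", simulate_vertical_drift(gain=0.0))
    print("with feedback:", simulate_vertical_drift(gain=400.0))
```

With the gain at zero the displacement grows roughly fifty-fold in 20 ms; with a gain above the growth rate it decays. The real problem is harder because the equivalent of `gain` is a matrix acting on 19 coupled coils and the plasma state is only seen through noisy sensors.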
This is the part engineers underestimate. The control problem is not “keep a blob centered.” It is a high-dimensional, nonlinear, safety-critical control task with hard real-time deadlines, partial observability through noisy magnetic sensors, and a plant you cannot afford to crash while learning.
What reinforcement learning actually changed
The DeepMind and EPFL approach, published in Nature as “Magnetic control of tokamak plasmas through deep reinforcement learning,” reframed the whole stack. Instead of a controller per coil plus a separate shape estimator, a single neural network maps raw magnetic measurements directly to voltage commands for all coils at once. The policy was trained entirely in simulation using a free-boundary plasma model, then deployed on TCV without further tuning — the agent learned the coupled dynamics that engineers normally encode by hand.
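Structurally, "one network from measurements to coil voltages" looks like the following sketch. It is not the published architecture: the layer sizes, the random weights, and the `N_MEASUREMENTS` value are placeholders chosen for illustration, and only the 19-coil output width comes from the text.

```python
import math
import random

N_COILS = 19           # from the text: TCV's magnetic control coils
N_MEASUREMENTS = 32    # placeholder input width, not the real sensor count


def init_layer(n_in, n_out, rng):
    """Create one dense layer with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Apply one dense layer followed by tanh."""
    weights, biases = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def policy(layers, measurements, max_voltage=1.0):
    """Map a measurement vector to one bounded voltage command per coil."""
    h = measurements
    for layer in layers:
        h = dense(layer, h)
    # The final tanh keeps every command inside [-max_voltage, max_voltage].
    return [max_voltage * v for v in h]


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [init_layer(N_MEASUREMENTS, 64, rng), init_layer(64, N_COILS, rng)]
    commands = policy(layers, [0.0] * N_MEASUREMENTS)
    print(len(commands), "voltage commands")
```

The property that matters for real-time use is visible even here: the forward pass is a fixed sequence of multiply-adds with no data-dependent branching, so its cost per step is predictable.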
The headline is not just that it worked, but the range of what it controlled. The trained policies produced and held elongated and conventional shapes, as well as advanced configurations, including negative triangularity and ‘snowflake’ divertor geometries, that are awkward for classical controllers. In one run the agent maintained two separate plasmas (“droplets”) inside the vessel simultaneously, a configuration nobody builds a hand-tuned controller for because it is purely a research curiosity. The point of that demonstration is generality: the same training pipeline produces a controller for a new target shape by changing the reward, not by re-deriving a control law.
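A sketch of what "changing the reward" can mean in practice: the target shape is just data handed to a reward function. Everything here is an illustrative assumption rather than the reward used in the paper, including the simple boundary parametrization, the geometry values, and the exponential scoring.

```python
import math


def boundary_points(r0=0.88, a=0.25, kappa=1.4, delta=0.0, n=32):
    """Sample a simple shaped boundary in the (R, Z) plane.

    r0, a   major and minor radius in metres (illustrative values)
    kappa   elongation
    delta   triangularity; a negative value gives negative triangularity
    """
    points = []
    for i in range(n):
        theta = 2.0 * math.pi * i / n
        r = r0 + a * math.cos(theta + delta * math.sin(theta))
        z = kappa * a * math.sin(theta)
        points.append((r, z))
    return points


def shape_reward(boundary, target, scale=0.05):
    """Map the mean point-to-point boundary error (m) to a reward in (0, 1]."""
    if len(boundary) != len(target):
        raise ValueError("boundary and target must have the same number of points")
    error = sum(math.dist(p, q) for p, q in zip(boundary, target)) / len(target)
    return math.exp(-error / scale)


if __name__ == "__main__":
    achieved = boundary_points(delta=0.3)
    # Retargeting the task means swapping the target, nothing else.
    for name, target in [("positive", boundary_points(delta=0.3)),
                         ("negative", boundary_points(delta=-0.3))]:
        print(name, "triangularity target:", shape_reward(achieved, target))
```

The training loop and the policy code stay untouched when the target changes. A production reward would also carry terms for plasma current, coil limits, and safety margins.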
Two things make this more than a stunt. First, the agent ran at the real 10 kHz control rate on real hardware, which means the network is small and the inference path is deterministic enough to hit a 100-microsecond budget. Second, training happened in a simulator and transferred to a physical device on the first try: the sim-to-real gap, the thing that kills most learned-control demos, was closed well enough to control an actual fusion plasma.
The follow-up nobody talks about
The 2022 result was a proof of capability with real weaknesses: steady-state shape errors, current drift, and long training times. The follow-up, “Towards practical reinforcement learning for tokamak magnetic control” (Tracey et al., 2024), is the one that matters for anyone thinking about deployment. It reported up to a 65% improvement in shape accuracy in simulation, reduced the long-term bias in plasma current, and cut the training time needed to learn a new task by a factor of three or more. That trajectory, from “it works” to “it works accurately and trains fast enough to iterate,” is the difference between a paper and a tool.
DeepMind also open-sourced TORAX in 2024, a differentiable tokamak transport simulator written in JAX. A fast, differentiable physics model is the unglamorous foundation here: it is what lets you train and validate controllers, run gradient-based optimization through the plasma dynamics, and test policies against thousands of scenarios before risking a real shot. The lesson generalizes well past fusion — a good simulator is usually worth more than a clever model.
It is worth dwelling on why differentiability matters so much. A conventional simulator gives you a forward map: feed in coil voltages, get out a plasma evolution. A differentiable one gives you the gradient as well (how the outcome changes with every input), which means you can optimize the controller directly through the physics instead of treating the simulator as an opaque oracle you can only query. That turns controller design from blind search into gradient descent, which typically needs far fewer simulator runs to make progress. When people ask why reinforcement learning suddenly works on a sixty-year-old machine, the honest answer is usually that the simulator and the tooling around it finally got good enough, not that the learning algorithm is magic.
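The idea can be shown on a toy plant without any autodiff library. The sketch below assumes a one-variable unstable system, z[t+1] = (1 + dt * (growth_rate - gain)) * z[t], and carries the sensitivity dz/dgain alongside the state, so a single rollout returns both the loss and its exact gradient. This hand-written bookkeeping stands in for what a framework such as JAX derives automatically; it is not TORAX code, and every constant is an illustrative assumption.

```python
def loss_and_grad(gain, growth_rate=200.0, dt=1e-4, steps=50, effort_weight=1e-6):
    """Roll out the toy plant; return (loss, d loss / d gain).

    loss = sum over steps of z_next**2 + effort_weight * (gain * z)**2
    """
    z, dz = 1.0, 0.0            # normalized displacement and its sensitivity
    loss, grad = 0.0, 0.0
    for _ in range(steps):
        # Control effort for u = -gain * z, differentiated by the product rule.
        loss += effort_weight * (gain * z) ** 2
        grad += effort_weight * (2.0 * gain * z * z + 2.0 * gain * gain * z * dz)
        # State update and its derivative with respect to gain.
        a = 1.0 + dt * (growth_rate - gain)
        z, dz = a * z, a * dz - dt * z
        # Tracking error on the new state.
        loss += z * z
        grad += 2.0 * z * dz
    return loss, grad


def tune_gain(gain=0.0, learning_rate=50.0, iterations=200):
    """Plain gradient descent on the gain, using the gradient from the rollout."""
    for _ in range(iterations):
        _, grad = loss_and_grad(gain)
        gain -= learning_rate * grad
    return gain


if __name__ == "__main__":
    tuned = tune_gain()
    print("loss at gain 0:    ", loss_and_grad(0.0)[0])
    print("loss at tuned gain:", loss_and_grad(tuned)[0])
```

A black-box optimizer would have to estimate the same slope by running extra rollouts at perturbed gains. Here one parameter makes that cheap; with a policy of thousands of parameters, getting the full gradient from one pass is the difference that matters.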

Where this sits on the path to fusion power
Honesty matters here, because fusion attracts hype the way few fields do. Learned magnetic control does not make net-energy fusion happen. The hard problems — sustained confinement, plasma-facing materials that survive neutron flux, tritium breeding, and the basic question of getting more energy out than you put in — are physics and engineering problems that no controller solves. What AI control changes is the cost and speed of experimentation, and the reachability of high-performance regimes.
Concretely, three contributions are real and bounded. It compresses the controller design cycle: where a new plasma scenario once meant weeks of expert tuning, a trained agent can target it by adjusting a reward and retraining. It reaches configurations that are hard to stabilize classically, including the strongly shaped, near-limit plasmas that next-generation devices need to run. And it points toward integrated control — handling shape, current, heating, and instability suppression in one policy rather than a federation of loops that fight each other.
There is also a sharper near-term target than steady-state shaping: event avoidance. Disruptions — sudden, total losses of plasma confinement — are the single most dangerous failure mode for large tokamaks like ITER, where the mechanical and thermal loads of an uncontrolled disruption are severe. Predicting and steering away from a disruption in real time, from noisy sensor streams, on a sub-millisecond budget, is exactly the shape of problem reinforcement learning and online inference are suited to. Several groups are pursuing it. It is not solved, and the consequences of a false negative are real, which keeps the field appropriately conservative about putting a black box in the safety loop.
It is also worth being clear about who this serves. The DeepMind and EPFL work was done on a public research tokamak, but the wider fusion field has shifted. A wave of privately funded efforts — Commonwealth Fusion Systems building the SPARC device around high-temperature superconducting magnets, TAE, Tokamak Energy, and others — is pushing toward demonstration machines on aggressive timelines, and every one of them faces the same control problem at higher field strengths and tighter tolerances. Stronger magnets mean faster instabilities and less margin for error, which makes fast, adaptable, learned control more valuable, not less. The research-to-industry handoff here is unusually direct: a control technique proven on a university tokamak maps cleanly onto a commercial device chasing net energy, because the underlying physics of holding a plasma is the same regardless of who funds the reactor.
The engineering reality, not the press release
For anyone building hard real-time AI systems outside fusion, the tokamak work is a useful reference precisely because the constraints are uncompromising. The model has to be small enough to run deterministically inside a fixed control budget. The simulator has to be accurate enough that sim-to-real transfer survives contact with a real plant. The reward has to encode safety, not just performance, because the cost of a crash is measured in damaged hardware. And critically, the learned controller does not run unsupervised: it sits inside a conventional protection system that will take over the instant the plasma strays outside a safe envelope. That layered architecture, a learned policy wrapped in deterministic guardrails, is the right pattern for almost any high-stakes operational automation problem, from grid control to industrial process loops to the data pipelines and data platforms that feed them.
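The layered pattern can be sketched in a few lines. The class and field names below (`SafeEnvelope`, `GuardedController`, the specific limits) are hypothetical and say nothing about how TCV's protection system is actually built; the sketch only shows the shape of the idea: a deterministic check in front of the learned policy, a clamp on its output, and a latched handover to a conventional fallback.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Controller = Callable[[Sequence[float]], List[float]]


@dataclass(frozen=True)
class SafeEnvelope:
    max_abs_z: float         # hypothetical limit on vertical displacement (m)
    max_abs_voltage: float   # hypothetical per-coil voltage limit (V)


class GuardedController:
    """Wrap a learned policy in deterministic checks."""

    def __init__(self, policy: Controller, fallback: Controller, envelope: SafeEnvelope):
        self.policy = policy
        self.fallback = fallback
        self.envelope = envelope
        self.tripped = False

    def step(self, z: float, measurements: Sequence[float]) -> List[float]:
        # Once the state leaves the envelope, latch onto the fallback:
        # the learned policy does not get control back during this shot.
        if self.tripped or abs(z) > self.envelope.max_abs_z:
            self.tripped = True
            return self.fallback(measurements)
        limit = self.envelope.max_abs_voltage
        # Clamp whatever the policy asks for to the allowed voltage range.
        return [max(-limit, min(limit, v)) for v in self.policy(measurements)]


if __name__ == "__main__":
    guarded = GuardedController(
        policy=lambda m: [1000.0, -5.0],    # stand-in for a learned policy
        fallback=lambda m: [0.0, 0.0],      # stand-in for a conventional controller
        envelope=SafeEnvelope(max_abs_z=0.05, max_abs_voltage=100.0),
    )
    print(guarded.step(z=0.0, measurements=[]))   # policy output, clamped
    print(guarded.step(z=0.10, measurements=[]))  # outside envelope: fallback
    print(guarded.step(z=0.0, measurements=[]))   # still fallback (latched)
```

The guard is simple enough to verify by inspection, which is the point of the design: the part that protects the hardware never depends on the network being right.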
The TCV results are not a claim that fusion is around the corner. They are evidence that one of the field’s stubborn engineering bottlenecks — controlling a violently unstable plasma across a wide range of shapes — yields to learned control, fast, and on real hardware. That is a narrow, verified, genuinely useful result. In a domain full of inflated timelines, narrow and verified is the kind worth paying attention to.