Great deep dive. Really enjoy the read.
What's funny is the Doom setup accidentally makes the case for RL over biological computing. If the RL algorithm is doing most of the heavy lifting and the brain cells are basically a noisy controller in the middle, the real takeaway isn't "brain cells can play Doom." It's that RL is robust enough to learn even when forced through a stochastic, non-stationary interface. Actually impressive, just not in the way the headlines want it to be.
The Pong learning signal is elegant though. Using unpredictability itself as punishment rather than an explicit reward is a beautiful minimal design.
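The closed-loop idea is simple enough to caricature in a few lines. This is only a toy sketch of the "structured feedback on success, noise on failure" scheme described here, not the actual DishBrain stimulation protocol:

```python
import random

def feedback(hit_ball: bool) -> list[float]:
    """Toy sketch: success yields a predictable stimulus pattern,
    failure yields fresh random noise with no stable structure
    for plasticity to latch onto."""
    if hit_ball:
        # Predictable: the exact same pattern every time.
        return [1.0, 0.0, 1.0, 0.0]
    # Unpredictable: new random values on every miss.
    return [random.random() for _ in range(4)]
```

The "punishment" is not a negative reward value at all; it is the absence of predictability itself.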
Thanks a lot for this very interesting read!
As someone working on this very topic in industry, I'm very happy to see such a grounded and well-informed take. I think you point out the weaknesses in the methodology and the overblown claims quite effectively, while still offering some insight into the real value hidden underneath.
To add some further context: there is actually a lot of proper research being done in this field, without any sensationalist claims, backed instead by rigorous, data-driven analysis. Good examples include publications such as "Brain organoid reservoir computing for artificial intelligence" (Guo et al., 2023) and "Goal-directed learning in cortical organoids" (Robbins et al., 2026). The latter is a very recent publication in which the cells tackle a pole-balancing task. In many ways it improves on the Pong paper: the authors separate the execution of the task from the training in time, eliminating the risk that the stimuli themselves drive the gameplay rather than the cells, and they use a dedicated preprocessing pipeline to identify where the most active cells are located, even mapping the interconnectivity of the cells to select the most promising targets for encoding and decoding, instead of just randomly selecting some electrodes on the array and relying on the free-energy principle to do its job.
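For readers unfamiliar with the reservoir-computing framing used in work like the Guo et al. paper: the core trick is to treat a fixed, untrained dynamical system (there, the organoid; below, a random recurrent network) as a feature generator, and train only a linear readout. This is a generic echo state network sketch, not the organoid papers' actual setup:

```python
import numpy as np

# Minimal echo state network: a fixed random recurrent "reservoir"
# transforms the input stream; only the linear readout is trained.
rng = np.random.default_rng(0)
n_in, n_res = 1, 100

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # spectral radius < 1 for stability

def run_reservoir(inputs):
    """Collect reservoir states for a 1-D input sequence."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ np.array([u]) + W @ x)
        states.append(x.copy())
    return np.array(states)

# Train a ridge-regression readout to predict the next input value.
u = np.sin(np.linspace(0, 20, 200))
S = run_reservoir(u[:-1])          # reservoir states
y = u[1:]                          # next-step targets
W_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(n_res), S.T @ y)
pred = S @ W_out
```

The appeal for biological substrates is exactly that the reservoir is never trained: you only need to read activity out and fit a cheap linear model on top.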
Finally, I want to make a point about where this research will actually generate value in the near future. While the whole "biocomputing" topic is super flashy and attracts a lot of funding from VCs and AI start-ups who see it as the next step for computing, I'm very doubtful it will lead to any meaningful results, at least in the near term. The real value lies elsewhere, namely in gaining a better understanding of how our brains process information and how neurodegenerative diseases affect that capability. Essentially, you can use these learning tasks to compare the performance of healthy neurons with diseased ones, and then test which compounds can effectively restore their normal behavior. This has huge implications for the development of new treatments, while also effectively eliminating the need for animal trials.
Hey, thanks so much for the detailed response and the citations, I'll check those out! Glad to hear from someone working in the area. Your skepticism about biocomputing makes sense to me: there seems to be a huge gulf between where we are now and where it would need to be to be useful. The more scientific/medical questions make a lot more sense!
Very interesting, I'd hoped someone would write a more nuanced take on this ever since I saw it in the news.
I have a question. Your explanation of "punishment" makes intuitive sense, but does it actually work? I.e., is it a standard reinforcement method, and could some other reinforcement method produce a better result than "slightly better than chance" on this task?
Yeah, I had the same thought: maybe the limiting factor here is the reinforcement method, and if they found a stronger "learning signal" it would be able to learn better. After all, the neurons composing our brains learn much more robustly than this, so it must be possible somehow. I don't know what that would look like with this kind of chip, though!
"But they also face big disadvantages: they have to learn in “real-time”, when computerized algorithms can learn incredibly quickly (the DotA 2 agents mentioned above played the equivalent of hundreds of years of the game, made possible by the fact that they can play however fast the computer can run them)."
Can you explain this? Is that because "real-time" is based (to us) on neural latency?
Yeah, since the brain cells need to undergo real, physical changes, increases in computational speed can't make them learn any faster; you're stuck with neural latency, as you say. They can only learn so fast. A computer agent, by contrast, can chug through samples and update the weights in the model as fast as the computer can run.
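The scale of that advantage is easy to underestimate, because simulated agents also run many game copies in parallel. A bit of toy arithmetic (the numbers below are illustrative, not taken from the DotA 2 work) makes the gap concrete:

```python
def simulated_years(wall_clock_days: float, copies: int, speedup: float) -> float:
    """Simulated years of experience accumulated by `copies` parallel
    games, each running at `speedup` times real time, over the given
    wall-clock training period."""
    return wall_clock_days * copies * speedup / 365.0

# e.g. 10 days of training, 1000 parallel games, each at 10x real speed:
years = simulated_years(10, 1000, 10)  # hundreds of simulated years
```

A dish of neurons, pinned to one copy at 1x speed, accumulates about 10 days of experience in the same period.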
"If you were hoping to be told super-intelligent biological based computer learning systems are here, you’re probably disappointed."
I was not. I am not.
So interesting and presented in a way that made it understandable and thought provoking. Thank you.
Ditto!
Still quite balanced, Tommy. I think you've done a great job all around. I know what you mean with 'my original intention was x, but boy, when I started thinking hard I got y'… :)
Thanks, Mike!
This is so freaking cool compared to the ANN grad work I did in '91-'93 at Penn State's Applied Research Lab (Navy). You absolutely outclass me on all the neuroscience. My background is Elec Eng! (yikes!). But I absolutely LOVE this stuff!!!
"When stimulation becomes random, the relationships between neurons stop being consistent, so the plasticity rules have nothing stable to reinforce. Because of this, the network will drift toward configurations of activity that produce more structured input ..." But it is precisely when the random and unpredictable occurs that the human gains experience and intuition, which allows them to creatively learn and become better at a game, even finding new ways of beating their opponent. We do not operate in closed loops. James's human interaction with an indeterminate, messy, buzzing world is precisely why we have become, for a time, the apex predator. And our interactions with Whitehead's dipolar God are my only hope that we might eventually inhabit a new heaven and earth. My own physical experience of learning how to put spin on a ping pong ball comes to mind: I stumbled backward and still tried to hit an unusually fast ping pong ball by swiping up in order not to miss it.
Fascinating!
Wow!
Great essay! I still don't understand the bit about the free energy principle and your explanation of it, though. That seemed like the most interesting part of the essay.
We're going to look back at this in 20 years and...