Did Brain Cells on a Chip Really Learn to Play Doom?
Getting cells on a plate to play Pong, the free energy principle, and biological computing
You may have seen the headline: “Human brain cells on a chip learned to play Doom in a week”. I saw it and was immediately curious: how does that work?
The headline evokes an image of a chip with brain cells being hooked up to a camera fixed on a game of Doom, a controller, and reasonable gameplay being produced. I knew this couldn’t be what was happening. Unfortunately, the linked article is very sparse on details. To learn more, you really have to do a deep dive. So I did a deep dive.
Parts of this article are going to get into nitty gritty details that I think are neat, and you might too if you want to understand how to get brain cells on a chip to do anything. But let me give the high-level view here: the company Cortical Labs has built chips with human (and mouse) brain cells on them that can be interacted with through electrical input/output. A couple of years ago they published a scientific article about one of these chips playing Pong as a proof of concept. This Doom project builds on that. That much is all true.
What is talked about less is how well the chips played Pong, or how they got the chip to play Doom. For Pong, the cells were given a very simple “game state” telling them whether the ball was currently above or below the paddle, and they simply had to map that onto the action of moving the paddle up or down. The brain cells learned to play slightly better than a blindfolded monkey mashing a controller would.
For Doom, the story is a bit more complicated: a reinforcement learning algorithm—a form of AI—learned to control Doom by passing simple stimulation through a network of brain cells that controlled the action. It didn’t play well, and it’s unclear how much the network of brain cells contributed versus the AI (more on that below). Though it seems the brain cells learned something, it’s possible the AI would have played the game better without the brain cells in the loop at all.
This all sounds a bit dismissive, but let me be clear: there is cool science here, and there is interesting potential future technology. But the way this has been presented in the media has been so high-level that it paints an extremely misleading picture of what was actually going on.
With that broad picture painted, let’s get into the details.
What even is a neuron-powered computer chip?
Almost all of the details of what Cortical Labs actually did come from their earlier paper where they had their brain chips play Pong. Most of this applies to the Doom project as well, but that builds on the previous Pong work, so let’s talk Pong first.
The published paper on the Pong project is actually a pretty interesting paper, and it’s published in Neuron, one of the top neuroscience journals. This is real science published in a mainstream, high-impact journal.
The first question is: what does it mean to have a chip with brain cells on it? I don’t know much about cell culturing, but the basic idea is pretty simple. They used two types of cells: mouse cortical cells and human cortical cells. For mice, they dissected mouse embryos and isolated neural cells. For humans, they followed a protocol known to induce the formation of human neurons from stem cells.
The brain cells are placed onto a multi-electrode array—basically, a chip that has tiny electrical contacts all over it. The electrodes are important because, later, that’s what they used to “communicate” with the cells.
Prepared in the right way (e.g. with the right chemical inhibitors), the cells branch out and form connections with each other. The result is hundreds of thousands of neurons interconnected in a network.
So once you’ve done that, you have a brain cell network on a multi-electrode array chip. Neato!
How do you get a brain chip to control a game?
Once you have a chip with a network of interconnected brain cells on it, how do you actually get them to do anything?
One problem is just getting information into and out of the network of cells. Luckily, since they were prepared on a multi-electrode array, that bit is easy: electrodes are two-way streets. You can run current into an electrode to provide information, which stimulates the nearby neurons and makes them more likely to fire. On the flip side, you can record the electrical activity of those neurons to get information out. Since neurons communicate using electricity, this isn’t dissimilar to how neurons talk to each other.
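To make the two directions concrete, here’s a toy sketch in Python. The function names and behavior here are mine, standing in for whatever the real hardware interface looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

def stimulate(electrode: int, freq_hz: float) -> None:
    """Write: drive current through one electrode, making nearby
    neurons more likely to fire. Here it just logs the call."""
    print(f"stimulating electrode {electrode} at {freq_hz:.0f} Hz")

def read_spike_count(region: str) -> int:
    """Read: count recent spikes recorded by a region's electrodes.
    Here it's random noise; on real hardware it would reflect the
    culture's actual activity."""
    return int(rng.poisson(5))
```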
The researchers designated sections of the electrode grid as the “sensory area”, where electrical stimulation would be delivered based on what was going on in the game. Other sections were designated as “motor regions”, whose electrical activity would control the actions in the game.

To get a bit more specific, the “sensory area” had 8 electrodes that were used for stimulation, so it was a bit like the network had 8 “pixels” on its display. Each electrode represented a different position of the ball relative to the paddle. The intensity of the stimulus had 5 different levels (4 Hz to 40 Hz) that indicated how far away the ball was.
To be clear, the neurons weren’t directly “seeing” the game the way we humans would. Instead, they were sort of “seeing” the game from the paddle’s point of view, at a resolution of only 8 “pixels”.
The “motor regions” had a similarly simple set-up: sum the neural activity in the two “up” regions and compare it to the activity in the two “down” regions. If there was more activity in the “up” regions, the paddle moved up; if there was more in the “down” regions, it moved down.
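Putting the two halves together, the closed loop might look something like this sketch (reusing the stand-ins from above; the exact electrode layout and the specific frequency levels here are placeholders of mine, not the paper’s):

```python
N_SENSORY = 8                         # 8 stimulation electrodes ("pixels")
FREQ_LEVELS_HZ = [4, 13, 22, 31, 40]  # 5 intensity levels spanning 4-40 Hz

def encode_game_state(position_index: int, distance_level: int) -> None:
    """Sensory side: one electrode per relative ball position, with
    stimulation frequency encoding how far away the ball is."""
    electrode = max(0, min(position_index, N_SENSORY - 1))
    level = max(0, min(distance_level, len(FREQ_LEVELS_HZ) - 1))
    stimulate(electrode, FREQ_LEVELS_HZ[level])

def decode_motor_output() -> str:
    """Motor side: compare total spiking in the two 'up' regions
    against the two 'down' regions to pick the paddle's move."""
    up = read_spike_count("up_1") + read_spike_count("up_2")
    down = read_spike_count("down_1") + read_spike_count("down_2")
    return "up" if up > down else "down"
```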
It’s worth pausing to note how simple a task this is to learn. There’s a direct relationship between the stimulus and the action you should take: if one of the “above you” electrodes is active, move up; if one of the “below you” electrodes is active, move down. This is a trivial rule for any learning algorithm, and could be hardcoded in any programming language with a single conditional statement. This really isn’t asking much of the neural network.
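To drive that home, here is the entire winning “strategy” as ordinary code:

```python
def ideal_policy(ball_above_paddle: bool) -> str:
    # The whole task collapses into a single conditional.
    return "up" if ball_above_paddle else "down"
```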
Based on the videos they show of the game, it appears the ball moves more slowly than the paddle, so it’s possible that learning this simple mapping would result in a perfect performance where the ball never gets past the paddle.

At this point, from what I’ve described, there isn’t any real playing of the game. Electrical signals are put in one end based on the state of the game, and electrical activity is read out the other end to control the game, forming a closed loop. But there’s nothing connecting performance in the game to the output of the network. To anthropomorphize a bit, the neurons have no reason to “care” what’s going on in the game.
So how do you get the network to actually play?
Training a brain network to play Pong
This is where things get interesting. There was an additional component the researchers used to get the network to learn the game: when the paddle missed the ball, it got random, unpredictable stimulation in the sensory region.
To be clear, this doesn’t hurt the neurons or the network in any way. But one way to look at this is as a sort of punishment for the network.
In traditional reinforcement learning, agents (whether living or computational) are given an environmental state, a selection of actions, and then outcomes based on the actions taken. If something good happens, it’s a reward, but you can have negative outcomes—penalties or punishments, whatever you want to call them—as well. The agent then can learn (through various algorithms) what mappings between state and action result in good outcomes.
In the Pong set-up, the environmental state is communicated via the sensory areas, and actions are selected via the motor areas. Brain cell networks are good at learning, so all that’s missing is an outcome. The unpredictable stimulation can be seen as a negative outcome that the network learns to avoid.
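In code, the learning signal amounts to something like the sketch below (again reusing the stand-ins and N_SENSORY from the earlier sketches; the burst length and frequency range are placeholders of mine, and the real protocol’s timing is more involved):

```python
import random

SENSORY_ELECTRODES = list(range(N_SENSORY))

def deliver_outcome(paddle_hit_ball: bool) -> None:
    """A miss triggers a burst of unpredictable stimulation; a hit
    leaves the network's input structured and predictable."""
    if not paddle_hit_ball:
        for _ in range(5):  # placeholder burst length
            stimulate(random.choice(SENSORY_ELECTRODES),
                      random.uniform(1.0, 150.0))  # random site, random rate
```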
But why would unpredictable stimulation be negative?
The paper extensively cites the free energy principle. Broadly put, it states that all biological systems try to minimize “free energy”, which is basically the amount of unpredicted activity the system experiences. So, the theory goes, the network will learn to avoid the random stimulation because it’s hard to predict. As long as it plays the game well (i.e. hits the ball), nothing unpredictable will happen.
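For the mathematically inclined: in its standard variational form, the free energy F of a system with observations o and an internal model q(s) over hidden states s is

```latex
F = \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
  = D_{\mathrm{KL}}\!\left[q(s) \,\|\, p(s \mid o)\right] - \ln p(o)
  \;\geq\; -\ln p(o)
```

Since the KL divergence is non-negative, F is an upper bound on the “surprise” −ln p(o), so a system that minimizes free energy is minimizing (a bound on) how surprising its inputs are.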
The free energy principle is often given as a unifying framework for thinking about biological systems. I’ll be honest: I don’t love this way of thinking about it; it’s a bit too abstract. How, concretely, does the network learn to reduce free energy?
I find it easier to think in terms of simpler learning rules. Synaptic plasticity tends to strengthen consistent correlations in neural activity—this is how the brain finds structure in our environment (including the environment of our brain). Connections are reinforced when input and output patterns repeatedly occur together. When stimulation becomes random, the relationships between neurons stop being consistent, so the plasticity rules have nothing stable to reinforce. Because of this, the network will drift toward configurations of activity that produce more structured input.
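Here’s a toy illustration of that point. It’s not a model of the actual cultures, just the statistics of a Hebbian-style update (“neurons that fire together wire together”):

```python
import numpy as np

rng = np.random.default_rng(0)

def hebbian_weight_change(pre, post, lr=0.01):
    """Accumulate the classic Hebbian update: dw = lr * pre * post."""
    return lr * np.sum(pre * post)

steps = 10_000
pre = rng.choice([-1.0, 1.0], size=steps)

correlated_post = pre                              # output tracks input exactly
random_post = rng.choice([-1.0, 1.0], size=steps)  # output unrelated to input

print(hebbian_weight_change(pre, correlated_post))  # ~100: grows steadily
print(hebbian_weight_change(pre, random_post))      # ~0: nothing to reinforce
```

With structured input the weight change accumulates; with random input the updates cancel out, and the connection has nothing stable to latch onto.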
This is all saying the same thing as “reducing uncertainty” while gesturing at the sorts of mechanisms that accomplish it, but I find this framing easier to wrap my head around.
Regardless of how you think about it, the important point is that random stimulation can be used as a learning signal for the network. This is super cool! We can teach a network of biological neurons to do an arbitrarily selected task using this learning signal!
How well can it actually play?
The details up to here are, in my opinion, pretty impressive. It’s interesting that we can grow a network of brain cells on a microelectrode array, and teach them using a learning signal. Now we get to the disappointing bit: How well did it play Pong?
Not great.
For a scientific paper like this, you’re generally looking for some statistically significant effect that proves whatever is going on isn’t just chance. In this case, what they were trying to prove was that the brain cells learned. Not learned to perform well, just learned at least a little. And that they did—across a bunch of measures, they learned to play better over time and better than various “controls” (chips set up without learning).
But none of that means they played Pong well. And note that this is a simplified version of Pong, where the ball’s movement is slow and easily predictable.
One measure of performance is aces: how often the ball goes right past the paddle without a single hit. Just because of the size of the paddle, even the control set-ups that can’t learn the game hit the ball back almost half the time, so the rate of aces by chance is about 50-55%. After learning, the chips with brain cells (mouse or human cortical cells, “MCC” or “HCC” in the paper) decrease this to… about 48-50%. Again, this is statistically significant, so there’s something going on. But this isn’t impressive Pong playing.

Other measures of performance show a similar pattern—small but statistically significant performance increases. Average Rally Length (the average number of times the paddle hits the ball before letting it through) goes from 0.7-0.9 to 1.0-1.1. In other words, the brain chips are on average managing to hit the ball about once before it goes past the paddle. Another measure is the percentage of “Long-Rallies”, where they manage to hit the ball 3 times in a row before it goes past. By chance this measure is around 4%-10%. After learning the brain chips were at 10%-12%.
These are statistically significant effects, but on an objective level, this is not impressive play, especially given how simple the task is: as I mentioned above, learning the simple input-to-output mapping would likely result in perfect play.
In the promotional videos showing off the brain chip playing Pong, they show long rallies and impressive-seeming responsiveness. The stats reported in the paper tell a very different story: those long rallies happen infrequently, and you need statistical analyses to tell that the play is any better than chance. But when you have a lot of games to pick from, you can find a few rallies where the performance looks good for a promotional video.
What about Doom?
Okay, so far I’ve been talking about Pong. What about the recent popular press articles about Doom?
There isn’t a published academic article about the Doom project. Beyond the promotional video, there is some documentation, but nothing peer-reviewed, and there are basically no details about the performance beyond a few anecdotes. There are, however, details on the implementation.
If you followed the above about Pong, you probably have a big question about Doom: there were only 8 input channels for Pong, corresponding to the location of the ball relative to the paddle. How do you encode everything that’s going on in Doom into only 8 inputs, each with only a few possible values?
The answer is, you don’t. Here’s the big disappointing thing about how the chip “plays” Doom: a reinforcement learning algorithm is fed the information about the state of the game and then outputs “commands” to the neural network through the 8 channels. Unlike with Pong, there isn’t a simple mapping of what those channels mean; they’re just the “actions” the algorithm can take.
Activity in the network is read off of 7 different areas, one per possible action: move forward, back, left, or right; turn left or right; attack.
This is certainly more complex than Pong, where there are just two actions (up or down). The weird thing here is that the reinforcement learning agent is learning alongside the brain cells. Since the algorithm controls 8 input channels and the brain cells produce one of 7 actions, it’s not really clear what the point of the brain cells is. The algorithm has 8 buttons it can press to get the brain cells to press one of 7 buttons, so the brain cell network is sort of acting like a crappy, non-deterministic controller.
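Based on the documentation, the shape of the loop seems to be roughly the following. Every name here is mine, and both stand-ins are untrained placeholders for the PPO policy and the culture:

```python
import random

ACTIONS = ["forward", "back", "left", "right",
           "turn_left", "turn_right", "attack"]  # the 7 decoded actions

def rl_policy(game_state) -> int:
    """Stand-in for the PPO policy: picks one of 8 stimulation channels.
    A trained policy would choose based on the game state."""
    return random.randrange(8)

def biological_layer(channel: int) -> str:
    """Stand-in for stimulating the culture on one channel and reading
    out the most active of the 7 motor regions. In reality this mapping
    is noisy and drifts as the cells change."""
    return random.choice(ACTIONS)

def step(game_state) -> str:
    channel = rl_policy(game_state)   # the algorithm presses one of 8 buttons...
    return biological_layer(channel)  # ...the culture presses one of 7
```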
If there were just a direct mapping from the reinforcement learning algorithm to the 7 actions in the video game, could it play Doom better than the combination of algorithm plus brain cells? This, as far as I can tell, hasn’t been tested. I suspect the answer is yes. We know that reinforcement learning algorithms can learn incredibly complex games: 7 years ago, OpenAI unveiled a DotA 2 AI that could beat the world champions. DotA 2 is an extremely complicated team-based game, with many more than 7 actions available at any given time. Now, the algorithm playing Doom was given much less playtime to learn from than the DotA 2 bots, so it’s possible that, with so little data, the algorithm couldn’t learn well on its own and genuinely benefits from the brain chip. But that hasn’t been shown.
If all that’s happening here is a reinforcement learning algorithm is playing Doom with the handicap of needing to work with the brain cell chip, that’s not very exciting.
Luckily, the brain cell people have a response to this idea that the reinforcement learning algorithm is doing all the work:
Isn’t the encoder/PPO doing all the learning?
This question largely assumes that the cells are static, which is incorrect; it is not a memory-less “feed X in, get Y” machine. Both the policy and the cells are dynamical systems; biological neurons have an internal state (membrane potential, synaptic weights, adaptation currents).
The same stimulation delivered at different points in training will produce different spike patterns, because the neurons have been conditioned by prior feedback. During testing, we froze encoder weights and still observed improvements in the reward.
The first paragraph of their response is basically saying “Well, brain cells change, so really we’ve made it harder for the AI to learn”. Which is a bit like saying “Isn’t it impressive that even when we gave the reinforcement learning algorithm a kind of broken controller it still can sort of play?”
The second paragraph is a bit more interesting: the claim is that after freezing the encoder weights (basically stopping the reinforcement learning algorithm from learning any more), you still see improvement, which means the biological network is learning something. That’s good, but without any quantification it’s hard to know how well it’s learning, and it’s a far cry from saying the network is playing the game.
What would be a bit more impressive is if they showed that this set-up, with the brain cells “in the loop”, could learn better than a reinforcement learning algorithm in direct control of the actions in the game. You still wouldn’t be able to claim the network was playing Doom, but at least you could say it was contributing positively, and quantify that contribution.
We don’t have much to go off of to determine how well this set up actually played Doom, but there is this quote, from the COO/CSO of Cortical Labs, in their promotional video for the project:
Right now, the cells play a lot like a beginner who has never seen a computer.
When someone heavily incentivized to hype this project up is saying that, I think it’s fair to say the performance isn’t great.
Is there anything to this?
When I started writing this, I planned for a nice balanced perspective, cutting through the hype without completely dismissing the results. But as I wrote and dug in more, I realized just how far the hype was from reality. I don’t think we can claim the brain cells on chips are doing anything close to useful in either the Pong or the Doom case. With Pong, they are doing slightly better than a randomly hooked-up circuit. With Doom, they might be doing even worse than that, actively decreasing performance compared to a circuit hooked up randomly but deterministically.
If you were hoping to be told that super-intelligent biology-based computing systems are here, you’re probably disappointed. But I don’t mean for this to be a takedown article; there are lots of neat things in these projects, and I learned a lot by looking into them.
The fact that the chips are capable of any learning, and that there is a simple way to train them (providing random stimulation), is really cool. A lot of what Cortical Labs has done is provide a simple interface that lets programmers interact with the cell networks more easily, which is also impressive technology.
There’s real promise for these kinds of brain chips. We know that human brain cells, configured in the right way, can do pretty incredible things. Humans can play Doom! And there’s reason to think these brain cell chips can learn more efficiently (e.g. from fewer samples) than artificial neural networks, and that they’re more energy-efficient to train. But they also face big disadvantages: they have to learn in real time, whereas computerized algorithms can learn incredibly quickly (the DotA 2 agents mentioned above played the equivalent of hundreds of years of the game, made possible by the fact that they can play however fast the computer can run them).
So biological learners might have advantages in certain situations, while artificial ones that are completely computerized have advantages in others. But one thing’s for sure: this form of biological computation has a long way to go before it’s useful, while computerized learning networks are doing things now that are pretty incredible.
More practically, this kind of technology makes it possible to do research into the ways these networks learn in a systematic way. There’s scientific value, even if there doesn’t end up being any practical value to the chips themselves.
If you enjoyed this, please hit the “Like” ❤️ button, restack, or share this article to help others find it.
If you enjoy Cognitive Wonderland, consider supporting it by becoming a paid subscriber at whatever level feels comfortable for you.
If you’re a Substack writer and have been enjoying Cognitive Wonderland, consider adding it to your recommendations. I really appreciate the support.






Great deep dive. Really enjoyed the read.
What's funny is the Doom setup accidentally makes the case for RL over biological computing. If the RL algorithm is doing most of the heavy lifting and the brain cells are basically a noisy controller in the middle, the real takeaway isn't "brain cells can play Doom." It's that RL is robust enough to learn even when forced through a stochastic, non-stationary interface. Actually impressive, just not in the way the headlines want it to be.
The Pong learning signal is elegant though. Using unpredictability itself as punishment rather than an explicit reward is a beautiful minimal design.
Thanks a lot for this very interesting read!
As someone working on this very topic in industry, I'm very happy to see such a grounded and well-informed take. I think you point out the weaknesses in the methodology and the overblown claims quite effectively, while still providing some insights into the true value hidden behind.
To provide some further insight: there is actually a lot of proper research being done in this field, without any sensationalist claims, but rather with data-backed and rigorous analysis. Some good examples include publications such as "Brain organoid reservoir computing for artificial intelligence" (Guo et al., 2023) and "Goal-directed learning in cortical organoids" (Robbins et al., 2026). The latter is a very recent publication, which has cells tackle a pole-balancing task. In many ways they improve on what was done in the Pong paper: for example, by separating the execution of the task from the training in time, they eliminate the risk that the stimuli themselves drive the gameplay rather than the cells, and they use a dedicated preprocessing pipeline to identify where the most active cells are located, even mapping the interconnectivity of the cells to select the most promising targets for encoding and decoding, instead of just randomly selecting some electrodes on the array and relying on the free energy principle to do its job.
Finally, I want to make a point of where this research will actually generate value in the near future. While the whole "biocomputing" topic is super flashy and attracting a lot of funding from VC & AI start-ups who see this as the next step for computing, I'm very doubtful this will lead to any meaningful results, at least in the near future. The actual value lies elsewhere, namely in gaining a better understanding of how our brain processes information and how neurodegenerative diseases affect this capability. Basically, you can use these learning tasks to assess the performance of healthy neurons compared to diseased ones, and then test which compounds can effectively restore their normal behavior. This has huge implications for the development of new treatments, while also effectively eliminating the need for animal trials.