Vibe Physics? Not Yet
Sabine Hossenfelder tried to "vibe physics" with the popular LLMs, and she was underwhelmed. So if you're a physicist, I guess your job is safe for now. I'm not a physicist myself, and I couldn't understand half the things Sabine said. That said, she points out that the current crop of LLMs are good at literature review, and that's been my experience too. Gemini's Deep Research in particular is quite useful for getting a quick sense of what the existing literature says about a particular topic. Here's her verdict in her own words:
So, my verdict is: GPT top. Then there's a big gap. Then there's Grok, followed by Gemini 2.5, which might well be right that it's impossible. Gemini DeepThink, which really isn't worth the money, and at the very bottom, there is Claude. I think this illustrates what the current models are good for and what they're not. What they are good at: they're really good now at digging up related work and explaining it, which is good for brainstorming. What they're not yet good at: first of all, they constantly conflate similar-sounding but different physical concepts. Energy is not the same as free energy. An equation can be time reversible but not invariant under time reversal. In another thing I was working on, GPT kept confusing two different Feynman diagrams, both of which are sometimes referred to as "self-energy." And the issue is, a student you only have to tell this once: "You are confusing these things." The models bring back these mistakes over and over.
The second, related problem is that they sometimes switch notation in the middle of a reply, or just switch to a different topic. For example, the other day GPT was going on about perturbative quantum gravity for an hour, and then all of a sudden, it switched to canonical quantum gravity and then to the semiclassical approximation. And if you don't know what these things are already, you'll end up with a lot of rubbish that doesn't fit together.
The third problem, and that's the biggest issue I think, is that they don't actually develop new ideas in any sense. In the best case, they assemble reasonable-looking equations and then massage them so that they prove whatever you want them to prove, by skipping over the actual proof. The LLM idea of a new theory is a plausible-looking sequence of arguments, not an actually correct one. One way you can try to prevent this is just by asking, "Is this correct or did you just make this up?" And sure enough, most of the time, they'll just admit they made it up.
So, my verdict for the moment is it's a mixed bag. The current models are very much stuck to the existing literature, which isn't useful if you want to do something new. If you kick them enough, they will eventually agree to do anything, but then you can't trust them. For the time being, my advice would be to use them for literature research and background information. You can also get them quite effectively to criticize an idea if you specifically ask for it, but don't trust them with new ideas. By my assessment, these models are currently not anywhere near as good as a good student. So, I guess physicists' jobs are safe from AI for the time being.
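If you want to script that last tip rather than type it into a chat window each time, here is a minimal sketch of the "ask for criticism, then ask what was made up" loop. This is my own illustration, not anything Sabine ran: it assumes the OpenAI Python SDK, and the model name and prompts are placeholders.

```python
# Illustration only: a "criticize it, then ask what was made up" loop,
# assuming the OpenAI Python SDK. Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o"   # placeholder model name

idea = "Paste the derivation or idea you want checked here."

# Step 1: ask explicitly for criticism instead of agreement.
critique = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system",
         "content": "You are a skeptical referee. List conflated concepts, "
                    "notation switches, and unsupported steps."},
        {"role": "user", "content": idea},
    ],
).choices[0].message.content
print(critique)

# Step 2: the follow-up question from the quote above.
followup = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": idea},
        {"role": "assistant", "content": critique},
        {"role": "user",
         "content": "Is this correct or did you just make this up? "
                    "Point out any step you cannot actually justify."},
    ],
).choices[0].message.content
print(followup)
```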