Vibe Physics? Not Yet
Sabine Hossenfelder tried to "vibe physics" with the popular LLMs, and she was underwhelmed. So if you're a physicist, I guess your job is safe for now. I'm not a physicist myself, and I couldn't understand half the things Sabine said. That said, she points out that the current crop of LLMs are good at literature review, and that's been my experience too. Gemini's Deep Research in particular is quite useful for getting a quick sense of what the existing literature says about a particular topic. Here's her verdict in her own words:
So, my verdict is: GPT top. Then there's a big gap. Then there's Grok, followed by Gemini 2.5, which might well be right that it's impossible. Gemini DeepThink, which really isn't worth the money, and at the very bottom, there is Claude. I think this illustrates what the current models are good for and what they're not. What they are good at: they're really good now at digging up related work and explaining it, which is good for brainstorming. What they're not yet good at: first of all, they constantly conflate similar-sounding but different physical concepts. Energy is not the same as free energy. An equation can be time reversible but not invariant under time reversal. In another thing I was working on, GPT kept confusing two different Feynman diagrams, both of which are sometimes referred to as "self-energy." And the issue is, a student you only have to tell this once: "You are confusing these things." The models bring back these mistakes over and over.
The second, related problem is that they sometimes switch notation in the middle of a reply, or just switch to a different topic. For example, the other day GPT was going on about perturbative quantum gravity for an hour, and then all of a sudden, it switched to canonical quantum gravity and then to the semiclassical approximation. And if you don't know what these things are already, you'll end up with a lot of rubbish that doesn't fit together.
The third problem, and that's the biggest issue I think, is that they don't actually develop new ideas in any sense. In the best case, they assemble reasonable-looking equations and then massage them so that they prove whatever you want them to prove, by skipping over the actual proof. The LLM idea of a new theory is a plausible-looking sequence of arguments, not an actually correct one. One way you can try to prevent this is just by asking, "Is this correct or did you just make this up?" And sure enough, most of the time, they'll just admit they made it up.
So, my verdict for the moment is it's a mixed bag. The current models are very much stuck to the existing literature, which isn't useful if you want to do something new. If you kick them enough, they will eventually agree to do anything, but then you can't trust them. For the time being, my advice would be to use them for literature research and background information. You can also get them quite effectively to criticize an idea if you specifically ask for it, but don't trust them with new ideas. By my assessment, these models are currently not anywhere near as good as a good student. So, I guess physicists' jobs are safe from AI for the time being.
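If you want to script that last tip rather than type it into a chat window each time, here is a minimal sketch of the "ask for criticism, then ask what was made up" loop. This is my own illustration, not anything Sabine ran: it assumes the OpenAI Python SDK, and the model name and prompts are placeholders.

```python
# Illustration only: a "criticize it, then ask what was made up" loop,
# assuming the OpenAI Python SDK. Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o"   # placeholder model name

idea = "Paste the derivation or idea you want checked here."

# Step 1: ask explicitly for criticism instead of agreement.
critique = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system",
         "content": "You are a skeptical referee. List conflated concepts, "
                    "notation switches, and unsupported steps."},
        {"role": "user", "content": idea},
    ],
).choices[0].message.content
print(critique)

# Step 2: the follow-up question from the quote above.
followup = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": idea},
        {"role": "assistant", "content": critique},
        {"role": "user",
         "content": "Is this correct or did you just make this up? "
                    "Point out any step you cannot actually justify."},
    ],
).choices[0].message.content
print(followup)
```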