Flavors of AI
my entirely subjective review of current leading model providers
I can confidently say that I’m a power user of AI. I’m subscribed to the highest tiers of ChatGPT, Gemini, Grok, and Claude (don’t tell my CFO). I use all of them on a regular basis. For most people, they are all really good.
If you’re unsure which one to use and want a one-size-fits-all solution, I recommend Gemini. It’s fast, reliable, easy to use, and very strong at what it does, and it’s also really nice that it works so well within the Google ecosystem. But there are other use cases. So let me go through what I think each of them is good at today, as of January 2026, and what I use them for.
The process here is 100% vibes from my experience. Your mileage may vary.
Deep research
Every one of these AI tools offers some deep research functionality: all of them search the web to some degree and take a long time to produce their output.
Having used this functionality a lot, particularly with ChatGPT, Grok, and Gemini, I can confidently say that Gemini is head and shoulders above the rest. Gemini’s deep research output is almost always longer than the others’, and of course it’s easy to mistake a really long output for a good one. But when I put it head to head against the others, I find that its analysis is sharper. It’s more critical of the assumptions behind things it finds on the web.
This shows particularly in anything involving numbers. For my post the other day on getting rich as a founder or in an exit, I found that Gemini was much more critical. Even when I fed the other AIs’ analyses into Gemini, it was very good at assessing them rationally: for example, it called out mistakes and overly generous interpretations from both ChatGPT and Grok.
Winner entirely on vibes: Gemini
Trends and gossip
Okay, here’s one thing that I like to do: figure out what is the latest on something. For example, I want to know the latest rumors about the release of a new car, or what is happening inside a particular company.
For this kind of data, you really want access to a social network. What I found is that Grok, because it’s plugged in very directly to X.com, works incredibly well.
Whenever I want to know the latest on something without the AI just scanning news headlines, Grok is really, really great, and that’s what I use it for. It’s also really well made, as in it’s a really good app: easy to use, fast, and reliable. So generally, I find Grok to be quite good at this, and I recommend it specifically for that.
Winner, by my subjective standards: Grok
Programming
I have been a fan of the Codex CLI made by OpenAI. Under the hood, it uses a specialized model called GPT-5.2-Codex. I’ve found that it works incredibly well. It’s very easy to use, and if you are on the Pro plan, you get incredible usage. I use it all the time and I’ve never run into a big wall, so it seems to be reliable and to work quite well.
However, I’ve been experimenting more and more with Claude Code, and I’ve found that in certain circumstances it’s just more intentional and deliberate, solving particular problems that GPT-5.2-Codex, even at its highest compute setting, could not.
The command line interface of Claude Code is incredibly good and far ahead of Codex’s. Having said that, I use both regularly, and Claude Code has the downside of very restrictive usage limits unless you’re on a higher-cost tier. So to me, it’s a bit of a toss-up.
I’ve also used Gemini inside of Cursor, for example, and I’ve even used Cursor’s own model, composer-1, which, although weaker, is really fast. Which is nice. So I don’t really think there is one winner. I think we, the consumers, the people who are programming, are the true winners of all of this. But if I had to pick a winner today, ignoring the usage limits, I think Claude Code from Anthropic comes out on top.
Winner: Claude Code (with a strong second place to Codex by OpenAI)
Image generation
All the large AI models can generate images, but I always use Nano Banana Pro from Google, which runs in Gemini (select ‘Pro’). In my testing (which is mostly generating dinosaurs driving monster trucks - thank my son), it’s head and shoulders above the competition in initial generation.
I did find that subsequent changes to an image are occasionally a bit more difficult. For example, if you want to swap the T-rex for a Stegosaurus in an image (something I expect you all to do regularly), Nano Banana Pro can be stubborn.
Anything else
I’ve been a long-time user of ChatGPT. I have a million conversations there that the AI can refer back to. So I find myself using ChatGPT as my default for almost everything outside the more specific use cases above.
If it wasn’t for that, would I still use it as default? I’m not sure.