ChatGPT vs Claude vs Gemini: AI’s Hidden Stack Wars [2025]


Most major AI chat tools are converging on similar capabilities—but the way they’re architected under the hood is still pretty different.

🧠 ChatGPT (OpenAI)

  • Infra: Runs on Microsoft Azure, tightly coupled with OpenAI’s tools.
  • Tool Use: Native tool calling¹ (code, image, file edit, web scrape) using real execution containers.
  • File I/O: Can generate downloadable PDFs, charts, docs in-session.
  • Quirks: You’re often talking to a full-stack assistant pretending to be just a chatbot. Prompting is a gateway to actual compute pipelines.
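Native tool calling starts with a JSON schema the model can "see": the platform advertises tools, the model emits a structured call, and the runtime executes it in a real container. A minimal sketch of that schema shape (the `run_python` tool name and its parameters are illustrative, not an actual OpenAI built-in):

```python
# Illustrative tool definition in the OpenAI function-calling schema.
# The model receives this JSON and can respond with a structured call
# like {"name": "run_python", "arguments": "{\"code\": \"...\"}"},
# which the platform then executes in a sandboxed container.
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_python",  # hypothetical tool name
            "description": "Execute a Python snippet in a sandbox and return stdout.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python source to run."}
                },
                "required": ["code"],
            },
        },
    }
]
# This list is what you would pass as `tools=` to a chat-completions request.
```

The key design point: the model never runs anything itself. It only emits a call that matches this schema, and the surrounding compute pipeline does the actual work.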

🧠 Claude (Anthropic)

  • Infra: AWS-hosted. Uses Anthropic’s in-house Claude models.
  • Tool Use: Doesn’t natively run code or generate files (yet). No live tool calls or charts—only simulated reasoning.
  • File I/O: Can read and understand uploaded documents (PDF, CSV), but doesn’t generate artifacts like PDFs or charts.
  • Quirks: Safer, more “aligned”² in tone. Often better at long-form reasoning and philosophy, but can’t do much beyond text.
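Document reading maps onto content blocks in Anthropic's Messages API: the PDF travels inside the request as base64, alongside the text prompt. A sketch of the request body (field names follow Anthropic's public API; the model id and PDF bytes here are stand-ins, and no request is actually sent):

```python
import base64

# Stand-in for real PDF bytes; a real call would read the file from disk.
fake_pdf_bytes = b"%PDF-1.4 ..."

request_body = {
    "model": "claude-3-5-sonnet-latest",  # assumed model id; check current docs
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    # The document rides along as a base64 content block...
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": base64.b64encode(fake_pdf_bytes).decode(),
                    },
                },
                # ...and the question about it is a sibling text block.
                {"type": "text", "text": "Summarize this document."},
            ],
        }
    ],
}
```

Note the asymmetry this illustrates: documents flow *in* through content blocks, but nothing in the response schema produces a file artifact going *out*.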

🧠 Gemini (Google DeepMind)

  • Infra: Google Cloud Platform.
  • Tool Use: Some native integrations with Docs, Sheets, and Gmail. Can code, but currently lacks full on-demand code execution like ChatGPT.
  • File I/O: Limited. More about integrating into Google Workspace than producing standalone files.
  • Quirks: Fast, good at web data, but gated inside the Google ecosystem. Less control for devs.
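For developers who do go through the API rather than the Workspace surface, Gemini's `generateContent` REST endpoint takes a simple JSON body. A minimal sketch of that shape (field names per Google's public Generative Language API; the model id is an assumption, and no request is sent here):

```python
# Body shape for:
#   POST https://generativelanguage.googleapis.com/v1beta/
#        models/gemini-1.5-flash:generateContent?key=API_KEY
# Each message is a "content" with a role and a list of "parts".
payload = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "Draft a one-line status update."}],
        }
    ]
}
```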

🧠 Mistral / Mixtral + Tool Wrappers (Ollama, LangChain, etc.)

  • Infra: Self-hosted or local.
  • Tool Use: You wire it yourself—great for hacking, but no “batteries included.”
  • File I/O: Whatever you build. You’re the infra.
  • Quirks: Fast, open, local—but you’re DevOps now.
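“You’re the infra” in practice: a local Ollama daemon exposes a plain HTTP API on port 11434, so the whole stack is one POST away. A sketch using only the standard library (building the request but not sending it, so it works even without a running daemon):

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a request for Ollama's local /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default address
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("mistral", "Why is the sky blue?")
# urllib.request.urlopen(req) would return a JSON body with a "response"
# field, assuming an Ollama daemon is running locally with the model pulled.
```

No auth, no cloud, no SDK: the trade is that retries, queueing, and scaling are now your problem.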

TL;DR

| Feature | ChatGPT | Claude | Gemini | Mistral/Ollama |
| --- | --- | --- | --- | --- |
| Runs Code | ✅ Native | ❌ Simulated | ⚠️ Partial | ⚙️ DIY |
| File Generation | ✅ PDF, PNG | ❌ None | ⚠️ Workspace | ⚙️ DIY |
| Image Tools | ✅ Native | ❌ None | ❌ None | ⚙️ Pluginable |
| Web Search | ✅ Tool | ❌ None | ✅ Built-in | ⚙️ Optional |
| Cloud Stack | Azure | AWS | Google Cloud | User Choice |
| Local Option | ❌ | ❌ | ❌ | ✅ |

Hardware Layer: Chips & Fabs (2025)

While the front-end feels conversational, what powers these models is a race at the silicon level. Here’s how the top players compare when it comes to AI chips and manufacturing.

| Chip Family | Region | Maker | Use Case | Process Node | Notes |
| --- | --- | --- | --- | --- | --- |
| NVIDIA H100³ (Hopper) | US | NVIDIA / TSMC | Training & Inference | TSMC 4N (~5nm) | The current standard for frontier models like GPT-4, Claude 2, Gemini. |
| NVIDIA Blackwell (B100, B200) | US | NVIDIA / TSMC | Next-gen Training | TSMC N4P + CoWoS | Announced 2024. Major leap in FLOPs and memory bandwidth. Likely behind GPT-5 and Gemini Ultra roadmap. |
| TPU v5 | US | Google (in-house) | Training & Inference (Gemini) | Undisclosed | Optimized for Google’s own models. Not publicly available. |
| Huawei Ascend 910B⁴ | China | Huawei | Training & Inference | SMIC 7nm (DUV) | Used inside Huawei’s Peng Cheng Cloud Brain and MindSpore stack. Fabbed domestically under restrictions. |

Most Western AI models still run on NVIDIA silicon via Azure (OpenAI), AWS (Anthropic), or GCP (Gemini). China’s stack is catching up through a mix of domestic fab workarounds (SMIC) and stockpiled GPUs.

What’s Next (Probably)

As models converge, the battleground shifts from what they can say to what they can actually do. Expect to see:

  • More native toolchains — Claude and Gemini will likely introduce code execution and file generation to stay competitive.
  • Persistent memory + workflows — Moving from single prompts to full sessions that span tasks, days, even projects.
  • Agentic behavior — Not just answering questions, but autonomously completing goals (“generate + email a weekly report”).
  • Local-first AI — Mistral and other open models are pushing toward self-hosted assistants with privacy and offline control baked in.
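Agentic behavior reduces to a loop: the model proposes a tool call, the runtime executes it, and the result is fed back until the goal is met. A toy sketch with stubbed components (the "model" and tools here are stand-ins, not a real LLM integration):

```python
# Toy agent loop: a stub "model" emits tool calls; the runtime executes
# them and feeds results back until the model declares it is done.
def stub_model(history):
    """Stand-in for an LLM: decides the next action from the transcript."""
    if not any(step[0] == "result" for step in history):
        return ("call", "generate_report", {"period": "weekly"})
    return ("done", "Report generated and emailed.", None)

TOOLS = {
    "generate_report": lambda period: f"report-{period}.pdf",  # stub tool
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [("goal", goal)]
    for _ in range(max_steps):
        kind, payload, args = stub_model(history)
        if kind == "done":
            return payload
        result = TOOLS[payload](**args)     # runtime executes the tool call
        history.append(("result", result))  # result goes back into context
    return "gave up"

print(run_agent("generate + email a weekly report"))
# → Report generated and emailed.
```

Swap the stub for a real model and the lambdas for real APIs, and this same loop is the skeleton of every agent framework: the interesting engineering lives in the runtime, not the model.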

The future’s less about personalities⁵ and more about infrastructure as interface. Whoever controls the runtime wins.

  1. https://python.langchain.com/docs/concepts/tool_calling/
  2. https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback
  3. Each H100 GPU typically lives in an 8-GPU NVLink pod, with clusters scaling to thousands via InfiniBand—powering large-model training at supercomputer scale.
  4. https://e.huawei.com/en/products/computing/ascend
  5. Personas (like “helpful assistant” or “code tutor”) will still matter, but increasingly they’ll be wrappers around tool routing, session memory, and real-world execution authority.
