Short version: I expected more from Google. Not because Gemini 3.5 Flash is an unusable model or because Antigravity has no direction. The problem is elsewhere: Google showed a lot of things that look like major steps forward from the outside, but in real-world use they currently feel more like a more expensive, more closed, and less polished replacement for tools I already liked.
I am deliberately only assessing things I can actually try myself, or things where public material exists. I am not covering products available only in the US, only to trusted testers, or only to people on Google AI Ultra. Availability is part of the problem: Google can show the future on stage, but for an ordinary developer outside the US, that future is often just a screenshot.
Gemini 3.5 Flash: a fast model with a high real-world cost
According to the official article, Gemini 3.5 Flash is a model for the agentic era: long workflows, sub-agents, iteration, coding loops, and a huge context window. On paper, that sounds great. The input context is 1,048,576 tokens, the output limit is 65,536 tokens, and the model supports thinking, code execution, file search, URL context, and grounding via search. The knowledge cut-off, however, is January 2025, which is the same basic time horizon that does not exactly excite me in a new generation.

My impression is that this is more like a tuned and scaled Gemini 3.0 Flash than a genuine qualitative leap.
Speed
The speed is real, especially during the response generation itself.

But for agentic tasks, tokens per second are not the only thing that matters. What matters is how long the model thinks before answering, such as 18.55 seconds before it starts responding — Time to First Token — how many tokens it burns, whether it checks its own work, and how many attempts it needs to produce a usable result.

Cost
And this is where Gemini 3.5 Flash starts to bother me. In charts, it is supposed to look like a cheaper Flash model, but in real usage it burns so many tokens that the final cost no longer feels much like a classic Flash model.

Based on material from Artificial Analysis and tweets around agentic benchmarks, it ends up in an absurdly expensive category for some tasks.

In my original comparison, Cost to Run comes out at:
- 1,552 dollars for Gemini 3.5 Flash
- 892 dollars for Gemini 3.1 Pro
- 278 dollars for Gemini 3.0 Flash.
Even if the exact numbers change depending on the benchmark, the point remains: a fast token is not the same thing as a cheaply solved task. And this shows up everywhere. For example, in the price on GitHub Copilot.

Programming
Another problem is the quality when programming. I am not saying Gemini 3.5 Flash cannot do anything. Quite the opposite: in some benchmarks and short tasks, it is very good. But if I am going to use it as my main agent, I need reliability. I need the model to run code, check it, fix it, and avoid drowning unnecessarily in its own reasoning output. From the transcript of the video by Theo and from my own notes, a similar pattern emerges: the model can be fast and impressive, but with longer coding tasks it quickly becomes clear that speed alone is not enough.

My verdict: Gemini 3.5 Flash is faster than Gemini 3.1 Pro, but it does not feel smarter in the areas I care about. On top of that, it can be more expensive in practice. If I want a large context window, I still understand why someone would reach for it. But if I want a reliable coding agent, I would rather try GPT-5.4 mini, GPT-5.5 low, or, depending on the task, the older Gemini 3 Flash or 3.1 Flash-Lite.
Antigravity CLI: a faster replacement for a tool I liked
I liked Gemini CLI, especially in the early days. It was open-source, had generous limits, was moving quickly, and thanks to the large context window, MCP, browser, and Google Search, it made surprisingly fun automations possible. It was not a perfect tool, but it had a community, energy, and direction. The GitHub repository has more than 100 thousand stars, and that is not an accident.
Google is now pushing users towards Antigravity CLI. The official announcement says that on 18 June 2026, Gemini CLI and Gemini Code Assist IDE extensions will stop serving requests for Google AI Pro, Ultra, and free individual usage. I understand the product logic: unify the backend, agents, and IDE into one platform. But from a developer’s perspective, it is bitter. An open-source tool that had won over a community is being replaced by a closed-source CLI that currently feels unfinished.

Antigravity CLI is written in Go and subjectively feels faster than the old Gemini CLI. But UI speed does not help much when the UI itself is poor. The same issues keep appearing in tests and reports: broken scrolling, where the terminal inserts older history into the input instead of scrolling; a moving input box; repeated logins, fixed by the update on 22 May 2026; no normal way to exit the tool with Ctrl+C; freezing during generation; feedback prompts that block work; and significantly reduced configuration compared with Gemini CLI.
For me, this is mainly a communication failure. If Google had said: “Gemini CLI will continue, and Antigravity CLI is a new option for multi-agent workflows,” I would take it differently. But when a new closed replacement arrives with bugs while the old open route is being closed, it does not feel like a step forward. It feels like big-company product politics: take the energy from the community, then push that community into your own closed system.
Antigravity 2.0: Codex desktop app, but less finished
Antigravity 2.0 is the strangest part of the whole presentation for me. Google positions it as an agent-first environment for developers. In practice, though, it feels like a fairly blatant copy of the OpenAI Codex desktop app, just with fewer features and a worse sense of polish.

The similarity itself would not bother me. Everyone learns from everyone else, and good UX patterns naturally get copied.
The problem is when a new product resembles the competition more than it resembles your own previous good work.
And it becomes even funnier when the Antigravity 2.0 presentation included a folder called Codex 😀 .
It is missing things I consider basic for an agentic tool: planning mode, goals, clear control over the agent’s autonomy, good limit visibility, pinned projects, and better support for common developer workflows. Add to that the limits, which on a paid account can be burned through in a few dozen minutes, and the excitement fades quickly.
After complaints, Google increased the limits for paid tiers by 3x and reset quotas. That is a good response. But at the same time, it confirms that the initial setup was not good.
Update 2026-05-22: They have already increased the limits again by 3x, but it is still a problem when you are paying $200 and burn through a 5-hour limit in 30 minutes.
For a multi-agent tool, limits are part of the product, not a minor detail. When users do not know how much they have left, what exactly is burning tokens, and why their work suddenly stops, they lose trust.
Why I am not assessing Spark, Omni, or the new AI search
I am not assessing Gemini Spark because I cannot actually use it. It is not available to me, it is not available globally, and based on the available information, it is initially aimed at trusted testers, the US, and higher-tier subscriptions.
Gemini Omni? Same problem: “This feature is not available in your country.” The new search built on Gemini 3.5 Flash? Again, an availability fog.
This has been frustrating with Google for a long time. On stage, everything looks like the finished future. In Europe, you often end up sitting in front of a product where half the features are “coming this summer” — and still only in the US — another half are “not available in your country” even when they are supposedly “globally available”, and the remaining half burns through your limit before you have time to find out whether it is any good.
Overall impression: Google has everything except product tempo
Google has the infrastructure, TPUs, research, distribution, data, talent, and ecosystem. That is precisely why it is so frustrating when the result feels like a combination of excellent lab capabilities and unfinished product management. In my view, Anthropic and OpenAI have felt faster over the past year. Not always flawless, but their developer products have a clearer sense of momentum.
Gemini 3.5 Flash is not a disaster. Antigravity CLI is not hopeless. Antigravity 2.0 may mature. But as a whole, after Google I/O 2026, it feels weaker than I expected. Instead of thinking “Google has finally found its rhythm”, I am left thinking “Google has a huge ship that still turns far too slowly”.
My current recommendation is simple: test Gemini 3.5 Flash, but do not look only at speed. Track the cost of the whole task, the number of tokens burned, the quality of completion, and how much manual work remains afterwards. Try Antigravity CLI, but I would not build a critical workflow on top of it yet. And I see Antigravity 2.0 as a product that may improve in a few months, but today it has not convinced me.


