2025-12-17 16:15:44
Google makes Gemini 3 Flash the default model in Gemini app and Search's AI mode; it scored 33.7% without tool use on Humanity's Last Exam vs. GPT-5.2's 34.5% (Ivan Mehta/TechCrunch)
https://techcrunch.com/2025/12/17/goog
Google says Gemini 3 Pro sets new vision AI benchmark records, including in complex visual reasoning, beating Claude Opus 4.5 and GPT-5.1 in some categories (Rohan Doshi/The Keyword)
https://blog.google/technology/developers/gemini-3-pro-vision/
Researchers say GPT 4.1, Claude 3.7 Sonnet, Gemini 2.5 Pro, and Grok 3 can reproduce long excerpts from books they were trained on when strategically prompted (Alex Reisner/The Atlantic)
https://www.theatlantic.com/technology/2026/01/ai-memorization-research/6…
Compare the capabilities and costs of Kimi K2 and GPT-5: Kimi K2 costs about a third as much and performs similarly.
#AI
Gemini 3 demonstrates strong planning, coding, and judgment skills, and shows how AI models moved past hallucinations to subtle, and often human-like, errors (Ethan Mollick/One Useful Thing)
https://www.oneusefulthing.org/p/three-years-from-gpt-3-to-gemini
Curious that whenever someone shows me “the cool #AI flow” they built that’s supposed to be impressive, the conversation goes the same way:
Stage 1: “But you don’t understand. You don’t like AI because you haven’t used it right. Let me show you how much you can do with it.”
Stage 2: “Here are the steps in the flow and the instructions I feed to this agent / custom GPT / Claude project. I tell it to do X, reference document Y, and aim for Z.”
Stage 3: “Now, let me show you the results it gives.”
*Writes task, presses to run the prompt.*
Stage 4: “Umm sorry it’s taking a while. It’s fast but not instant. And by the way, the prompt isn’t perfect, you can definitely make it better. I just threw this together real quick the other day. It makes some mistakes, but it’s really good.”
Stage 5: “Uuuuuuh actually don’t look at the output.” *scrolls or stops screen share or pulls device away.*
“You know it’s already doing so well, if I do more prompt engineering it will get really good but I need to give it better instructions. And it ran just fine last night, I don’t know what’s up with it. And this is a cheap model, if we use another model it will be better.”
Stage 6: “You know, you really shouldn’t judge this so much. The technology will improve, it will get there sooner than you know and then you’ll regret not trying it sooner.”
So curious that this keeps happening 🤷‍♀️
#LLMs #work #tech #AIBubble
Gemini 3 Pro is priced at $2-$4 per 1M input tokens and $12-$18 per 1M output tokens, cheaper than Claude Sonnet 4.5 but more expensive than GPT-5.1 (Simon Willison/Simon Willison's Weblog)
https://simonwillison.net/2025/Nov/18/gemini-3/
Gemini 3 hands-on: a fundamental improvement on daily use, extremely fast, Antigravity IDE is a powerful launch product, and its personality is terse and direct (Matt Shumer)
https://shumer.dev/gemini3review
Anthropic prices Claude Opus 4.5 at $5/1M input and $25/1M output tokens, much cheaper than Opus 4.1 at $15/$75 but still pricier than GPT-5.1 and Gemini 3 Pro (Simon Willison/Simon Willison's Weblog)
https://simonwillison.net/2025/Nov/24/claude-opus/
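To make the per-token prices quoted above easier to compare, here is a rough sketch of what one request might cost under each rate. The prices come from the headlines above. The workload (200,000 input tokens, 20,000 output tokens) is a made-up example. Pairing the low input price with the low output price for Gemini 3 Pro is an assumption, since the headline gives only the ranges. The function and label names are illustrative.

```python
# USD per 1M tokens as (input, output), copied from the headlines above.
# The "low end"/"high end" pairing for Gemini 3 Pro is an assumption:
# the headline only gives the ranges $2-$4 (input) and $12-$18 (output).
PRICES = {
    "Claude Opus 4.5": (5.00, 25.00),
    "Claude Opus 4.1": (15.00, 75.00),
    "Gemini 3 Pro (low end)": (2.00, 12.00),
    "Gemini 3 Pro (high end)": (4.00, 18.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    for model in PRICES:
        cost = request_cost(model, input_tokens=200_000, output_tokens=20_000)
        print(f"{model}: ${cost:.2f}")
```

At these rates, the same request on Opus 4.1 costs three times what it does on Opus 4.5, since both the input and output prices fell by the same factor.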