Tootfinder

Opt-in global Mastodon full text search. Join the index!

@Techmeme@techhub.social
2025-12-17 16:15:44

Google makes Gemini 3 Flash the default model in Gemini app and Search's AI mode; it scored 33.7% without tool use on Humanity's Last Exam vs. GPT-5.2's 34.5% (Ivan Mehta/TechCrunch)
techcrunch.com/2025/12/17/goog

@Techmeme@techhub.social
2025-12-08 09:30:46

Google says Gemini 3 Pro sets new vision AI benchmark records, including in complex visual reasoning, beating Claude Opus 4.5 and GPT-5.1 in some categories (Rohan Doshi/The Keyword)
blog.google/technology/develop

@Mediagazer@mstdn.social
2026-01-10 07:26:09

Researchers say GPT 4.1, Claude 3.7 Sonnet, Gemini 2.5 Pro, and Grok 3 can reproduce long excerpts from books they were trained on when strategically prompted (Alex Reisner/The Atlantic)
theatlantic.com/technology/202

@ErikJonker@mastodon.social
2025-11-19 07:28:07

"To be very clear, Gemini 3 isn’t perfect, and it still needs a manager who can guide and check it. But it suggests that “human in the loop” is evolving from “human who fixes AI mistakes” to “human who directs AI work.” And that may be the biggest change since the release of ChatGPT."

@Techmeme@techhub.social
2026-01-10 07:25:54

Researchers say GPT 4.1, Claude 3.7 Sonnet, Gemini 2.5 Pro, and Grok 3 can reproduce long excerpts from books they were trained on when strategically prompted (Alex Reisner/The Atlantic)

@ErikJonker@mastodon.social
2025-11-08 15:35:17

Look at the capabilities versus costs of Kimi K2 and GPT-5. Kimi K2 is 3 times as cheap with similar performance.
#AI

Intelligence of various AI models compared
Cost of various AI models compared
@Techmeme@techhub.social
2025-11-18 17:26:22

Gemini 3 demonstrates strong planning, coding, and judgment skills, and shows how AI models moved past hallucinations to subtle, and often human-like, errors (Ethan Mollick/One Useful Thing)
oneusefulthing.org/p/three-yea
<…

@mariyadelano@hachyderm.io
2025-11-13 22:00:11

Curious that whenever someone shows me “the cool #AI flow” they built that’s supposed to be impressive, the conversation goes the same way:
Stage 1: “But you don’t understand. You don’t like AI because you haven’t used it right. Let me show you how much you can do it with.”
Stage 2: “Here are the steps in the flow and the instructions I feed to this agent / custom GPT / Claude project. I tell it to do X, reference document Y, and aim for Z.”
Stage 3: “Now, let me show you the results it gives.”
*Writes task, presses to run the prompt.*
Stage 4: “Umm sorry it’s taking a while. It’s fast but not instant. And by the way, the prompt isn’t perfect, you can definitely make it better. I just threw this together real quick the other day. It makes some mistakes, but it’s really good.”
Stage 5: “Uuuuuuh actually don’t look at the output.” *scrolls or stops screen share or pulls device away.*
“You know it’s already doing so well, if I do more prompt engineering it will get really good but I need to give it better instructions. And it ran just fine last night, I don’t know what’s up with it. And this is a cheap model, if we use another model it will be better.”
Stage 6: “You know, you really shouldn’t judge this so much. The technology will improve, it will get there sooner than you know and then you’ll regret not trying it sooner.”
So curious that this keeps happening 🤷‍♀️
#LLMs #work #tech #AIBubble

@Techmeme@techhub.social
2025-11-18 20:55:53

Gemini 3 Pro is priced at $2-$4 per 1M input tokens and $12-$18 per 1M output tokens, cheaper than Claude Sonnet 4.5 but more expensive than GPT-5.1 (Simon Willison/Simon Willison's Weblog)
simonwillison.net/2025/Nov/18/

@Techmeme@techhub.social
2025-11-19 15:56:00

Gemini 3 hands-on: a fundamental improvement on daily use, extremely fast, Antigravity IDE is a powerful launch product, and its personality is terse and direct (matt shumer)
shumer.dev/gemini3review

@Techmeme@techhub.social
2025-11-24 20:45:47

Anthropic prices Claude Opus 4.5 at $5/1M input and $25/1M output tokens, much cheaper than Opus 4.1 at $15/$75 but still pricier than GPT-5.1 and Gemini 3 Pro (Simon Willison/Simon Willison's Weblog)
simonwillison.net/2025/Nov/24/