Tootfinder

Opt-in global Mastodon full text search. Join the index!

@heiseonline@social.heise.de
2025-11-13 10:43:00

GPT-5.1 launches: "more intelligent and more entertaining"
OpenAI is making GPT-5.1 available for ChatGPT. OpenAI does not yet know how to deal with people who are in AI relationships.

@Techmeme@techhub.social
2025-11-13 23:30:49

OpenAI releases GPT-5.1 in the API, featuring a "no-reasoning" mode and extended prompt caching with up to 24-hour retention to generate faster responses (OpenAI)
openai.com/index/gpt-5-1-for-d

@heiseonline@social.heise.de
2025-11-14 14:04:00

KI-Update: GPT-5.1, making machines human, defined AI, Anthropic investment
The "KI-Update" delivers a summary of the most important AI developments every working day.

@offenenetze@chaos.social
2025-11-11 15:38:50

Ruling in Munich:
ChatGPT may not use song lyrics without a license
zdfheute.de/wirtschaft/unterne

@Techmeme@techhub.social
2025-12-12 07:01:18

GPT-5.2 models match GPT-5 and 5.1 with a 400K context window and 128K max output tokens, but have a newer knowledge cutoff of Aug. 31, 2025 vs. Sept. 30, 2024 (Simon Willison/Simon Willison's Newsletter)
simonw.substack.com/p/gpt-52-a

@heiseonline@social.heise.de
2025-12-12 04:09:00

GPT-5.2: New AI model from OpenAI is meant to support office work better
Just one month after GPT-5.1, the ChatGPT developers are releasing a new AI model. GPT-5.2 is said to produce better spreadsheets, presentations, and code.

@arXiv_csSE_bot@mastoxiv.page
2025-09-15 08:53:41

WALL: A Web Application for Automated Quality Assurance using Large Language Models
Seyed Moein Abtahi, Akramul Azim
arxiv.org/abs/2509.09918

@arXiv_csHC_bot@mastoxiv.page
2025-10-14 08:48:58

ROBOPSY PL[AI]: Using Role-Play to Investigate how LLMs Present Collective Memory
Margarete Jahrmann, Thomas Brandstetter, Stefan Glasauer
arxiv.org/abs/2510.09874

@arXiv_csCL_bot@mastoxiv.page
2025-10-10 11:05:49

Single layer tiny Co$^4$ outpaces GPT-2 and GPT-BERT
Noor Ul Zain, Mohsin Raza, Ahsan Adeel
arxiv.org/abs/2510.08404 arxiv.org/pdf/2510.084…

@Techmeme@techhub.social
2025-12-11 18:18:02

OpenAI says GPT-5.2 Thinking hallucinates less than GPT-5.1 and has improved reliability for agentic AI needs; pre-release testers include Notion, Box, Shopify (Hayden Field/The Verge)
theverge.com/ai-artificial-int

@peterhoneyman@a2mi.social
2025-10-14 10:04:31

i got up at 6 a.m. to wait in line

Welcome to Opéra national de Paris
Sales for the operas Satyagraha and Rusalka, the Empreintes ballet programme, the ballets Romeo and Juliet and La Dame aux camélias, and the Ballet School Demonstrations and the concert Hector Berlioz open today.
The website will be available in 03:02 (min.:sec.).
Your waiting time is updated periodically. Once elapsed, you will be able to enter the…
@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:27:00

CapGeo: A Caption-Assisted Approach to Geometric Reasoning
Yuying Li, Siyi Qian, Hao Liang, Leqi Zheng, Ruichuan An, Yongzhen Guo, Wentao Zhang
arxiv.org/abs/2510.09302

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 10:44:10

Weight Initialization and Variance Dynamics in Deep Neural Networks and Large Language Models
Yankun Han
arxiv.org/abs/2510.09423 arxiv.org…

@Techmeme@techhub.social
2025-11-13 20:41:04

Baidu unveils Ernie 5.0, an AI model to process and generate text, images, audio, and video, claiming it beats GPT-5-High and Gemini 2.5 Pro on some benchmarks (Carl Franzen/VentureBeat)
venturebeat.com/ai/baidu-unvei

@offenenetze@chaos.social
2025-11-11 09:35:52

Defeat for Chat-GPT before the Munich Regional Court
sueddeutsche.de/wirtschaft/mue

@arXiv_csCR_bot@mastoxiv.page
2025-09-15 08:23:51

Securing LLM-Generated Embedded Firmware through AI Agent-Driven Validation and Patching
Seyed Moein Abtahi, Akramul Azim
arxiv.org/abs/2509.09970

@Techmeme@techhub.social
2025-12-11 18:06:51

OpenAI launches GPT-5.2, its "best model yet," in Instant, Thinking, and Pro variants, with significant improvements in writing, coding, and reasoning (Maxwell Zeff/Wired)
wired.com/story/openai-gpt-lau

@arXiv_csHC_bot@mastoxiv.page
2025-10-09 09:44:51

GPT-5 Model Corrected GPT-4V's Chart Reading Errors, Not Prompting
Kaichun Yang, Jian Chen
arxiv.org/abs/2510.06782 arxiv.org/pdf/2510.…

@Mediagazer@mstdn.social
2025-12-04 19:30:52

Business Insider launches a monthlong pilot AI program to publish quick news stories, edited by BI editors, created using a GPT trained on its archives (Jamie Heller/Business Insider)
businessinsider.com/ai-pilot

@arXiv_csSE_bot@mastoxiv.page
2025-10-14 11:21:28

What Slows Down FMware Development? An Empirical Study of Developer Challenges and Resolution Times
Zitao Wang, Zhimin Zhao, Michael W. Godfrey
arxiv.org/abs/2510.11138

@dichotomiker@dresden.network
2025-10-27 02:25:30

#TIL we were supposed to shout at AI.
robert-glaser.de/prompts-as-pr

@arXiv_csCV_bot@mastoxiv.page
2025-10-10 11:03:49

Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning
Sofia Kirsanova, Yao-Yi Chiang, Weiwei Duan
arxiv.org/abs/2510.08385

@life_is@no-pony.farm
2025-10-09 16:43:11

Google is introducing a requirement that developers of Android apps identify themselves.
All makers of LLM GPTs are introducing vibe coding.
How does that work? Does the GPT bot identify itself to Google and take responsibility for the app, including all modifications made by the human "developer"?
Does the human developer identify themselves and bear liability for all of the vibe bot's hallucinations, which they neither understand nor know about?

@heiseonline@social.heise.de
2025-12-12 05:18:00

Friday: Criticism of the eID card over money laundering, new OpenAI model as an office assistant
eID card too easy to obtain by fraud; GPT-5.2 for professional users; Disney versus Google AI over copyright; criticism of the EU over VMware; robot movements explained

67: The nonsensical viral phrase that Gen Alpha can’t stop saying
And now, it seems to be taking over OpenAI as well.
“GPT-6 will be renamed GPT-6-7, you’re welcome,”
OpenAI CEO Sam Altman posted to X on Friday.
Altman’s announcement comes just a couple of days after Dictionary.com named the slang 2025’s word of the year.
For those lucky enough to not be familiar with the term,
the Gen Alpha slang can be traced back to hip hop artist Skrilla’s late 20…

@Techmeme@techhub.social
2025-12-11 19:16:04

[Thread] GPT-5.2 is now available in the API, priced at $1.75/1M input and $14/1M output tokens; GPT-5.2 Pro is priced at $21/1M input and $168/1M output tokens (@openaidevs)
x.com/openaidevs/status/199918

@arXiv_csAI_bot@mastoxiv.page
2025-10-08 10:03:59

Large Language Model-Based Uncertainty-Adjusted Label Extraction for Artificial Intelligence Model Development in Upper Extremity Radiography
Hanna Kreutzer, Anne-Sophie Caselitz, Thomas Dratsch, Daniel Pinto dos Santos, Christiane Kuhl, Daniel Truhn, Sven Nebelung
arxiv.org/abs/2510.05664

@ErikJonker@mastodon.social
2025-11-08 15:35:17

Look at the capabilities versus costs of Kimi K2 and GPT-5. Kimi K2 is 3 times as cheap with similar performance.
#AI

Intelligence of various AI models compared
Cost of various AI models compared
@Techmeme@techhub.social
2025-11-13 20:35:45

Anthropic open sources a method to score AI model political evenhandedness; Gemini 2.5 Pro got 97%, Grok 4 96%, Claude Opus 4.1 95%, GPT-5 89%, and Llama 4 66% (Ina Fried/Axios)
axios.com/2025/11/13/anthropic

@erikdelareguera@mastodon.nu
2025-11-21 09:43:44

Macron: “Who do people vote for if they ask Chat GPT?”
At the same time, the EU is on its way to pausing parts of its AI legislation, following pressure from the USA.
dn.se/varlden/macron-vem-rosta

@arXiv_csHC_bot@mastoxiv.page
2025-10-14 09:26:18

Read the Room or Lead the Room: Understanding Socio-Cognitive Dynamics in Human-AI Teaming
Jaeyoon Choi, Mohammad Amin Samadi, Spencer JaQuay, Seehee Park, Nia Nixon
arxiv.org/abs/2510.09944

@mariyadelano@hachyderm.io
2025-11-13 22:00:11

Curious that whenever someone shows me “the cool #AI flow” they built that’s supposed to be impressive, the conversation goes the same way:
Stage 1: “But you don’t understand. You don’t like AI because you haven’t used it right. Let me show you how much you can do with it.”
Stage 2: “Here are the steps in the flow and the instructions I feed to this agent / custom GPT / Claude project. I tell it to do X, reference document Y, and aim for Z.”
Stage 3: “Now, let me show you the results it gives.”
*Writes task, presses to run the prompt.*
Stage 4: “Umm sorry it’s taking a while. It’s fast but not instant. And by the way, the prompt isn’t perfect, you can definitely make it better. I just threw this together real quick the other day. It makes some mistakes, but it’s really good.”
Stage 5: “Uuuuuuh actually don’t look at the output.” *scrolls or stops screen share or pulls device away.*
“You know it’s already doing so well, if I do more prompt engineering it will get really good but I need to give it better instructions. And it ran just fine last night, I don’t know what’s up with it. And this is a cheap model, if we use another model it will be better.”
Stage 6: “You know, you really shouldn’t judge this so much. The technology will improve, it will get there sooner than you know and then you’ll regret not trying it sooner.”
So curious that this keeps happening 🤷‍♀️
#LLMs #work #tech #AIBubble

@rigo@mamot.fr
2025-11-17 08:31:56

Great TV Monaco show/video on generative AI with the experts from INRIA and the Université Sophia Antipolis. To be consumed without moderation!
videos.tvmonaco.com/content/ia

@K_luep@mastodon.social
2025-11-09 20:09:40

Can Chat GPT do ultrasound images? #Tatort

@tinoeberl@mastodon.online
2025-10-18 05:07:02

#SteadySupporter
The use of #GPT4 in #Diagnostik shows potential, but current studies show: without targeted

@Techmeme@techhub.social
2025-10-10 06:01:17

OpenAI says GPT‑5 instant and GPT‑5 thinking cut political bias by 30% from earlier models, and show greater robustness to charged prompts (Ashley Gold/Axios)
axios.com/2025/10/09/openai-gp

@ErikJonker@mastodon.social
2025-11-08 14:10:48

With models like Kimi K2 freely available, doesn't the OpenAI business case for GPT-5 become extremely bad? 🤔
#AI #KimiK2 #chatgpt

@arXiv_csAI_bot@mastoxiv.page
2025-09-16 08:08:46

Understanding AI Evaluation Patterns: How Different GPT Models Assess Vision-Language Descriptions
Sajjad Abdoli, Rudi Cilibrasi, Rima Al-Shikh
arxiv.org/abs/2509.10707

@arXiv_statML_bot@mastoxiv.page
2025-10-03 09:11:01

AI Foundation Model for Time Series with Innovations Representation
Lang Tong, Xinyi Wang
arxiv.org/abs/2510.01560 arxiv.org/pdf/2510.01560…

@kubikpixel@chaos.social
2025-09-17 06:45:13

»GPT-5-Codex: New OpenAI AI takes on complex coding tasks:
#OpenAI has unveiled the new #KI model #GPT5Codex. It was developed specifically to work autonomously on longer

@servelan@newsie.social
2025-11-03 17:53:37

Saying 'please' and 'thank you' to ChatGPT costs OpenAI millions, Sam Altman says
qz.com/open-ai-sam-altman-chat

@arXiv_csRO_bot@mastoxiv.page
2025-10-07 11:29:22

Zenbo Patrol: A Social Assistive Robot Based on Multimodal Deep Learning for Real-time Illegal Parking Recognition and Notification
Jian-jie Zheng, Chih-kai Yang, Po-han Chen, Lyn Chao-ling Chen
arxiv.org/abs/2510.04190

@michabbb@social.vivaldi.net
2025-10-05 17:10:49

🎤 Create custom voice agents in under 10 minutes using #Python with STT, LLM and TTS pipelines like #Deepgram, #OpenAI GPT-4o and

@arXiv_csCY_bot@mastoxiv.page
2025-10-08 07:40:49

Artificial-Intelligence Grading Assistance for Handwritten Components of a Calculus Exam
Gerd Kortemeyer, Alexander Caspar, Daria Horica
arxiv.org/abs/2510.05162

@arXiv_csCL_bot@mastoxiv.page
2025-10-10 10:51:19

Evaluating LLM-Generated Legal Explanations for Regulatory Compliance in Social Media Influencer Marketing
Haoyang Gui, Thales Bertaglia, Taylor Annabell, Catalina Goanta, Tjomme Dooper, Gerasimos Spanakis
arxiv.org/abs/2510.08111

@Techmeme@techhub.social
2025-10-04 22:36:55

An interview with Sam Altman and OpenAI President Greg Brockman on the tepid initial reception to GPT-5's launch, scaling, reinforcement learning, AGI, and more (Steven Levy/Wired)
wired.com/story/sam-altman-say

@arXiv_csCV_bot@mastoxiv.page
2025-10-03 10:23:41

Generating Findings for Jaw Cysts in Dental Panoramic Radiographs Using GPT-4o: Building a Two-Stage Self-Correction Loop with Structured Output (SLSO) Framework
Nanaka Hosokawa, Ryo Takahashi, Tomoya Kitano, Yukihiro Iida, Chisako Muramatsu, Tatsuro Hayashi, Yuta Seino, Xiangrong Zhou, Takeshi Hara, Akitoshi Katsumata, Hiroshi Fujita

@wrog@mastodon.murkworks.net
2025-11-04 23:42:58

MERCER ISLAND SCHOOL BOARD
Wow, you really do have to watch the downballot races. Mercer Island School Board has two (2) candidates (O'Callahan and Gaspar) that are *both* software CTOs touting their "AI" credentials. Gaspar explicitly wants "free AI classes".
Here's a hint: The only "AI classes" that kids need are ones that teach them how to TURN ALL OF THAT SHIT OFF, and learn to think and write in their own words, not Chat-GPT'…

@seeingwithsound@mas.to
2025-09-30 18:36:36

The illusion of readiness: Stress testing large frontier models on multimodal medical benchmarks #AI

@gray17@mastodon.social
2025-11-01 07:45:19

researchers picked 2000 posts from r/AmITheAsshole where consensus response was "yes, YTA", and they asked LLMs to respond.
Claude and GPT said "no, NTA" 50% of the time.
Gemini was the least sycophantic at 20% NTA.
they also tested human reaction to sycophantic vs non-sycophantic LLMs. results are as expected (but not very large)
arxiv.org/abs/2510.01395

@Techmeme@techhub.social
2025-12-12 17:06:30

Companies are updating insider trading policies to cover prediction markets; Kalshi and others are pushing for federal oversight, including of insider trading (Rocket Drew/The Information)
theinformation.com/articles/po

@ErikJonker@mastodon.social
2025-11-08 15:07:21

Kimi K2 is another DeepSeek moment, it seems, only not everybody is noticing it yet. It will be interesting to see what the stock market will do on Monday.
#AI #KimiK2

Someone tested Kimi K2 on unpublished material and it performed as well as GPT-5 and Gemini 2.5

@UP8@mastodon.social
2025-09-29 15:25:58

🧾 Multi-Modal Vision vs. Text-Based Parsing: Benchmarking LLM Strategies for Invoice Processing
#software

@Techmeme@techhub.social
2025-10-06 19:40:53

OpenAI announces API updates, including GPT-5 Pro, Sora 2 in preview, and gpt-realtime-mini, a voice model that is 70% cheaper than gpt-realtime (Rebecca Bellan/TechCrunch)
techcrunch.com/2025/10/06/open

@arXiv_csCL_bot@mastoxiv.page
2025-10-09 10:18:11

Aligning Large Language Models via Fully Self-Synthetic Data
Shangjian Yin, Zhepei Wei, Xinyu Zhu, Wei-Lin Chen, Yu Meng
arxiv.org/abs/2510.06652

@Mediagazer@mstdn.social
2025-09-30 17:10:48

OpenAI releases an invitation-only Sora app on iOS, powered by Sora 2, to let people create and share AI-generated videos of themselves and their friends (Ina Fried/Axios)
axios.com/2025/09/30/openai-so

@arXiv_csAI_bot@mastoxiv.page
2025-10-08 10:34:39

Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Weihao Zeng, Keqing He, Chuqiao Kuang, Xiaoguang Li, Junxian He
arxiv.org/abs/2510.06135

@Techmeme@techhub.social
2025-12-11 18:45:58

OpenAI says GPT‑5.2 Thinking beats or ties industry professionals on 70.9% of GDPval knowledge work tasks, delivering outputs at >11x the speed and <1% the cost (OpenAI)
openai.com/index/introducing-g

@arXiv_csSE_bot@mastoxiv.page
2025-10-01 10:45:37

Using GPT to build a Project Management assistant for Jira environments
Joel Garcia-Escribano, Arkaitz Carbajo, Mikel Egaña Aranguren, Unai Lopez-Novoa
arxiv.org/abs/2509.26014

@ErikJonker@mastodon.social
2025-11-19 07:28:07

"To be very clear, Gemini 3 isn’t perfect, and it still needs a manager who can guide and check it. But it suggests that “human in the loop” is evolving from “human who fixes AI mistakes” to “human who directs AI work.” And that may be the biggest change since the release of ChatGPT."

@Techmeme@techhub.social
2025-10-04 19:11:06

OpenAI updates GPT-5 Instant to better recognize and support people in distress; ChatGPT will route such sensitive parts of conversations to the model (@openai)
x.com/openai/status/1974234951

@arXiv_csCR_bot@mastoxiv.page
2025-10-06 09:26:49

LLM-Generated Samples for Android Malware Detection
Nik Rollinson, Nikolaos Polatidis
arxiv.org/abs/2510.02391 arxiv.org/pdf/2510.02391

@arXiv_csCL_bot@mastoxiv.page
2025-10-09 10:36:21

Mining the Mind: What 100M Beliefs Reveal About Frontier LLM Knowledge
Shrestha Ghosh, Luca Giordano, Yujia Hu, Tuan-Phong Nguyen, Simon Razniewski
arxiv.org/abs/2510.07024

@Life_is@no-pony.farm
2025-10-24 19:58:34

My sister and I clashed terribly over the climate catastrophe. And she actually brought up, word for word, the argument "The world isn't ending today..." and apparently she really has no idea whom she was quoting. She definitely does not know, and does not want to know, that I imported a GPT-generated image of this Merz quote (correctly labeled as generated) into Wikipedia. The bitter aftertaste: she is a university-trained environmental engineer, …

@Techmeme@techhub.social
2025-10-30 17:10:46

OpenAI launches Aardvark, a GPT-5-powered autonomous cybersecurity research agent that can identify and help patch vulnerabilities, in private beta (Sabrina Ortiz/ZDNET)
zdnet.com/article/openai-unvei

@arXiv_csCV_bot@mastoxiv.page
2025-10-07 12:47:02

VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
Ziqi Huang, Ning Yu, Gordon Chen, Haonan Qiu, Paul Debevec, Ziwei Liu
arxiv.org/abs/2510.05094

@Techmeme@techhub.social
2025-12-10 16:54:06

Sources: Meta's new AI model, codenamed Avocado, may launch in spring 2026 as a "closed" model, and was trained using Google's Gemma, OpenAI's gpt-oss, and Qwen (Bloomberg)
bloomberg.com/news/articles/20

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:18:02

Resource-Efficient Fine-Tuning of LLaMA-3.2-3B for Medical Chain-of-Thought Reasoning
Imran Mansha
arxiv.org/abs/2510.05003 arxiv.org/pdf/2…

@Techmeme@techhub.social
2025-12-10 21:56:20

ChatGPT was 2025's most downloaded free app in the US iOS App Store, up from No. 4 in 2024, followed by Threads, Google, TikTok, WhatsApp, and Instagram (Sarah Perez/TechCrunch)
techcrunch.com/2025/12/10/chat

@Techmeme@techhub.social
2025-11-20 18:50:51

OpenAI says GPT-5 has demonstrated the ability to accelerate scientific research workflows but can't run projects or solve scientific problems autonomously (Radhika Rajkumar/ZDNET)
zdnet.com/article/gpt-5-is-spe

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:23:02

Slm-mux: Orchestrating small language models for reasoning
Chenyu Wang, Zishen Wan, Hao Kang, Emma Chen, Zhiqiang Xie, Tushar Krishna, Vijay Janapa Reddi, Yilun Du
arxiv.org/abs/2510.05077

@arXiv_csAI_bot@mastoxiv.page
2025-10-03 09:21:51

ICL Optimized Fragility
Serena Gomez Wannaz
arxiv.org/abs/2510.00300 arxiv.org/pdf/2510.00300

@Techmeme@techhub.social
2025-11-09 09:01:30

The Alpha Arena experiment gave six frontier models $10K each to trade crypto derivatives over two weeks: losses ranged from Qwen3 Max's $652 to GPT-5's $5,679 (Sebastian Pellejero/Reuters)
reuters.com/commentary/breakin

@Techmeme@techhub.social
2025-09-25 16:50:55

OpenAI releases GDPval, a benchmark to test AI performance on "economically valuable, real-world tasks", and says Claude Opus 4.1 was the best performing model (Maxwell Zeff/TechCrunch)
techcrunch.com/2025/09/25/open

@arXiv_csCV_bot@mastoxiv.page
2025-10-01 11:43:07

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, Wenhu Chen
arxiv.org/abs/2509.26346

@Techmeme@techhub.social
2025-10-19 16:30:55

OpenAI researchers delete X posts claiming a GPT-5 math breakthrough after pushback from Hassabis, others; LeCun says they were "hoisted by their own GPTards" (Matthias Bastian/The Decoder)
the-decoder.com/leading-openai

@arXiv_csCL_bot@mastoxiv.page
2025-10-06 08:06:19

Hallucination-Resistant, Domain-Specific Research Assistant with Self-Evaluation and Vector-Grounded Retrieval
Vivek Bhavsar, Joseph Ereifej, Aravanan Gurusami
arxiv.org/abs/2510.02326

@Techmeme@techhub.social
2025-11-18 17:26:22

Gemini 3 demonstrates strong planning, coding, and judgment skills, and shows how AI models moved past hallucinations to subtle, and often human-like, errors (Ethan Mollick/One Useful Thing)
oneusefulthing.org/p/three-yea

@Techmeme@techhub.social
2025-12-08 09:30:46

Google says Gemini 3 Pro sets new vision AI benchmark records, including in complex visual reasoning, beating Claude Opus 4.5 and GPT-5.1 in some categories (Rohan Doshi/The Keyword)
blog.google/technology/develop

@Techmeme@techhub.social
2025-12-07 16:05:39

Essential AI, whose CEO co-wrote Google's Attention Is All You Need paper, unveils Rnj-1, an 8B-parameter open model with SWE-bench performance close to GPT-4o (Ashish Vaswani/Essential AI)
essential.ai/research/rnj-1

@Techmeme@techhub.social
2025-09-29 19:26:02

Claude Sonnet 4.5 is faster and more steerable than Opus 4.1 and excels in Claude Code, but GPT-5 Codex is still better for difficult production coding tasks (Dan Shipper/Every)
every.to/vibe-check/vibe-check

@Techmeme@techhub.social
2025-09-15 17:15:44

OpenAI debuts GPT‑5-Codex, a version of GPT‑5 optimized for agentic coding in Codex and says it spends its "thinking" time more dynamically than previous models (Maxwell Zeff/TechCrunch)
techcrunch.com/2025/09/15/open

@arXiv_csCL_bot@mastoxiv.page
2025-10-03 10:46:41

Detecting LLM-Generated Spam Reviews by Integrating Language Model Embeddings and Graph Neural Network
Xin Liu, Rongwu Xu, Xinyi Jia, Jason Liao, Jiao Sun, Ling Huang, Wei Xu
arxiv.org/abs/2510.01801

@Techmeme@techhub.social
2025-10-29 12:21:02

OpenAI releases gpt-oss-safeguard, its open-weight reasoning models for safety classification tasks, available in 120B and 20B parameters, under Apache 2.0 (OpenAI)
openai.com/index/introducing-g

@Techmeme@techhub.social
2025-11-07 02:50:49

Chinese startup Moonshot releases Kimi K2 Thinking, an open-source model it claims beats GPT-5 in agentic capabilities; source: the model cost $4.6M to train (Evelyn Cheng/CNBC)
cnbc.com/2025/11/06/alibaba-ba

@Techmeme@techhub.social
2025-11-05 12:40:50

Netherlands-based Nebius unveils Token Factory, a platform to let companies use open source AI models like GPT-oss, in a bid to compete with AWS and Azure (Dina Bass/Bloomberg)
bloomberg.com/news/articles/20

@Techmeme@techhub.social
2025-12-05 02:05:54

Physicist Steve Hsu says he has published a peer-reviewed theoretical physics paper whose main idea came from GPT-5 (Steve Hsu/@hsu_steve)
x.com/hsu_steve/status/1996034

@Techmeme@techhub.social
2025-12-02 11:25:57

Study: using the SCONE-bench benchmark of 405 smart contracts, Claude Opus 4.5, Sonnet 4.5, and GPT-5 found and developed exploits collectively worth $4.6M (Anthropic)
red.anthropic.com/2025/smart-c

@Techmeme@techhub.social
2025-10-02 21:46:04

Mercor launches the AI Productivity Index (APEX), which evaluates AI models' ability to perform "economically valuable knowledge work"; GPT-5 leads the index (Mercor)
mercor.com/blog/introducing-ap

@Techmeme@techhub.social
2025-11-19 19:32:12

OpenAI unveils GPT-5.1-Codex-Max, saying it is "significantly better" at "long-horizon reasoning" and is the first model it has trained for Windows environments (David Gewirtz/ZDNET)
zdnet.com/article/op…

@Techmeme@techhub.social
2025-09-30 17:05:55

OpenAI releases an invitation-only Sora app on iOS, powered by Sora 2, to let people create and share AI-generated videos of themselves and their friends (Ina Fried/Axios)
axios.com/2025/09/30/openai-so

@Techmeme@techhub.social
2025-11-30 06:40:47

Alibaba Technical Report: Qwen3-VL beats GPT-5 and Gemini 2.5 Pro on visual tasks and has 100% accuracy on "needle-in-a-haystack" tests for 30-minute videos (Jonathan Kemper/The Decoder)
the-decoder.com/qwen3-vl-can-s

@Techmeme@techhub.social
2025-09-29 13:40:44

Microsoft launches Agent Mode in Excel and Word, using GPT-5 to generate complex spreadsheets and documents, saying it is "bringing vibe working" to 365 Copilot (Tom Warren/The Verge)
theverge.com/news/787076/micro

@Techmeme@techhub.social
2025-09-29 19:26:02

Anthropic prices Claude Sonnet 4.5 at $3/1M input and $15/1M output tokens, same as Sonnet 4, cheaper than Opus at $15/$75 but higher than GPT-5 at $1.25/$10 (Simon Willison/Simon Willison's Weblog)
simonwillison.net/2025/Sep/29/

@Techmeme@techhub.social
2025-09-25 13:55:49

Databricks says it plans to integrate OpenAI's models, including GPT-5, into its data platform and AI product Agent Bricks, as part of a $100M multiyear deal (Rebecca Bellan/TechCrunch)
techcrunch.com/2025/09/25/data

@Techmeme@techhub.social
2025-11-24 18:15:53

OpenAI unveils a free shopping research feature in ChatGPT that delivers a personalized buyer's guide, powered by a custom version of GPT-5 mini (Sabrina Ortiz/ZDNET)
zdnet.com/article/chatgpts-new

@Techmeme@techhub.social
2025-11-24 20:45:47

Anthropic prices Claude Opus 4.5 at $5/1M input and $25/1M output tokens, much cheaper than Opus 4.1 at $15/$75 but still pricier than GPT-5.1 and Gemini 3 Pro (Simon Willison/Simon Willison's Weblog)
simonwillison.net/2025/Nov/24/

@Techmeme@techhub.social
2025-11-24 18:35:42

Microsoft unveils Fara-7B, its first agentic SLM designed for computer use, available as an experimental release on Hugging Face and Microsoft Foundry (Ben Dickson/VentureBeat)
venturebeat.com/ai/microsofts-