Tootfinder

Opt-in global Mastodon full-text search. Join the index!

@Techmeme@techhub.social
2026-02-11 00:15:57

OpenAI updates ChatGPT's deep research tool with GPT-5.2, a full-screen report view, and an option to focus research on specific websites (Matthias Bastian/The Decoder)
the-decoder.com/openais-deep-r

@digitalnaiv@mastodon.social
2026-02-13 10:19:00

OpenAI polishes its AI research tool: Deep Research in ChatGPT now runs on GPT-5.2, delivers full-screen reports with a table of contents & sources, and lets you prioritize specific websites. More control instead of wishy-washy results, but no free pass for uncritical AI outputs. #DeepResearch #KI

@Techmeme@techhub.social
2026-03-08 20:55:45

Luma AI debuts Uni-1, an image model that combines image understanding and generation in a single architecture, topping Nano Banana 2 on logic-based benchmarks (Matthias Bastian/The Decoder)
the-decoder.com/luma-ais-new-u

@Techmeme@techhub.social
2026-03-05 18:25:49

OpenAI says GPT-5.4's "individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors, relative to GPT-5.2" (David Gewirtz/ZDNET)
zdnet.com/article/openai-gpt-5

@Techmeme@techhub.social
2026-03-05 18:12:11

OpenAI says GPT-5.4 produces presentations with stronger, more varied aesthetics and makes more effective use of its image generation tools (Igor Bonifacic/Engadget)
engadget.com/ai/i-hope-you-lik

@ErikJonker@mastodon.social
2026-02-26 17:44:52

The development of GPT-NL.
#gptnl

@Techmeme@techhub.social
2026-03-03 18:41:03

OpenAI says GPT-5.3 Instant's tone should feel less "cringe" than GPT-5.2 Instant's and that it has a smoother, more to-the-point conversational style (Marcus Schuler/Implicator.ai)
implicator.ai/openai-ships-gpt

@heiseonline@social.heise.de
2026-02-26 08:08:51

What happens when you deploy AI models like GPT-5.2, Claude Sonnet 4, or Gemini 3 Flash as crisis advisers? Researchers at King's College London tested exactly that in conflict simulations, with alarming results. 😰
Read the article: heis…

The image shows a person in military uniform wearing a VR headset. The text in the image reads: "AI models almost always resort to nuclear weapons in war simulations", and below that: "Researchers warn: as military decision-makers, current systems would be extremely dangerous."

@Techmeme@techhub.social
2026-03-04 15:16:41

Source: OpenAI is preparing to launch GPT-5.4, which will feature an "extreme" reasoning mode and a context window of 1M tokens, up from 400K in GPT-5.2 (Stephanie Palazzolo/The Information)
theinformation.com/articles/op

@Techmeme@techhub.social
2026-03-05 18:41:00

GPT-5.4 is priced at $2.50/1M input tokens and $15/1M output tokens; GPT-5.4 Pro is priced at $30/1M input tokens and $180/1M output tokens (Carl Franzen/VentureBeat)
venturebeat.com/technology/ope
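
Those per-million rates make per-request costs easy to estimate. Below is a minimal sketch of that arithmetic; the token counts are hypothetical, and only the $/1M prices come from the post above.

```python
# Cost arithmetic from the quoted API prices: (input $/1M, output $/1M).
PRICES = {
    "gpt-5.4":     (2.50, 15.00),
    "gpt-5.4-pro": (30.00, 180.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical example: a 50K-token prompt and a 5K-token answer.
print(f"GPT-5.4:     ${request_cost('gpt-5.4', 50_000, 5_000):.2f}")      # $0.20
print(f"GPT-5.4 Pro: ${request_cost('gpt-5.4-pro', 50_000, 5_000):.2f}")  # $2.40
```

Note that Pro is exactly 12x the base rate on both input and output, so any given request costs 12x as much on the Pro tier.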

A damning new study could put AI companies on the defensive. In it, Stanford and Yale researchers found compelling evidence that AI models are actually copying all that data, not “learning” from it. Specifically, four prominent LLMs — OpenAI’s GPT-4.1, Google’s Gemini 2.5 Pro, xAI’s Grok 3, and Anthropic’s Claude 3.7 Sonnet — happily reproduced lengthy excerpts from popular — and protected — works, with a stunning degree of accuracy. They fou…

@almad@fosstodon.org
2026-02-22 19:11:34

Ah yes, #LLM for exploit development.
In other words, we’ll now spend billions on offense & prevention to achieve a new equilibrium (that we already sort of had).
Good job, us. #infosec

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 10:12:07

Compressing Transformer Language Models via Matrix Product Operator Decomposition: A Case Study on PicoGPT
Younes Javanmard, Tanmoy Pandit, Masoud Mardani
arxiv.org/abs/2603.28534 arxiv.org/pdf/2603.28534 arxiv.org/html/2603.28534
arXiv:2603.28534v1 Announce Type: new
Abstract: Transformer-based language models achieve strong performance across NLP tasks, but their quadratic parameter scaling with hidden dimension makes deployment on resource-constrained hardware expensive. We study Matrix Product Operator (MPO) decomposition as a principled compression method for transformers. MPO factorises weight matrices into chains of low-rank cores, with approximation quality controlled by the bond dimension chi. We replace every nn.Linear layer in PicoGPT, a GPT-2-style character-level language model with about 1M parameters, with an MPOLinear module parameterised as an MPO chain. Cores are initialised either by TT-SVD from pretrained dense weights or from random initialisation, and trained using standard PyTorch autograd without a custom backward pass. We derive balanced factorisation schemes for the five distinct weight shapes in PicoGPT and evaluate bond dimensions chi in {4, 8, 16, 32} on Tiny Shakespeare. MPO compression achieves up to 13x compression per transformer block at chi = 4. At chi = 16, the model uses 191,872 parameters instead of 1,020,224 while retaining 97.7% of baseline token accuracy (51.6% vs 52.8%). Reconstruction error follows the expected trend and is lower for three-site than two-site factorisations at the same bond dimension. The chi = 8 model gives the best accuracy per parameter, exceeding the dense baseline by 2.7x on this metric. These results show that MPO parameterisation is a practical and theoretically grounded alternative to low-rank methods and unstructured pruning for transformer compression.
toXiv_bot_toot
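
The abstract above initialises MPO cores from pretrained dense weights via TT-SVD. A minimal two-site version of that factorisation fits in a few lines of NumPy; the matrix shape and bond dimension below are illustrative choices, not the paper's.

```python
# Two-site MPO (TT-SVD) factorisation of a dense weight matrix, as a sketch.
# Shapes are hypothetical: a 256x256 weight viewed as (16*16) x (16*16).
import numpy as np

o1, o2, i1, i2, chi = 16, 16, 16, 16, 8
W = np.random.default_rng(0).standard_normal((o1 * o2, i1 * i2))

# Pair up row/column indices: W[(a,b),(c,d)] -> T[(a,c),(b,d)].
T = W.reshape(o1, o2, i1, i2).transpose(0, 2, 1, 3).reshape(o1 * i1, o2 * i2)

# Truncated SVD at bond dimension chi yields the two MPO cores.
U, S, Vt = np.linalg.svd(T, full_matrices=False)
core1 = (U[:, :chi] * S[:chi]).reshape(o1, i1, chi)  # left core
core2 = Vt[:chi].reshape(chi, o2, i2)                # right core

# Reconstruct W_hat from the cores; report error and parameter savings.
T_hat = core1.reshape(-1, chi) @ core2.reshape(chi, -1)
W_hat = T_hat.reshape(o1, i1, o2, i2).transpose(0, 2, 1, 3).reshape(W.shape)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"rel. error {rel_err:.3f}; params {core1.size + core2.size} vs {W.size}")
```

At chi = 8 the two cores store 4,096 parameters in place of 65,536, a 16x reduction for this one matrix; the paper's per-block ratios (up to 13x at chi = 4) come from applying the same accounting to every nn.Linear in PicoGPT.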

@davej@dice.camp
2026-02-15 20:58:43

“A week ago, someone sent me a link to an online article describing a flaming confrontation between me and the CEO of the Commonwealth Bank, Matt Comyn, on the set of 7.30.
“The story was 2,000 words long, very detailed, and had pictures of Comyn and me arguing in front of 7.30 host Sarah Ferguson, before Matt throws away his microphone and storms off.
“Not a word nor a photo of it was true. It was an #AI

@Techmeme@techhub.social
2026-02-02 18:08:02

OpenAI launches a Codex app for macOS, designed to serve as a command center for managing AI agents, and says Codex usage has nearly doubled since mid-December (David Gewirtz/ZDNET)
zdnet.com/article/openai-codex

@buercher@tooting.ch
2026-03-03 18:00:48

Advanced AI models appear willing to deploy nuclear weapons without the same reservations humans have when put into simulated geopolitical crises.
Kenneth Payne at King’s College London set three leading large language models – GPT-5.2, Claude Sonnet 4 and Gemini 3 Flash – against each other in simulated war games. The scenarios involved intense international standoffs, including border disputes, competition for scarce resources and existential threats to regime survival.
newscientist.com/article/25168

@Techmeme@techhub.social
2026-01-27 18:31:03

OpenAI for Science launches Prism, a free LaTeX-based text editor that embeds GPT-5.2 to assist in scientific paper drafting and citation management (Will Douglas Heaven/MIT Technology Review)
technologyreview.com/2026/01/2

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 11:12:48

Replaced article(s) found for cs.CL. arxiv.org/list/cs.CL/new
[2/5]:
- POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
Li, Cui, Wang, Ge, Huang, Li, Peng, Lu, Tashi, Wang, Dang
arxiv.org/abs/2511.09232 mastoxiv.page/@arXiv_csCL_bot/
- Beyond Elicitation: Provision-based Prompt Optimization for Knowledge-Intensive Tasks
Yunzhe Xu, Zhuosheng Zhang, Zhe Liu
arxiv.org/abs/2511.10465 mastoxiv.page/@arXiv_csCL_bot/
- π-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
Dong Liu, Yanxuan Yu
arxiv.org/abs/2511.10696 mastoxiv.page/@arXiv_csCL_bot/
- Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performanc...
Zijin Su, Huanzhu Lyu, Yuren Niu, Yiming Liu
arxiv.org/abs/2511.14073 mastoxiv.page/@arXiv_csCL_bot/
- HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning
Alexis Correa-Guillén, Carlos Gómez-Rodríguez, David Vilares
arxiv.org/abs/2511.15355 mastoxiv.page/@arXiv_csCL_bot/
- Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search
Dong Liu, Yanxuan Yu
arxiv.org/abs/2511.16681 mastoxiv.page/@arXiv_csCL_bot/
- Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Transla...
Marii Ojastu, Hele-Andra Kuulmets, Aleksei Dorkin, Marika Borovikova, Dage Särg, Kairit Sirts
arxiv.org/abs/2511.17290 mastoxiv.page/@arXiv_csCL_bot/
- A Systematic Study of In-the-Wild Model Merging for Large Language Models
Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata
arxiv.org/abs/2511.21437 mastoxiv.page/@arXiv_csCL_bot/
- CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer
Lavish Bansal, Naman Mishra
arxiv.org/abs/2512.02711 mastoxiv.page/@arXiv_csCL_bot/
- Multilingual Medical Reasoning for Question Answering with Large Language Models
Pietro Ferrazzi, Aitor Soroa, Rodrigo Agerri
arxiv.org/abs/2512.05658 mastoxiv.page/@arXiv_csCL_bot/
- OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Convers...
Albrecht, Lehmann, Poltermann, Rudolph, Steigerwald, Stieler
arxiv.org/abs/2512.09804 mastoxiv.page/@arXiv_csCL_bot/
- Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, an...
Hanyu Cai, Binqi Shen, Lier Jin, Lan Hu, Xiaojing Fan
arxiv.org/abs/2512.12812 mastoxiv.page/@arXiv_csCL_bot/
- Beg to Differ: Understanding Reasoning-Answer Misalignment Across Languages
Ovalle, Ross, Ruder, Williams, Ullrich, Ibrahim, Sagun
arxiv.org/abs/2512.22712 mastoxiv.page/@arXiv_csCL_bot/
- Activation Steering for Masked Diffusion Language Models
Adi Shnaidman, Erin Feiglin, Osher Yaari, Efrat Mentel, Amit Levi, Raz Lapid
arxiv.org/abs/2512.24143 mastoxiv.page/@arXiv_csCL_bot/
- JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese L...
Liu, Li, Niu, Zhang, Xun, Hou, Wang, Iwasawa, Matsuo, Hatakeyama-Sato
arxiv.org/abs/2601.01627 mastoxiv.page/@arXiv_csCL_bot/
- FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG
Dassen, Kotula, Murray, Yates, Lawrie, Kayi, Mayfield, Duh
arxiv.org/abs/2601.05866 mastoxiv.page/@arXiv_csCL_bot/
- †DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems
Zabir Al Nazi, Shubhashis Roy Dipta, Sudipta Kar
arxiv.org/abs/2601.06853 mastoxiv.page/@arXiv_csCL_bot/
- Symphonym: Universal Phonetic Embeddings for Cross-Script Name Matching
Stephen Gadd
arxiv.org/abs/2601.06932 mastoxiv.page/@arXiv_csCL_bot/
- LLMs versus the Halting Problem: Revisiting Program Termination Prediction
Sultan, Armengol-Estape, Kesseli, Vanegue, Shahaf, Adi, O'Hearn
arxiv.org/abs/2601.18987 mastoxiv.page/@arXiv_csCL_bot/
- MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues
Diandian Guo, Fangfang Yuan, Cong Cao, Xixun Lin, Chuan Zhou, Hao Peng, Yanan Cao, Yanbing Liu
arxiv.org/abs/2601.20451 mastoxiv.page/@arXiv_csCL_bot/
toXiv_bot_toot

@Techmeme@techhub.social
2026-03-20 09:05:50

Xiaomi releases MiMo-V2-Pro, its new 1T-parameter foundation model, codenamed Hunter Alpha, which the company says benchmarks close to GPT-5.2 and Opus 4.6 (Carl Franzen/VentureBeat)
venturebeat.com/technology/xia

@Techmeme@techhub.social
2026-02-25 12:56:17

A study finds GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash deployed tactical nuclear weapons in 95% of 21 simulated war game scenarios, and never surrendered (Chris Stokel-Walker/New Scientist)
newscientist.com/article/25168

@Techmeme@techhub.social
2026-01-26 17:50:42

Qwen releases Qwen3-Max-Thinking, its flagship reasoning model that it says demonstrates performance comparable to models such as GPT-5.2 Thinking and Opus 4.5 (Qwen)
qwen.ai/blog?id=qwen3-max-thin

@Techmeme@techhub.social
2026-01-25 01:50:58

Tests show GPT-5.2 on ChatGPT citing Grokipedia as a source on a wide range of queries, including on Iranian conglomerates and Holocaust deniers (Aisha Down/The Guardian)
theguardian.com/technology/202

@Techmeme@techhub.social
2026-03-18 10:25:59

How AI's post-training process suppresses the creativity and whimsicality seen in earlier models like GPT-2, leading to bad writing from many top AI models (Jasmine Sun/The Atlantic)