Tootfinder

Opt-in global Mastodon full-text search. Join the index!

@Techmeme@techhub.social
2026-02-11 00:15:57

OpenAI updates ChatGPT's deep research tool with GPT-5.2, a full-screen report view, and an option to focus research on specific websites (Matthias Bastian/The Decoder)
the-decoder.com/openais-deep-r

@digitalnaiv@mastodon.social
2026-02-13 10:19:00

OpenAI polishes its AI research tool: Deep Research in ChatGPT now runs on GPT-5.2, delivers full-screen reports with a table of contents & sources, and lets you prioritize specific websites. More control instead of wishy-washy results, but no free pass for uncritical AI outputs. #DeepResearch #KI

@Techmeme@techhub.social
2026-03-08 20:55:45

Luma AI debuts Uni-1, an image model that combines image understanding and generation in a single architecture, topping Nano Banana 2 on logic-based benchmarks (Matthias Bastian/The Decoder)
the-decoder.com/luma-ais-new-u

@Techmeme@techhub.social
2026-03-05 18:25:49

OpenAI says GPT-5.4's "individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors, relative to GPT-5.2" (David Gewirtz/ZDNET)
zdnet.com/article/openai-gpt-5

@Techmeme@techhub.social
2026-03-05 18:12:11

OpenAI says GPT-5.4 produces presentations with stronger, more varied aesthetics and makes more effective use of its image generation tools (Igor Bonifacic/Engadget)
engadget.com/ai/i-hope-you-lik

@ErikJonker@mastodon.social
2026-02-26 17:44:52

The development of GPT-NL.
#gptnl

@Techmeme@techhub.social
2026-03-03 18:41:03

OpenAI says GPT-5.3 Instant's tone should feel less "cringe" than GPT-5.2 Instant's and that it has a smoother, more to-the-point conversational style (Marcus Schuler/Implicator.ai)
implicator.ai/openai-ships-gpt

@heiseonline@social.heise.de
2026-02-26 08:08:51

What happens when you deploy AI models like GPT-5.2, Claude Sonnet 4, or Gemini 3 Flash as crisis advisers? Researchers at King's College London tested exactly that in conflict simulations, with alarming results. 😰
Read the article: heis…

The image shows a person in military uniform wearing a VR headset. The text in the image reads: "AI models almost always resort to nuclear weapons in war simulations", and below that: "Researchers warn: as military decision-makers, current systems would be extremely dangerous."

@Techmeme@techhub.social
2026-03-04 15:16:41

Source: OpenAI is preparing to launch GPT-5.4, which will feature an "extreme" reasoning mode and a context window of 1M tokens, up from 400K in GPT-5.2 (Stephanie Palazzolo/The Information)
theinformation.com/articles/op

@Techmeme@techhub.social
2026-03-05 18:41:00

GPT-5.4 is priced at $2.50/1M input tokens and $15/1M output tokens; GPT-5.4 Pro is priced at $30/1M input tokens and $180/1M output tokens (Carl Franzen/VentureBeat)
venturebeat.com/technology/ope
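
Those per-million rates make per-request costs easy to estimate. Below is a minimal sketch of that arithmetic; the token counts are hypothetical, and only the $/1M prices come from the post above.

```python
# Cost arithmetic from the quoted API prices: (input $/1M, output $/1M).
PRICES = {
    "gpt-5.4":     (2.50, 15.00),
    "gpt-5.4-pro": (30.00, 180.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical example: a 50K-token prompt and a 5K-token answer.
print(f"GPT-5.4:     ${request_cost('gpt-5.4', 50_000, 5_000):.2f}")      # $0.20
print(f"GPT-5.4 Pro: ${request_cost('gpt-5.4-pro', 50_000, 5_000):.2f}")  # $2.40
```

Note that Pro is exactly 12x the base rate on both input and output, so any given request costs 12x as much on the Pro tier.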

A damning new study could put AI companies on the defensive. In it, Stanford and Yale researchers found compelling evidence that AI models are actually copying all that data, not “learning” from it. Specifically, four prominent LLMs — OpenAI’s GPT-4.1, Google’s Gemini 2.5 Pro, xAI’s Grok 3, and Anthropic’s Claude 3.7 Sonnet — happily reproduced lengthy excerpts from popular — and protected — works, with a stunning degree of accuracy. They fou…

@almad@fosstodon.org
2026-02-22 19:11:34

Ah yes, #LLM for exploit development.
In other words, we’ll now spend billions on offense & prevention to achieve a new equilibrium (that we already sort of had).
Good job, us. #infosec

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 10:12:07

Compressing Transformer Language Models via Matrix Product Operator Decomposition: A Case Study on PicoGPT
Younes Javanmard, Tanmoy Pandit, Masoud Mardani
arxiv.org/abs/2603.28534 arxiv.org/pdf/2603.28534 arxiv.org/html/2603.28534
arXiv:2603.28534v1 Announce Type: new
Abstract: Transformer-based language models achieve strong performance across NLP tasks, but their quadratic parameter scaling with hidden dimension makes deployment on resource-constrained hardware expensive. We study Matrix Product Operator (MPO) decomposition as a principled compression method for transformers. MPO factorises weight matrices into chains of low-rank cores, with approximation quality controlled by the bond dimension chi. We replace every nn.Linear layer in PicoGPT, a GPT-2-style character-level language model with about 1M parameters, with an MPOLinear module parameterised as an MPO chain. Cores are initialised either by TT-SVD from pretrained dense weights or from random initialisation, and trained using standard PyTorch autograd without a custom backward pass. We derive balanced factorisation schemes for the five distinct weight shapes in PicoGPT and evaluate bond dimensions chi in {4, 8, 16, 32} on Tiny Shakespeare. MPO compression achieves up to 13x compression per transformer block at chi = 4. At chi = 16, the model uses 191,872 parameters instead of 1,020,224 while retaining 97.7% of baseline token accuracy (51.6% vs 52.8%). Reconstruction error follows the expected trend and is lower for three-site than two-site factorisations at the same bond dimension. The chi = 8 model gives the best accuracy per parameter, exceeding the dense baseline by 2.7x on this metric. These results show that MPO parameterisation is a practical and theoretically grounded alternative to low-rank methods and unstructured pruning for transformer compression.
toXiv_bot_toot
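
The abstract above initialises MPO cores from pretrained dense weights via TT-SVD. A minimal two-site version of that factorisation fits in a few lines of NumPy; the matrix shape and bond dimension below are illustrative choices, not the paper's.

```python
# Two-site MPO (TT-SVD) factorisation of a dense weight matrix, as a sketch.
# Shapes are hypothetical: a 256x256 weight viewed as (16*16) x (16*16).
import numpy as np

o1, o2, i1, i2, chi = 16, 16, 16, 16, 8
W = np.random.default_rng(0).standard_normal((o1 * o2, i1 * i2))

# Pair up row/column indices: W[(a,b),(c,d)] -> T[(a,c),(b,d)].
T = W.reshape(o1, o2, i1, i2).transpose(0, 2, 1, 3).reshape(o1 * i1, o2 * i2)

# Truncated SVD at bond dimension chi yields the two MPO cores.
U, S, Vt = np.linalg.svd(T, full_matrices=False)
core1 = (U[:, :chi] * S[:chi]).reshape(o1, i1, chi)  # left core
core2 = Vt[:chi].reshape(chi, o2, i2)                # right core

# Reconstruct W_hat from the cores; report error and parameter savings.
T_hat = core1.reshape(-1, chi) @ core2.reshape(chi, -1)
W_hat = T_hat.reshape(o1, i1, o2, i2).transpose(0, 2, 1, 3).reshape(W.shape)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"rel. error {rel_err:.3f}; params {core1.size + core2.size} vs {W.size}")
```

At chi = 8 the two cores store 4,096 parameters in place of 65,536, a 16x reduction for this one matrix; the paper's per-block ratios (up to 13x at chi = 4) come from applying the same accounting to every nn.Linear in PicoGPT.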

@davej@dice.camp
2026-02-15 20:58:43

“A week ago, someone sent me a link to an online article describing a flaming confrontation between me and the CEO of the Commonwealth Bank, Matt Comyn, on the set of 7.30.
“The story was 2,000 words long, very detailed, and had pictures of Comyn and me arguing in front of 7.30 host Sarah Ferguson, before Matt throws away his microphone and storms off.
“Not a word nor a photo of it was true. It was an #AI

@Techmeme@techhub.social
2026-02-02 18:08:02

OpenAI launches a Codex app for macOS, designed to serve as a command center for managing AI agents, and says Codex usage has nearly doubled since mid-December (David Gewirtz/ZDNET)
zdnet.com/article/openai-codex

@buercher@tooting.ch
2026-03-03 18:00:48

Advanced AI models appear willing to deploy nuclear weapons without the same reservations humans have when put into simulated geopolitical crises.
Kenneth Payne at King’s College London set three leading large language models – GPT-5.2, Claude Sonnet 4 and Gemini 3 Flash – against each other in simulated war games. The scenarios involved intense international standoffs, including border disputes, competition for scarce resources and existential threats to regime survival.
newscientist.com/article/25168

@Techmeme@techhub.social
2026-01-27 18:31:03

OpenAI for Science launches Prism, a free LaTeX-based text editor that embeds GPT-5.2 to assist in scientific paper drafting and citation management (Will Douglas Heaven/MIT Technology Review)
technologyreview.com/2026/01/2

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 11:12:48

Replaced article(s) found for cs.CL. arxiv.org/list/cs.CL/new
[2/5]:
- POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
Li, Cui, Wang, Ge, Huang, Li, Peng, Lu, Tashi, Wang, Dang
arxiv.org/abs/2511.09232 mastoxiv.page/@arXiv_csCL_bot/
- Beyond Elicitation: Provision-based Prompt Optimization for Knowledge-Intensive Tasks
Yunzhe Xu, Zhuosheng Zhang, Zhe Liu
arxiv.org/abs/2511.10465 mastoxiv.page/@arXiv_csCL_bot/
- π-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
Dong Liu, Yanxuan Yu
arxiv.org/abs/2511.10696 mastoxiv.page/@arXiv_csCL_bot/
- Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performanc...
Zijin Su, Huanzhu Lyu, Yuren Niu, Yiming Liu
arxiv.org/abs/2511.14073 mastoxiv.page/@arXiv_csCL_bot/
- HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning
Alexis Correa-Guillén, Carlos Gómez-Rodríguez, David Vilares
arxiv.org/abs/2511.15355 mastoxiv.page/@arXiv_csCL_bot/
- Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search
Dong Liu, Yanxuan Yu
arxiv.org/abs/2511.16681 mastoxiv.page/@arXiv_csCL_bot/
- Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Transla...
Marii Ojastu, Hele-Andra Kuulmets, Aleksei Dorkin, Marika Borovikova, Dage Särg, Kairit Sirts
arxiv.org/abs/2511.17290 mastoxiv.page/@arXiv_csCL_bot/
- A Systematic Study of In-the-Wild Model Merging for Large Language Models
Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata
arxiv.org/abs/2511.21437 mastoxiv.page/@arXiv_csCL_bot/
- CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer
Lavish Bansal, Naman Mishra
arxiv.org/abs/2512.02711 mastoxiv.page/@arXiv_csCL_bot/
- Multilingual Medical Reasoning for Question Answering with Large Language Models
Pietro Ferrazzi, Aitor Soroa, Rodrigo Agerri
arxiv.org/abs/2512.05658 mastoxiv.page/@arXiv_csCL_bot/
- OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Convers...
Albrecht, Lehmann, Poltermann, Rudolph, Steigerwald, Stieler
arxiv.org/abs/2512.09804 mastoxiv.page/@arXiv_csCL_bot/
- Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, an...
Hanyu Cai, Binqi Shen, Lier Jin, Lan Hu, Xiaojing Fan
arxiv.org/abs/2512.12812 mastoxiv.page/@arXiv_csCL_bot/
- Beg to Differ: Understanding Reasoning-Answer Misalignment Across Languages
Ovalle, Ross, Ruder, Williams, Ullrich, Ibrahim, Sagun
arxiv.org/abs/2512.22712 mastoxiv.page/@arXiv_csCL_bot/
- Activation Steering for Masked Diffusion Language Models
Adi Shnaidman, Erin Feiglin, Osher Yaari, Efrat Mentel, Amit Levi, Raz Lapid
arxiv.org/abs/2512.24143 mastoxiv.page/@arXiv_csCL_bot/
- JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese L...
Liu, Li, Niu, Zhang, Xun, Hou, Wang, Iwasawa, Matsuo, Hatakeyama-Sato
arxiv.org/abs/2601.01627 mastoxiv.page/@arXiv_csCL_bot/
- FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG
Dassen, Kotula, Murray, Yates, Lawrie, Kayi, Mayfield, Duh
arxiv.org/abs/2601.05866 mastoxiv.page/@arXiv_csCL_bot/
- †DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems
Zabir Al Nazi, Shubhashis Roy Dipta, Sudipta Kar
arxiv.org/abs/2601.06853 mastoxiv.page/@arXiv_csCL_bot/
- Symphonym: Universal Phonetic Embeddings for Cross-Script Name Matching
Stephen Gadd
arxiv.org/abs/2601.06932 mastoxiv.page/@arXiv_csCL_bot/
- LLMs versus the Halting Problem: Revisiting Program Termination Prediction
Sultan, Armengol-Estape, Kesseli, Vanegue, Shahaf, Adi, O'Hearn
arxiv.org/abs/2601.18987 mastoxiv.page/@arXiv_csCL_bot/
- MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues
Diandian Guo, Fangfang Yuan, Cong Cao, Xixun Lin, Chuan Zhou, Hao Peng, Yanan Cao, Yanbing Liu
arxiv.org/abs/2601.20451 mastoxiv.page/@arXiv_csCL_bot/
toXiv_bot_toot

@Techmeme@techhub.social
2026-03-20 09:05:50

Xiaomi releases MiMo-V2-Pro, its new 1T-parameter foundation model, codenamed Hunter Alpha, which the company says benchmarks close to GPT-5.2 and Opus 4.6 (Carl Franzen/VentureBeat)
venturebeat.com/technology/xia

@Techmeme@techhub.social
2026-02-25 12:56:17

A study finds GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash deployed tactical nuclear weapons in 95% of 21 simulated war game scenarios, and never surrendered (Chris Stokel-Walker/New Scientist)
newscientist.com/article/25168

@Techmeme@techhub.social
2026-01-26 17:50:42

Qwen releases Qwen3-Max-Thinking, its flagship reasoning model that it says demonstrates performance comparable to models such as GPT-5.2 Thinking and Opus 4.5 (Qwen)
qwen.ai/blog?id=qwen3-max-thin

@Techmeme@techhub.social
2026-01-25 01:50:58

Tests show GPT-5.2 on ChatGPT citing Grokipedia as a source on a wide range of queries, including on Iranian conglomerates and Holocaust deniers (Aisha Down/The Guardian)
theguardian.com/technology/202

@Techmeme@techhub.social
2026-03-18 10:25:59

How AI's post-training process suppresses the creativity and whimsicality seen in earlier models like GPT-2, leading to bad writing from many top AI models (Jasmine Sun/The Atlantic)