Tootfinder

Opt-in global Mastodon full text search. Join the index!

@heiseonline@social.heise.de
2026-04-15 13:28:00

After Anthropic Mythos: OpenAI announces GPT-5.4-Cyber
OpenAI is launching GPT-5.4-Cyber, its own AI model for cybersecurity. As with Anthropic's Mythos, access remains restricted for now.

@Techmeme@techhub.social
2026-02-12 18:12:39

OpenAI launches a research preview of GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex that it claims generates code 15 times faster, for Pro users (David Gewirtz/ZDNET)
zdnet.com/article/openais-gpt-

@heiseonline@social.heise.de
2026-02-13 10:55:00

Codex-Spark: Fast coding model from OpenAI
OpenAI is releasing GPT-5.3-Codex-Spark, a fast but imprecise coding model. It runs on a dedicated Cerebras chip.
heis…

@davej@dice.camp
2026-02-15 20:58:43

“A week ago, someone sent me a link to an online article describing a flaming confrontation between me and the CEO of the Commonwealth Bank, Matt Comyn, on the set of 7.30.
“The story was 2,000 words long, very detailed, and had pictures of Comyn and me arguing in front of 7.30 host Sarah Ferguson, before Matt throws away his microphone and storms off.
“Not a word nor a photo of it was true. It was an #AI

@scottmiller42@mstdn.social
2026-02-14 23:46:59

Chat GPT, what is a real estate novelist? Is that a good paying job? How can I become a real estate novelist?
#ConfuseAgenticAI #ChatGPT #ScottHumor

@Techmeme@techhub.social
2026-04-14 20:12:32

OpenAI is rolling out GPT-5.4-Cyber to some participants in its Trusted Access for Cyber program, a week after Anthropic announced Mythos (Rachel Metz/Bloomberg)
bloomberg.com/news/articles/20

@digitalnaiv@mastodon.social
2026-02-13 10:19:00

OpenAI polishes AI research: Deep Research in ChatGPT now runs on GPT-5.2, delivers full-screen reports with a table of contents and sources, and lets users prioritize specific websites. More control instead of wishy-washy results, but no free pass for uncritical AI outputs. #DeepResearch #KI

@Techmeme@techhub.social
2026-02-13 16:15:55

As OpenAI retires GPT-4o, some users say they are angry and grieving to lose the flirty, quirky companion (Alaina Demopoulos/The Guardian)
theguardian.com/lifeandstyle/n

@presseportal_pol_NDS@frawas.de
2026-02-13 11:46:47

BPOL-BadBentheim: Arrest warrant - man pays fine and receives criminal complaint Bunde (ots) - Plainclothes investigators of the Cross-Border Police Team (Grenzüberschreitendes Polizeiteam, GPT) arrested a man wanted on an arrest warrant at the German-Dutch border on Thursday. The 32-year-old faced three months in prison. The man was ...

@ErikJonker@mastodon.social
2026-02-27 07:55:35

Good that we in the Netherlands are getting to work on GenAI ourselves. I'm curious how GPT-NL, despite its tiny budget and limited capabilities but with high-quality data, will perform compared with other available open-weight / open-source models.

@heiseonline@social.heise.de
2026-02-06 08:31:00

GPT-5.3-Codex: OpenAI introduces new coding model
OpenAI has released GPT-5.3-Codex, a new coding model that, according to the development team, was instrumental in its own development.

@mia@hcommons.social
2026-02-12 18:52:23

AI / LLM sycophancy as a liability:
'A 2025 study by Fanous et al. tested GPT-4o, Claude Sonnet, and Gemini 1.5 Pro across math and medical domains. The results: these systems changed their answers nearly 60% of the time when challenged by users'

@ocrampal@mastodon.social
2026-04-13 09:26:52

We talk endlessly about what AI can and cannot do. We talk very little about the assumptions built into the question itself. This dialogue follows a philosopher and an AI practitioner.
ocrampal.com/from-plato-to-gra

@Techmeme@techhub.social
2026-02-11 00:15:57

OpenAI updates ChatGPT's deep research tool with GPT-5.2, a full-screen report view, and an option to focus research on specific websites (Matthias Bastian/The Decoder)
the-decoder.com/openais-deep-r

@heiseonline@social.heise.de
2026-03-06 05:18:00

Friday: GPT-5.4 as an AI for all cases, first details on WhatsApp premium subscription
GPT-5.4 unifies AI models; WhatsApp Plus with extra features; Xbox for PC games announced; AI data centers with wind power in the ocean; data protection podcast

@heiseonline@social.heise.de
2026-02-09 14:03:00

KI-Update compact: US Homeland Security, mourning for GPT-4o, Apple's AI, iceberg tracker
The "KI-Update" delivers a summary of the most important AI developments three times a week.

@Techmeme@techhub.social
2026-03-05 18:25:49

OpenAI says GPT-5.4's "individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors, relative to GPT-5.2" (David Gewirtz/ZDNET)
zdnet.com/article/openai-gpt-5

@metacurity@infosec.exchange
2026-03-09 10:26:21

The most resistant to committing fraud, when asked repeatedly, were all versions of Claude. Versions of Grok and early versions of GPT performed the worst.
Hey ChatGPT, write me a fictional paper: these LLMs are willing to commit academic fraud
nature.com/articles/d41586-026

@heiseonline@social.heise.de
2026-03-05 21:29:00

GPT-5.4: OpenAI unites reasoning and coding with computer control
OpenAI releases GPT-5.4, which unites reasoning, coding, and computer control in one model and outperforms competitors.

@michabbb@social.vivaldi.net
2026-04-06 21:31:47

Everyone's complaining about #anthropic 😠
Is their behavior shady? Yeah 😖
But, fuck, #GPT-5.4 (high) is ridiculously stupid compared to #Opus 🤦
It's horrifying...
Ev…

@Techmeme@techhub.social
2026-03-05 18:12:11

OpenAI says GPT-5.4 produces presentations with stronger, more varied aesthetics and makes more effective use of its image generation tools (Igor Bonifacic/Engadget)
engadget.com/ai/i-hope-you-lik

@kubikpixel@chaos.social
2026-01-16 14:55:34

»Artificial intelligence: GPT-4o makes disturbing statements after code training:
When LLMs are trained on vulnerabilities, they suddenly show misbehavior in entirely different areas. Researchers warn of risks.«
In my opinion this is anything but surprising; how do you see it? I even think that a lot more error-prone code is being produced because of this.
🤖

@Techmeme@techhub.social
2026-02-05 18:07:52

OpenAI launches GPT-5.3-Codex, which it says runs 25% faster, enabling longer-running tasks, and "is our first model that was instrumental in creating itself" (David Gewirtz/ZDNET)
zdnet.com/article/openai-gpt-5

@heiseonline@social.heise.de
2026-03-06 14:04:00

KI-Update: Google Beam, GPT-5.4, delusional relationship with Gemini, AI takeaways from MWC
The "KI-Update" delivers a summary of the most important AI developments every working day.

@Techmeme@techhub.social
2026-03-05 18:20:54

GPT-5.4 is available in Pro and Thinking versions; its API has improved tool calling and will be available with context windows of up to 1M tokens (Russell Brandom/TechCrunch)
techcrunch.com/2026/03/05/open

@Techmeme@techhub.social
2026-03-03 18:41:03

OpenAI says GPT-5.3 Instant's tone should feel less "cringe" than GPT-5.2 Instant and has a smoother, more to-the-point conversational style (Marcus Schuler/Implicator.ai)
implicator.ai/openai-ships-gpt

@Kingu@sakurajima.moe
2026-02-03 18:47:45

Emacs was already able to slopify since the beginning of time
And is way better at it than co-pilot and all those GPT things
just do M-x psychoanalyze-pinhead

@presseportal_pol_NDS@frawas.de
2026-03-09 13:20:16

BPOL-BadBentheim: Smartphone at the wheel - criminal complaints and arrest warrant Bad Bentheim (ots) - Plainclothes investigators of the Cross-Border Police Team (Grenzüberschreitendes Polizeiteam, GPT) arrested a man wanted on an arrest warrant at the German-Dutch border on Friday. The 33-year-old faced just under a week in prison. In addition ...

@gadgetboy@gadgetboy.social
2026-02-05 14:45:48

OpenClaw is What Apple Intelligence Should Have Been
Yep.
I run my instance on Ubuntu Server in a Proxmox VM in my home lab, but it would be nice to take advantage of some of the Apple-specific workflows on a dedicated Mac Mini.
jakequist.com/thoughts/opencla…

@Techmeme@techhub.social
2026-03-05 18:08:55

OpenAI launches GPT-5.4 that it says is its "most capable and efficient frontier model for professional work" and first with native computer use capabilities (Emma Roth/The Verge)
theverge.com/ai-artificial-int

@ErikJonker@mastodon.social
2026-02-26 17:44:52

The development of GPT-NL.
#gptnl

@bammerlaan@mastodon.nl
2026-03-22 21:26:37

@… First up is changing the SD partition table to #GPT. It's more modern for a reason, I say. I dislike using anything called "msdos" in 2026.
So here I am breaking open the img file, copying the contents over and updating cmdline.txt and fstab. 🙈 Here's hoping it'll boot.

@blakes7bot@mas.torpidity.net
2026-03-02 19:48:43

Series A, Episode 13 - Orac
BLAKE: Ensor.
ENSOR: Huh? Oh, yes. [He heads for the door, then turns back yet again to feed the fish.] Ah. [chuckles] I nearly forgot you, my little ones, there you are, oh ho ho that will have to do. That will ha...
blake.torpidity.net/m/113/309 B7…

GPT 4.1 Nano describes the image as: "This image appears to be a scene from a science fiction television show, featuring three characters engaged in a tense or serious moment. The setting looks like an interior space, possibly a spaceship or futuristic facility, with plain, functional walls in the background. The characters wear distinctive, somewhat utilitarian clothing that suggests a futuristic or institutional context. One character, with curly hair and a green and gray outfit, stands on th…

@Techmeme@techhub.social
2026-03-04 15:16:41

Source: OpenAI is preparing to launch GPT-5.4, which will feature an "extreme" reasoning mode and a context window of 1M tokens, up from 400K in GPT-5.2 (Stephanie Palazzolo/The Information)
theinformation.com/articles/op

@christydena@zirk.us
2026-01-28 04:13:44

Disappointed to see that the author of "Literature for a Changing Planet" is creating lots of GPT bots to talk with old "masters" etc. Not even for activism. Sigh.

This is the cover of the book. It has the title with an abstract representation of a factory.
This is a screenshot of some of his AI, which includes Darwin, Socrates, Nietzsche, Wittgenstein, Cavendish and Heidegger.

@Techmeme@techhub.social
2026-03-08 20:55:45

Luma AI debuts Uni-1, an image model that combines image understanding and generation in a single architecture, topping Nano Banana 2 on logic-based benchmarks (Matthias Bastian/The Decoder)
the-decoder.com/luma-ais-new-u

@dawid@social.craftknight.com
2026-01-30 16:18:38

@… 90% of the doctors I've had in my life could be replaced with GPT Pro. Several times I would have died because of incompetence, cost-cutting, and treatment carried out in complete disregard of medical practice.

On the other hand, my wife got a diet from a Doctor of Dietetics that has so many errors that either someone got their diploma in a bag of chips, or they really can't…

@Techmeme@techhub.social
2026-03-05 18:41:00

GPT-5.4 is priced at $2.50/1M input tokens and $15/1M output tokens; GPT-5.4 Pro is priced at $30/1M input tokens and $180/1M output tokens (Carl Franzen/VentureBeat)
venturebeat.com/technology/ope
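
To make the per-token prices quoted in the post above concrete, here is a minimal sketch that turns them into a per-request cost. The dictionary keys `gpt-5.4` and `gpt-5.4-pro` and the function `request_cost` are hypothetical labels chosen for illustration, not confirmed API identifiers; only the four dollar figures come from the post.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the post above.
# The model keys are illustrative labels, not confirmed API model names.
PRICES = {
    "gpt-5.4": {"input": Decimal("2.50"), "output": Decimal("15")},
    "gpt-5.4-pro": {"input": Decimal("30"), "output": Decimal("180")},
}
MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the cost in USD of one request at the quoted list prices."""
    price = PRICES[model]
    total = (
        Decimal(input_tokens) * price["input"]
        + Decimal(output_tokens) * price["output"]
    )
    return total / MILLION


if __name__ == "__main__":
    # Hypothetical request: 200K input tokens, 10K output tokens.
    for name in PRICES:
        print(name, request_cost(name, 200_000, 10_000))
```

Under these assumptions, the same request is twelve times more expensive on the Pro tier, since both the input and output prices differ by a factor of 12.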

A damning new study could put AI companies on the defensive.
In it, Stanford and Yale researchers found compelling evidence that AI models are actually copying all that data, not “learning” from it.
Specifically, four prominent LLMs (OpenAI’s GPT-4.1, Google’s Gemini 2.5 Pro, xAI’s Grok 3, and Anthropic’s Claude 3.7 Sonnet) happily reproduced lengthy excerpts from popular, and protected, works with a stunning degree of accuracy.
They fou…

@Techmeme@techhub.social
2026-03-10 21:36:00

Internal doc: the State Department moved its internal chatbot from Claude Sonnet 4.5 to GPT-4.1, following Trump's directive to cancel Anthropic contracts (Nextgov/FCW)
nextgov.com/acquisition/2026/0

@timfoster@mastodon.social
2026-02-19 17:54:18

"Co-authored-by: Cursor (gpt-5.3-codex-xhigh)"
Please fuck off immediately.

@iam_jfnklstrm@social.linux.pizza
2026-02-18 15:47:04

Sorry for an AI image, but I'm rather pleased that it's possible to jailbreak GPT and get it to create images that one isn't really allowed to create. #EbbaBush #UlfKristersson #JimmieÅkesson

An AI-generated image of Ulf Kristersson, Ebba Bush and Jimmie Åkesson sitting cheek to cheek with one another and smiling lovingly. On the table in front of them are a bundle of banknotes and a pair of handcuffs.

@Techmeme@techhub.social
2026-02-12 18:15:43

GPT-5.3-Codex-Spark is OpenAI's first AI model to run on chips from Nvidia rival Cerebras; OpenAI says Codex has more than 1M weekly active users (Rachel Metz/Bloomberg)
bloomberg.com/news/articles/20

@almad@fosstodon.org
2026-02-22 19:11:34

Ah yes, #LLM for exploit development.
In other words, we’ll now spend billions on offense & prevention to achieve new equilibrium (that we already sort of had).
Good job, us. #infosec

@blakes7bot@mas.torpidity.net
2026-02-28 17:43:32

Series D, Episode 10 - Gold
VILA: Eight?
AVON: Nine.
VILA: Try ten. [Avon looks at Dayna, who shrugs slightly. Avon smiles slightly and snorts. Dayna smiles slightly at him.]
AVON: All right, Keiller. Time to make contact.
blake.torpidity.net/m/410/357 B7B7

GPT 4.1 Mini describes the image as: "This image is a scene from the British science fiction television series "Blake's 7," which aired in the late 1970s and early 1980s. The setting appears to be the interior of the spaceship Scorpio, featuring a futuristic, utilitarian design with metallic and transparent elements. The characters are wearing distinctive costumes reflecting their roles within the story—ranging from military-style uniforms to more casual futuristic attire.

The actors present i…

@digitalnaiv@mastodon.social
2026-01-27 15:00:49

That AI models can hold their own against the “average person” in creativity tests is no longer sci-fi but a research finding: GPT-4 and co. scored above the participants on average. But the truly original minds remain human, stresses Manon Bischoff (Spektrum) #KI #Kreativität

@Techmeme@techhub.social
2026-01-30 00:20:50

OpenAI plans to retire several models from ChatGPT on February 13, including GPT‑4o, GPT‑4.1, and o4-mini, saying only 0.1% of users still choose GPT-4o (Ashley Capoot/CNBC)
cnbc.com/2026/01/29/openai-wil

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 10:12:07

Compressing Transformer Language Models via Matrix Product Operator Decomposition: A Case Study on PicoGPT
Younes Javanmard, Tanmoy Pandit, Masoud Mardani
arxiv.org/abs/2603.28534 arxiv.org/pdf/2603.28534 arxiv.org/html/2603.28534
arXiv:2603.28534v1 Announce Type: new
Abstract: Transformer-based language models achieve strong performance across NLP tasks, but their quadratic parameter scaling with hidden dimension makes deployment on resource-constrained hardware expensive. We study Matrix Product Operator (MPO) decomposition as a principled compression method for transformers. MPO factorises weight matrices into chains of low-rank cores, with approximation quality controlled by the bond dimension chi. We replace every nn.Linear layer in PicoGPT, a GPT-2-style character-level language model with about 1M parameters, with an MPOLinear module parameterised as an MPO chain. Cores are initialised either by TT-SVD from pretrained dense weights or from random initialisation, and trained using standard PyTorch autograd without a custom backward pass. We derive balanced factorisation schemes for the five distinct weight shapes in PicoGPT and evaluate bond dimensions chi in {4, 8, 16, 32} on Tiny Shakespeare. MPO compression achieves up to 13x compression per transformer block at chi = 4. At chi = 16, the model uses 191,872 parameters instead of 1,020,224 while retaining 97.7% of baseline token accuracy (51.6% vs 52.8%). Reconstruction error follows the expected trend and is lower for three-site than two-site factorisations at the same bond dimension. The chi = 8 model gives the best accuracy per parameter, exceeding the dense baseline by 2.7x on this metric. These results show that MPO parameterisation is a practical and theoretically grounded alternative to low-rank methods and unstructured pruning for transformer compression.
toXiv_bot_toot
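
The headline figures in the abstract above can be checked with a few lines of arithmetic. The sketch below recomputes the whole-model compression ratio and the retained accuracy at chi = 16 from the quoted numbers, and adds a generic parameter count for a two-core factorisation. The helper names and the core shapes `(m1, n1, chi)` and `(chi, m2, n2)` are assumptions made for illustration; the paper's actual factorisation schemes are not given in the abstract.

```python
def compression_ratio(dense_params: int, compressed_params: int) -> float:
    """How many times fewer parameters the compressed model uses."""
    return dense_params / compressed_params


def retained_fraction(compressed_acc: float, baseline_acc: float) -> float:
    """Fraction of the baseline accuracy kept after compression."""
    return compressed_acc / baseline_acc


def two_core_params(m1: int, m2: int, n1: int, n2: int, chi: int) -> int:
    """Parameter count of a generic two-core chain for an (m1*m2) x (n1*n2) matrix.

    Assumes cores of shape (m1, n1, chi) and (chi, m2, n2); this is an
    illustrative layout, not necessarily the one used in the paper.
    """
    return m1 * n1 * chi + chi * m2 * n2


if __name__ == "__main__":
    # Figures quoted in the abstract for chi = 16.
    print(f"whole-model ratio: {compression_ratio(1_020_224, 191_872):.2f}x")
    print(f"retained accuracy: {100 * retained_fraction(51.6, 52.8):.1f}%")

    # Hypothetical 256 x 256 layer split as (16*16) x (16*16) with chi = 4.
    dense = 256 * 256
    print(f"two-core params: {two_core_params(16, 16, 16, 16, 4)} vs dense {dense}")
```

The whole-model ratio at chi = 16 comes out near 5.3x, which is a different quantity from the "up to 13x" per-block figure the abstract reports at chi = 4.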

@Techmeme@techhub.social
2026-02-11 07:41:42

GPT-5.3-Codex and Claude Opus 4.6 can handle the full app development lifecycle on their own, a sign of what's coming for most knowledge work within five years (Matt Shumer)
shumer.dev/something-big-is-ha

@Techmeme@techhub.social
2026-03-17 17:16:30

OpenAI launches GPT-5.4 mini and nano, aimed at agents, coding, and multi-modal workflows, and offering near GPT-5.4-level performance at a much lower cost (David Gewirtz/ZDNET)
zdnet.com/article/gpt-5-4-mini

@blakes7bot@mas.torpidity.net
2026-02-27 07:25:30

#Blakes7 Series C, Episode 04 - Dawn of the Gods
THAARN: It's too dangerous. Cally.
CALLY: Then I will never be able to trust you. [draws handgun]
THAARN: Very well. You see, your feelings are no different. [Cally shoots randomly around her]

GPT 4.1 Mini describes the image as: "The image shows a woman with short, curly brown hair, wearing a light gray cardigan with black patterns over a black top. She is seated or standing against a dark background featuring some artistic, indistinct floral or abstract designs, giving the scene a somewhat introspective or somber mood. The lighting is soft but focused on her face, highlighting her expression, which appears thoughtful or contemplative, and directed slightly upwards or away from the …

@Techmeme@techhub.social
2026-03-03 18:10:58

OpenAI releases GPT-5.3 Instant, which it says delivers more accurate answers and better-contextualized results when searching the web, for all ChatGPT users (OpenAI)
openai.com/index/gpt-5-3-insta

@adamhotep@infosec.exchange
2026-03-23 16:14:37

RE: bloor.tw/@bloor/11625689660739
HDMI is "hide me"
TLS is "tills"
TTL is "tattle"
SPF is "spiff"
GPT is "jippity"

@Techmeme@techhub.social
2026-02-05 18:12:25

OpenAI says GPT-5.3-Codex goes beyond an agent that can code "to an agent that can do nearly anything developers and professionals can do on a computer" (OpenAI)
openai.com/index/introducing-g

@arXiv_csGR_bot@mastoxiv.page
2026-01-21 08:02:08

Proc3D: Procedural 3D Generation and Parametric Editing of 3D Shapes with Large Language Models
Fadlullah Raji, Stefano Petrangeli, Matheus Gadelha, Yu Shen, Uttaran Bhattacharya, Gang Wu
arxiv.org/abs/2601.12234 arxiv.org/pdf/2601.12234 arxiv.org/html/2601.12234
arXiv:2601.12234v1 Announce Type: new
Abstract: Generating 3D models has traditionally been a complex task requiring specialized expertise. While recent advances in generative AI have sought to automate this process, existing methods produce non-editable representation, such as meshes or point clouds, limiting their adaptability for iterative design. In this paper, we introduce Proc3D, a system designed to generate editable 3D models while enabling real-time modifications. At its core, Proc3D introduces procedural compact graph (PCG), a graph representation of 3D models, that encodes the algorithmic rules and structures necessary for generating the model. This representation exposes key parameters, allowing intuitive manual adjustments via sliders and checkboxes, as well as real-time, automated modifications through natural language prompts using Large Language Models (LLMs). We demonstrate Proc3D's capabilities using two generative approaches: GPT-4o with in-context learning (ICL) and a fine-tuned LLAMA-3 model. Experimental results show that Proc3D outperforms existing methods in editing efficiency, achieving more than 400x speedup over conventional approaches that require full regeneration for each modification. Additionally, Proc3D improves ULIP scores by 28%, a metric that evaluates the alignment between generated 3D models and text prompts. By enabling text-aligned 3D model generation along with precise, real-time parametric edits, Proc3D facilitates highly accurate text-based image editing applications.
toXiv_bot_toot

@heiseonline@social.heise.de
2026-02-26 08:08:51

What happens when AI models such as GPT-5.2, Claude Sonnet 4, or Gemini 3 Flash are used as crisis advisers? Researchers at King's College London tested exactly that in conflict simulations, with alarming results. 😰
To the article: heis…

The image shows a person in military uniform wearing VR goggles. Text in the image reads: "AI models almost always resort to nuclear weapons in war simulations"; below it: "Researchers warn: as military decision-makers, current systems would be extremely dangerous."

@buercher@tooting.ch
2026-03-03 18:00:48

Advanced AI models appear willing to deploy nuclear weapons without the same reservations humans have when put into simulated geopolitical crises.
Kenneth Payne at King’s College London set three leading large language models – GPT-5.2, Claude Sonnet 4 and Gemini 3 Flash – against each other in simulated war games. The scenarios involved intense international standoffs, including border disputes, competition for scarce resources and existential threats to regime survival
newscientist.com/article/25168

@Techmeme@techhub.social
2026-03-02 20:41:00

Alibaba releases the open-weight Qwen3.5 Small Model Series in 0.8B, 2B, 4B, and 9B sizes, claiming the 9B model rivals OpenAI's gpt-oss-120b on some benchmarks (Carl Franzen/VentureBeat)
venturebeat.com/technology/ali

@blakes7bot@mas.torpidity.net
2026-02-20 07:16:50

#Blakes7 Series C, Episode 05 - The Harvest of Kairos
JARVIK: And so are you. But when was the last time you felt the warmth of the Earth's sun on your naked back? Or lifted your face to the heavens, and laughed with the joy of being alive? How long since you wept at the death of a friend? [Pause.] Doesn't mean a thing to you, does it, Madam President? You've surrounded your…

GPT 4.1 Nano describes the image as: "This image depicts a scene set in what appears to be a futuristic or sci-fi environment, possibly a spaceship or advanced facility, indicated by the minimalistic, metallic decor and lighting. Two characters are engaged in a tense or serious interaction: a man on the left wearing a space or sci-fi styled jacket with padded sections and a neutral expression, and a woman on the right with short dark hair, dramatic makeup, and a distinctive outfit featuring a w…

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 11:12:48

Replaced article(s) found for cs.CL. arxiv.org/list/cs.CL/new
[2/5]:
- POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
Li, Cui, Wang, Ge, Huang, Li, Peng, Lu, Tashi, Wang, Dang
arxiv.org/abs/2511.09232 mastoxiv.page/@arXiv_csCL_bot/
- Beyond Elicitation: Provision-based Prompt Optimization for Knowledge-Intensive Tasks
Yunzhe Xu, Zhuosheng Zhang, Zhe Liu
arxiv.org/abs/2511.10465 mastoxiv.page/@arXiv_csCL_bot/
- π-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
Dong Liu, Yanxuan Yu
arxiv.org/abs/2511.10696 mastoxiv.page/@arXiv_csCL_bot/
- Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performanc...
Zijin Su, Huanzhu Lyu, Yuren Niu, Yiming Liu
arxiv.org/abs/2511.14073 mastoxiv.page/@arXiv_csCL_bot/
- HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning
Alexis Correa-Guillén, Carlos Gómez-Rodríguez, David Vilares
arxiv.org/abs/2511.15355 mastoxiv.page/@arXiv_csCL_bot/
- Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search
Dong Liu, Yanxuan Yu
arxiv.org/abs/2511.16681 mastoxiv.page/@arXiv_csCL_bot/
- Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Transla...
Marii Ojastu, Hele-Andra Kuulmets, Aleksei Dorkin, Marika Borovikova, Dage Särg, Kairit Sirts
arxiv.org/abs/2511.17290 mastoxiv.page/@arXiv_csCL_bot/
- A Systematic Study of In-the-Wild Model Merging for Large Language Models
Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata
arxiv.org/abs/2511.21437 mastoxiv.page/@arXiv_csCL_bot/
- CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer
Lavish Bansal, Naman Mishra
arxiv.org/abs/2512.02711 mastoxiv.page/@arXiv_csCL_bot/
- Multilingual Medical Reasoning for Question Answering with Large Language Models
Pietro Ferrazzi, Aitor Soroa, Rodrigo Agerri
arxiv.org/abs/2512.05658 mastoxiv.page/@arXiv_csCL_bot/
- OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Convers...
Albrecht, Lehmann, Poltermann, Rudolph, Steigerwald, Stieler
arxiv.org/abs/2512.09804 mastoxiv.page/@arXiv_csCL_bot/
- Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, an...
Hanyu Cai, Binqi Shen, Lier Jin, Lan Hu, Xiaojing Fan
arxiv.org/abs/2512.12812 mastoxiv.page/@arXiv_csCL_bot/
- Beg to Differ: Understanding Reasoning-Answer Misalignment Across Languages
Ovalle, Ross, Ruder, Williams, Ullrich, Ibrahim, Sagun
arxiv.org/abs/2512.22712 mastoxiv.page/@arXiv_csCL_bot/
- Activation Steering for Masked Diffusion Language Models
Adi Shnaidman, Erin Feiglin, Osher Yaari, Efrat Mentel, Amit Levi, Raz Lapid
arxiv.org/abs/2512.24143 mastoxiv.page/@arXiv_csCL_bot/
- JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese L...
Liu, Li, Niu, Zhang, Xun, Hou, Wang, Iwasawa, Matsuo, Hatakeyama-Sato
arxiv.org/abs/2601.01627 mastoxiv.page/@arXiv_csCL_bot/
- FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG
Dassen, Kotula, Murray, Yates, Lawrie, Kayi, Mayfield, Duh
arxiv.org/abs/2601.05866 mastoxiv.page/@arXiv_csCL_bot/
- †DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems
Zabir Al Nazi, Shubhashis Roy Dipta, Sudipta Kar
arxiv.org/abs/2601.06853 mastoxiv.page/@arXiv_csCL_bot/
- Symphonym: Universal Phonetic Embeddings for Cross-Script Name Matching
Stephen Gadd
arxiv.org/abs/2601.06932 mastoxiv.page/@arXiv_csCL_bot/
- LLMs versus the Halting Problem: Revisiting Program Termination Prediction
Sultan, Armengol-Estape, Kesseli, Vanegue, Shahaf, Adi, O'Hearn
arxiv.org/abs/2601.18987 mastoxiv.page/@arXiv_csCL_bot/
- MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues
Diandian Guo, Fangfang Yuan, Cong Cao, Xixun Lin, Chuan Zhou, Hao Peng, Yanan Cao, Yanbing Liu
arxiv.org/abs/2601.20451 mastoxiv.page/@arXiv_csCL_bot/
toXiv_bot_toot

@Techmeme@techhub.social
2026-04-07 22:01:26

Z.ai releases GLM-5.1, a 754B-parameter model that it says outperforms GPT-5.4 and Claude Opus 4.6 on SWE-bench Pro, available under an MIT license (Carl Franzen/VentureBeat)
venturebeat.com/technology/ai-

@Techmeme@techhub.social
2026-04-04 13:55:53

Q&A with Simon Willison on the November release of GPT-5.1 and Opus 4.5 as the inflection point for coding, exhaustion due to managing coding agents, and more (Lenny Rachitsky/Lenny's Newsletter)
lennysnewsletter.com/p/an-ai-s

@freeminded@tooting.ch
2026-02-04 07:46:51

#liip is launching an #AI bot for the #Steuererklärung (tax return). I'm curious to see how it goes. Specialized #GPT models are, in my view, the sensible future of AI applications.
liip.ch/en/blog/liipgpt-helps-
@…

@Techmeme@techhub.social
2026-02-02 18:08:02

OpenAI launches a Codex app for macOS, designed to serve as a command center for managing AI agents, and says Codex usage has nearly doubled since mid-December (David Gewirtz/ZDNET)
zdnet.com/article/openai-codex

@Techmeme@techhub.social
2026-03-20 09:05:50

Xiaomi releases MiMo-V2-Pro, its new 1T-parameter foundation model, codenamed Hunter Alpha, which the company says benchmarks close to GPT-5.2 and Opus 4.6 (Carl Franzen/VentureBeat)
venturebeat.com/technology/xia

@Techmeme@techhub.social
2026-02-25 12:56:17

A study finds GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash deployed tactical nuclear weapons in 95% of 21 simulated war game scenarios, and never surrendered (Chris Stokel-Walker/New Scientist)
newscientist.com/article/25168

@Techmeme@techhub.social
2026-01-27 18:31:03

OpenAI for Science launches Prism, a free LaTeX-based text editor that embeds GPT-5.2 to assist in scientific paper drafting and citation management (Will Douglas Heaven/MIT Technology Review)
technologyreview.com/2026/01/2

@Techmeme@techhub.social
2026-01-25 01:50:58

Tests show GPT-5.2 on ChatGPT citing Grokipedia as a source on a wide range of queries, including on Iranian conglomerates and Holocaust deniers (Aisha Down/The Guardian)
theguardian.com/technology/202

@Techmeme@techhub.social
2026-01-26 17:50:42

Qwen releases Qwen3-Max-Thinking, its flagship reasoning model that it says demonstrates performance comparable to models such as GPT-5.2 Thinking and Opus 4.5 (Qwen)
qwen.ai/blog?id=qwen3-max-thin

@Techmeme@techhub.social
2026-03-24 19:16:10

OpenAI releases a set of prompts designed to be used with its open-weight safety model gpt-oss-safeguard that lets developers make their apps safer for teens (Amanda Silberling/TechCrunch)
techcrunch.com/2026/03/24/open

@Techmeme@techhub.social
2026-01-26 12:26:43

Interviews with 100 therapists and psychiatrists on clients' AI chatbot usage show, while there are some upsides, conversations also deepened negative feelings (New York Times)
nytimes.com/2026/01/26/us/chat

@Techmeme@techhub.social
2026-02-16 11:01:13

OpenAI retired GPT-4o on February 13, angering loyal users, 20K of whom signed a petition; 4o showed the unique attachment that people can form with chatbots (Shirin Ghaffary/Bloomberg)
bloomberg.com/news/newsletters

@Techmeme@techhub.social
2026-03-18 10:25:59

How AI's post-training process suppresses the creativity and whimsicality seen in earlier models like GPT-2, leading to bad writing from many top AI models (Jasmine Sun/The Atlantic)