2025-12-12 04:09:00
GPT-5.2: OpenAI's new AI model is meant to better support office work
Just one month after GPT-5.1, the ChatGPT developers are releasing a new AI model. GPT-5.2 is said to produce better spreadsheets, presentations, and code.
OpenAI says GPT-5.2 Thinking hallucinates less than GPT-5.1 and has improved reliability for agentic AI needs; pre-release testers include Notion, Box, Shopify (Hayden Field/The Verge)
https://www.theverge.com/ai-artificial-intelligence/842529/open…
GPT-5.2 models match GPT-5 and 5.1 with a 400K context window and 128K max output tokens, but have a newer knowledge cutoff of Aug. 31, 2025 vs. Sept. 30, 2024 (Simon Willison/Simon Willison's Newsletter)
https://simonw.substack.com/p/gpt-52-and-useful-patterns-for-…
Single layer tiny Co$^4$ outpaces GPT-2 and GPT-BERT
Noor Ul Zain, Mohsin Raza, Ahsan Adeel
https://arxiv.org/abs/2510.08404 https://arxiv.org/pdf/2510.084…
OpenAI launches GPT-5.2, its "best model yet," in Instant, Thinking, and Pro variants, with significant improvements in writing, coding, and reasoning (Maxwell Zeff/Wired)
https://www.wired.com/story/openai-gpt-launch-gemini-code-red/
Friday: Criticism of eID card over money laundering, new OpenAI model as office assistant
eID card too easy to obtain fraudulently; GPT-5.2 for professional users; Disney versus Google AI over copyright; criticism of the EU over VMware; robot movements explained
Weight Initialization and Variance Dynamics in Deep Neural Networks and Large Language Models
Yankun Han
https://arxiv.org/abs/2510.09423 https://arxiv.org…
XplaiNLP at CheckThat! 2025: Multilingual Subjectivity Detection with Finetuned Transformers and Prompt-Based Inference with Large Language Models
Ariana Sahitaj, Jiaao Li, Pia Wenzel Neves, Fedor Splitt, Premtim Sahitaj, Charlott Jakob, Veronika Solopova, Vera Schmitt
https://arxiv.org/abs/2509.12130
[Thread] GPT-5.2 is now available in the API, priced at $1.75/1M input and $14/1M output tokens; GPT-5.2 Pro is priced at $21/1M input and $168/1M output tokens (@openaidevs)
https://x.com/openaidevs/status/1999184802755354954
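To make the per-token prices quoted above concrete, here is a minimal sketch that turns them into a per-request cost. The model labels in `PRICES` and the `request_cost` helper are hypothetical names chosen for illustration, not confirmed API identifiers. The token counts in the usage note are made up, and the sketch ignores any discounts (such as cached input) that may apply in practice.

```python
from decimal import Decimal

# USD per 1M tokens (input, output), taken from the prices quoted in the post.
# The dictionary keys are illustrative labels, not confirmed API model names.
PRICES = {
    "gpt-5.2": (Decimal("1.75"), Decimal("14")),
    "gpt-5.2-pro": (Decimal("21"), Decimal("168")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the cost in USD of one request at the quoted list prices."""
    input_price, output_price = PRICES[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_UNIT
```

For a hypothetical request with 100,000 input tokens and 10,000 output tokens, `request_cost("gpt-5.2", 100_000, 10_000)` comes to $0.315, and the same request at the Pro prices comes to $3.78, which is 12 times as much.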
CapGeo: A Caption-Assisted Approach to Geometric Reasoning
Yuying Li, Siyi Qian, Hao Liang, Leqi Zheng, Ruichuan An, Yongzhen Guo, Wentao Zhang
https://arxiv.org/abs/2510.09302 …
Growing Perspectives: Modelling Embodied Perspective Taking and Inner Narrative Development Using Large Language Models
Sabrina Patania, Luca Annese, Anna Lambiase, Anita Pellegrini, Tom Foulsham, Azzurra Ruggeri, Silvia Rossi, Silvia Serino, Dimitri Ognibene
https://arxiv.org/abs/2509.11868
Baidu unveils Ernie 5.0, an AI model to process and generate text, images, audio, and video, claiming it beats GPT-5-High and Gemini 2.5 Pro on some benchmarks (Carl Franzen/VentureBeat)
https://venturebeat.com/ai/baidu-unveils-proprietary-ern…
Anthropic open sources a method to score AI model political evenhandedness; Gemini 2.5 Pro got 97%, Grok 4 96%, Claude Opus 4.1 95%, GPT-5 89%, and Llama 4 66% (Ina Fried/Axios)
https://www.axios.com/2025/11/13/anthropic-bot-bias-data
OpenAI says GPT‑5.2 Thinking beats or ties industry professionals on 70.9% of GDPval knowledge work tasks, delivering outputs at >11x the speed and <1% the cost (OpenAI)
https://openai.com/index/introducing-gpt-5-2
Curious that whenever someone shows me “the cool #AI flow” they built that’s supposed to be impressive, the conversation goes the same way:
Stage 1: “But you don’t understand. You don’t like AI because you haven’t used it right. Let me show you how much you can do with it.”
Stage 2: “Here are the steps in the flow and the instructions I feed to this agent / custom GPT / Claude project. I tell it to do X, reference document Y, and aim for Z.”
Stage 3: “Now, let me show you the results it gives.”
*Writes task, presses to run the prompt.*
Stage 4: “Umm sorry it’s taking a while. It’s fast but not instant. And by the way, the prompt isn’t perfect, you can definitely make it better. I just threw this together real quick the other day. It makes some mistakes, but it’s really good.”
Stage 5: “Uuuuuuh actually don’t look at the output.” *scrolls or stops screen share or pulls device away.*
“You know it’s already doing so well, if I do more prompt engineering it will get really good but I need to give it better instructions. And it ran just fine last night, I don’t know what’s up with it. And this is a cheap model, if we use another model it will be better.”
Stage 6: “You know, you really shouldn’t judge this so much. The technology will improve, it will get there sooner than you know and then you’ll regret not trying it sooner.”
So curious that this keeps happening 🤷♀️
#LLMs #work #tech #AIBubble
OpenAI releases an invitation-only Sora app on iOS, powered by Sora 2, to let people create and share AI-generated videos of themselves and their friends (Ina Fried/Axios)
https://www.axios.com/2025/09/30/openai-sora-app-social-ai
MERCER ISLAND SCHOOL BOARD
Wow, you really do have to watch the downballot races. Mercer Island School Board has two (2) candidates (O'Callahan and Gaspar) that are *both* software CTOs touting their "AI" credentials. Gaspar explicitly wants "free AI classes".
Here's a hint: The only "AI classes" that kids need are ones that teach them how to TURN ALL OF THAT SHIT OFF, and learn to think and write in their own words, not Chat-GPT'…
🧾 Multi-Modal Vision vs. Text-Based Parsing: Benchmarking LLM Strategies for Invoice Processing
#software
Companies are updating insider trading policies to cover prediction markets; Kalshi and others are pushing for federal oversight, including of insider trading (Rocket Drew/The Information)
https://www.theinformation.com/articles/polymark…
Resource-Efficient Fine-Tuning of LLaMA-3.2-3B for Medical Chain-of-Thought Reasoning
Imran Mansha
https://arxiv.org/abs/2510.05003 https://arxiv.org/pdf/2…
A Deep Learning Pipeline for Epilepsy Genomic Analysis Using GPT-2 XL and NVIDIA H100
Muhammad Omer Latif, Hayat Ullah, Muhammad Ali Shafique, Zhihua Dong
https://arxiv.org/abs/2510.00392
OpenAI announces API updates, including GPT-5 Pro, Sora 2 in preview, and gpt-realtime-mini, a voice model that is 70% cheaper than gpt-realtime (Rebecca Bellan/TechCrunch)
https://techcrunch.com/2025/10/06/openai-ramps-up-developer-push…
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[2/6]:
- Understanding AI Evaluation Patterns: How Different GPT Models Assess Vision-Language Descriptions
Sajjad Abdoli, Rudi Cilibrasi, Rima Al-Shikh
Evaluating LLM-Generated Legal Explanations for Regulatory Compliance in Social Media Influencer Marketing
Haoyang Gui, Thales Bertaglia, Taylor Annabell, Catalina Goanta, Tjomme Dooper, Gerasimos Spanakis
https://arxiv.org/abs/2510.08111
OpenAI releases gpt-oss-safeguard, its open-weight reasoning models for safety classification tasks, available in 120B and 20B parameters, under Apache 2.0 (OpenAI)
https://openai.com/index/introducing-gpt-oss-safeguard/
Alibaba Technical Report: Qwen3-VL beats GPT-5 and Gemini 2.5 Pro on visual tasks and has 100% accuracy on "needle-in-a-haystack" tests for 30-minute videos (Jonathan Kemper/The Decoder)
https://the-decoder.com/qwen3-vl-can-scan-two-hour-…
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
Haoyang Zheng, Xinyang Liu, Cindy Xiangrui Kong, Nan Jiang, Zheyuan Hu, Weijian Luo, Wei Deng, Guang Lin
https://arxiv.org/abs/2509.25035
Gemini 3 Pro is priced at $2-$4 per 1M input tokens and $12-$18 per 1M output tokens, cheaper than Claude Sonnet 4.5 but more expensive than GPT-5.1 (Simon Willison/Simon Willison's Weblog)
https://simonwillison.net/2025/Nov/18/gemini-3/
Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs
Mariam Mahran, Katharina Simbeck
https://arxiv.org/abs/2509.17701 htt…
A Comparative Evaluation of Large Language Models for Persian Sentiment Analysis and Emotion Detection in Social Media Texts
Kian Tohidi, Kia Dashtipour, Simone Rebora, Sevda Pourfaramarz
https://arxiv.org/abs/2509.14922