
2025-09-11 10:31:00
Interesting, a lab that wants to build open source (!) attracts a lot of funding 🤔
Reflection AI raises $2B to be America's open frontier AI lab, challenging DeepSeek | TechCrunch https://techcrunch.com/2025/10/09/reflection-…
Malaysian startup Zetrix unveils NurAI, a chatbot for Muslims built using similar techniques to DeepSeek's V3, and plans AI avatars of Islamic scholars (Saritha Rai/Bloomberg)
https://www.bloomberg.com/news/articles/2025-0…
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Jeffrey Amico, Gabriel Passamani Andrade, John Donaghy, Ben Fielding, Tristin Forbus, Harry Grieve, Semih Kara, Jari Kolehmainen, Yihua Lou, Christopher Nies, Edward Phillip Flores Nuño, Diogo Ortega, Shikhar Rastogi, Austin Virts, Matthew J. Wright
https://
Reflection AI, which is developing an open-source AI model to compete with DeepSeek, raised $2B led by Nvidia, valuing it at $8B, up from $545M in March (Michael J. de la Merced/New York Times)
https://www.nytimes.com/2025/10/09/business/dealbook/ref…
Agentic LLMs for Question Answering over Tabular Data
Rishit Tyagi, Mohit Gupta, Rahul Bouri
https://arxiv.org/abs/2509.09234 https://arxiv.org/pdf/2509.09…
Evaluating the Clinical Safety of LLMs in Response to High-Risk Mental Health Disclosures
Siddharth Shah, Amit Gupta, Aarav Mann, Alexandre Vaz, Benjamin E. Caldwell, Robert Scholz, Peter Awad, Rocky Allemandi, Doug Faust, Harshita Banka, Tony Rousmaniere
https://arxiv.org/abs/2509.08839
DeepSeek cuts API prices by 50 percent and unveils V3.2-Exp
The Chinese startup DeepSeek has unveiled its experimental AI model V3.2-Exp and cut its API prices by more than 50 percent.
ht…
SemiAnalysis launches InferenceMAX, an open-source benchmark that automatically tracks LLM inference performance across AI models and frameworks every night (Kimbo Chen/SemiAnalysis)
https://newsletter.semianalysis.com/p/inferencemax-open-source-inference
Base Models Know How to Reason, Thinking Models Learn When
Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda
https://arxiv.org/abs/2510.07364 https…
NOWJ@COLIEE 2025: A Multi-stage Framework Integrating Embedding Models and Large Language Models for Legal Retrieval and Entailment
Hoang-Trung Nguyen, Tan-Minh Nguyen, Xuan-Bach Le, Tuan-Kiet Le, Khanh-Huyen Nguyen, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong, Le-Minh Nguyen
https://arxiv.org/abs/2509.08025
An Iterative LLM Framework for SIBT utilizing RAG-based Adaptive Weight Optimization
Zhuo Xiao (Image Processing Center, Beihang University, Beijing, China), Qinglong Yao (Image Processing Center, Beihang University, Beijing, China), Jingjing Wang (Image Processing Center, Beihang University, Beijing, China), Fugen Zhou (Image Processing Center, Beihang University, Beijing, China), Bo Liu (Image Processing Center, Beihang University, Beijing, China), Haitao Sun (Department of Radiation…
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
Yi Lu, Jianing Wang, Linsen Guo, Wei He, Hongyin Tang, Tao Gui, Xuanjing Huang, Xuezhi Cao, Wei Wang, Xunliang Cai
https://arxiv.org/abs/2510.08189
K2-Think: A Parameter-Efficient Reasoning System
Zhoujun Cheng, Richard Fan, Shibo Hao, Taylor W. Killian, Haonan Li, Suqi Sun, Hector Ren, Alexander Moreno, Daqian Zhang, Tianjun Zhong, Yuxin Xiong, Yuanzhe Hu, Yutao Xie, Xudong Han, Yuqi Wang, Varad Pimpalkhute, Yonghao Zhuang, Aaryamonvikram Singh, Xuezhi Liang, Anze Xie, Jianshu She, Desai Fan, Chengqian Gao, Liqun Ma, Mikhail Yurochkin, John Maggs, Xuezhe Ma, Guowei He, Zhiting Hu, Zhengzhong Liu, Eric P. Xing
The UAE's Institute of Foundation Models open sources its K2 Think model, trained on only ~2,000 AI chips and designed for math, coding, and science research (Cade Metz/New York Times)
https://www.nytimes.com/2025/09/09/technology/uae-emirates-ai-open-sourc…
Transforming Fashion with AI: A Comparative Study of Large Language Models in Apparel Design
Nusrat Jahan Lamia, Sadia Afrin Mim
https://arxiv.org/abs/2509.04705 https://…
Current pricing remains in effect until the effective date
📈 Service Resources
Service resources have been scaled up to better meet API demand. Users are encouraged to make use of the enhanced service.
📚 Documentation
For more details, visit DeepSeek API Docs:
https://api-docs.deepseek.com
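The docs above describe an OpenAI-compatible interface; a minimal sketch of a call against it is shown below. The base URL and model name are assumptions taken from the public docs at api-docs.deepseek.com and should be checked there before use.

from openai import OpenAI

# Placeholder key; the base_url is the assumed OpenAI-compatible endpoint (verify in the docs).
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-chat",  # assumed chat model id; "deepseek-reasoner" is the documented thinking-mode id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)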
Analyzing Prominent LLMs: An Empirical Study of Performance and Complexity in Solving LeetCode Problems
Everton Guimaraes, Nathalia Nascimento, Chandan Shivalingaiah, Asish Nelapati
https://arxiv.org/abs/2508.03931
Sources: DeepSeek plans to use Huawei's Ascend AI chips to train smaller versions of its upcoming R2 models but will still use Nvidia chips for largest models (The Information)
https://www.theinformation.com/articles/deepseek-opts-huawei-chips-train-models…
KI-Update kompakt: ShadowLeak, Nvidia & OpenAI, Siemens, DeepSeek
Das "KI-Update" liefert drei mal pro Woche eine Zusammenfassung der wichtigsten KI-Entwicklungen.
https://www.
It's not that I'm against this, which I'm not, quite the contrary.
But honestly I don't see how the applicability of these things can be guaranteed.
It's probably just a lack of imagination on my part.
https://jornaleconomico.sapo.p…
DeepSeek performs better than other Large Language Models in Dental Cases
Hexian Zhang, Xinyu Yan, Yanqi Yang, Lijian Jin, Ping Yang, Junwen Wang
https://arxiv.org/abs/2509.02036
Don't miss today's Metacurity for the most critical infosec developments you should know, including
--A bona fide self-replicating worm has infected 187 npm packages,
--BreachForums founder hit with new three-year sentence,
--Coinbase breach suspect accused of participating in $500k bribery scheme,
--DHS intelligence arm exposed sensitive database,
--MSFT seized 338 sites linked to Raccoon0365 stealer,
--DeepSeek is biased against Falun Gong and oth…
DeepSeek releases DeepSeek-V3.2-Exp, saying it built the model using a new technique called DeepSeek Sparse Attention, and halves the pricing of its tools (Saritha Rai/Bloomberg)
https://www.bloomberg.com/news/articles/2025-09-2…
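The announcement names DeepSeek Sparse Attention (DSA) without describing it here. As a rough illustration of sparse attention in general, not DSA's actual mechanism, the sketch below keeps only the top-k highest-scoring keys per query and masks out the rest; the function name and the top-k selection rule are illustrative assumptions.

import torch

def topk_sparse_attention(q, k, v, keep=64):
    # q: (heads, q_len, d); k, v: (heads, kv_len, d)
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5      # full attention scores
    keep = min(keep, scores.shape[-1])
    top = scores.topk(keep, dim=-1)                            # select the `keep` best keys per query
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, top.indices, top.values)               # everything else stays at -inf
    weights = masked.softmax(dim=-1)                           # non-selected keys get zero weight
    return weights @ v

q, k, v = torch.randn(8, 16, 64), torch.randn(8, 128, 64), torch.randn(8, 128, 64)
print(topk_sparse_attention(q, k, v).shape)                    # torch.Size([8, 16, 64])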
xDeepServe: Model-as-a-Service on Huawei CloudMatrix384
Ao Xiao, Bangzheng He, Baoquan Zhang, Baoxing Huai, Bingji Wang, Bo Wang, Bo Xu, Boyi Hou, Chan Yang, Changhong Liu, Cheng Cui, Chenyu Zhu, Cong Feng, Daohui Wang, Dayun Lin, Duo Zhao, Fengshao Zou, Fu Wang, Gangqiang Zhang, Gengyuan Dan, Guanjie Chen, Guodong Guan, Guodong Yang, Haifeng Li, Haipei Zhu, Hao Feng, Hao Huang, Hao Xu, Hengrui Ma, Hengtao Fan, Hui Liu, Jia Li, Jiang Liu, Jiang Xu, Jie Meng, Jinhan Xin, Junhao Hu, Juwe…
The geopolitical #aiarmsrace seems largely unimpressed by people proclaiming #LLMs have plateaued and #AGI is never coming.
Such assessments are only relevant for the market, but not so much for count…
Compare and buy tulips! One monthly subscription to get the very best tulip colours for all your tulip needs!
https://store.boingboing.net/sales/chatplayground-ai-basic-plan-lifetime-subscriptions
Z.ai, formerly known as Zhipu and which has raised $1.5B from Tencent and others, releases GLM-4.5, an open source AI model that's cheaper to use than DeepSeek (Evelyn Cheng/CNBC)
https://www.cnbc.com/2025/07/28/chinas-latest-ai-…
Large Language Models Versus Static Code Analysis Tools: A Systematic Benchmark for Vulnerability Detection
Damian Gnieciak, Tomasz Szandala
https://arxiv.org/abs/2508.04448 htt…
Overview of the Plagiarism Detection Task at PAN 2025
André Greiner-Petter, Maik Fröbe, Jan Philip Wahle, Terry Ruas, Bela Gipp, Akiko Aizawa, Martin Potthast
https://arxiv.org/abs/2510.06805
CoComposer: LLM Multi-agent Collaborative Music Composition
Peiwen Xing, Aske Plaat, Niki van Stein
https://arxiv.org/abs/2509.00132 https://arxiv.org/pdf/…
How much of my children's future is AI going to burn up? That depends on how much we feed the hype beast. *That* is why "don't use AI at all without mentioning the drawbacks & a very good reason" is my stance (and I'm an AI researcher, technically).
Local models that run on your laptop: acceptable if produced by ethical means (including data sourcing & compensation for data filtering) & training costs are mitigated. Are such models way worse than the huge datacenter-scale models? Yes, for now. Deal with it.
ChatGPT, Claude, Copilot, even DeepSeek: get out. You're feeding the beast that is consuming my kids' future. Heck, even talking up these models, or saying "everyone is using them so it's okay" or "they're not going away", is feeding the beast even if you don't touch them.
I wish it weren't like this, because the capabilities of the big models are cool even once you cut past the hype.
#AI
A reporter details how her mother, a kidney transplant patient who lives in China, bonded with DeepSeek's chatbot as her AI doctor, calling it "more humane" (Viola Zhou/Rest of World)
https://restofworld.org/2025/ai-chatbot-china-sick/
Challenges and Applications of Large Language Models: A Comparison of GPT and DeepSeek family of models
Shubham Sharma, Sneha Tuli, Narendra Badam
https://arxiv.org/abs/2508.21377
Evaluating Open-Source Large Language Models for Technical Telecom Question Answering
Arina Caraus, Alessio Buscemi, Sumit Kumar, Ion Turcanu
https://arxiv.org/abs/2509.21949 ht…
Can large language models assist choice modelling? Insights into prompting strategies and current models capabilities
Georges Sfeir, Gabriel Nova, Stephane Hess, Sander van Cranenburgh
https://arxiv.org/abs/2507.21790
How an open-source approach helped DeepSeek and other Chinese AI companies; Hugging Face: Alibaba's Qwen is now the world's largest open-source AI ecosystem (South China Morning Post)
https://www.scmp.com/tech/big-tech/article
Will Compute Bottlenecks Prevent an Intelligence Explosion?
Parker Whitfill, Cheryl Wu
https://arxiv.org/abs/2507.23181 https://arxiv.org/pdf/2507.23181
Very nice article about LLM architecture, a bit too complicated for me but probably not for others..
https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison
Effectiveness of Large Language Models in Simulating Regional Psychological Structures: An Empirical Examination of Personality and Subjective Well-being
Ke Luoma, Li Zengyi, Liao Jiangqun, Tong Song, Peng Kaiping
https://arxiv.org/abs/2509.25283
Diamonds in the rough: Transforming SPARCs of imagination into a game concept by leveraging medium sized LLMs
Julian Geheeb, Farhan Abid Ivan, Daniel Dyrda, Miriam Anschütz, Georg Groh
https://arxiv.org/abs/2509.24730
Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks
Rina Mishra, Gaurav Varshney
https://arxiv.org/abs/2507.12185
Despite Musk's claim Apple "makes it impossible" for non-OpenAI AI apps to top its App Store, DeepSeek was #1 in January and Perplexity was #1 in July in India (Henry Chandonnet/Business Insider)
https://www.businessinsider.com/elon-musk-
Chinese Livestreaming 'Virtual Human' Salespeople Are Outselling Their Human Counterparts https://www.404media.co/chinese-livestreaming-virtual-human-salespeople-are-outselling-their-human-counterparts/
SemGuard: Real-Time Semantic Evaluator for Correcting LLM-Generated Code
Qinglin Wang, Zhihong Sun, Ruyun Wang, Tao Huang, Zhi Jin, Ge Li, Chen Lyu
https://arxiv.org/abs/2509.24507
SLIM: Subtrajectory-Level Elimination for More Effective Reasoning
Xifeng Yao, Chengyuan Ma, Dongyu Lang, Yinhao Ni, Zhiwei Xu, Huarui Xie, Zihao Chen, Guang Shen, Dandan Tu, Yi Bai, Changzheng Zhang
https://arxiv.org/abs/2508.19502
TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
Ahmet Caner Yüzügüler, Ahmet Çelik, Jiawei Zhuang, Lukas Cavigelli
https://arxiv.org/abs/2509.21081
In a peer-reviewed Nature article, DeepSeek says it has spent $294,000 on training its R1 model and used 512 Nvidia H800 chips (Eduardo Baptista/Reuters)
https://www.reuters.com/world/china/chinas-deepseek-says-its-hit-ai-model-cos…
Prompt Engineering and the Effectiveness of Large Language Models in Enhancing Human Productivity
Rizal Khoirul Anam
https://arxiv.org/abs/2507.18638 https://
Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity
Domenico Cotroneo, Cristina Improta, Pietro Liguori
https://arxiv.org/abs/2508.21634
Hallucinating with AI: AI Psychosis as Distributed Delusions
Lucy Osler
https://arxiv.org/abs/2508.19588 https://arxiv.org/pdf/2508.19588
Mathematical Computation and Reasoning Errors by Large Language Models
Liang Zhang, Edith Aurora Graf
https://arxiv.org/abs/2508.09932 https://arxiv.org/pd…
Tesla plans to roll out in-car voice assistant features powered by DeepSeek and ByteDance's Doubao in China; Tesla vehicles in the US use Grok (Linda Lew/Bloomberg)
https://www.bloomberg.com/news/articles/2025-08-22/tesla-t…
🚀 #DeepSeek #API Upgraded to V3.1 with Dual-Mode Support & #Anthropic Compatibility
#ai …
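"Anthropic compatibility" presumably means the standard Anthropic SDK can be pointed at a DeepSeek endpoint. A hedged sketch under that assumption follows; the base URL, route, and model name are assumptions to verify against api-docs.deepseek.com.

from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_DEEPSEEK_API_KEY",                # placeholder
    base_url="https://api.deepseek.com/anthropic",  # assumed Anthropic-compatible route
)

msg = client.messages.create(
    model="deepseek-chat",                          # assumed model id
    max_tokens=256,
    messages=[{"role": "user", "content": "What does dual-mode support mean here?"}],
)
print(msg.content[0].text)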
Huawei says DeepSeek-R1-Safe, which was trained on 1,000 of its Ascend AI chips, is "nearly 100% successful" in preventing politically sensitive topics (Eduardo Baptista/Reuters)
https://www.reuters.com/business/media-tel
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
Changsheng Zhao, Ernie Chang, Zechun Liu, Chia-Jung Chang, Wei Wen, Chen Lai, Rick Cao, Yuandong Tian, Raghuraman Krishnamoorthi, Yangyang Shi, Vikas Chandra
https://arxiv.org/abs/2509.24945
DeepSeek details V3.1 and says it surpasses R1 on key benchmarks and is customized to work with next-gen Chinese-made AI chips, after unveiling it on August 19 (Bloomberg)
https://www.bloomberg.com/news/articles/2025-08-21/deep…
From Prompt to Pipeline: Large Language Models for Scientific Workflow Development in Bioinformatics
Khairul Alam, Banani Roy
https://arxiv.org/abs/2507.20122 https://
Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning
Antreas Ioannou, Andreas Shiamishis, Nora Hollenstein, Nezihe Merve Gürel
https://arxiv.org/abs/2509.22472
DeepSeek releases V3.1, adding a longer context window, with few other details; Chinese local media blames CEO Liang Wenfeng's perfectionism for R2's delay (Bloomberg)
https://www.bloomberg.com/news/articles/2025-08-19…
Thought Purity: Defense Paradigm For Chain-of-Thought Attack
Zihao Xue, Zhen Bi, Long Ma, Zhenlin Hu, Yan Wang, Zhenfang Liu, Qing Sheng, Jie Xiao, Jungang Lou
https://arxiv.org/abs/2507.12314
How AI has transformed data center design, with concerns about overspending on AI infrastructure, sparked by DeepSeek, fading amid the ongoing building frenzy (Financial Times)
https://ig.ft.com/ai-data-centres/
The Emperor's New Chain-of-Thought: Probing Reasoning Theater Bias in Large Reasoning Models
Qian Wang, Yubo Fan, Zhenheng Tang, Nuo Chen, Wenxuan Wang, Bingsheng He
https://arxiv.org/abs/2507.13758
A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1
Marcin Pietroń, Rafał Olszowski, Jakub Gomułka, Filip Gampel, Andrzej Tomski
https://arxiv.org/abs/2507.08621
Huawei reports H1 2025 revenue up 3.9% YoY to ~$58.5B, driven by soaring AI compute demand and a rebound in phone sales, and net profit down 32% YoY to ~$5.2B (Bloomberg)
https://www.bloomberg.com/news/articles/2025-08-29/de…
Artificial Finance: How AI Thinks About Money
Orhan Erdem, Ragavi Pobbathi Ashok
https://arxiv.org/abs/2507.10933 https://arxiv.org/p…
DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models
Kaiwen Yan, Xuanqing Shi, Hongcheng Guo, Wenxuan Wang, Zhuosheng Zhang, Chengwei Qin
https://arxiv.org/abs/2508.17803
Chinese AI chip designer Cambricon reports 44-fold revenue growth and a profit of ~$144M in H1 2025, after Beijing encouraged companies to use homegrown tech (Rachel Yeo/Bloomberg)
https://www.bloomberg.com/news/articles/20
A Study on Thinking Patterns of Large Reasoning Models in Code Generation
Kevin Halim, Sin G. Teo, Ruitao Feng, Zhenpeng Chen, Yang Gu, Chong Wang, Yang Liu
https://arxiv.org/abs/2509.13758
Can Large Language Models Understand As Well As Apply Patent Regulations to Pass a Hands-On Patent Attorney Test?
Bhakti Khera, Rezvan Alamian, Pascal A. Scherz, Stephan M. Goetz
https://arxiv.org/abs/2507.10576
The Impact of Language Mixing on Bilingual LLM Reasoning
Yihao Li, Jiayi Xin, Miranda Muqing Miao, Qi Long, Lyle Ungar
https://arxiv.org/abs/2507.15849 htt…
Can We Trust AI to Govern AI? Benchmarking LLM Performance on Privacy and AI Governance Exams
Zane Witherspoon, Thet Mon Aye, YingYing Hao
https://arxiv.org/abs/2508.09036 https…
Jensen Huang hailed AI models from DeepSeek, Alibaba, and Tencent as "world class" at a Beijing expo and said US licenses for H20 chips "will come very fast" (Reuters)
https://www.reuters.com/world/china/nvidias-huang-hail…
A Comparative Evaluation of Large Language Models for Persian Sentiment Analysis and Emotion Detection in Social Media Texts
Kian Tohidi, Kia Dashtipour, Simone Rebora, Sevda Pourfaramarz
https://arxiv.org/abs/2509.14922
Sources: DeepSeek R2's launch delay is due to training issues on Huawei Ascend chips, prompting a switch to Nvidia chips for training and Huawei's for inference (Financial Times)
https://www.ft.com/content/eb984646-6320-4bfe-a78d-a1da2274b092
Punctuation and Predicates in Language Models
Sonakshi Chauhan, Maheep Chaudhary, Koby Choy, Samuel Nellessen, Nandi Schoots
https://arxiv.org/abs/2508.14067 https://
Chinese open-source AI models from DeepSeek, Alibaba's Qwen, and others gaining global traction spurs US policymakers and companies to respond (Raffaele Huang/Wall Street Journal)
https://www.wsj.com/tech/ai/chinas-l…
Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper
Krishna Garg, Firoz Shaikh, Sambaran Bandyopadhyay, Cornelia Caragea
https://arxiv.org/abs/2508.14273
Moonshot's Kimi K2 uses a 1T-parameter MoE architecture with 32B active parameters and outperforms models like GPT-4.1 and DeepSeek-V3 on key benchmarks (Michael Nuñez/VentureBeat)
https://venturebeat.com/ai/moonshot-ais-kimi-k2-outperfor…
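For context on the "1T parameters, 32B active" figures: in a mixture-of-experts layer, a learned router sends each token to a small top-k subset of experts, so only those experts' weights participate in that token's forward pass. The toy layer below uses made-up sizes purely to illustrate the routing; it is not K2's actual architecture.

import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)    # scores each expert per token
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, d_model)
        gate = self.router(x).softmax(dim=-1)
        weights, idx = gate.topk(self.top_k, dim=-1)    # top_k experts chosen per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():    # only the chosen experts ever run
                sel = idx[:, slot] == e
                out[sel] += weights[sel, slot].unsqueeze(-1) * self.experts[e](x[sel])
        return out

layer = ToyMoELayer()
print(layer(torch.randn(4, 64)).shape)                  # torch.Size([4, 64])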
Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding
Maciej Skorski, Alina Landowska
https://arxiv.org/abs/2508.13804 https://
Is 'Hope' a person or an idea? A pilot benchmark for NER: comparing traditional NLP tools and large language models on ambiguous entities
Payam Latifi
https://arxiv.org/abs/2509.12098
Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents
Fuyu Xing, Zimu Wang, Wei Wang, Haiyang Zhang
https://arxiv.org/abs/2509.12876 https://
The Few-shot Dilemma: Over-prompting Large Language Models
Yongjian Tang, Doruk Tuncel, Christian Koerner, Thomas Runkler
https://arxiv.org/abs/2509.13196 https://
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
Yilun Zhao, Weiyuan Chen, Zhijian Xu, Manasi Patwardhan, Yixin Liu, Chengye Wang, Lovekesh Vig, Arman Cohan
https://arxiv.org/abs/2507.13300
HKGAI-V1: Towards Regional Sovereign Large Language Model for Hong Kong
Sirui Han, Junqi Zhu, Ruiyuan Zhang, Yike Guo
https://arxiv.org/abs/2507.11502 http…