I have a habbit of making (too) many (small) packages for functionality that might be reused in different context. {box} might be an alternative by making scripts into modlues that can be loaded: #RStats
Claude Opus 4.6 and Sonnet 4.6 now offer a 1M context window at standard pricing; it is the default for Claude Code Max, Team, and Enterprise users on Opus 4.6 (Anthropic)
https://claude.com/blog/1m-context-ga
The vOICe for Android 2.82 is out! It bundles two monocular AI depth models in a trade-off between visual detail and CPU load: toggle one or the other by pressing key 'h' ("good model") or capital 'H' ("best model", higher CPU load) via the soft keyboard https://play.google.com/stor…
So to follow up on this, I've caught it in action. Models, when quantized a bit, just do a bit more poorly with short contexts. Even going from f32 (as trained) to bf16 (as usually run) to q8 tends to do okay for "normal" context windows. And q4 you start feeling like "this model is a little stupid and gets stuck sometimes” (it is! It's just that it's still mostly careening about in the space of "plausible" most of the time. Not good guesswork, but still in the zone). With long contexts, the probability of parameters collapsing to zero are higher, so the more context the more likelihood you are to see brokenness.
And then at Q2 (2 bits per parameter) or Q1, the model falls apart completely. Parameters collapse to zero easily. You start seeing "all work and no play makes jack a dull boy” sorts of behavior, with intense and unscrutinized repetition, followed by a hard stop when it just stops working.
And quantization is a parameter that a model vendor can turn relatively easily. (they have to regenerate the model from the base with more quantization, but it's a data transformation on the order of running a terabyte through a straightforward and fast process, not like training).
If you have 1000 customers and enough equipment to handle the requests of 700, going from bf16 to q8 is a no-brainer. Suddenly you can handle the load and have a little spare capacity. They get worse results, probably pay the same per token (or they're on a subscription that hides the cost anyway so you are even freer to make trade-offs. There's a reason that subscription products are kinda poorly described.)
It's also possible for them to vary this across a day: use models during quieter periods? Maybe you get an instance running a bf16 quantization. If you use it during a high use period? You get a Q4 model.
Or intelligent routing is possible. No idea if anyone is doing this, but if they monitor what you send a bit, and you generally shoot for an expensive model for simple requests? They could totally substitute a highly quantized version of the model to answer the question.
There are •so many tricks• that can be pulled here. Some of them very reasonable to make, some of them treading into outright misleading or fraudulent, and it's weirdly hard to draw the line between them.
Was haben Influenza, Influenzer und Künstliche Inteligenzia gemeinsam? (2004)
#sociolgy
By Karl Bode
"CEO said a thing!" journalism is now utterly pervasive, and includes parroting billionaire and CEO claims with a total disregard for whether or not anything being said is actually true.
Despite the fact Musk has increasingly shown himself to be a conspiratorial white supremacist with a head full of room temperature butterscotch, the U.S. press is demonstratively un…
AI tools used in a medical context are easily fooled by misinformation that's delivered in an authoritative tone.
That’s the finding of a recent study published in The Lancet Digital Health which assessed the outputs of 20 LLMs used in a medical context, including models specially trained on medical data.
Sources: Mirendil, founded by former Anthropic researchers to develop AI models for scientific research, is in talks to raise $175M at a $1B valuation (The Information)
https://www.theinformation.com/articles/ex-anthropic-r…
Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
Ravi Ghadia, Maksim Abraham, Sergei Vorobyov, Max Ryabinin
https://arxiv.org/abs/2602.21196 https://arxiv.org/pdf/2602.21196 https://arxiv.org/html/2602.21196
arXiv:2602.21196v1 Announce Type: new
Abstract: Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism. The dominant approaches in this family of methods, such as Ring Attention or DeepSpeed Ulysses, enable scaling over the context dimension but do not focus on memory efficiency, which limits the sequence lengths they can support. More advanced techniques, such as Fully Pipelined Distributed Transformer or activation offloading, can further extend the possible context length at the cost of training throughput. In this paper, we present UPipe, a simple yet effective context parallelism technique that performs fine-grained chunking at the attention head level. This technique significantly reduces the activation memory usage of self-attention, breaking the activation memory barrier and unlocking much longer context lengths. Our approach reduces intermediate tensor memory usage in the attention layer by as much as 87.5$\%$ for 32B Transformers, while matching previous context parallelism techniques in terms of training speed. UPipe can support the context length of 5M tokens when training Llama3-8B on a single 8$\times$H100 node, improving upon prior methods by over 25$\%$.
toXiv_bot_toot
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[4/5]:
- Retrieving Climate Change Disinformation by Narrative
Upravitelev, Solopova, Jakob, Sahitaj, M\"oller, Schmitt
https://arxiv.org/abs/2603.22015 https://mastoxiv.page/@arXiv_csCL_bot/116283633674519408
- PaperVoyager : Building Interactive Web with Visual Language Models
Dasen Dai, Biao Wu, Meng Fang, Wenhao Wang
https://arxiv.org/abs/2603.22999 https://mastoxiv.page/@arXiv_csCL_bot/116289015432093128
- Continual Robot Skill and Task Learning via Dialogue
Weiwei Gu, Suresh Kondepudi, Anmol Gupta, Lixiao Huang, Nakul Gopalan
https://arxiv.org/abs/2409.03166 https://mastoxiv.page/@arXiv_csRO_bot/113089412115632702
- Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs
Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke
https://arxiv.org/abs/2503.05371 https://mastoxiv.page/@arXiv_csLG_bot/114136994263573386
- SkillFlow: Scalable and Efficient Agent Skill Retrieval System
Fangzhou Li, Pagkratios Tagkopoulos, Ilias Tagkopoulos
https://arxiv.org/abs/2504.06188 https://mastoxiv.page/@arXiv_csAI_bot/114306773220502860
- Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang, Bach Le, Naveed Akhtar, Siew-Kei Lam, Tuan Ngo
https://arxiv.org/abs/2505.08137 https://mastoxiv.page/@arXiv_csLG_bot/114504972217393639
- Structured Agent Distillation for Large Language Model
Liu, Kong, Dong, Yang, Li, Tang, Yuan, Niu, Zhang, Zhao, Lin, Huang, Wang
https://arxiv.org/abs/2505.13820 https://mastoxiv.page/@arXiv_csLG_bot/114544636506163783
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
Fan, Zhang, Li, Zhang, Chen, Hu, Wang, Qu, Zhou, Wang, Yan, Xu, Theiss, Chen, Li, Tu, Wang, Ranjan
https://arxiv.org/abs/2505.20279 https://mastoxiv.page/@arXiv_csCV_bot/114578817567171199
- Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification
Bhattacharjee, Tian, Rubin, Lo, Merchant, Hanson, Gounley, Tandon
https://arxiv.org/abs/2506.04450 https://mastoxiv.page/@arXiv_csCR_bot/114635189706505648
- L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
Ziqi Wang, Boqin Yuan
https://arxiv.org/abs/2509.00761 https://mastoxiv.page/@arXiv_csAI_bot/115140304787881576
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
Han, Huang, Liao, Jiang, Lu, Zhao, Wang, Zhou, Jiang, Liang, Zhou, Sun, Yu, Xiao
https://arxiv.org/abs/2509.23392 https://mastoxiv.page/@arXiv_csAI_bot/115293169353788311
- Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
Leander Girrbach, Stephan Alaniz, Genevieve Smith, Trevor Darrell, Zeynep Akata
https://arxiv.org/abs/2510.03721 https://mastoxiv.page/@arXiv_csCV_bot/115332690912652473
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Zhang, Hu, Upasani, Ma, Hong, Kamanuru, Rainton, Wu, Ji, Li, Thakker, Zou, Olukotun
https://arxiv.org/abs/2510.04618 https://mastoxiv.page/@arXiv_csLG_bot/115332999596603375
- Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling
Giannone, Xu, Nayak, Awhad, Sudalairaj, Xu, Srivastava
https://arxiv.org/abs/2510.05825 https://mastoxiv.page/@arXiv_csLG_bot/115338159696513898
- Complete asymptotic type-token relationship for growing complex systems with inverse power-law co...
Pablo Rosillo-Rodes, Laurent H\'ebert-Dufresne, Peter Sheridan Dodds
https://arxiv.org/abs/2511.02069 https://mastoxiv.page/@arXiv_physicssocph_bot/115496283627867809
- ViPRA: Video Prediction for Robot Actions
Sandeep Routray, Hengkai Pan, Unnat Jain, Shikhar Bahl, Deepak Pathak
https://arxiv.org/abs/2511.07732 https://mastoxiv.page/@arXiv_csRO_bot/115535941444003568
- AISAC: An Integrated multi-agent System for Transparent, Retrieval-Grounded Scientific Assistance
Chandrachur Bhattacharya, Sibendu Som
https://arxiv.org/abs/2511.14043
- VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding
Yufei Yin, Qianke Meng, Minghao Chen, Jiajun Ding, Zhenwei Shao, Zhou Yu
https://arxiv.org/abs/2512.12360 https://mastoxiv.page/@arXiv_csCV_bot/115729238732682644
- RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering
L\'eo Butsanets, Charles Corbi\`ere, Julien Khlaut, Pierre Manceron, Corentin Dancette
https://arxiv.org/abs/2512.17396 https://mastoxiv.page/@arXiv_csCV_bot/115762705911757243
- Measuring all the noises of LLM Evals
Sida Wang
https://arxiv.org/abs/2512.21326 https://mastoxiv.page/@arXiv_csLG_bot/115779597137785637
toXiv_bot_toot