Florida folks - it's crucial that we stop the latest attack on teachers unions in the state (SB 1296). Please contact the Senate Fiscal Policy Committee today by 4 pm using:
https://secure.ngpvan.com/8Rb6HrtN-UCjXbPhBR-S-Q2
"En el caso de las izquierdas, hay algunas izquierdas que cuando se trata de Cuba miran para otro lado sin entender que Cuba tiene muchas funciones en la batalla cultural de las izquierdas del hemisferio occidental."
"In the case of the left, there are certain left sections that, when Cuba comes up, look the other way, without understanding that Cuba has multiple functions in the cultural battle of the Western hemisphere left."
Iramis R. Cárdenas:
Workday reports Q4 revenue up 14.5% YoY to $2.53B, vs. $2.52B est., and forecasts FY 2027 subscription revenue below estimates; WDAY drops 9% after hours (Larry Dignan/Constellation Research)
https://www.constellationr.com/insights/ne
On my scale of acceptable options for bands that want to tour again after losing an essential member:
1. A replacement
.
.
. {Any other idea} x 1 million
.
.
∞. Using holograms
So to follow up on this, I've caught it in action. Models, when quantized a bit, just do a bit more poorly with short contexts. Even going from f32 (as trained) to bf16 (as usually run) to q8 tends to do okay for "normal" context windows. At q4 you start feeling like "this model is a little stupid and gets stuck sometimes" (it is! It's just that it's still mostly careening about in the space of "plausible" most of the time. Not good guesswork, but still in the zone). With long contexts, the probability of parameters collapsing to zero is higher, so the more context you use, the more likely you are to see brokenness.
And then at q2 (2 bits per parameter) or q1, the model falls apart completely. Parameters collapse to zero easily. You start seeing "all work and no play makes Jack a dull boy" sorts of behavior: intense, unscrutinized repetition, followed by a hard stop when it just stops working.
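(A toy illustration of that collapse, with assumptions of my own: symmetric round-to-nearest quantization with a single scale per tensor. Real quantizers use per-group scales and are more careful, but the shape of the effect is the same.)

```python
import numpy as np

def quantize(weights, bits):
    # Symmetric round-to-nearest: map weights onto 2^(bits-1)-1 signed levels.
    levels = 2 ** (bits - 1) - 1            # 127 for q8, 7 for q4, 1 for q2
    scale = np.abs(weights).max() / levels
    q = np.clip(np.round(weights / scale), -levels, levels)
    return q * scale                        # dequantized values

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=100_000)     # small-magnitude weights, LLM-ish

for bits in (8, 4, 2):
    wq = quantize(w, bits)
    zeros = np.mean(wq == 0.0)              # weights that landed on exactly 0
    print(f"q{bits}: {zeros:.0%} of weights are now exactly zero")
```

At q8 almost nothing is lost; at q2 the vast majority of small weights snap to zero, which is the "falls apart completely" regime.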
And quantization is a knob that a model vendor can turn relatively easily. (They have to regenerate the model from the base weights at the new quantization, but that's a data transformation on the order of running a terabyte through a straightforward, fast process, nothing like training.)
If you have 1,000 customers and enough equipment to handle the requests of 700, going from bf16 to q8 is a no-brainer: halving the bytes per parameter roughly doubles what the same hardware can serve, so suddenly you can handle the load and have a little spare capacity. Customers get worse results but probably pay the same per token (or they're on a subscription that hides the cost anyway, so you're even freer to make trade-offs; there's a reason subscription products are kinda poorly described).
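(Back-of-the-envelope on why it's a no-brainer, using the 700/1000 numbers above plus an assumption of mine: decoding is memory-bandwidth-bound, so capacity scales roughly with 1 / bytes-per-parameter.)

```python
demand = 1000                  # customers' worth of requests
capacity_bf16 = 700            # what the fleet handles at bf16 (2 bytes/param)

for name, bytes_per_param in [("bf16", 2.0), ("q8", 1.0), ("q4", 0.5)]:
    capacity = capacity_bf16 * 2.0 / bytes_per_param
    verdict = "fits, with headroom" if capacity >= demand else "overloaded"
    print(f"{name}: ~{capacity:.0f} capacity vs {demand} demanded -> {verdict}")
```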
It's also possible to vary this across the day. Use the model during quieter periods? Maybe you get an instance running at bf16. Use it during a high-traffic period? You get a q4 model.
Or intelligent routing is possible. No idea if anyone is doing this, but if they monitor what you send a bit, and you're generally aiming an expensive model at simple requests? They could totally substitute a highly quantized version of the model to answer the question.
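(Purely hypothetical sketch, since as said, nobody is confirmed to do this; the model names and the length-based "difficulty" guess are placeholders of mine.)

```python
def pick_variant(prompt: str, load: float) -> str:
    """Route a request to a quantization level based on load and difficulty."""
    looks_simple = len(prompt) < 200        # crude proxy for an easy request
    if load > 0.9:
        return "model-q4"                   # fleet is slammed: everyone gets q4
    if looks_simple:
        return "model-q8"                   # expensive model, simple question
    return "model-bf16"                     # quiet period, hard request

print(pick_variant("What's the capital of France?", load=0.5))              # -> model-q8
print(pick_variant("Refactor this module: " + "x" * 500, load=0.95))        # -> model-q4
```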
There are *so many* tricks that can be pulled here. Some are perfectly reasonable trade-offs, some tread into outright misleading or fraudulent territory, and it's weirdly hard to draw the line between them.
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[5/6]:
- Watermarking Degrades Alignment in Language Models: Analysis and Mitigation
Apurv Verma, NhatHai Phan, Shubhendu Trivedi
https://arxiv.org/abs/2506.04462 https://mastoxiv.page/@arXiv_csCL_bot/114635190037336859
- Sensory-Motor Control with Large Language Models via Iterative Policy Refinement
Jônata Tyska Carvalho, Stefano Nolfi
https://arxiv.org/abs/2506.04867 https://mastoxiv.page/@arXiv_csAI_bot/114635187854195641
- ICE-ID: A Novel Historical Census Dataset for Longitudinal Identity Resolution
de Carvalho, Popov, Kaatee, Correia, Thórisson, Li, Björnsson, Sigurðarson, Dibangoye
https://arxiv.org/abs/2506.13792 https://mastoxiv.page/@arXiv_csAI_bot/114703312162525342
- Feedback-driven recurrent quantum neural network universality
Lukas Gonon, Rodrigo Martínez-Peña, Juan-Pablo Ortega
https://arxiv.org/abs/2506.16332 https://mastoxiv.page/@arXiv_quantph_bot/114732532383196043
- Programming by Backprop: An Instruction is Worth 100 Examples When Finetuning LLMs
Cook, Sapora, Ahmadian, Khan, Rocktaschel, Foerster, Ruis
https://arxiv.org/abs/2506.18777 https://mastoxiv.page/@arXiv_csAI_bot/114738213040759661
- Stochastic Quantum Spiking Neural Networks with Quantum Memory and Local Learning
Jiechen Chen, Bipin Rajendran, Osvaldo Simeone
https://arxiv.org/abs/2506.21324 https://mastoxiv.page/@arXiv_csNE_bot/114754367612728319
- Enjoying Non-linearity in Multinomial Logistic Bandits: A Minimax-Optimal Algorithm
Pierre Boudart (SIERRA), Pierre Gaillard (Thoth), Alessandro Rudi (PSL, DI-ENS, Inria)
https://arxiv.org/abs/2507.05306 https://mastoxiv.page/@arXiv_statML_bot/114822374525501660
- Characterizing State Space Model and Hybrid Language Model Performance with Long Context
Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon
https://arxiv.org/abs/2507.12442 https://mastoxiv.page/@arXiv_csAR_bot/114867589638074984
- Is Exchangeability better than I.I.D to handle Data Distribution Shifts while Pooling Data for Da...
Ayush Roy, Samin Enam, Jun Xia, Won Hwa Kim, Vishnu Suresh Lokhande
https://arxiv.org/abs/2507.19575 https://mastoxiv.page/@arXiv_csCV_bot/114935399825741861
- TASER: Table Agents for Schema-guided Extraction and Recommendation
Nicole Cho, Kirsty Fielding, William Watson, Sumitra Ganesh, Manuela Veloso
https://arxiv.org/abs/2508.13404 https://mastoxiv.page/@arXiv_csAI_bot/115060386723032051
- Morphology-Aware Peptide Discovery via Masked Conditional Generative Modeling
Nuno Costa, Julija Zavadlav
https://arxiv.org/abs/2509.02060 https://mastoxiv.page/@arXiv_qbioBM_bot/115139546511384706
- PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models
Jeongjae Lee, Jong Chul Ye
https://arxiv.org/abs/2509.25774 https://mastoxiv.page/@arXiv_csCV_bot/115298580419859537
- Multi-hop Deep Joint Source-Channel Coding with Deep Hash Distillation for Semantically Aligned I...
Didrik Bergstr\"om, Deniz G\"und\"uz, Onur G\"unl\"u
https://arxiv.org/abs/2510.06868 https://mastoxiv.page/@arXiv_csIT_bot/115343320768797486
- MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile...
Chengshu Li, et al.
https://arxiv.org/abs/2510.18316 https://mastoxiv.page/@arXiv_csRO_bot/115416889485910123
- A Spectral Framework for Graph Neural Operators: Convergence Guarantees and Tradeoffs
Roxanne Holden, Luana Ruiz
https://arxiv.org/abs/2510.20954 https://mastoxiv.page/@arXiv_statML_bot/115445273121677005
- Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
Bazinska, Mathys, Casucci, Rojas-Carulla, Davies, Souly, Pfister
https://arxiv.org/abs/2510.22620 https://mastoxiv.page/@arXiv_csCR_bot/115451397563132982
- Uncertainty Calibration of Multi-Label Bird Sound Classifiers
Raphael Schwinger, Ben McEwen, Vincent S. Kather, René Heinrich, Lukas Rauch, Sven Tomforde
https://arxiv.org/abs/2511.08261 https://mastoxiv.page/@arXiv_csSD_bot/115535982708483824
- Two-dimensional RMSD projections for reaction path visualization and validation
Rohit Goswami (Institute IMX and Lab-COSMO, École polytechnique fédérale de Lausanne)
https://arxiv.org/abs/2512.07329 https://mastoxiv.page/@arXiv_physicschemph_bot/115688910885717951
- Distribution-informed Online Conformal Prediction
Dongjian Hu, Junxi Wu, Shu-Tao Xia, Changliang Zou
https://arxiv.org/abs/2512.07770 https://mastoxiv.page/@arXiv_statML_bot/115689281155541568
- Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Ang Lv, Jin Ma, Yiyuan Ma, Siyuan Qiao
https://arxiv.org/abs/2512.23447 https://mastoxiv.page/@arXiv_csCL_bot/115808311310246601
Criei um "starter pack" de contas portuguesas.
https://fedidevs.com/s/OTQ1/
Good idea? Bad idea? Should I build it out? Should I delete it?
Here's what I think about the people who complain that the Fediverse is an echo chamber: when they go to a bar with their friends, do they (being who they are) start talking to everyone, letting strangers join the conversations they're having with their friends? Do they hand out their contact details so people can keep badgering them by phone or show up at their house to talk? Isn't that an echo chamber? Why is it valid to have echo chambers in life a…