Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@nerb@techhub.social
2026-01-17 18:33:33

@…
This is about glucagon rescue devices.
Endo told me I need to get one. Said there are two groups especially in need of them. Young children and elderly. Said not to hurt my feelings but I am solidly in the second group.
Wow that is some expensive stuff! Debating if I really need it or not. Been able to handle lows myself for over 50 yea…

@aredridel@kolektiva.social
2026-04-14 14:22:42

So to follow up on this, I've caught it in action. Models, when quantized a bit, just do a bit more poorly with short contexts. Even going from f32 (as trained) to bf16 (as usually run) to q8 tends to do okay for "normal" context windows. And at q4 you start feeling like "this model is a little stupid and gets stuck sometimes" (it is! It's just that it's still mostly careening about in the space of "plausible" most of the time. Not good guesswork, but still in the zone). With long contexts, the probability of parameters collapsing to zero is higher, so the more context, the more likely you are to see brokenness.
And then at Q2 (2 bits per parameter) or Q1, the model falls apart completely. Parameters collapse to zero easily. You start seeing "all work and no play makes Jack a dull boy" sorts of behavior, with intense and unscrutinized repetition, followed by a hard stop when it just stops working.
And quantization is a knob that a model vendor can turn relatively easily. (They have to regenerate the model from the base with more quantization, but it's a data transformation on the order of running a terabyte through a straightforward and fast process, not like training.)
If you have 1000 customers and enough equipment to handle the requests of 700, going from bf16 to q8 is a no-brainer. Suddenly you can handle the load and have a little spare capacity. They get worse results, probably pay the same per token (or they're on a subscription that hides the cost anyway so you are even freer to make trade-offs. There's a reason that subscription products are kinda poorly described.)
It's also possible for them to vary this across a day: Use models during quieter periods? Maybe you get an instance running a bf16 model. If you use it during a high-use period? You get a Q4 model.
Or intelligent routing is possible. No idea if anyone is doing this, but if they monitor what you send a bit, and you generally shoot for an expensive model for simple requests? They could totally substitute a highly quantized version of the model to answer the question.
There are •so many tricks• that can be pulled here. Some of them very reasonable to make, some of them treading into outright misleading or fraudulent, and it's weirdly hard to draw the line between them.
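
A toy sketch of the trade-off described in the post above, not any vendor's actual pipeline. It assumes plain symmetric round-to-nearest quantization with a single scale for the whole tensor (real schemes typically use per-block scales and cleverer rounding), and the function names and the 70-billion-parameter figure are hypothetical. It shows two things: how the weight footprint shrinks with bits per parameter, and how the share of weights that round to exactly zero grows as the bit width drops.

```python
import random


def quantize(weights, bits):
    """Toy symmetric round-to-nearest quantization, returned in dequantized form.

    Assumption: one scale for the whole list; needs at least 2 bits so that
    there is a nonzero level on each side of zero.
    """
    if bits < 2:
        raise ValueError("this toy scheme needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    largest = max(abs(w) for w in weights)
    if largest == 0.0:
        return [0.0 for _ in weights]
    scale = largest / levels
    return [round(w / scale) * scale for w in weights]


def zero_fraction(weights, bits):
    """Share of weights that become exactly zero after quantization."""
    quantized = quantize(weights, bits)
    return sum(1 for w in quantized if w == 0.0) / len(quantized)


def weight_gigabytes(n_params, bits):
    """Storage for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical stand-in for a weight tensor: 10,000 Gaussian samples.
    weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]
    for bits in (16, 8, 4, 2):
        print(
            f"{bits:>2} bits: "
            f"{weight_gigabytes(70e9, bits):6.1f} GB for a hypothetical 70B model, "
            f"{zero_fraction(weights, bits):.1%} of toy weights round to zero"
        )
```

Halving the bits halves the memory for weights, which is the capacity lever the post describes. With a single shared scale, the zero share rises steeply at low bit widths, which is one plausible reading of "parameters collapse to zero".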

@Techmeme@techhub.social
2026-03-11 12:56:11

Binance files a New York defamation suit against Dow Jones over the WSJ's February 23 article on the crypto exchange's handling of Iranian-linked transactions (Francisco Rodrigues/CoinDesk)
coindesk.c…

@fgraver@hcommons.social
2026-02-26 19:39:15

It is almost 10 years since I helped write this op-ed about the future of Den norske filmskolen (the Norwegian Film School). I still stand by its content, but unfortunately it seems the most important battles have been lost.
Den norske filmskolen in a newly merged landscape

@Mediagazer@mstdn.social
2026-03-11 12:45:54

Binance files a New York defamation suit against Dow Jones over the WSJ's February 23 article on the crypto exchange's handling of Iranian-linked transactions (Francisco Rodrigues/CoinDesk)
coindesk.c…

Illinois is joining a network of the World Health Organization
in hopes of better positioning the state to handle potential health threats,
following the U.S. withdrawal from the group last month.
It’s the state’s latest move into an area that was previously the domain of the federal government,
before the administration began remaking public health policies and guidance.
The Illinois Department of Public Health this week officially joined the World Health Organi…

@andres4ny@social.ridetrans.it
2026-03-05 19:01:07

If you imagine #NYC police as Cartman (from South Park) doing the whole "respect mah authoritah!" bit, the behavior of the NYPD around the snowballs incident makes complete sense. It's not that anyone was hurt, nor was there any damage, but their fragile egos just can't handle not having their authority respected. Even when they do completely dumb shit like wandering into the m…

The NYPD's dragnet against revelers who took part in the viral snowball fight in Washington Square Park on February 23 has now nabbed a second New Yorker—this time, a teenager.

On Wednesday morning, 18-year-old East Harlem resident Eric Wilson Jr. turned himself in at his local precinct, and was arraigned yesterday on misdemeanor charges of obstructing government administration and harassment in the second degree, according to the Manhattan DA's office. Following the snowball fight, the NYPD…

@arXiv_csOS_bot@mastoxiv.page
2026-02-10 09:40:07

Equilibria: Fair Multi-Tenant CXL Memory Tiering At Scale
Kaiyang Zhao, Neha Gholkar, Hasan Maruf, Abhishek Dhanotia, Johannes Weiner, Gregory Price, Ning Sun, Bhavya Dwivedi, Stuart Clark, Dimitrios Skarlatos
arxiv.org/abs/2602.08800 arxiv.org/pdf/2602.08800 arxiv.org/html/2602.08800
arXiv:2602.08800v1 Announce Type: new
Abstract: Memory dominates datacenter system cost and power. Memory expansion via Compute Express Link (CXL) is an effective way to provide additional memory at lower cost and power, but its effective use requires software-level tiering for hyperscaler workloads. Existing tiering solutions, including current Linux support, face fundamental limitations in production deployments. First, they lack multi-tenancy support, failing to handle stacked homogeneous or heterogeneous workloads. Second, limited control-plane flexibility leads to fairness violations and performance variability. Finally, insufficient observability prevents operators from diagnosing performance pathologies at scale.
We present Equilibria, an OS framework enabling fair, multi-tenant CXL tiering at datacenter scale. Equilibria provides per-container controls for memory fair-share allocation and fine-grained observability of tiered-memory usage and operations. It further enforces flexible, user-specified fairness policies through regulated promotion and demotion, and mitigates noisy-neighbor interference by suppressing thrashing.
Evaluated in a large hyperscaler fleet using production workloads and benchmarks, Equilibria helps workloads meet service level objectives (SLOs) while avoiding performance interference. It improves performance over the state-of-the-art Linux solution, TPP, by up to 52% for production workloads and 1.7x for benchmarks. All Equilibria patches have been released to the Linux community.

@seeingwithsound@mas.to
2026-03-28 22:37:26

To ChatGPT: Elon Musk focuses on scalable technologies, also for the upcoming Neuralink Blindsight brain implant for the blind. At some point these implants will start to fail in patients. Will Neuralink's handling of failing brain implants prove scalable? chatgpt.com/share/69c85655-c24

@Techmeme@techhub.social
2026-01-28 05:55:54

AI Whistleblower Initiative says OpenAI recently updated its whistleblower policy, addressing 8 of 13 recommendations and going further than Anthropic's policy (Rocket Drew/The Information)
theinformation.com/articles/op