Tootfinder

Opt-in global Mastodon full text search. Join the index!

@aredridel@kolektiva.social
2026-04-14 14:22:42

So to follow up on this, I've caught it in action. Models, when quantized a bit, just do a bit more poorly with short contexts. Even going from f32 (as trained) to bf16 (as usually run) to q8 tends to do okay for "normal" context windows. And at q4 you start feeling like "this model is a little stupid and gets stuck sometimes" (it is! It's just that it's still mostly careening about in the space of "plausible" most of the time. Not good guesswork, but still in the zone). With long contexts, the probability of parameters collapsing to zero is higher, so the more context you have, the more likely you are to see brokenness.
And then at Q2 (2 bits per parameter) or Q1, the model falls apart completely. Parameters collapse to zero easily. You start seeing "All work and no play makes Jack a dull boy" sorts of behavior, with intense and unscrutinized repetition, followed by a hard stop when it just stops working.
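A minimal sketch of why low bit widths push parameters to zero, assuming plain symmetric round-to-nearest quantization over a single tensor and Gaussian-distributed weights. Both are simplifying assumptions (real schemes typically quantize in blocks with per-block scales), and the function names are made up for illustration. It counts how many weights round to exactly zero at 8, 4, and 2 bits.

```python
import random

def quantize(weights, bits):
    """Symmetric round-to-nearest quantization (toy version, one scale per tensor)."""
    levels = 2 ** (bits - 1) - 1          # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    scale = max(abs(w) for w in weights) / levels
    # Snap each weight to the nearest multiple of `scale`.
    return [round(w / scale) * scale for w in weights]

def zero_fraction(weights, bits):
    """Fraction of weights that become exactly zero after quantization."""
    quantized = quantize(weights, bits)
    return sum(1 for v in quantized if v == 0) / len(quantized)

random.seed(0)
# Assumption: weights are roughly Gaussian around zero.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

fractions = {bits: zero_fraction(weights, bits) for bits in (8, 4, 2)}
for bits, frac in fractions.items():
    print(f"{bits}-bit: {frac:.1%} of weights become exactly zero")
```

On this toy data the zero fraction grows as the bit width shrinks, because fewer levels means a wider bin around zero.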
And quantization is a knob that a model vendor can turn relatively easily. (They have to regenerate the model from the base with more quantization, but it's a data transformation on the order of running a terabyte through a straightforward and fast process, not like training.)
If you have 1000 customers and enough equipment to handle the requests of 700, going from bf16 to q8 is a no-brainer. Suddenly you can handle the load and have a little spare capacity. The customers get worse results and probably pay the same per token (or they're on a subscription that hides the cost anyway, so you are even freer to make trade-offs. There's a reason that subscription products are kinda poorly described).
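The arithmetic behind that, as a rough sketch. It assumes serving capacity scales inversely with bits per parameter, which is an idealization; real gains depend on the hardware and workload and are likely smaller. The helper name `scaled_capacity` is made up for illustration.

```python
# Bits per parameter for the formats mentioned in the post.
BITS_PER_PARAM = {"f32": 32, "bf16": 16, "q8": 8, "q4": 4}

def scaled_capacity(current_capacity, current_format, new_format):
    """Estimate capacity after requantizing, assuming capacity scales
    inversely with bits per parameter (an idealized assumption)."""
    ratio = BITS_PER_PARAM[current_format] / BITS_PER_PARAM[new_format]
    return current_capacity * ratio

customers = 1000
capacity = scaled_capacity(700, "bf16", "q8")   # halving the bits doubles the estimate
spare = capacity - customers
print(f"capacity: {capacity:.0f} customers, spare: {spare:.0f}")
```

Under that idealized assumption, equipment sized for 700 customers would cover about 1400, which is enough for all 1000 with room to spare.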
It's also possible for them to vary this across a day. Use models during quieter periods? Maybe you get an instance running at bf16. Use it during a high-use period? You get a Q4 model.
Or intelligent routing is possible. No idea if anyone is doing this, but if they monitor what you send a bit, and you generally shoot for an expensive model for simple requests? They could totally substitute a highly quantized version of the model to answer the question.
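A purely hypothetical sketch of the kind of load-based and request-based substitution described above. Nothing here reflects a known vendor's behavior; the `Request` type, the `pick_variant` function, and both thresholds are invented for illustration, and prompt length is only a crude stand-in for "simple request".

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    requested_model: str

def pick_variant(request, load, busy_load=0.8, simple_prompt_chars=200):
    """Pick a quantization tier for one request.

    `load` is the current fleet utilization between 0 and 1.
    Both thresholds are made-up values, not anything a vendor has published.
    """
    if load > busy_load:
        return "q4"      # busy period: serve the cheapest variant
    if len(request.prompt) < simple_prompt_chars:
        return "q8"      # short prompt: assume a simple request
    return "bf16"        # quiet period and a longer prompt: full precision

quiet_simple = pick_variant(Request("What is 2 + 2?", "big-model"), load=0.3)
quiet_long = pick_variant(Request("x" * 500, "big-model"), load=0.3)
busy = pick_variant(Request("x" * 500, "big-model"), load=0.95)
print(quiet_simple, quiet_long, busy)
```

Note that the customer asked for the same `requested_model` in all three cases and would have no direct way to tell which variant answered.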
There are •so many tricks• that can be pulled here. Some of them are very reasonable to make, some of them tread into the outright misleading or fraudulent, and it's weirdly hard to draw the line between them.

@wraithe@mastodon.social
2026-05-13 13:00:54

I know I posted about this a couple months ago, but I never did follow up - By God It Does work!
You have to apply a little force so that the center breaks free, but it works
All these years, and learning new things…😂
bsky.app/profile/did:plc:5iw4w

@Techmeme@techhub.social
2026-04-08 15:10:56

Alibaba and China Telecom launch a data center in southern China that is powered by 10,000 of Alibaba's Zhenwu chips designed for AI training and inferencing (Arjun Kharpal/CNBC)
cnbc.com/2026/04/08/china-alib

@Don_kun@nerdculture.de
2026-04-14 15:55:21

Requesting lists of names of art jury members: Weimer is now making himself unpopular in the visual arts too.
Does he have a bet going that he can make an enemy of every cultural scene?
tagesspiegel.de/politi…

@thesaigoneer@social.linux.pizza
2026-03-14 07:35:23

Remember I did that bootc KDE container, based on the Bluefin template? That method is really handing the power to the user. I've now tried Origami (Cosmic), today a brief stint with Zirconium (Niri and Dank) and I'm now heading into Zena. That last one looks to be very special, and is also a bootc container. More later.
#archlabs

@wraithe@mastodon.social
2026-05-14 16:01:16

From a world expert on the transmission of infectious diseases, discussing hantavirus.
bsky.app/profile/did:plc:z3nm4

@Sustainable2050@mastodon.energy
2026-03-12 07:55:49

Once again, fossil energy prices are soaring due to global conflicts. Let’s save and replace another 10 billion cubic meters of natural gas per year by ‘orange-green’ energy, within 5 years: emergency plan by @…, the Netherlands Association for Renewable Energy.

@paul@social.van.buu.re
2026-03-12 17:12:30

What is a good Dutch translation of the 'Priority of Constituencies':
"In case of conflict, consider users over authors over implementors over specifiers over theoretical purity."
Because this sentence from Google doesn't flow well: "In geval van conflict, geef voorrang aan gebruikers boven auteurs, boven implementeerders, boven specificatieschrijvers, boven theoretische zuiverheid."
This really MUST be better. And AI is not a translator.

@goebelmasse@det.social
2026-05-12 19:28:48

Well, well, »Democracy and Mathematics«.
Mr. Merz, what do you think of a little regression analysis? Look at the AfD election results of the last ten years, fit an affine function, determine the regression coefficient, and estimate the result at the next federal election!
Good grief! The guy wants to see Germany burn.

@chpietsch@fedifreu.de
2026-04-12 09:37:52

On the panel at the movement conference Cables of Resistance, @… laments that even leftists use »AI« tools such as chatbots and image generators.
Yet their sole purpose is to strip wage earners of their rights. Anyone who uses them regularly loses the capacity for critical thinking. This technology is inherently fa…

Jurgen and others on the panel at Cables of Resistance. Behind them, a slide by Tante:

Data – this is your future

with an image showing a boot crushing a person
Slide with the 1984 quote »Ignorance is strength« next to the text »Destruction of truth and the structures that support it« (as a consequence of the use of generative »AI«)