2025-12-10 08:50:15
Freudian slip in Meske et al. 2025. http://arxiv.org/abs/2507.21928 #GenAI #vibecoding #AIResearch
Kevin Xu argues that it's misleading to characterise the US–China AI competition as a race, since there's mutual co-operation and co-optation going on all the time: #AIResearch #LLM
Are you afraid of our new GenAI overlords taking over our jobs soon? According to a new benchmark, the Remote Labor Index by Scale AI and the Center for AI Safety (CAIS), there's no need to be. The best current models are able to solve only around 2% of the tasks in the index: #AIResearch #GenAI
So, the new LLM from Zhipu, GLM 4.6, is about as good at coding as Anthropic's Sonnet 4.5, but roughly 8 times cheaper. It's impressive since, apparently, Zhipu has raised 13x less capital than Anthropic. Additionally, since GLM 4.6 is an open(ish) model, the inference costs will come down rapidly.
The beginning of the end for the #AI investment bubble?
#AIResearch #opensource
This website illustrates nicely how the US lost the competition, at least for now, in open(ish) LLMs: #AIResearch #AGI_hype
/via Wired
Claude Sonnet 4.5 shows significantly increased situational awareness when tested for alignment. Here's a fascinating example from p. 59 of the system card #anthropology #AIResearch
"I found these ads after I was targeted by one suggesting I join this ethnically ambiguous, dead-eyed family of generic blue hat wearers at the World Series to root on, I guess, the Dodgers."
#AI #generativeAI #meta #AIResearch