2025-10-18 05:41:13
So. Much. This.
https://hci.social/@chrisamaphone/115391556289269102
So. Much. This.
https://hci.social/@chrisamaphone/115391556289269102
A small number of samples can poison LLMs of any size:
https://www.anthropic.com/research/small-samples-poison
"In a joint study with the UK AI Security Institute and the Alan Turing Institute, we found that as few as 250 malicious documents can produce a "
Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems
Jiaxin Gao, Chen Chen, Yanwen Jia, Xueluan Gong, Kwok-Yan Lam, Qian Wang
https://arxiv.org/abs/2510.12462 …
And more to read in your freetime, if you are interested in AI, from Bluesky, Phillip Isola.
"Over the past year, my lab has been working on fleshing out theory applications of the Platonic Representation Hypothesis.
Today I want to share two new works on this topic:"
Eliciting higher alignment:
https://arxiv.org/…
Racist language
allegedly used by leaders of
Young Republican groups in leaked chats
drew widespread condemnation from both sides of the political aisle Tuesday
and prompted the national youth organization to demand the resignations of those involved.
https://www.axios.co…
An Explorative Study on Distributed Computing Techniques in Training and Inference of Large Language Models
Sheikh Azizul Hakim, Saem Hasan
https://arxiv.org/abs/2510.11211 http…
KnowRL: Teaching Language Models to Know What They Know
Sahil Kale, Devendra Singh Dhami
https://arxiv.org/abs/2510.11407 https://arxiv.org/pdf/2510.11407
Running on the battle-tested #Erlang virtual machine that powers planet-scale systems such as WhatsApp and Ericsson, #Gleam is ready for workloads of any size.
https://gleam…
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Zixin Zhang, Kanghao Chen, Xingwang Lin, Lutao Jiang, Xu Zheng, Yuanhuiyi Lyu, Litao Guo, Yinchuan Li, Ying-Cong Chen
https://arxiv.org/abs/2510.09507
A study finds that as few as 250 malicious documents can produce a "backdoor" vulnerability in an LLM, regardless of model size or training data volume (Anthropic)
https://www.anthropic.com/research/small-samples-poison
UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene
Christian Maurer, Snehal Jauhri, Sophie Lueth, Georgia Chalvatzaki
https://arxiv.org/abs/2510.06754
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Jihoon Lee, Hoyeon Moon, Kevin Zhai, Arun Kumar Chithanar, Anit Kumar Sahu, Soummya Kar, Chul Lee, Souradip Chakraborty, Amrit Singh Bedi
https://arxiv.org/abs/2510.05040
I gotta step up my Irish learning so we can do some irish anarchism https://todon.eu/@CrimethInc/115501256616016837
Algorithmic Temperature Induced by Adopted Regular Universal Turing Machine
Kentaro Imafuku
https://arxiv.org/abs/2510.11737 https://arxiv.org/pdf/2510.117…
Active Model Selection for Large Language Models
Yavuz Durmazkeser, Patrik Okanovic, Andreas Kirsch, Torsten Hoefler, Nezihe Merve G\"urel
https://arxiv.org/abs/2510.09418 …
A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants
Hans G. W. van Dam
https://arxiv.org/abs/2510.06223 https://arxiv.or…
@… I suspect the problem is that culture informs language, and in American English we aren't used to people taking the train many places.
I wonder what the Brits would say.
In Danish the word for "drove" would be appropriate. You could say "I drove to SFO" and it could equally mean by car, bus, or train without any further clarifi…
Is star complexity a proxy for information based complexity of graphs?
Russell K. Standish
https://arxiv.org/abs/2510.07722 https://arxiv.org/pdf/2510.0772…
TTRV: Test-Time Reinforcement Learning for Vision Language Models
Akshit Singh, Shyam Marjit, Wei Lin, Paul Gavrikov, Serena Yeung-Levy, Hilde Kuehne, Rogerio Feris, Sivan Doveh, James Glass, M. Jehanzeb Mirza
https://arxiv.org/abs/2510.06783
Improved Extended Regular Expression Matching
Philip Bille, Inge Li G{\o}rtz, Rikke Schjeldrup Jessen
https://arxiv.org/abs/2510.09311 https://arxiv.org/pd…
It’s strange to watch the world ignore that it’s not just the medium that matters, but the message does too.
And by “the message” I mean both the ideas and the precise language or visuals used to communicate them.
Subtle differences in linguistic execution of the same idea in the same format can have radically opposite effects. The same applies to an image captured from a different angle or in a different style or in a different composition.
And yet it feels like most organizations and most people within them are determined to march on ignoring any consideration of subtlety and craft.
#writing #design #art #marketing
"AI in the guise of Machine Learning, Deep Learning, GenerativeAI (GenAI), or Large Language Models (LLMs)... can be very useful in certain application areas such as recognising or generating patterns in large data sets. However, their key drawback is that any correctness arguments will be inherently probabilistic as they are usually based on unknown data distributions and are therefore susceptible to errors (sometimes termed “hallucinations”). "
RE: #fascism, especially American fascism, there's something you can do. If you're in the US, don't buy anything if you can until Dec 2nd. If you do have to buy something, buy second hand, or buy local. If you're outside, don't buy anything from the US or any US company at all.
Spread the word. Keep it on people's mind. Write your own post. Talk to people you know in person. Print out flyers and post them around town.
The system understands the language of money. If you want a response, you have to speak the language the system understands.
Trump is extremely vulnerable. Don't wait for the regime to recovery. Hit it hard right now, with an economic blockade. Who knows, it might just crumble.
#USPol
I've been thinking about that stupid "Unified Executive" rubbish that the maga-klan likes to spout.
They argue that Art II Sect 1 sez "The executive Power shall be vested in a President of the United States of America." meaning that the entire executive, all of it, every bit, is carried within the human frame of the President.
OK.
That language does not admit of any possibility of delegation of any part of that executive authority to another. So if …
Not a bad start, part 2 kinda kicked my ass because reading is really, really hard. I made the assumption that there wouldn't be any spins greater than 99 which was a horrible, horrible assumption to make.
However, not too difficult overall, happy enough with the solve.
#adventOfCode
Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
Chunsan Hong, Seonho An, Min-Soo Kim, Jong Chul Ye
https://arxiv.org/abs/2510.05725 https://…
Observing Without Doing: Pseudo-Apprenticeship Patterns in Student LLM Use
Jade Hak, Nathaniel Lam Johnson, Matin Amoozadeh, Amin Alipour, Souti Chattopadhyay
https://arxiv.org/abs/2510.04986
See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation
Chih Yao Hu, Yang-Sen Lin, Yuna Lee, Chih-Hai Su, Jie-Ying Lee, Shr-Ruei Tsai, Chin-Yang Lin, Kuan-Wen Chen, Tsung-Wei Ke, Yu-Lun Liu
https://arxiv.org/abs/2509.22653
Satellite: Detecting and Analyzing Smart Contract Vulnerabilities caused by Subcontract Misuse
Zeqin Liao, Yuhong Nan, Zixu Gao, Henglong Liang, Sicheng Hao, Jiajing Wu, Zibin Zheng
https://arxiv.org/abs/2509.23679
Stochastic Self-Organization in Multi-Agent Systems
Nurbek Tastan, Samuel Horvath, Karthik Nandakumar
https://arxiv.org/abs/2510.00685 https://arxiv.org/pd…
Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation
Ruoyu Chen, Xiaoqing Guo, Kangwei Liu, Siyuan Liang, Shiming Liu, Qunli Zhang, Hua Zhang, Xiaochun Cao
https://arxiv.org/abs/2509.22496
FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline
Haotian Wu, Shufan Jiang, Chios Chen, Yiyang Feng, Hehai Lin, Heqing Zou, Yao Shu, Yanran Li, Chengwei Qin
https://arxiv.org/abs/2510.06800
Go With The Flow: Churn-Tolerant Decentralized Training of Large Language Models
Nikolay Blagoev, Bart Cox, J\'er\'emie Decouchant, Lydia Y. Chen
https://arxiv.org/abs/2509.21221
Unsupervised Learning and Representation of Mandarin Tonal Categories by a Generative CNN
Kai Schenck, Ga\v{s}per Begu\v{s}
https://arxiv.org/abs/2509.17859 https://
POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency
Ashim Dahal, Ankit Ghimire, Saydul Akbar Murad, Nick Rahimi
https://arxiv.org/abs/2510.01009
Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs
Ziliang Wang, Kang An, Xuhui Zheng, Faqiang Qian, Weikun Zhang, Cijun Ouyang, Jialu Cai, Yuhang Wang, Yichao Wu
https://arxiv.org/abs/2510.00861
Robust Vision-Language Models via Tensor Decomposition: A Defense Against Adversarial Attacks
Het Patel, Muzammil Allie, Qian Zhang, Jia Chen, Evangelos E. Papalexakis
https://arxiv.org/abs/2509.16163 …
Synthetic Dialogue Generation for Interactive Conversational Elicitation & Recommendation (ICER)
Moonkyung Ryu, Chih-Wei Hsu, Yinlam Chow, Mohammad Ghavamzadeh, Craig Boutilier
https://arxiv.org/abs/2510.02331
WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification
Yiwen Jiang, Deval Mehta, Siyuan Yan, Yaling Shen, Zimu Wang, Zongyuan Ge
https://arxiv.org/abs/2509.17740
Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment
Ahmed Karim (Judy), Qiao Wang (Judy), Zheng Yuan
https://arxiv.org/abs/2509.15926 https://