Let's Measure the Elephant in the Room: Facilitating Personalized Automated Analysis of Privacy Policies at Scale
Rui Zhao, Vladyslav Melnychuk, Jun Zhao, Jesse Wright, Nigel Shadbolt
https://arxiv.org/abs/2507.14214
AI, AGI, and learning efficiency
My 4-month-old kid is not DDoSing Wikipedia right now, nor will they ever do so before learning to speak, read, or write. Their entire "training corpus" will not top even 100 million "tokens" before they can speak & understand language, and do so with real intentionality.
Just to emphasize that point: 100 words-per-minute times 60 minutes-per-hour times 12 hours-per-day times 365 days-per-year times 4 years is a mere 105,120,000 words. That's a ludicrously *high* estimate of words-per-minute and hours-per-day, and 4 years old (the age of my other kid) is well after many children develop basic speech. More likely the available "training data" is at least 1 or 2 orders of magnitude less than this.
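For concreteness, here's that back-of-the-envelope estimate as a small Python sketch; the figures are just the same deliberately generous assumptions from above, not measurements of any actual child:

```python
# Upper-bound estimate of a child's language "training corpus" by age 4,
# using deliberately generous assumptions (100 wpm heard, 12 waking hours/day).
words_per_minute = 100
minutes_per_hour = 60
hours_per_day = 12
days_per_year = 365
years = 4

words_heard = (words_per_minute * minutes_per_hour *
               hours_per_day * days_per_year * years)
print(f"Upper-bound words heard by age {years}: {words_heard:,}")  # 105,120,000

# A more plausible figure is 1-2 orders of magnitude lower:
print(f"More realistic range: {words_heard // 100:,} to {words_heard // 10:,}")
```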
The point here is that large language models, trained as they are on hundreds of billions to *trillions* of tokens, are not developing their behavioral capabilities in a way that's remotely similar to humans, even if you believe those capabilities are similar (they are by certain very biased ways of measurement; they very much aren't by others). This idea that humans must be naturally good at acquiring language is an old one (see e.g.
#AI #LLM #AGI
ATLANTIS: AI-driven Threat Localization, Analysis, and Triage Intelligence System
Taesoo Kim, HyungSeok Han, Soyeon Park, Dae R. Jeong, Dohyeok Kim, Dongkwan Kim, Eunsoo Kim, Jiho Kim, Joshua Wang, Kangsu Kim, Sangwoo Ji, Woosun Song, Hanqing Zhao, Andrew Chin, Gyejin Lee, Kevin Stevens, Mansour Alharthi, Yizhuo Zhai, Cen Zhang, Joonun Jang, Yeongjin Jang, Ammar Askar, Dongju Kim, Fabian Fleischer, Jeongin Cho, Junsik Kim, Kyungjoon Ko, Insu Yun, Sangdon Park, Dowoo Baik, Haein Lee, Hy…
Developer-LLM Conversations: An Empirical Study of Interactions and Generated Code Quality
Suzhen Zhong, Ying Zou, Bram Adams
https://arxiv.org/abs/2509.10402
A Humanoid Social Robot as a Teaching Assistant in the Classroom
Thomas Sievers
https://arxiv.org/abs/2508.05646 https://arxiv.org/pdf/2508.05646
How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow
Jasmine Latendresse, SayedHassan Khatoonabadi, Emad Shihab
https://arxiv.org/abs/2507.10818
REMINDER: Working With Tainted Legacies (virtual NeMLA panel)
https://ift.tt/fVI7RB8
updated: Monday, August 11, 2025 - 7:48am
full name / name of organization: Northeast Modern Language…
via Input 4 RELCFP
VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving
Jiahuan Yu (University of Illinois Urbana-Champaign), Aryan Taneja (University of Illinois Urbana-Champaign), Junfeng Lin (Tsinghua University), Minjia Zhang (University of Illinois Urbana-Champaign)
https://arxiv.org/abs/2509.04827
InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching
Yilun Wang, Pengfei Chen, Haiyu Huang, Zilong He, Gou Tan, Chuanfu Zhang, Jingkai He, Zibin Zheng
https://arxiv.org/abs/2507.08523
MobileRAG: Enhancing Mobile Agent with Retrieval-Augmented Generation
Gowen Loo, Chang Liu, Qinghong Yin, Xiang Chen, Jiawei Chen, Jingyuan Zhang, Yu Tian
https://arxiv.org/abs/2509.03891