Tootfinder

Opt-in global Mastodon full text search. Join the index!

@fanf@mendeddrum.org
2025-06-09 20:42:03

from my link log —
Alan Kay did not invent object-oriented programming.
hillelwayne.com/post/alan-kay/
saved 2025-05-11

@arXiv_quantph_bot@mastoxiv.page
2025-06-10 18:10:10

This arxiv.org/abs/1809.01751 has been replaced.
link: scholar.google.com/scholar?q=a

@arXiv_csAI_bot@mastoxiv.page
2025-07-11 09:18:01

On Trustworthy Rule-Based Models and Explanations
Mohamed Siala, Jordi Planes, Joao Marques-Silva
arxiv.org/abs/2507.07576

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 16:56:19

This arxiv.org/abs/2412.06430 has been replaced.
initial toot: mastoxiv.page/@arXiv_csSE_…

@arXiv_csCR_bot@mastoxiv.page
2025-07-11 07:55:01

Disa: Accurate Learning-based Static Disassembly with Attentions
Peicheng Wang, Monika Santra, Mingyu Liu, Cong Sun, Dongrui Zeng, Gang Tan
arxiv.org/abs/2507.07246

@arXiv_csCE_bot@mastoxiv.page
2025-06-11 07:19:23

KP-PINNs: Kernel Packet Accelerated Physics Informed Neural Networks
Siyuan Yang, Cheng Song, Zhilu Lai, Wenjia Wang
arxiv.org/abs/2506.08563

@jake4480@c.im
2025-08-07 13:42:38

Around 20 years ago, I made my first and only change to something on Wikipedia. It was for an underground rap artist- I just made a correction or two to something that was obviously incorrect, and my information was correct, thinking nothing of it. I looked at it the next day, and a Wikipedia editor or mod or whatever changed it back. That was the last time I ever edited anything there. I still look up things on Wikipedia (with a grain of salt) but that experience really bugged me. The Wikip…

@arXiv_csLG_bot@mastoxiv.page
2025-07-11 10:23:51

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Ziyue Li, Yang Li, Tianyi Zhou
arxiv.org/abs/2507.07996 arxiv.org/pdf/2507.07996 arxiv.org/html/2507.07996
arXiv:2507.07996v1 Announce Type: new
Abstract: Can a pretrained neural network adapt its architecture to different inputs without any finetuning? Do we need all layers for simple tasks, and are they adequate for challenging tasks? We found that the layers of a pretrained large language model (LLM) can be manipulated as separate modules to build a better and even shallower model customized for each test sample. In particular, each layer from the pretrained model can be skipped/pruned or repeated multiple times as recurrent neural networks (RNN), and stacked with others in arbitrary orders, yielding a chain-of-layers (CoLa) per sample. This compositional space greatly expands the scope of existing works on looped/recurrent pretrained modules, layer pruning, or early-exit networks. We develop a Monte Carlo Tree Search (MCTS) protocol to explore and identify the optimal CoLa for each sample from math and commonsense reasoning benchmarks. Compared to a static model of a fixed depth, CoLa allows shortcut paths (fast thinking), recurrence of the same layer(s) (slow thinking), and combining both, offering more flexible, dynamic architectures for different inputs. We conduct an extensive analysis of the MCTS-optimized CoLa, which leads to two key findings: (1) For >75% of samples with correct predictions by the original LLM, we can find shorter CoLa, suggesting a large space for improving inference efficiency; (2) For >60% of samples with originally incorrect predictions, we can identify CoLa achieving correct predictions, suggesting a large space of performance enhancement. Our results highlight the shortcomings of using a fixed architecture of pre-trained LLMs for inference on different samples and pave the way to unlock the generalization power of test-time depth adaptation.

@arXiv_quantph_bot@mastoxiv.page
2025-06-10 18:43:30

This arxiv.org/abs/2411.04215 has been replaced.
initial toot: mastoxiv.page/@arXiv_qu…

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 10:44:13

Adversarial Attack Classification and Robustness Testing for Large Language Models for Code
Yang Liu, Armstrong Foundjem, Foutse Khomh, Heng Li
arxiv.org/abs/2506.07942

@Techmeme@techhub.social
2025-08-05 21:40:52

Wikipedia editors adopt a policy giving admins the authority to quickly delete AI-generated articles that meet certain criteria, like incorrect citations (Emanuel Maiberg/404 Media)
404media.co/wikipedia-editors-

@Mediagazer@mstdn.social
2025-08-06 08:01:26

Wikipedia editors adopt a policy giving admins the authority to quickly delete AI-generated articles that meet certain criteria, like incorrect citations (Emanuel Maiberg/404 Media)
404media.co/wikipedia-editors-

@arXiv_quantph_bot@mastoxiv.page
2025-07-10 09:05:01

Where Photons Have Been: Nowhere Without All Components of Their Wavefunctions
R. E. Kastner
arxiv.org/abs/2507.06362

@arXiv_csDS_bot@mastoxiv.page
2025-08-08 08:25:02

A Refutation of Elmasry's $\tilde{O}(m \sqrt{n})$-Time Algorithm for Single-Source Shortest Paths
Sunny Atalig, Marek Chrobak
arxiv.org/abs/2508.04872

@AdamCoffman@mathstodon.xyz
2025-06-05 20:37:49

As recently as last year I was recommending to my #Math students that if they were having trouble with an elementary algebra expression, they could try using the Google search bar to solve or simplify something. Maybe I should show them this instead.🤦

Google's AI displays multiple algebra errors and arrives at a clearly incorrect solution to a search bar query about solving a polynomial equation. (apparently my no-punching-down personal rule does not apply in this situation.)

@arXiv_mathPR_bot@mastoxiv.page
2025-07-08 11:20:30

A tightness criterion for fragmentations
Gabriel Berzunza Ojeda, Cecilia Holmgren, Svante Janson
arxiv.org/abs/2507.05102

@samir@functional.computer
2025-07-01 10:06:19

@… Incorrect, Legalodon is a giant shark with a bow who’s in love with a dwarf.

@LaChasseuse@mastodon.scot
2025-07-03 10:27:49

How to use the internet badly:
- I did a search on opening hours for the Scotrail ticket office in my town, got 6 different answers, 2 of them from Scotrail sources
- I tweeted to their twitter acct, telling them abt the variety of answers online, and could they please tell me the correct opening hours
- They tweeted back with what turned out to be incorrect information
- I thanked them and said they should contact the other travel sites and update their info /1

@tiotasram@kolektiva.social
2025-08-04 15:49:00

Should we teach vibe coding? Here's why not.
Should AI coding be taught in undergrad CS education?
1/2
I teach undergraduate computer science labs, including for intro and more-advanced core courses. I don't publish (non-negligible) scholarly work in the area, but I've got years of craft expertise in course design, and I do follow the academic literature to some degree. In other words, I'm not the world's leading expert, but I have spent a lot of time thinking about course design, and consider myself competent at it, with plenty of direct experience in what knowledge & skills I can expect from students as they move through the curriculum.
I'm also strongly against most uses of what's called "AI" these days (specifically, generative deep neural networks as supplied by our current cadre of techbros). There are a surprising number of completely orthogonal reasons to oppose the use of these systems, and a very limited number of reasonable exceptions (overcoming accessibility barriers is an example). On the grounds of environmental and digital-commons-pollution costs alone, using specifically the largest/newest models is unethical in most cases.
But as any good teacher should, I constantly question these evaluations, because I worry about the impact on my students should I eschew teaching relevant tech for bad reasons (and even for good reasons). I also want to make my reasoning clear to students, who should absolutely question me on this. That inspired me to ask a simple question: ignoring for one moment the ethical objections (which we shouldn't, of course; they're very stark), at what level in the CS major could I expect to teach a course about programming with AI assistance, and expect students to succeed at a more technically demanding final project than a course at the same level where students were banned from using AI? In other words, at what level would I expect students to actually benefit from AI coding "assistance?"
To be clear, I'm assuming that students aren't using AI in other aspects of coursework: the topic of using AI to "help you study" is a separate one (TL;DR its gross value is not negative, but it's mostly not worth the harm to your metacognitive abilities, which AI-induced changes to the digital commons are making more important than ever).
So what's my answer to this question?
If I'm being incredibly optimistic, senior year. Slightly less optimistic, second year of a masters program. Realistic? Maybe never.
The interesting bit for you-the-reader is: why is this my answer? (Especially given that students would probably self-report significant gains at lower levels.) To start with, [this paper where experienced developers thought that AI assistance sped up their work on real tasks when in fact it slowed it down] (arxiv.org/abs/2507.09089) is informative. There are a lot of differences in task between experienced devs solving real bugs and students working on a class project, but it's important to understand that we shouldn't have a baseline expectation that AI coding "assistants" will speed things up in the best of circumstances, and we shouldn't trust self-reports of productivity (or the AI hype machine in general).
Now we might imagine that coding assistants will be better at helping with a student project than at helping with fixing bugs in open-source software, since it's a much easier task. For many programming assignments that have a fixed answer, we know that many AI assistants can just spit out a solution based on prompting them with the problem description (there's another elephant in the room here to do with learning outcomes regardless of project success, but we'll ignore this one too; my focus here is on project complexity reach, not learning outcomes). My question is about more open-ended projects, not assignments with an expected answer. Here's a second study (by one of my colleagues) about novices using AI assistance for programming tasks. It showcases how difficult it is to use AI tools well, and some of the stumbling blocks that novices in particular face.
But what about intermediate students? Might there be some level where the AI is helpful because the task is still relatively simple and the students are good enough to handle it? The problem with this is that as task complexity increases, so does the likelihood of the AI generating (or copying) code that uses more complex constructs which a student doesn't understand. Let's say I have second year students writing interactive websites with JavaScript. Without a lot of care that those students don't know how to deploy, the AI is likely to suggest code that depends on several different frameworks, from React to JQuery, without actually setting up or including those frameworks, and of course these students would be way out of their depth trying to do that. This is a general problem: each programming class carefully limits the specific code frameworks and constructs it expects students to know based on the material it covers. There is no feasible way to limit an AI assistant to a fixed set of constructs or frameworks, using current designs. There are alternate designs where this would be possible (like AI search through adaptation from a controlled library of snippets) but those would be entirely different tools.
So what happens on a sizeable class project where the AI has dropped in buggy code, especially if it uses code constructs the students don't understand? Best case, they understand that they don't understand and re-prompt, or ask for help from an instructor or TA quickly who helps them get rid of the stuff they don't understand and re-prompt or manually add stuff they do. Average case: they waste several hours and/or sweep the bugs partly under the rug, resulting in a project with significant defects. Students in their second and even third years of a CS major still have a lot to learn about debugging, and usually have significant gaps in their knowledge of even their most comfortable programming language. I do think regardless of AI we as teachers need to get better at teaching debugging skills, but the knowledge gaps are inevitable because there's just too much to know. In Python, for example, the LLM is going to spit out yields, async functions, try/finally, maybe even something like a while/else, or with recent training data, the walrus operator. I can't expect even a fraction of 3rd year students who have worked with Python since their first year to know about all these things, and based on how students approach projects where they have studied all the relevant constructs but have forgotten some, I'm not optimistic that seeing these things will magically become learning opportunities. Student projects are better off working with a limited subset of full programming languages that the students have actually learned, and using AI coding assistants as currently designed makes this impossible. Beyond that, even when the "assistant" just introduces bugs using syntax the students understand, even through their 4th year many students struggle to understand the operation of moderately complex code they've written themselves, let alone code written by someone else. Having access to an AI that will confidently offer incorrect explanations for bugs will make this worse.
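For readers who have not met the Python constructs named above, here is a minimal illustrative sketch. It is not from the original post: the function names and inputs are hypothetical, chosen only to show a generator with yield, the walrus operator, while/else, try/finally, and an async function in one small file.

```python
import asyncio


def read_chunks(items, size):
    """Generator: hands back one chunk at a time via yield."""
    start = 0
    # Walrus operator (Python 3.8+): assign the slice and test it in one step.
    while (chunk := items[start:start + size]):
        yield chunk
        start += size


def first_negative(numbers):
    """while/else: the else branch runs only if the loop never hit break."""
    i = 0
    while i < len(numbers):
        if numbers[i] < 0:
            result = numbers[i]
            break
        i += 1
    else:
        result = None
    return result


def guarded_divide(a, b, log):
    """try/finally: the finally block runs even if the division raises."""
    try:
        return a / b
    finally:
        log.append("done")


async def double_later(x):
    """async function: calling it creates a coroutine that must be awaited."""
    await asyncio.sleep(0)
    return 2 * x
```

Each function is only a few lines long, yet each depends on control flow that a student who has only seen plain loops and return statements would have to look up before they could debug it.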
To be sure, a small minority of students will be able to overcome these problems, but that minority is the group that has a good grasp of the fundamentals and has broadened their knowledge through self-study, which earlier AI-reliant classes would make less likely to happen. In any case, I care about the average student, since we already have plenty of stuff about our institutions that makes life easier for a favored few while being worse for the average student (note that our construction of that favored few as the "good" students is a large part of this problem).
To summarize: because AI assistants introduce excess code complexity and difficult-to-debug bugs, they'll slow down rather than speed up project progress for the average student on moderately complex projects. On a fixed deadline, they'll result in worse projects, or necessitate less ambitious project scoping to ensure adequate completion, and I expect this remains broadly true through 4-6 years of study in most programs (don't take this as an endorsement of AI "assistants" for masters students; we've ignored a lot of other problems along the way).
There's a related problem: solving open-ended project assignments well ultimately depends on deeply understanding the problem, and AI "assistants" allow students to put a lot of code in their file without spending much time thinking about the problem or building an understanding of it. This is awful for learning outcomes, but also bad for project success. Getting students to see the value of thinking deeply about a problem is a thorny pedagogical puzzle at the best of times, and allowing the use of AI "assistants" makes the problem much much worse. This is another area I hope to see (or even drive) pedagogical improvement in, for what it's worth.
1/2

@Mediagazer@mstdn.social
2025-06-04 11:30:50

The BBC rejects a White House claim that it took down a Gaza story, after Press Secretary Karoline Leavitt accused the BBC of taking "the word of Hamas" (Barbara Tasch/BBC)
bbc.com/news/articles/ce814ez7

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:19:57

Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs
Jiandong Shao, Yao Lu, Jianfei Yang
arxiv.org/abs/2506.01734

@arXiv_csDC_bot@mastoxiv.page
2025-06-05 09:37:42

This arxiv.org/abs/2412.09840 has been replaced.
initial toot: mastoxiv.page/@arXiv_csDC_…

@migueldeicaza@mastodon.social
2025-05-24 14:23:52

Dear Apple SwiftUI friends, this bug is killing the vibe of our users: FB16257334
Repro for those in the community:
github.com/feedback-assistant/

@kurtsh@mastodon.social
2025-05-29 00:22:20

Remember when Jimmy "The Greek" Snyder made those arrogantly racist & fundamentally incorrect comments in the 80s?
He got FIRED from CBS for that.
Your move, #NewYorkTimes.
☑️ The New York Times Just Published Some Bizarre Race Science About Asian Women

@catsalad@infosec.exchange
2025-06-22 12:06:29

Advanced Purrsistent Threat (APT)

"My laptop is set up to take pictures after three incorrect password attempts"

[Photo of a kitty caught trying to log in to the human's laptop. Usually they don't get caught, but even the best hacker cats make mistakes from time to time.]

@arXiv_csRO_bot@mastoxiv.page
2025-06-03 07:50:52

Diffusion Models for Increasing Accuracy in Olfaction Sensors and Datasets
Kordel K. France, Ovidiu Daescu
arxiv.org/abs/2506.00455

@arXiv_csAR_bot@mastoxiv.page
2025-08-05 07:32:59

Silent Data Corruption by 10x Test Escapes Threatens Reliable Computing
Subhasish Mitra, Subho Banerjee, Martin Dixon, Rama Govindaraju, Peter Hochschild, Eric X. Liu, Bharath Parthasarathy, Parthasarathy Ranganathan
arxiv.org/abs/2508.01786

@arXiv_csPL_bot@mastoxiv.page
2025-07-02 08:05:00

Estimating Correctness Without Oracles in LLM-Based Code Generation
Thomas Valentin, Ardi Madadi, Gaetano Sapia, Marcel Böhme
arxiv.org/abs/2507.00057

@arXiv_statCO_bot@mastoxiv.page
2025-07-09 08:19:22

hassediagrams: an R package that generates the Hasse diagram of the layout structure and the restricted layout structure
Damianos Michaelides, Simon T. Bate, Marion J. Chatfield
arxiv.org/abs/2507.05949

@arXiv_hepph_bot@mastoxiv.page
2025-06-03 18:06:20

This arxiv.org/abs/2505.20982 has been replaced.
initial toot: mastoxiv.page/@arXiv_hepp…

@arXiv_csCY_bot@mastoxiv.page
2025-06-03 07:18:53

Distinguishing Fact from Fiction: Student Traits, Attitudes, and AI Hallucination Detection in Business School Assessment
Canh Thien Dang, An Nguyen
arxiv.org/abs/2506.00050

@arXiv_csHC_bot@mastoxiv.page
2025-06-02 09:58:02

This arxiv.org/abs/2503.00303 has been replaced.
initial toot: mastoxiv.page/@arXiv_csHC_…

@arXiv_astrophHE_bot@mastoxiv.page
2025-07-04 09:47:11

Spherically Symmetric Accretion with Self-Gravity: Analytical Formulae and Numerical Validation
Cheng-Liang Jiao, Er-gang Zhao, Liying Zhu, Xiang-dong Shi
arxiv.org/abs/2507.02621

@arXiv_csDS_bot@mastoxiv.page
2025-08-07 09:28:34

Approximation Algorithms for Scheduling Crowdsourcing Tasks in Mobile Social Networks
Chi-Yeh Chen
arxiv.org/abs/2508.04159 arxiv.org/pdf/2…

@arXiv_csNI_bot@mastoxiv.page
2025-07-18 08:48:12

Bidirectional Age of Incorrect Information: A Performance Metric for Status Updates in Virtual Dynamic Environments
Chiara Schiavo, Manuele Favero, Alessandro Buratto, Leonardo Badia
arxiv.org/abs/2507.13312

@arXiv_statME_bot@mastoxiv.page
2025-07-03 08:36:10

Minority Representation in Network Rankings: Methods for Estimation, Testing, and Fairness
Hui Shen, Peter W. MacDonald, Eric D. Kolaczyk
arxiv.org/abs/2507.01136

@andycarolan@social.lol
2025-07-22 15:21:31

I'm seeing some really awful, low effort "may be..." ALT text recently. Clearly generated by an automatic process rather than by a human.
Is bad* alt text worse than no alt text?
*completely incorrect, and misleading
#Accessibility #a11y

@arXiv_mathCO_bot@mastoxiv.page
2025-07-01 10:27:33

Experimenting with Permutation Wordle
Aurora Hiveley
arxiv.org/abs/2506.23452 arxiv.org/pdf/2506.23452

@bici@mastodon.social
2025-06-23 22:12:33

"Deep down in his heart, I believe Trump knows he’s an incompetent a**hole."
Dear Mr King. You are incorrect about the Heart ❤️ part

@arXiv_statML_bot@mastoxiv.page
2025-06-30 08:58:10

Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction
Johan Hallberg Szabadváry, Tuwe Löfström, Ulf Johansson, Cecilia Sönströd, Ernst Ahlberg, Lars Carlsson
arxiv.org/abs/2506.21802

@gfriend@mas.to
2025-05-21 18:22:10

A strange political story that somehow seems strangely relevant today.
apple.news/AZzxjXuHrQNuJ_10aeK

@midtsveen@social.linux.pizza
2025-07-23 02:44:28

It is very funny when you get blocked for sharing a "Comparison of Android-based Operating Systems" that I didn't make, and if you think anything is factually incorrect with the comparison chart, you can contribute to it.
eylenburg.github.io/android_co

@mgorny@social.treehouse.systems
2025-07-24 03:59:47

#Python world be like:
"Oh, hi, we wrote a new library implementing this spec."
"Hey, it looks like it doesn't conform to the spec, it doesn't pass the examples from it."
"Oh, you're right, we'll fix it ASAP."
…and that was over 3 years ago.
And yet projects keep adding a dependency on this library which has a single "pre-alpha" release 3.5 years ago and whose very first bug report points out it's incorrect.

@Techmeme@techhub.social
2025-07-24 15:06:10

The EU says it will investigate whether KKR provided incorrect or misleading information in its €22B acquisition of Telecom Italia's fixed-line network (Foo Yun Chee/Reuters)
reuters.com/legal/litigation/e

@arXiv_csDB_bot@mastoxiv.page
2025-07-31 07:38:51

Scalability, Availability, Reproducibility and Extensibility in Islamic Database Systems
Umar Siddiqui, Habiba Youssef, Adel Sabour, Mohamed Ali
arxiv.org/abs/2507.22384

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:21:05

Self-ensemble: Mitigating Confidence Distortion for Large Language Models
Zicheng Xu, Guanchu Wang, Guangyao Zheng, Yu-Neng Chuang, Alexander Szalay, Xia Hu, Vladimir Braverman
arxiv.org/abs/2506.01951

@primonatura@mstdn.social
2025-06-16 11:00:37

"Trump says power plants don't add to air pollution. Climate scientists say it’s 'nonsensical'"
#US #USA #America

@arXiv_csCV_bot@mastoxiv.page
2025-07-25 10:21:02

SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim
arxiv.org/abs/2507.18616

@tml@urbanists.social
2025-06-13 18:26:38

A heads-up for #Interrail in Denmark: The Rail Planner app seems to have some incorrect data at the moment. The ICL train I just took between Kolding and Copenhagen is missing from the app. As always, trust bahn.de or the DB Navigator app more.

@arXiv_csSE_bot@mastoxiv.page
2025-06-04 13:37:45

This arxiv.org/abs/2410.05605 has been replaced.
initial toot: mastoxiv.page/@arXiv_csSE_…

@arXiv_csAI_bot@mastoxiv.page
2025-07-31 08:32:41

Beyond Accuracy: How AI Metacognitive Sensitivity improves AI-assisted Decision Making
ZhaoBin Li, Mark Steyvers
arxiv.org/abs/2507.22365 a…

@peter_mcmahan@mas.to
2025-05-14 17:27:35

I noticed that OpenStreetMap somehow lost Lake Michigan, so I went investigating. Apparently back in April somebody accidentally changed it from a "lake" to a "school" and it's taking months to be fixed across all regions/renderers.

Mobile browser screenshot of the northern Midwest United States from openstreetmap.org. Lake Michigan is conspicuously missing from the Great Lakes.

@arXiv_astrophIM_bot@mastoxiv.page
2025-06-27 09:25:09

The Rayleigh Criterion: Resolution Limits of Astronomical Periodograms
V. Ramirez Delgado, J. S. Caicedo Vivas, S. Dodson-Robinson, C. Haley
arxiv.org/abs/2506.20864

@arXiv_grqc_bot@mastoxiv.page
2025-06-23 10:03:30

Impact of Detector Calibration Accuracy on Black Hole Spectroscopy
Mallika R. Sinha, Ling Sun, Sizheng Ma
arxiv.org/abs/2506.15979

@arXiv_csIR_bot@mastoxiv.page
2025-07-24 08:20:59

R4ec: A Reasoning, Reflection, and Refinement Framework for Recommendation Systems
Hao Gu, Rui Zhong, Yu Xia, Wei Yang, Chi Lu, Peng Jiang, Kun Gai
arxiv.org/abs/2507.17249

@arXiv_csCG_bot@mastoxiv.page
2025-06-24 08:06:40

Optimal Parallel Algorithms for Convex Hulls in 2D and 3D under Noisy Primitive Operations
Michael T. Goodrich, Vinesh Sridhar
arxiv.org/abs/2506.17507

@chrislowles@mastodon.social
2025-06-18 10:01:19

You know it's bad when the DeArrow title for a video straight up has "Video has lots of incorrect information" on the end.

@arXiv_physicsgenph_bot@mastoxiv.page
2025-08-05 08:58:00

A conformal basis for cosmology with energy conservation
J. M. Greben
arxiv.org/abs/2508.00948 arxiv.org/pdf/2508.00948

@arXiv_eessIV_bot@mastoxiv.page
2025-06-19 08:42:47

Classification of Multi-Parametric Body MRI Series Using Deep Learning
Boah Kim, Tejas Sudharshan Mathai, Kimberly Helm, Peter A. Pinto, Ronald M. Summers
arxiv.org/abs/2506.15182

@samir@functional.computer
2025-06-14 19:13:47

@… Incorrect, we only think we are because most of us haven’t spent time in Switzerland.

@arXiv_csCL_bot@mastoxiv.page
2025-07-31 09:54:01

Investigating Hallucination in Conversations for Low Resource Languages
Amit Das, Md. Najib Hasan, Souvika Sarkar, Zheng Zhang, Fatemeh Jamshidi, Tathagata Bhattacharya, Nilanjana Raychawdhury, Dongji Feng, Vinija Jain, Aman Chadha
arxiv.org/abs/2507.22720

@Dwemthy@social.linux.pizza
2025-05-14 19:14:51

I get the desire for live coding interviews, you can't just take people's word that they know how to do it.
But what's the point in throwing Advent of Code style problems at me and interrupting a naive or incorrect approach before I even start implementing it? Let me write unoptimized code for you and then make it better! I'm not going to write the perfect implementation first try for every problem, but I can show you my process and prove I can write _some_ code.

@arXiv_csSE_bot@mastoxiv.page
2025-07-25 09:37:22

YATE: The Role of Test Repair in LLM-Based Unit Test Generation
Michael Konstantinou, Renzo Degiovanni, Jie M. Zhang, Mark Harman, Mike Papadakis
arxiv.org/abs/2507.18316

@arXiv_csCV_bot@mastoxiv.page
2025-06-18 09:36:53

ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
Yujun Wang, Jinhe Bi, Yunpu Ma, Soeren Pirk
arxiv.org/abs/2506.14766

@arXiv_physicsappph_bot@mastoxiv.page
2025-07-22 08:28:50

A rediscovery of stiff pentmodes. A comment on "High bulk modulus pentamodes: the three-dimensional metal water"
Graeme W. Milton
arxiv.org/abs/2507.15014

@arXiv_csAI_bot@mastoxiv.page
2025-06-24 11:33:30

Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?
Zhiting Mei, Christina Zhang, Tenny Yin, Justin Lidard, Ola Shorinwa, Anirudha Majumdar
arxiv.org/abs/2506.18183

@arXiv_csCL_bot@mastoxiv.page
2025-07-29 11:47:51

FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models
Likun Tan, Kuan-Wei Huang, Kevin Wu
arxiv.org/abs/2507.20930

@arXiv_csPL_bot@mastoxiv.page
2025-07-18 08:25:42

Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
Aaron Councilman, David Fu, Aryan Gupta, Chengxiao Wang, David Grove, Yu-Xiong Wang, Vikram Adve
arxiv.org/abs/2507.13290

@arXiv_csSE_bot@mastoxiv.page
2025-06-24 11:15:40

Is Your Automated Software Engineer Trustworthy?
Noble Saji Mathews, Meiyappan Nagappan
arxiv.org/abs/2506.17812 arxi…

@arXiv_csRO_bot@mastoxiv.page
2025-06-18 08:50:20

Hard Contacts with Soft Gradients: Refining Differentiable Simulators for Learning and Control
Anselm Paulus, A. René Geist, Pierre Schumacher, Vít Musil, Georg Martius
arxiv.org/abs/2506.14186

@arXiv_csLG_bot@mastoxiv.page
2025-06-12 10:00:21

On a few pitfalls in KL divergence gradient estimation for RL
Yunhao Tang, Rémi Munos
arxiv.org/abs/2506.09477

@arXiv_csCY_bot@mastoxiv.page
2025-06-18 08:12:18

Students' Reliance on AI in Higher Education: Identifying Contributing Factors
Griffin Pitts, Neha Rani, Weedguet Mildort, Eva-Marie Cook
arxiv.org/abs/2506.13845

@arXiv_csCR_bot@mastoxiv.page
2025-07-16 08:17:11

LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents
Zihe Yan, Zhuosheng Zhang
arxiv.org/abs/2507.10610

@arXiv_eessIV_bot@mastoxiv.page
2025-06-18 08:48:21

DREAM: On hallucinations in AI-generated content for nuclear medicine imaging
Menghua Xia, Reimund Bayerlein, Yanis Chemli, Xiaofeng Liu, Jinsong Ouyang, Georges El Fakhri, Ramsey D. Badawi, Quanzheng Li, Chi Liu
arxiv.org/abs/2506.13995

@arXiv_hepph_bot@mastoxiv.page
2025-07-22 11:07:10

Comments on Exploring Quantum Statistics for Dirac and Majorana Neutrinos using Spinor-Helicity technique (arXiv:2507.07180 [hep-ph])
C. S. Kim, M. V. N. Murthy, Dibyakrupa Sahoo
arxiv.org/abs/2507.14463

@arXiv_csCL_bot@mastoxiv.page
2025-07-29 07:43:51

Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning
Shengyuan Wang, Jie Feng, Tianhui Liu, Dan Pei, Yong Li
arxiv.org/abs/2507.19586

@arXiv_csDC_bot@mastoxiv.page
2025-06-12 07:26:41

TTrace: Lightweight Error Checking and Diagnosis for Distributed Training
Haitian Jiang, Shaowei Zhu, Zhen Zhang, Zhenyu Song, Xinwei Fu, Zhen Jia, Yida Wang, Jinyang Li
arxiv.org/abs/2506.09280

@arXiv_csIR_bot@mastoxiv.page
2025-07-15 09:10:12

Criteria-Based LLM Relevance Judgments
Naghmeh Farzi, Laura Dietz
arxiv.org/abs/2507.09488 arxiv.org/pdf/2507.09488…

@arXiv_csCV_bot@mastoxiv.page
2025-06-12 10:17:11

Text-Aware Image Restoration with Diffusion Models
Jaewon Min, Jin Hyeon Kim, Paul Hyunbin Cho, Jaeeun Lee, Jihye Park, Minkyu Park, Sangpil Kim, Hyunhee Park, Seungryong Kim
arxiv.org/abs/2506.09993

@arXiv_csCY_bot@mastoxiv.page
2025-06-18 08:13:00

The Ethics of Generative AI in Anonymous Spaces: A Case Study of 4chan's /pol/ Board
Parth Gaba, Emiliano De Cristofaro
arxiv.org/abs/2506.14191

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 09:02:50

COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees
Zhiyuan Wang, Jinhao Duan, Qingni Wang, Xiaofeng Zhu, Tianlong Chen, Xiaoshuang Shi, Kaidi Xu
arxiv.org/abs/2506.20178

@arXiv_statAP_bot@mastoxiv.page
2025-07-16 08:16:51

Is Your Model Risk ALARP? Evaluating Prospective Safety-Critical Applications of Complex Models
Domenic Di Francesco, Alan Forrest, Fiona McGarry, Nicholas Hall, Adam Sobey
arxiv.org/abs/2507.10817

@arXiv_csSE_bot@mastoxiv.page
2025-06-19 08:36:38

An Empirical Study of Bugs in Data Visualization Libraries
Weiqi Lu, Yongqiang Tian, Xiaohan Zhong, Haoyang Ma, Zhenyang Xu, Shing-Chi Cheung, Chengnian Sun
arxiv.org/abs/2506.15084

@arXiv_csCL_bot@mastoxiv.page
2025-07-14 09:50:42

The Curious Case of Factuality Finetuning: Models' Internal Beliefs Can Improve Factuality
Benjamin Newman, Abhilasha Ravichander, Jaehun Jung, Rui Xin, Hamish Ivison, Yegor Kuznetsov, Pang Wei Koh, Yejin Choi
arxiv.org/abs/2507.08371

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:59:49

Potemkin Understanding in Large Language Models
Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan
arxiv.org/abs/2506.21521 arxiv.org/pdf/2506.21521 arxiv.org/html/2506.21521
arXiv:2506.21521v1 Announce Type: new
Abstract: Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this raises an implication: these benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings. Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower-bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.