
2025-08-07 12:47:55
"What works in #India will scale better everywhere else. Naturally, the country is a battleground for #AI search."
This is fishing for data and the next #enshittification to roll ou…
So I've found my answer after maybe ~30 minutes of effort. First stop was the first search result on Startpage (https://millennialhawk.com/does-poop-have-calories/), which has some evidence of maybe-AI authorship but which is better than a lot of slop. It actually has real links & cites research, so I'll start by looking at the sources.
It claims near the top that poop contains 4.91 kcal per gram (note: 1 kcal = 1 Calorie = 1000 calories, which fact I could find/do trust despite the slop in that search). Now obviously, without a range or mention of an average, this isn't the whole picture, but maybe it's an average to start from? However, the citation link is to a study (https://pubmed.ncbi.nlm.nih.gov/32235930/) which only included 27 people with impaired glucose tolerance and obesity. Might have the cited stat, but it's definitely not a broadly representative one if this is the source. The public abstract does not include the stat cited, and I don't want to pay for the article. I happen to be affiliated with a university library, so I could see if I have access that way, but it's a pain to do and not worth it for this study that I know is too specific. Also most people wouldn't have access that way.
Side note: this doing-the-research project has the nice benefit of letting you see lots of cool stuff you wouldn't have otherwise. The abstract of this study is pretty cool and I learned a bit about gut microbiome changes from just reading the abstract.
My next move was to look among citations in this article to see if I could find something about calorie content of poop specifically. Luckily the article page had indicators for which citations were free to access. I ended up reading/skimming 2 more articles (a few more interesting facts about gut microbiomes were learned) before finding this article whose introduction has what I'm looking for: https://pmc.ncbi.nlm.nih.gov/articles/PMC3127503/
Here's the relevant paragraph:
"""
The alteration of the energy-balance equation, which is defined by the equilibrium of energy intake and energy expenditure (1–5), leads to weight gain. One less-extensively-studied component of the energy-balance equation is energy loss in stools and urine. Previous studies of healthy adults showed that ≈5% of ingested calories were lost in stools and urine (6). Individuals who consume high-fiber diets exhibit a higher fecal energy loss than individuals who consume low-fiber diets with an equivalent energy content (7, 8). Webb and Annis (9) studied stool energy loss in 4 lean and 4 obese individuals and showed a tendency to lower the fecal energy excretion in obese compared with lean study participants.
"""
And there's a good-enough answer if we do some math, along with links to more in-depth reading if we want them. A Mayo Clinic calorie calculator suggests about 2250 Calories per day for me to maintain my weight. I think there's probably a lot of variation in that number, but 5% of that would be very roughly 100 Calories lost in poop per day, so maybe an extremely rough estimate for a range of humans might be 50-200 Calories per day. Interestingly, one of the AI slop pages I found asserted (without citation) 100-200 Calories per day, which kinda checks out. I had no way to trust that number though, and as we saw with the 4.91 kcal/gram figure, its provenance might not be good.
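The back-of-the-envelope math above can be written out as a minimal sketch. The 5% figure and the 2250 Calories/day come from the text; the 1000 and 4000 kcal/day intakes are assumed endpoints picked only to show where a 50-200 range could come from, and the function name is made up for illustration. Note that the quoted 5% covers stool and urine together, so this is an upper-ish estimate for poop alone.

```python
# Rough estimate only: fraction_lost=0.05 is the "≈5% of ingested calories"
# figure from the quoted paragraph (stool AND urine combined).
def energy_lost_kcal(intake_kcal: float, fraction_lost: float = 0.05) -> float:
    """Return kcal/day lost, given daily intake and a loss fraction."""
    return intake_kcal * fraction_lost

# 2250 is the maintenance estimate from the text; 1000 and 4000 are
# hypothetical low/high intakes chosen for illustration.
for intake in (1000, 2250, 4000):
    lost = energy_lost_kcal(intake)
    print(f"{intake} kcal/day eaten -> ~{lost:.0f} kcal/day lost")
```

For the 2250 kcal/day case this gives about 112 kcal/day, which is where the "very roughly 100" comes from.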
To double-check, I visited this link from the paragraph above: https://www.sciencedirect.com/science/article/abs/pii/S0022316622169853?via=ihub
It's only a 6-person study, but just the abstract has numbers: ~250 kcal/day pooped on a low-fiber diet vs. ~400 kcal/day pooped on a high-fiber diet. That's with intakes of ~2100 and ~2350 kcal respectively, which is close to the number from which I estimated 100 kcal above, so maybe the first estimate from just the 5% number was a bit low.
Glad those numbers were in the abstract, since the full text is paywalled... It's possible this study was also done on some atypical patient group...
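To see how far the 6-person study sits from the 5% figure, here is a small sketch that turns the abstract's approximate numbers into fractions of intake. The numbers are the ones quoted above; the dictionary layout and function name are just assumptions for illustration.

```python
# Approximate figures quoted from the study abstract (kcal/day).
diets = {
    "low-fiber": {"intake_kcal": 2100, "fecal_kcal": 250},
    "high-fiber": {"intake_kcal": 2350, "fecal_kcal": 400},
}

def fecal_fraction(intake_kcal: float, fecal_kcal: float) -> float:
    """Fraction of ingested energy that ends up in stool."""
    return fecal_kcal / intake_kcal

for name, figures in diets.items():
    print(f"{name}: {fecal_fraction(**figures):.1%} of intake")
```

That works out to roughly 12% and 17% of intake, noticeably higher than 5%, which fits the suspicion that the first estimate was low.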
Just to come full circle, let's look at that 4.91 kcal/gram number again. A search suggests 14-16 ounces of poop per day is typical, with at least two sources around 14 ounces, or ~400 grams. (AI slop was strong here too, with one including a completely made up table of "studies" that was summarized as 100-200 grams/day). If we believe 400 grams/day of poop, then 4.91 kcal/gram would be almost 2000 kcal/day, which is very clearly ludicrous! So that number was likely some unrelated statistic regurgitated by the AI. I found that number in at least 3 of the slop pages I waded through in my initial search.
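The sanity check on the 4.91 kcal/gram number is simple enough to spell out. This is a sketch using the 14 ounces/day figure from the search results above and a rounded ounce-to-gram conversion; the function name is made up for illustration.

```python
GRAMS_PER_OUNCE = 28.35  # avoirdupois ounce, rounded

def implied_kcal_per_day(mass_oz: float, kcal_per_gram: float) -> float:
    """Daily energy implied by a daily stool mass and an energy density."""
    return mass_oz * GRAMS_PER_OUNCE * kcal_per_gram

daily = implied_kcal_per_day(14, 4.91)
print(f"~{daily:.0f} kcal/day")  # compare with a ~2250 kcal/day intake
```

The result lands just under 2000 kcal/day, i.e. most of a typical day's intake, which is why the number can't be right as a per-gram figure for that mass.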
Search for the Efimov state near the 3$\alpha$ threshold
A. Baishya, S. Santra, T. Singh, P. C. Rout, A. Pal, H. Kumawat, T. Santhosh, P. Taya, M. Meher, Jyotisankar Das
https://arxiv.org/abs/2507.03514
Fediscovery explores the possibilities for better search and discovery on the Fediverse -- in the form of an optional, pluggable service.
This service should be decentralized, independent of any one specific Fediverse service and respect user choice and privacy.
Fediscovery @ FOSDEM 2025
We were very grateful to be able to present the Fediscovery project at FOSDEM 2025. The talk was recorded and you can download both the slides and the recorded video.
Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries
Shubham Kumar Nigam, Tanmay Dubey, Noel Shallum, Arnab Bhattacharya
https://arxiv.org/abs/2508.00679
Not only has Gmail been infested with bad AI, but so have other Google products. Chrome has a "help me write" feature. You can [disable it in the AI innovations settings](chrome://settings/ai). Unfortunately the intrusive popup is still showing up in Gmail and I can't find any way to turn it off. (See image)
There's also a Chrome setting that allows better browser history search. I've wanted that for years so I've turned that one on for now. Maybe it's better than all the other Gemini crap?
I couldn't come up with a better way than this to minimize a Chrome app on autostart (KDE, Wayland, clean session)
(avoids setting a KDE window rule, which makes opening the app mid-session awkward)
google-chrome [arguments] & sleep 2; kdotool search --class [window class] windowminimize &
Series C, Episode 04 - Dawn of the Gods
GROFF: A finger?
TARRANT: A finger. And as you can see, it is better designed for pressing buttons than holding writing implements. So why can't we use computers?
https://blake.torpidity.net/m/304/299 B7B2
On a better complexity upper bound of Ward-Szabo theorem
Takashi Ishizuka
https://arxiv.org/abs/2507.23345 https://arxiv.org/pdf/2507.23345
Should we teach vibe coding? Here's why not.
Should AI coding be taught in undergrad CS education?
1/2
I teach undergraduate computer science labs, including for intro and more-advanced core courses. I don't publish (non-negligible) scholarly work in the area, but I've got years of craft expertise in course design, and I do follow the academic literature to some degree. In other words, I'm not the world's leading expert, but I have spent a lot of time thinking about course design, and consider myself competent at it, with plenty of direct experience in what knowledge & skills I can expect from students as they move through the curriculum.
I'm also strongly against most uses of what's called "AI" these days (specifically, generative deep neural networks as supplied by our current cadre of techbros). There are a surprising number of completely orthogonal reasons to oppose the use of these systems, and a very limited number of reasonable exceptions (overcoming accessibility barriers is an example). On the grounds of environmental and digital-commons-pollution costs alone, using specifically the largest/newest models is unethical in most cases.
But as any good teacher should, I constantly question these evaluations, because I worry about the impact on my students should I eschew teaching relevant tech for bad reasons (and even for good reasons). I also want to make my reasoning clear to students, who should absolutely question me on this. That inspired me to ask a simple question: ignoring for one moment the ethical objections (which we shouldn't, of course; they're very stark), at what level in the CS major could I expect to teach a course about programming with AI assistance, and expect students to succeed at a more technically demanding final project than a course at the same level where students were banned from using AI? In other words, at what level would I expect students to actually benefit from AI coding "assistance?"
To be clear, I'm assuming that students aren't using AI in other aspects of coursework: the topic of using AI to "help you study" is a separate one (TL;DR its gross value is not negative, but it's mostly not worth the harm to your metacognitive abilities, which AI-induced changes to the digital commons are making more important than ever).
So what's my answer to this question?
If I'm being incredibly optimistic, senior year. Slightly less optimistic, second year of a masters program. Realistic? Maybe never.
The interesting bit for you-the-reader is: why is this my answer? (Especially given that students would probably self-report significant gains at lower levels.) To start with, [this paper where experienced developers thought that AI assistance sped up their work on real tasks when in fact it slowed it down] (https://arxiv.org/abs/2507.09089) is informative. There are a lot of differences in task between experienced devs solving real bugs and students working on a class project, but it's important to understand that we shouldn't have a baseline expectation that AI coding "assistants" will speed things up in the best of circumstances, and we shouldn't trust self-reports of productivity (or the AI hype machine in general).
Now we might imagine that coding assistants will be better at helping with a student project than at helping with fixing bugs in open-source software, since it's a much easier task. For many programming assignments that have a fixed answer, we know that many AI assistants can just spit out a solution based on prompting them with the problem description (there's another elephant in the room here to do with learning outcomes regardless of project success, but we'll ignore this one too; my focus here is on project complexity reach, not learning outcomes). My question is about more open-ended projects, not assignments with an expected answer. Here's a second study (by one of my colleagues) about novices using AI assistance for programming tasks. It showcases how difficult it is to use AI tools well, and some of the stumbling blocks that novices in particular face.
But what about intermediate students? Might there be some level where the AI is helpful because the task is still relatively simple and the students are good enough to handle it? The problem with this is that as task complexity increases, so does the likelihood of the AI generating (or copying) code that uses more complex constructs which a student doesn't understand. Let's say I have second year students writing interactive websites with JavaScript. Without a lot of care that those students don't know how to deploy, the AI is likely to suggest code that depends on several different frameworks, from React to jQuery, without actually setting up or including those frameworks, and of course those students would be way out of their depth trying to do that. This is a general problem: each programming class carefully limits the specific code frameworks and constructs it expects students to know based on the material it covers. There is no feasible way to limit an AI assistant to a fixed set of constructs or frameworks, using current designs. There are alternate designs where this would be possible (like AI search through adaptation from a controlled library of snippets) but those would be entirely different tools.
So what happens on a sizeable class project where the AI has dropped in buggy code, especially if it uses code constructs the students don't understand? Best case, they understand that they don't understand and re-prompt, or ask for help from an instructor or TA quickly who helps them get rid of the stuff they don't understand and re-prompt or manually add stuff they do. Average case: they waste several hours and/or sweep the bugs partly under the rug, resulting in a project with significant defects. Students in their second and even third years of a CS major still have a lot to learn about debugging, and usually have significant gaps in their knowledge of even their most comfortable programming language. I do think regardless of AI we as teachers need to get better at teaching debugging skills, but the knowledge gaps are inevitable because there's just too much to know. In Python, for example, the LLM is going to spit out yields, async functions, try/finally, maybe even something like a while/else, or with recent training data, the walrus operator. I can't expect even a fraction of 3rd year students who have worked with Python since their first year to know about all these things, and based on how students approach projects where they have studied all the relevant constructs but have forgotten some, I'm not optimistic that seeing these things will magically become learning opportunities. Student projects are better off working with a limited subset of full programming languages that the students have actually learned, and using AI coding assistants as currently designed makes this impossible. Beyond that, even when the "assistant" just introduces bugs using syntax the students understand, even through their 4th year many students struggle to understand the operation of moderately complex code they've written themselves, let alone written by someone else. Having access to an AI that will confidently offer incorrect explanations for bugs will make this worse.
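For readers who haven't met the Python constructs named above, here is a minimal sketch of each one. The functions are toy examples invented for illustration (they are not taken from any student project or LLM output), and the walrus operator needs Python 3.8 or newer.

```python
import asyncio

def countdown(n):
    """Generator: `yield` hands back one value at a time, then pauses."""
    while n > 0:
        yield n
        n -= 1

def first_even(numbers):
    """while/else: the else branch runs only if the loop ends without `break`."""
    it = iter(numbers)
    # Walrus operator (:=) assigns and tests in a single expression.
    while (value := next(it, None)) is not None:
        if value % 2 == 0:
            result = value
            break
    else:
        result = "no even number"
    return result

def work_with_cleanup(log):
    """try/finally: the finally block runs even though `return` comes first."""
    try:
        log.append("working")
        return "done"
    finally:
        log.append("cleanup")

async def fetch_answer():
    """async function: calling it returns a coroutine that must be awaited."""
    await asyncio.sleep(0)  # yields control to the event loop once
    return 42

print(list(countdown(3)))          # [3, 2, 1]
print(first_even([1, 3, 4]))       # 4
print(first_even([1, 3]))          # no even number
log = []
print(work_with_cleanup(log), log) # done ['working', 'cleanup']
print(asyncio.run(fetch_answer())) # 42
```

Each of these is only a few lines, but each one changes control flow in a way that isn't obvious from reading the code top to bottom, which is what makes them hard to debug when you haven't studied them.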
To be sure a small minority of students will be able to overcome these problems, but that minority is the group that has a good grasp of the fundamentals and has broadened their knowledge through self-study, which earlier AI-reliant classes would make less likely to happen. In any case, I care about the average student, since we already have plenty of stuff about our institutions that makes life easier for a favored few while being worse for the average student (note that our construction of that favored few as the "good" students is a large part of this problem).
To summarize: because AI assistants introduce excess code complexity and difficult-to-debug bugs, they'll slow down rather than speed up project progress for the average student on moderately complex projects. On a fixed deadline, they'll result in worse projects, or necessitate less ambitious project scoping to ensure adequate completion, and I expect this remains broadly true through 4-6 years of study in most programs (don't take this as an endorsement of AI "assistants" for masters students; we've ignored a lot of other problems along the way).
There's a related problem: solving open-ended project assignments well ultimately depends on deeply understanding the problem, and AI "assistants" allow students to put a lot of code in their file without spending much time thinking about the problem or building an understanding of it. This is awful for learning outcomes, but also bad for project success. Getting students to see the value of thinking deeply about a problem is a thorny pedagogical puzzle at the best of times, and allowing the use of AI "assistants" makes the problem much much worse. This is another area I hope to see (or even drive) pedagogical improvement in, for what it's worth.
More Expert-like Eye Gaze Movement Patterns are Related to Better X-ray Reading
Pingjing Yang, Jennifer Cromley, Jana Diesner
https://arxiv.org/abs/2507.18637 https://
GREAT: Guiding Query Generation with a Trie for Recommending Related Search about Video at Kuaishou
Ninglu Shao, Jinshan Wang, Chenxu Wang, Qingbiao Li, Xiaoxue Zang, Han Li
https://arxiv.org/abs/2507.15267
Discovering the underlying analytic structure within Standard Model constants using artificial intelligence
S. V. Chekanov, H. Kjellerstrand
https://arxiv.org/abs/2507.00225
Abandoned by Trump, a farmer and a migrant search for a better future (Washington Post)
https://www.washingtonpost.com/investigations/interactive/2025/trump-farmers-grants-freeze-usda-migrants
http://www.memeorandum.com/250621/p27#a250621p27
This is an excellent article that touches on many things that are happening right now in rural America.
Well reported and well written.
It's long but you won't be able to stop reading.
https://wapo.st/409sf4c
(Gift link)
I tried DuckDuckGo again these days for no special reason. To my surprise, the search experience there is much better than Google's. Not only is there less crap on the page, the results also seemed more relevant. It feels as good as old Google.
DuckDuckGo is now the default search engine in my personal browsers.
We learned how to search instead of memorizing everything. And it’s the same with AI today - there’s no need to know everything by heart when AI can handle that for us.
AI is here to stay, and even if it evolves or gets replaced by something better, the transformation never stops. It’s perfectly fine to not have all the answers in your head because we can focus on more important things.
Something appears to have changed at ChatGPT to make deep research better. I ran it yesterday and one search took three hours! It also prompted me with a series of multiple questions first, not just the usual four.
Join us for our Get-Together right after today's last session at 18:00 CEST!
Head over to the partner area, where you can unwind with tasty snacks and refreshing drinks while enjoying great live music, all sponsored by Search Guard. What better way to end the first day of the conference than by catching up with old friends and making new connections?
#bbuzz
AutoScale: Linear Scalarization Guided by Multi-Task Optimization Metrics
Yi Yang, Kei Ikemura, Qingwen Zhang, Xiaomeng Zhu, Ci Li, Nazre Batool, Sina Sharif Mansouri, John Folkesson
https://arxiv.org/abs/2508.13979
Hybrid Classical-Quantum Rainbow Table Attack on Human Passwords
MA. Khajeian
https://arxiv.org/abs/2507.14600 https://arxiv.org/pdf/…
Accounting for shelf width in selecting altimetry observations for coastal sea level variability improves its agreement with tide gauges
Vandana Sukumaran, Bramha Dutt Vishwakarma
https://arxiv.org/abs/2508.20046
#Blakes7 Series B, Episode 03 - Weapon
COSER: I should have known better. A labor-grade slave. You're pathetic.
[Reception chamber]
SERVALAN: Travis, you are pathetic.
TRAVIS: If you say so.
SERVALAN: Of all the cripple-brained idiots.
Boosting Rust Unit Test Coverage through Hybrid Program Analysis and Large Language Models
Bei Chu, Yang Feng, Kui Liu, Hange Shi, Zifan Nan, Zhaoqiang Guo, Baowen Xu
https://arxiv.org/abs/2506.09002
Series A, Episode 08 - Duel
BLAKE: Seen any sign of Travis?
JENNA: Have you?
BLAKE: No. We'd better make ourselves some weapons.
[LATER - Blake is sharpening two stakes into spears with his machete]
https://blake.torpidity.net/m/108/321 B7B5
Energy as a Primitive Ontology for the Physical World
J. E. Horvath, B. B. Martins
https://arxiv.org/abs/2506.12692 https://arxiv.org…
Tree-Structured Parzen Estimator Can Solve Black-Box Combinatorial Optimization More Efficiently
Kenshin Abe, Yunzhuo Wang, Shuhei Watanabe
https://arxiv.org/abs/2507.08053 https://arxiv.org/pdf/2507.08053 https://arxiv.org/html/2507.08053
arXiv:2507.08053v1 Announce Type: new
Abstract: Tree-structured Parzen estimator (TPE) is a versatile hyperparameter optimization (HPO) method supported by popular HPO tools. Since these HPO tools have been developed in line with the trend of deep learning (DL), the problem setups often used in the DL domain have been discussed for TPE such as multi-objective optimization and multi-fidelity optimization. However, the practical applications of HPO are not limited to DL, and black-box combinatorial optimization is actively utilized in some domains, e.g., chemistry and biology. As combinatorial optimization has been an untouched, yet very important, topic in TPE, we propose an efficient combinatorial optimization algorithm for TPE. In this paper, we first generalize the categorical kernel with the numerical kernel in TPE, enabling us to introduce a distance structure to the categorical kernel. Then we discuss modifications for the newly developed kernel to handle a large combinatorial search space. These modifications reduce the time complexity of the kernel calculation with respect to the size of a combinatorial search space. In the experiments using synthetic problems, we verified that our proposed method identifies better solutions with fewer evaluations than the original TPE. Our algorithm is available in Optuna, an open-source framework for HPO.
toXiv_bot_toot
Unmasking real-world audio deepfakes: A data-centric approach
David Combei, Adriana Stan, Dan Oneata, Nicolas Müller, Horia Cucu
https://arxiv.org/abs/2506.09606
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Ziyue Li, Yang Li, Tianyi Zhou
https://arxiv.org/abs/2507.07996 https://arxiv.org/pdf/2507.07996 https://arxiv.org/html/2507.07996
arXiv:2507.07996v1 Announce Type: new
Abstract: Can a pretrained neural network adapt its architecture to different inputs without any finetuning? Do we need all layers for simple tasks, and are they adequate for challenging tasks? We found that the layers of a pretrained large language model (LLM) can be manipulated as separate modules to build a better and even shallower model customized for each test sample. In particular, each layer from the pretrained model can be skipped/pruned or repeated multiple times as recurrent neural networks (RNN), and stacked with others in arbitrary orders, yielding a chain-of-layers (CoLa) per sample. This compositional space greatly expands the scope of existing works on looped/recurrent pretrained modules, layer pruning, or early-exit networks. We develop a Monte Carlo Tree Search (MCTS) protocol to explore and identify the optimal CoLa for each sample from math and commonsense reasoning benchmarks. Compared to a static model of a fixed depth, CoLa allows shortcut paths (fast thinking), recurrence of the same layer(s) (slow thinking), and combining both, offering more flexible, dynamic architectures for different inputs. We conduct an extensive analysis of the MCTS-optimized CoLa, which leads to two key findings: (1) For >75% of samples with correct predictions by the original LLM, we can find shorter CoLa, suggesting a large space for improving inference efficiency; (2) For >60% of samples with originally incorrect predictions, we can identify CoLa achieving correct predictions, suggesting a large space of performance enhancement. Our results highlight the shortcomings of using a fixed architecture of pre-trained LLMs for inference on different samples and pave the way to unlock the generalization power of test-time depth adaptation.
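A toy illustration of the chain-of-layers idea described in the abstract, assuming layers can be treated as interchangeable functions. The three "layers" here are hypothetical arithmetic stand-ins, not transformer blocks, and the sketch does not implement the paper's MCTS search; it only shows what skipping and repeating layers means for a single input.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]

# Hypothetical stand-ins for pretrained layers (illustration only).
layers: List[Layer] = [
    lambda x: x + 1.0,  # "layer 0"
    lambda x: x * 2.0,  # "layer 1"
    lambda x: x - 3.0,  # "layer 2"
]

def run_chain(x: float, chain: Sequence[int]) -> float:
    """Apply layers in the order given by `chain` (indices may skip or repeat)."""
    for idx in chain:
        x = layers[idx](x)
    return x

static = run_chain(1.0, [0, 1, 2])     # fixed depth: every layer once
skipped = run_chain(1.0, [0, 2])       # layer 1 skipped (shorter chain)
looped = run_chain(1.0, [0, 1, 1, 2])  # layer 1 repeated (recurrence)
print(static, skipped, looped)         # 1.0 -1.0 5.0
```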