Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@tiotasram@kolektiva.social
2025-05-26 12:51:54

Let's say you find a really cool forum online that has lots of good advice on it. It's even got a very active community that's happy to answer questions very quickly, and the community seems to have a wealth of knowledge about all sorts of subjects.
You end up visiting this community often, and trusting its advice for all sorts of everyday questions, the kind you might previously have answered with a web search (of course web search is now full of SEO spam and other crap, so it's become nearly useless).
Then one day, you ask an innocuous question about medicine, and from this community you get the full homeopathy treatment as your answer. Like, somewhat believable on the face of it, with lots of citations to reasonable-seeming articles, except that if you know even a tiny bit about chemistry and biology (which thankfully you do), you know that the homeopathy answers are completely bogus and horribly dangerous (since they offer non-treatments for real diseases). Your opinion of this entire forum suddenly changes. "Oh my God, if they've been homeopathy believers all this time, what other myths have they fed me as facts?"
You stop using the forum for anything, and go back to slogging through SEO crap to answer your everyday questions, because once you realize that this forum is fundamentally untrustworthy, you realize that the value of getting advice from it on any subject is negative: you knew enough to spot the dangerous homeopathy answer, but there might be other such myths that you don't know enough to avoid, and any community willing to go all-in on one myth has shown itself capable of going all-in on any number of others.
...
This has been a parable about large language models.
#AI #LLM

@arXiv_csCV_bot@mastoxiv.page
2025-06-30 10:16:50

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao
arxiv.org/abs/2506.22434

@arXiv_csLO_bot@mastoxiv.page
2025-06-30 07:46:30

Negated String Containment is Decidable (Technical Report)
Vojtěch Havlena, Michal Hečko, Lukáš Holík, Ondřej Lengál
arxiv.org/abs/2506.22061

@Techmeme@techhub.social
2025-06-25 06:10:49

Daydream, which raised a $50M seed in June 2024 to build a generative AI shopping agent for fashion, launches in beta, with an app expected later this summer (Hilary Milnes/Vogue Business)
voguebusiness.com/story/techno

@arXiv_csAI_bot@mastoxiv.page
2025-06-24 10:50:20

Action Language BC
Joseph Babb, Joohyung Lee
arxiv.org/abs/2506.18044 arxiv.org/pdf/2506.18044

@ginevra@hachyderm.io
2025-06-20 00:35:29

Language learning has been part of me since high school. I'm solid in 2 non-English languages, crappy but survivable in 2 others. I've played with & started learning others many times.
I'm real busy rn, but language learning could be a fun thing to do for myself & make me feel like I'm still me.
But I'm stumped about my language picks. I learnt the obvious European languages in school; later tried key Asian languages. What do I want to do now?
African languages? I won't be getting a chance to use them much in Aus, & I'm unlikely to get to a stage where I can read literature.
I tried Slovenian/Slovene on a whim & really love it, but I'll never go there. Is the practical but unfun answer to grind out more kanji/hanzi? Or is whimsically learning a language spoken by only 2.5 million people reasonable? I will continue struggling through with Ukrainian, 'cause I think it's important.
#LanguageLearning

@arXiv_eessAS_bot@mastoxiv.page
2025-05-30 07:22:43

Spoken question answering for visual queries
Nimrod Shabtay, Zvi Kons, Avihu Dekel, Hagai Aronowitz, Ron Hoory, Assaf Arbelle
arxiv.org/abs/2505.23308

@arXiv_csRO_bot@mastoxiv.page
2025-06-26 09:45:30

HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction
Zhonghao Shi, Enyu Zhao, Nathaniel Dennler, Jingzhen Wang, Xinyang Xu, Kaleen Shrestha, Mengxue Fu, Daniel Seita, Maja Matarić
arxiv.org/abs/2506.20566

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:59:49

Potemkin Understanding in Large Language Models
Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan
arxiv.org/abs/2506.21521 arxiv.org/pdf/2506.21521 arxiv.org/html/2506.21521
arXiv:2506.21521v1 Announce Type: new
Abstract: Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this raises an implication: these benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings. Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower-bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.
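The abstract's general lower-bound procedure lends itself to a concrete illustration. Below is a minimal sketch of that incoherence-check idea, not the paper's actual protocol: have a model generate an instance of a concept, then ask the same model, in a fresh context, to grade its own output. The names (ask, potemkin_rate) and prompts are hypothetical placeholders; ask stands in for whatever LLM endpoint you have on hand.

def ask(prompt: str) -> str:
    """Placeholder for a real LLM chat-completion call (assumption, not a real API)."""
    raise NotImplementedError

def potemkin_rate(concepts: list[str]) -> float:
    """Lower-bound estimate: fraction of concepts where the model's
    generation and its own grading disagree."""
    incoherent = 0
    for concept in concepts:
        # Step 1: the model produces what it claims is a valid instance.
        instance = ask(f"Give one example of {concept}. Reply with the example only.")
        # Step 2: the same model, with no memory of step 1, grades that output.
        verdict = ask(
            f"Is the following a correct example of {concept}? "
            f"Answer yes or no.\n\n{instance}"
        )
        if verdict.strip().lower().startswith("no"):
            # The model rejects its own instance: internal incoherence.
            incoherent += 1
    # Only a lower bound: when the two steps agree, they may still share
    # the same misunderstanding, which this check cannot detect.
    return incoherent / len(concepts)

Note that this only catches self-contradiction; it says nothing about potemkins where the model is consistently wrong in a non-human way, which is why a procedure like this can only lower-bound prevalence.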

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 07:33:20

Inference Scaled GraphRAG: Improving Multi Hop Question Answering on Knowledge Graphs
Travis Thompson, Seung-Hwan Lim, Paul Liu, Ruoying He, Dongkuan Xu
arxiv.org/abs/2506.19967