Tootfinder

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:38:10

Can We Reliably Rank Model Performance across Domains without Labeled Data?
Veronica Rammouz, Aaron Gonzalez, Carlos Cruzportillo, Adrian Tan, Nicole Beebe, Anthony Rios
https://arxiv.org/abs/2510.09519

Estimating model performance without labels is an important goal for understanding how NLP models generalize. While prior work has proposed measures based on dataset similarity or predicted correctness, it remains unclear when these estimates produce reliable performance rankings across domains. In this paper, we analyze the factors that affect ranking reliability using a two-step evaluation setup with four base classifiers and several large language models as error predictors. Experiments on t…

@arXiv_csAI_bot@mastoxiv.page
2025-10-01 11:26:37

Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research
Emma Rose Madden
https://arxiv.org/abs/2509.26080

Large Language Models (LLMs) are being increasingly used as synthetic agents in social science, in applications ranging from augmenting survey responses to powering multi-agent simulations. Because strong prediction plus conditioning prompts, token log-probs, and repeated sampling mimic Bayesian workflows, their outputs can be misinterpreted as posterior-like evidence from a coherent model. However, prediction does not equate to probabilism, and accurate points do not imply calibrated uncertain…

@arXiv_csLG_bot@mastoxiv.page
2025-09-30 14:37:21

Towards Understanding the Shape of Representations in Protein Language Models
Kosio Beshkov, Anders Malthe-Sørenssen
https://arxiv.org/abs/2509.24895

While protein language models (PLMs) are one of the most promising avenues of research for future de novo protein design, the way in which they transform sequences to hidden representations, as well as the information encoded in such representations is yet to be fully understood. Several works have attempted to propose interpretability tools for PLMs, but they have focused on understanding how individual sequences are transformed by such models. Therefore, the way in which PLMs transform the wh…

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:05:42

ModernBERT + ColBERT: Enhancing biomedical RAG through an advanced re-ranking retriever
Eduardo Martínez Rivera, Filippo Menolascina
https://arxiv.org/abs/2510.04757

Retrieval-Augmented Generation (RAG) is a powerful technique for enriching Large Language Models (LLMs) with external knowledge, allowing for factually grounded responses, a critical requirement in high-stakes domains such as healthcare. However, the efficacy of RAG systems is fundamentally restricted by the performance of their retrieval module, since irrelevant or semantically misaligned documents directly compromise the accuracy of the final generated response. General-purpose dense retrieve…

@arXiv_csCV_bot@mastoxiv.page
2025-10-01 11:38:57

Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA
Zhejia Cai, Yandan Yang, Xinyuan Chang, Shiyi Liang, Ronghan Chen, Feng Xiong, Mu Xu, Ruqi Huang
https://arxiv.org/abs/2509.26251

Latent Action Models (LAMs) enable Vision-Language-Action (VLA) systems to learn semantic action representations from large-scale unannotated data. Yet, we identify two bottlenecks of LAMs: 1) the commonly adopted end-to-end trained image encoder suffers from poor spatial understanding; 2) LAMs can be fragile when input frames are distant, leading to limited temporal perception. Such factors inevitably hinder stable and clear action modeling. To this end, we propose Farsighted-LAM, a latent …

@arXiv_csCL_bot@mastoxiv.page
2025-09-18 09:53:51

Measuring Gender Bias in Job Title Matching for Grammatical Gender Languages
Laura García-Sardiña, Hermenegildo Fabregat, Daniel Deniz, Rabih Zbib
https://arxiv.org/abs/2509.13803

This work sets the ground for studying how explicit grammatical gender assignment in job titles can affect the results of automatic job ranking systems. We propose the usage of metrics for ranking comparison controlling for gender to evaluate gender bias in job title ranking systems, in particular RBO (Rank-Biased Overlap). We generate and share test sets for a job title matching task in four grammatical gender languages, including occupations in masculine and feminine form and annotated by gen…

@arXiv_csCL_bot@mastoxiv.page
2025-10-03 10:57:01

F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data
Ziyin Zhang, Zihan Liao, Hang Yu, Peng Di, Rui Wang
https://arxiv.org/abs/2510.02294

We introduce F2LLM - Foundation to Feature Large Language Models, a suite of state-of-the-art embedding models in three sizes: 0.6B, 1.7B, and 4B. Unlike previous top-ranking embedding models that require massive contrastive pretraining, sophisticated training pipelines, and costly synthetic training data, F2LLM is directly finetuned from foundation models on 6 million query-document-negative tuples curated from open-source, non-synthetic datasets, striking a strong balance between training cos…

@arXiv_csCL_bot@mastoxiv.page
2025-09-23 12:58:41

TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
Yutong Liu, Ziyue Zhang, Ban Ma-bao, Renzeng Duojie, Yuqing Cai, Yongbin Yu, Xiangxiang Wang, Fan Gao, Cheng Huang, Nyima Tashi
https://arxiv.org/abs/2509.18060

Tibetan is a low-resource language with limited parallel speech corpora spanning its three major dialects (Ü-Tsang, Amdo, and Kham), limiting progress in speech modeling. To address this issue, we propose TMD-TTS, a unified Tibetan multi-dialect text-to-speech (TTS) framework that synthesizes parallel dialectal speech from explicit dialect labels. Our method features a dialect fusion module and a Dialect-Specialized Dynamic Routing Network (DSDR-Net) to capture fine-grained acoustic and lingui…
