Tootfinder

No exact results. Similar results found.

@arXiv_csCL_bot@mastoxiv.page
2025-09-15 09:54:21

Scaling Arabic Medical Chatbots Using Synthetic Data: Enhancing Generative AI with Synthetic Patient Records
Abdulrahman Allam, Seif Ahmed, Ali Hamdi, Khaled Shaban
https://arxiv.org/abs/2509.10108

Scaling Arabic Medical Chatbots Using Synthetic Data: Enhancing Generative AI with Synthetic Patient Records
The development of medical chatbots in Arabic is significantly constrained by the scarcity of large-scale, high-quality annotated datasets. While prior efforts compiled a dataset of 20,000 Arabic patient-doctor interactions from social media to fine-tune large language models (LLMs), model scalability and generalization remained limited. In this study, we propose a scalable synthetic data augmentation strategy to expand the training corpus to 100,000 records. Using advanced generative AI system…

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:31:50

ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering
Francesco Maria Molfese, Luca Moroni, Ciro Porcaro, Simone Conia, Roberto Navigli
https://arxiv.org/abs/2510.09351

ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering
While Small Language Models (SLMs) have demonstrated promising performance on an increasingly wide array of commonsense reasoning benchmarks, current evaluation practices rely almost exclusively on the accuracy of their final answers, neglecting the validity of the reasoning processes that lead to those answers. To address this issue, we introduce ReTraceQA, a novel benchmark that introduces process-level evaluation for commonsense reasoning tasks. Our expert-annotated dataset reveals that in a…

@arXiv_csCV_bot@mastoxiv.page
2025-10-01 11:50:47

Autoproof: Automated Segmentation Proofreading for Connectomics
Gary B Huang, William M Katz, Stuart Berg, Louis Scheffer
https://arxiv.org/abs/2509.26585 https://

Autoproof: Automated Segmentation Proofreading for Connectomics
Producing connectomes from electron microscopy (EM) images has historically required a great deal of human proofreading effort. This manual annotation cost is the current bottleneck in scaling EM connectomics, for example, in making larger connectome reconstructions feasible, or in enabling comparative connectomics where multiple related reconstructions are produced. In this work, we propose using the available ground-truth data generated by this manual annotation effort to learn a machine lear…

@arXiv_csAI_bot@mastoxiv.page
2025-09-18 09:50:51

CrowdAgent: Multi-Agent Managed Multi-Source Annotation System
Maosheng Qin, Renyu Zhu, Mingxuan Xia, Chenkai Chen, Zhen Zhu, Minmin Lin, Junbo Zhao, Lu Xu, Changjie Fan, Runze Wu, Haobo Wang
https://arxiv.org/abs/2509.14030

CrowdAgent: Multi-Agent Managed Multi-Source Annotation System
High-quality annotated data is a cornerstone of modern Natural Language Processing (NLP). While recent methods begin to leverage diverse annotation sources-including Large Language Models (LLMs), Small Language Models (SLMs), and human experts-they often focus narrowly on the labeling step itself. A critical gap remains in the holistic process control required to manage these sources dynamically, addressing complex scheduling and quality-cost trade-offs in a unified manner. Inspired by real-wor…

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:40:51

BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)
Tomas Ruiz, Siyao Peng, Barbara Plank, Carsten Schwemmer
https://arxiv.org/abs/2510.12516

BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)
Test-time scaling is a family of techniques to improve LLM outputs at inference time by performing extra computation. To the best of our knowledge, test-time scaling has been limited to domains with verifiably correct answers, like mathematics and coding. We transfer test-time scaling to the LeWiDi-2025 tasks to evaluate annotation disagreements. We experiment with three test-time scaling methods: two benchmark algorithms (Model Averaging and Majority Voting), and a Best-of-N sampling method. T…

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 07:45:11

Advancing Conversational AI with Shona Slang: A Dataset and Hybrid Model for Digital Inclusion
Happymore Masoka
https://arxiv.org/abs/2509.14249 https://ar…

Advancing Conversational AI with Shona Slang: A Dataset and Hybrid Model for Digital Inclusion
African languages remain underrepresented in natural language processing (NLP), with most corpora limited to formal registers that fail to capture the vibrancy of everyday communication. This work addresses this gap for Shona, a Bantu language spoken in Zimbabwe and Zambia, by introducing a novel Shona--English slang dataset curated from anonymized social media conversations. The dataset is annotated for intent, sentiment, dialogue acts, code-mixing, and tone, and is publicly available at https…

Tootfinder

Opt-in global Mastodon full text search. Join the index!