
2025-06-30 10:21:50
Refining Czech GEC: Insights from a Multi-Experiment Approach
Petr Pechman, Milan Straka, Jana Straková, Jakub Náplava
https://arxiv.org/abs/2506.22402
I think @… would be amazed to hear under what conditions one is allowed to work with sizeable corpora of protected works at, for example, the @…: at a terminal without internet and without the possib…
EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora
Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qintian Guo, Zhixun Li, Wensheng Luo, Di Jiang, Yixiang Fang, Xiaofang Zhou
https://arxiv.org/abs/2506.20963
Universal Music Representations? Evaluating Foundation Models on World Music Corpora
Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos
https://arxiv.org/abs/2506.17055
CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation
Deepon Halder, Thanmay Jayakumar, Raj Dabre
https://arxiv.org/abs/2506.19952
Replaced article(s) found for physics.soc-ph. https://arxiv.org/list/physics.soc-ph/new
[1/1]:
- Entropy and type-token ratio in gigaword corpora
Pablo Rosillo-Rodes, Maxi San Miguel, David Sanchez
Identifying economic narratives in large text corpora -- An integrated approach using Large Language Models
Tobias Schmidt, Kai-Robin Lange, Matthias Reccius, Henrik Müller, Michael Roos, Carsten Jentsch
https://arxiv.org/abs/2506.15041
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
Yilin Zhang, Xinran Zhao, Zora Zhiruo Wang, Chenyang Yang, Jiayi Wei, Tongshuang Wu
https://arxiv.org/abs/2506.15655
When Should Dense Retrievers Be Updated in Evolving Corpora? Detecting Out-of-Distribution Corpora Using GradNormIR
Dayoon Ko, Jinyoung Kim, Sohyeon Kim, Jinhyuk Kim, Jaehoon Lee, Seonghak Song, Minyoung Lee, Gunhee Kim
https://arxiv.org/abs/2506.01877
Knowledge-Aware Diverse Reranking for Cross-Source Question Answering
Tong Zhou
https://arxiv.org/abs/2506.20476 https://arxiv.org/pd…
NDCG-Consistent Softmax Approximation with Accelerated Convergence
Yuanhao Pu, Defu Lian, Xiaolong Chen, Xu Huang, Jin Chen, Enhong Chen
https://arxiv.org/abs/2506.09454
Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems
Haowei Wang, Rupeng Zhang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang
https://arxiv.org/abs/2506.06151
Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation
Jingming Liu, Yumeng Li, Wei Shi, Yao-Xiang Ding, Hui Su, Kun Zhou
https://arxiv.org/abs/2506.18670
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
https://arxiv.org/abs/2506.12229
Testing Hypotheses of Covariate Effects on Topics of Discourse
Gabriel Phelan, David A. Campbell
https://arxiv.org/abs/2506.05570 https://
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari
https://arxiv.org/abs/2506.15614
Retrieval-Augmented Code Review Comment Generation
Hyunsun Hong, Jongmoon Baik
https://arxiv.org/abs/2506.11591 https://arxiv.org/pdf…
This https://arxiv.org/abs/2504.02712 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIT_…
S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamlessly Speech-Text Alignment and Streaming Speech Decoder
Yu Pan, Yuguang Yang, Yanni Hu, Jianhao Ye, Xiang Zhang, Hongbin Zhou, Lei Ma, Jianjun Zhao
https://arxiv.org/abs/2506.11160
eSapiens: A Real-World NLP Framework for Multimodal Document Understanding and Enterprise Knowledge Processing
Isaac Shi, Zeyuan Li, Wenli Wang, Lewei He, Yang Yang, Tianyu Shi
https://arxiv.org/abs/2506.16768
This https://arxiv.org/abs/2504.13416 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Comparing Retrieval Strategies to Capture Interdisciplinary Scientific Research: A Bibliometric Evaluation of the Integration of Neuroscience and Computer Science
Malena Mendez Isla, Agustin Mauro, Diego Kozlowski
https://arxiv.org/abs/2506.03187
Replaced article(s) found for q-bio.OT. https://arxiv.org/list/q-bio.OT/new
[1/1]:
English dictionaries, gold and silver standard corpora for biomedical natural language processing...
Classification of Quality Characteristics in Online User Feedback using Linguistic Analysis, Crowdsourcing and LLMs
Eduard C. Groen, Fabiano Dalpiaz, Martijn van Vliet, Boris Winter, Joerg Doerr, Sjaak Brinkkemper
https://arxiv.org/abs/2506.11722
Architecture is All You Need: Improving LLM Recommenders by Dropping the Text
Kevin Foley, Shaghayegh Agah, Kavya Priyanka Kakinada
https://arxiv.org/abs/2506.15833
This https://arxiv.org/abs/2408.16028 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning
Yuhui Shi, Yehan Yang, Qiang Sheng, Hao Mi, Beizhe Hu, Chaoxi Xu, Juan Cao
https://arxiv.org/abs/2506.15683
Enabling automatic transcription of child-centered audio recordings from real-world environments
Daniil Kocharov, Okko Räsänen
https://arxiv.org/abs/2506.11747
Understanding the Effect of Knowledge Graph Extraction Error on Downstream Graph Analyses: A Case Study on Affiliation Graphs
Erica Cai, Brendan O'Connor
https://arxiv.org/abs/2506.12367
This https://arxiv.org/abs/2502.11191 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
This https://arxiv.org/abs/2505.22251 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
This https://arxiv.org/abs/2506.06266 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric
Mattson Ogg, Caitlyn Bishop, Han Yi, Sarah Robinson
https://arxiv.org/abs/2506.01655
Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs
Jiandong Shao, Yao Lu, Jianfei Yang
https://arxiv.org/abs/2506.01734 https://