Tootfinder

Opt-in global Mastodon full text search.

@arXiv_csCL_bot@mastoxiv.page
2025-06-30 10:21:50

Refining Czech GEC: Insights from a Multi-Experiment Approach
Petr Pechman, Milan Straka, Jana Straková, Jakub Náplava
arxiv.org/abs/2506.22402

@awinkler@openbiblio.social
2025-06-29 15:51:00

I think @… would be amazed to hear under what conditions one is allowed to work with sizeable corpora of copyrighted works at, for example, @…: at a terminal without internet and without the possib…

@arXiv_csIR_bot@mastoxiv.page
2025-06-27 08:08:29

EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora
Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qintian Guo, Zhixun Li, Wensheng Luo, Di Jiang, Yixiang Fang, Xiaofang Zhou
arxiv.org/abs/2506.20963

@arXiv_csSD_bot@mastoxiv.page
2025-06-23 10:33:30

Universal Music Representations? Evaluating Foundation Models on World Music Corpora
Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos
arxiv.org/abs/2506.17055

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 07:31:50

CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation
Deepon Halder, Thanmay Jayakumar, Raj Dabre
arxiv.org/abs/2506.19952

@arXiv_physicssocph_bot@mastoxiv.page
2025-06-25 12:57:34

Replaced article(s) found for physics.soc-ph. arxiv.org/list/physics.soc-ph/
[1/1]:
- Entropy and type-token ratio in gigaword corpora
Pablo Rosillo-Rodes, Maxi San Miguel, David Sanchez

@arXiv_econGN_bot@mastoxiv.page
2025-06-19 08:39:22

Identifying economic narratives in large text corpora -- An integrated approach using Large Language Models
Tobias Schmidt, Kai-Robin Lange, Matthias Reccius, Henrik Müller, Michael Roos, Carsten Jentsch
arxiv.org/abs/2506.15041

@arXiv_csSE_bot@mastoxiv.page
2025-06-19 08:37:13

cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
Yilin Zhang, Xinran Zhao, Zora Zhiruo Wang, Chenyang Yang, Jiayi Wei, Tongshuang Wu
arxiv.org/abs/2506.15655

@arXiv_csIR_bot@mastoxiv.page
2025-06-03 07:38:39

When Should Dense Retrievers Be Updated in Evolving Corpora? Detecting Out-of-Distribution Corpora Using GradNormIR
Dayoon Ko, Jinyoung Kim, Sohyeon Kim, Jinhyuk Kim, Jaehoon Lee, Seonghak Song, Minyoung Lee, Gunhee Kim
arxiv.org/abs/2506.01877

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 09:36:40

Knowledge-Aware Diverse Reranking for Cross-Source Question Answering
Tong Zhou
arxiv.org/abs/2506.20476 arxiv.org/pd…

@arXiv_csLG_bot@mastoxiv.page
2025-06-12 10:00:11

NDCG-Consistent Softmax Approximation with Accelerated Convergence
Yuanhao Pu, Defu Lian, Xiaolong Chen, Xu Huang, Jin Chen, Enhong Chen
arxiv.org/abs/2506.09454

@arXiv_csCR_bot@mastoxiv.page
2025-06-09 08:14:02

Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems
Haowei Wang, Rupeng Zhang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang
arxiv.org/abs/2506.06151

@arXiv_csIR_bot@mastoxiv.page
2025-06-24 11:43:30

Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation
Jingming Liu, Yumeng Li, Wei Shi, Yao-Xiang Ding, Hui Su, Kun Zhou
arxiv.org/abs/2506.18670

@arXiv_csCL_bot@mastoxiv.page
2025-06-17 09:33:51

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
arxiv.org/abs/2506.12229

@arXiv_statME_bot@mastoxiv.page
2025-06-09 09:42:12

Testing Hypotheses of Covariate Effects on Topics of Discourse
Gabriel Phelan, David A. Campbell
arxiv.org/abs/2506.05570

@arXiv_csSD_bot@mastoxiv.page
2025-06-19 08:36:13

TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari
arxiv.org/abs/2506.15614

@arXiv_csSE_bot@mastoxiv.page
2025-06-16 10:22:19

Retrieval-Augmented Code Review Comment Generation
Hyunsun Hong, Jongmoon Baik
arxiv.org/abs/2506.11591 arxiv.org/pdf…

@arXiv_csIT_bot@mastoxiv.page
2025-06-03 16:20:54

This arxiv.org/abs/2504.02712 has been replaced.
initial toot: mastoxiv.page/@arXiv_csIT_…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-16 08:30:59

S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamlessly Speech-Text Alignment and Streaming Speech Decoder
Yu Pan, Yuguang Yang, Yanni Hu, Jianhao Ye, Xiang Zhang, Hongbin Zhou, Lei Ma, Jianjun Zhao
arxiv.org/abs/2506.11160

@arXiv_csIR_bot@mastoxiv.page
2025-06-23 09:44:00

eSapiens: A Real-World NLP Framework for Multimodal Document Understanding and Enterprise Knowledge Processing
Isaac Shi, Zeyuan Li, Wenli Wang, Lewei He, Yang Yang, Tianyu Shi
arxiv.org/abs/2506.16768

@arXiv_csLG_bot@mastoxiv.page
2025-06-10 19:17:32

This arxiv.org/abs/2504.13416 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_csDL_bot@mastoxiv.page
2025-06-05 07:17:16

Comparing Retrieval Strategies to Capture Interdisciplinary Scientific Research: A Bibliometric Evaluation of the Integration of Neuroscience and Computer Science
Malena Mendez Isla, Agustin Mauro, Diego Kozlowski
arxiv.org/abs/2506.03187

@arXiv_qbioOT_bot@mastoxiv.page
2025-06-16 14:57:43

Replaced article(s) found for q-bio.OT. arxiv.org/list/q-bio.OT/new
[1/1]:
English dictionaries, gold and silver standard corpora for biomedical natural language processing...

@arXiv_csSE_bot@mastoxiv.page
2025-06-16 10:25:39

Classification of Quality Characteristics in Online User Feedback using Linguistic Analysis, Crowdsourcing and LLMs
Eduard C. Groen, Fabiano Dalpiaz, Martijn van Vliet, Boris Winter, Joerg Doerr, Sjaak Brinkkemper
arxiv.org/abs/2506.11722

@arXiv_csIR_bot@mastoxiv.page
2025-06-23 08:03:49

Architecture is All You Need: Improving LLM Recommenders by Dropping the Text
Kevin Foley, Shaghayegh Agah, Kavya Priyanka Kakinada
arxiv.org/abs/2506.15833

@arXiv_csCR_bot@mastoxiv.page
2025-06-03 16:55:16

This arxiv.org/abs/2408.16028 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCR_…

@arXiv_csCL_bot@mastoxiv.page
2025-06-19 08:16:54

PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning
Yuhui Shi, Yehan Yang, Qiang Sheng, Hao Mi, Beizhe Hu, Chaoxi Xu, Juan Cao
arxiv.org/abs/2506.15683

@arXiv_csSD_bot@mastoxiv.page
2025-06-16 08:04:09

Enabling automatic transcription of child-centered audio recordings from real-world environments
Daniil Kocharov, Okko Räsänen
arxiv.org/abs/2506.11747

@arXiv_csCL_bot@mastoxiv.page
2025-06-17 10:04:25

Understanding the Effect of Knowledge Graph Extraction Error on Downstream Graph Analyses: A Case Study on Affiliation Graphs
Erica Cai, Brendan O'Connor
arxiv.org/abs/2506.12367

@arXiv_csCR_bot@mastoxiv.page
2025-06-03 17:36:14

This arxiv.org/abs/2502.11191 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCR_…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-06 09:39:10

This arxiv.org/abs/2505.22251 has been replaced.
initial toot: mastoxiv.page/@arXiv_ees…

@arXiv_csCL_bot@mastoxiv.page
2025-06-10 19:06:51

This arxiv.org/abs/2506.06266 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCL_…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-03 07:57:39

Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric
Mattson Ogg, Caitlyn Bishop, Han Yi, Sarah Robinson
arxiv.org/abs/2506.01655

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:19:57

Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs
Jiandong Shao, Yao Lu, Jianfei Yang
arxiv.org/abs/2506.01734