Tootfinder

@arXiv_csCL_bot@mastoxiv.page
2025-08-06 10:15:50

Cropping outperforms dropout as an augmentation strategy for training self-supervised text embeddings
Rita González-Márquez, Philipp Berens, Dmitry Kobak
https://arxiv.org/abs/2508.03453

Cropping outperforms dropout as an augmentation strategy for training self-supervised text embeddings
Text embeddings, i.e. vector representations of entire texts, play an important role in many NLP applications, such as retrieval-augmented generation, sentiment analysis, clustering, or visualizing collections of texts for data exploration. Currently, top-performing embedding models are derived from pre-trained language models via extensive supervised fine-tuning using curated text pairs. This contrasts with computer vision, where self-supervised training based on data augmentations has demonst…

@arXiv_csHC_bot@mastoxiv.page
2025-08-07 09:35:14

StepWrite: Adaptive Planning for Speech-Driven Text Generation
Hamza El Alaoui, Atieh Taheri, Yi-Hao Peng, Jeffrey P. Bigham
https://arxiv.org/abs/2508.04011 https://

StepWrite: Adaptive Planning for Speech-Driven Text Generation
People frequently use speech-to-text systems to compose short texts with voice. However, current voice-based interfaces struggle to support composing more detailed, contextually complex texts, especially in scenarios where users are on the move and cannot visually track progress. Longer-form communication, such as composing structured emails or thoughtful responses, requires persistent context tracking, structured guidance, and adaptability to evolving user intentions--capabilities that convent…

@arXiv_csCV_bot@mastoxiv.page
2025-09-05 10:25:11

AnomalyLMM: Bridging Generative Knowledge and Discriminative Retrieval for Text-Based Person Anomaly Search
Hao Ju, Hu Zhang, Zhedong Zheng
https://arxiv.org/abs/2509.04376 http…

AnomalyLMM: Bridging Generative Knowledge and Discriminative Retrieval for Text-Based Person Anomaly Search
With growing public safety demands, text-based person anomaly search has emerged as a critical task, aiming to retrieve individuals with abnormal behaviors via natural language descriptions. Unlike conventional person search, this task presents two unique challenges: (1) fine-grained cross-modal alignment between textual anomalies and visual behaviors, and (2) anomaly recognition under sparse real-world samples. While Large Multi-modal Models (LMMs) excel in multi-modal understanding, their pot…

@publicvoit@graz.social
2025-07-06 09:13:28

The Advantages of #Text-Based #Information Versus #Videos, #Audio or

The Advantages of Text-Based Information Versus Videos, Audio or Images

@arXiv_csCL_bot@mastoxiv.page
2025-08-06 10:21:50

EmbedGrad: Gradient-Based Prompt Optimization in Embedding Space for Large Language Models
Xiaoming Hou, Jiquan Zhang, Zibin Lin, DaCheng Tao, Shengli Zhang
https://arxiv.org/abs/2508.03533

EmbedGrad: Gradient-Based Prompt Optimization in Embedding Space for Large Language Models
Effectively adapting powerful pretrained foundation models to diverse tasks remains a key challenge in AI deployment. Current approaches primarily follow two paradigms: discrete optimization of text prompts through prompt engineering, or continuous adaptation via additional trainable parameters. Both exhibit limitations: discrete methods lack refinement precision, while parameter-based techniques increase complexity and reduce interpretability. To address these constraints, we propose EmbedGrad, a…

@arXiv_csCY_bot@mastoxiv.page
2025-07-08 08:08:00

Challenges for AI in Multimodal STEM Assessments: a Human-AI Comparison
Aymeric de Chillaz, Anna Sotnikova, Patrick Jermann, Antoine Bosselut
https://arxiv.org/abs/2507.03013

Challenges for AI in Multimodal STEM Assessments: a Human-AI Comparison
Generative AI systems have rapidly advanced, with multimodal input capabilities enabling reasoning beyond text-based tasks. In education, these advancements could influence assessment design and question answering, presenting both opportunities and challenges. To investigate these effects, we introduce a high-quality dataset of 201 university-level STEM questions, manually annotated with features such as image type, role, problem complexity, and question format. Our study analyzes how these fea…

@arXiv_csCV_bot@mastoxiv.page
2025-08-06 10:38:50

Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
Hyungjin Kim, Seokho Ahn, Young-Duk Seo
https://arxiv.org/abs/2508.03481 h…

Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
Personalized generation in T2I diffusion models aims to naturally incorporate individual user preferences into the generation process with minimal user intervention. However, existing studies primarily rely on prompt-level modeling with large-scale models, often leading to inaccurate personalization due to the limited input token capacity of T2I diffusion models. To address these limitations, we propose DrUM, a novel method that integrates user profiling with a transformer-based adapter to enab…

@arXiv_csDL_bot@mastoxiv.page
2025-09-05 07:55:31

The changing role of cited papers over time: An analysis of highly cited papers based on a large full-text dataset
Gege Lin, Nees Jan van Eck, Haiyan Hou, Zhigang Hu
https://arxiv.org/abs/2509.04190

The changing role of cited papers over time: An analysis of highly cited papers based on a large full-text dataset
This paper examines how the role of cited papers evolves over time by analyzing nearly 900 highly cited papers (HCPs) published between 2000 and 2016 and the full text of over 220,000 papers citing them. We investigate multiple citation characteristics, including citation location within the full text, reference and in-text citation types, citation sentiment, and textual and bibliographic relatedness between citing and cited papers. Our findings reveal that as HCPs age, they tend to be cited ea…

@arXiv_csSE_bot@mastoxiv.page
2025-09-04 09:10:41

The Impact of Critique on LLM-Based Model Generation from Natural Language: The Case of Activity Diagrams
Parham Khamsepour, Mark Cole, Ish Ashraf, Sandeep Puri, Mehrdad Sabetzadeh, Shiva Nejati
https://arxiv.org/abs/2509.03463

The Impact of Critique on LLM-Based Model Generation from Natural Language: The Case of Activity Diagrams
Large Language Models (LLMs) show strong potential for automating the generation of models from natural-language descriptions. A common approach is an iterative generate-critique-refine loop, where candidate models are produced, evaluated, and updated based on detected issues. This process needs to address: (1) structural correctness - compliance with well-formedness rules - and (2) semantic alignment - accurate reflection of the intended meaning in the source text. We present LADEX (LLM-based …
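The generate-critique-refine loop described in this abstract can be sketched roughly as follows. This is a toy illustration only, not the paper's LADEX pipeline: the dictionary-based diagram, the `critique` and `refine` helpers, and the `max_rounds` budget are all hypothetical, and the critique covers just one structural rule (every edge endpoint must be a declared node), with no LLM and no semantic-alignment check.

```python
# Toy generate-critique-refine loop (hypothetical names throughout).
# A "diagram" is a dict with a set of node names and a list of directed edges.


def critique(diagram):
    """Return the names of edge endpoints that are not declared as nodes."""
    issues = []
    for src, dst in diagram["edges"]:
        for name in (src, dst):
            if name not in diagram["nodes"] and name not in issues:
                issues.append(name)
    return issues


def refine(diagram, issues):
    """Repair the candidate by declaring every node the critique flagged."""
    return {
        "nodes": set(diagram["nodes"]) | set(issues),
        "edges": list(diagram["edges"]),
    }


def generate_critique_refine(candidate, max_rounds=3):
    """Alternate critique and refinement until no issues remain
    or the round budget is exhausted. Returns (diagram, rounds_used)."""
    rounds_used = 0
    while rounds_used < max_rounds:
        issues = critique(candidate)
        if not issues:
            break
        candidate = refine(candidate, issues)
        rounds_used += 1
    return candidate, rounds_used


# Usage: the draft references an undeclared "approve" node.
draft = {
    "nodes": {"start", "review"},
    "edges": [("start", "review"), ("review", "approve")],
}
final, rounds = generate_critique_refine(draft)
```

In a real system the critique step would presumably be a well-formedness checker or an LLM prompt, and the refine step another model call; the loop structure is the only part this sketch is meant to convey.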

@jswright61@ruby.social
2025-08-05 16:29:43

Dear Apple,
It makes sense for you to reorder my frequently used emoji based on how frequently I use them, but please wait until I’ve closed the emoji keyboard before you do.
I wanted to text my wife three ♥️ but after the first it switched to 🤔 and then for the third it went to 🤷. It’s a good thing I noticed because
♥️♥️♥️ is very different than ♥️🤔🤷

@arXiv_hepph_bot@mastoxiv.page
2025-07-03 09:58:40

Precision determination of $\alpha_\text{s}$ from Dijet Cross Sections in the Multi-TeV Range
João Pires
https://arxiv.org/abs/2507.01670 https://…

Precision determination of $α_\text{s}$ from Dijet Cross Sections in the Multi-TeV Range
In this talk we present a determination of the strong coupling constant $α_\text{s}$ and its energy-scale dependence based on a next-to-next-to-leading order (NNLO) QCD analysis of dijet production. Using the invariant mass of the dijet system to probe $α_\text{s}$ at different scales, we extract a value of $α_\text{s}(m_\text{Z})=0.1178 \pm 0.0022$ from LHC dijet data. The combination of various LHC datasets significantly extends the precision and scale reach of the analysis, enabling the f…

@arXiv_mathOC_bot@mastoxiv.page
2025-09-05 08:54:01

Some Remarks on the $l_1$-Robust Solution of LexRank Problem
Anna Timonina-Farkas
https://arxiv.org/abs/2509.04131 https://arxiv.org/pdf/2509.04131

Some Remarks on the $l_1$-Robust Solution of LexRank Problem
Graph-based ranking methods, such as LexRank, are fundamental in Natural Language Processing (NLP) applications like text summarization, as they measure the relative importance of textual units. Building on recent advances in ranking methods for growing and dynamic graphs, we develop a robust variant of LexRank that operates on stochastic similarity graphs with uncertain and expanding structure. Our approach introduces a novel $l_1$-based formulation that captures ambiguity in both transition p…

@tiotasram@kolektiva.social
2025-07-04 20:14:31

Long; central Massachusetts colonial history
Today on a whim I visited a site in Massachusetts marked as "Huguenot Fort Ruins" on OpenStreetMap. I drove out with my 4-year-old through increasingly rural central Massachusetts forests & fields to end up on a narrow street near the top of a hill beside a small field. The neighboring houses had huge lawns, some with tractors.
Appropriately for this day and this moment in history, the history of the site turns out to be a microcosm of America. Across the field beyond a cross-shaped stone memorial stood an info board with a few diagrams and some text. The text of the main sign (including typos/misspellings) read:
"""
Town Is Formed
Early in the 1680's, interest began to generate to develop a town in the area west of Natick in the south central part of the Commonwealth that would be suitable for a settlement. A Mr. Hugh Campbell, a Scotch merchant of Boston petitioned the court for land for a colony. At about the same time, Joseph Dudley and William Stoughton also were desirous of obtaining land for a settlement. A claim was made for all lands west of the Blackstone River to the southern land of Massachusetts to a point northerly of the Springfield Road then running southwesterly until it joined the southern line of Massachusetts.
Associated with Dudley and Stoughton was Robert Thompson of London, England, Dr. Daniel Cox and John Blackwell, both of London and Thomas Freak of Hannington, Wiltshire, as proprietors. A stipulation in the acquisition of this land being that within four years thirty families and an orthodox minister settle in the area. An extension of this stipulation was granted at the end of the four years when no group large enough seemed to be willing to take up the opportunity.
In 1686, Robert Thompson met Gabriel Bernor and learned that he was seeking an area where his countrymen, who had fled their native France because of the Edict of Nantes, were desirous of a place to live. Their main concern was to settle in a place that would allow them freedom of worship. New Oxford, as it was the so-named, at that time included the larger part of Charlton, one-fourth of Auburn, one-fifth of Dudley and several square miles of the northeast portion of Southbridge as well as the easterly ares now known as Webster.
Joseph Dudley's assessment that the area was capable of a good settlement probably was based on the idea of the meadows already established along with the plains, ponds, brooks and rivers. Meadows were a necessity as they provided hay for animal feed and other uses by the settlers. The French River tributary books and streams provided a good source for fishing and hunting. There were open areas on the plains as customarily in November of each year, the Indians burnt over areas to keep them free of underwood and brush. It appeared then that this area was ready for settling.
The first seventy-five years of the settling of the Town of Oxford originally known as Manchaug, embraced three different cultures. The Indians were known to be here about 1656 when the Missionary, John Eliott and his partner Daniel Gookin visited in the praying towns. Thirty years later, in 1686, the Huguenots walked here from Boston under the guidance of their leader Isaac Bertrand DuTuffeau. The Huguenot's that arrived were not peasants, but were acknowledged to be the best Agriculturist, Wine Growers, Merchant's, and Manufacter's in France. There were 30 families consisting of 52 people. At the time of their first departure (10 years), due to Indian insurrection, there were 80 people in the group, and near their Meetinghouse/Church was a Cemetery that held 20 bodies. In 1699, 8 to 10 familie's made a second attempt to re-settle, failing after only four years, with the village being completely abandoned in 1704.
The English colonist made their way here in 1713 and established what has become a permanent settlement.
"""
All that was left of the fort was a crumbling stone wall that would have been the base of a higher wooden wall according to a picture of a model (I didn't think to get a shot of that myself). Only trees and brush remain where the multi-story main wooden building was.
This story has so many echoes in the present:
- The rich colonialists from Boston & London agree to settle the land, buying/taking land "rights" from the colonial British court that claimed jurisdiction without actually having control of the land. Whether the sponsors ever actually visited the land themselves I don't know. They surely profited somehow, whether from selling on the land rights later or collecting taxes/rent or whatever, but they needed poor laborers to actually do the work of developing the land (& driving out the original inhabitants, who had no say in the machinations of the Boston court).
- The land deal was on condition that the capital-holders who stood to profit would find settlers to actually do the work of colonizing. The British crown wanted more territory to be controlled in practice not just in theory, but they weren't going to be the ones to do the hard work.
- The capital-holders actually failed to find enough poor suckers to do their dirty work for 4 years, until the Huguenots, fleeing religious persecution in France, were desperate enough to accept their terms.
- Of course, the land was only so ripe for settlement because of careful tending over centuries by the natives who were eventually driven off, and whose land management practices are abandoned today. Given the mention of praying towns (& dates), this was after King Philip's War, which resulted in at least some forced resettlement of native tribes around the area, but the descendants of those "Indians" mentioned in this sign are still around. For example, this is the site of one local band of Nipmuck, whose namesake lake is about 5 miles south of the fort site: #LandBack.

@arXiv_csIR_bot@mastoxiv.page
2025-09-05 08:16:21

Efficient Item ID Generation for Large-Scale LLM-based Recommendation
Anushya Subbiah, Vikram Aggarwal, James Pine, Steffen Rendle, Krishna Sayana, Kun Su
https://arxiv.org/abs/2509.03746

Efficient Item ID Generation for Large-Scale LLM-based Recommendation
Integrating product catalogs and user behavior into LLMs can enhance recommendations with broad world knowledge, but the scale of real-world item catalogs, often containing millions of discrete item identifiers (Item IDs), poses a significant challenge. This contrasts with the smaller, tokenized text vocabularies typically used in LLMs. The predominant view within the LLM-based recommendation literature is that it is infeasible to treat item ids as a first class citizen in the LLM and instead s…

@arXiv_csCR_bot@mastoxiv.page
2025-07-03 09:22:50

Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks
Hanlin Cai, Haofan Dong, Houtianfu Wang, Kai Li, Ozgur B. Akan
https://arxiv.org/abs/2507.01694 …

Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks
Federated large language models (FedLLMs) provide powerful generative capabilities in CyberEdge networks while protecting data privacy. However, FedLLMs remain highly vulnerable to model poisoning attacks. This article first reviews recent model poisoning techniques and existing defense mechanisms for FedLLMs, highlighting critical limitations, particularly under non-IID text distributions. In particular, current defenses primarily utilize distance-based outlier detection or norm constraints, …

@arXiv_csSD_bot@mastoxiv.page
2025-09-03 11:25:23

The AudioMOS Challenge 2025
Wen-Chin Huang, Hui Wang, Cheng Liu, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Erica Cooper, Yong Qin, Tomoki Toda
https://arxiv.org/abs/2509.01336 …

The AudioMOS Challenge 2025
This is the summary paper for the AudioMOS Challenge 2025, the very first challenge for automatic subjective quality prediction for synthetic audio. The challenge consists of three tracks. The first track aims to assess text-to-music samples in terms of overall quality and textual alignment. The second track is based on the four evaluation dimensions of Meta Audiobox Aesthetics, and the test set consists of text-to-speech, text-to-audio, and text-to-music samples. The third track focuses on syn…

@arXiv_csAI_bot@mastoxiv.page
2025-09-03 11:12:03

BALM-TSF: Balanced Multimodal Alignment for LLM-Based Time Series Forecasting
Shiqiao Zhou, Holger Schöner, Huanbo Lyu, Edouard Fouché, Shuo Wang
https://arxiv.org/abs/2509.00622

BALM-TSF: Balanced Multimodal Alignment for LLM-Based Time Series Forecasting
Time series forecasting is a long-standing and highly challenging research topic. Recently, driven by the rise of large language models (LLMs), research has increasingly shifted from purely time series methods toward harnessing textual modalities to enhance forecasting performance. However, the vast discrepancy between text and temporal data often leads current multimodal architectures to over-emphasise one modality while neglecting the other, resulting in information loss that harms forecastin…

@arXiv_eessSP_bot@mastoxiv.page
2025-09-04 09:34:21

Handwriting Imagery EEG Classification based on Convolutional Neural Networks
Hao Yang, Guang Ouyang
https://arxiv.org/abs/2509.03111 https://arxiv.org/pdf…

Handwriting Imagery EEG Classification based on Convolutional Neural Networks
Handwriting imagery has emerged as a promising paradigm for brain-computer interfaces (BCIs) aimed at translating brain activity into text output. Compared with invasively recorded electroencephalography (EEG), non-invasive recording offers a more practical and feasible approach to capturing brain signals for BCI. This study explores the limit of decoding non-invasive EEG associated with handwriting imagery into English letters using deep neural networks. To this end, five participants were ins…

@arXiv_csCV_bot@mastoxiv.page
2025-09-05 10:21:01

TauGenNet: Plasma-Driven Tau PET Image Synthesis via Text-Guided 3D Diffusion Models
Yuxin Gong (for the Alzheimer's Disease Neuroimaging Initiative), Se-in Jang (for the Alzheimer's Disease Neuroimaging Initiative), Wei Shao (for the Alzheimer's Disease Neuroimaging Initiative), Yi Su (for the Alzheimer's Disease Neuroimaging Initiative), Kuang Gong (for the Alzheimer's Disease Neuroimaging Initiative)

TauGenNet: Plasma-Driven Tau PET Image Synthesis via Text-Guided 3D Diffusion Models
Accurate quantification of tau pathology via tau positron emission tomography (PET) scan is crucial for diagnosing and monitoring Alzheimer's disease (AD). However, the high cost and limited availability of tau PET restrict its widespread use. In contrast, structural magnetic resonance imaging (MRI) and plasma-based biomarkers provide non-invasive and widely available complementary information related to brain anatomy and disease progression. In this work, we propose a text-guided 3D diffusion …

@arXiv_mathQA_bot@mastoxiv.page
2025-08-05 08:57:40

A Gentle Introduction to Algebraic Operads
Felicia Ferraioli
https://arxiv.org/abs/2508.01886 https://arxiv.org/pdf/2508.01886

A Gentle Introduction to Algebraic Operads
This text, based on the author's Bachelor's thesis, introduces the theory of Algebraic Operads, a mathematical formalism that provides a unifying framework for modern algebra. We demonstrate how the fundamental theories of associative, commutative, and Lie algebras can be fully recovered as categories of representations of three archetypal operads: $\mathsf{As}$, $\mathsf{Com}$ and $\mathsf{Lie}$ -- the so-called 'three graces' of algebra. Following a deductive and self-contained approach, th…

@arXiv_csCL_bot@mastoxiv.page
2025-08-04 09:51:20

GHTM: A Graph based Hybrid Topic Modeling Approach in Low-Resource Bengali Language
Farhana Haque, Md. Abdur Rahman, Sumon Ahmed
https://arxiv.org/abs/2508.00605 https://…

GHTM: A Graph based Hybrid Topic Modeling Approach in Low-Resource Bengali Language
Topic modeling is a Natural Language Processing (NLP) technique that is used to identify latent themes and extract topics from text corpora by grouping similar documents based on their most significant keywords. Although widely researched in English, topic modeling remains understudied in Bengali due to its morphological complexity, lack of adequate resources and initiatives. In this contribution, a novel Graph Convolutional Network (GCN) based model called GHTM (Graph-Based Hybrid Topic Model)…

@arXiv_csHC_bot@mastoxiv.page
2025-09-05 09:41:21

SRWToolkit: An Open Source Wizard of Oz Toolkit to Create Social Robotic Avatars
Atikkhan Faridkhan Nilgar, Kristof Van Laerhoven, Ayub Kinoti
https://arxiv.org/abs/2509.04356 h…

SRWToolkit: An Open Source Wizard of Oz Toolkit to Create Social Robotic Avatars
We present SRWToolkit, an open-source Wizard of Oz toolkit designed to facilitate the rapid prototyping of social robotic avatars powered by local large language models (LLMs). Our web-based toolkit enables multimodal interaction through text input, button-activated speech, and wake-word command. The toolkit offers real-time configuration of avatar appearance, behavior, language, and voice via an intuitive control panel. In contrast to prior works that rely on cloud-based LLM services, SRWToolk…

@arXiv_eessIV_bot@mastoxiv.page
2025-07-04 08:31:01

A robust and versatile deep learning model for prediction of the arterial input function in dynamic small animal $\left[^{18}\text{F}\right]$FDG PET imaging
Christian Salomonsen, Luigi Tommaso Luppino, Fredrik Aspheim, Kristoffer Wickstrøm, Elisabeth Wetzer, Michael Kampffmeyer, Rodrigo Berzaghi, Rune Sundset, Robert Jenssen, Samuel Kuttner
https:…

A robust and versatile deep learning model for prediction of the arterial input function in dynamic small animal $\left[^{18}\text{F}\right]$FDG PET imaging
Dynamic positron emission tomography (PET) and kinetic modeling are pivotal in advancing tracer development research in small animal studies. Accurate kinetic modeling requires precise input function estimation, traditionally achieved via arterial blood sampling. However, arterial cannulation in small animals like mice involves intricate, time-consuming, and terminal procedures, precluding longitudinal studies. This work proposes a non-invasive, fully convolutional deep learning-based approach…

@arXiv_eessAS_bot@mastoxiv.page
2025-09-03 10:22:03

MPO: Multidimensional Preference Optimization for Language Model-based Text-to-Speech
Kangxiang Xia, Xinfa Zhu, Jixun Yao, Lei Xie
https://arxiv.org/abs/2509.00685 https://

MPO: Multidimensional Preference Optimization for Language Model-based Text-to-Speech
In recent years, text-to-speech (TTS) has seen impressive advancements through large-scale language models, achieving human-level speech quality. Integrating human feedback has proven effective for enhancing robustness in these systems. However, current approaches face challenges in optimizing TTS with preference data across multiple dimensions and often suffer from performance degradation due to overconfidence in rewards. We propose Multidimensional Preference Optimization (MPO) to better alig…

@arXiv_csCV_bot@mastoxiv.page
2025-08-06 10:29:10

Zero Shot Domain Adaptive Semantic Segmentation by Synthetic Data Generation and Progressive Adaptation
Jun Luo, Zijing Zhao, Yang Liu
https://arxiv.org/abs/2508.03300 https://

Zero Shot Domain Adaptive Semantic Segmentation by Synthetic Data Generation and Progressive Adaptation
Deep learning-based semantic segmentation models achieve impressive results yet remain limited in handling distribution shifts between training and test data. In this paper, we present SDGPA (Synthetic Data Generation and Progressive Adaptation), a novel method that tackles zero-shot domain adaptive semantic segmentation, in which no target images are available, but only a text description of the target domain's style is provided. To compensate for the lack of target domain training data, we ut…

@arXiv_econGN_bot@mastoxiv.page
2025-07-04 07:40:01

Seeing Through Green: Text-Based Classification and the Firm's Returns from Green Patents
Lapo Santarlasci, Armando Rungi, Antonio Zinilli
https://arxiv.org/abs/2507.02287

Seeing Through Green: Text-Based Classification and the Firm's Returns from Green Patents
This paper introduces Natural Language Processing for identifying "true" green patents from official supporting documents. We start our training on about 12.4 million patents that had been classified as green from previous literature. Thus, we train a simple neural network to enlarge a baseline dictionary through vector representations of expressions related to environmental technologies. After testing, we find that "true" green patents represent about 20% of the total of patents classifie…

@Allegra@social.linux.pizza
2025-08-03 14:35:10

Did you know I studied electrical engineering besides work? On Thursday, I finished my bachelor's thesis with the final presentation and it's time to finally present my project: ZEReader, a microcontroller-based E-Reader.
Inspired by the Open Book Project by Joey Castillo, I designed my own platform from scratch. My focus was on building a reader usable in everyday life that is capable of handling books in the EPUB format. The project is still in a very early phase, but it shows…

A rendering of the ZEReader PCB with a Raspberry Pi Pico 2 as a main component.

A picture of the KiCad board layout view.

The assembled ZEReader device, with display and PCB. There is no enclosure yet and the text is rotated 90 degrees.

@Techmeme@techhub.social
2025-06-16 18:30:38

Threads plans to let users hide text or images that spoil a piece of entertainment, blurring the text or image that has been marked as a spoiler (Alex Weprin/The Hollywood Reporter)
https://www.hollywoodreporter.com/business/digita…

Meta’s Threads Is Declaring War on Spoilers (Exclusive)
The text-based app from Meta is launching a tool that lets users hide spoilers in text or images, to encourage discussion of TV shows and films on the platform.

@arXiv_statML_bot@mastoxiv.page
2025-08-01 08:48:41

DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction
Kyle Naddeo, Nikolas Koutsoubis, Rahul Krish, Ghulam Rasool, Nidhal Bouaynaya, Tony OSullivan, Raj Krish
https://arxiv.org/abs/2507.23736

DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction
Access to medical imaging and associated text data has the potential to drive major advances in healthcare research and patient outcomes. However, the presence of Protected Health Information (PHI) and Personally Identifiable Information (PII) in Digital Imaging and Communications in Medicine (DICOM) files presents a significant barrier to the ethical and secure sharing of imaging datasets. This paper presents a hybrid de-identification framework developed by Impact Business Information Solutio…

@arXiv_csSE_bot@mastoxiv.page
2025-08-04 09:04:30

Accurate and Consistent Graph Model Generation from Text with Large Language Models
Boqi Chen, Ou Wei, Bingzhou Zheng, Gunter Mussbacher
https://arxiv.org/abs/2508.00255 https:/…

Accurate and Consistent Graph Model Generation from Text with Large Language Models
Graph model generation from natural language description is an important task with many applications in software engineering. With the rise of large language models (LLMs), there is a growing interest in using LLMs for graph model generation. Nevertheless, LLM-based graph model generation typically produces partially correct models that suffer from three main issues: (1) syntax violations: the generated model may not adhere to the syntax defined by its metamodel, (2) constraint inconsistencies:…

@arXiv_mathph_bot@mastoxiv.page
2025-09-04 08:46:11

On the inextensibility assumption in the stability of elastic rings: overhaul of a traditional paradigm
Federico Guarracino, Ida Mascolo
https://arxiv.org/abs/2509.02738 https:/…

On the inextensibility assumption in the stability of elastic rings: overhaul of a traditional paradigm
One of the oldest and most common structural engineering issues is the elastic buckling of circular rings under external pressure, which has a fundamental importance in a number of applications in general mechanics, engineering and bio-physics, just to name a few. Levy is considered to have provided the first significant solution to this problem in 1884, and most stability text-books make reference to this original solution, which is based on the Euler-Bernoulli beam model. Following this incip…

@cdarwin@c.im
2025-08-30 21:26:55

The teachings of Falun Gong stem entirely from its enigmatic leader,
Li Hongzhi, whom followers view as a “God-like figure.”
Since settling in the United States in the late 1990s, Li has remained reclusive, but his beliefs shape the core tenets of Falun Gong and its many tendrils:
in addition to traditional Buddhist practices like qigong, a type of movement-based meditation,
Falun Gong also teaches that homosexuality creates “bad karma” and is “comparable to organize…

La Belle Epoch | Los Angeles Review of Books
Arielle Gordon traces the rise of “The Epoch Times” through her grandmother’s text messages.

@arXiv_hepex_bot@mastoxiv.page
2025-08-01 09:04:51

Preliminary design and simulation for CEPC fast luminosity monitor detector based on 4H-SiC
Yanpeng Li, Meng Li, Xingrui Wang, Weimin Song, Xiyuan Zhang, Congcong Wang, Suyu Xiao, Haoyu Shi, Dou Wang, Philip Bambade, Xin Shi
https://arxiv.org/abs/2507.23368

Preliminary design and simulation for CEPC fast luminosity monitor detector based on 4H-SiC
The Circular Electron-Positron Collider (CEPC), a next-generation high-luminosity collider, employs a crab waist scheme to achieve ultrahigh $5 \times 10^{34} \, \text{cm}^{-2}\text{s}^{-1}$ luminosity at Higgs mode. Owing to the extremely small beam size, the luminosity is highly sensitive to the stability of final focusing elements, where mechanical vibrations (e.g. ground motion) may induce beam offsets and luminosity degradation. To address this, a luminosity-driven dithering system is impl…

@arXiv_csIR_bot@mastoxiv.page
2025-09-03 10:15:53

ERank: Fusing Supervised Fine-Tuning and Reinforcement Learning for Effective and Efficient Text Reranking
Yuzheng Cai, Yanzhao Zhang, Dingkun Long, Mingxin Li, Pengjun Xie, Weiguo Zheng
https://arxiv.org/abs/2509.00520

ERank: Fusing Supervised Fine-Tuning and Reinforcement Learning for Effective and Efficient Text Reranking
Text reranking models are a crucial component in modern systems like Retrieval-Augmented Generation, tasked with selecting the most relevant documents prior to generation. However, current Large Language Models (LLMs) powered rerankers often face a fundamental trade-off. On one hand, Supervised Fine-Tuning based pointwise methods that frame relevance as a binary classification task lack the necessary scoring discrimination, particularly for those built on reasoning LLMs. On the other hand, appr…

@arXiv_csCV_bot@mastoxiv.page
2025-08-06 10:32:50

VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
Yufei Xue, Yushi Huang, Jiawei Shao, Jun Zhang
https://arxiv.org/abs/2508.03351

VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
Post-training quantization (PTQ) has emerged as an effective approach for compressing large models and accelerating their inference without retraining. While PTQ has been extensively studied in the context of large language models (LLMs), its applicability to vision-language models (VLMs) remains underexplored. In this paper, we identify a modality discrepancy (i.e., limited text tokens vs. excessive and redundant vision tokens) of VLMs. However, existing Hessian-based LLM PTQ met…

@arXiv_csSD_bot@mastoxiv.page
2025-09-03 09:58:03

PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description
Zihao Zheng, Zeyu Xie, Xuenan Xu, Wen Wu, Chao Zhang, Mengyue Wu
https://arxiv.org/abs/2509.00683

PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description
Controllable text-to-audio generation (TTA) has attracted much attention recently. Although existing works can achieve fine-grained controllability based on timestamp information, sound event categories are limited to a fixed set. Moreover, since only simulated data is used for training, the generated audio quality and generalization performance on real data are limited. To tackle this issue, we propose PicoAudio2, improving temporal-controllable TTA via a new data processing pipeline and model…

@arXiv_csCR_bot@mastoxiv.page
2025-09-04 07:34:30

Secure Password Generator Based on Secure Pseudo-Random Number Generator
Abel C. H. Chen
https://arxiv.org/abs/2509.02578 https://arxiv.org/pdf/2509.02578

Secure Password Generator Based on Secure Pseudo-Random Number Generator
In recent years, numerous incidents involving the leakage of website accounts and text passwords (referred to as passwords) have raised significant concerns regarding the potential exposure of personal information. These events underscore the critical importance of both information security and password protection. While many of these breaches are attributable to vulnerabilities within website infrastructure, the strength and security of the passwords themselves also play a crucial role. Conseq…

@Mediagazer@mstdn.social
2025-06-16 18:55:45

Threads plans to let users hide text or images that spoil a piece of entertainment, blurring the text or image that has been marked as a spoiler (Alex Weprin/The Hollywood Reporter)
https://www.hollywoodreporter.com/business/digita…

Meta’s Threads Is Declaring War on Spoilers (Exclusive)
The text-based app from Meta is launching a tool that lets users hide spoilers in text or images, to encourage discussion of TV shows and films on the platform.

@arXiv_csCL_bot@mastoxiv.page
2025-09-05 09:53:11

SiLVERScore: Semantically-Aware Embeddings for Sign Language Generation Evaluation
Saki Imai, Mert İnan, Anthony Sicilia, Malihe Alikhani
https://arxiv.org/abs/2509.03791 http…

SiLVERScore: Semantically-Aware Embeddings for Sign Language Generation Evaluation
Evaluating sign language generation is often done through back-translation, where generated signs are first recognized back to text and then compared to a reference using text-based metrics. However, this two-step evaluation pipeline introduces ambiguity: it not only fails to capture the multimodal nature of sign language, such as facial expressions, spatial grammar, and prosody, but also makes it hard to pinpoint whether evaluation errors come from the sign generation model or the translation system…

@arXiv_nuclex_bot@mastoxiv.page
2025-08-29 08:59:11

First Observation of Solar Neutrino Interactions on $^{13}$C
SNO Collaboration: M. Abreu, A. Allega, M. R. Anderson, S. Andringa, D. M. Asner, D. J. Auty, A. Bacon, T. Baltazar, F. Barão, N. Barros, R. Bayes, E. W. Beier, A. Bialek, S. D. Biller, E. Caden, M. Chen, S. Cheng, B. Cleveland, D. Cookman, J. Corning, S. DeGraw, R. Dehghani, J. Deloye, M. M. Depatie, F. Di Lodovico, C. Dima, J. Dittmer, K. H. Dixon, M. S. Esmaeilian, E. Falk, N. Fatemighomi, R. Ford, A. Gaur, O. I. Go…

First Observation of Solar Neutrino Interactions on $^{13}$C
The SNO+ Collaboration reports the first evidence of $^{8}\text{B}$ solar neutrinos interacting on $^{13}\text{C}$ nuclei. The charged current interaction proceeds through $^{13}\text{C} + ν_e \rightarrow {}^{13}\text{N} + e^-$ which is followed, with a 10 minute half-life, by ${}^{13}\text{N} \rightarrow {}^{13}\text{C} + e^+ +ν_e .$ The detection strategy is based on the delayed coincidence between the electron and the positron. Evidence for the charged current signal is presented with a si…

@adulau@infosec.exchange
2025-07-08 08:57:00

VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification.
This paper presents VLAI, a transformer-based model that predicts software vulnerability severity levels directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage ahead of manual CVSS scoring. The model and dataset are open-source and integrated…

VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification
This paper presents VLAI, a transformer-based model that predicts software vulnerability severity levels directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage ahead of manual CVSS scoring. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.

@arXiv_csDB_bot@mastoxiv.page
2025-07-30 08:10:42

Benchmarking Filtered Approximate Nearest Neighbor Search Algorithms on Transformer-based Embedding Vectors
Patrick Iff, Paul Bruegger, Marcin Chrapek, Maciej Besta, Torsten Hoefler
https://arxiv.org/abs/2507.21989

Benchmarking Filtered Approximate Nearest Neighbor Search Algorithms on Transformer-based Embedding Vectors
Advances in embedding models for text, image, audio, and video drive progress across multiple domains, including retrieval-augmented generation, recommendation systems, vehicle/person reidentification, and face recognition. Many applications in these domains require an efficient method to retrieve items that are close to a given query in the embedding space while satisfying a filter condition based on the item's attributes, a problem known as Filtered Approximate Nearest Neighbor Search (FANNS)…

@arXiv_mathCO_bot@mastoxiv.page
2025-08-28 09:37:01

Cubic vertex-transitive graphs of girth seven
Maruša Lekše, Micael Toledo
https://arxiv.org/abs/2508.19880 https://arxiv.org/pdf/2508.19880

Cubic vertex-transitive graphs of girth seven
In this paper we classify cubic vertex-transitive graphs of girth $7$, based on their signature. Such a graph is either a truncation of an arc-transitive dihedral scheme on a $7$-regular graph, the skeleton of a rotary map of type $\{7,3\}$, a member of an infinite family of Cayley graphs, or is one of the generalised Petersen graphs $\text{Pet}(13,5)$, $\text{Pet}(15,4)$, $\text{Pet}(17,4)$ or the Coxeter graph. We show that for a cubic vertex-transitive graph $Γ$ of girth $7$, if eve…

@cark@social.tchncs.de
2025-06-24 19:59:08

On the podcast @… I learned that the #LLM Grok3 is currently still reasonably #based, but will probably soon be brought "into line" by its sponsor Musk.

Image of the Earth from space. Above it, the following text as an overlay:

Differences between the AI proposal and the current UN Charter (1945)

1. No veto right. All members rotate

2. Preventive defense is permitted under strict control of the Security Council

3. Sovereignty is restricted in cases of serious human rights violations

4. Conflict prevention through early-warning systems, diplomacy, and targeted sanctions

5. Commitment to climate targets, promoting a global minimum income and crisis funds…

@arXiv_csCV_bot@mastoxiv.page
2025-09-05 10:14:41

TEn-CATS: Text-Enriched Audio-Visual Video Parsing with Multi-Scale Category-Aware Temporal Graph
Yaru Chen, Faegheh Sardari, Peiliang Zhang, Ruohao Guo, Yang Xiang, Zhenbo Li, Wenwu Wang
https://arxiv.org/abs/2509.04086

TEn-CATS: Text-Enriched Audio-Visual Video Parsing with Multi-Scale Category-Aware Temporal Graph
The Audio-Visual Video Parsing (AVVP) task aims to identify event categories and their occurrence times in a given video with weakly supervised labels. Existing methods typically fall into two categories: (i) designing enhanced architectures based on attention mechanisms for better temporal modeling, and (ii) generating richer pseudo-labels to compensate for the absence of frame-level annotations. However, methods of the first type treat noisy segment-level pseudo labels as reliable supervision and the…

@arXiv_mathOC_bot@mastoxiv.page
2025-09-03 11:56:23

A Correspondence-Driven Approach for Bilevel Decision-making with Nonconvex Lower-Level Problems
Xiaotian Jiang, Jiaxiang Li, Mingyi Hong, Shuzhong Zhang
https://arxiv.org/abs/2509.01148

A Correspondence-Driven Approach for Bilevel Decision-making with Nonconvex Lower-Level Problems
We consider bilevel optimization problems with general nonconvex lower-level objectives and show that the classical hyperfunction-based formulation is unsettled, since the global minimizer of the lower-level problem is generally unattainable. To address this issue, we propose a correspondence-driven hyperfunction $ϕ^{\text{cd}}$. In this formulation, the follower is modeled not as a rational agent always attaining a global minimizer, but as an algorithm-based bounded rational agent whose decis…

@arXiv_eessIV_bot@mastoxiv.page
2025-07-02 09:13:39

Anti-aliasing Algorithm Based on Three-dimensional Display Image
Ziyang Liu, Xingchen Xiao, Yueyang Xu
https://arxiv.org/abs/2507.00527 https://

Anti-aliasing Algorithm Based on Three-dimensional Display Image
3D-display technology has been a promising emerging area with potential to be the core of next-generation display technology. When directly observing unprocessed images and text through a naked-eye 3D display device, severe distortion and jaggedness will be displayed, which will make the display effect much worse. In this work, we try to settle down such degradation with spatial and frequency processing, furthermore, we make efforts to extract degenerate function of columnar lens array thus fun…

@arXiv_csCV_bot@mastoxiv.page
2025-09-05 10:20:21

DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval
Ruohong Yang, Peng Hu, Yunfan Li, Xi Peng
https://arxiv.org/abs/2509.04193 https://arxiv.or…

DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval
Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images of the same category across diverse domains without relying on annotations. Existing UCIR methods, which align cross-domain features for the entire image, often struggle with the domain gap, as the object features critical for retrieval are frequently entangled with domain-specific styles. To address this challenge, we propose DUDE, a novel UCIR method building upon feature disentanglement. In brief, DUDE leverages a text-…

@arXiv_csCL_bot@mastoxiv.page
2025-09-01 09:01:43

Enhancing Robustness of Autoregressive Language Models against Orthographic Attacks via Pixel-based Approach
Han Yang, Jian Lan, Yihong Liu, Hinrich Schütze, Thomas Seidl
https://arxiv.org/abs/2508.21206

Enhancing Robustness of Autoregressive Language Models against Orthographic Attacks via Pixel-based Approach
Autoregressive language models are vulnerable to orthographic attacks, where input text is perturbed with characters from multilingual alphabets, leading to substantial performance degradation. This vulnerability primarily stems from the out-of-vocabulary issue inherent in subword tokenizers and their embeddings. To address this limitation, we propose a pixel-based generative language model that replaces the text-based embeddings with pixel-based representations by rendering words as individual…

@arXiv_csHC_bot@mastoxiv.page
2025-07-29 11:04:51

Beyond QWERTY: A pressure-based text input approach for XR that enables a touch-typing like experience
Fabian Rücker, Torben Storch
https://arxiv.org/abs/2507.20741 https…

Beyond QWERTY: A pressure-based text input approach for XR that enables a touch-typing like experience
Text input in extended reality (XR) applications remains inefficient and tedious. Most solutions are derived from the traditional keyboard layout, yet fail to translate its positive characteristics to the spatial digital realm. This limits the productive use of immersive technologies. In this work, we analyze physical keyboard input to identify key characteristics that facilitate its comfort, touch typing and high typing speeds. Building on these findings, we propose a novel pressure-based tex…

@arXiv_csSE_bot@mastoxiv.page
2025-07-04 08:46:21

Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection
Martin Obaidi, Marc Herrmann, Jil Klünder, Kurt Schneider
https://arxiv.org/abs/2507.02137

Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection
Software development relies heavily on text-based communication, making sentiment analysis a valuable tool for understanding team dynamics and supporting trustworthy AI-driven analytics in requirements engineering. However, existing sentiment analysis tools often perform inconsistently across datasets from different platforms, due to variations in communication style and content. In this study, we analyze linguistic and statistical features of 10 developer communication datasets from five pla…

@arXiv_csSD_bot@mastoxiv.page
2025-07-04 08:28:31

JoyTTS: LLM-based Spoken Chatbot With Voice Cloning
Fangru Zhou, Jun Zhao, Guoxin Wang
https://arxiv.org/abs/2507.02380 https://arxiv…

JoyTTS: LLM-based Spoken Chatbot With Voice Cloning
JoyTTS is an end-to-end spoken chatbot that combines large language models (LLM) with text-to-speech (TTS) technology, featuring voice cloning capabilities. This project is built upon the open-source MiniCPM-o and CosyVoice2 models and trained on 2000 hours of conversational data. We have also provided the complete training code to facilitate further development and optimization by the community. On the testing machine seed-tts-zh, it achieves a SS (speaker similarity) score of 0.73 and a WER (…

@arXiv_eessAS_bot@mastoxiv.page
2025-09-04 09:18:41

An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment
Tien-Hong Lo, Szu-Yu Chen, Yao-Ting Sung, Berlin Chen
https://arxiv.org/abs/2509.03372

An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment
A recent line of research on automated speaking assessment (ASA) has benefited from self-supervised learning (SSL) representations, which capture rich acoustic and linguistic patterns in non-native speech without underlying assumptions of feature curation. However, speech-based SSL models capture acoustic-related traits but overlook linguistic content, while text-based SSL models rely on ASR output and fail to encode prosodic nuances. Moreover, most prior arts treat proficiency levels as nomina…

@arXiv_csAI_bot@mastoxiv.page
2025-07-01 11:25:33

PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red
Zihao Liu, Xinhang Sui, Yueran Song, Siwen Wang
https://arxiv.org/abs/2506.23689

PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red
We introduce PokéAI, the first text-based, multi-agent large language model (LLM) framework designed to autonomously play and progress through Pokémon Red. Our system consists of three specialized agents (Planning, Execution, and Critique), each with its own memory bank, role, and skill set. The Planning Agent functions as the central brain, generating tasks to progress through the game. These tasks are then delegated to the Execution Agent, which carries them out within the game environment. Up…

@arXiv_csCL_bot@mastoxiv.page
2025-09-03 14:30:23

CMRAG: Co-modality-based document retrieval and visual question answering
Wang Chen, Guanqiang Qi, Weikang Li, Yang Li
https://arxiv.org/abs/2509.02123 https://

CMRAG: Co-modality-based document retrieval and visual question answering
Retrieval-Augmented Generation (RAG) has become a core paradigm in document question answering tasks. However, existing methods have limitations when dealing with multimodal documents: one category of methods relies on layout analysis and text extraction, which can only utilize explicit text information and struggle to capture images or unstructured content; the other category treats document segmentation as visual input and directly passes it to visual language models (VLMs) for processing, ye…

@Techmeme@techhub.social
2025-08-26 11:10:47

Spotify launches a DM feature to let users share audio and send text-based messages, available to mobile users over 16 years old in "select markets" this week (Jess Weatherbed/The Verge)
https://www.theverge.com/news/765771/spotify-messages-dms-audio…

Spotify is adding DMs
Spotify’s direct messaging feature allows Free and Premium users to share music, podcast, and audiobook recommendations with each other without leaving the app.

@arXiv_csCV_bot@mastoxiv.page
2025-09-03 15:02:53

TeRA: Rethinking Text-driven Realistic 3D Avatar Generation
Yanwen Wang, Yiyu Zhuang, Jiawei Zhang, Li Wang, Yifei Zeng, Xun Cao, Xinxin Zuo, Hao Zhu
https://arxiv.org/abs/2509.02466

TeRA: Rethinking Text-driven Realistic 3D Avatar Generation
In this paper, we rethink text-to-avatar generative models by proposing TeRA, a more efficient and effective framework than the previous SDS-based models and general large 3D generative models. Our approach employs a two-stage training strategy for learning a native 3D avatar generative model. Initially, we distill a decoder to derive a structured latent space from a large human reconstruction model. Subsequently, a text-controlled latent diffusion model is trained to generate photorealistic 3D…

@arXiv_csCR_bot@mastoxiv.page
2025-08-27 09:56:53

The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization
Stephen Meisenbacher, Alexandra Klymenko, Andreea-Elena Bodea, Florian Matthes
https://arxiv.org/abs/2508.18976

The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization
Differentially private text sanitization refers to the process of privatizing texts under the framework of Differential Privacy (DP), providing provable privacy guarantees while also empirically defending against adversaries seeking to harm privacy. Despite their simplicity, DP text sanitization methods operating at the word level exhibit a number of shortcomings, among them the tendency to leave contextual clues from the original texts due to randomization during sanitization –…

@arXiv_csSE_bot@mastoxiv.page
2025-09-03 07:48:42

LLM-based Triplet Extraction for Automated Ontology Generation in Software Engineering Standards
Songhui Yue
https://arxiv.org/abs/2509.00140 https://arxiv…

LLM-based Triplet Extraction for Automated Ontology Generation in Software Engineering Standards
Ontologies have supported knowledge representation and whitebox reasoning for decades; thus, the automated ontology generation (AOG) plays a crucial role in scaling their use. Software engineering standards (SES) consist of long, unstructured text (with high noise) and paragraphs with domain-specific terms. In this setting, relation triple extraction (RTE), together with term extraction, constitutes the first stage toward AOG. This work proposes an open-source large language model (LLM)-assiste…

@arXiv_csCL_bot@mastoxiv.page
2025-09-05 10:06:41

Decoding the Poetic Language of Emotion in Korean Modern Poetry: Insights from a Human-Labeled Dataset and AI Modeling
Iro Lim, Haein Ji, Byungjun Kim
https://arxiv.org/abs/2509.03932

Decoding the Poetic Language of Emotion in Korean Modern Poetry: Insights from a Human-Labeled Dataset and AI Modeling
This study introduces KPoEM (Korean Poetry Emotion Mapping), a novel dataset for computational emotion analysis in modern Korean poetry. Despite remarkable progress in text-based emotion classification using large language models, poetry, particularly Korean poetry, remains underexplored due to its figurative language and cultural specificity. We built a multi-label emotion dataset of 7,662 entries, including 7,007 line-level entries from 483 poems and 615 work-level entries, annotated with 44 f…

@arXiv_eessAS_bot@mastoxiv.page
2025-09-04 09:03:51

A Study on Zero-Shot Non-Intrusive Speech Intelligibility for Hearing Aids Using Large Language Models
Ryandhimas E. Zezario, Dyah A. M. G. Wisnu, Hsin-Min Wang, Yu Tsao
https://arxiv.org/abs/2509.03021

A Study on Zero-Shot Non-Intrusive Speech Intelligibility for Hearing Aids Using Large Language Models
This work focuses on zero-shot non-intrusive speech assessment for hearing aids (HA) using large language models (LLMs). Specifically, we introduce GPT-Whisper-HA, an extension of GPT-Whisper, a zero-shot non-intrusive speech assessment model based on LLMs. GPT-Whisper-HA is designed for speech assessment for HA, incorporating MSBG hearing loss and NAL-R simulations to process audio input based on each individual's audiogram, two automatic speech recognition (ASR) modules for audio-to-text repr…

@arXiv_csCL_bot@mastoxiv.page
2025-08-04 10:00:20

MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations
Qiyao Xue, Yuchen Dou, Ryan Shi, Xiang Lorraine Li, Wei Gao
https://arxiv.org/abs/2508.00760

MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations
Hate speech detection on Chinese social networks presents distinct challenges, particularly due to the widespread use of cloaking techniques designed to evade conventional text-based detection systems. Although large language models (LLMs) have recently improved hate speech detection capabilities, the majority of existing work has concentrated on English datasets, with limited attention given to multimodal strategies in the Chinese context. In this study, we propose MMBERT, a novel BERT-based m…

@arXiv_csIR_bot@mastoxiv.page
2025-07-28 08:31:21

Agent0: Leveraging LLM Agents to Discover Multi-value Features from Text for Enhanced Recommendations
Blaž Škrlj, Benoît Guilleminot, Andraž Tori
https://arxiv.org/abs/2507.18993

Agent0: Leveraging LLM Agents to Discover Multi-value Features from Text for Enhanced Recommendations
Large language models (LLMs) and their associated agent-based frameworks have significantly advanced automated information extraction, a critical component of modern recommender systems. While these multitask frameworks are widely used in code generation, their application in data-centric research is still largely untapped. This paper presents Agent0, an LLM-driven, agent-based system designed to automate information extraction and feature construction from raw, unstructured text. Categorical f…

@arXiv_csSD_bot@mastoxiv.page
2025-08-26 08:26:46

RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer
Neeraj Matiyali, Siddharth Srivastava, Gaurav Sharma
https://arxiv.org/abs/2508.17031 https:/…

RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer
We propose a method for the task of text-conditioned speech insertion, i.e. inserting a speech sample in an input speech sample, conditioned on the corresponding complete text transcript. An example use case of the task would be to update the speech audio when corrections are done on the corresponding text transcript. The proposed method follows a transformer-based non-autoregressive approach that allows speech insertions of variable lengths, which are dynamically determined during inference, b…

@arXiv_csAI_bot@mastoxiv.page
2025-07-01 11:13:53

Assessing GPTZero's Accuracy in Identifying AI vs. Human-Written Essays
Selin Dik, Osman Erdem, Mehmet Dik
https://arxiv.org/abs/2506.23517 https://

Assessing GPTZero's Accuracy in Identifying AI vs. Human-Written Essays
As the use of AI tools by students has become more prevalent, instructors have started using AI detection tools like GPTZero and QuillBot to detect AI written text. However, the reliability of these detectors remains uncertain. In our study, we focused mostly on the success rate of GPTZero, the most-used AI detector, in identifying AI-generated texts based on different lengths of randomly submitted essays: short (40-100 word count), medium (100-350 word count), and long (350-800 word count). We…

@arXiv_csHC_bot@mastoxiv.page
2025-09-03 11:50:43

Dissecting Atomic Facts: Visual Analytics for Improving Fact Annotations in Language Model Evaluation
Manuel Schmidt, Daniel A. Keim, Frederik L. Dennig
https://arxiv.org/abs/2509.01460

Dissecting Atomic Facts: Visual Analytics for Improving Fact Annotations in Language Model Evaluation
Factuality evaluation of large language model (LLM) outputs requires decomposing text into discrete "atomic" facts. However, existing definitions of atomicity are underspecified, with empirical results showing high disagreement among annotators, both human and model-based, due to unresolved ambiguity in fact decomposition. We present a visual analytics concept to expose and analyze annotation inconsistencies in fact extraction. By visualizing semantic alignment, granularity and referential depe…

@arXiv_csCL_bot@mastoxiv.page
2025-09-05 10:21:31

Explicit and Implicit Data Augmentation for Social Event Detection
Congbo Ma, Yuxia Wang, Jia Wu, Jian Yang, Jing Du, Zitai Qiu, Qing Li, Hu Wang, Preslav Nakov
https://arxiv.org/abs/2509.04202

Explicit and Implicit Data Augmentation for Social Event Detection
Social event detection involves identifying and categorizing important events from social media, which relies on labeled data, but annotation is costly and labor-intensive. To address this problem, we propose Augmentation framework for Social Event Detection (SED-Aug), a plug-and-play dual augmentation framework, which combines explicit text-based and implicit feature-space augmentation to enhance data diversity and model robustness. The explicit augmentation utilizes large language models to e…

@arXiv_csSD_bot@mastoxiv.page
2025-08-04 09:13:11

AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation
Le Wang, Jun Wang, Feng Deng, Chen Zhang, Kun Gai, Di Zhang
https://arxiv.org/abs/2508.00733

AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation
We present AudioGen-Omni - a unified approach based on multimodal diffusion transformers (MMDit), capable of generating high-fidelity audio, speech, and songs coherently synchronized with the input video. AudioGen-Omni introduces a novel joint training paradigm that seamlessly integrates large-scale video-text-audio corpora, enabling a model capable of generating semantically rich, acoustically diverse audio conditioned on multimodal inputs and adaptable to a wide range of audio generation task…

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 10:09:10

Capsule Network-Based Semantic Intent Modeling for Human-Computer Interaction
Shixiao Wang, Yifan Zhuang, Runsheng Zhang, Zhijun Song
https://arxiv.org/abs/2507.00540

Capsule Network-Based Semantic Intent Modeling for Human-Computer Interaction
This paper proposes a user semantic intent modeling algorithm based on Capsule Networks to address the problem of insufficient accuracy in intent recognition for human-computer interaction. The method represents semantic features in input text through a vectorized capsule structure. It uses a dynamic routing mechanism to transfer information across multiple capsule layers. This helps capture hierarchical relationships and part-whole structures between semantic entities more effectively. The mod…

@arXiv_csCV_bot@mastoxiv.page
2025-08-29 10:31:11

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Yibin Wang, Zhimin Li, Yuhang Zang, Yujie Zhou, Jiazi Bu, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang
https://arxiv.org/abs/2508.20751

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Recent advancements highlight the importance of GRPO-based reinforcement learning methods and benchmarking in enhancing text-to-image (T2I) generation. However, current methods using pointwise reward models (RM) for scoring generated images are susceptible to reward hacking. We reveal that this happens when minimal score differences between images are amplified after normalization, creating illusory advantages that drive the model to over-optimize for trivial gains, ultimately destabilizing the…

@arXiv_eessAS_bot@mastoxiv.page
2025-07-03 08:38:30

SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
Cheng Zhuangfei, Zhang Guangyan, Tu Zehai, Song Yangyang, Mao Shuiyang, Jiao Xiaoqi, Li Jingyu, Guo Yiwen, Wu Jiasong
https://arxiv.org/abs/2507.01348

SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
Foreign accent conversion (FAC) in speech processing remains a challenging task. Building on the remarkable success of large language models (LLMs) in Text-to-Speech (TTS) tasks, this study investigates the adaptation of LLM-based techniques for FAC, which we term SpeechAccentLLM. At the core of this framework, we introduce SpeechCodeVAE, the first model to integrate connectionist temporal classification (CTC) directly into codebook discretization for speech content tokenization. This novel arc…

@arXiv_csHC_bot@mastoxiv.page
2025-07-04 07:42:01

PAL: Designing Conversational Agents as Scalable, Cooperative Patient Simulators for Palliative-Care Training
Neil K. R. Sehgal, Hita Kambhamettu, Allen Chang, Andrew Zhu, Lyle Ungar, Sharath Chandra Guntuku
https://arxiv.org/abs/2507.02122

PAL: Designing Conversational Agents as Scalable, Cooperative Patient Simulators for Palliative-Care Training
Effective communication in serious illness and palliative care is essential but often under-taught due to limited access to training resources like standardized patients. We present PAL (Palliative Assisted Learning-bot), a conversational system that simulates emotionally nuanced patient interactions and delivers structured feedback grounded in an existing empathy-based framework. PAL supports text and voice modalities and is designed to scaffold clinical skill-building through repeated, low-co…

@arXiv_csSD_bot@mastoxiv.page
2025-07-02 08:56:30

AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences
Minoru Kishi, Ryosuke Sakai, Shinnosuke Takamichi, Yusuke Kanamori, Yuki Okamoto
https://arxiv.org/abs/2507.00475

AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences
We propose a novel objective evaluation metric for synthesized audio in text-to-audio (TTA), aiming to improve the performance of TTA models. In TTA, subjective evaluation of the synthesized sound is important, but its implementation requires monetary costs. Therefore, objective evaluation metrics such as mel-cepstral distortion are used, but the correlation between these objective metrics and subjective evaluation values is weak. Our proposed objective evaluation metric, AudioBERTScore, calculates …

@arXiv_csIR_bot@mastoxiv.page
2025-06-17 09:40:39

T-TExTS (Teaching Text Expansion for Teacher Scaffolding): Enhancing Text Selection in High School Literature through Knowledge Graph-Based Recommendation
Nirmal Gelal, Chloe Snow, Ambyr Rios, Hande Küçük McGinty
https://arxiv.org/abs/2506.12075

T-TExTS (Teaching Text Expansion for Teacher Scaffolding): Enhancing Text Selection in High School Literature through Knowledge Graph-Based Recommendation
The implementation of transformational pedagogy in secondary education classrooms requires a broad multiliteracy approach. Due to limited planning time and resources, high school English Literature teachers often struggle to curate diverse, thematically aligned literature text sets. This study addresses the critical need for a tool that provides scaffolds for novice educators in selecting literature texts that are diverse -- in terms of genre, theme, subtheme, and author -- yet similar in conte…

@arXiv_csCV_bot@mastoxiv.page
2025-07-04 10:20:51

FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
Yuxuan Wang, Tianwei Cao, Huayu Zhang, Zhongjiang He, Kongming Liang, Zhanyu Ma
https://arxiv.org/abs/2507.02714

FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
Image generation has achieved remarkable progress with the development of large-scale text-to-image models, especially diffusion-based models. However, generating human images with plausible details, such as faces or hands, remains challenging due to insufficient supervision of local regions during training. To address this issue, we propose FairHuman, a multi-objective fine-tuning approach designed to enhance both global and local generation quality fairly. Specifically, we first construct thr…

@arXiv_csCV_bot@mastoxiv.page
2025-08-04 13:23:15

Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/4]:
- CPCL: Cross-Modal Prototypical Contrastive Learning for Weakly Supervised Text-based Person Retri...
Xinpeng Zhao, Yanwei Zheng, Chuanlin Lan, Xiaowei Zhang, Bowen Huang, Jibin Yang, Dongxiao Yu

@arXiv_csHC_bot@mastoxiv.page
2025-07-02 09:04:50

Examining the Social Communication and Community Engagement of Autistic Adults through an Asynchronous Focus Group
Blade Frisch, Betts Peters, Keith Vertanen
https://arxiv.org/abs/2507.00202

Examining the Social Communication and Community Engagement of Autistic Adults through an Asynchronous Focus Group
Purpose: Little research has explored the communication needs of autistic adults and how their needs differ from those of other disabled populations. Augmentative and Alternative Communication (AAC) can support these communication needs, but more guidance is needed on how to design AAC to support this population. Materials and Methods: We conducted an online, asynchronous, text-based focus group with five autistic adults to explore their social communication and community engagement and how A…

@arXiv_csCL_bot@mastoxiv.page
2025-08-29 10:19:31

Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models
Ruiyi Yan, Yugo Murawaki
https://arxiv.org/abs/2508.20718 https://

Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models
Large language models have significantly enhanced the capacities and efficiency of text generation. On the one hand, they have improved the quality of text-based steganography. On the other hand, they have also underscored the importance of watermarking as a safeguard against malicious misuse. In this study, we focus on tokenization inconsistency (TI) between Alice and Bob in steganography and watermarking, where TI can undermine robustness. Our investigation reveals that the problematic tokens…

@arXiv_csCL_bot@mastoxiv.page
2025-08-29 10:16:11

Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search
Zeyu Xiong, Yixuan Nan, Li Gao, Hengzhu Tang, Shuaiqiang Wang, Junfeng Wang, Dawei Yin
https://arxiv.org/abs/2508.20559

Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search
In the dynamic landscape of large-scale web search, Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query, which is essential for improving user engagement and facilitating rapid decision-making. Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications. However, these approaches suffer from two key limitations: 1)…

@arXiv_csCL_bot@mastoxiv.page
2025-08-01 10:14:21

MUST-RAG: MUSical Text Question Answering with Retrieval Augmented Generation
Daeyong Kwon, SeungHeon Doh, Juhan Nam
https://arxiv.org/abs/2507.23334 https://

MUST-RAG: MUSical Text Question Answering with Retrieval Augmented Generation
Recent advancements in large language models (LLMs) have demonstrated remarkable capabilities across diverse domains. While they exhibit strong zero-shot performance on various tasks, LLMs' effectiveness in music-related applications remains limited due to the relatively small proportion of music-specific knowledge in their training data. To address this limitation, we propose MusT-RAG, a comprehensive framework based on Retrieval Augmented Generation (RAG) to adapt general-purpose LLMs for tex…

@arXiv_csCV_bot@mastoxiv.page
2025-07-29 12:15:51

Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM
Shen Li, Liuyi Yao, Wujia Niu, Lan Zhang, Yaliang Li
https://arxiv.org/abs/2507.20994 h…

Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM
Large visual-language models (LVLMs) integrate aligned large language models (LLMs) with visual modules to process multimodal inputs. However, the safety mechanisms developed for text-based LLMs do not naturally extend to visual modalities, leaving LVLMs vulnerable to harmful image inputs. To address this cross-modal safety gap, we introduce security tensors - trainable input vectors applied during inference through either the textual or visual modality. These tensors transfer textual safety al…

@arXiv_csCL_bot@mastoxiv.page
2025-09-01 09:45:12

AllSummedUp: un framework open-source pour comparer les metriques d'evaluation de resume
Tanguy Herserant, Vincent Guigue
https://arxiv.org/abs/2508.21389 https://

AllSummedUp: un framework open-source pour comparer les metriques d'evaluation de resume
This paper investigates reproducibility challenges in automatic text summarization evaluation. Based on experiments conducted across six representative metrics ranging from classical approaches like ROUGE to recent LLM-based methods (G-Eval, SEval-Ex), we highlight significant discrepancies between reported performances in the literature and those observed in our experimental setting. We introduce a unified, open-source framework, applied to the SummEval dataset and designed to support fair and…

@arXiv_csCV_bot@mastoxiv.page
2025-08-26 12:31:56

Sealing The Backdoor: Unlearning Adversarial Text Triggers In Diffusion Models Using Knowledge Distillation
Ashwath Vaithinathan Aravindan, Abha Jha, Matthew Salaway, Atharva Sandeep Bhide, Duygu Nur Yaldiz
https://arxiv.org/abs/2508.18235

Sealing The Backdoor: Unlearning Adversarial Text Triggers In Diffusion Models Using Knowledge Distillation
Text-to-image diffusion models have revolutionized generative AI, but their vulnerability to backdoor attacks poses significant security risks. Adversaries can inject imperceptible textual triggers into training data, causing models to generate manipulated outputs. Although text-based backdoor defenses in classification models are well-explored, generative models lack effective mitigation techniques. We address this by selectively erasing the model's learned associations between adversa…

@arXiv_csCL_bot@mastoxiv.page
2025-07-21 09:44:40

An Enhanced Model-based Approach for Short Text Clustering
Enhao Cheng, Shoujia Zhang, Jianhua Yin, Xuemeng Song, Tian Gan, Liqiang Nie
https://arxiv.org/abs/2507.13793

An Enhanced Model-based Approach for Short Text Clustering
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches. This task is inherently challenging due to the sparse, large-scale, and high-dimensional characteristics of the short text data. Furthermore, the computational intensity required by representation learning significantly increa…

@arXiv_csCV_bot@mastoxiv.page
2025-07-23 10:31:22

Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation
Yiguo He, Junjie Zhu, Yiying Li, Xiaoyu Zhang, Chunping Qiu, Jun Wang, Qiangjuan Huang, Ke Yang
https://arxiv.org/abs/2507.16716

Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation
The application of Vision-language foundation models (VLFMs) to remote sensing (RS) imagery has garnered significant attention due to their superior capability in various downstream tasks. A key challenge lies in the scarcity of high-quality, large-scale, image-text paired training data. Recently, several works introduced extensive image-text datasets for RS and trained their VLFMs. However, due to the rudimentary methods used for generating captions, the quality of datasets is suboptimal, requ…

@arXiv_csCL_bot@mastoxiv.page
2025-09-01 09:49:32

Reasoning-Intensive Regression
Diane Tchuindjo, Omar Khattab
https://arxiv.org/abs/2508.21762 https://arxiv.org/pdf/2508.21762

Reasoning-Intensive Regression
AI researchers and practitioners increasingly apply large language models (LLMs) to what we call reasoning-intensive regression (RiR), i.e. deducing subtle numerical properties from text. Unlike standard language regression tasks, e.g. for sentiment or similarity, RiR often appears instead in ad-hoc problems like rubric-based scoring or domain-specific retrieval, where much deeper analysis of text is required while only limited task-specific training data and computation are available. We cast …

@arXiv_csCL_bot@mastoxiv.page
2025-08-01 10:20:21

Arabic Hate Speech Identification and Masking in Social Media using Deep Learning Models and Pre-trained Models Fine-tuning
Salam Thabet Doghmash, Motaz Saad
https://arxiv.org/abs/2507.23661

Arabic Hate Speech Identification and Masking in Social Media using Deep Learning Models and Pre-trained Models Fine-tuning
Hate speech identification in social media has become an increasingly important issue in recent years. In this research, we address two problems: 1) detecting hate speech in Arabic text, and 2) cleaning a given text of hate speech. Cleaning here means replacing each bad word with stars, with the number of stars matching the number of letters in the word. Regarding the first problem, we conduct several experiments using deep learning models and transformers to determine the best model in terms of the F1 score…

@arXiv_csCL_bot@mastoxiv.page
2025-08-29 10:21:21

GUARD: Glocal Uncertainty-Aware Robust Decoding for Effective and Efficient Open-Ended Text Generation
Yuanhao Ding, Esteban Garces Arias, Meimingwei Li, Julian Rodemann, Matthias Aßenmacher, Danlu Chen, Gaojuan Fan, Christian Heumann, Chongsheng Zhang
https://arxiv.org/abs/2508.20757

GUARD: Glocal Uncertainty-Aware Robust Decoding for Effective and Efficient Open-Ended Text Generation
Open-ended text generation faces a critical challenge: balancing coherence with diversity in LLM outputs. While contrastive search-based decoding strategies have emerged to address this trade-off, their practical utility is often limited by hyperparameter dependence and high computational costs. We introduce GUARD, a self-adaptive decoding method that effectively balances these competing objectives through a novel "Glocal" uncertainty-driven framework. GUARD combines global entropy estimates wi…

@arXiv_csCL_bot@mastoxiv.page
2025-09-03 14:37:23

An Ensemble Classification Approach in A Multi-Layered Large Language Model Framework for Disease Prediction
Ali Hamdi, Malak Mohamed, Rokaia Emad, Khaled Shaban
https://arxiv.org/abs/2509.02446

An Ensemble Classification Approach in A Multi-Layered Large Language Model Framework for Disease Prediction
Social telehealth has made remarkable progress in healthcare by allowing patients to post symptoms and participate in medical consultations remotely. Users frequently post symptoms on social media and online health platforms, creating a huge repository of medical data that can be leveraged for disease classification. Large language models (LLMs) such as LLAMA3 and GPT-3.5, along with transformer-based models like BERT, have demonstrated strong capabilities in processing complex medical text. In…

@arXiv_csCV_bot@mastoxiv.page
2025-08-18 09:54:00

Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models
Erez Meoded
https://arxiv.org/abs/2508.11499 https://arxiv.org/pdf/25…

Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models
Historical handwritten text recognition (HTR) is essential for unlocking the cultural and scholarly value of archival documents, yet digitization is often hindered by scarce transcriptions, linguistic variation, and highly diverse handwriting styles. In this study, we apply TrOCR, a state-of-the-art transformer-based HTR model, to 16th-century Latin manuscripts authored by Rudolf Gwalther. We investigate targeted image preprocessing and a broad suite of data augmentation techniques, introducing…

@arXiv_csCL_bot@mastoxiv.page
2025-07-17 10:08:30

Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding
Feng Xiao, Jicong Fan
https://arxiv.org/abs/2507.12295 https://

Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding
Text anomaly detection is a critical task in natural language processing (NLP), with applications including fraud detection, misinformation identification, spam detection, and content moderation. Despite significant advances in large language models (LLMs) and anomaly detection algorithms, the absence of standardized and comprehensive benchmarks for evaluating the existing anomaly detection methods on text data limits rigorous comparison and development of innovative approaches. This work pe…

@arXiv_csCL_bot@mastoxiv.page
2025-07-28 09:57:21

Detection of Adverse Drug Events in Dutch clinical free text documents using Transformer Models: benchmark study
Rachel M. Murphy (Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, The Netherlands), Nishant Mishra (Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, The Netherlands), Nicolette F. de Keizer (Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, T…

Detection of Adverse Drug Events in Dutch clinical free text documents using Transformer Models: benchmark study
In this study, we set a benchmark for adverse drug event (ADE) detection in Dutch clinical free text documents using several transformer models, clinical scenarios and fit-for-purpose performance measures. We trained a Bidirectional Long Short-Term Memory (Bi-LSTM) model and four transformer-based Dutch and/or multilingual encoder models (BERTje, RobBERT, MedRoBERTa.nl, and NuNER) for the tasks of named entity recognition (NER) and relation classification (RC) using 102 richly annotated Dutch I…

@arXiv_csCL_bot@mastoxiv.page
2025-08-22 10:02:21

Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models
Tobias Schreieder, Tim Schopf, Michael Färber
https://arxiv.org/abs/2508.15396

Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models
The increasing adoption of large language models (LLMs) has been accompanied by growing concerns regarding their reliability and trustworthiness. As a result, a growing body of research focuses on evidence-based text generation with LLMs, aiming to link model outputs to supporting evidence to ensure traceability and verifiability. However, the field is fragmented due to inconsistent terminology, isolated evaluation practices, and a lack of unified benchmarks. To bridge this gap, we systematical…

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 10:22:50

ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering
Alexander Hoyle, Lorena Calvo-Bartolomé, Jordan Boyd-Graber, Philip Resnik
https://arxiv.org/abs/2507.00828

ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering
Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this p…

@arXiv_csCL_bot@mastoxiv.page
2025-09-01 08:06:03

TrInk: Ink Generation with Transformer Network
Zezhong Jin, Shubhang Desai, Xu Chen, Biyi Fang, Zhuoyi Huang, Zhe Li, Chong-Xin Gan, Xiao Tu, Man-Wai Mak, Yan Lu, Shujie Liu
https://arxiv.org/abs/2508.21098

TrInk: Ink Generation with Transformer Network
In this paper, we propose TrInk, a Transformer-based model for ink generation, which effectively captures global dependencies. To better facilitate the alignment between the input text and generated stroke points, we introduce scaled positional embeddings and a Gaussian memory mask in the cross-attention module. Additionally, we design both subjective and objective evaluation pipelines to comprehensively assess the legibility and style consistency of the generated handwriting. Experiments demon…

@arXiv_csCL_bot@mastoxiv.page
2025-07-25 10:06:42

Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language
Md Obyedullahil Mamun, Md Adyelullahil Mamun, Arif Ahmad, Md. Imran Hossain Emu
https://arxiv.org/abs/2507.18448

Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language
Punctuation restoration enhances the readability of text and is critical for post-processing tasks in Automatic Speech Recognition (ASR), especially for low-resource languages like Bangla. In this study, we explore the application of transformer-based models, specifically XLM-RoBERTa-large, to automatically restore punctuation in unpunctuated Bangla text. We focus on predicting four punctuation marks: period, comma, question mark, and exclamation mark across diverse text domains. To address the…

@arXiv_csCL_bot@mastoxiv.page
2025-07-28 09:51:01

AutoPCR: Automated Phenotype Concept Recognition by Prompting
Yicheng Tao, Yuanhao Huang, Jie Liu
https://arxiv.org/abs/2507.19315 https://arxiv.org/pdf/25…

AutoPCR: Automated Phenotype Concept Recognition by Prompting
Phenotype concept recognition (CR) is a fundamental task in biomedical text mining, enabling applications such as clinical diagnostics and knowledge graph construction. However, existing methods often require ontology-specific training and struggle to generalize across diverse text types and evolving biomedical terminology. We present AutoPCR, a prompt-based phenotype CR method that does not require ontology-specific training. AutoPCR performs CR in three stages: entity extraction using a hybri…

@arXiv_csCL_bot@mastoxiv.page
2025-08-21 08:07:29

Confidence Estimation for Text-to-SQL in Large Language Models
Sepideh Entezari Maleki, Mohammadreza Pourreza, Davood Rafiei
https://arxiv.org/abs/2508.14056 https://

Confidence Estimation for Text-to-SQL in Large Language Models
Confidence estimation for text-to-SQL aims to assess the reliability of model-generated SQL queries without having access to gold answers. We study this problem in the context of large language models (LLMs), where access to model weights and gradients is often constrained. We explore both black-box and white-box confidence estimation strategies, evaluating their effectiveness on cross-domain text-to-SQL benchmarks. Our evaluation highlights the superior performance of consistency-based methods…

@arXiv_csCL_bot@mastoxiv.page
2025-09-01 07:39:52

Granite Embedding R2 Models
Parul Awasthy, Aashka Trivedi, Yulong Li, Meet Doshi, Riyaz Bhat, Vignesh P, Vishwajeet Kumar, Yushu Yang, Bhavani Iyer, Abraham Daniels, Rudra Murthy, Ken Barker, Martin Franz, Madison Lee, Todd Ward, Salim Roukos, David Cox, Luis Lastras, Jaydeep Sen, Radu Florian
https://arxiv.org/abs/2508.21085

Granite Embedding R2 Models
We introduce the Granite Embedding R2 models, a comprehensive family of high-performance English encoder-based embedding models engineered for enterprise-scale dense retrieval applications. Building upon our first-generation release, these models deliver substantial improvements, including 16x expanded context length (8,192 tokens), state-of-the-art performance across diverse retrieval domains - text, code, long-document search, multi-turn conversational, and tabular data - and measurable speed…

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 07:31:39

Fair Play in the Newsroom: Actor-Based Filtering Gender Discrimination in Text Corpora
Stefanie Urchs, Veronika Thurner, Matthias Aßenmacher, Christian Heumann, Stephanie Thiemichen
https://arxiv.org/abs/2508.13169

Fair Play in the Newsroom: Actor-Based Filtering Gender Discrimination in Text Corpora
Large language models are increasingly shaping digital communication, yet their outputs often reflect structural gender imbalances that originate from their training data. This paper presents an extended actor-level pipeline for detecting and mitigating gender discrimination in large-scale text corpora. Building on prior work in discourse-aware fairness analysis, we introduce new actor-level metrics that capture asymmetries in sentiment, syntactic agency, and quotation styles. The pipeline supp…
