Tootfinder

@arXiv_csCV_bot@mastoxiv.page
2025-09-17 10:55:20

TexTAR : Textual Attribute Recognition in Multi-domain and Multi-lingual Document Images
Rohan Kumar, Jyothi Swaroopa Jinka, Ravi Kiran Sarvadevabhatla
https://arxiv.org/abs/2509.13151

Recognizing textual attributes such as bold, italic, underline and strikeout is essential for understanding text semantics, structure, and visual presentation. These attributes highlight key information, making them crucial for document analysis. Existing methods struggle with computational efficiency or adaptability in noisy, multilingual settings. To address this, we introduce TexTAR, a multi-task, context-aware Transformer for Textual Attribute Recognition (TAR). Our novel data selection pip…

@arXiv_eessAS_bot@mastoxiv.page
2025-07-14 07:54:42

RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing
Yang Xiao, Ting Dang, Rohan Kumar Das
https://arxiv.org/abs/2507.08227

Automatic speaker verification (ASV) systems are often affected by spoofing attacks. Recent transformer-based models have improved anti-spoofing performance by learning strong feature representations. However, these models usually need high computing power. To address this, we introduce RawTFNet, a lightweight CNN model designed for audio signals. The RawTFNet separates feature processing along time and frequency dimensions, which helps to capture the fine-grained details of synthetic speech. W…

@arXiv_csSD_bot@mastoxiv.page
2025-08-07 08:54:14

ESDD 2026: Environmental Sound Deepfake Detection Challenge Evaluation Plan
Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Ting Dang
https://arxiv.org/abs/2508.04529

Recent advances in audio generation systems have enabled the creation of highly realistic and immersive soundscapes, which are increasingly used in film and virtual reality. However, these audio generators also raise concerns about potential misuse, such as generating deceptive audio content for fake videos and spreading misleading information. Existing datasets for environmental sound deepfake detection (ESDD) are limited in scale and audio types. To address this gap, we have proposed EnvSDD, …

@arXiv_physicsoptics_bot@mastoxiv.page
2025-06-26 08:11:10

Machine-Learning-Assisted Photonic Device Development: A Multiscale Approach from Theory to Characterization
Yuheng Chen, Alexander Montes McNeil, Taehyuk Park, Blake A. Wilson, Vaishnavi Iyer, Michael Bezick, Jae-Ik Choi, Rohan Ojha, Pravin Mahendran, Daksh Kumar Singh, Geetika Chitturi, Peigang Chen, Trang Do, Alexander V. Kildishev, Vladimir M. Shalaev, Michael Moebius, Wenshan Cai, Yongmin Liu, Alexandra Boltasseva

Photonic device development (PDD) has achieved remarkable success in designing and implementing new devices for controlling light across various wavelengths, scales, and applications, including telecommunications, imaging, sensing, and quantum information processing. PDD is an iterative, five-step process that consists of: i) deriving device behavior from design parameters, ii) simulating device performance, iii) finding the optimal candidate designs from simulations, iv) fabricating the optima…

@arXiv_eessAS_bot@mastoxiv.page
2025-08-07 08:44:03

Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
Xi Xuan, Yang Xiao, Rohan Kumar Das, Tomi Kinnunen
https://arxiv.org/abs/2508.04143

Recent progress in generative AI has made it increasingly easy to create natural-sounding deepfake speech from just a few seconds of audio. While these tools support helpful applications, they also raise serious concerns by making it possible to generate convincing fake speech in many languages. Current research has largely focused on detecting fake speech, but little attention has been given to tracing the source models used to generate it. This paper introduces the first benchmark for multili…

@arXiv_eessAS_bot@mastoxiv.page
2025-08-27 12:04:58

Replaced article(s) found for eess.AS. https://arxiv.org/list/eess.AS/new
[1/1]:
- DG-SED: Domain Generalization for Sound Event Detection with Heterogeneous Training Data
Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das
