Tootfinder

Opt-in global Mastodon full text search. Join the index!

@lysander07@sigmoid.social
2025-05-17 07:38:59

In our #ISE2025 lecture last Wednesday, we learned how n-gram language models use the Markov assumption and maximum likelihood estimation to predict the probability of a word occurring in a given context (i.e. the preceding n-1 words in the sequence).
#NLP

Slide from the Information Service Engineering 2025 lecture, 03 Natural Language Processing 02, 2.9, Language Models:
Title: N-Gram Language Model
The probability of a sequence of words can be computed via conditional probability and the Bayes rule (including the chain rule for n words). Approximation is performed via the Markov assumption (dependency only on the last n words) and maximum likelihood estimation (approximating the probabilities of a sequence of words by counting and normalising …
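
For readers who want the formulas spelled out, this is the standard formulation the slide alludes to (written out here for reference, not copied verbatim from the slide):

P(w_1, \dots, w_N) = \prod_{i=1}^{N} P(w_i \mid w_1, \dots, w_{i-1}) \quad \text{(chain rule)}

P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) \quad \text{(Markov assumption)}

P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) \approx \frac{C(w_{i-n+1} \cdots w_i)}{C(w_{i-n+1} \cdots w_{i-1})} \quad \text{(maximum likelihood estimate, with } C \text{ denoting corpus counts)}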
@arXiv_csCL_bot@mastoxiv.page
2025-07-03 10:16:00

Decision-oriented Text Evaluation
Yu-Shiang Huang, Chuan-Ju Wang, Chung-Chi Chen
arxiv.org/abs/2507.01923

@arXiv_eessAS_bot@mastoxiv.page
2025-05-30 07:22:13

NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding
Vladimir Bataev, Andrei Andrusenko, Lilit Grigoryan, Aleksandr Laptev, Vitaly Lavrukhin, Boris Ginsburg
arxiv.org/abs/2505.22857

@arXiv_csCR_bot@mastoxiv.page
2025-06-23 09:22:59

Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy
Bishwajit Prasad Gond, Rajneekant, Pushkar Kishore, Durga Prasad Mohapatra
arxiv.org/abs/2506.16224

@lysander07@sigmoid.social
2025-05-19 14:04:32

Generating Shakespeare-like text with an n-gram language model is straightforward. But don't expect too much of it. It will not be able to recreate a lost Shakespeare play for you ;-) It's merely a parrot, making up well-sounding sentences out of fragments of original Shakespeare texts...
#ise2025

Slide from the Information Service Engineering lecture 04, Natural Language Processing 03, 2.9 Language Models, N-Gram Shakespeare Generation.
The background of the slide shows an AI-generated portrait of William Shakespeare as an ink drawing. There are 4 speech-bubbles around Shakespeare's head, representing artificially generated text based on 1-grams, 2-grams, 3-grams and 4-grams: '
1-gram: To him swallowed confess hear both. Which. Of save on trail for are ay device and rote life have Hill…
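
To make the generation idea concrete, here is a minimal sketch in Python. The toy corpus and all names are illustrative assumptions; the lecture example would be trained on the complete Shakespeare texts:

import random
from collections import defaultdict

# Toy corpus standing in for the Shakespeare training text (assumption:
# the real model would be trained on the complete works).
corpus = (
    "to be or not to be that is the question "
    "whether tis nobler in the mind to suffer "
    "the slings and arrows of outrageous fortune"
).split()

n = 3  # trigram model: condition on the previous 2 words

# Count n-gram continuations: context of n-1 words -> observed next words.
counts = defaultdict(list)
for i in range(len(corpus) - n + 1):
    counts[tuple(corpus[i:i + n - 1])].append(corpus[i + n - 1])

def generate(seed, length=15):
    # Sampling uniformly from the list of observed continuations is
    # equivalent to sampling from the maximum-likelihood distribution.
    out = list(seed)
    for _ in range(length):
        nxt = counts.get(tuple(out[-(n - 1):]))
        if not nxt:  # unseen context: the parrot falls silent
            break
        out.append(random.choice(nxt))
    return " ".join(out)

print(generate(("to", "be")))

With larger n the output sounds more Shakespearean but is increasingly stitched together from verbatim fragments of the training text, which is exactly the parrot effect described above.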
@arXiv_csCL_bot@mastoxiv.page
2025-06-17 09:33:51

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
arxiv.org/abs/2506.12229

@lysander07@sigmoid.social
2025-05-28 05:10:40

Last week, we continued our #ISE2025 lecture on distributional semantics with the introduction of neural language models (NLMs) and compared them to traditional statistical n-gram models.
Benefits of NLMs:
- Capturing Long-Range Dependencies
- Computational and Statistical Tractability
- Improved Generalisation
- Higher Accuracy
@…

The image illustrates the architecture of a Neural Language Model, specifically focusing on Word Vectors II - Neural Language Models. It is part of a presentation on Natural Language Processing, created by the Karlsruhe Institute of Technology (KIT) and FIZ Karlsruhe, as indicated by their logos in the top right corner.

The diagram shows a neural network processing an input word embedding, represented by the phrase "to be or not to." The input is transformed into a d-sized vector representatio…
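
To make the described architecture concrete, here is a minimal sketch of such a feed-forward (Bengio-style) neural language model in plain numpy. The tiny vocabulary and all dimensions are illustrative assumptions, and the weights are random and untrained, so this shows only the forward pass, not the actual model from the slide:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative vocabulary and hyper-parameters (assumptions, not from the slide).
vocab = ["to", "be", "or", "not", "that", "is"]
V, d, h, n_ctx = len(vocab), 8, 16, 5  # vocab size, embedding dim, hidden dim, context length
idx = {w: i for i, w in enumerate(vocab)}

E = rng.normal(0, 0.1, (V, d))           # word embedding matrix
W1 = rng.normal(0, 0.1, (n_ctx * d, h))  # concatenated embeddings -> hidden layer
W2 = rng.normal(0, 0.1, (h, V))          # hidden layer -> output logits

def next_word_distribution(context):
    # Embed each context word into a d-sized vector, concatenate,
    # apply a tanh hidden layer, and softmax over the full vocabulary.
    x = np.concatenate([E[idx[w]] for w in context])
    hidden = np.tanh(x @ W1)
    logits = hidden @ W2
    p = np.exp(logits - logits.max())
    return p / p.sum()

# The slide's example input: "to be or not to"
p = next_word_distribution(["to", "be", "or", "not", "to"])
print({w: round(float(p[idx[w]]), 3) for w in vocab})

Unlike the n-gram model above, every context word contributes through its learned embedding, which is what gives NLMs their improved generalisation and longer-range sensitivity.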
@arXiv_qbiobm_bot@mastoxiv.page
2025-06-17 11:42:50

In Vitro Antibacterial activity of hexane, Chloroform and methanolic extracts of different parts of Acronychia pedunculata grown in Sri Lanka
R. D. Nimantha Karunathilaka, Athige Rajith Niloshan Silva, Chathuranga Bharathee Ranaweera, D. M. R. K. Dissanayake, N. R. M. Nelumdeniya, Ranjith Pathirana, W. D. Ratnasooriya
ar…

@arXiv_csCL_bot@mastoxiv.page
2025-06-19 08:16:24

Oldies but Goldies: The Potential of Character N-grams for Romanian Texts
Dana Lupsa, Sanda-Maria Avram
arxiv.org/abs/2506.15650

@lysander07@sigmoid.social
2025-05-09 08:41:35

Starting in the 1990s, statistical n-gram language models, trained on vast text collections, became the backbone of NLP research. They fueled advancements in nearly all NLP techniques of the era, laying the groundwork for today's AI.
F. Jelinek (1997), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA
#NLP

Slide from Information Service Engineering 2025, Lecture 02, Natural Language Processing 01, A Brief History of NLP, NLP timeline. The timeline is located in the middle of the slide from top to bottom. The pointer on the timeline indicates the 1990s. On the left, the formula for the conditional probability of a word, given a preceding series of words, is shown. Below, an AI-generated portrait of William Shakespeare is displayed with 4 speech bubbles, representing artificially generated tex…