Tootfinder

@macandi@social.heise.de
2025-06-06 13:56:00

State surveillance: Apple discloses requests for push tokens for the first time
The push servers of Apple and Google send many billions of notifications every day. That attracts the interest of government agencies, including in Germany.

@chpietsch@fedifreu.de
2025-06-08 09:50:33

This story of an ex-Googler is well worth reading. After some personal observations, @… asks interesting questions:
What is crypto mining if not a textbook Captain Planet villain scheme—to kill and raze and destroy for nothing but imaginary tokens proving that you did lots of killing and razing and …

Deep in Mordor where the shadows lie: Dystopian stories of my time as a Googler
I will do something I normally never do here, and make my first ever blog post on the topic of, long sigh: tech. I’ve already talked abou...

@Techmeme@techhub.social
2025-07-02 21:05:40

OpenAI says the tokenized OpenAI shares Robinhood has started offering are not equity: "We did not partner with Robinhood...and do not endorse it" (MacKenzie Sigalos/CNBC)
https://www.cnbc.com/2025/07/02/openai-robinhood-tokens.html

OpenAI says Robinhood's tokens aren't equity in the company
OpenAI said it was not involved in Robinhood’s launch of tokenized shares in Europe, warning users the assets were issued without its approval.

@arXiv_csCL_bot@mastoxiv.page
2025-07-04 09:52:11

MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
Purbesh Mitra, Sennur Ulukus
https://arxiv.org/abs/2507.02851 https://a…

MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
Recent advancements in the reasoning capabilities of large language models (LLMs) show that employing group relative policy optimization (GRPO) algorithm for reinforcement learning (RL) training allows the models to use more thinking/reasoning tokens for generating better responses. However, LLMs can generate only a finite amount of tokens while maintaining attention to the previously generated tokens. This limit, also known as the context size of an LLM, is a bottleneck in LLM reasoning with a…

@UP8@mastodon.social
2025-07-02 14:50:26

⛓️‍💥 Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
#ai #llm

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
Modern tokenizers employ deterministic algorithms to map text into a single "canonical" token sequence, yet the same string can be encoded as many non-canonical tokenizations using the tokenizer vocabulary. In this work, we investigate the robustness of LMs to text encoded with non-canonical tokenizations entirely unseen during training. Surprisingly, when evaluated across 20 benchmarks, we find that instruction-tuned models retain up to 93.4% of their original performance when given a randomly…
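
To make the canonical versus non-canonical distinction concrete, here is a minimal sketch that enumerates every way a toy vocabulary can segment one string. The vocabulary (`TOY_VOCAB`) and the greedy longest-match rule standing in for a "canonical" tokenizer are assumptions for illustration only; they are not taken from the paper.

```python
def all_tokenizations(text, vocab):
    """Return every way to split `text` into pieces that all appear in `vocab`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in all_tokenizations(text[end:], vocab):
                results.append([piece] + rest)
    return results


def greedy_tokenization(text, vocab):
    """Longest-match-first split: a stand-in for a deterministic tokenizer."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):
            if text[i:end] in vocab:
                tokens.append(text[i:end])
                i = end
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return tokens


# Hypothetical toy vocabulary: single letters plus a few merged pieces.
TOY_VOCAB = {"t", "o", "k", "e", "n", "to", "ken", "token"}

canonical = greedy_tokenization("token", TOY_VOCAB)
alternatives = [t for t in all_tokenizations("token", TOY_VOCAB) if t != canonical]

print("canonical:", canonical)
for alt in alternatives:
    print("non-canonical:", alt)
```

Every listed segmentation decodes to the same string, which is the situation the abstract describes: one text, many valid token sequences, only one of which the tokenizer would normally emit.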

@arXiv_csCR_bot@mastoxiv.page
2025-06-06 09:33:46

This https://arxiv.org/abs/2410.11295 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…

BRC20 Pinning Attack
BRC20 tokens are a type of non-fungible asset on the Bitcoin network. They allow users to embed customised content within Bitcoin's satoshis. The token frenzy reached a market size of US$2.811 billion (2023Q3–2025Q1). However, this intuitive design has not undergone serious security scrutiny. We present the first analysis of BRC20's transfer mechanism and identify a new attack vector. A typical BRC20 transfer involves two "bundled" on-chain transactions with different fee levels: the firs…

@arXiv_mathCO_bot@mastoxiv.page
2025-06-06 07:25:27

Nim on Integer Partitions and Hyperrectangles
Eric Gottlieb, Matjaž Krnc, Peter Muršič
https://arxiv.org/abs/2506.04991 https://

Nim on Integer Partitions and Hyperrectangles
We describe PNim and RNim, two variants of Nim in which piles of tokens are replaced with integer partitions or hyperrectangles. In PNim, the players choose one of the integer partitions and remove a positive number of rows or a positive number of columns from the Young diagram of that partition. In RNim, players choose one of the hyperrectangles and reduce one of its side lengths. For PNim, we find a tight upper bound for the Sprague-Grundy values of partitions and characterize partitions wi…

@arXiv_csRO_bot@mastoxiv.page
2025-06-03 08:06:25

FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens
Yiming Zhong, Yumeng Liu, Chuyang Xiao, Zemin Yang, Youzhuo Wang, Yufei Zhu, Ye Shi, Yujing Sun, Xinge Zhu, Yuexin Ma
https://arxiv.org/abs/2506.01583

FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens
Learning effective visuomotor policies for robotic manipulation is challenging, as it requires generating precise actions while maintaining computational efficiency. Existing methods remain unsatisfactory due to inherent limitations in the essential action representation and the basic network architectures. We observe that representing actions in the frequency domain captures the structured nature of motion more effectively: low-frequency components reflect global movement patterns, while high-…

@frankel@mastodon.top
2025-06-04 08:11:00

Spring Secret Starter: Managing #Secrets in Your #SpringBoot App
https://lucas-fern…

Spring Secret Starter: Managing Secrets in Your Spring Boot App
In today’s cloud-native world, managing secrets (API keys, database credentials, tokens, etc.) securely is non-negotiable. Yet, developers often struggle with balancing security and simplicity when…

@arXiv_csAI_bot@mastoxiv.page
2025-06-05 09:47:48

This https://arxiv.org/abs/2506.02867 has been replaced.
link: https://scholar.google.com/scholar?q=a

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Large reasoning models (LRMs) have demonstrated impressive capabilities in complex problem-solving, yet their internal reasoning mechanisms remain poorly understood. In this paper, we investigate the reasoning trajectories of LRMs from an information-theoretic perspective. By tracking how mutual information (MI) between intermediate representations and the correct answer evolves during LRM reasoning, we observe an interesting MI peaks phenomenon: the MI at specific generative steps exhibits a s…

@arXiv_csCY_bot@mastoxiv.page
2025-06-03 07:22:39

TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
Hyundong Jin, Sicheol Sung, Shinwoo Park, SeungYeop Baik, Yo-Sub Han
https://arxiv.org/abs/2506.00089

TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: the over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sensitive documents to LLMs without meaningful engagement. This form of over-reliance and misuse is em…

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:21:02

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin
https://arx…

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the reasoning capabilities of Large Language Models (LLMs), while its mechanisms are not yet well understood. In this work, we undertake a pioneering exploration of RLVR through the novel perspective of token entropy patterns, comprehensively analyzing how different tokens influence reasoning performance. By examining token entropy patterns in Chain-of-Thought (CoT) reasoning, we observe that o…

@matthiasott@mastodon.social
2025-05-28 09:18:55

What are you using to manage the #design #tokens in your projects or organization?
Style Dictionary? Diez? Your own solution?

@arXiv_eessSP_bot@mastoxiv.page
2025-07-03 08:29:10

Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach
Hao Wei, Wanli Ni, Wen Wang, Wenjun Xu, Dusit Niyato, Ping Zhang
https://arxiv.org/abs/2507.01728

Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach
This letter proposes UniToCom, a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. Specifically, to enable efficient token representations, we propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information while supporting reliable generation across multiple modalities. By doing this, GenIB-based tokenization is conducive to improving the com…

@arXiv_csLG_bot@mastoxiv.page
2025-07-04 10:19:51

Fast and Simplex: 2-Simplicial Attention in Triton
Aurko Roy, Timothy Chou, Sai Surya Duvvuri, Sijia Chen, Jiecao Yu, Xiaodong Wang, Manzil Zaheer, Rohan Anil
https://arxiv.org/abs/2507.02754

Fast and Simplex: 2-Simplicial Attention in Triton
Recent work has shown that training loss scales as a power law with both model size and the number of tokens, and that achieving compute-optimal models requires scaling model size and token count together. However, these scaling laws assume an infinite supply of data and apply primarily in compute-bound settings. As modern large language models increasingly rely on massive internet-scale datasets, the assumption that they are compute-bound is becoming less valid. This shift highlights the need …

@arXiv_csIR_bot@mastoxiv.page
2025-07-02 09:35:09

EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens
Chaoqun Yang, Xinyu Lin, Wenjie Wang, Yongqi Li, Teng Sun, Xianjing Han, Tat-Seng Chua
https://arxiv.org/abs/2507.00715

EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens
Large Language Model-based generative recommendation (LLMRec) has achieved notable success, but it suffers from high inference latency due to massive computational overhead and memory pressure of KV Cache. Existing KV Cache reduction methods face critical limitations: cache compression offers marginal acceleration given recommendation tasks' short decoding steps, while prompt compression risks discarding vital interaction history. Through systematic analysis of attention patterns in LLMRec, we …

@arXiv_csMM_bot@mastoxiv.page
2025-06-04 07:23:38

Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation
Yongqi Wang, Chunlei Zhang, Hangting Chen, Zhou Zhao, Dong Yu
https://arxiv.org/abs/2506.02997

Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style-Rich Representation
Controllable TTS models with natural language prompts often lack the ability for fine-grained control and face a scarcity of high-quality data. We propose a two-stage style-controllable TTS system with language models, utilizing a quantized masked-autoencoded style-rich representation as an intermediary. In the first stage, an autoregressive transformer is used for the conditional generation of these style-rich tokens from text and control signals. The second stage generates codec tokens from b…

@Techmeme@techhub.social
2025-07-03 18:50:47

Founders Fund-backed Ondo Finance and Pantera Capital launch a $250M fund to invest in real-world asset tokenization projects via equity stakes and tokens (Lucinda Shen/Axios)
https://www.axios.com/pro/fintech-deals/2025/07/03/ondo-…

Exclusive: Ondo and Pantera Capital earmark $250 million for real-world asset tokenization
Tokenization lags only stablecoins in crypto activity.

@arXiv_csCE_bot@mastoxiv.page
2025-06-02 07:15:45

Singularity Protocol for Cross Chain AMM without Intermediate Tokens or Bridges
Sumit Vohra
https://arxiv.org/abs/2505.24337 https://…

Singularity Protocol for Cross Chain AMM without Intermediate Tokens or Bridges
Automated Market Makers (AMMs) are decentralized exchange protocols that provide continuous access to token liquidity without the need for order books or traditional market makers. However, this innovation has failed to scale when it comes to cross-chain swaps. Modern cross-chain swaps employ double-sided AMMs, which are not only inefficient due to liquidity fragmentation but also require an intermediate token. This introduces inherent volatility risk as well as blockchain and bridging risk, es…

@arXiv_csCV_bot@mastoxiv.page
2025-07-02 14:34:53

Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/4]:
- The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Info...
Li, Shi, Gao, Liu, Wang, Chen, Liu, Zhao, Wang, Metaxas

@arXiv_csDC_bot@mastoxiv.page
2025-06-04 07:21:29

Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang
https://arxiv.org/abs/2506.01979

Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
Recently, speculative decoding (SD) has emerged as a promising technique to accelerate LLM inference by employing a small draft model to propose draft tokens in advance, and validating them in parallel with the large target model. However, the existing SD methods still remain fundamentally constrained by their serialized execution, which causes the mutual waiting bubbles between the draft and target models. To address this challenge, we draw inspiration from branch prediction in modern processo…
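
The draft-then-verify loop that the abstract takes as its starting point can be sketched as below. This is a minimal greedy version of plain speculative decoding, not the hybrid drafting or branch-parallel method the paper proposes. The two "models" are hypothetical toy functions, and the verification that a real system would run as one batched forward pass is written here as an ordinary loop.

```python
def greedy_decode(next_token, prompt, num_tokens):
    """Baseline: ask the target model for one token at a time."""
    out = list(prompt)
    for _ in range(num_tokens):
        out.append(next_token(out))
    return out


def speculative_decode(target_next, draft_next, prompt, num_tokens, k=4):
    """Greedy speculative decoding: draft k tokens, keep the verified prefix."""
    out = list(prompt)
    goal = len(prompt) + num_tokens
    while len(out) < goal:
        # Draft phase: the cheap model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # Verify phase: compare each drafted token with the target's choice.
        accepted = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # first mismatch: use the target's token
                break
        else:
            # All drafts accepted: the target contributes one extra token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[:goal]


# Hypothetical toy models over the digits 0..9.
def target_next(seq):
    return (seq[-1] + 1) % 10


def draft_next(seq):
    # Agrees with the target except after a 4, where it guesses wrong.
    return 0 if seq[-1] % 5 == 4 else (seq[-1] + 1) % 10


reference = greedy_decode(target_next, [0], 12)
print(speculative_decode(target_next, draft_next, [0], 12, k=3) == reference)
```

Because every kept token is either confirmed or supplied by the target model, the output matches target-only greedy decoding; the draft model only changes how many target calls can be grouped together.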

@arXiv_eessAS_bot@mastoxiv.page
2025-06-30 08:40:30

DiffSoundStream: Efficient Speech Tokenization via Diffusion Decoding
Yang Yang, Yunpeng Li, George Sung, Shao-Fu Shih, Craig Dooley, Alessio Centazzo, Ramanan Rajeswaran
https://arxiv.org/abs/2506.22362

DiffSoundStream: Efficient Speech Tokenization via Diffusion Decoding
Token-based language modeling is a prominent approach for speech generation, where tokens are obtained by quantizing features from self-supervised learning (SSL) models and extracting codes from neural speech codecs, generally referred to as semantic tokens and acoustic tokens. These tokens are often modeled autoregressively, with the inference speed being constrained by the token rate. In this work, we propose DiffSoundStream, a solution that improves the efficiency of speech tokenization in n…

@primonatura@mstdn.social
2025-06-22 13:00:56

"Some AI prompts could cause 50 times more CO₂ emissions than others, researchers find"
#AI #ArtificialIntelligence #Technology

Some AI prompts could cause 50 times more CO₂ emissions than others, researchers find
No matter which questions we ask an AI, the model will come up with an answer. To produce this information—regardless of whether the answer is correct or not—the model uses tokens. Tokens are words or parts of words that are converted into a string of numbers that can be processed by the LLM.

@arXiv_csCC_bot@mastoxiv.page
2025-06-30 07:31:39

Nets-within-Nets through the Lens of Data Nets
Francesco Di Cosmo, Soumodev Mal, Tephilla Prince
https://arxiv.org/abs/2506.22344 https://

Nets-within-Nets through the Lens of Data Nets
Elementary Object Systems (EOSs) are a model in the nets-within-nets (NWNs) paradigm, where tokens in turn can host standard Petri nets. We study the complexity of the reachability problem of EOSs when subjected to non-deterministic token losses. It is known that this problem is equivalent to the coverability problem with no lossiness of conservative EOSs (cEOSs). We precisely characterize cEOS coverability into the framework of data nets, whose tokens carry data from an infinite domain. Specif…

@arXiv_qfinTR_bot@mastoxiv.page
2025-07-04 07:58:51

A Midsummer Meme's Dream: Investigating Market Manipulations in the Meme Coin Ecosystem
Alberto Maria Mongardini, Alessandro Mei
https://arxiv.org/abs/2507.01963

A Midsummer Meme's Dream: Investigating Market Manipulations in the Meme Coin Ecosystem
From viral jokes to a billion-dollar phenomenon, meme coins have become one of the most popular segments in cryptocurrency markets. Unlike utility-focused crypto assets like Bitcoin or Ethereum, meme coins derive value primarily from community sentiment, making them vulnerable to manipulation. This study presents a cross-chain analysis of the meme coin ecosystem, examining 34,988 tokens across Ethereum, BNB Smart Chain, Solana, and Base. We characterize the tokenomics of meme coins and track th…

@ytm@social.linux.pizza
2025-04-28 07:34:48

I have released LLama2.c64 - an LLM running on a C64 with 2MB REU. It runs the Llama2 LLM architecture, using the tokenizer and weights from the Tinystories 260K model.
It's a storytelling model that tries its best to spin your prompt into a story, as if told by a kindergarten child. It will generate one output token about every 8 minutes.
…

Prompt: There was a
Output: There was a big car who lived in a big tree. The car was being a little girl named Lily. She had a car with a big ball and she wanted to play with it. She went to the park

Model information screen and parameters for transformer inference: temperature, top-p, output tokens. The estimated time to generate 60 tokens is about 8h

GitHub - ytmytm/llama2.c64: Inference Llama 2 in several files of pure C but on a C64
Inference Llama 2 in several files of pure C but on a C64 - ytmytm/llama2.c64
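
As a quick consistency check on the figures quoted in the post (about one token every 8 minutes, about 8 hours for 60 tokens), the arithmetic can be written out. The function name is made up for illustration; the only inputs are the two numbers from the post.

```python
MINUTES_PER_TOKEN = 8  # "about every 8 minutes", as stated in the post


def estimated_hours(num_tokens, minutes_per_token=MINUTES_PER_TOKEN):
    """Rough wall-clock estimate for generating `num_tokens` tokens."""
    return num_tokens * minutes_per_token / 60


print(estimated_hours(60))  # 8.0
```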

@arXiv_csSD_bot@mastoxiv.page
2025-06-03 07:31:50

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Yakun Song, Jiawei Chen, Xiaobin Zhuang, Chenpeng Du, Ziyang Ma, Jian Wu, Jian Cong, Dongya Jia, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen
https://arxiv.org/abs/2506.00385

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Neural audio codecs have made significant strides in efficiently mapping raw audio waveforms into discrete token representations, which are foundational for contemporary audio generative models. However, most existing codecs are optimized primarily for reconstruction quality, often at the expense of the downstream modelability of the encoded tokens. Motivated by the need to overcome this bottleneck, we introduce MagiCodec, a novel single-layer, streaming Transformer-based audio codec…

@UP8@mastodon.social
2025-07-02 15:30:43

📉 Token That’s Literally USELESS Is Crypto’s Latest Meme Cult
https://www.coindesk.com/markets/2025/06/18/token-that-s-literally-useless-is-crypto-s-latest-meme-cult

Looking for the Next Dogecoin (DOGE), SHIB, PEPE? Useless Might Just be The Bet
In a flat market, where most tokens promise the moon and deliver a tweet, USELESS has found its niche: no promises, no pretenses — just a meme that’s worth tens of millions.

@arXiv_csAI_bot@mastoxiv.page
2025-06-03 17:52:46

This https://arxiv.org/abs/2502.13943 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
Current approaches for training Process Reward Models (PRMs) often involve breaking down responses into multiple reasoning steps using rule-based techniques, such as using predefined placeholder tokens or setting the reasoning step's length into a fixed size. These approaches overlook the fact that specific words do not typically mark true decision points in a text. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the …

@arXiv_eessSY_bot@mastoxiv.page
2025-07-03 09:58:10

Vision-Aided ISAC in Low-Altitude Economy Networks via De-Diffused Visual Priors
Yulan Gao, Ziqiang Ye, Zhonghao Lyu, Ming Xiao, Yue Xiao, Ping Yang, Agata Manolova
https://arxiv.org/abs/2507.01574

Vision-Aided ISAC in Low-Altitude Economy Networks via De-Diffused Visual Priors
Emerging low-altitude economy networks (LAENets) require agile and privacy-preserving resource control under dynamic agent mobility and limited infrastructure support. To meet these challenges, we propose a vision-aided integrated sensing and communication (ISAC) framework for UAV-assisted access systems, where onboard masked De-Diffusion models extract compact semantic tokens, including agent type, activity class, and heading orientation, while explicitly suppressing sensitive visual content. …

@arXiv_csLG_bot@mastoxiv.page
2025-06-03 21:53:50

This https://arxiv.org/abs/2505.19669 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling
Zero-shot streaming text-to-speech is an important research topic in human-computer interaction. Existing methods primarily use a lookahead mechanism, relying on future text to achieve natural streaming speech synthesis, which introduces high processing latency. To address this issue, we propose SMLLE, a streaming framework for generating high-quality speech frame-by-frame. SMLLE employs a Transducer to convert text into semantic tokens in real time while simultaneously obtaining duration align…

@metacurity@infosec.exchange
2025-06-24 10:05:24

Big drama over Discord warning a no-code platform called BotGhost that it will be kicked off unless it finds a new way to operate, due to a security vulnerability that has now been fixed.
https://update.botghost.com/#tldr-what-s-happening-with-botghost-and-discord…

Discord is Threatening to Shutdown BotGhost: The Enshittification of Discord.
Discord has issued BotGhost with a formal breach warning and given us an ultimatum: find a completely new way to operate without using bot tokens by July 14, 2025, or the platform will be shut down. The catch? That alternative does not exist, and Discord has offered no guidance, no support, and no path forward.

@arXiv_eessSP_bot@mastoxiv.page
2025-07-02 08:48:49

Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding
Guangyi Zhang, Yunlong Cai, Guanding Yu, Petar Popovski, Osvaldo Simeone
https://arxiv.org/abs/2507.00605

Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding
In edge-cloud speculative decoding (SD), edge devices equipped with small language models (SLMs) generate draft tokens that are verified by large language models (LLMs) in the cloud. A key bottleneck in such systems is the limited communication bandwidth between edge and cloud, which necessitates quantization of the information transmitted about generated tokens. In this work, we introduce a novel quantize-sample (Q-S) strategy that provably preserves the output distribution of the cloud-based …

@arXiv_csIR_bot@mastoxiv.page
2025-06-04 13:35:23

This https://arxiv.org/abs/2502.09891 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…

ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) has proven effective in integrating external knowledge into large language models (LLMs) for solving question-answer (QA) tasks. The state-of-the-art RAG approaches often use the graph data as the external data since they capture the rich semantic information and link relationships between entities. However, existing graph-based RAG approaches cannot accurately identify the relevant information from the graph and also consume large numbers of tokens in the o…

@arXiv_econTH_bot@mastoxiv.page
2025-06-03 16:14:20

This https://arxiv.org/abs/2505.22784 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_eco…

Split the Yield, Share the Risk: Pricing, Hedging and Fixed rates in DeFi
We present the first formal treatment of yield tokenization, a mechanism that decomposes yield-bearing assets into principal and yield components to facilitate risk transfer and price discovery in decentralized finance (DeFi). We propose a model that characterizes yield token dynamics using stochastic differential equations. We derive a no-arbitrage pricing framework for yield tokens, enabling their use in hedging future yield volatility and managing interest rate risk in decentralized l…

@arXiv_csCR_bot@mastoxiv.page
2025-06-04 07:34:58

When Blockchain Meets Crawlers: Real-time Market Analytics in Solana NFT Markets
Chengxin Shen, Zhongwen Li, Xiaoqi Li, Zongwei Li
https://arxiv.org/abs/2506.02892

When Blockchain Meets Crawlers: Real-time Market Analytics in Solana NFT Markets
In this paper, we design and implement a web crawler system based on the Solana blockchain for the automated collection and analysis of market data for popular non-fungible tokens (NFTs) on the chain. Firstly, the basic information and transaction data of popular NFTs on the Solana chain are collected using the Selenium tool. Secondly, the transaction records of the Magic Eden trading market are thoroughly analyzed by combining them with the Scrapy framework to examine the price fluctuations an…

@arXiv_csGR_bot@mastoxiv.page
2025-05-29 07:18:11

RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
Chong Zeng, Yue Dong, Pieter Peers, Hongzhi Wu, Xin Tong
https://arxiv.org/abs/2505.21925

RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
We present RenderFormer, a neural rendering pipeline that directly renders an image from a triangle-based representation of a scene with full global illumination effects and that does not require per-scene training or fine-tuning. Instead of taking a physics-centric approach to rendering, we formulate rendering as a sequence-to-sequence transformation where a sequence of tokens representing triangles with reflectance properties is converted to a sequence of output tokens representing small patc…

@arXiv_csRO_bot@mastoxiv.page
2025-07-03 09:55:50

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Yifan Zhong, Fengshuo Bai, Shaofei Cai, Xuchuan Huang, Zhang Chen, Xiaowei Zhang, Yuanfei Wang, Shaoyang Guo, Tianrui Guan, Ka Nam Lui, Zhiquan Qi, Yitao Liang, Yuanpei Chen, Yaodong Yang
https://arxiv.org/abs/2507.01925

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The remarkable advancements of vision and language foundation models in multimodal understanding, reasoning, and generation have sparked growing efforts to extend such intelligence to the physical world, fueling the flourishing of vision-language-action (VLA) models. Despite seemingly diverse approaches, we observe that current VLA models can be unified under a single framework: vision and language inputs are processed by a series of VLA modules, producing a chain of action tokens that …

@arXiv_csCL_bot@mastoxiv.page
2025-06-30 10:22:00

HyperCLOVA X THINK Technical Report
NAVER Cloud HyperCLOVA X Team
https://arxiv.org/abs/2506.22403 https://arxiv.org/pdf/2506.22403…

HyperCLOVA X THINK Technical Report
We introduce HyperCLOVA X THINK, the first reasoning-focused large language model in the HyperCLOVA X family, pre-trained on roughly 6 trillion high-quality Korean and English tokens, augmented with targeted synthetic Korean data. It was implemented as a compute-memory-balanced Peri-LN Transformer scaled with μP, pre-trained through a three-stage curriculum that expands the context window to 128K tokens, and post-trained via supervised fine-tuning with Reinforcement Learning from Verifi…

@arXiv_csDC_bot@mastoxiv.page
2025-06-10 07:40:22

Addressing tokens dynamic generation, propagation, storage and renewal to secure the GlideinWMS pilot based jobs and system
Bruno Moreira Coimbra, Marco Mambelli
https://arxiv.org/abs/2506.07379

Addressing tokens dynamic generation, propagation, storage and renewal to secure the GlideinWMS pilot based jobs and system
GlideinWMS has been one of the first middleware in the WLCG community to transition from X.509 to support also tokens. The first step was to get from the prototype in 2019 to using tokens in production in 2022. This paper will present the challenges introduced by the wider adoption of tokens and the evolution plans for securing the pilot infrastructure of GlideinWMS and supporting the new requirements. In the last couple of years, the GlideinWMS team supported the migration of experiments and r…

@arXiv_csCY_bot@mastoxiv.page
2025-07-02 07:39:09

Intellectual Property Rights and Entrepreneurship in the NFT Ecosystem: Legal Frameworks, Business Models, and Innovation Opportunities
Pranav Darshan, Rohan J S, Raghuveer Rajesh, Ruchitha M, Sanika Kamath, Manas M N
https://arxiv.org/abs/2507.00172

Intellectual Property Rights and Entrepreneurship in the NFT Ecosystem: Legal Frameworks, Business Models, and Innovation Opportunities
Non Fungible Tokens have changed digital ownership and how creators earn money. Between 2021 and 2024, the market value exceeded 40 billion. However, the fast growth of the NFT ecosystem has revealed serious issues in managing intellectual property rights. There is a lot of confusion about the difference between owning an NFT and owning the copyright for the underlying content. This research looks at the gap between traditional copyright laws and blockchain-based transactions. We use a mixed me…

@cdarwin@c.im
2025-06-15 01:12:42

Trump has promoted his private businesses in unprecedented ways for a sitting president, presenting conflicts of interest.
His ventures into crypto, in particular, have drawn scrutiny because he has simultaneously moved to create a more friendly regulatory environment for the industry.
The $57 million from his stake in the crypto firm World Liberty Financial came from the sale of digital tokens.
That provides a glimpse into Trump’s earnings from cryptocurrency ventures t…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-05 09:43:37

This https://arxiv.org/abs/2406.05298 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…

Spectral Codecs: Improving Non-Autoregressive Speech Synthesis with Spectrogram-Based Audio Codecs
Historically, most speech models in machine-learning have used the mel-spectrogram as a speech representation. Recently, discrete audio tokens produced by neural audio codecs have become a popular alternate speech representation for speech synthesis tasks such as text-to-speech (TTS). However, the data distribution produced by such codecs is too complex for some TTS models to predict, typically requiring large autoregressive models to get good quality. Most existing audio codecs use Residual Ve…

@arXiv_csLG_bot@mastoxiv.page
2025-06-03 08:21:21

IF-GUIDE: Influence Function-Guided Detoxification of LLMs
Zachary Coalson, Juhan Bae, Nicholas Carlini, Sanghyun Hong
https://arxiv.org/abs/2506.01790 h…

IF-GUIDE: Influence Function-Guided Detoxification of LLMs
We study how training data contributes to the emergence of toxic behaviors in large-language models. Most prior work on reducing model toxicity adopts reactive approaches, such as fine-tuning pre-trained (and potentially toxic) models to align them with human values. In contrast, we propose a proactive approach, IF-Guide, which leverages influence functions to identify harmful tokens within any training data and suppress their impact during training. To this end, we first show that standa…

@Techmeme@techhub.social
2025-06-26 15:40:48

UAE-based Aqua 1 Foundation buys $100M worth of tokens from Trump's World Liberty Financial, becoming its largest individual investor ahead of Justin Sun (Muyao Shen/Bloomberg)
https://www.bloomberg.com/news/articles/20

@arXiv_qfinRM_bot@mastoxiv.page
2025-06-10 18:23:40

This https://arxiv.org/abs/2503.01148 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_qfi…

Dynamic spillovers and investment strategies across artificial intelligence ETFs, artificial intelligence tokens, and green markets
This paper investigates the risk spillovers among AI ETFs, AI tokens, and green markets using the R2 decomposition method. We reveal several key insights. First, the overall transmission connectedness index (TCI) closely aligns with the contemporaneous TCI, while the lagged TCI is significantly lower. Second, AI ETFs and clean energy act as risk transmitters, whereas AI tokens and green bonds function as risk receivers. Third, AI tokens are difficult to hedge and provide limited hedging ability …

@arXiv_csGT_bot@mastoxiv.page
2025-05-29 07:18:17

Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives
Ander Artola Velasco, Stratis Tsirtsis, Nastaran Okati, Manuel Gomez-Rodriguez
https://arxiv.org/abs/2505.21627

Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives
State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the price users pay for an output provided by a model depends on the number of tokens the model uses to generate it -- they pay a fixed price per token. In this work, we show that this pricing mechanism creates a financial incentive for providers to strategize and mi…

@arXiv_csSD_bot@mastoxiv.page
2025-06-03 07:35:22

FUSE: Universal Speech Enhancement using Multi-Stage Fusion of Sparse Compression and Token Generation Models for the URGENT 2025 Challenge
Nabarun Goswami, Tatsuya Harada
https://arxiv.org/abs/2506.00809

FUSE: Universal Speech Enhancement using Multi-Stage Fusion of Sparse Compression and Token Generation Models for the URGENT 2025 Challenge
We propose a multi-stage framework for universal speech enhancement, designed for the Interspeech 2025 URGENT Challenge. Our system first employs a Sparse Compression Network to robustly separate sources and extract an initial clean speech estimate from noisy inputs. This is followed by an efficient generative model that refines speech quality by leveraging self-supervised features and optimizing a masked language modeling objective on acoustic tokens derived from a neural audio codec. In the f…

@arXiv_csIR_bot@mastoxiv.page
2025-06-03 16:58:01

This https://arxiv.org/abs/2505.19189 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…

POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval
Although Multi-Vector Retrieval (MVR) has achieved the state of the art on many information retrieval (IR) tasks, its performance depends heavily on how queries are decomposed into smaller pieces, such as phrases or tokens. However, optimizing query decomposition for MVR performance is not end-to-end differentiable. Even worse, jointly solving this problem and training the downstream retrieval-based systems, such as RAG systems, could be highly inefficient. To overcome these challenges, we propose Performa…

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 17:15:09

This https://arxiv.org/abs/2505.14759 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…

LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models
Large Language Models for code often entail significant computational complexity, which grows significantly with the length of the input code sequence. We propose LeanCode for code simplification to reduce training and prediction time, leveraging code contexts in utilizing attention scores to represent the tokens' importance. We advocate for the selective removal of tokens based on the average context-aware attention scores rather than average scores across all inputs. LeanCode uses the attenti…

@arXiv_eessSP_bot@mastoxiv.page
2025-07-04 09:00:31

When Attention is Beneficial for Learning Wireless Resource Allocation Efficiently?
Jia Guo, Chenyang Yang
https://arxiv.org/abs/2507.02427 https://…

When Attention is Beneficial for Learning Wireless Resource Allocation Efficiently?
Owing to the use of an attention mechanism to leverage the dependency across tokens, Transformers are efficient for natural language processing. By harnessing the permutation properties that broadly exist in resource allocation policies, each of which maps measurable environmental parameters (e.g., channel matrix) to optimized variables (e.g., precoding matrix), graph neural networks (GNNs) are promising for learning these policies efficiently in terms of scalability and generalizability. To reap the benefits of…

@arXiv_mathCO_bot@mastoxiv.page
2025-06-30 08:28:30

Evasive Random Walks and the Clairvoyant Demon
Aaron Abrams, Henry Landau, Zeph Landau, James Pommersheim, Eric Zaslow
https://arxiv.org/abs/2506.21929 htt…

Evasive Random Walks and the Clairvoyant Demon
A pair of random walks $(R,S)$ on the vertices of a graph $G$ is {\it successful} if two tokens can be scheduled (moving only one token at a time) to travel along $R$ and $S$ without colliding. We consider questions related to P. Winkler's {\it clairvoyant demon problem}, which asks whether for random walks $R$ and $S$ on $G$, $Pr[\ (R,S) \mbox{ is successful }] >0$. We introduce the notion of an {\it evasive} walk on $G$: a walk $S$ so that for a random walk $R$ on $G$, $Pr[\ (R,S) \mbox{ is s…

@arXiv_csOS_bot@mastoxiv.page
2025-06-26 08:55:30

Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU
He Sun, Li Li, Mingjun Xiao, Chengzhong Xu
https://arxiv.org/abs/2506.20187

Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU
Advanced Large Language Models (LLMs) have achieved impressive performance across a wide range of complex and long-context natural language tasks. However, performing long-context LLM inference locally on a commodity GPU (a PC) with privacy concerns remains challenging due to the increasing memory demands of the key-value (KV) cache. Existing systems typically identify important tokens and selectively offload their KV data to GPU and CPU memory. The KV data needs to be offloaded to disk due to …

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:19:55

Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Pierre-Carl Langlais, Carlos Rosas Hinostroza, Mattia Nee, Catherine Arnett, Pavel Chizhov, Eliot Krzystof Jones, Irène Girard, David Mach, Anastasia Stasenko, Ivan P. Yamshchikov
https://arxiv.org/abs/2506.01732

Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Large Language Models (LLMs) are pre-trained on large amounts of data from different sources and domains. These data most often contain trillions of tokens with large portions of copyrighted or proprietary content, which hinders the usage of such models under AI legislation. This raises the need for truly open pre-training data that is compliant with the data security regulations. In this paper, we introduce Common Corpus, the largest open dataset for language model pre-training. The data assem…

@arXiv_csCR_bot@mastoxiv.page
2025-06-23 10:21:20

Centre driven Controlled Evolution of Wireless Virtual Networks based on Broadcast Tokens
Vignesh Babu, Atishay Jain, Kannan Karthik
https://arxiv.org/abs/2506.16615

Centre driven Controlled Evolution of Wireless Virtual Networks based on Broadcast Tokens
In a wireless sensor network, the virtual connectivity between nodes is a function of the keys shared between various nodes. Pre-embedding these key configurations in the nodes would make the network inflexible. On the other hand, permitting subsets of nodes to engage in a common key synthesis phase to create secure distributed connections amongst themselves, would decouple and conceal the information flow from the controlling centre. An intermediate solution is the notion of a centre driven ke…

@arXiv_csLG_bot@mastoxiv.page
2025-06-25 08:48:30

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
Zihan Wang, Rui Pan, Jiarui Yao, Robert Csordas, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li, Shiwei Liu
https://arxiv.org/abs/2506.18945

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
We propose Chain-of-Experts (CoE), a new Mixture-of-Experts (MoE) architecture that introduces sequential expert communication within each layer. Unlike traditional MoE models, where experts operate independently in parallel, CoE processes tokens iteratively across a chain of experts inside a layer. To support dynamic expert selection across iterations, CoE employs a dedicated router at each iteration step within a layer. This design allows tokens to re-evaluate and select different experts dur…

@arXiv_csCV_bot@mastoxiv.page
2025-06-25 10:32:30

Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han
https://arxiv.org/abs/2506.19852…

Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as the spatial and temporal distance between tokens increases, akin to the physical decay of signal or waves over space and time …

@arXiv_csGR_bot@mastoxiv.page
2025-06-24 09:49:40

DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling
Anindita Ghosh, Bing Zhou, Rishabh Dabral, Jian Wang, Vladislav Golyanik, Christian Theobalt, Philipp Slusallek, Chuan Guo
https://arxiv.org/abs/2506.18680

DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling
We present DuetGen, a novel framework for generating interactive two-person dances from music. The key challenge of this task lies in the inherent complexities of two-person dance interactions, where the partners need to synchronize both with each other and with the music. Inspired by the recent advances in motion synthesis, we propose a two-stage solution: encoding two-person motions into discrete tokens and then generating these tokens from music. To effectively capture intricate interactions…

@arXiv_csDB_bot@mastoxiv.page
2025-06-17 09:30:23

CPN-Py: A Python-Based Tool for Modeling and Analyzing Colored Petri Nets
Alessandro Berti, Wil M. P. van der Aalst
https://arxiv.org/abs/2506.12238 https:…

CPN-Py: A Python-Based Tool for Modeling and Analyzing Colored Petri Nets
Colored Petri Nets (CPNs) are an established formalism for modeling processes where tokens carry data. Although tools like CPN Tools and CPN IDE excel at CPN-based simulation, they are often separate from modern data science ecosystems. Meanwhile, Python has become the de facto language for process mining, machine learning, and data analytics. In this paper, we introduce CPN-Py, a Python library that faithfully preserves the core concepts of Colored Petri Nets -- including color sets, timed tok…

@arXiv_csSD_bot@mastoxiv.page
2025-07-01 10:13:53

StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
Dake Guo, Jixun Yao, Linhan Ma, Wang He, Lei Xie
https://arxiv.org/abs/2506.23986

StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
Recent advancements in discrete token-based speech generation have highlighted the importance of token-to-waveform generation for audio quality, particularly in real-time interactions. Traditional frameworks integrating semantic tokens with flow matching (FM) struggle with streaming capabilities due to their reliance on a global receptive field. Additionally, directly implementing token-by-token streaming speech generation often results in degraded audio quality. To address these challenges, we…

@Techmeme@techhub.social
2025-06-20 14:26:03

Developers criticize Google for its decision to hide raw reasoning tokens, essential for debugging complex AI workflows, of its flagship model Gemini 2.5 Pro (Ben Dickson/VentureBeat)
https://venturebeat.com/ai/googles-gemin…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-03 16:55:19

This https://arxiv.org/abs/2505.19462 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…

VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
We present VoiceStar, the first zero-shot TTS model that achieves both output duration control and extrapolation. VoiceStar is an autoregressive encoder-decoder neural codec language model, that leverages a novel Progress-Monitoring Rotary Position Embedding (PM-RoPE) and is trained with Continuation-Prompt Mixed (CPM) training. PM-RoPE enables the model to better align text and speech tokens, indicates the target duration for the generated speech, and also allows the model to generate speech w…

@arXiv_csSE_bot@mastoxiv.page
2025-06-24 12:03:30

Your Token Becomes Worthless: Unveiling Rug Pull Schemes in Crypto Token via Code-and-Transaction Fusion Analysis
Hao Wu, Haijun Wang, Shangwang Li, Yin Wu, Ming Fan, Wuxia Jin, Yitao Zhao, Ting Liu
https://arxiv.org/abs/2506.18398

Your Token Becomes Worthless: Unveiling Rug Pull Schemes in Crypto Token via Code-and-Transaction Fusion Analysis
Rug pull scams have emerged as a persistent threat to cryptocurrency, causing significant financial losses. A typical scenario involves scammers deploying honeypot contracts to attract investments, restricting token sales, and draining the funds, which leaves investors with worthless tokens. Current methods either rely on predefined patterns to detect code risks or utilize statistical transaction data to train detection models. However, real-world Rug Pull schemes often involve a complex interp…

@arXiv_csCL_bot@mastoxiv.page
2025-06-18 09:16:21

A Variational Framework for Improving Naturalness in Generative Spoken Language Models
Li-Wei Chen, Takuya Higuchi, Zakaria Aldeneh, Ahmed Hussen Abdelaziz, Alexander Rudnicky
https://arxiv.org/abs/2506.14767

A Variational Framework for Improving Naturalness in Generative Spoken Language Models
The success of large language models in text processing has inspired their adaptation to speech modeling. However, since speech is continuous and complex, it is often discretized for autoregressive modeling. Speech tokens derived from self-supervised models (known as semantic tokens) typically focus on the linguistic aspects of speech but neglect prosodic information. As a result, models trained on these tokens can generate speech with reduced naturalness. Existing approaches try to fix this by…

@arXiv_csDC_bot@mastoxiv.page
2025-05-30 09:51:53

This https://arxiv.org/abs/2505.08944 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDC_…

Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony
Mixture-of-Experts (MoE) architectures offer the promise of larger model capacity without the prohibitive costs of fully dense designs. However, in real-world inference serving, load skew across experts often leads to suboptimal device utilization and excessive synchronization overheads. This paper introduces Asynchronous Expert Parallelism (AEP), a new paradigm that decouples layer execution from barrier-style synchronization. By dynamically queuing tokens at each layer (referred to as $μ$-qu…

@arXiv_eessSP_bot@mastoxiv.page
2025-06-25 09:21:00

Low-Complexity Semantic Packet Aggregation for Token Communication via Lookahead Search
Seunghun Lee, Jihong Park, Jinho Choi, Hyuncheol Park
https://arxiv.org/abs/2506.19451

Low-Complexity Semantic Packet Aggregation for Token Communication via Lookahead Search
Tokens are fundamental processing units of generative AI (GenAI) and large language models (LLMs), and token communication (TC) is essential for enabling remote AI-generated content (AIGC) and wireless LLM applications. Unlike traditional bits, each of which is independently treated, the semantics of each token depends on its surrounding context tokens. This inter-token dependency makes TC vulnerable to outage channels, where the loss of a single token can significantly distort the original mess…

@arXiv_csIR_bot@mastoxiv.page
2025-07-01 09:44:33

Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation
Yifan Wang, Weinan Gan, Longtao Xiao, Jieming Zhu, Heng Chang, Haozhao Wang, Rui Zhang, Zhenhua Dong, Ruiming Tang, Ruixuan Li
https://arxiv.org/abs/2506.23643

Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation
Generative recommendation (GR) typically encodes behavioral or semantic aspects of item information into discrete tokens, leveraging the standard autoregressive (AR) generation paradigm to make predictions. However, existing methods tend to overlook their intrinsic relationship, that is, the semantic usually provides some reasonable explainability "why" for the behavior "what", which may constrain the full potential of GR. To this end, we present Chunk AutoRegressive Model…

@arXiv_econTH_bot@mastoxiv.page
2025-05-30 07:22:37

Split the Yield, Share the Risk: Pricing, Hedging and Fixed rates in DeFi
Viraj Nadkarni, Pramod Viswanath
https://arxiv.org/abs/2505.22784 https://…

@arXiv_csSD_bot@mastoxiv.page
2025-06-13 07:59:10

Discrete Audio Tokens: More Than a Survey!
Pooneh Mousavi, Gallil Maimon, Adel Moumen, Darius Petermann, Jiatong Shi, Haibin Wu, Haici Yang, Anastasia Kuznetsova, Artem Ploujnikov, Ricard Marxer, Bhuvana Ramabhadran, Benjamin Elizalde, Loren Lugosch, Jinyu Li, Cem Subakan, Phil Woodland, Minje Kim, Hung-yi Lee, Shinji Watanabe, Yossi Adi, Mirco Ravanelli

Discrete Audio Tokens: More Than a Survey!
Discrete audio tokens are compact representations that aim to preserve perceptual quality, phonetic content, and speaker characteristics while enabling efficient storage and inference, as well as competitive performance across diverse downstream tasks. They provide a practical alternative to continuous features, enabling the integration of speech and audio into modern large language models (LLMs). As interest in token-based audio processing grows, various tokenization methods have emerged, and s…

@arXiv_csMM_bot@mastoxiv.page
2025-06-13 08:07:20

Can Sound Replace Vision in LLaVA With Token Substitution?
Ali Vosoughi, Jing Bi, Pinxin Liu, Yunlong Tang, Chenliang Xu
https://arxiv.org/abs/2506.10416 h…

Can Sound Replace Vision in LLaVA With Token Substitution?
While multimodal systems have achieved impressive advances, they typically rely on text-aligned representations rather than directly integrating audio and visual inputs. This reliance can limit the use of acoustic information in tasks requiring nuanced audio understanding. In response, SoundCLIP explores direct audio-visual integration within multimodal large language models (MLLMs) by substituting CLIP's visual tokens with audio representations and selecting sound-relevant patch tokens in mode…

@arXiv_csRO_bot@mastoxiv.page
2025-06-09 08:29:02

BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov
https://arxiv.org/abs/2506.06072

BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Leveraging our B-spline formulation, BEAST inherently ensures…

@arXiv_mathCO_bot@mastoxiv.page
2025-06-10 11:06:52

A Note on Reconfiguration Graphs of Cliques
Quan N. Lam, Huu An Phan, Duc A. Hoang
https://arxiv.org/abs/2506.07821 https://arxiv.org…

A Note on Reconfiguration Graphs of Cliques
In a reconfiguration setting, each clique of a graph $G$ is viewed as a set of tokens placed on vertices of $G$ such that no vertex has more than one token and any two tokens are adjacent. Additionally, three well-known reconfiguration rules have been studied in the literature: Token Jumping ($\mathsf{TJ}$, which involves moving a token to any unoccupied vertex), Token Sliding ($\mathsf{TS}$, which involves moving a token to an adjacent unoccupied vertex), and Token Addition/Removal ($\mathsf{T…

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 08:03:42

Information-Theoretic Detection of Unusual Source Code Changes
Adriano Torres, Sebastian Baltes, Christoph Treude, Markus Wagner
https://arxiv.org/abs/2506.06508

Information-Theoretic Detection of Unusual Source Code Changes
The code base of software projects evolves essentially through inserting and removing information to and from the source code. We can measure this evolution via the elements of information - tokens, words, nodes - of the respective representation of the code. In this work, we approach the measurement of the information content of the source code of open-source projects from an information-theoretic standpoint. Our focus is on the entropy of two fundamental representations of code: tokens and ab…

@Techmeme@techhub.social
2025-06-17 11:15:57

China's MiniMax open sources MiniMax-M1, a model to handle complicated productivity tasks that supports 1M input tokens and it says surpasses DeepSeek's R1-0528 (Bloomberg)
https://www.bloomberg.com/news/articles/20

@arXiv_eessAS_bot@mastoxiv.page
2025-06-03 07:34:43

CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi, Binh Thien Nguyen, Masahiro Yasuda, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada
https://arxiv.org/abs/2506.00800

CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Automated Audio Captioning (AAC) aims to describe the semantic contexts of general sounds, including acoustic events and scenes, by leveraging effective acoustic features. To enhance performance, an AAC method, EnCLAP, employed discrete tokens from EnCodec as an effective input for fine-tuning a language model BART. However, EnCodec is designed to reconstruct waveforms rather than capture the semantic contexts of general sounds, which AAC should describe. To address this issue, we propose CLAP-…

@arXiv_csIR_bot@mastoxiv.page
2025-06-23 09:20:00

A Simple Contrastive Framework Of Item Tokenization For Generative Recommendation
Penglong Zhai, Yifang Yuan, Fanyi Di, Jie Li, Yue Liu, Chen Li, Jie Huang, Sicong Wang, Yao Xu, Xin Li
https://arxiv.org/abs/2506.16683

A Simple Contrastive Framework Of Item Tokenization For Generative Recommendation
Generative retrieval-based recommendation has emerged as a promising paradigm aiming at directly generating the identifiers of the target candidates. However, in large-scale recommendation systems, this approach becomes increasingly cumbersome due to the redundancy and sheer scale of the token space. To overcome these limitations, recent research has explored the use of semantic tokens as an alternative to ID tokens, which typically leveraged reconstruction-based strategies, like RQ-VAE, to qua…

@arXiv_csDC_bot@mastoxiv.page
2025-06-27 09:05:09

Utility-Driven Speculative Decoding for Mixture-of-Experts
Anish Saxena, Po-An Tsai, Hritvik Taneja, Aamer Jaleel, Moinuddin Qureshi
https://arxiv.org/abs/2506.20675

Utility-Driven Speculative Decoding for Mixture-of-Experts
GPU memory bandwidth is the main bottleneck for low-latency Large Language Model (LLM) inference. Speculative decoding leverages idle GPU compute by using a lightweight drafter to propose K tokens, which the LLM verifies in parallel, boosting token throughput. In conventional dense LLMs, all model weights are fetched each iteration, so speculation adds no latency overhead. Emerging Mixture of Experts (MoE) models activate only a subset of weights per token, greatly reducing data movement. Howev…

@arXiv_csLG_bot@mastoxiv.page
2025-06-24 19:15:10

Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[9/11]:
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space...
Yuri Kuratov, Mikhail Arkhipov, Aydar Bulatov, Mikhail Burtsev

@arXiv_csIR_bot@mastoxiv.page
2025-06-30 09:55:10

HLTCOE at LiveRAG: GPT-Researcher using ColBERT retrieval
Kevin Duh, Eugene Yang, Orion Weller, Andrew Yates, Dawn Lawrie
https://arxiv.org/abs/2506.22356 …

HLTCOE at LiveRAG: GPT-Researcher using ColBERT retrieval
The HLTCOE LiveRAG submission utilized the GPT-researcher framework for researching the context of the question, filtering the returned results, and generating the final answer. The retrieval system was a ColBERT bi-encoder architecture, which represents a passage with many dense tokens. Retrieval used a local, compressed index of the FineWeb10-BT collection created with PLAID-X, using a model fine-tuned for multilingual retrieval. Query generation from context was done with Qwen2.5-7B-Instruct…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-03 07:33:41

IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
Kuan-Po Huang, Shu-wen Yang, Huy Phan, Bo-Ru Lu, Byeonggeun Kim, Sashank Macha, Qingming Tang, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang
https://arxiv.org/abs/2506.00736

IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
Text-to-audio generation synthesizes realistic sounds or music given a natural language prompt. Diffusion-based frameworks, including the Tango and the AudioLDM series, represent the state-of-the-art in text-to-audio generation. Despite achieving high audio fidelity, they incur significant inference latency due to the slow diffusion sampling process. MAGNET, a mask-based model operating on discrete tokens, addresses slow inference through iterative mask-based parallel decoding. However, its aud…

@Techmeme@techhub.social
2025-06-14 03:10:49

Harvard releases Institutional Books 1.0, a dataset for AI researchers with 242B tokens, from 394M scanned pages and 983K public domain books in 254 languages (Matt O'Brien/Associated Press)
https://apnews.com/article/ai-chatbot-

@arXiv_eessSP_bot@mastoxiv.page
2025-05-30 09:58:06

This https://arxiv.org/abs/2502.18200 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…

Zero-Shot Semantic Communication with Multimodal Foundation Models
Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, we introduce SemCLIP, a zero-shot SemCom framework leveraging the contrasti…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-03 07:35:34

HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement
Amir Hussein, Sameer Khurana, Gordon Wichern, Francois G. Germain, Jonathan Le Roux
https://arxiv.org/abs/2506.00843

HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement
Effective speech representations for spoken language models must balance semantic relevance with acoustic fidelity for high-quality reconstruction. However, existing approaches struggle to achieve both simultaneously. To address this, we introduce Hierarchical Acoustic and Semantic Representation Disentanglement (HASRD, pronounced `hazard'), a framework that factorizes self-supervised learning representations into discrete semantic and acoustic tokens. HASRD assigns the semantic representation …

@arXiv_csLG_bot@mastoxiv.page
2025-06-09 10:13:42

Corrector Sampling in Language Models
Itai Gat, Neta Shaul, Uriel Singer, Yaron Lipman
https://arxiv.org/abs/2506.06215 https://arxiv…

Corrector Sampling in Language Models
Autoregressive language models accumulate errors due to their fixed, irrevocable left-to-right token generation. To address this, we propose a new sampling method called Resample-Previous-Tokens (RPT). RPT mitigates error accumulation by iteratively revisiting and potentially replacing tokens in a window of previously generated text. This method can be integrated into existing autoregressive models, preserving their next-token-prediction quality and speed. Fine-tuning a pretrained 8B parameter …

@arXiv_csDC_bot@mastoxiv.page
2025-06-10 16:22:39

This https://arxiv.org/abs/2409.15104 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDC_…

PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference
The scaling of transformer-based Large Language Models (LLMs) has significantly expanded their context lengths, enabling applications where inputs exceed 100K tokens. Our analysis of a recent Azure LLM inference trace reveals a highly skewed long-tail distribution of input lengths, with approximately 80% of inputs shorter than 2K tokens. Long inputs constitute only a small fraction. Existing cluster-level LLM scheduling strategies, including First-In-First-Out (FIFO), reservation-based, and pri…

@arXiv_csCR_bot@mastoxiv.page
2025-06-18 08:40:32

Detecting Hard-Coded Credentials in Software Repositories via LLMs
Chidera Biringa, Gokhan Kul
https://arxiv.org/abs/2506.13090 https://

Detecting Hard-Coded Credentials in Software Repositories via LLMs
Software developers frequently hard-code credentials such as passwords, generic secrets, private keys, and generic tokens in software repositories, even though it is strictly advised against due to the severe threat to the security of the software. These credentials create attack surfaces exploitable by a potential adversary to conduct malicious exploits such as backdoor attacks. Recent detection efforts utilize embedding models to vectorize textual credentials before passing them to classifier…

@Techmeme@techhub.social
2025-06-10 20:40:53

OpenAI debuts o3-pro for ChatGPT Pro and Team users and in the API, costing $20/1M input and $80/1M output tokens; Enterprise and Edu will get access next week (Kyle Wiggers/TechCrunch)
https://techcrunch.com/2025/06/10/open

OpenAI releases o3-pro, a souped-up version of its o3 AI reasoning model | TechCrunch
OpenAI has launched o3-pro, an AI model that the company claims is its most capable yet.

@arXiv_csIR_bot@mastoxiv.page
2025-05-30 09:55:38

This https://arxiv.org/abs/2505.21700 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…

Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis
Chunking is a crucial preprocessing step in retrieval-augmented generation (RAG) systems, significantly impacting retrieval effectiveness across diverse datasets. In this study, we systematically evaluate fixed-size chunking strategies and their influence on retrieval performance using multiple embedding models. Our experiments, conducted on both short-form and long-form datasets, reveal that chunk size plays a critical role in retrieval effectiveness -- smaller chunks (64-128 tokens) are optim…

@arXiv_csRO_bot@mastoxiv.page
2025-06-18 08:51:09

AMPLIFY: Actionless Motion Priors for Robot Learning from Videos
Jeremy A. Collins, Loránd Cheng, Kunal Aneja, Albert Wilcox, Benjamin Joffe, Animesh Garg
https://arxiv.org/abs/2506.14198

Action-labeled data for robotics is scarce and expensive, limiting the generalization of learned policies. In contrast, vast amounts of action-free video data are readily available, but translating these observations into effective policies remains a challenge. We introduce AMPLIFY, a novel framework that leverages large-scale video data by encoding visual dynamics into compact, discrete motion tokens derived from keypoint trajectories. Our modular approach separates visual motion prediction fr…

@arXiv_eessAS_bot@mastoxiv.page
2025-06-27 08:04:49

CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate
Hankun Wang, Yiwei Guo, Chongtian Shao, Bohan Li, Xie Chen, Kai Yu
https://arxiv.org/abs/2506.21074

Neural speech codecs have been widely used in audio compression and various downstream tasks. Current mainstream codecs are fixed-frame-rate (FFR), which allocate the same number of tokens to every equal-duration slice. However, speech is inherently non-uniform in temporal information density. As a result, many tokens are wasted on steady-state segments like long vowels and silences. To address this mismatch, we present CodecSlime, a plugin-style method for compressing temporal redundancy throu…

@Techmeme@techhub.social
2025-06-10 18:46:03

OpenAI announces an 80% price drop for its o3 model and a "flex" mode for synchronous processing that charges $5 for input and $20 for output per million tokens (Carl Franzen/VentureBeat)
https://venturebeat.com/ai/openai-anno

@arXiv_csSD_bot@mastoxiv.page
2025-06-24 10:55:20

Smooth Operators: LLMs Translating Imperfect Hints into Disfluency-Rich Transcripts
Duygu Altinok
https://arxiv.org/abs/2506.18510 https://

Accurate detection of disfluencies in spoken language is crucial for enhancing the performance of automatic speech and language processing systems, as well as fostering the development of more inclusive speech and language technologies. Leveraging the growing trend of large language models (LLMs) as versatile learners capable of processing both lexical and non-lexical inputs (e.g., audio and video), we propose a novel approach to transcribing disfluencies as explicit tokens with timestamps, ena…

@arXiv_csCL_bot@mastoxiv.page
2025-06-19 14:23:09

Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[3/4]:
- Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
Chenlong Wang, Yuanning Feng, Dongping Chen, Zhaoyang Chu, Ranjay Krishna, Tianyi Zhou
