đź’° Supports multiple dimensions and quantization options - binary 512d version outperforms OpenAI-v3-large while reducing vector database costs by 99.48%
🔍 Processes entire documents in a single pass to generate chunk embeddings enriched with document-level context
🎯 Less sensitive to chunking strategies compared to traditional context-agnostic embedding models
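The 99.48% cost figure follows from simple storage arithmetic, assuming the usual float32 baseline for OpenAI's 3072-dimension text-embedding-3-large vectors versus a 1-bit-per-dimension binary 512d vector:

```python
# Storage cost of one vector: float32 baseline vs. binary-quantized 512d.
FLOAT32_BYTES = 4
baseline_bytes = 3072 * FLOAT32_BYTES   # text-embedding-3-large -> 12288 bytes
binary_bytes = 512 // 8                 # 1 bit per dimension -> 64 bytes

savings = 1 - binary_bytes / baseline_bytes
print(f"{savings:.2%}")                 # -> 99.48%
```

This counts raw vector storage only; index overhead in a real vector database would shift the ratio somewhat.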
At Berlin Buzzwords, Alessio Vertemati and Andrea Ponti discussed how different document parsing and chunking strategies affect the performance of RAG pipelines.
Watch the full session: https://youtu.be/OMyEklV0G0E?si=P1GSTVShZYEKr7A5
--
Save the Date - Berlin Buzzword…
Can LLMs Replace Humans During Code Chunking?
Christopher Glasz, Emily Escamilla, Eric O. Scott, Anand Patel, Jacob Zimmer, Colin Diggs, Michael Doyle, Scott Rosen, Nitin Naik, Justin F. Brunelle, Samruddhi Thaker, Parthav Poudel, Arun Sridharan, Amit Madan, Doug Wendt, William Macke, Thomas Schill
https://arxiv.org/abs/2506.198…
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
Yilin Zhang, Xinran Zhao, Zora Zhiruo Wang, Chenyang Yang, Jiayi Wei, Tongshuang Wu
https://arxiv.org/abs/2506.15655
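As a rough sketch of the structural idea (not the paper's cAST algorithm, which works recursively over AST nodes), one can chunk Python source at top-level statement boundaries with the standard `ast` module; the `max_chars` budget here is purely illustrative:

```python
import ast

def chunk_by_ast(source: str, max_chars: int = 800) -> list[str]:
    """Split Python source at top-level statement boundaries.

    Chunks follow syntactic units instead of fixed character windows,
    so a function or class body is never cut mid-definition.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks, current = [], []
    for node in tree.body:
        # AST nodes carry 1-based line spans for the full statement.
        segment = "\n".join(lines[node.lineno - 1 : node.end_lineno])
        if current and sum(len(s) for s in current) + len(segment) > max_chars:
            chunks.append("\n".join(current))
            current = []
        current.append(segment)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Compared with fixed-size windows, every chunk boundary here coincides with a complete statement, which is the property the paper argues improves code retrieval.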
Reinforcement Learning with Action Chunking
Qiyang Li, Zhiyuan Zhou, Sergey Levine
https://arxiv.org/abs/2507.07969
arXiv:2507.07969v1 Announce Type: new
Abstract: We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
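A toy illustration of the second point in the abstract: when the agent commits to a whole chunk of h actions as one "macro action", the h-step return within the chunk needs no off-policy correction, so the TD target is an unbiased h-step backup. The function below is a hypothetical sketch, not code from the paper:

```python
def chunked_td_target(rewards: list[float], next_value: float,
                      gamma: float = 0.99) -> float:
    """h-step TD target for an action chunk of length h = len(rewards).

    Accumulates the discounted in-chunk rewards and bootstraps from the
    critic's value estimate at the state reached after the chunk.
    """
    target = next_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# Toy usage: a chunk of 4 steps with a sparse terminal reward,
# bootstrapping from a critic estimate of 10.0.
print(chunked_td_target([0.0, 0.0, 0.0, 1.0], next_value=10.0))
```

With chunk length h = 1 this reduces to the ordinary one-step TD target.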
Knowledge Compression via Question Generation: Enhancing Multihop Document Retrieval without Fine-tuning
Anvi Alex Eponon, Moein Shahiki-Tash, Ildar Batyrshin, Christian E. Maldonado-Sifuentes, Grigori Sidorov, Alexander Gelbukh
https://arxiv.org/abs/2506.13778
Many chatbots and search engines use Retrieval-Augmented Generation, but they often underperform compared to ChatGPT, frustrating users. At Berlin Buzzwords, Lewin von Saldern discussed how poor chunking strategies contribute to this issue and showcased improved techniques for building more reliable RAG systems.
Watch the full session: https://
Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward
Jiarui Yang, Bin Zhu, Jingjing Chen, Yu-Gang Jiang
https://arxiv.org/abs/2508.11143
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Sukjun Hwang, Brandon Wang, Albert Gu
https://arxiv.org/abs/2507.07955
arXiv:2507.07955v1 Announce Type: new
Abstract: Despite incredible progress in language models (LMs) in recent years, largely resulting from moving away from specialized models designed for specific tasks to general models based on powerful architectures (e.g. the Transformer) that learn everything from raw data, pre-processing steps such as tokenization remain a barrier to true end-to-end foundation models. We introduce a collection of new techniques that enable a dynamic chunking mechanism which automatically learns content -- and context -- dependent segmentation strategies learned jointly with the rest of the model. Incorporating this into an explicit hierarchical network (H-Net) allows replacing the (implicitly hierarchical) tokenization-LM-detokenization pipeline with a single model learned fully end-to-end. When compute- and data- matched, an H-Net with one stage of hierarchy operating at the byte level outperforms a strong Transformer language model operating over BPE tokens. Iterating the hierarchy to multiple stages further increases its performance by modeling multiple levels of abstraction, demonstrating significantly better scaling with data and matching a token-based Transformer of twice its size. H-Nets pretrained on English show significantly increased character-level robustness, and qualitatively learn meaningful data-dependent chunking strategies without any heuristics or explicit supervision. Finally, the H-Net's improvement over tokenized pipelines is further increased in languages and modalities with weaker tokenization heuristics, such as Chinese and code, or DNA sequences (nearly 4x improvement in data efficiency over baselines), showing the potential of true end-to-end models that learn and scale better from unprocessed data.
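The core mechanism can be caricatured in a few lines: a per-byte boundary score decides where chunks start, and byte embeddings are pooled into one vector per chunk for the inner network. In H-Net the scorer is learned jointly with the rest of the model; the fixed probabilities and threshold below are stand-ins for illustration only:

```python
import numpy as np

def pool_chunks(byte_embs: np.ndarray, boundary_probs: np.ndarray,
                threshold: float = 0.5) -> np.ndarray:
    """Mean-pool byte embeddings into dynamically sized chunks.

    A boundary score above `threshold` marks the start of a new chunk;
    each chunk's byte embeddings are averaged into a single vector.
    """
    boundaries = boundary_probs > threshold      # True marks a chunk start
    boundaries[0] = True                         # first byte always opens a chunk
    chunk_ids = np.cumsum(boundaries) - 1        # 0-based chunk index per byte
    n_chunks = int(chunk_ids[-1]) + 1
    pooled = np.zeros((n_chunks, byte_embs.shape[1]))
    for c in range(n_chunks):
        pooled[c] = byte_embs[chunk_ids == c].mean(axis=0)
    return pooled

# 6 "bytes" with 2-dim embeddings; boundaries predicted at positions 0 and 3.
embs = np.arange(12, dtype=float).reshape(6, 2)
probs = np.array([0.9, 0.1, 0.1, 0.8, 0.2, 0.3])
print(pool_chunks(embs, probs).shape)  # -> (2, 2)
```

The pooling is content-dependent: changing the boundary scores changes both the number and the extent of the chunks, which is what distinguishes this from a fixed tokenizer.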
How can you determine if your RAG system is functioning properly? At #bbuzz 2025, Roman Grebennikov presented a real-world case study on assessing the effectiveness of RAG in production. He discussed challenges such as messy data, chunking errors, and unexpected chatbot behaviour, while also sharing tools to confidently measure quality.
Watch the full session:
CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning
Dongchi Huang, Zhirui Fang, Tianle Zhang, Yihang Li, Lin Zhao, Chunhe Xia
https://arxiv.org/abs/2508.02219
Optimizing RAG Pipelines for Arabic: A Systematic Analysis of Core Components
Jumana Alsubhi, Mohammad D. Alahmadi, Ahmed Alhusayni, Ibrahim Aldailami, Israa Hamdine, Ahmad Shabana, Yazeed Iskandar, Suhayb Khayyat
https://arxiv.org/abs/2506.06339