An Open-Source HW-SW Co-Development Framework Enabling Efficient Multi-Accelerator Systems
Ryan Albert Antonio, Joren Dumoulin, Xiaoling Yi, Josse Van Delm, Yunhao Deng, Guilherme Paim, Marian Verhelst
https://arxiv.org/abs/2508.14582
TokenSmith: Streamlining Data Editing, Search, and Inspection for Large-Scale Language Model Training and Interpretability
Mohammad Aflah Khan, Ameya Godbole, Johnny Tian-Zheng Wei, Ryan Wang, James Flemings, Krishna Gummadi, Willie Neiswanger, Robin Jia
https://arxiv.org/abs/2507.19419

Understanding the relationship between training data and model behavior during pretraining is crucial, but existing workflows make this process cumbersome, fragmented, and often inaccessible to researchers. We present TokenSmith, an open-source library for interactive editing, inspection, and analysis of datasets used in Megatron-style pretraining frameworks such as GPT-NeoX, Megatron, and NVIDIA NeMo. TokenSmith supports a wide range of operations including searching, viewing, ingesting, expor…
High-Resolution Directional Depth Electrodes: Open-Source FEM Lead-Field Modeling, Characterization, and Validation
Takfarinas Medani, Jace Willis, Chris Wright, Yash Vakilna, Ryan Shores, Raymundo Cassani, Anand Joshi, Richard Leahy, John Seymour, John Mosher
https://arxiv.org/abs/2508.13212