Tootfinder

@jtk@infosec.exchange
2025-11-29 16:08:44

The full Weekend Reads report is on vacation this week; it will return next week. But we don't want to leave you with nothing. Here is one piece that might have made it into the top 5:
* The Input Stack on Linux
https://venam.net/blog/unix/2025/11/27/input_device…

The Input Stack on Linux
Let's explore and deobfuscate the input stack on Linux. Our aim is to understand its components and what each does. Input handling can be divided into two parts, separated by a common layer. We’ll try to make sense of all this, one thing at a time, with a logical and coherent approach.

@arXiv_csCL_bot@mastoxiv.page
2025-10-10 11:05:49

Single layer tiny Co$^4$ outpaces GPT-2 and GPT-BERT
Noor Ul Zain, Mohsin Raza, Ahsan Adeel
https://arxiv.org/abs/2510.08404 https://arxiv.org/pdf/2510.084…

We show that a tiny Co$^4$ machine (Adeel, 2025) with a single layer, two heads, and 8M parameters, operating at an approximate cost of $O(N)$ (where $N$ is the number of input tokens), outpaces the BabyLM Challenge baselines GPT-2 (124M, 12 layers, $O(N^2)$) and GPT-BERT (30M, 12 layers, $O(N^2)$) in just two epochs, whereas both baselines are trained for ten. Co$^4$ achieves orders-of-magnitude greater training efficiency on 10M tokens, demonstrating highly sample-efficient pretraining. Using the BabyLM ch…

@arXiv_csCE_bot@mastoxiv.page
2025-10-03 07:53:21

ShapeGen3DCP: A Deep Learning Framework for Layer Shape Prediction in 3D Concrete Printing
Giacomo Rizzieri, Federico Lanteri, Liberato Ferrara, Massimiliano Cremonesi
https://arxiv.org/abs/2510.02009 …

This work introduces ShapeGen3DCP, a deep learning framework for fast and accurate prediction of filament cross-sectional geometry in 3D Concrete Printing (3DCP). The method is based on a neural network architecture that takes as input both material properties in the fluid state (density, yield stress, plastic viscosity) and process parameters (nozzle diameter, nozzle height, printing and flow velocities) to directly predict extruded layer shapes. To enhance generalization, some inputs are refo…

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:07:29

Learning What's Missing: Attention Dispersion and EMA Stabilization in Length Generalization
Pál Zsámboki, Benjamin Levi, David Ansel Josef Smith, Mitansh Kagalwala, Arlington Kell, Samuel Liechty, Cong Wang
https://arxiv.org/abs/2510.08341

We study length generalization in transformers through the set complement task, where a model must predict a uniform distribution over tokens absent from an input sequence -- an ability central to board-game style reasoning. Our main theoretical result establishes two statements. First, we prove tight bounds on embedding and value dimensions for single-layer attention-only transformers. Second, we show that if such a model achieves balanced logit displacement at lengths 1 and 2, then it must ge…
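
As a rough illustration of the set complement task described above, the sketch below builds the target distribution for one input sequence. It assumes tokens are integer ids in `0..vocab_size`; the function name `set_complement_target` is hypothetical and is not taken from the paper.

```rust
/// Target for the set complement task: a uniform distribution over the
/// vocabulary tokens that do NOT occur in `input`.
/// Assumption: tokens are ids in 0..vocab_size (out-of-range ids panic).
fn set_complement_target(vocab_size: usize, input: &[usize]) -> Vec<f64> {
    // Mark every token that appears at least once; repeats do not matter.
    let mut present = vec![false; vocab_size];
    for &tok in input {
        present[tok] = true;
    }
    // Number of tokens missing from the input.
    let absent = present.iter().filter(|&&p| !p).count();
    // Present tokens get probability 0, absent ones share the mass equally.
    // If nothing is absent, every entry is 0 and no division happens.
    present
        .iter()
        .map(|&p| if p { 0.0 } else { 1.0 / absent as f64 })
        .collect()
}
```

For a vocabulary of four tokens and the input `[0, 2]`, the target puts probability 0.5 on tokens 1 and 3 and zero elsewhere.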

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 13:16:18

Deconstructing Attention: Investigating Design Principles for Effective Language Modeling
Huiyin Xue, Nafise Sadat Moosavi, Nikolaos Aletras
https://arxiv.org/abs/2510.11602 htt…

The success of Transformer language models is widely credited to their dot-product attention mechanism, which interweaves a set of key design principles: mixing information across positions (enabling multi-token interactions), sequence-dependent activations (where attention weights adapt to each input), a specific mathematical form (dot-product similarities plus softmax weighting), and coupling of queries and keys to evolving hidden states (grounding attention in the current layer). However, th…
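
For readers who want the "dot-product similarities plus softmax weighting" form spelled out, here is a minimal single-query sketch of standard dot-product attention. It is not the paper's code: the names `softmax` and `attend` are hypothetical, and the division by the square root of the query dimension is the usual Transformer convention, assumed here rather than stated in the abstract.

```rust
/// Numerically stable softmax over a slice of scores.
fn softmax(scores: &[f64]) -> Vec<f64> {
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// One query attending over all positions.
/// Assumptions: `keys` and `values` are non-empty and have the same length,
/// and every key has the same dimension as `query`.
fn attend(query: &[f64], keys: &[Vec<f64>], values: &[Vec<f64>]) -> Vec<f64> {
    // Scaling by sqrt(d) is the common convention (an assumption here).
    let scale = (query.len() as f64).sqrt();
    // Dot-product similarity between the query and each key.
    let scores: Vec<f64> = keys
        .iter()
        .map(|k| k.iter().zip(query).map(|(a, b)| a * b).sum::<f64>() / scale)
        .collect();
    // Softmax turns similarities into input-dependent mixing weights.
    let weights = softmax(&scores);
    // Mix information across positions: weighted sum of the value vectors.
    let mut out = vec![0.0; values[0].len()];
    for (w, v) in weights.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += w * x;
        }
    }
    out
}
```

When all keys are identical the weights are uniform, so the output is simply the mean of the value vectors.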

@arXiv_eessAS_bot@mastoxiv.page
2025-10-14 08:49:18

Phase Aware Ear-Conditioned Learning for Multi-Channel Binaural Speaker Separation
Ruben Johnson Robert Jeremiah, Peyman Goli, Steven van de Par
https://arxiv.org/abs/2510.11366

Separating competing speech in reverberant environments requires models that preserve spatial cues while maintaining separation efficiency. We present a Phase-aware Ear-conditioned speaker Separation network using eight microphones (PEASE-8) that consumes complex STFTs and directly introduces a raw-STFT input to the early decoder layer, bypassing the entire encoder pathway to improve reconstruction. The model is trained end-to-end with an SI-SDR-based objective against direct-path ear targets, …
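
The abstract mentions an SI-SDR-based objective without giving its exact form. As background, the sketch below computes the commonly used scale-invariant signal-to-distortion ratio in dB for one estimated signal against one target. The function name `si_sdr_db` is hypothetical, the signals are assumed to be zero-mean already, and a training loss would typically use the negative of this value; none of this is taken from the paper itself.

```rust
/// Scale-invariant signal-to-distortion ratio (SI-SDR) in dB.
/// Assumptions: both signals have the same length, are already zero-mean,
/// and `target` is not all zeros.
fn si_sdr_db(estimate: &[f64], target: &[f64]) -> f64 {
    assert_eq!(estimate.len(), target.len());
    // Optimal scaling of the target: <estimate, target> / ||target||^2.
    let dot: f64 = estimate.iter().zip(target).map(|(e, t)| e * t).sum();
    let target_energy: f64 = target.iter().map(|t| t * t).sum();
    let scale = dot / target_energy;
    // Split the estimate into the scaled target and the residual.
    let mut signal = 0.0;
    let mut noise = 0.0;
    for (e, t) in estimate.iter().zip(target) {
        let projected = scale * t;
        signal += projected * projected;
        noise += (e - projected) * (e - projected);
    }
    // Energy ratio in decibels; infinite for a perfect estimate.
    10.0 * (signal / noise).log10()
}
```

Because the target is rescaled before the residual is measured, multiplying the estimate by any non-zero constant leaves the result unchanged.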

@arXiv_eessSP_bot@mastoxiv.page
2025-10-01 10:09:08

Neural Network State-Space Estimators
Minxing Sun, Li Miao, Qingyu Shen, Yao Mao, Qiliang Bao
https://arxiv.org/abs/2509.25959 https://arxiv.org/pdf/2509.2…

Classical state estimation algorithms rely on a predefined state-space model of the target, which complicates model derivation and limits adaptability when system dynamics change. Neural-network-based estimators offer a data-driven alternative, but rarely fuse classical estimation theory into their structure and demand large, pre-computed training sets. To overcome these limitations, we propose a unified state-space structure that requires no state-space model of the target and treats both the input-layer activatio…

@beeb@hachyderm.io
2025-12-13 19:51:54

Content warning: Advent of Code Day 11

Day 11 of #AdventOfCode is a classical graph problem like we're used to from previous years.
Unlike in previous years, I immediately thought of checking what the graph looked like with a visualization tool. Luckily, `petgraph` can export a Graphviz file, which can then be used to visualize the nodes and edges.
From that, it was clear that a few nodes were acting as "bridges" between larger subnets of nodes with no particular arrangement besides being directed towards the next "bridge" layer. Those bridge layers comprised 4 to 5 nodes in my input, and were the only ones with more than 6 incoming edges, so I used that as my filter criterion.
To gather them, I sorted the graph in topological order and chunked the bridge nodes by their position offset compared to the previous bridge node. When doing this, all the nodes from a bridge layer end up being at most 20 positions away from the previous node in the sorted list.
Finally, I progressed through each subnet, collecting information about how many paths lead to each one of the end layer's nodes. By multiplying with all the paths leading to each start layer's node, we get the overall total number of paths.
#AoC #AoC2025 #AdventOfCode2025 #RustLang #rust
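
A minimal sketch of the path-counting idea from the post above, using only the Rust standard library rather than `petgraph`. The author's actual code is not shown in the post, so the adjacency-list representation and the names `topo_order`, `count_paths`, and `count_paths_via_layer` are hypothetical. The layer-based product is only valid under the stated assumption that every start-to-end path crosses exactly one node of the given layer.

```rust
use std::collections::VecDeque;

/// Kahn's algorithm: nodes of a DAG in topological order, or None on a cycle.
/// `adj[v]` lists the successors of node `v`; ids must be below `adj.len()`.
fn topo_order(adj: &[Vec<usize>]) -> Option<Vec<usize>> {
    let n = adj.len();
    let mut indegree = vec![0usize; n];
    for targets in adj {
        for &t in targets {
            indegree[t] += 1;
        }
    }
    let mut queue: VecDeque<usize> = (0..n).filter(|&v| indegree[v] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(v) = queue.pop_front() {
        order.push(v);
        for &t in &adj[v] {
            indegree[t] -= 1;
            if indegree[t] == 0 {
                queue.push_back(t);
            }
        }
    }
    if order.len() == n { Some(order) } else { None }
}

/// Number of distinct directed paths from `start` to `end` in a DAG.
fn count_paths(adj: &[Vec<usize>], start: usize, end: usize) -> Option<u64> {
    let order = topo_order(adj)?;
    let mut ways = vec![0u64; adj.len()];
    ways[start] = 1;
    // Push path counts forward; topological order guarantees that a node's
    // count is final before it is propagated to its successors.
    for &v in &order {
        for &t in &adj[v] {
            ways[t] += ways[v];
        }
    }
    Some(ways[end])
}

/// Total paths computed segment by segment through one "bridge" layer.
/// Assumption: every start-to-end path crosses exactly one node of `layer`.
fn count_paths_via_layer(
    adj: &[Vec<usize>],
    start: usize,
    layer: &[usize],
    end: usize,
) -> Option<u64> {
    let mut total = 0u64;
    for &b in layer {
        total += count_paths(adj, start, b)? * count_paths(adj, b, end)?;
    }
    Some(total)
}
```

On a small graph where nodes 3 and 4 form the only bridge layer between node 0 and node 7, the segment-wise product agrees with the direct count.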
