
2025-06-04 06:43:20
Here's an odd effect (stumbled on by accident). The blue loss curve is from a well-tuned BERT baseline (from the "cramming" paper).
The only thing I changed for the orange curve was to put a residual connection around each transformer block and multiply the output of the block by a scalar parameter initialized to 0.
I'm surprised that has such a substantial impact. Not just on the performance, but on the shape of the loss curve.
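A minimal sketch of the change described above, under one reading of it: each block `f` is wrapped so the layer computes `x + alpha * f(x)`, with the scalar `alpha` starting at 0, so every wrapped layer is the identity at initialization. The names `GatedResidual` and `toy_block` are hypothetical, plain Python lists stand in for tensors, and `alpha` is set by hand here rather than learned.

```python
from typing import Callable, List

Vector = List[float]


class GatedResidual:
    """Wrap a block f so the output is x + alpha * f(x), with alpha starting at 0."""

    def __init__(self, block: Callable[[Vector], Vector], alpha: float = 0.0):
        self.block = block
        # In a real model alpha would be a learnable scalar parameter;
        # here it is a plain float that we set by hand.
        self.alpha = alpha

    def __call__(self, x: Vector) -> Vector:
        fx = self.block(x)
        # Residual path plus the block output scaled by the scalar gate.
        return [xi + self.alpha * fi for xi, fi in zip(x, fx)]


def toy_block(x: Vector) -> Vector:
    # Stand-in for a transformer block (assumption: any shape-preserving map works).
    return [2.0 * xi + 1.0 for xi in x]


layer = GatedResidual(toy_block)
x = [1.0, -2.0, 0.5]

y_init = layer(x)    # alpha == 0, so the layer passes x through unchanged

layer.alpha = 0.5    # pretend training has moved the gate away from 0
y_later = layer(x)   # now the block output contributes to the result
```

With `alpha` at 0 the whole stack starts out as an identity map, and each block's contribution only grows as its gate moves away from 0.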
First EM simulations on the switch logic board after sorting out some licensing issues with the new Sonnet version.
V19 is a lot faster than v18: it doubles the thread cap from 8 to 16 and also replaces the legacy SSE-based matrix solver with an AVX-based version (geee, I wonder where they might have got that from...).
First test is the BGA launch for LC0_PHY0_LANE1_TX, a 5 Gbps QSGMII link but also representative of some of the 25G SERDES.
Return loss is better than -13 dB…
Diversity in Hydrogen-rich Envelope Mass of Type II Supernovae. (III). The mass-loss and evolutionary pathways of the red supergiant progenitors
Qiliang Fang, Takashi J. Moriya, Keiichi Maeda, Andris Dorozsmai, Javier Silva-Farfán
https://arxiv.org/abs/2507.14665
This https://arxiv.org/abs/2504.11284 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…