A few days back I tweeted and blogged about the brand-new RFC 9839, and also published the first draft of a tiny Go library to help enforce the subsets defined in the RFC. Got lots of useful input on the library and have progressed it enough to do a v0.8.0 release: https://github.com/timbray/rfc9839…
from my link log —
libu8ident: Unicode security guidelines for programming language identifiers.
https://github.com/rurban/libu8ident
saved 2025-02-13
«Unicode is good. If you’re designing a data structure or protocol that has text fields, they should contain #Unicode characters encoded in #UTF8. There’s another question, though: “Which Unicode characters?” The answer is “Not all of them, please exclude some.”
This issue keeps coming up, so [
Three small announcements:
1. RFC 9839, a guide to which Unicode characters you should never use: https://www.rfc-editor.org/rfc/rfc9839.html
2. Blog piece with background and context, “RFC 9839 and Bad Unicode”:
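To make "not all of them, please exclude some" concrete, here is a minimal Go sketch of the kind of check involved. It follows my reading of the RFC's "Unicode Assignables" subset, which leaves out surrogates, legacy control codes (other than tab, LF and CR) and noncharacters. The names `isAssignable` and `firstProblem` are hypothetical and are not the API of the library linked above; treat the ranges as assumptions to verify against the RFC text.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isAssignable reports whether r falls in the "Unicode Assignables" subset
// as I understand it from RFC 9839. Hypothetical helper, not a library API.
func isAssignable(r rune) bool {
	switch {
	case r < 0 || r > 0x10FFFF: // outside the Unicode code space
		return false
	case r >= 0xD800 && r <= 0xDFFF: // surrogates
		return false
	case r == '\t' || r == '\n' || r == '\r': // the useful C0 controls
		return true
	case r < 0x20 || (r >= 0x7F && r <= 0x9F): // other C0 controls, DEL, C1 controls
		return false
	case r >= 0xFDD0 && r <= 0xFDEF: // noncharacter block
		return false
	case r&0xFFFE == 0xFFFE: // U+xxFFFE and U+xxFFFF in every plane
		return false
	}
	return true
}

// firstProblem returns the byte offset of the first invalid UTF-8 byte or
// non-assignable code point in s. found is false if s is clean.
func firstProblem(s string) (offset int, found bool) {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if r == utf8.RuneError && size == 1 {
			return i, true // malformed UTF-8 (includes encoded surrogates)
		}
		if !isAssignable(r) {
			return i, true
		}
		i += size
	}
	return -1, false
}

func main() {
	for _, s := range []string{"hello\tworld", "bad\x00byte", "non\uFFFEchar"} {
		if i, found := firstProblem(s); found {
			fmt.Printf("%q: problematic code point at byte %d\n", s, i)
		} else {
			fmt.Printf("%q: ok\n", s)
		}
	}
}
```

Decoding by hand with `utf8.DecodeRuneInString` rather than ranging over the string keeps malformed bytes distinguishable from a literal U+FFFD, which is itself a perfectly acceptable character.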
Thermodynamics in a split Hilbert space: Quantum impurity at the edge of a one-dimensional superconductor
Pradip Kattel, Abay Zhakenov, Natan Andrei
https://arxiv.org/abs/2508.19330
Taylor–Aris dispersion of active particles in oscillatory channel flow
Bohan Wang, Weiquan Jiang, Li Zeng, Zi Wu, Ping Wang
https://arxiv.org/abs/2507.18241 http…
unicodelang: Languages spoken by country (2015)
A bipartite network of languages and the countries in which they are spoken, as estimated by Unicode. Edges are weighted by the proportion of the given country's population that is literate in a particular language.
This network has 868 nodes and 1255 edges.
Tags: Informational, Relatedness, Weighted
Interesting… 🤔 "ICU4X - Solving i18n for client-side and resource-constrained environments" https://icu4x.unicode.org/
> Why ICU4X?
> Small and fast
> ICU4X floats like a butterfly and stings like a bee
😅🦋🐝
from my link log —
The modern text rendering pipeline: unicode, bidi, segmentation, shaping, …
https://www.newroadoldway.com/text1.html
saved 2025-06-24
Microcanonical simulated annealing: Massively parallel Monte Carlo simulations with sporadic random-number generation
M. Bernaschi, L. A. Fernandez, I. González-Adalid Pemartín, E. Marinari, V. Martin-Mayor, G. Parisi, F. Ricci-Tersenghi, J. J. Ruiz-Lorenzo, D. Yllanes
https://arxiv.org/abs/2506.16240
It has been 0 days since the last unicode fuckup ....
Man, it's 2025 and there are still big companies that can't manage a clean, consistent encoding of non-ASCII characters.
MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding
Renjie Li, Ruijie Ye, Mingyang Wu, Hao Frank Yang, Zhiwen Fan, Hezhen Hu, Zhengzhong Tu
https://arxiv.org/abs/2507.12463
Breaking the Baryon Density–Hubble Constant Degeneracy in Fast Radio Burst Applications with Associated Gravitational Waves
Joscha N. Jahns-Schindler, Laura G. Spitler
https://arxiv.org/abs/2508.14434
The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization
Stephen Meisenbacher, Alexandra Klymenko, Andreea-Elena Bodea, Florian Matthes
https://arxiv.org/abs/2508.18976
If your initial thought on reading about the "Initial Teaching Alphabet" is to check on Unicode status, please see: https://www.unicode.org/L2/L2025/25010-script-wg-report.pdf
The Unicode character 🗿 (U+1F5FF) is named "Moyai", which I thought was a typo for "Moai", the stone statues on Easter Island, Chile.
Turns out "Moyai" are statues in Niijima, Japan, which were inspired by the ones from Easter Island.
This makes me a little disappointed, but it makes me very happy that the Japanese like our statues.
#chile
Poset-Markov Channels: Capacity via Group Symmetry
Eray Unsal Atay, Eitan Levin, Venkat Chandrasekaran, Victoria Kostina
https://arxiv.org/abs/2506.19305 h…
oh you know, just being pedantic about the unicode standard in the wee hours.
as you do.
@… There’s some weird shit in the dusty corners of Unicode: https://www.unicode.org/charts/PDF/U1FB00.pdf
haha, i like to poison my personal data - among others - by using a random combination of unicode homoglyphs. this is a new result on a package i got delivered.
OSTRICH2: Solver for Complex String Constraints
Matthew Hague, Denghang Hu, Artur Jeż, Anthony W. Lin, Oliver Markgraf, Philipp Rümmer, Zhilin Wu
https://arxiv.org/abs/2506.14363
Replaced article(s) found for math.MG. https://arxiv.org/list/math.MG/new
[1/1]:
- A note on Erdős matrices and Marcus–Ree inequality
Aman Kushwaha, Raghavendra Tripathi
Slip electron flow in GaAs microscale constrictions
Daniil I. Sarypov, Dmitriy A. Pokhabov, Arthur G. Pogosov, Evgeny Yu. Zhdanov, Andrey A. Shevyrin, Alexander A. Shklyaev, Askhat K. Bakarov
https://arxiv.org/abs/2506.10276
Measurement of the Dispersion–Galaxy Cross-Power Spectrum with the Second CHIME/FRB Catalog
Haochen Wang, Kiyoshi Masui, Shion Andrew, Emmanuel Fonseca, B. M. Gaensler, R. C. Joseph, Victoria M. Kaspi, Bikash Kharel, Adam E. Lanman, Calvin Leung, Lluis Mas-Ribas, Juan Mena-Parra, Kenzie Nimmo, Aaron B. Pearlman, Ue-Li Pen, J. Xavier Prochaska, Ryan Raikman, Kaitlyn Shin, Seth R. Siegel, Kendrick M. Smith, Ingrid H. Stairs
Ion-motion simulations of a plasma-wakefield experiment at FLASHForward
D. Kalvik, P. Drobniak, F. Peña, C. A. Lindstrøm, J. Beinortaite, L. Boulton, P. Caminal, J. Garland, G. Loisch, J. B. Svensson, M. Thévenet, S. Wesch, J. Wood, J. Osterhoff, R. D'Arcy, S. Diederichs
https://arxiv.org/abs/2505.24299