So one of the authors is Nicholas Carlini, who works for Anthropic. This is basically an ad for the three-letter agencies to use Claude. It massively over-promises compared to what the actual paper says.
But, it is important. First, this is really about silencing people. The threat of identification is designed to make people afraid to talk online. There's a massive asymmetry between the fascists and the people. The fascists are weird racists and pedophiles who are obsessed with control. No one likes them. No one likes their ideas, because their ideas are creepy and bad.
When they talk about their ideas, that people should be murdered or kidnapped based on their skin color, that there should be a national dress code, that people's sex lives should be monitored, that children should be treated like objects that are owned by the parent (specifically, one parent), that people with different skin color or uteri should be considered as livestock, people fucking hate it because it's awful. When we talk about our ideas, that everyone should be able to eat and take care of themselves, that people who can't take care of themselves should be taken care of, that we should live in a society that values life, that we should live in harmony with nature, people like those ideas. When fascists out us for talking about those ideas, people support us. When we out people who are working as fascist goons, those people have to face social consequences.
Everyone hates these people. The US government is currently less popular than it has ever been. The only way they can keep power is by making everyone think that they aren't extraordinarily unpopular. The only way to do that, the way authoritarians have always done it, is to make everyone afraid to talk.
But, yes, what this paper is saying is actually kind of bad. It looks like people who don't take any precautions at all in separating identities can be identified about 30% of the time (based on the results). It's unclear how this will actually work in the real world. Larger corpora will probably have more data, making connecting things easier.
This isn't as good as a human trying to dox someone. It's not going to work as well. It may only work in a small number of cases. There will be false positives (just like there are with people doing the work). It's probably not cheaper than hiring people. But it does mean that you can just dump money into a machine that has no ethical framework and get data out. That's the point. It's hard to find humans who will do evil shit like help dictatorships target human rights activists, but if a machine can do it for twice the price then it's a better deal for the dictatorship.
For most people, you just shouldn't care. This isn't for you. As long as you keep doing what you're doing, and you can keep everyone else doing what they're doing, then there aren't enough resources to actually target you. Even if they know who you are, there are just too many people who hate them and too few goons.
For people who might actually be targeted, there are a lot of things you can do. First, keep in mind what you're putting into anonymous accounts. Any feature that's connected to your real life is a feature that can be extracted to identify you. This has always been true; it just may be easier to find now. Your identities should be totally siloed. It's also harder to identify you if you're writing anonymously as a collective. Collectives are better anyway because they can help check your thinking. When you write as a collective, you can help clean up each other's personal details and language. A collective develops its own voice, which is distinct from individual contributors. If you do this, and you also present your work as being from one "person," then it becomes even harder for anyone (systems or individuals) to really figure it out.
I'm not going to do a full deep dive on this because I just don't have time, but your existing threat model should *already cover these threats* if you need to make sure your writing remains anonymous.
This paper doesn't present any novel methodologies. It just extracts a bunch of features, which a human would extract as notes, and tries to correlate those between identities, which is how human researchers work. Linguistic forensics were mentioned (not by name) in the paper, but the actual methodology doesn't seem to use them.
So a thing with less ethics can do a worse job for more money (when adjusted for the real, not investor-deflated, price of tokens). It's worth knowing. It's not the end of the world, but it is a good reminder to check your threat model and make sure it's up to date.