I realise on the fediverse this is maybe asking for a flaming, but yesterday out of sheer curiosity I tried Claude for a simpleish coding task that I'd been putting off (largely inspired by @… 's latest on #theclimatebrink). The performance of Claude was seriously impressive. I am convinced the AI cycle is more than hype (and have been for a while), the chatbots have been a huge attention hogger, misleadingly so, while the serious work has been done elsewhere. (We are developing ML tools to supplement parts of our climate model workflows).
Now I'm wondering if there is any serious EU competition to Anthropic? - Mistral's codestral perhaps?
Because this kind of performance changes everything and we can't afford to lag behind...
#AIcoding #ML
Edit: here is the climate brink post I mentioned
https://www.theclimatebrink.com/p/the-ai-augmented-scientist
Trying to figure out how to lock down my CI runners as much as possible.
They need to be able to reach out to GitHub to HTTPS check out the source code, but I can't easily add a firewall rule for "can only clone this one repo from github".
Ideas beyond "open port 443 outbound to the entire internet"?
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[2/5]:
- POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
Li, Cui, Wang, Ge, Huang, Li, Peng, Lu, Tashi, Wang, Dang
https://arxiv.org/abs/2511.09232 https://mastoxiv.page/@arXiv_csCL_bot/115541846907664054
- Beyond Elicitation: Provision-based Prompt Optimization for Knowledge-Intensive Tasks
Yunzhe Xu, Zhuosheng Zhang, Zhe Liu
https://arxiv.org/abs/2511.10465 https://mastoxiv.page/@arXiv_csCL_bot/115547607561282911
- $\pi$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
Dong Liu, Yanxuan Yu
https://arxiv.org/abs/2511.10696 https://mastoxiv.page/@arXiv_csCL_bot/115564418836654965
- Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performanc...
Zijin Su, Huanzhu Lyu, Yuren Niu, Yiming Liu
https://arxiv.org/abs/2511.14073 https://mastoxiv.page/@arXiv_csCL_bot/115575715073023141
- HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning
Alexis Correa-Guill\'en, Carlos G\'omez-Rodr\'iguez, David Vilares
https://arxiv.org/abs/2511.15355 https://mastoxiv.page/@arXiv_csCL_bot/115581410328165116
- Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search
Dong Liu, Yanxuan Yu
https://arxiv.org/abs/2511.16681 https://mastoxiv.page/@arXiv_csCL_bot/115603508442305146
- Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Transla...
Marii Ojastu, Hele-Andra Kuulmets, Aleksei Dorkin, Marika Borovikova, Dage S\"arg, Kairit Sirts
https://arxiv.org/abs/2511.17290 https://mastoxiv.page/@arXiv_csCL_bot/115604083224487885
- A Systematic Study of In-the-Wild Model Merging for Large Language Models
O\u{g}uz Ka\u{g}an Hitit, Leander Girrbach, Zeynep Akata
https://arxiv.org/abs/2511.21437 https://mastoxiv.page/@arXiv_csCL_bot/115621178703846052
- CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer
Lavish Bansal, Naman Mishra
https://arxiv.org/abs/2512.02711 https://mastoxiv.page/@arXiv_csCL_bot/115655090475535157
- Multilingual Medical Reasoning for Question Answering with Large Language Models
Pietro Ferrazzi, Aitor Soroa, Rodrigo Agerri
https://arxiv.org/abs/2512.05658 https://mastoxiv.page/@arXiv_csCL_bot/115683267711014189
- OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Convers...
Albrecht, Lehmann, Poltermann, Rudolph, Steigerwald, Stieler
https://arxiv.org/abs/2512.09804 https://mastoxiv.page/@arXiv_csCL_bot/115700409397020978
- Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, an...
Hanyu Cai, Binqi Shen, Lier Jin, Lan Hu, Xiaojing Fan
https://arxiv.org/abs/2512.12812 https://mastoxiv.page/@arXiv_csCL_bot/115729149622659403
- Beg to Differ: Understanding Reasoning-Answer Misalignment Across Languages
Ovalle, Ross, Ruder, Williams, Ullrich, Ibrahim, Sagun
https://arxiv.org/abs/2512.22712 https://mastoxiv.page/@arXiv_csCL_bot/115808161882146194
- Activation Steering for Masked Diffusion Language Models
Adi Shnaidman, Erin Feiglin, Osher Yaari, Efrat Mentel, Amit Levi, Raz Lapid
https://arxiv.org/abs/2512.24143 https://mastoxiv.page/@arXiv_csCL_bot/115819533211103315
- JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese L...
Liu, Li, Niu, Zhang, Xun, Hou, Wang, Iwasawa, Matsuo, Hatakeyama-Sato
https://arxiv.org/abs/2601.01627 https://mastoxiv.page/@arXiv_csCL_bot/115847901607405421
- FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG
Dassen, Kotula, Murray, Yates, Lawrie, Kayi, Mayfield, Duh
https://arxiv.org/abs/2601.05866 https://mastoxiv.page/@arXiv_csCL_bot/115881545684182376
- {\dag}DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems
Zabir Al Nazi, Shubhashis Roy Dipta, Sudipta Kar
https://arxiv.org/abs/2601.06853 https://mastoxiv.page/@arXiv_csCL_bot/115887753245730019
- Symphonym: Universal Phonetic Embeddings for Cross-Script Name Matching
Stephen Gadd
https://arxiv.org/abs/2601.06932 https://mastoxiv.page/@arXiv_csCL_bot/115887767008671765
- LLMs versus the Halting Problem: Revisiting Program Termination Prediction
Sultan, Armengol-Estape, Kesseli, Vanegue, Shahaf, Adi, O'Hearn
https://arxiv.org/abs/2601.18987 https://mastoxiv.page/@arXiv_csCL_bot/115972010510378715
- MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues
Diandian Guo, Fangfang Yuan, Cong Cao, Xixun Lin, Chuan Zhou, Hao Peng, Yanan Cao, Yanbing Liu
https://arxiv.org/abs/2601.20451 https://mastoxiv.page/@arXiv_csCL_bot/115977891530875024
toXiv_bot_toot
Premier Mai/May Day, 1974, Montréal
Source: Fonds La Presse, Bibliothèques et Archives nationales du Québec
#PremierMai
Anthropic is racing to contain the fallout after accidentally leaking Claude Code source code, issuing copyright takedown requests to remove 8,000 copies (Sam Schechner/Wall Street Journal)
https://www.wsj.com/tech/ai/anthropic-races-to-co…
Many of my fellow Filipinos hoped that Cardinal Tagle would become the first far eastern and Filipino pope in last year’s #conclave, but a new book claims that he was never a serious contender.
“Tagle, while charismatic and from the Philippines [...] was considered not to be a strong enough administrator, the book says. He also suffered harsh social media attacks which damaged his candidacy. …
Noch ein paar der zuletzt hier besonders häufig geteilten #News:
Bund veröffentlicht lizenzfreie KI-Werkzeuge
Noch ein paar der zuletzt hier besonders häufig geteilten #News:
Bund veröffentlicht lizenzfreie KI-Werkzeuge
Source: Google plans to release a screenless Fitbit band later this year; it will include basic features and require a paid subscription for more functionality (Samantha Kelly/Bloomberg)
https://www.bloomberg.com/news/articles/20