2026-03-30 13:40:13
Fun playing ARC-AGI-3, puzzles of which the most advanced AI models can solve only 1% 😀
Illustrates how AI models look extremely smart but are at the same time quite dumb.
#AI
Technology and Responsibility: Reflections on the New Tasks of Ethics
Hans Jonas (1973)
#etica
Anthropic adds dynamic workflows to Claude Code, enabling hundreds of subagents to run in parallel for complex engineering tasks such as framework migrations (Claude)
https://claude.com/blog/introducing-dynamic-workflows-in-claude-code
🐱 CuaBot is a co-op computer-use CLI that gives any coding agent a seamless sandbox — individual windows appear natively on your desktop with H.265, shared clipboard & audio, plus built-in support for agent-browser and iOS/Android devices
📊 Cua-Bench evaluates computer-use agents on OSWorld, ScreenSpot, Windows Arena & custom tasks, with trajectory export for training and RL environments
Inside the rise and fall of Sora, whose team worked separately from OpenAI's core research team, as OpenAI shuts down Sora and redirects compute to other tasks (Wall Street Journal)
https://www.wsj.com/tech/ai/the-sudden-fall-of-openais-mo…
To me, Banks is the first prophet of Judith and Octavia's Jihad.
> One of the most important tasks in setting up and running a stable and internally content civilisation is finding an acceptable balance between the desire for freedom of choice in one's actions (and the freedom from mortal fear in one's life) and the need to feel that even in a society so self-correctingly Utopian one is still contributing something.
A Few Notes on the Culture, by Iain M Banks
A case study of using AI to aid election coverage at Bay City News, starting in 2024 and refining later, aiming to reduce journalists' manual, repetitive tasks (Ciara Zavala/Local News Matters)
https://localnewsmatters.org/2026/04/2
Essentially, we seem to know that superintelligence can't exist due to computability constraints. (You can't build any computer big or powerful enough to actually be the AGI god they're trying to create.) But I don't know if we have any idea what the limit is before that.
LLMs (and SLMs) are very efficient at very specific tasks, far better than humans. But they are not better for a lot of tasks. It may just always be prohibitively expensive to use LLMs for certain tasks.
We've already run into training ceilings where additional training decreases performance. We've run into security problems (for instance, guardrails basically can't ever actually exist because there are infinitely many ways around them, and those ways can literally just be solved for. See https://llm-attacks.org/)
🖥️ #Cua is open-source infrastructure for Computer-Use Agents — build, benchmark & deploy #AI agents that see screens, click buttons & complete tasks autonomously across full desktops #opensource
Compressing Transformer Language Models via Matrix Product Operator Decomposition: A Case Study on PicoGPT
Younes Javanmard, Tanmoy Pandit, Masoud Mardani
https://arxiv.org/abs/2603.28534 https://arxiv.org/pdf/2603.28534 https://arxiv.org/html/2603.28534
arXiv:2603.28534v1 Announce Type: new
Abstract: Transformer-based language models achieve strong performance across NLP tasks, but their quadratic parameter scaling with hidden dimension makes deployment on resource-constrained hardware expensive. We study Matrix Product Operator (MPO) decomposition as a principled compression method for transformers. MPO factorises weight matrices into chains of low-rank cores, with approximation quality controlled by the bond dimension chi. We replace every nn.Linear layer in PicoGPT, a GPT-2-style character-level language model with about 1M parameters, with an MPOLinear module parameterised as an MPO chain. Cores are initialised either by TT-SVD from pretrained dense weights or from random initialisation, and trained using standard PyTorch autograd without a custom backward pass. We derive balanced factorisation schemes for the five distinct weight shapes in PicoGPT and evaluate bond dimensions chi in {4, 8, 16, 32} on Tiny Shakespeare. MPO compression achieves up to 13x compression per transformer block at chi = 4. At chi = 16, the model uses 191,872 parameters instead of 1,020,224 while retaining 97.7% of baseline token accuracy (51.6% vs 52.8%). Reconstruction error follows the expected trend and is lower for three-site than two-site factorisations at the same bond dimension. The chi = 8 model gives the best accuracy per parameter, exceeding the dense baseline by 2.7x on this metric. These results show that MPO parameterisation is a practical and theoretically grounded alternative to low-rank methods and unstructured pruning for transformer compression.
toXiv_bot_toot
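To make the bond dimension χ concrete, here is a minimal parameter-counting sketch for an MPO chain whose k-th core has shape (χ_left, in_k, out_k, χ_right), with bond dimension 1 at both ends. The 128 × 128 layer and its 8·16 and 4·4·8 factorizations are hypothetical; the abstract does not list PicoGPT's actual balanced factorization schemes, and biases are ignored.

```python
from math import prod


def mpo_param_count(in_dims, out_dims, chi):
    """Count parameters in an MPO chain whose k-th core has shape
    (chi_left, in_dims[k], out_dims[k], chi_right), with bond dimension 1 at the
    two ends and chi on every internal bond."""
    if len(in_dims) != len(out_dims):
        raise ValueError("in_dims and out_dims need the same number of sites")
    n = len(in_dims)
    total = 0
    for k, (i, o) in enumerate(zip(in_dims, out_dims)):
        left = 1 if k == 0 else chi
        right = 1 if k == n - 1 else chi
        total += left * i * o * right
    return total


def dense_param_count(in_dims, out_dims):
    """Parameters of the equivalent dense weight matrix (bias ignored)."""
    return prod(in_dims) * prod(out_dims)


# Hypothetical 128 x 128 layer, factored as 8*16 (two sites) or 4*4*8 (three sites).
dense = dense_param_count([8, 16], [8, 16])
for dims in ([8, 16], [4, 4, 8]):
    for chi in (4, 8, 16, 32):
        mpo = mpo_param_count(dims, dims, chi)
        print(f"sites={len(dims)} chi={chi:2d} params={mpo:6d} ratio={dense / mpo:5.1f}x")
```

In this toy setting χ = 4 gives 1,280 (two-site) or 576 (three-site) parameters instead of 16,384. The sketch does not cap χ at the largest rank a bond can actually carry, so at χ = 32 the three-site count exceeds the dense matrix; a real TT-SVD would truncate there. For scale, the abstract's χ = 16 model has 1,020,224 / 191,872 ≈ 5.3× fewer parameters overall and keeps 51.6 / 52.8 ≈ 97.7% of baseline token accuracy.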
Xiaomi open sources MiMo-V2.5 and MiMo-V2.5-Pro under the MIT License, saying both models are among the most efficient available for agentic "claw" tasks (Carl Franzen/VentureBeat)
https://venturebeat.com/ai/open-source…
I sometimes write scripts in Windows #PowerShell because it's a programming tool that is widely available in my environment without requiring any additional installs. I find the syntax to be clunky, but it's capable.
As I've noted before, I'm required to use Copilot at work, so I use it for tasks like this to burn tokens and earn work termination deferment points.
1/x…
#superproductivity app is great. There aren't many apps I can run on my locked-down computer at work, but this one can sync via WebDAV, so I installed a minimal WebDAV server just to synchronize the JSON and Markdown files the app generates. It works flawlessly! I have finally found a way to take my to-dos between work and home.
Age makes remembering things more and more tr…
Google I/O showed how the industry has seized upon LLMs as the "future" of software. But that couldn't be further from the truth! It is an intensification of the old way. The same features as before, only MORE and FASTER.
It's the same mediocre vision execs have had for 40 years: give customers more tasks to do on your platform, sell them tools to solve the problems you created.
But the conditions that made this business model possible are collapsing.
The other one I truly love is GitUp (https://gitup.co). Its visualization handles certain specific tasks better than anything else — tasks where I’m more concerned about the shape of the commit graph than the contents of individual commits.
Because of the way it does live updates of repo state and offers a whole-commit-graph-level undo, I’ll sometimes keep it open in the background while doing some fiddly thing in another tool (Fork, CLI, whatever) just so I can see what the ^*@# is happening.
Alas, its lack of support for commit signing means I use it less and less.
It's amazing how bad I am at estimating how long simple tasks* will take.
I had a nearly finished manuscript. I just needed to tidy a few things in it and then upload it for review.
Two hours tops I told my family.
5.5 hours later...
#AcademicChatter
Structural-Ambiguity-Aware Translation from Natural Language to Signal Temporal Logic
Kosei Fushimi, Kazunobu Serizawa, Junya Ikemoto, Kazumune Hashimoto
https://arxiv.org/abs/2603.28426 https://arxiv.org/pdf/2603.28426 https://arxiv.org/html/2603.28426
arXiv:2603.28426v1 Announce Type: new
Abstract: Signal Temporal Logic (STL) is widely used to specify timed and safety-critical tasks for cyber-physical systems, but writing STL formulas directly is difficult for non-expert users. Natural language (NL) provides a convenient interface, yet its inherent structural ambiguity makes one-to-one translation into STL unreliable. In this paper, we propose an \textit{ambiguity-preserving} method for translating NL task descriptions into STL candidate formulas. The key idea is to retain multiple plausible syntactic analyses instead of forcing a single interpretation at the parsing stage. To this end, we develop a three-stage pipeline based on Combinatory Categorial Grammar (CCG): ambiguity-preserving $n$-best parsing, STL-oriented template-based semantic composition, and canonicalization with score aggregation. The proposed method outputs a deduplicated set of STL candidates with plausibility scores, thereby explicitly representing multiple possible formal interpretations of an ambiguous instruction. In contrast to existing one-best NL-to-logic translation methods, the proposed approach is designed to preserve attachment and scope ambiguity. Case studies on representative task descriptions demonstrate that the method generates multiple STL candidates for genuinely ambiguous inputs while collapsing unambiguous or canonically equivalent derivations to a single STL formula.
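A minimal sketch of the last pipeline stage (canonicalization with score aggregation), assuming STL formulas are nested Python tuples such as ("F", 0, 10, "A") for "eventually within [0, 10], A". The canonicalization rule used here (flatten, deduplicate, and sort the operands of commutative and/or) and the choice to sum and renormalize parse scores are illustrative assumptions, not the paper's CCG-based procedure.

```python
from collections import defaultdict

COMMUTATIVE = {"and", "or"}


def canonicalize(formula):
    """Return a canonical form: atoms and time bounds unchanged, children
    canonicalized, and nested and/or nodes flattened, deduplicated, and sorted."""
    if not isinstance(formula, tuple):
        return formula  # atomic proposition (str) or numeric time bound
    op, *args = formula
    args = [canonicalize(a) for a in args]
    if op in COMMUTATIVE:
        flat = []
        for a in args:
            if isinstance(a, tuple) and a[0] == op:
                flat.extend(a[1:])  # (a and (b and c)) -> and(a, b, c)
            else:
                flat.append(a)
        args = sorted(set(flat), key=repr)
        if len(args) == 1:
            return args[0]  # (a and a) -> a
    return (op, *args)


def aggregate(candidates):
    """Merge n-best (formula, score) pairs that share a canonical form and
    return (formula, normalized score) pairs, most plausible first."""
    totals = defaultdict(float)
    for formula, score in candidates:
        totals[canonicalize(formula)] += score
    z = sum(totals.values())
    return sorted(((f, s / z) for f, s in totals.items()), key=lambda fs: -fs[1])


# "Within 10 s, reach A and always avoid C": two parses differ only in operand
# order; a third attaches "always avoid C" inside the eventually-operator.
avoid_c = ("G", 0, 10, ("not", "C"))
parses = [
    (("and", ("F", 0, 10, "A"), avoid_c), 0.5),
    (("and", avoid_c, ("F", 0, 10, "A")), 0.25),
    (("F", 0, 10, ("and", "A", avoid_c)), 0.25),
]
for formula, p in aggregate(parses):
    print(f"{p:.2f}  {formula}")
```

The two operand-order variants collapse into one candidate (score 0.75), while the genuinely different scope reading survives as a second candidate (0.25), mirroring the abstract's "deduplicated set of STL candidates with plausibility scores".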
Triomics, which is building an AI-powered platform to help oncologists automate data-heavy tasks, raised a $22M Series B, following a $15M Series A in 2024 (Marina Temkin/TechCrunch)
https://techcrunch.com/2026/05/27/triomics-nabs-22m-to-b…
I will be so glad when the members are back from leave and we get the new person up to speed. Multiple Impact Assessments for several Change Requests and Requests for Change. Regular on-call tasks. Forty-eight (and counting) security requests. Multiple questions on several subjects for the system.
I AM NOT APOLOGIZING to anyone about the speed I am working at after ten hours of work (and at least four more to go). I have a triage list and if you are not in the top sixteen items on the…
Sources: Amazon has shut down an internal leaderboard that tracked employees' use of AI tools after workers tried to boost their scores with needless tasks (Rafe Rosner-Uddin/Financial Times)
https://www.ft.com/content/b1a62a7f-6df5-4c90-94ce-64ce9c9961b6
…
from my link log —
Async Rust: futures, tasks, wakers; oh my!
https://msarmi9.github.io/posts/async-rust/
saved 2021-06-22 https://d…
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[2/5]:
- POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
Li, Cui, Wang, Ge, Huang, Li, Peng, Lu, Tashi, Wang, Dang
https://arxiv.org/abs/2511.09232 https://mastoxiv.page/@arXiv_csCL_bot/115541846907664054
- Beyond Elicitation: Provision-based Prompt Optimization for Knowledge-Intensive Tasks
Yunzhe Xu, Zhuosheng Zhang, Zhe Liu
https://arxiv.org/abs/2511.10465 https://mastoxiv.page/@arXiv_csCL_bot/115547607561282911
- π-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
Dong Liu, Yanxuan Yu
https://arxiv.org/abs/2511.10696 https://mastoxiv.page/@arXiv_csCL_bot/115564418836654965
- Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performanc...
Zijin Su, Huanzhu Lyu, Yuren Niu, Yiming Liu
https://arxiv.org/abs/2511.14073 https://mastoxiv.page/@arXiv_csCL_bot/115575715073023141
- HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning
Alexis Correa-Guillén, Carlos Gómez-Rodríguez, David Vilares
https://arxiv.org/abs/2511.15355 https://mastoxiv.page/@arXiv_csCL_bot/115581410328165116
- Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search
Dong Liu, Yanxuan Yu
https://arxiv.org/abs/2511.16681 https://mastoxiv.page/@arXiv_csCL_bot/115603508442305146
- Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Transla...
Marii Ojastu, Hele-Andra Kuulmets, Aleksei Dorkin, Marika Borovikova, Dage Särg, Kairit Sirts
https://arxiv.org/abs/2511.17290 https://mastoxiv.page/@arXiv_csCL_bot/115604083224487885
- A Systematic Study of In-the-Wild Model Merging for Large Language Models
Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata
https://arxiv.org/abs/2511.21437 https://mastoxiv.page/@arXiv_csCL_bot/115621178703846052
- CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer
Lavish Bansal, Naman Mishra
https://arxiv.org/abs/2512.02711 https://mastoxiv.page/@arXiv_csCL_bot/115655090475535157
- Multilingual Medical Reasoning for Question Answering with Large Language Models
Pietro Ferrazzi, Aitor Soroa, Rodrigo Agerri
https://arxiv.org/abs/2512.05658 https://mastoxiv.page/@arXiv_csCL_bot/115683267711014189
- OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Convers...
Albrecht, Lehmann, Poltermann, Rudolph, Steigerwald, Stieler
https://arxiv.org/abs/2512.09804 https://mastoxiv.page/@arXiv_csCL_bot/115700409397020978
- Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, an...
Hanyu Cai, Binqi Shen, Lier Jin, Lan Hu, Xiaojing Fan
https://arxiv.org/abs/2512.12812 https://mastoxiv.page/@arXiv_csCL_bot/115729149622659403
- Beg to Differ: Understanding Reasoning-Answer Misalignment Across Languages
Ovalle, Ross, Ruder, Williams, Ullrich, Ibrahim, Sagun
https://arxiv.org/abs/2512.22712 https://mastoxiv.page/@arXiv_csCL_bot/115808161882146194
- Activation Steering for Masked Diffusion Language Models
Adi Shnaidman, Erin Feiglin, Osher Yaari, Efrat Mentel, Amit Levi, Raz Lapid
https://arxiv.org/abs/2512.24143 https://mastoxiv.page/@arXiv_csCL_bot/115819533211103315
- JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese L...
Liu, Li, Niu, Zhang, Xun, Hou, Wang, Iwasawa, Matsuo, Hatakeyama-Sato
https://arxiv.org/abs/2601.01627 https://mastoxiv.page/@arXiv_csCL_bot/115847901607405421
- FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG
Dassen, Kotula, Murray, Yates, Lawrie, Kayi, Mayfield, Duh
https://arxiv.org/abs/2601.05866 https://mastoxiv.page/@arXiv_csCL_bot/115881545684182376
- †DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems
Zabir Al Nazi, Shubhashis Roy Dipta, Sudipta Kar
https://arxiv.org/abs/2601.06853 https://mastoxiv.page/@arXiv_csCL_bot/115887753245730019
- Symphonym: Universal Phonetic Embeddings for Cross-Script Name Matching
Stephen Gadd
https://arxiv.org/abs/2601.06932 https://mastoxiv.page/@arXiv_csCL_bot/115887767008671765
- LLMs versus the Halting Problem: Revisiting Program Termination Prediction
Sultan, Armengol-Estape, Kesseli, Vanegue, Shahaf, Adi, O'Hearn
https://arxiv.org/abs/2601.18987 https://mastoxiv.page/@arXiv_csCL_bot/115972010510378715
- MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues
Diandian Guo, Fangfang Yuan, Cong Cao, Xixun Lin, Chuan Zhou, Hao Peng, Yanan Cao, Yanbing Liu
https://arxiv.org/abs/2601.20451 https://mastoxiv.page/@arXiv_csCL_bot/115977891530875024
EyeBrain: Left and Right Brain Lateralization Activity Classification Through Pupil Diameter and Fixation Duration
Ko Watanabe, Pooja Pol, Nicolas Großmann, Shoya Ishimaru, Andreas Dengel
https://arxiv.org/abs/2604.23562 https://arxiv.org/pdf/2604.23562 https://arxiv.org/html/2604.23562
arXiv:2604.23562v1 Announce Type: new
Abstract: The relationship between brain lateralization and cognitive functions is well-documented. The left hemisphere primarily handles tasks such as language and arithmetic, while the right hemisphere is involved in creative activities like drawing and music perception. Eye-tracking technology has shown the potential to reveal cognitive states by measuring ocular metrics such as pupil diameter and fixation duration. However, the ability to distinguish lateralized brain activity using these ocular metrics remains underexplored. Here, we demonstrate that pupil diameter and fixation duration can effectively classify left and right brain hemisphere activities. We obtained a considerably high classification performance, with an F1 score of 0.894. The results suggest that ocular metrics are robust indicators of lateralized brain activity and can be applied in cognitive monitoring and neurorehabilitation. Our future work expands on this by integrating these methods into real-time applications EyeBrain, potentially broadening their use across various cognitive and neurological domains.
Well, did you already create #OpenSCAD models using agentic #generativeAI? How did it turn out? All at once or step by step? What is your recipe?
I did this router wall mount, and overall it worked out well when I split it into small steps, the way I would do it myself. When I prompted just one single b…
@… do you think folks are using it in a separate profile? The marketed tasks all mention access to automate private info
Of all the AI stuff that's been talked about here at KotlinConf, this is the most honest slide
#Kotlin #KotlinConf #AI
Automating Computational Chemistry Workflows via OpenClaw and Domain-Specific Skills
Mingwei Ding, Chen Huang, Yibo Hu, Yifan Li, Zitian Lu, Xingtai Yu, Duo Zhang, Wenxi Zhai, Tong Zhu, Qiangqiang Gu, Jinzhe Zeng
https://arxiv.org/abs/2603.25522 https://arxiv.org/pdf/2603.25522 https://arxiv.org/html/2603.25522
arXiv:2603.25522v1 Announce Type: new
Abstract: Automating multistep computational chemistry tasks remains challenging because reasoning, workflow specification, software execution, and high-performance computing (HPC) execution are often tightly coupled. We demonstrate a decoupled agent-skill design for computational chemistry automation leveraging OpenClaw. Specifically, OpenClaw provides centralized control and supervision; schema-defined planning skills translate scientific goals into executable task specifications; domain skills encapsulate specific computational chemistry procedures; and DPDispatcher manages job execution across heterogeneous HPC environments. In a molecular dynamics (MD) case study of methane oxidation, the system completed cross-tool execution, bounded recovery from runtime failures, and reaction network extraction, illustrating a scalable and maintainable approach to multistep computational chemistry automation.
Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design
Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo
https://arxiv.org/abs/2603.28376 https://arxiv.org/pdf/2603.28376 https://arxiv.org/html/2603.28376
arXiv:2603.28376v1 Announce Type: new
Abstract: Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bottleneck in existing paradigms stems from the lack of explicit verification mechanisms in QA data synthesis, trajectory construction, and test-time scaling. Errors introduced at each stage propagate downstream and degrade the overall agent performance. To address this, we present Marco DeepResearch, a deep research agent optimized with a verification-centric framework design at three levels: \textbf{(1)~QA Data Synthesis:} We introduce verification mechanisms to graph-based and agent-based QA synthesis to control question difficulty while ensuring answers are unique and correct; \textbf{(2)~Trajectory Construction:} We design a verification-driven trajectory synthesis method that injects explicit verification patterns into training trajectories; and \textbf{(3)~Test-time scaling:} We use Marco DeepResearch itself as a verifier at inference time and effectively improve performance on challenging questions. Extensive experimental results demonstrate that our proposed Marco DeepResearch agent significantly outperforms 8B-scale deep research agents on most challenging benchmarks, such as BrowseComp and BrowseComp-ZH. Crucially, under a maximum budget of 600 tool calls, Marco DeepResearch even surpasses or approaches several 30B-scale agents, like Tongyi DeepResearch-30B.
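The abstract does not spell out the inference-time procedure, so the following is only a generic best-of-N sketch of "the agent as its own verifier": sample several candidate answers, score each with a verifier call, and keep the best. `generate` and `verify` are hypothetical stand-ins for calls to the agent; the toy versions below just return canned values.

```python
def best_of_n_with_verifier(question, generate, verify, n=4):
    """Sample n candidate answers, score each with the verifier, and return
    the highest-scoring (answer, score) pair."""
    candidates = [generate(question, i) for i in range(n)]
    scored = [(answer, verify(question, answer)) for answer in candidates]
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins (hypothetical): a "generator" with canned answers and a
# "verifier" that prefers one of them.
canned = ["Lyon", "Paris", "Marseille", "Paris"]


def toy_generate(question, i):
    return canned[i]


def toy_verify(question, answer):
    return 0.9 if answer == "Paris" else 0.2


answer, score = best_of_n_with_verifier("Capital of France?", toy_generate, toy_verify, n=4)
print(answer, score)
```

In a real agent each `generate` call would be a full multi-step research trajectory, which is why the abstract reports results under a fixed tool-call budget.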
It's taken me most of the day, but I am finally down to fewer than 250 emails in my inbox. Loads of different tasks dealt with too - I guess I should go to bed now, but I also really need to finish this paper now...
#AcademicChatter
Google unveils Continue On, a new feature in Android 17 that will let users move tasks between Android devices, similar to Apple's Handoff feature (Ben Schoon/9to5Google)
https://9to5google.com/2026/05/19/android-17s-…
I am going to do some garden tasks today, but a bit later, when it’s warmed up a little.
Trying out Jules, a coding agent from Google that is similar to Claude Code. But it's all hosted: you use a web UI to talk to it, it checks out code from GitHub and runs it in containers. And it operates asynchronously, you give it tasks and it comes back 5-15 minutes later with work done. I like it quite a bit, but there's a question whether the Gemini models are as good for coding as Claude.
"Across a variety of tasks, including mathematical reasoning and reading comprehension, we find that although AI assistance improves performance in the short-term, people perform significantly worse without AI and are more likely to give up."
https://arxiv.org/abs/2604.04721
Beehiiv now lets creators manage their accounts through AI platforms; the first iteration of Beehiiv MCP supports subscriber analysis and SEO optimization (Sara Fischer/Axios)
https://www.axios.com/2026/03/24/beehiiv-creator-ai-chatbot-mcp
I found a solid iOS client for using my LM Studio-hosted models.
The Web Agent is interesting - launches Google in an in-app browser, parses the SERP, and delivers the results of your query back in the chat window - all while you watch what it's doing.
Qwen 3.5 35b performs well with these tasks, even if it's a little slow for interactive tasks on my hardware.
Find the app here:
Two research papers describe how Google's Co-Scientist and nonprofit FutureHouse's AI tools can succeed at drug-retargeting tasks by forming hypotheses (John Timmer/Ars Technica)
https://arstechnica.com/science/2026/05/two-…
Upcoming Linux tasks.
Downloaded and made a bootable USB with the latest #mxlinux. Old laptop backed up and ready to convert.
I have to try out the new version and then see how well WINE works with one of the online games I play. If it all goes well, the laptop will be upgraded from Windows 10 to #mxlinux
A meme that hurts because it's so relevant. #ADHD
Adaptive Block-Scaled Data Types
Jack Cook, Hyemin S. Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P. Chandrakasan, Song Han
https://arxiv.org/abs/2603.28765 https://arxiv.org/pdf/2603.28765 https://arxiv.org/html/2603.28765
arXiv:2603.28765v1 Announce Type: new
Abstract: NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distribution, resulting in large amounts of quantization error on near-maximal values in each group of 16 values. In this work, we leverage this insight to design new Adaptive Block-Scaled Data Types that can adapt to the distribution of their input values. For four-bit quantization, our proposed IF4 (Int/Float 4) data type selects between FP4 and INT4 representations for each group of 16 values, which are then scaled by an E4M3 scale factor as is done with NVFP4. The selected data type is denoted using the scale factor's sign bit, which is currently unused in NVFP4, and we apply the same insight to design formats for other bit-widths, including IF3 and IF6. When used to quantize language models, we find that IF4 outperforms existing 4-bit block-scaled formats, achieving lower loss during quantized training and achieving higher accuracy on many tasks in post-training quantization. We additionally design and evaluate an IF4 Multiply-Accumulate (MAC) unit to demonstrate that IF4 can be implemented efficiently in next-generation hardware accelerators. Our code is available at https://github.com/mit-han-lab/fouroversix.
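A rough sketch of the per-block selection the abstract describes, in plain Python: each group of 16 values is quantized once on the FP4 (E2M1) grid used by NVFP4 and once on an INT4 grid, and the representation with the lower squared error is kept. Assumptions not taken from the paper: the INT4 grid is symmetric (-7..7), selection is by squared error, and the per-block scale stays in full precision instead of being rounded to E4M3. The paper stores the choice in the scale factor's otherwise unused sign bit; this sketch just returns a tag.

```python
# FP4 (E2M1) element magnitudes, as used by NVFP4.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = sorted({sign * m for m in FP4_MAGNITUDES for sign in (1.0, -1.0)})
# Assumption: symmetric INT4 grid; the exact INT4 range is not stated in the abstract.
INT4_GRID = [float(i) for i in range(-7, 8)]
BLOCK = 16


def quantize_on_grid(block, grid):
    """Map the block's max magnitude to the grid's max, round each scaled value to
    the nearest grid point, and return (scale, codes, squared_error)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block), 0.0
    scale = amax / max(grid)  # full precision here; NVFP4/IF4 store an E4M3 scale
    codes = [min(grid, key=lambda g: abs(g - v / scale)) for v in block]
    error = sum((c * scale - v) ** 2 for c, v in zip(codes, block))
    return scale, codes, error


def quantize_if4_block(block):
    """Choose FP4 or INT4 for one block, whichever reconstructs it with lower error."""
    if len(block) != BLOCK:
        raise ValueError(f"expected {BLOCK} values, got {len(block)}")
    fp_scale, fp_codes, fp_err = quantize_on_grid(block, FP4_GRID)
    int_scale, int_codes, int_err = quantize_on_grid(block, INT4_GRID)
    if fp_err <= int_err:  # ties go to FP4, i.e. plain NVFP4 behaviour
        return "fp4", fp_scale, fp_codes
    return "int4", int_scale, int_codes


def dequantize(scale, codes):
    return [c * scale for c in codes]


# Evenly spaced values suit INT4; values already on the E2M1 grid suit FP4.
print(quantize_if4_block([float(i % 8) for i in range(16)])[0])  # int4
print(quantize_if4_block([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0] * 2)[0])  # fp4
```

The first block illustrates the weakness the abstract points to: FP4 has no code between 4 and 6 (times the scale), so near-maximal values like 5 round poorly, while INT4's uniform spacing represents them exactly.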
windsurfers: Windsurfers network (1986)
A network of interpersonal contacts among windsurfers in southern California during the fall of 1986. The edge weights indicate perceived social affiliation, measured by tasks in which each individual was asked to sort cards bearing the other surfers' names in order of closeness.
This network has 43 nodes and 336 edges.
Tags: Social, Offline, Weighted
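To put 43 nodes and 336 edges in perspective, a quick calculation of density and mean degree, assuming the 336 edges are undirected with no self-loops and ignoring the weights:

```python
def graph_stats(n_nodes, n_edges):
    """Maximum possible edges, density, and mean degree of a simple undirected graph."""
    max_edges = n_nodes * (n_nodes - 1) // 2
    density = n_edges / max_edges
    mean_degree = 2 * n_edges / n_nodes  # each edge contributes to two degrees
    return max_edges, density, mean_degree


max_edges, density, mean_degree = graph_stats(43, 336)
print(f"{max_edges} possible edges, density {density:.3f}, mean degree {mean_degree:.2f}")
# prints: 903 possible edges, density 0.372, mean degree 15.63
```

So roughly 37% of all possible pairs are connected, and a typical windsurfer is linked to about 16 others, which makes this a fairly dense social network.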
On the radio, I hear the German research foundation #DFG defend its recent move to allow #AI in project reviews, just with local setups, just for language clarity – lots of reservations.
I then listen to the most recent episode of Mél’s Data Fix podcast. An anonymous guest (🔥) talks about their daily…
Adobe unveils Firefly AI Assistant, which can orchestrate and execute multistep tasks across Creative Cloud apps, available in public beta in the coming weeks (Ivan Mehta/TechCrunch)
https://techcrunch.com/2026/04/15/adobes-new…
Triple Configuration of Brain Networks Based on Recurrent Neural Networks: The Synergistic Effects of Exogenous Stimuli, Task Demands, and Spontaneous Activity
Binghao Yang, Guangzong Chen
https://arxiv.org/abs/2604.23525 https://arxiv.org/pdf/2604.23525 https://arxiv.org/html/2604.23525
arXiv:2604.23525v1 Announce Type: new
Abstract: The foundation of cognitive flexibility and higher-order intelligence lies in the functional structure and activity of brain networks, which can be dynamically configured by both external environments and internal states. However, decoding these dynamics from high-dimensional neural data remains a challenge. In this study, we propose a computational framework using Recurrent Neural Networks (RNNs) with neural dynamic constraints to model source-localized resting-state EEG data from $114$ participants. We aim to clarify the "triple brain network configurations" driven by exogenous and endogenous factors, including external stimuli, information processing tasks, and spontaneous activities. Our model identifies the parietal network as a critical hub supporting these multiple configuration patterns. Furthermore, we reveal that the anterior and posterior parietal regions exhibit distinct functional specializations under different stimulus modalities. By formalizing a triple configuration framework, this work separates latent factors of brain dynamics and underscores the computational significance of parietal regions in orchestrating higher-order intelligence.
It's always amazing how KPIs fail to improve the performance they are expected to improve.
Amazon staff use AI tool for unnecessary tasks to inflate usage scores https://www.ft.com/content/8ee0d3ef-9548-422d-8ff1-ebd48ad4b2ca
Two episodes into the new #taskmaster series, and it's... okay.
It's a low energy group, which is not my style.
I love chaotic, energetic contestants that bomb and blast their way through tasks.
Thus far, that's not this series 🙁
Anthropic rolls out a computer use feature for Claude Cowork and the Claude Code desktop app, in research preview on macOS for Pro and Max subscribers (Blake Stimac/CNET)
https://www.cnet.com/tech/services-and-software/claude-control-your-comput…
Oh joy, great start to my work week.
Sole person for the week doing all tasks. I started at 05:15 and zero issues. Around 08:20 it went into the electronic septic tank. Ten to fifteen minute response time for anything that needed the network.
I did what I knew we would be asked to do: I rebooted the system. It didn't work, so I let the team know through our chat that they might have issues too. I called the helpdesk, and it took ten minutes for them to remote into my machine. Determined …
There's a new "design is dead, because AI" piece (thinly disguised marketing from Anthropic). But looking past the hype headlines, their claims cover purely production-stage tasks.
When it comes to the work of understanding user needs and evaluating the opportunity space, AI actually makes your thinking worse. Studies show that it alienates you from users and colleagues, and flattens your thinking.
We need more human-centered practice, not less.
Physical Intelligence says its new model, π0.7, can direct robots on tasks they weren't trained on, an "early sign" of generalization, surprising researchers (Connie Loizos/TechCrunch)
https://techcrunch.com/202…
Meta plans to reduce its reliance on third-party vendors for content moderation, in favor of AI tools that it says are better at spotting scams and other tasks (Kurt Wagner/Bloomberg)
https://www.bloomberg.com/news/articles/20
DoorDash launches Tasks, a new app that pays delivery couriers in some markets to submit video clips and complete other tasks for training AI models (Natalie Lung/Bloomberg)
https://www.bloomberg.com/news/articles/2026-03-…
Cohere launches Transcribe, its first voice model; the 2B-parameter, open-source speech recognition model handles tasks like notetaking and speech analysis (Ivan Mehta/TechCrunch)
https://techcrunch.com/2026/03/26/cohere-launches-a…
NY-based Blossom Health, which makes an "AI copilot" to augment psychiatrists' clinical decisions and automate office tasks, raised $20M in seed and Series A (Lily Mae Lazarus/Fortune)
https://fortune.com/2026/03/26/exclusive-blossom-…
The incredible analytical work of John Burn Murdoch @… along with some other colleagues is one of the main reasons I subscribe to the FT. It's rather expensive but absolutely worth it.
https://fediscience.org/@Ruth_Mottram/116582689708855765
Ruth_Mottram - This is a fascinating and beautifully illustrated analysis, exploring convincingly why birth rates are crashing basically everywhere, and while there are certainly many factors, the smoking gun is actually the smartphone.
So what to do about it? I think I agree with the conclusions, housing and financial support is one element, equality between sexes in household tasks certainly another, but finally, perhaps our job as parents is to inculcate the habit of socialising with others into our kids, especially when they get to the teenage years.
Why birth rates are falling everywhere all at once - a limited 🎁 https://giftarticle.ft.com/giftarticle/actions/redeem/8bf630d4-6e20-42c7-bb33-e98dd6a01571
OpenRouter raised $113M led by CapitalG, a source says at a $1.3B valuation, and now processes 25T tokens across 400 models weekly, up from 5T six months ago (Michael J. de la Merced/New York Times)
https://www.nytimes.com/2026/05/26/business/dealb…
Q&A with Google SVP James Manyika on AI's ability to automate tasks versus occupations, his optimism about the labor market despite AI-driven layoffs, and more (Casey Newton/Platformer)
https://www.platformer.news/james-manyika-google-ai-jobs-io-2026/
A growing number of execs are creating AI digital twins to manage tasks; Reid Hoffman says "Reid AI" has delivered 75 addresses and presentations since 2024 (Joann S. Lublin/Wall Street Journal)
https://www.wsj.com/tech/ai/ai-agents-work
Cursor releases Composer 2.5, saying it's better at sustained work on long-running tasks and follows complex instructions more reliably; it's built on Kimi K2.5 (Cursor)
https://cursor.com/blog/composer-2-5
GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, and GPT-5.5 Pro to Pro, Business, and Enterprise users in ChatGPT (The Verge)
https://www.theverge.com/ai-artificial-intelligence/917612/openai-gpt-5-5-chatgpt…
OpenAI unveils GPT-5.5, intended to be better at completing work without much direction, saying the model "kind of figures it out, deals with ambiguity" (Rachel Metz/Bloomberg)
https://www.bloomberg.com/news/articles/20
OpenAI releases ChatGPT for Clinicians, a tool for medical tasks like documentation and research, free for verified physicians, pharmacists, and more in the US (OpenAI)
https://openai.com/index/making-chatgpt-better-for-clinicians
OpenAI announces workspace agents in ChatGPT, letting teams create Codex-powered shared agents for complex tasks, and says they are "an evolution of GPTs" (OpenAI)
https://openai.com/index/introducing-workspace-agents-in-chatgpt/
Hands-on with Gemini task automation on mobile: it's super impressive despite being very slow and failing at some tasks; it can order food, book Ubers, and more (Allison Johnson/The Verge)
https://www.theverge.com/tech/898282/gemini-task-automation-uber…
Cohere releases Command A, a sparse MoE open model built for agentic tasks, with 218B total and 25B active parameters, its first under the Apache 2.0 license (Carl Franzen/VentureBeat)
https://venturebeat.com/technology/c…
Moonshot introduces Kimi K2.6, an open-weight model that it says shows strong improvements in long-horizon coding tasks, available under a modified MIT License (Kimi AI)
https://www.kimi.com/blog/kimi-k2-6
Google moved some staffers working on Project Mariner, its AI agent that can navigate Chrome and complete tasks on a user's behalf, to higher-priority projects (Maxwell Zeff/Wired)
https://www.wired.com/story/google-shakes-up-project-mariner-team-we…
Alibaba's T-Head unveils the Zhenwu M890 AI chip for training and inference, saying it is particularly suited for agentic tasks, and plans annual upgrades (Luz Ding/Bloomberg)
https://www.bloomberg.com/news/articles/2026-05-20/a…
Meta plans to reduce its reliance on third-party vendors for content moderation, in favor of AI tools that it says are better at spotting scams and other tasks (Kurt Wagner/Bloomberg)
https://www.bloomberg.com/news/articles/20
Cursor launches Composer 2, an AI agent trained solely on coding-related data to perform autonomous, lengthy coding tasks, to compete with Anthropic and OpenAI (Rachel Metz/Bloomberg)
https://www.bloomberg.com/news/articles/20
Khosla-backed robotics startup Genesis AI unveils GENE-26.5, its first model, which can control robotic hands that it designed in-house to do tasks like cooking (Anna Heim/TechCrunch)
https://techcrunch.com/2026/05/06/khosla-backed-…
Alibaba launches Wukong, an enterprise AI platform that coordinates multiple AI agents to handle complex tasks like document editing, currently in beta (Reuters)
https://www.reuters.com/world/asia-pacific/alibaba-launches-new-ai-agent…
Z.ai launches GLM-5-Turbo, a closed-source, faster, and cheaper variant of GLM-5 optimized for agent-driven workflows and OpenClaw-style tasks (Carl Franzen/VentureBeat)
https://venturebeat.com/technology/z-ai-debuts-faster-cheaper…
Alibaba unveils Qwen3.6-35B-A3B, an open-weight MoE model with 35B total and 3B active parameters, saying it rivals larger dense models in agentic coding tasks (Qwen)
https://qwen.ai/blog?id=qwen3.6-35b-a3b
OpenAI updates Agents SDK with native sandboxing and an in-distribution harness for deploying and testing agents on long-horizon tasks (Lucas Ropek/TechCrunch)
https://techcrunch.com/2026/04/15/openai-updates-its-agents-sdk…
Anthropic redesigns Claude Code on desktop, adding a sidebar for managing multiple sessions, a drag-and-drop layout, an integrated terminal, and a file editor (Claude)
https://claude.com/blog/claude-code-desktop-redesign
Anthropic launches a repeatable routines feature for Claude Code as a research preview, allowing developers to schedule and automate software development tasks (Zac Hall/9to5Mac)
https://9to5mac.com/2026/04/14/anthropic-adds-repeat…
Rivian CEO RJ Scaringe's Mind Robotics, which is building AI-powered robots for manufacturing tasks, raised $400M, source says at a $3.4B valuation (Sean McLain/Wall Street Journal)
https://www.wsj.com/business/autos/rivian-
Mythos Preview is the first AI model to complete both of AISI's cyber ranges, which measure models' cyberattack capabilities; GPT-5.5 solved only one of them (AI Security Institute)
https://www.aisi.gov.uk/blog/how-fast-is-autonomous-ai-cyber-capabil…
Google DeepMind details a Gemini-powered mouse pointer that understands what it is pointing at, allowing users to perform tasks without using text-heavy prompts (Google DeepMind)
https://deepmind.google/blog/ai-pointer/
STMicro plans to retrain workers and deploy humanoid robots in its older chip plants for repetitive and physically demanding tasks, aiming to avoid closures (Nathan Vifflin/Reuters)
https://www.reuters.com/business/stmicroelectronics-pla…
Sources: some Amazon employees are using in-house OpenClaw-like tool MeshClaw for unnecessary tasks to inflate AI token use after Amazon set weekly AI targets (Financial Times)
https://www.ft.com/content/8ee0d3ef-9548-422d-8ff1-ebd48ad4b2ca
Intel unveiled its Heracles chip at ISSCC in February, saying it accelerates fully homomorphic encryption tasks up to 5,000x faster than top Intel server CPUs (Samuel K. Moore/IEEE Spectrum)
https://spectrum.ieee.org/fhe-intel
Emil Michael says Google will deploy Gemini AI agents to Pentagon's 3M-strong workforce, initially on unclassified networks for tasks such as creating budgets (Katrina Manson/Bloomberg)
https://www.bloomberg.com/news/articles/20
Microsoft launches Copilot Cowork, integrating Anthropic's Claude Cowork tech into Microsoft 365 and using Work IQ to ground actions in organizational data (Charles Lamanna/Microsoft 365 Blog)
https://www.microsoft.com/en-us/microsoft-
Sources: Apple plans to let users choose from multiple third-party AI models to perform tasks like generating and editing text and images in iOS 27 (Mark Gurman/Bloomberg)
https://www.bloomberg.com/news/articles/20
Google has quietly shut down Project Mariner, its Chrome-browsing AI agent for completing tasks on users' behalf, after highlighting it onstage at I/O 2025 (Max Zeff/@zeffmax)
https://x.com/zeffmax/status/2051824497242833234
How Hollywood support staff are integrating AI into workflows, from mundane tasks to creative development, amid cost-cutting and workload demands (Mia Galuppo/The Hollywood Reporter)
https://www.hollywoodreporter.com/movies/movie-featur…
Enzo Health, whose AI tools help home health and hospice agencies automate tasks like patient intake and documentation review, raised a $20M Series A led by N47 (Brock E.W. Turner/Axios)
https://www.axios.com/pro/health-tech-deals/2026/05/04/e…
Generalist, which raised $140M at a $440M valuation in 2025, releases GEN-1, an AI model to help robots handle high-dexterity tasks typically done by humans (Anna Tong/Forbes)
https://www.forbes.com/sites/annatong/2026