Tootfinder

@mapto@qoto.org
2025-12-11 07:47:09

Today at #CHR2025, I will be presenting our work on evaluating the historical adequacy of masked language models (MLMs) for #Latin. Several such models exist, and they represent the current state of the art for a number of downstream tasks, such as semantic change detection and text reuse detection. However, a h…

A poster for the paper, which can be found at https://doi.org/10.63744/sLAHYnQdA8fu

@Techmeme@techhub.social
2026-01-10 02:05:50

Documents: OpenAI is asking contractors to upload their work from current or previous jobs to evaluate its models, leaving it to them to scrub confidential info (Wired)
https://www.wired.com/story/openai-contractor-upload-real-work-documents-ai-agents/

OpenAI Is Asking Contractors to Upload Work From Past Jobs to Evaluate the Performance of AI Agents
To prepare AI agents for office work, the company is asking contractors to upload projects from past jobs, leaving it to them to strip out confidential and personally identifiable information.

@arXiv_csGT_bot@mastoxiv.page
2025-12-10 07:58:51

Beyond Revenue and Welfare: Counterfactual Analysis of Spectrum Auctions with Application to Canada's 3800MHz Allocation
Sara Jalili Shani, Kris Joseph, Michael B. McNally, James R. Wright
https://arxiv.org/abs/2512.08106 https://arxiv.org/pdf/2512.08106 https://arxiv.org/html/2512.08106
arXiv:2512.08106v1 Announce Type: new
Abstract: Spectrum auctions are the primary mechanism through which governments allocate scarce radio frequencies, with outcomes that shape competition, coverage, and innovation in telecommunications markets. While traditional models of spectrum auctions often rely on strong equilibrium assumptions, we take a more parsimonious approach by modeling bidders as myopic and straightforward: in each round, firms simply demand the bundle that maximizes their utility given current prices. Despite its simplicity, this model proves effective in predicting the outcomes of Canada's 2023 auction of 3800 MHz spectrum licenses. Using detailed round-by-round bidding data, we estimate bidders' valuations through a linear programming framework and validate that our model reproduces key features of the observed allocation and price evolution. We then use these estimated valuations to simulate a counterfactual auction under an alternative mechanism that incentivizes deployment in rural and remote regions, aligning with one of the key objectives set out in the Canadian Telecommunications Act. The results show that the proposed mechanism substantially improves population coverage in underserved areas. These findings demonstrate that a behavioral model with minimal assumptions is sufficient to generate reliable counterfactual predictions, making it a practical tool for policymakers to evaluate how alternative auction designs may influence future outcomes. In particular, our study demonstrates a method for counterfactual mechanism design, providing a framework to evaluate how alternative auction rules could advance policy goals such as equitable deployment across Canada.
toXiv_bot_toot
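The bidding rule in the abstract above can be sketched in Python. In that rule, each round a myopic, straightforward bidder demands the bundle that maximizes its utility at current prices. This is a minimal illustration, not the authors' code. Several choices are assumptions: the function name `demanded_bundle`, the representation of valuations as a dictionary from license bundles to values, and the quasi-linear utility (bundle value minus the sum of current license prices). The paper estimates valuations from round-by-round bidding data with a linear programming framework, and this sketch does not attempt that step.

```python
from typing import Dict, FrozenSet

Bundle = FrozenSet[str]


def demanded_bundle(valuations: Dict[Bundle, float], prices: Dict[str, float]) -> Bundle:
    """Return the bundle a myopic, straightforward bidder demands at current prices.

    Assumes quasi-linear utility: the bundle's value minus the sum of the
    current prices of the licenses in it. The empty bundle (utility 0) is
    always available, so the bidder never demands a bundle with negative
    utility. On ties, the earlier candidate wins, with the empty bundle first.
    """
    best: Bundle = frozenset()
    best_utility = 0.0
    for bundle, value in valuations.items():
        utility = value - sum(prices[item] for item in bundle)
        if utility > best_utility:
            best, best_utility = bundle, utility
    return best


# Hypothetical bidder valuing two licenses "A" and "B"; the pair is worth
# more than the two licenses separately (complementarity).
valuations = {
    frozenset({"A"}): 10.0,
    frozenset({"B"}): 8.0,
    frozenset({"A", "B"}): 20.0,
}

print(sorted(demanded_bundle(valuations, {"A": 5.0, "B": 4.0})))    # ['A', 'B']
print(sorted(demanded_bundle(valuations, {"A": 13.0, "B": 3.0})))   # ['B']
print(sorted(demanded_bundle(valuations, {"A": 25.0, "B": 25.0})))  # []
```

When the price of license A rises, the bidder drops from the pair to B alone. Once every bundle costs more than it is worth, the bidder demands nothing.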

@Techmeme@techhub.social
2026-01-06 15:30:38

LMArena, which runs a leaderboard ranking AI models based on their performance, raised $150M at a $1.7B post-money valuation, taking its total funding to $250M (The Information)
https://www.theinformation.com/articles/ai-evaluation-…

AI Evaluation Startup LMArena Valued at $1.7 Billion in New Funding Round
LMArena, a startup that operates a widely cited ranking of AI models based on their performance, has raised $150 million at a valuation of $1.7 billion, including the new money, according to the company. That’s nearly triple the valuation of its seed funding round, announced in May 2025. The ...

@UP8@mastodon.social
2025-11-06 23:51:59

🎲 TextBandit: Evaluating Probabilistic Reasoning in LLMs Through Language-Only Decision Tasks
#llm

Figure 1: Comparison of cumulative regret trends for four LLMs:
(a) Llama-3.1-8B regret trends: Exhibits high cumulative regret, suggesting poor adaptation to feedback over time.
(b) Phi-2 regret trends: Maintains consistently high regret levels, indicating limited learning from outcomes.
(c) Qwen3-4B regret trends: Displays rapid reduction in regret, reflecting strong and consistent decision-making.
(d) Qwen3-8B regret trends: Consistently high regret across prompts, indicating overthinking an…

TextBandit: Evaluating Probabilistic Reasoning in LLMs Through Language-Only Decision Tasks
Large language models (LLMs) have been shown to be increasingly capable of performing reasoning tasks, but their ability to make sequential decisions under uncertainty using only natural language remains underexplored. We introduce a novel benchmark in which LLMs interact with multi-armed bandit environments using purely textual feedback ("you earned a token"), without access to numerical cues or explicit probabilities, requiring the model to infer latent reward structures purely from linguistic cu…
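Figure 1 above plots cumulative regret. This is the standard bandit measure of how much reward a decision-maker forgoes by not always pulling the best arm. The toot does not give the paper's exact definition, so the following Python sketch assumes the usual expected-regret form: after T rounds, regret is the sum over rounds of the best arm's mean reward minus the mean reward of the arm actually chosen. The function name `cumulative_regret` and the two-armed example are hypothetical and are not taken from the TextBandit benchmark.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_regret(arm_means: Sequence[float], choices: Sequence[int]) -> List[float]:
    """Cumulative (expected) regret after each round of a multi-armed bandit.

    Per-round regret is the gap between the best arm's mean reward and the
    mean reward of the arm actually pulled. The running sum is the quantity
    a cumulative-regret curve plots against the round number.
    """
    best = max(arm_means)
    per_round = [best - arm_means[arm] for arm in choices]
    return list(accumulate(per_round))


# Hypothetical two-armed bandit: arm 1 pays off more often than arm 0.
arm_means = [0.2, 0.8]

# A learner that switches to the better arm: the curve flattens out.
print(cumulative_regret(arm_means, [0, 0, 1, 1, 1]))
# A learner stuck on the worse arm: regret grows by about 0.6 every round.
print(cumulative_regret(arm_means, [0, 0, 0, 0, 0]))
```

A flattening curve indicates a model that has settled on the best arm. A curve that keeps rising linearly indicates a model that keeps choosing a suboptimal arm.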

@burger_jaap@mastodon.social
2025-12-04 09:53:06

Can you guess which country's EV charging strategy prioritises market-based price signals to guide charging and discharging?
#V2G

(6) Continuously expand the scope of vehicle-grid interaction pilot schemes. Solidly advance the construction of the first batch of vehicle-grid interaction pilot projects, establishing mechanisms for coordinated promotion and tracking evaluation. Fully leverage the guiding role of time-of-use pricing signals in the electricity market, exploring market-based vehicle-grid interaction response models. Organise grid companies, virtual power plants (load aggregators), and other entities to conduct …

@cdarwin@c.im
2025-11-01 01:39:37

The evolution of influence operations
from crude Russian troll farms to sophisticated AI systems using large language models;
the discovery of GoLaxy documents revealing a "Smart Propaganda System" that collects millions of data points daily, builds psychological profiles, and generates resilient personas;
the fundamental challenges of measuring effectiveness;
GoLaxy's ties to Chinese intelligence agencies;
operations targeting Hong Kong's…

Scaling Laws: The GoLaxy Revelations: China's AI-Driven Influence Operations, with Brett Goldstein, Brett Benson, and Renée DiResta
Discussing the evolution of influence operations.

@mia@hcommons.social
2025-12-04 10:25:49

A brilliant start to #FF2025 - Rachel's keynote, Megan and Amy's work on AI with the Bodleian, and now Lucie Termignon and Simon Zilinskas on https://comparia.beta.gouv.fr/ - evaluate AI models by com…

@arXiv_csGT_bot@mastoxiv.page
2025-12-09 15:38:28

Replaced article(s) found for cs.GT. https://arxiv.org/list/cs.GT/new
[1/1]:
- Cumulative Games: Who is the current player?
Urban Larsson, Reshef Meir, Yair Zick
https://arxiv.org/abs/2005.06326
- Contest Design with Threshold Objectives
Edith Elkind, Abheek Ghosh, Paul W. Goldberg
https://arxiv.org/abs/2109.03179
- Deep Learning Meets Mechanism Design: Key Results and Some Novel Applications
V. Udaya Sankar, Vishisht Srihari Rao, Y. Narahari
https://arxiv.org/abs/2401.05683 https://mastoxiv.page/@arXiv_csGT_bot/111741115483021453
- Charting the Shapes of Stories with Game Theory
Daskalakis, Gemp, Jiang, Leme, Papadimitriou, Piliouras
https://arxiv.org/abs/2412.05747 https://mastoxiv.page/@arXiv_csGT_bot/113627246220336424
- Computing Evolutionarily Stable Strategies in Multiplayer Games
Sam Ganzfried
https://arxiv.org/abs/2511.20859 https://mastoxiv.page/@arXiv_csGT_bot/115620508246637361
- Autodeleveraging: Impossibilities and Optimization
Tarun Chitra
https://arxiv.org/abs/2512.01112 https://mastoxiv.page/@arXiv_csGT_bot/115649040881525135
- Static Pricing Guarantees for Queueing Systems
Jacob Bergquist, Adam N. Elmachtoub
https://arxiv.org/abs/2305.09168 https://mastoxiv.page/@arXiv_csDS_bot/110382625621173269
- Game of arrivals at a two queue network with heterogeneous customer routes
Agniv Bandyopadhyay, Sandeep Juneja
https://arxiv.org/abs/2310.18149 https://mastoxiv.page/@arXiv_csPF_bot/111322112226936579
- Characterization of Priority-Neutral Matching Lattices
Clayton Thomas
https://arxiv.org/abs/2404.02142 https://mastoxiv.page/@arXiv_econTH_bot/112205968984928881
- Seven kinds of equivalent models for generalized coalition logics
Zixuan Chen, Fengkui Ju
https://arxiv.org/abs/2501.05466 https://mastoxiv.page/@arXiv_csLO_bot/113819715349259373
- Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences
Hadi Hosseini, Samarth Khanna, Ronak Singh
https://arxiv.org/abs/2506.04478 https://mastoxiv.page/@arXiv_csAI_bot/114635186215388479
toXiv_bot_toot

@cdarwin@c.im
2025-12-19 22:11:09

This white paper provides a comprehensive analysis of modern warfare through five interconnected characteristics that have been prominently displayed throughout the Ukraine conflict:
- The rise of autonomous systems and their impact on force architecture
- The information domain as a critical battleground
- Electronic warfare and spectrum superiority
- The challenges of sustaining logistics in contested environments
- The evolution of air defense strategy
…

Lessons from the Ukraine Conflict: Modern Warfare in the Age of Autonomy, Information, and Resilience
As warfare evolves, the Ukraine conflict offers policymakers a blueprint for modern defense strategy—emphasizing autonomy, contested logistics, electromagnetic superiority, and adaptive organizational models.

Opt-in global Mastodon full-text search.