Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@arXiv_csCL_bot@mastoxiv.page
2025-07-14 09:56:22

A Third Paradigm for LLM Evaluation: Dialogue Game-Based Evaluation using clembench
David Schlangen, Sherzod Hakimov, Jonathan Jordan, Philipp Sadler
arxiv.org/abs/2507.08491

@arXiv_csCL_bot@mastoxiv.page
2025-07-14 09:48:52

Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization
Itai Mondshine, Tzuf Paz-Argaman, Reut Tsarfaty
arxiv.org/abs/2507.08342

@doktrock@toad.social
2025-06-11 21:50:41

Evolutionists Flock To Darwin-Shaped Wall Stain
theonion.com/evolutionists-flo

@arXiv_csCV_bot@mastoxiv.page
2025-07-14 10:04:52

L-CLIPScore: a Lightweight Embedding-based Captioning Metric for Evaluating and Training
Li Li, Yingzhe Peng, Xu Yang, Ruoxi Cheng, Haiyang Xu, Ming Yan, Fei Huang
arxiv.org/abs/2507.08710

@lysander07@sigmoid.social
2025-05-13 16:25:32

Last week, our students learned how to conduct a proper evaluation of an NLP experiment. To this end, we introduced a small text corpus of sentences about Joseph Fourier, who is counted among the discoverers of the greenhouse effect, which is responsible for global warming.

Slide of the Information Service Engineering lecture 03, Natural Language Processing 02, section 2.6: Evaluation, Precision, and Recall
Headline: Experiment
Let's consider the following text corpus (FOURIERCORPUS):
1. In 1807, Fourier's work on heat transfer laid the foundation for understanding the greenhouse effect.
2. Joseph Fourier's energy balance analysis showed atmosphere's heat-trapping role.
3. Fourrier's calculations, though rudimentary, suggested that the atmosphere acts as an insulato…

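The slide section is about precision and recall. The following is a minimal Python sketch of how those two metrics (plus F1) are computed over sets of item IDs. The task framing (a system deciding which corpus sentences mention Fourier), the gold and predicted sets, and the function names are hypothetical illustrations, not taken from the lecture.

```python
"""Minimal precision/recall/F1 sketch over sets of sentence IDs.

Hypothetical setup (not from the lecture): a system reports which corpus
sentences mention Joseph Fourier, and its answer is compared against a
hand-labelled gold set of sentence IDs.
"""


def precision_recall(predicted: set[int], gold: set[int]) -> tuple[float, float]:
    """Return (precision, recall) for a set of predicted item IDs."""
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold items that the system found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: sentences 1-3 all refer to Fourier, but a naive
    # exact-match system misses sentence 3 because of the spelling "Fourrier".
    gold = {1, 2, 3}
    predicted = {1, 2}
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1(p, r):.2f}")
    # prints "precision=1.00 recall=0.67 f1=0.80"
```

Every prediction in this toy run is correct (precision 1.0), but one relevant sentence is missed (recall 2/3), which is the kind of trade-off the two metrics are meant to separate.
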
@Dragofix@mastodontti.fi
2025-07-12 22:27:10

The Siberian flying squirrel is a key species for the future of taiga forests. A recent genetic study reveals surprising features of the flying squirrel's evolution as well as serious concerns for the conservation of the species. A distinct subspecies may live in the Far East. helsinki.fi/fi/uutiset/evoluut

@arXiv_csIR_bot@mastoxiv.page
2025-07-14 07:40:51

DS@GT at LongEval: Evaluating Temporal Performance in Web Search Systems and Topics with Two-Stage Retrieval
Anthony Miyaguchi, Imran Afrulbasha, Aleksandar Pramov
arxiv.org/abs/2507.08360

@berlinbuzzwords@floss.social
2025-05-14 14:00:33

LLMs are now part of our daily work, making coding easier. Join Ivan Dolgov at this year's Berlin Buzzwords to learn how they built an in-house LLM for AI code completion in JetBrains products, covering design choices, data preparation, training and model evaluation.

Session title: How to train a fast LLM for coding tasks
Ivan Dolgov
Join us from June 15-17 in Berlin or participate online / berlinbuzzwords.de

@Techmeme@techhub.social
2025-06-15 10:05:34

Anthropic details how it built its multi-agent Claude Research system, claiming significant improvements in internal evaluations over single-agent systems (Anthropic)
anthropic.com/engineering/buil

@arXiv_csCL_bot@mastoxiv.page
2025-07-14 09:53:52

Diagnosing Failures in Large Language Models' Answers: Integrating Error Attribution into Evaluation Framework
Zishan Xu, Shuyi Xie, Qingsong Lv, Shupei Xiao, Linlin Song, Sui Wenjuan, Fan Lin
arxiv.org/abs/2507.08459