EarlySciRev: A Dataset of Early-Stage Scientific Revisions Extracted from LaTeX Writing Traces
Léane Jourdan, Julien Aubert-Béduchaud, Yannis Chupin, Marah Baccari, Florian Boudin
https://arxiv.org/abs/2603.28515 https://arxiv.org/pdf/2603.28515 https://arxiv.org/html/2603.28515
arXiv:2603.28515v1 Announce Type: new
Abstract: Scientific writing is an iterative process that generates rich revision traces, yet publicly available resources typically expose only final or near-final versions of papers. This limits empirical study of revision behaviour and evaluation of large language models (LLMs) for scientific writing. We introduce EarlySciRev, a dataset of early-stage scientific text revisions automatically extracted from arXiv LaTeX source files. Our key observation is that commented-out text in LaTeX often preserves discarded or alternative formulations written by the authors themselves. By aligning commented segments with nearby final text, we extract paragraph-level candidate revision pairs and apply LLM-based filtering to retain genuine revisions. Starting from 1.28M candidate pairs, our pipeline yields 578k validated revision pairs, grounded in authentic early drafting traces. We additionally provide a human-annotated benchmark for revision detection. EarlySciRev complements existing resources focused on late-stage revisions or synthetic rewrites and supports research on scientific writing dynamics, revision modelling, and LLM-assisted editing.
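The abstract's core idea, pairing commented-out LaTeX text with nearby final text, can be illustrated with a small sketch. This is not the authors' pipeline: the function names (`split_blocks`, `candidate_pairs`), the one-block search window, and the similarity thresholds are all assumptions made for illustration, and the character-level `difflib` ratio stands in for whatever alignment the paper actually uses. The LLM-based filtering step is not shown.

```python
import re
from difflib import SequenceMatcher

# Matches a line whose first non-blank character is '%' (a full-line LaTeX comment).
COMMENT_LINE = re.compile(r"^\s*%+\s?(.*)$")


def split_blocks(latex_source):
    """Group consecutive lines into ('comment' | 'text', content) blocks.

    Blank lines end the current block. This is a simplification: inline
    comments and escaped percent signs (\\%) inside text are not handled.
    """
    blocks = []
    kind, buf = None, []
    for line in latex_source.splitlines():
        if not line.strip():
            line_kind, content = None, ""
        else:
            m = COMMENT_LINE.match(line)
            if m:
                line_kind, content = "comment", m.group(1).strip()
            else:
                line_kind, content = "text", line.strip()
        if line_kind != kind:
            # The block type changed, so flush what was collected so far.
            if kind is not None and buf:
                blocks.append((kind, " ".join(buf)))
            kind, buf = line_kind, []
        if line_kind is not None and content:
            buf.append(content)
    if kind is not None and buf:
        blocks.append((kind, " ".join(buf)))
    return blocks


def candidate_pairs(latex_source, window=1, min_ratio=0.3, max_ratio=0.95):
    """Pair each commented block with the most similar neighbouring text block.

    window, min_ratio and max_ratio are hypothetical tuning knobs: pairs that
    are too dissimilar (unrelated notes) or near-identical are skipped.
    """
    blocks = split_blocks(latex_source)
    pairs = []
    for i, (kind, old) in enumerate(blocks):
        if kind != "comment":
            continue
        best = None
        lo, hi = max(0, i - window), min(len(blocks), i + window + 1)
        for j in range(lo, hi):
            other_kind, new = blocks[j]
            if other_kind != "text":
                continue
            ratio = SequenceMatcher(None, old, new).ratio()
            if min_ratio <= ratio <= max_ratio and (best is None or ratio > best[2]):
                best = (old, new, ratio)
        if best:
            pairs.append(best)
    return pairs
```

Calling `candidate_pairs(source)` on the contents of a `.tex` file returns `(commented_text, final_text, similarity)` tuples; in a setup like the one the abstract describes, such candidates would then go to a filtering stage to separate genuine revisions from unrelated comments.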
Israel is disqualifying itself in the eyes of the whole world.
https://www.nytimes.com/2026/03/30/world/middleeast/israel-death-penalty-palestinians-attacks.html?unlocked_article_…
A report by British intelligence warns of escalating #EnvironmentalCrises and growing #FoodInsecurity, which can intensify #Migration and conflict.
…
Russia’s external debt reaches $319.8 billion as wartime spending fuels borrowing binge: https://benborges.xyz/2026/02/14/russias-external-debt-reaches-billion.html
I know someone is going to tell me I’m just “doing it wrong” or bragging, but I’ve written some *extremely basic* code this week with an LLM (I know, I know, but it was mandated that I *try*).
I am absolutely certain I could have written this faster myself.
I actually believe this: the main reason the "AI" hype has become this big is tech people being impressed by it "writing code".
Then they wrongly extrapolated its capabilities to other fields (because "programming is super hard, harder than any other vocation, therefore 'AI' can do anything!!!1!"). To them, it clearly appears to be a god because they see themselves as gods, because they can write quicksort and linked lists or something.
Meanwhile, LLMs are only passable at generating code because it is laughably easy, mainly because programming languages and "best practices" are extremely verbose, repetitive and clunky; requiring endless boilerplate and infinite layer cakes to achieve even the most trivial things.
Because other people who don't care about that shit are so dependent on technology, it gets pushed onto everyone without consent.
Kind of like a hubris ouroboros.
I've written another essay about my mad #PostScarcitySoftware #Lisp system.
"We don't need to know, or have known, these people to build on their work. We don't have to, and cannot in detail, fully understand their work. There is simply too much of it, its complexity wou…
i’m reviewing a master’s thesis that is extremely well-written but doesn’t really make any sense #SlopAlert
from my link log —
Writing a PostgreSQL formatter / pretty printer in Rust.
https://blog.urth.org/2021/03/14/writing-a-postgres-sql-pretty-printer-in-rust-part-1/
saved 2021-03-18
#ClimateAttribution research links specific #ExtremeWeather events to human-caused #ClimateChange.
This allows those affected to pursue climate damages in court…