
2025-08-02 16:14:00
»Belgian court orders block of the Internet Archive's Open Library:
A Brussels court has issued a very broad website-blocking order. It targets the Open Library as well as shadow libraries such as Z-Library«
Archiving, in whatever technical form, is important and has nothing to do with data theft. Unfortunately, commercial interests often see it as exactly that.
🤨
Some of the #News shared most often here recently:
Belgian court orders block of the Internet Archive's Open Library
"#OpenAccess and #Citation #Impact: Modality, Funding, Publisher, and Disciplinary Trends at the University of Kentucky"
MRpro - open PyTorch-based MR reconstruction and processing package
Felix Frederik Zimmermann, Patrick Schuenke, Christoph S. Aigner, Bill A. Bernhardt, Mara Guastini, Johannes Hammacher, Noah Jaitner, Andreas Kofler, Leonid Lunin, Stefan Martin, Catarina Redshaw Kranich, Jakob Schattenfroh, David Schote, Yanglei Wu, Christoph Kolbitsch
https://
LLM coding is the opposite of DRY
An important principle in software engineering is DRY: Don't Repeat Yourself. We recognize that having the same code copied in more than one place is bad for several reasons:
1. It makes the entire codebase harder to read.
2. It increases maintenance burden, since any problems in the duplicated code need to be solved in more than one place.
3. Because it becomes possible for the copies to drift apart if changes to one aren't transferred to the other (maybe the person making the change has forgotten there was a copy), it makes the code more error-prone and harder to debug.
All modern programming languages make it almost entirely unnecessary to repeat code: we can move the repeated code into a "function" or "module" and then reference it from all the different places it's needed. At a larger scale, someone might write an open-source "library" of such functions or modules, and instead of re-implementing that functionality ourselves, we can use their code, with an acknowledgement. Using another person's library this way is complicated, because now you're dependent on them: if they stop maintaining it or introduce bugs, you've inherited a problem, but still, you could always copy their project and maintain your own version, and it would be not much more work than if you had implemented stuff yourself from the start. It's a little more complicated than this, but the basic principle holds, and it's a foundational one for software development in general and the open-source movement in particular. The network of "citations" as open-source software builds on other open-source software and people contribute patches to each other's projects is a lot of what makes the movement into a community, and it can lead to collaborations that drive further development. So the DRY principle is important at both small and large scales.
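The "move the repeated code into a function" idea can be shown in a few lines of Python. This is a minimal illustration, not code from the post: `normalize_name`, `register_user`, and `rename_user` (and their `_repeated` variants) are hypothetical names chosen for the example.

```python
# Hypothetical example: the same name-cleaning logic copied into two places.
def register_user_repeated(name: str) -> str:
    cleaned = " ".join(name.split()).title()  # copy #1
    return f"registered {cleaned}"


def rename_user_repeated(name: str) -> str:
    cleaned = " ".join(name.split()).title()  # copy #2, free to drift away from copy #1
    return f"renamed to {cleaned}"


# DRY version: the shared logic lives in one function that both callers reference,
# so a fix (e.g. different casing rules) only has to be made in one place.
def normalize_name(name: str) -> str:
    """Collapse runs of whitespace, trim the ends, and title-case each word."""
    return " ".join(name.split()).title()


def register_user(name: str) -> str:
    return f"registered {normalize_name(name)}"


def rename_user(name: str) -> str:
    return f"renamed to {normalize_name(name)}"


print(register_user("  ada   lovelace "))  # registered Ada Lovelace
```

Both versions behave identically today; the difference only shows up when one copy gets changed and the other doesn't, which is reason 3 in the list above.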
Unfortunately, the current crop of hyped-up LLM coding systems from the big players are antithetical to DRY at all scales:
- At the library scale, they train on open source software but then (with some unknown frequency) replicate parts of it line-for-line *without* any citation [1]. The person who was using the LLM has no way of knowing that this happened, or even any way to check for it. In theory the LLM company could build a system for this, but it's not likely to be profitable unless the courts actually start punishing these license violations, which doesn't seem likely based on results so far and the difficulty of finding out that the violations are happening. By creating these copies (and also mash-ups, along with lots of less-problematic stuff), the LLM users (enabled and encouraged by the LLM-peddlers) are directly undermining the DRY principle. If we get what the big AI companies claim to want, which is a massive shift towards machine-authored code, DRY at the library scale will effectively be dead, with each new project simply re-implementing the functionality it needs instead of ever using a library. This might seem to have some upside, since dependency hell is a thing, but the downside in terms of comprehensibility and therefore maintainability, correctness, and security will be massive. The eventual lack of new high-quality DRY-respecting code to train the models on will only make this problem worse.
- At the module & function level, AI is probably prone to re-writing rather than re-using the functions it needs, especially with a workflow where a human prompts it for many independent completions. This part I don't have direct evidence for, since I don't use LLM coding models myself except in very specific circumstances because it's not generally ethical to do so. I do know that when it tries to call existing functions, it often guesses incorrectly about the parameters they need, which I'm sure is a headache and source of bugs for the vibe coders out there. An AI could be designed to take more context into account and use existing lookup tools to get accurate function signatures and use them when generating function calls, but even though that would probably significantly improve output quality, I suspect it's the kind of thing that would be seen as too-baroque and thus not a priority. Would love to hear I'm wrong about any of this, but I suspect the consequences are that any medium-or-larger sized codebase written with LLM tools will have significant bloat from duplicate functionality, and will have places where better use of existing libraries would have made the code simpler. At a fundamental level, a principle like DRY is not something that current LLM training techniques are able to learn, and while they can imitate it from their training sets to some degree when asked for large amounts of code, when prompted for many smaller chunks, they're asymptotically likely to violate it.
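The "use existing lookup tools to get accurate function signatures" idea from the second bullet can be sketched with Python's standard `inspect` module: `inspect.signature(fn).bind(...)` raises `TypeError` when arguments don't fit the function's real signature, so a proposed call can be checked without running it. This is an assumption about how such a check *could* work, not a description of any existing LLM tool; `call_matches_signature` and `resize_image` are hypothetical names.

```python
import inspect
from typing import Any, Callable


def call_matches_signature(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
    """Return True if `fn` would accept these arguments, without calling it.

    Signature.bind raises TypeError for missing, unexpected, or surplus arguments.
    """
    try:
        inspect.signature(fn).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


# Hypothetical existing helper that generated code might try to call.
def resize_image(path: str, width: int, height: int, *, keep_aspect: bool = True) -> str:
    return f"{path}:{width}x{height}:{keep_aspect}"


print(inspect.signature(resize_image))  # the real parameter list, usable as prompt context
print(call_matches_signature(resize_image, "a.png", 100, 50))            # True
print(call_matches_signature(resize_image, "a.png", size=(100, 50)))     # False: guessed `size` kwarg
print(call_matches_signature(resize_image, "a.png", 100, 50, False))     # False: keep_aspect is keyword-only
```

A tool could feed the real signature back into the prompt, or reject calls that fail the check; whether that meaningfully improves output quality is the post's speculation, not something this sketch shows.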
I think this is an important critique in part because it cuts against the argument that "LLMs are the modern compilers; if you reject them you're just like the people who wanted to keep hand-writing assembly code, and you'll be just as obsolete." Compilers actually represented a great win for abstraction, encapsulation, and DRY in general, and they supported and are integral to open source development, whereas LLMs are set to do the opposite.
[1] To see what this looks like in action in prose, see the example on page 30 of the NYTimes copyright complaint against OpenAI. #AI #GenAI #LLMs #VibeCoding
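The first bullet says a user has no practical way to check whether generated code reproduces existing code line-for-line. A deliberately naive sketch of such a check compares runs of `k` consecutive whitespace-normalized lines against a local corpus. The big assumption is that you already have the relevant source corpus on hand, which is exactly the hard part; real clone detectors also normalize identifiers and scale far beyond this. All names here are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def normalized_lines(source: str) -> List[str]:
    """Strip surrounding whitespace and drop blank lines so re-indenting doesn't hide a copy."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def line_windows(lines: List[str], k: int) -> Set[Tuple[str, ...]]:
    """All runs of k consecutive normalized lines (empty if there are fewer than k lines)."""
    return {tuple(lines[i:i + k]) for i in range(len(lines) - k + 1)}


def copied_fraction(generated: str, corpus: Iterable[str], k: int = 3) -> float:
    """Fraction of k-line windows in `generated` that also appear verbatim in the corpus."""
    gen = line_windows(normalized_lines(generated), k)
    if not gen:
        return 0.0
    known: Set[Tuple[str, ...]] = set()
    for doc in corpus:
        known |= line_windows(normalized_lines(doc), k)
    return len(gen & known) / len(gen)


corpus = ["def f(x):\n    y = x + 1\n    return y * 2\n"]
print(copied_fraction("def f(x):\n  y = x + 1\n  return y * 2", corpus))  # 1.0
```

Even this toy version only flags exact copies of lines it already knows about; renaming a variable defeats it, which illustrates why the post doubts users can check for this themselves.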
So, @… is working on using LLMs to process XML. Except the models can’t write legal XML, so he’s using the model to generate a sloppy-XML parser: https://lucumr.pocoo.org/202…
Met @… at @… event. #CoSocialCa members in the wild.
Open Letter to CRL from the academic wing of #CripLib - ACRLog
https://acrlog.or…
DPLib: A Standard Benchmark Library for Distributed Power System Analysis and Optimization
Milad Hasanzadeh, Amin Kargarian
https://arxiv.org/abs/2506.20819
FAU University Press: Now in the top catalogs for open access publications https://ub.fau.de/en/2025/06/17/fau-university-press-now-in-the-top-catalogs-for-open-access-publications/
Harnessing LLMs for Document-Guided Fuzzing of OpenCV Library
Bin Duan, Tarek Mahmud, Meiru Che, Yan Yan, Naipeng Dong, Dan Dongseong Kim, Guowei Yang
https://arxiv.org/abs/2507.14558
#DH2025 Listening to Victoria and Thea on 'Building a FAIR data future at the Journal of Open Humanities' - I'm hoping you'll see a lot more British Library data papers over time, as along with datasheets for datasets it's a big part of making our open collections findable and usable
Subtooting since people in the original thread wanted it to be over, but selfishly tagging @… and @… whose opinions I value...
I think that saying "we are not a supply chain" is exactly what open-source maintainers should be doing right now in response to "open source supply chain security" threads.
I can't claim to be an expert and don't maintain any important FOSS stuff, but I do release almost all of my code under open licenses, and I do use many open source libraries, and I have felt the pain of needing to replace an unmaintained library.
There's a certain small-to-mid-scale class of program, including many open-source libraries, which can be built/maintained by a single person, and which to my mind best operate on a "snake growth" model: incremental changes/fixes, punctuated by periodic "skin-shedding" phases where major rewrites or version updates happen. These projects aren't immortal either: as the whole tech landscape around them changes, they become unnecessary and/or people lose interest, so they go unmaintained and eventually break. Each time one of their dependencies breaks (or has a skin-shedding moment) there's a higher probability that they break or shed too, as maintenance needs shoot up at these junctures. Unless you're a company trying to make money from a single long-lived app, it's actually okay that software churns like this, and if you're a company trying to make money, your priorities absolutely should not factor into any decisions people making FOSS software make: we're trying (and to a huge extent succeeding) to make a better world (and/or just have fun with our own hobbies and share that fun with others) that leaves behind the corrosive & planet-destroying plague which is capitalism, and you're trying to personally enrich yourself by embracing that plague. The fact that capitalism is *evil* is not an incidental thing in this discussion.
To make an imperfect analogy, imagine that the peasants of some domain have set up a really-free-market, where they provide each other with free stuff to help each other survive, sometimes doing some barter perhaps but mostly just everyone bringing their surplus. Now imagine the lord of the domain, who is the source of these peasants' immiseration, goes to this market secretly & takes some berries, which he uses as one ingredient in delicious tarts that he then sells for profit. But then the berry-bringer stops showing up to the free market, or starts bringing a different kind of fruit, or even ends up bringing rotten berries by accident. And the lord complains, "I have a supply chain problem!" Like, fuck off dude! Your problem is that you *didn't* want to build a supply chain and instead thought you would build your profit-focused business on other people's free stuff. If you were paying the berry-picker, you'd have a supply chain problem, but you weren't, so you really have an "I want more free stuff" problem when you can't be arsed to give away your own stuff for free.
There can be all sorts of problems in the really-free-market, like maybe not enough people bring socks, so the peasants who can't afford socks are going barefoot, and having foot problems, and the peasants put their heads together and see if they can convince someone to start bringing socks, and maybe they can't and things are a bit sad, but the really-free-market was never supposed to solve everyone's problems 100% when they're all still being squeezed dry by their taxes: until they are able to get free of the lord & start building a lovely anarchist society, the really-free-market is a best-effort kind of deal that aims to make things better, and sometimes will fall short. When it becomes the main way goods in society are distributed, and when the people who contribute aren't constantly drained by the feudal yoke, at that point the availability of particular goods is a real problem that needs to be solved, but at that point, it's also much easier to solve. And at *no* point does someone coming into the market to take stuff only to turn around and sell it deserve anything from the market or those contributing to it. They are not a supply chain. They're trying to help each other out, but even then they're doing so freely and without obligation. They might discuss amongst themselves how to better coordinate their mutual aid, but they're not going to end up forcing anyone to bring anything or even expecting that a certain person contribute a certain amount, since the whole point is that the thing is voluntary & free, and they've all got changing life circumstances that affect their contributions. Celebrate whatever shows up at the market, express your desire for things that would be useful, but don't impose a burden on anyone else to bring a specific thing, because otherwise it's fair for them to impose such a burden on you, and now you two are doing your own barter thing that's outside the parameters of the really-free-market.
rd-spiral: An open-source Python library for learning 2D reaction-diffusion dynamics through pseudo-spectral method
Sandy H. S. Herho, Iwan P. Anwar, Rusmawan Suwarman
https://arxiv.org/abs/2506.20633
"Open Access and Citation Impact: Modality, Funding, Publisher, and Disciplinary Trends at the University of Kentucky" #OpenAccess
Spatialize v1.0: A Python/C Library for Ensemble Spatial Interpolation
Alvaro F. Egaña, Alejandro Ehrenfeld, Felipe Garrido, María Jesús Valenzuela, Juan F. Sánchez-Pérez
https://arxiv.org/abs/2507.17867
Cactus Flowers. Huntington Library, San Marino, California, USA. June, 2025. #huntingtonlibrary #cactüs #cactusflower
TokenSmith: Streamlining Data Editing, Search, and Inspection for Large-Scale Language Model Training and Interpretability
Mohammad Aflah Khan, Ameya Godbole, Johnny Tian-Zheng Wei, Ryan Wang, James Flemings, Krishna Gummadi, Willie Neiswanger, Robin Jia
https://arxiv.org/abs/2507.19419
DefElement: an encyclopedia of finite element definitions
Matthew W. Scroggs, Pablo D. Brubeck, Joseph P. Dean, Jørgen S. Dokken, India Marsden
https://arxiv.org/abs/2506.20188
I learned¹ about the Baldwin Library of Historical Children's Literature² that has more than 10000 books scanned and available online. Just great.
It is hosted by the University of Florida. So let's hope that it stays available, i.e. that the Republicans don't find the old children's books from 1750 too woke.³
__
¹via
"How to Become an Integrity Sleuth in the Library"
https://katinamagazine.org/content/article/future-of-work/2025/how-to-become-an-integrity-sleuth-in-the-library
"Open access agreement management c…
A new Dune grid for scalable dynamic adaptivity based on the p4est software library
Carsten Burstedde, Mikhail Kirilin, Robert Klöfkorn
https://arxiv.org/abs/2507.11386
FIDESlib: A Fully-Fledged Open-Source FHE Library for Efficient CKKS on GPUs
Carlos Agulló-Domingo (Universidad de Murcia), Óscar Vera-López (Universidad de Murcia), Seyda Guzelhan (Boston University), Lohit Daksha (Boston University), Aymane El Jerari (Northeastern University), Kaustubh Shivdikar (Advanced Micro Devices), Rashmi Agrawal (Boston University), David Kaeli (Northeastern University), Ajay Joshi (Boston University), José L. Abellán (Universidad…
Cloudflare open sourced an OAuth library mostly written by Claude, showing how AI handles mechanical implementation while humans guide with context and judgment (Max Mitchell)
https://www.maxemitchell.com/writings/i-read-all-of-cloudflares…
f4ncgb: High Performance Gröbner Basis Computations in Free Algebras
Maximilian Heisinger, Clemens Hofstadler
https://arxiv.org/abs/2505.19304 https…
Is anyone looking for good first-timer OSS contributor issues? Crell/Serde has a few tagged "good first issue" if you're interested.
https://github.com/Crell/Serde/issues?q=is:issue state:open label:"good fir…
OpenSN: An Open Source Library for Emulating LEO Satellite Networks
Wenhao Lu, Zhiyuan Wang, Hefan Zhang, Shan Zhang, Hongbin Luo
https://arxiv.org/abs/2507.03248
ProCaliper: functional and structural analysis, visualization, and annotation of proteins
Jordan C. Rozum, Hunter Ufford, Alexandria K. Im, Tong Zhang, David D. Pollock, Doo Nam Kim, Song Feng
https://arxiv.org/abs/2506.19961
I’m trying to help a client pick a good UI framework they can start their product with, but ultimately grow into their own design system and component library. They have started development with React, which isn’t surprising, but they are also open to using a more framework-agnostic approach in the future.
Any suggestions for a really mature and solid, themeable framework as a starting point? Chakra UI? Ark UI? Radix?
ATM I don't see any end in sight for me simping for tailwind. It solves all my problems and doesn't cause any.
Always open to being sold something new, but I've wanted tailwind since 2017, when I wanted to just use inline css instead of whatever css library I was using.
Help wanted: Can we get someone to go through the build/link time dependencies of ngscopeclient, identify every third-party open source library we use, and ensure that they're all credited properly in the documentation, and include/link to the text of the appropriate licenses?
https://github.com/ng…
Very excited about this! Code to access GRIN will help lots of Google Books partners, and the example might open other doors, as well as the obvious benefits of access to data!
'Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability' https://arxiv.org/abs/2506…
PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions
Young-ho Cho, Min-Seung Ko, Hao Zhu
https://arxiv.org/abs/2506.14662 https://…
SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis
Wang Lingxiang, Quanzhi Fu, Wenjia Song, Gelei Deng, Yi Liu, Dan Williams, Ying Zhang
https://arxiv.org/abs/2506.17798
Video-Guided Text-to-Music Generation Using Public Domain Movie Collections
Haven Kim, Zachary Novack, Weihan Xu, Julian McAuley, Hao-Wen Dong
https://arxiv.org/abs/2506.12573
MultiObjectiveAlgorithms.jl: a Julia package for solving multi-objective optimization problems
Oscar Dowson, Xavier Gandibleux, Gökhan Kof
https://arxiv.org/abs/2507.05501 …
How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow
Jasmine Latendresse, SayedHassan Khatoonabadi, Emad Shihab
https://arxiv.org/abs/2507.10818
This https://arxiv.org/abs/1910.14012 has been replaced.
link: https://scholar.google.com/scholar?q=a
Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms
Zhiyi Hu, Siyuan Shen, Tommaso Bonato, Sylvain Jeaugey, Cedell Alexander, Eric Spada, Jeff Hammond, Torsten Hoefler
https://arxiv.org/abs/2507.04786
Understanding API Usage and Testing: An Empirical Study of C Libraries
Ahmed Zaki, Cristian Cadar
https://arxiv.org/abs/2506.11598 https://
At @oapenbooks.bsky.social, we have updated our #Metadata feeds, to better integrate our #OpenAccess #books into #libraries