
Harnessing LLMs for Document-Guided Fuzzing of OpenCV Library
Bin Duan, Tarek Mahmud, Meiru Che, Yan Yan, Naipeng Dong, Dan Dongseong Kim, Guowei Yang
https://arxiv.org/abs/2507.14558
So, @… is working on using LLMs to process XML. Except, the models can't write legal XML. So he's using the model to generate a sloppy-XML parser: https://lucumr.pocoo.org/202…
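For a sense of what "sloppy-XML" handling means in practice, here's a minimal sketch (my illustration, not the approach from the linked post) using Python's lenient stdlib HTMLParser, which keeps going where a strict XML parser would raise on the first illegal tag:

    from html.parser import HTMLParser

    class SloppyXML(HTMLParser):
        """Collect (tag, text) pairs from input a strict XML parser would reject."""
        def __init__(self):
            super().__init__()
            self.stack, self.items = [], []
        def handle_starttag(self, tag, attrs):
            self.stack.append(tag)
        def handle_endtag(self, tag):
            if tag in self.stack:            # tolerate mismatched or missing close tags
                self.stack.pop()
        def handle_data(self, data):
            if self.stack and data.strip():
                self.items.append((self.stack[-1], data.strip()))

    p = SloppyXML()
    p.feed("<result><name>Ada<name><score>3</result>")  # illegal XML, parses anyway
    print(p.items)  # [('name', 'Ada'), ('score', '3')]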
I randomly bought this book in a quirky bookshop in Copenhagen for the sole reason that it said all the wrong things right on the cover.
(Sales: the single most important profession. NLP™: not natural language processing but neuro-linguistic programming. Meta: the Meta Model™ and Meta Publications™.)
I just started reading it and boy oh boy, I was not disappointed. It's outrageously hilarious.
"Persuasion engineering".
Cactus Flowers. Huntington Library, San Marino, California, USA. June, 2025. #huntingtonlibrary #cactüs #cactusflower
FAU University Press: Now in the top catalogs for open access publications https://ub.fau.de/en/2025/06/17/fau-university-press-now-in-the-top-catalogs-for-open-access-publications/
"How to Become an Integrity Sleuth in the Library"
https://katinamagazine.org/content/article/future-of-work/2025/how-to-become-an-integrity-sleuth-in-the-library
"Open access agreement management c…
#DH2025 Listening to Victoria and Thea on 'Building a FAIR data future at the Journal of Open Humanities' - I'm hoping you'll see a lot more British Library data papers over time, as along with datasheets for datasets it's a big part of making our open collections findable and usable
»Belgian court orders block of the Internet Archive's Open Library:
A Brussels court has issued a very broad website-blocking order. It targets the Open Library as well as shadow libraries such as Z-Library«
An archive, whatever its technical form, is important and has nothing to do with data theft. Unfortunately, though, commerce often views it as such.
🤨
Open Letter to CRL from the academic wing of #CripLib - ACRLog
https://acrlog.org/2025/05/2…
I learned¹ about the Baldwin Library of Historical Children's Literature², which has more than 10,000 books scanned and available online. Just great.
It is hosted by the University of Florida. So let's hope that it stays available, i.e. that the Republicans don't find the old children's books from 1750 too woke.³
__
¹via
A few more of today's most frequently shared #News:
Belgian court orders block of the Internet Archive's Open Library
PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions
Young-ho Cho, Min-Seung Ko, Hao Zhu
https://arxiv.org/abs/2506.14662 https://…
EcBot: Data-Driven Energy Consumption Open-Source MATLAB Library for Manipulators
Juan Heredia, Christian Schlette, Mikkel Baun Kjærgaard
https://arxiv.org/abs/2508.06276 ht…
A new Dune grid for scalable dynamic adaptivity based on the p4est software library
Carsten Burstedde, Mikhail Kirilin, Robert Klöfkorn
https://arxiv.org/abs/2507.11386
Your library only has compelling stories on the inside.
#library #libraries
Some of the most frequently shared #News here recently:
Belgian court orders block of the Internet Archive's Open Library
If you get an invite to this generative art software engineering call, note that if you submit something and it gets accepted, as far as I can tell it would cost you $3000 in open access fees... unless you want it to languish behind a paywall (you'd then only be allowed to share an unedited draft, and even then would have to advertise the paywall on it). They don't seem to want to make this clear in their call.
OpenLB-UQ: An Uncertainty Quantification Framework for Incompressible Fluid Flow Simulations
Mingliang Zhong, Adrian Kummerländer, Shota Ito, Mathias J. Krause, Martin Frank, Stephan Simonis
https://arxiv.org/abs/2508.13867
Is anyone looking for good first-timer OSS contributor issues? Crell/Serde has a few tagged "good first issue" if you're interested.
https://github.com/Crell/Serde/issues?q=is:issue state:open label:"good fir…
AI is flooding libraries with generated content just as budgets and staff are at their most precarious. This Thursday at 10am EDT my ASIS&T webinar asks if we need to ban it, label it, absorb it—or rethink the library itself.
https://www.asist.org/meetings-events/webi
FIDESlib: A Fully-Fledged Open-Source FHE Library for Efficient CKKS on GPUs
Carlos Agulló-Domingo (Universidad de Murcia), Óscar Vera-López (Universidad de Murcia), Seyda Guzelhan (Boston University), Lohit Daksha (Boston University), Aymane El Jerari (Northeastern University), Kaustubh Shivdikar (Advanced Micro Devices), Rashmi Agrawal (Boston University), David Kaeli (Northeastern University), Ajay Joshi (Boston University), José L. Abellán (Universidad…
Porous Convection in the Discrete Exterior Calculus with Geometric Multigrid
Luke Morris, George Rauta, Kevin Carlson, James Fairbanks
https://arxiv.org/abs/2508.12501 https://
Cloudflare open sourced an OAuth library mostly written by Claude, showing how AI handles mechanical implementation while humans guide with context and judgment (Max Mitchell)
https://www.maxemitchell.com/writings/i-read-all-of-cloudflares…
"Navigating #openaccess publishing #agreement caps in 2025" https://…
TIL linking likely does not make a program a derivative of a library in the EU, thus making the GPL, LGPL and MPL effectively identical here.
https://interoperable-europe.ec.europa.eu/collection/eupl/news/copyleft-or-reciprocal
OpenSN: An Open Source Library for Emulating LEO Satellite Networks
Wenhao Lu, Zhiyuan Wang, Hefan Zhang, Shan Zhang, Hongbin Luo
https://arxiv.org/abs/2507.03248
To round off the evening, some of today's most frequently shared #News:
Belgian court orders block of the Internet Archive's Open Library
Subtooting since people in the original thread wanted it to be over, but selfishly tagging @… and @… whose opinions I value...
I think that saying "we are not a supply chain" is exactly what open-source maintainers should be doing right now in response to "open source supply chain security" threads.
I can't claim to be an expert and don't maintain any important FOSS stuff, but I do release almost all of my code under open licenses, and I do use many open source libraries, and I have felt the pain of needing to replace an unmaintained library.
There's a certain small-to-mid-scale class of program, including many open-source libraries, which can be built/maintained by a single person, and which to my mind best operates on a "snake growth" model: incremental changes/fixes, punctuated by periodic "skin-shedding" phases where major rewrites or version updates happen. These projects aren't immortal either: as the whole tech landscape around them changes, they become unnecessary and/or people lose interest, so they go unmaintained and eventually break. Each time one of their dependencies breaks (or has a skin-shedding moment) there's a higher probability that they break or shed too, as maintenance needs shoot up at these junctures. Unless you're a company trying to make money from a single long-lived app, it's actually okay that software churns like this, and if you're a company trying to make money, your priorities absolutely should not factor into any decisions people making FOSS software make: we're trying (and to a huge extent succeeding) to make a better world (and/or just have fun with our own hobbies and share that fun with others) that leaves behind the corrosive & planet-destroying plague which is capitalism, and you're trying to personally enrich yourself by embracing that plague. The fact that capitalism is *evil* is not an incidental thing in this discussion.
To make an imperfect analogy, imagine that the peasants of some domain have set up a really-free-market, where they provide each other with free stuff to help each other survive, sometimes doing some barter perhaps but mostly just everyone bringing their surplus. Now imagine the lord of the domain, who is the source of these peasants' immiseration, goes to this market secretly & takes some berries, which he uses as one ingredient in delicious tarts that he then sells for profit. But then the berry-bringer stops showing up to the free market, or starts bringing a different kind of fruit, or even ends up bringing rotten berries by accident. And the lord complains "I have a supply chain problem!" Like, fuck off dude! Your problem is that you *didn't* want to build a supply chain and instead thought you would build your profit-focused business on other people's free stuff. If you were paying the berry-picker, you'd have a supply chain problem, but you weren't, so you really have an "I want more free stuff" problem when you can't be arsed to give away your own stuff for free.
There can be all sorts of problems in the really-free-market, like maybe not enough people bring socks, so the peasants who can't afford socks are going barefoot, and having foot problems, and the peasants put their heads together and see if they can convince someone to start bringing socks, and maybe they can't and things are a bit sad, but the really-free-market was never supposed to solve everyone's problems 100% when they're all still being squeezed dry by their taxes: until they are able to get free of the lord & start building a lovely anarchist society, the really-free-market is a best-effort kind of deal that aims to make things better, and sometimes will fall short. When it becomes the main way goods in society are distributed, and when the people who contribute aren't constantly drained by the feudal yoke, at that point the availability of particular goods is a real problem that needs to be solved, but at that point, it's also much easier to solve. And at *no* point does someone coming into the market to take stuff only to turn around and sell it deserve anything from the market or those contributing to it. They are not a supply chain. They're trying to help each other out, but even then they're doing so freely and without obligation. They might discuss amongst themselves how to better coordinate their mutual aid, but they're not going to end up forcing anyone to bring anything or even expecting that a certain person contribute a certain amount, since the whole point is that the thing is voluntary & free, and they've all got changing life circumstances that affect their contributions. Celebrate whatever shows up at the market, express your desire for things that would be useful, but don't impose a burden on anyone else to bring a specific thing, because otherwise it's fair for them to oppose such a burden on you, and now you two are doing your own barter thing that's outside the parameters of the really-free-market.
I was trying to package #FlexiBLAS for #Gentoo, and to be honest, it doesn't look that good.
The first red flag is the lack of an open bug tracker. Apparently, there is a tracker on GitLab that's limited to "members of their group and selected external contributors", but it doesn't seem to be used much. So it's "send us an email", and you get to wonder how many people sent the same bug report before.
The git repository is currently at something tagged 3.4.80 that seems to be prerelease, and its build system is quite broken. Not exactly the best path to verify that the bugs you are hitting are still there.
Now, upstream seems to insist on either using vendored netlib #LAPACK, or statically linking to the system library (we don't install the static libraries). Apparently I can specify the shared libraries instead, but it doesn't work — and it's unclear to me whether it doesn't work because I'm using the shared libraries, or because it doesn't support my LAPACK version. If I build LAPACK without deprecated symbols, it refuses to load it at runtime because of missing symbols. And if I build it with deprecated symbols, it fails to find some symbols at CMake time.
Honestly, I feel like I've spent too much time on this project already, especially given that its future is entirely unclear to me — the current git is quite broken, I have no clue how many issues were reported already and whether my bug reports will receive any reply. It definitely doesn't bode well for a package that we might start to rely heavily on. We don't want a cathedral there.
https://www.mpi-magdeburg.mpg.de/projects/flexiblas
https://gitlab.mpi-magdeburg.mpg.de/software/flexiblas-release
ATM I don't see any end in sight for me simping for Tailwind. It solves all my problems and doesn't cause any.
Always open to being sold something new, but I've wanted Tailwind since 2017, when I wanted to just use inline CSS instead of whatever CSS library I was using.
I’m trying to help a client pick a good UI framework they can start their product with, but ultimately grow into their own design system and component library. They have started development with React, which isn’t surprising, but they are also open to using a more framework-agnostic approach in the future.
Any suggestions for a really mature and solid, themeable framework as a starting point? Chakra UI? Ark UI? Radix?
Video-Guided Text-to-Music Generation Using Public Domain Movie Collections
Haven Kim, Zachary Novack, Weihan Xu, Julian McAuley, Hao-Wen Dong
https://arxiv.org/abs/2506.12573
Very excited about this! Code to access GRIN will help lots of Google Books partners, and the example might open other doors, as well as the obvious benefits of access to data!
'Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability' https://arxiv.org/abs/2506…
How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow
Jasmine Latendresse, SayedHassan Khatoonabadi, Emad Shihab
https://arxiv.org/abs/2507.10818
Help wanted: Can we get someone to go through the build/link time dependencies of ngscopeclient, identify every third-party open source library we use, and ensure that they're all credited properly in the documentation, and include/link to the text of the appropriate licenses?
https://github.com/ng…
MRpro - open PyTorch-based MR reconstruction and processing package
Felix Frederik Zimmermann, Patrick Schuenke, Christoph S. Aigner, Bill A. Bernhardt, Mara Guastini, Johannes Hammacher, Noah Jaitner, Andreas Kofler, Leonid Lunin, Stefan Martin, Catarina Redshaw Kranich, Jakob Schattenfroh, David Schote, Yanglei Wu, Christoph Kolbitsch
https://
Fast simulations of continuous-variable circuits using the coherent state decomposition
Olga Solodovnikova, Ulrik L. Andersen, Jonas S. Neergaard-Nielsen
https://arxiv.org/abs/2508.06175
ALPaca: The ALP Automatic Computing Algorithm
Jorge Alda, Marta Fuentes Zamoro, Luca Merlo, Xavier Ponce Díaz, Stefano Rigolin
https://arxiv.org/abs/2508.08354 https://
Met @… at @… event. #CoSocialCa members in the wild.
MultiObjectiveAlgorithms.jl: a Julia package for solving multi-objective optimization problems
Oscar Dowson, Xavier Gandibleux, Gökhan Kof
https://arxiv.org/abs/2507.05501 …
Implementing the finite-volume three-pion scattering formalism across all non-maximal isospins
Athari Alotaibi, Maxwell T. Hansen, Raúl A. Briceño
https://arxiv.org/abs/2508.11627
Spatialize v1.0: A Python/C Library for Ensemble Spatial Interpolation
Alvaro F. Egaña, Alejandro Ehrenfeld, Felipe Garrido, María Jesús Valenzuela, Juan F. Sánchez-Pérez
https://arxiv.org/abs/2507.17867
Cardiotensor: A Python Library for Orientation Analysis and Tractography in 3D Cardiac Imaging
Joseph Brunet, Lisa Chestnutt, Matthieu Chourrout, Hector Dejea, Vaishnavi Sabarigirivasan, Peter D. Lee, Andrew C. Cook
https://arxiv.org/abs/2508.07476
Should we teach vibe coding? Here's why not.
Should AI coding be taught in undergrad CS education?
1/2
I teach undergraduate computer science labs, including for intro and more-advanced core courses. I don't publish (non-negligible) scholarly work in the area, but I've got years of craft expertise in course design, and I do follow the academic literature to some degree. In other words, I'm not the world's leading expert, but I have spent a lot of time thinking about course design, and consider myself competent at it, with plenty of direct experience in what knowledge & skills I can expect from students as they move through the curriculum.
I'm also strongly against most uses of what's called "AI" these days (specifically, generative deep neural networks as supplied by our current cadre of techbros). There are a surprising number of completely orthogonal reasons to oppose the use of these systems, and a very limited number of reasonable exceptions (overcoming accessibility barriers is an example). On the grounds of environmental and digital-commons-pollution costs alone, using specifically the largest/newest models is unethical in most cases.
But as any good teacher should, I constantly question these evaluations, because I worry about the impact on my students should I eschew teaching relevant tech for bad reasons (and even for good reasons). I also want to make my reasoning clear to students, who should absolutely question me on this. That inspired me to ask a simple question: ignoring for one moment the ethical objections (which we shouldn't, of course; they're very stark), at what level in the CS major could I expect to teach a course about programming with AI assistance, and expect students to succeed at a more technically demanding final project than a course at the same level where students were banned from using AI? In other words, at what level would I expect students to actually benefit from AI coding "assistance"?
To be clear, I'm assuming that students aren't using AI in other aspects of coursework: the topic of using AI to "help you study" is a separate one (TL;DR its gross value is not negative, but it's mostly not worth the harm to your metacognitive abilities, which AI-induced changes to the digital commons are making more important than ever).
So what's my answer to this question?
If I'm being incredibly optimistic, senior year. Slightly less optimistic, second year of a masters program. Realistic? Maybe never.
The interesting bit for you-the-reader is: why is this my answer? (Especially given that students would probably self-report significant gains at lower levels.) To start with, [this paper where experienced developers thought that AI assistance sped up their work on real tasks when in fact it slowed it down](https://arxiv.org/abs/2507.09089) is informative. There are a lot of differences in task between experienced devs solving real bugs and students working on a class project, but it's important to understand that we shouldn't have a baseline expectation that AI coding "assistants" will speed things up in the best of circumstances, and we shouldn't trust self-reports of productivity (or the AI hype machine in general).
Now we might imagine that coding assistants will be better at helping with a student project than at helping with fixing bugs in open-source software, since it's a much easier task. For many programming assignments that have a fixed answer, we know that many AI assistants can just spit out a solution based on prompting them with the problem description (there's another elephant in the room here to do with learning outcomes regardless of project success, but we'll ignore that one too; my focus here is on project complexity reach, not learning outcomes). My question is about more open-ended projects, not assignments with an expected answer. Here's a second study (by one of my colleagues) about novices using AI assistance for programming tasks. It showcases how difficult it is to use AI tools well, and some of the stumbling blocks that novices in particular face.
But what about intermediate students? Might there be some level where the AI is helpful because the task is still relatively simple and the students are good enough to handle it? The problem with this is that as task complexity increases, so does the likelihood of the AI generating (or copying) code that uses more complex constructs which a student doesn't understand. Let's say I have second-year students writing interactive websites with JavaScript. Without a lot of care in prompting, which those students don't know how to exercise, the AI is likely to suggest code that depends on several different frameworks, from React to jQuery, without actually setting up or including those frameworks, and of course these students would be way out of their depth trying to do that. This is a general problem: each programming class carefully limits the specific code frameworks and constructs it expects students to know based on the material it covers. There is no feasible way to limit an AI assistant to a fixed set of constructs or frameworks, using current designs. There are alternate designs where this would be possible (like AI search through adaptation from a controlled library of snippets) but those would be entirely different tools.
So what happens on a sizeable class project where the AI has dropped in buggy code, especially if it uses code constructs the students don't understand? Best case, they understand that they don't understand, and quickly ask an instructor or TA for help getting rid of the stuff they don't understand, re-prompting or manually adding stuff they do. Average case: they waste several hours and/or sweep the bugs partly under the rug, resulting in a project with significant defects. Students in their second and even third years of a CS major still have a lot to learn about debugging, and usually have significant gaps in their knowledge of even their most comfortable programming language. I do think regardless of AI we as teachers need to get better at teaching debugging skills, but the knowledge gaps are inevitable because there's just too much to know. In Python, for example, the LLM is going to spit out yields, async functions, try/finally, maybe even something like a while/else, or with recent training data, the walrus operator. I can't expect even a fraction of 3rd year students who have worked with Python since their first year to know about all these things, and based on how students approach projects where they have studied all the relevant constructs but have forgotten some, I'm not optimistic that these things will magically become learning opportunities. Student projects are better off working with a limited subset of full programming languages that the students have actually learned, and using AI coding assistants as currently designed makes this impossible. Beyond that, even when the "assistant" just introduces bugs using syntax the students understand, even through their 4th year many students struggle to understand the operation of moderately complex code they've written themselves, let alone written by someone else. Having access to an AI that will confidently offer incorrect explanations for bugs will make this worse.
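To make the knowledge-gap point concrete, here's a sketch of the kind of idiomatic Python an LLM plausibly emits for "sum the numbers in a file" (my own hypothetical example, not taken from any particular model's output), packing in several of the constructs just mentioned:

    def numbers(path):
        f = open(path)
        try:
            while (line := f.readline()):    # walrus operator in the loop test
                if (line := line.strip()):
                    yield float(line)        # 'yield' quietly makes this a generator
        finally:
            f.close()                        # try/finally instead of a plain 'with'

    # total = sum(numbers("data.txt"))       # usage; "data.txt" is hypothetical

Each line is reasonable on its own, but a student who has only seen for-loops, lists, and return statements now has three unfamiliar constructs to debug at once.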
To be sure a small minority of students will be able to overcome these problems, but that minority is the group that has a good grasp of the fundamentals and has broadened their knowledge through self-study, which earlier AI-reliant classes would make less likely to happen. In any case, I care about the average student, since we already have plenty of stuff about our institutions that makes life easier for a favored few while being worse for the average student (note that our construction of that favored few as the "good" students is a large part of this problem).
To summarize: because AI assistants introduce excess code complexity and difficult-to-debug bugs, they'll slow down rather than speed up project progress for the average student on moderately complex projects. On a fixed deadline, they'll result in worse projects, or necessitate less ambitious project scoping to ensure adequate completion, and I expect this remains broadly true through 4-6 years of study in most programs (don't take this as an endorsement of AI "assistants" for masters students; we've ignored a lot of other problems along the way).
There's a related problem: solving open-ended project assignments well ultimately depends on deeply understanding the problem, and AI "assistants" allow students to put a lot of code in their file without spending much time thinking about the problem or building an understanding of it. This is awful for learning outcomes, but also bad for project success. Getting students to see the value of thinking deeply about a problem is a thorny pedagogical puzzle at the best of times, and allowing the use of AI "assistants" makes the problem much much worse. This is another area I hope to see (or even drive) pedagogical improvement in, for what it's worth.
2/2
rd-spiral: An open-source Python library for learning 2D reaction-diffusion dynamics through pseudo-spectral method
Sandy H. S. Herho, Iwan P. Anwar, Rusmawan Suwarman
https://arxiv.org/abs/2506.20633
"#OpenAccess and #Citation #Impact: Modality, Funding, Publisher, and Disciplinary Trends at the University of Kentucky"
Understanding API Usage and Testing: An Empirical Study of C Libraries
Ahmed Zaki, Cristian Cadar
https://arxiv.org/abs/2506.11598 https://
🇺🇦 #NowPlaying on #KEXP's #Early
The Linda Lindas:
🎵 Racist, Sexist Boy (Live at LA Public Library)
#TheLindaLindas
https://thelindalindas.bandcamp.com/track/racist-sexist-boy
https://open.spotify.com/track/6CSLL3sOgYIMSRj69mkGSI
TokenSmith: Streamlining Data Editing, Search, and Inspection for Large-Scale Language Model Training and Interpretability
Mohammad Aflah Khan, Ameya Godbole, Johnny Tian-Zheng Wei, Ryan Wang, James Flemings, Krishna Gummadi, Willie Neiswanger, Robin Jia
https://arxiv.org/abs/2507.19419
Replaced article(s) found for cs.DS. https://arxiv.org/list/cs.DS/new
[1/1]:
- TGLib: An Open-Source Library for Temporal Graph Analysis
Lutz Oettershagen, Petra Mutzel
Sacred Lotus. Huntington Library, San Marino, California, USA. July, 2025. #huntingtonlibrary #lotus #waterlily
DPLib: A Standard Benchmark Library for Distributed Power System Analysis and Optimization
Milad Hasanzadeh, Amin Kargarian
https://arxiv.org/abs/2506.20819
DefElement: an encyclopedia of finite element definitions
Matthew W. Scroggs, Pablo D. Brubeck, Joseph P. Dean, Jørgen S. Dokken, India Marsden
https://arxiv.org/abs/2506.20188
"Open Access and Citation Impact: Modality, Funding, Publisher, and Disciplinary Trends at the University of Kentucky" #OpenAccess
Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms
Zhiyi Hu, Siyuan Shen, Tommaso Bonato, Sylvain Jeaugey, Cedell Alexander, Eric Spada, Jeff Hammond, Torsten Hoefler
https://arxiv.org/abs/2507.04786
A factorisation-based regularised interior point method using the augmented system
Filippo Zanetti, Jacek Gondzio
https://arxiv.org/abs/2508.04370 https://…
f4ncgb: High Performance Gröbner Basis Computations in Free Algebras
Maximilian Heisinger, Clemens Hofstadler
https://arxiv.org/abs/2505.19304 https…
Just read this post by @… on an optimistic AGI future, and while it had some interesting and worthwhile ideas, it's also in my opinion dangerously misguided, and plays into the current AGI hype in a harmful way.
https://social.coop/@eloquence/114940607434005478
My criticisms include:
- Current LLM technology has many layers, but the biggest, most capable models are all tied to corporate datacenters and require inordinate amounts of energy and water to run. Trying to use these tools to bring about a post-scarcity economy will burn up the planet. We urgently need more-capable but also vastly more efficient AI technologies if we want to use AI for a post-scarcity economy, and we are *not* nearly on the verge of this, despite what the big companies pushing LLMs want us to think.
- I can see that permacommons.org claims that its small level of spending on AI equates to a low climate impact. However, given the deep subsidies currently in place by the big companies to attract users, that isn't a great assumption. The fact that their FAQ dodges the question about which AI systems they use isn't a great look.
- These systems are not free in the same way that Wikipedia or open-source software is. To run your own model you need a data harvesting & cleaning operation that costs millions of dollars minimum, and then you need millions of dollars worth of storage & compute to train & host the models. Right now, big corporations are trying to compete for market share by heavily subsidizing these things, but if you go along with that, you become dependent on them, and you'll be screwed when they jack up the price to a profitable level later. I'd love to see open dataset initiatives and the like, and there are some of these things, but not enough yet, and many of the initiatives focus on one problem while ignoring others (fine for research but not the basis for a society yet).
- Between the environmental impacts, the horrible labor conditions and undercompensation of data workers who filter the big datasets, and the impacts of both AI scrapers and AI commons pollution, the developers of the most popular & effective LLMs have a lot to answer for. This project only really mentions environmental impacts, which makes me think that they're not serious about ethics, which in turn makes me distrustful of the whole enterprise.
- Their language also ends up encouraging AI use broadly while totally ignoring several entire classes of harm, so they're effectively contributing to AI hype, especially with such casual talk of AGI and robotics as if embodied AGI were just around the corner. To be clear about this point: we are several breakthroughs away from AGI under the most optimistic assumptions, and giving the impression that those will happen soon plays directly into the hands of the Sam Altmans of the world who are trying to make money off the impression of impending huge advances in AI capabilities. Adding to the AI hype is irresponsible.
- I've got a more philosophical criticism that I'll post about separately.
I do think that the idea of using AI & other software tools, possibly along with robotics and funded by many local cooperatives, in order to make businesses obsolete before they can do the same to all workers, is a good one. Get your local library to buy a knitting machine alongside their 3D printer.
Lately I've felt too busy criticizing AI to really sit down and think about what I do want the future to look like, even though I'm a big proponent of positive visions for the future as a force multiplier for criticism, and this article is inspiring to me in that regard, even if the specific project doesn't seem like a good one.
At @oapenbooks.bsky.social, we have updated our #Metadata feeds, to better integrate our #OpenAccess #books into #libraries
ProCaliper: functional and structural analysis, visualization, and annotation of proteins
Jordan C. Rozum, Hunter Ufford, Alexandria K. Im, Tong Zhang, David D. Pollock, Doo Nam Kim, Song Feng
https://arxiv.org/abs/2506.19961
This https://arxiv.org/abs/1910.14012 has been replaced.
link: https://scholar.google.com/scholar?q=a
LLM coding is the opposite of DRY
An important principle in software engineering is DRY: Don't Repeat Yourself. We recognize that having the same code copied in more than one place is bad for several reasons:
1. It makes the entire codebase harder to read.
2. It increases maintenance burden, since any problems in the duplicated code need to be solved in more than one place.
3. Because it becomes possible for the copies to drift apart if changes to one aren't transferred to the other (maybe the person making the change has forgotten there was a copy) it makes the code more error-prone and harder to debug.
All modern programming languages make it almost entirely unnecessary to repeat code: we can move the repeated code into a "function" or "module" and then reference it from all the different places it's needed. At a larger scale, someone might write an open-source "library" of such functions or modules and instead of re-implementing that functionality ourselves, we can use their code, with an acknowledgement. Using another person's library this way is complicated, because now you're dependent on them: if they stop maintaining it or introduce bugs, you've inherited a problem, but still, you could always copy their project and maintain your own version, and it would be not much more work than if you had implemented stuff yourself from the start. It's a little more complicated than this, but the basic principle holds, and it's a foundational one for software development in general and the open-source movement in particular. The network of "citations" as open-source software builds on other open-source software and people contribute patches to each others' projects is a lot of what makes the movement into a community, and it can lead to collaborations that drive further development. So the DRY principle is important at both small and large scales.
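To make the principle concrete, here is a minimal sketch of DRY at the function scale (my example, not the post's):

    # Repeated version: the same normalization copied in two places, which can
    # drift apart when someone fixes one copy and forgets the other.
    def clean_username(raw):
        return raw.strip().lower().replace(" ", "_")

    def clean_tag(raw):
        return raw.strip().lower().replace(" ", "_")

    # DRY version: one shared function, referenced from every call site.
    def slugify(raw):
        return raw.strip().lower().replace(" ", "_")

    print(slugify("  Open Library "))  # open_library

A fix to slugify now reaches every caller at once, which is exactly the property the rest of this post argues LLM-generated code erodes.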
Unfortunately, the current crop of hyped-up LLM coding systems from the big players are antithetical to DRY at all scales:
- At the library scale, they train on open source software but then (with some unknown frequency) replicate parts of it line-for-line *without* any citation [1]. The person who was using the LLM has no way of knowing that this happened, or even any way to check for it. In theory the LLM company could build a system for this, but it's not likely to be profitable unless the courts actually start punishing these license violations, which doesn't seem likely based on results so far and the difficulty of finding out that the violations are happening. By creating these copies (and also mash-ups, along with lots of less-problematic stuff), the LLM users (enabled and encouraged by the LLM-peddlers) are directly undermining the DRY principle. If we get what the big AI companies claim to want, which is a massive shift towards machine-authored code, DRY at the library scale will effectively be dead, with each new project simply re-implementing the functionality it needs instead of ever using a library. This might seem to have some upside, since dependency hell is a thing, but the downside in terms of comprehensibility and therefore maintainability, correctness, and security will be massive. The eventual lack of new high-quality DRY-respecting code to train the models on will only make this problem worse.
- At the module & function level, AI is probably prone to re-writing rather than re-using the functions it needs, especially with a workflow where a human prompts it for many independent completions. This part I don't have direct evidence for, since I don't use LLM coding models myself except in very specific circumstances because it's not generally ethical to do so. I do know that when it tries to call existing functions, it often guesses incorrectly about the parameters they need, which I'm sure is a headache and source of bugs for the vibe coders out there. An AI could be designed to take more context into account and use existing lookup tools to get accurate function signatures and use them when generating function calls (see the sketch after this list), but even though that would probably significantly improve output quality, I suspect it's the kind of thing that would be seen as too baroque and thus not a priority. Would love to hear I'm wrong about any of this, but I suspect the consequences are that any medium-or-larger sized codebase written with LLM tools will have significant bloat from duplicate functionality, and will have places where better use of existing libraries would have made the code simpler. At a fundamental level, a principle like DRY is not something that current LLM training techniques are able to learn, and while they can imitate it from their training sets to some degree when asked for large amounts of code, when prompted for many smaller chunks, they're asymptotically likely to violate it.
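As a sketch of that "lookup tools" idea (my illustration; resize is a hypothetical stand-in for a real library function): Python's standard inspect module can already check a generated call against the real signature before anything runs:

    import inspect

    def resize(image, width, height, keep_aspect=True):
        """Hypothetical stand-in for a real library function."""

    sig = inspect.signature(resize)
    sig.bind("img.png", 640, 480)            # OK: matches the real parameters
    try:
        sig.bind("img.png", w=640, h=480)    # hallucinated parameter names
    except TypeError as e:
        print("rejected generated call:", e) # caught before it becomes a runtime bug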
I think this is an important critique in part because it cuts against the argument that "LLMs are the modern compilers; if you reject them you're just like the people who wanted to keep hand-writing assembly code, and you'll be just as obsolete." Compilers actually represented a great win for abstraction, encapsulation, and DRY in general, and they supported and are integral to open source development, whereas LLMs are set to do the opposite.
[1] to see what this looks like in action in prose, see the example on page 30 of the NYTimes copyright complaint against OpenAI. #AI #GenAI #LLMs #VibeCoding
EngiBench: A Framework for Data-Driven Engineering Design Research
Florian Felten, Gabriel Apaza, Gerhard Bräunlich, Cashen Diniz, Xuliang Dong, Arthur Drake, Milad Habibi, Nathaniel J. Hoffman, Matthew Keeler, Soheyl Massoudi, Francis G. VanGessel, Mark Fuge
https://arxiv.org/abs/2508.00831…
SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis
Wang Lingxiang, Quanzhi Fu, Wenjia Song, Gelei Deng, Yi Liu, Dan Williams, Ying Zhang
https://arxiv.org/abs/2506.17798
It's time to lower your inhibitions towards just asking a human the answer to your question.
In the early nineties, effectively before the internet, that's how you learned a lot of stuff. Your other option was to look it up in a book. I was a kid then, so I asked my parents a lot of questions.
Then by ~2000 or a little later, it started to feel almost rude to do this, because Google was now a thing, along with Wikipedia. "Let me Google that for you" became a joke website used to satirize the poor fool who would waste someone's time answering a random question. There were some upsides to this, as well as downsides. I'm not here to judge them.
At this point, Google doesn't work any more for answering random questions, let alone more serious ones. That era is over. If you don't believe it, try it yourself. Between Google intentionally making their results worse to show you more ads, the SEO cruft that already existed pre-LLMs, and the massive tsunami of SEO slop enabled by LLMs, trustworthy information is hard to find, and hard to distinguish from the slop. (I posted an example earlier.) #AI #LLMs #DigitalCommons #AskAQuestion