Tootfinder

Opt-in global Mastodon full text search. Join the index!

@mgorny@social.treehouse.systems
2025-12-06 17:32:10

Worked on some more #Gentoo global #jobserver goodies today.
Firstly, Portage jobserver support patch: #PyTest jobs will also be counted towards total job count.
Again, it's not a perfect solution, but it works reasonably. The plugin still starts -n jobs as specified by the arguments, but it acquired job tokens prior to executing every test, therefore delaying actual testing until tokens are available. It doesn't seem to cause noticeable overhead either.

@mgorny@social.treehouse.systems
2025-12-07 05:32:56

#Gentoo #jobserver revealed another problem with steve in particular, and (I believe) the jobserver protocol in general: blocking clients are prioritized over polling clients.
The problem is simple: when handling blocking reads, steve can issue a job token immediately. When handling a poll, it merely indicates that a token is available, and the client must issue another read request to get it. So if tokens are scarce and there are both blocking and polling clients running, the former are likely to be taking all the incoming tokens.
My idea of working around this is to implement temporary reservations. If a client polls for a token, we reserve one for it. The reserved token can afterwards be only read by the same client. This way, both blocking and polling clients get a token — the former get it immediately, the latter get it reserved for them. And if there are no tokens available, both get into a single FIFO queue, for a poor man's round-robin (steve also throttles all reads to one token at a time).
However, polls technically don't guarantee that the client will eventually read the token, so we need to handle reservation expirations as well.

@mgorny@social.treehouse.systems
2025-12-18 09:37:54

0 days since it turned out that GNU #make does not respect the #jobserver protocol it designed itself in yet another way. Or to put it otherwise, I've wasted my whole morning implemented something that cannot work because it didn't occur to me that GNU make people only set rules without caring to actually follow them.
#Gentoo #Linux

@mgorny@social.treehouse.systems
2026-01-03 13:09:36

So it's been pointed out that steve, the #jobserver, doesn't have a #logo. Here's a first approximation. Made in #Inkscape, using lines and fonts (the apples are from Noto Sans, I think). I also have an SVG version but browsers mess it up by substituting the apples for some colorful emojis with incompatible metrics. I've added a bright background, since the actual transparent background doesn't work on black (which seems to be Mastodon default).

@mgorny@social.treehouse.systems
2025-12-23 13:11:20

I'm building webkit-gtk right now. It's one of these messy packages where a few source files need a lot of memory to compile, and ninja can randomly order jobs so that all of them suddenly start compiling simultaneously. So to keep things going smoothly without OOM-ing, I've been dynamically adjusting the available job count via steve the #jobserver.
While doing that, I've noticed that ninja isn't taking new jobs immediately after I increased the job count. So I've started debugging steve, and couldn't find out anything wrong with it. Finally, I've looked into ninja and realized how lazy their code is.
So, there are two main approaches to acquiring job tokens. Either you do blocking reads, and therefore wait for a token to become available, or you use polling to get noticed when it becomes available. Ninja instead does non-blocking reads, and if there are no more tokens available… it waits till one of its own jobs finish.
This roughly means that as other processes release tokens, ninja won't take them until one of its own jobs finish. And if ninja didn't manage to acquire any job tokens to begin with, it is just running a single process via implicit slot, and that process finishing provides it with the only chance to acquire additional tokens. So realistically speaking, as long as there are other build jobs running in parallel, ninja is going to need to be incredibly lucky to ever get a job token, since all other processes will grab the available tokens immediately.
This isn't something that steve can fix.
#Gentoo #NinjaBuild

@mgorny@social.treehouse.systems
2025-12-01 03:20:19

New on my #Gentoo blog: One #jobserver to rule them all
"""
A common problem with running Gentoo builds is concurrency. Many packages include extensive build steps that are either fully serial, or cannot fully utilize the available CPU threads throughout. This problem becomes less pronounced when running building multiple packages in parallel, but then we are risking overscheduling for packages that do take advantage of parallel builds.
Fortunately, there are a few tools at our disposal that can improve the situation. Most recently, they were joined by two experimental system-wide jobservers: #guildmaster and #steve. In this post, I’d like to provide the background on them, and discuss the problems they are facing.
"""
blogs.gentoo.org/mgorny/2025/1

@mgorny@social.treehouse.systems
2025-11-22 16:54:04

#Steve the #Jobserver has undergone a major rewrite over the last week. It's now implemented using CUSE, the #FUSE API for character devices. It is using pidfd to track processes acquiring job tokens, and automatically reclaims them if processes die without returning them, preventing dead processes from effectively locking the system jobserver.
The code's still a bit ugly — it's a C-changed-midway-to-C , with libevent for event loops and (still) FUSE's ugly argument parsing.
If someone wants to play with it, the live ebuild is available in #Gentoo as dev-build/steve.
gitweb.gentoo.org/proj/steve.g