Bisecting GBP Builds to Find a Bad Package

Let's say you use Gentoo Build Publisher to continuously build your Gentoo machine's packages, but you haven't updated your actual machine in a while. When you finally do, something is broken, and you don't know what broke or when. Well, something like that happened to me, and I want to share the process I went through to find the root cause.

For the past few days I had been at PyCon US 2023 with my laptop. I hadn't gotten around to installing WireGuard, so I didn't have access to my GBP instance running at home, and therefore I was not applying updates. After I got back, I ran gbp publish lighthouse followed by emerge --update ..., then rebooted (there was a kernel update). After my laptop came back up, I'd log in and GNOME Shell would immediately crash. So sad 🙁. I looked at the logs, but the source of the crash was not obvious to me. And since I had been gone for a few days, there had been 14 GBP builds and more than 90 changed packages. So how do I figure out which package broke my machine?

Well, it turns out this is a variation on "Rolling Back a Rolling Release with Gentoo Build Publisher"; see that article to learn how to roll back in general. I combined that methodology with a manual bisect. I knew that the latest build was broken and that the build from just before I left town was good, so I ran gbp list lighthouse and bisected the list by hand: roll back to a build in the middle of the untested range, test it, and repeat on whichever half contains the breakage. It turned out that the fourth build on the list was the first breaking build. I then ran gbp diff between the third and fourth builds to see which packages differed. I did not use gbp status for this because it only shows the packages for that one build, and there might be "holes" in the list of builds that gbp status won't show.
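The bisect step is just a binary search over the build list. Here is a minimal Python sketch of that logic, assuming the builds are ordered oldest to newest, the first entry is the known-good build, the last entry is the known-bad build, and once a build breaks every later build stays broken. The is_good callback and the build IDs are hypothetical: the callback stands in for the manual roll back, update, reboot and log-in test.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first bad build.

    Assumes builds[0] is known good, builds[-1] is known bad, and every
    build after the first bad one is also bad.
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Manually: roll back to builds[mid], update, reboot, try to log in.
        if is_good(builds[mid]):
            lo = mid  # breakage is somewhere after mid
        else:
            hi = mid  # mid is bad, so the first bad build is at or before it
    return builds[hi]


if __name__ == "__main__":
    # Hypothetical build IDs: "0" is the last good build before the trip,
    # "1".."14" are the builds made while away.
    builds = [str(n) for n in range(15)]
    print(first_bad_build(builds, lambda b: int(b) < 4))  # prints 4
```

Each test halves the untested range, so with 14 new builds this needs at most four log-in tests instead of up to fourteen.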

Once I had the diff of the packages, it was more of a guessing game. There were a few GNOME-related packages in the diff, so any one of them, or a combination, could have been the culprit. So I updated to the broken GBP build and then, on my laptop, manually downgraded one suspicious package. For example, to downgrade dev-libs/glib-2.76.2 I would run emerge -1va '<dev-libs/glib-2.76.2' (the quotes keep the shell from treating < as input redirection). Since I already had binaries for the older packages in my binary package cache, this required no compiling. Then I would try logging in again with the downgraded package. If that did not fix the crash, I'd upgrade the package back to the GBP build's version and try again with the next package. Eventually I narrowed the problem down to net-libs/libsoup-3.4.1. I then masked that version in my GBP repo's package.mask, ran a Jenkins build, published it on GBP, and upgraded my laptop to that build. After that I was able to log in without crashing.
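The downgrade-and-retest loop is a linear search over the suspect packages. Below is a minimal Python sketch, assuming exactly one package is to blame (a combination of packages would slip through this loop). The downgrade, restore and login_works callbacks are hypothetical stand-ins for the manual steps, and the helpers only build the emerge arguments and a package.mask atom. Passing the atom as its own argument-list element (as you would with subprocess.run) sidesteps the shell quoting of < entirely.

```python
from typing import Callable, Iterable, List, Optional


def downgrade_args(cpv: str) -> List[str]:
    """emerge arguments that install any version older than cpv."""
    return ["emerge", "-1va", f"<{cpv}"]


def mask_atom(cpv: str) -> str:
    """A package.mask line that masks exactly this version."""
    return f"={cpv}"


def find_culprit(
    suspects: Iterable[str],
    downgrade: Callable[[str], None],
    restore: Callable[[str], None],
    login_works: Callable[[], bool],
) -> Optional[str]:
    """Downgrade one suspect at a time until logging in works again."""
    for cpv in suspects:
        downgrade(cpv)
        if login_works():
            return cpv  # this downgrade fixed it, so cpv is the bad package
        restore(cpv)  # no luck: go back to the GBP build's version
    return None  # no single package explains the crash


if __name__ == "__main__":
    # Hypothetical simulation: only libsoup-3.4.1 breaks GNOME Shell.
    installed = {"dev-libs/glib-2.76.2", "net-libs/libsoup-3.4.1"}
    culprit = find_culprit(
        ["dev-libs/glib-2.76.2", "net-libs/libsoup-3.4.1"],
        downgrade=installed.discard,
        restore=installed.add,
        login_works=lambda: "net-libs/libsoup-3.4.1" not in installed,
    )
    if culprit is not None:
        print(mask_atom(culprit))  # prints =net-libs/libsoup-3.4.1
```

In Portage, a package.mask line with the = operator, such as =net-libs/libsoup-3.4.1, masks only that exact version, so newer releases are picked up again once a fix lands.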

So, starting from 14 builds and 90+ changed packages, I was able to narrow the problem down to a single package in a single build. The next steps are to find the cause of the breakage and, if necessary, report a bug. Once that is resolved, I can remove the entry from package.mask.

So perhaps the GBP CLI should have a bisect subcommand? Maybe that's not necessary, but if someone sent me a pull request with that feature, I might consider it.