Let's say you use Gentoo Build Publisher to continuously build your Gentoo machine's packages, but you haven't updated your actual machine in a while. When you finally do, something's broken, but you don't know what broke or when. Well, something like that happened to me, and I want to share the process I went through to find the root cause.
The past few days I had been at PyCon US 2023 with my laptop. I hadn't gotten
around to installing WireGuard, so I didn't have access to my GBP instance
running at home and therefore was not applying updates. After I got back I did
a gbp publish lighthouse followed by an emerge --update ..., rebooted (there
was a kernel update), and after my laptop came back up I'd log in and GNOME
Shell would immediately crash. So sad 🙁. I looked at the logs, but the source
of the crash was not obvious to me. And since I had been gone for a few days,
there were 14 GBP builds and more than 90 packages that had changed. So how do
I figure out which package broke my machine?
Well, it turns out this is a variation on "Rolling Back a Rolling Release with
Gentoo Build Publisher". See that article to learn how to roll back in
general. I can combine that methodology with a manual bisect.
I knew that the last build was broken and that the build just before I left
town was good, so I ran gbp list lighthouse and manually bisected: roll back to
a build, test it, and repeat. It turned out that the fourth build on the list
was the first breaking build. I then did a gbp diff between the third and the
fourth build to see what the differences in packages were. The reason I did
not use gbp status for that is that it would only show the packages for that
build, and there might be "holes" in the list of builds that gbp status won't
show.
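The manual bisect can be sketched as a plain binary search. This is only a minimal illustration, not part of GBP: the function name first_bad_build, the build numbers in the usage example, and the is_bad callback are all made up here. In practice is_bad stands for the manual steps (publish the build, update the laptop, reboot, try to log in), and the sketch assumes that once a build is broken every later build is broken too.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[int], is_bad: Callable[[int], bool]) -> int:
    """Return the first build in `builds` for which `is_bad` is true.

    Assumes `builds` is ordered oldest to newest, the first entry is known
    good, the last entry is known bad, and every build after the first bad
    one is bad as well.
    """
    good, bad = 0, len(builds) - 1  # indexes of the known-good and known-bad builds
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(builds[middle]):
            bad = middle  # the breakage happened at or before `middle`
        else:
            good = middle  # the breakage happened after `middle`
    return builds[bad]


if __name__ == "__main__":
    # Hypothetical build numbers: 100 is the last known-good build, 114 is broken.
    builds = list(range(100, 115))
    print(first_bad_build(builds, lambda build: build >= 104))
```

With 14 builds to check this takes about four rounds of roll back and test instead of fourteen.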
After I got a diff of the packages it was more of a guessing game. There were
a few GNOME-like package builds, so any one or a combination of them could
have been the breaking package(s). So what I did was update to the broken GBP
build and then, on my laptop, manually downgrade a suspicious package. For
example, if I wanted to downgrade dev-libs/glib-2.76.2, I would run
emerge -1va '<dev-libs/glib-2.76.2'. Since I already had the binaries for the
older packages in my binary package cache, this required no compiling. Then I
would try logging in again with the downgraded package. If that did not fix
the crash, I'd upgrade the package back to the GBP build's version and try
again with the next package. After this I was finally able to narrow the
problem down to net-libs/libsoup-3.4.1. I then masked that version in my GBP
repo's package.mask, ran a Jenkins build, published it on GBP, and then
upgraded to that build on my laptop. After that I was able to log in without
crashing.
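The one-package-at-a-time loop looks roughly like the sketch below. Everything in it is illustrative: find_culprit, downgrade_command, and the downgrade, restore, and still_crashes callbacks are hypothetical names standing in for the manual steps I did by hand. It also only catches the case where a single package is to blame, not a combination.

```python
import shlex
from typing import Callable, Iterable, Optional


def downgrade_command(package: str) -> str:
    """Return the emerge command that installs a version older than `package`.

    `package` is a category/name-version string such as "dev-libs/glib-2.76.2".
    The atom is quoted so the shell does not treat "<" as a redirection.
    """
    return f"emerge -1va {shlex.quote('<' + package)}"


def find_culprit(
    suspects: Iterable[str],
    downgrade: Callable[[str], None],
    restore: Callable[[str], None],
    still_crashes: Callable[[], bool],
) -> Optional[str]:
    """Downgrade each suspect in turn until the crash goes away.

    Returns the package whose downgrade fixed the crash, or None if no
    single downgrade helped.
    """
    for package in suspects:
        downgrade(package)  # e.g. run downgrade_command(package) by hand
        if not still_crashes():
            return package
        restore(package)  # put the build's version back before trying the next one
    return None


if __name__ == "__main__":
    print(downgrade_command("dev-libs/glib-2.76.2"))
```

Once the culprit is known, the matching package.mask entry is its exact-version atom, in this case =net-libs/libsoup-3.4.1.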
So after 14 builds and 90+ packages I was able to narrow the problem down to a
single package in a single build. Next steps are to try to find the cause of
the breakage and, if necessary, report a bug. After that gets resolved I can
remove the entry in package.mask.
So perhaps there should be a bisect subcommand in the GBP CLI? Maybe that's
not necessary, but if I were sent a pull request with that feature I might
consider it.