Testing and (false) Confidence

It is often said that one of the advantages of testing is that it gives one confidence when refactoring. And I think this is true. However sometimes poorly-written tests or incomplete testing can have the ability to give someone too much confidence.

Take the case of Gentoo Build Publisher. It is fairly well tested as far as coverage is concerned, but some of the tests can use a little refactoring of their own.

Gentoo Build Publisher has the concept of a "build" and data for a build can exist both in a database as well as on a filesystem (storage). Occasionally, for example as part of a purge process, it will delete a build, which means it must delete the record from the database as well as the filesystem storage. Previously I had overridden the .delete() method in the Django ORM model to take care of this.

def delete(self, using=None, keep_parents=False):
    # The reason to call super().delete() before removing the directories is if for
    # some reason super().delete() fails we don't want to delete the directories.
    super().delete(using=using, keep_parents=keep_parents)

    self.storage.delete_build(self.build)

There is also a celery task called purge_build that calls the above .delete() method on builds. I had written a test for purge_build() that simply created the database record, called purge_build() and asserted that the record no longer exists. I did not add an assertion that the filesystem storage was also deleted because at the time I felt that creating a build for testing that had both the database record and filesystem storage was difficult and not worth it and I already had a test that ensured that storage.delete_build() worked.

Then I took upon a great refactor. I decided I did not want the filesystem storage subsystem to know or care about the database and vice versa. So I removed self.storage from the database model and I created a "build manager" class that coordinated build operations between the filesystem, database, and Jenkins so only the manager knows about its subsystems. The build manager, BuildMan has a delete method of its own:

def delete(self):
    """Delete this build"""
    if self.model is not None:
        self.model.delete()

    self.storage_build.delete()

I felt good about the refactor, and confident when doing it because the tests gave me the confidence that everything it was doing before still worked.

Except that they weren't testing everything. Remember that I told you that the test for purge_build() was only testing that the database record got deleted and not the storage? Well that test still only tested half of the requirement and purge_build() was still calling the Django ORM's .delete() method. Except that method no longer deletes the storage.

And then I went on vacation. And at the airport waiting to depart I checked in on my GBP instance and found it had run out of disk space. At the time I deduced it was because I was creating too many builds and that was what was causing the disk to run out of space and not some software bug. I was confident of that. The tests made me very confident of that.

After I got back from vacation I tried to delete some builds through the Django Admin interface but that didn't seem to clear up any space. After a little snooping I then realized that the "deleted" builds still had their filesystem storage. A lot more snooping around and I realized that the purge process was not deleting the filesystem storage. "How could this be?" I thought. "Surely I have a test for this!"

It was then that I found the test and realized my mistake. I "fixed" the test, found it failed, and then fixed the code.

I think there are two lessons learned from my experience. The first is that you should really consider if your tests are "good enough". And whether that effort needed to make it more complete is worth it in the end. In my case I think it was, and the ability to use filesystem storage in tests eventually became easier anyway out of necissity but I never went back to "finish" that test. You should error on the side of a little extra effort is worth it. The second lesson is though tests are made to make you feel confident, you can sometimes feel too confident. Tests aren't the answer for everything. So you should definitely ensure you mix it with other disciplines.

In the end a bug got fixed and my tests are little bit better, so it's all good.