Capturing non-determinism by salting derivations

This is a technique a coworker and I recently discovered, and it solved a problem we were having so elegantly that it’s hard not to write about.

At $WORK, our CI runs benchmarks inside Nix derivations. This works great for only running benchmarks when dependencies change, but there’s an issue: Nix tries really hard to make sure that a derivation is deterministic, i.e. produces the same output no matter where, when, or how it is evaluated. Timing-based benchmark results are inherently at odds with that promise.

For our purposes here, the main problem is that the derivation output now depends on the machine it runs on. When we now want to run a benchmark on a specific computer, we instead get served the cached results from our CI machine, which has different hardware, and therefore gives unrepresentative benchmark results.

Saying “the derivation output is dependent on the machine it is evaluated on” already hints at the solution: we need to make the host machine a dependency of/input to the derivation. To do so, we use builtins.getEnv and --impure to sneak in an environment variable, and make that an argument to mkDerivation:

mkSaltedDerivation = args:
  let
    salt = builtins.getEnv "SALT";
  in
  if salt == ""
  then
    builtins.throw ''
      Called `mkSaltedDerivation`, but no salt was provided. This means either
        - the $SALT environment variable was not set
        - Nix was not run with --impure
    ''
  else stdenv.mkDerivation (args // { inherit salt; });

This way, much like a cryptographic salt, the host machine gets hashed along with the other inputs to the derivation. We now run the benchmarks as follows:

$ export SALT=$(hostname)
$ nix build --impure .#benchmark-reports

And that’s all there is to it. Non-deterministic derivations are a bit of a black art, but there are two clear wins here:

Of course, this technique is more broadly applicable. The salt doesn’t have to be a host name, it can occur in a different place than as an attribute of mkDerivation, and the derivation itself doesn’t have to be a benchmark. As long as there’s a source of non-determinism that we know about beforehand, we can capture it and turn it into an input to the derivation.