Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__unsafeDiscardOutputDependency used with exportReferencesGraph does not work #7330

Open
Ericson2314 opened this issue Nov 21, 2022 · 2 comments · May be fixed by #7339
Open

__unsafeDiscardOutputDependency used with exportReferencesGraph does not work #7330

Ericson2314 opened this issue Nov 21, 2022 · 2 comments · May be fixed by #7339
Labels

Comments

@Ericson2314
Copy link
Member

Ericson2314 commented Nov 21, 2022

Describe the bug

__unsafeDiscardOutputDependency used with exportReferencesGraph cause a build failure when the output paths of the drv File used with cannot be found in the sandbox because they are not yet built.

Steps To Reproduce

TODO make exact example; it can be used as a test.

Expected behavior

drv outputs are ignored, only the drv file itself matters.

Additional context

  1. __unsafeDiscardOutputDependency is not unsafe at all, it is just used to make a derivation depend on a drv file itself and not its outputs. (Just depending on the drv itself should probably be the default, but isn't for historical reasons.) How does it work? It removes any = the string context starts with. Implemented in

    static void prim_unsafeDiscardOutputDependency(EvalState & state, const PosIdx pos, Value * * args, Value & v)
    {
    PathSet context;
    auto s = state.coerceToString(pos, *args[0], context);
    PathSet context2;
    for (auto & p : context)
    context2.insert(p.at(0) == '=' ? std::string(p, 1) : p);
    v.mkString(*s, context2);
    }

  2. In

    nix/src/libexpr/primops.cc

    Lines 1202 to 1211 in 62960f3

    if (path.at(0) == '=') {
    /* !!! This doesn't work if readOnlyMode is set. */
    StorePathSet refs;
    state.store->computeFSClosure(state.store->parseStorePath(std::string_view(path).substr(1)), refs);
    for (auto & j : refs) {
    drv.inputSrcs.insert(j);
    if (j.isDerivation())
    drv.inputDrvs[j] = state.store->readDerivation(j).outputNames();
    }
    }
    the = is handled. It causes the newly-built derivation to have far more dependencies -- in addition to the drv file itself, all its drv file dependencies and all their outputs are depended upon.

  3. exportReferencesGraph is implemented in large part by Store::exportReferences in

    for (auto & j : paths2) {
    if (j.isDerivation()) {
    Derivation drv = derivationFromPath(j);
    for (auto & k : drv.outputsAndOptPaths(*this)) {
    if (!k.second.second)
    /* FIXME: I am confused why we are calling
    `computeFSClosure` on the output path, rather than
    derivation itself. That doesn't seem right to me, so I
    won't try to implemented this for CA derivations. */
    throw UnimplementedError("exportReferences on CA derivations is not yet implemented");
    computeFSClosure(*k.second.second, paths);
    }
    }
    }
    . If a derivation is encountered, it also adds its outputs to the closure.

It seems that the export references code is wrong:

  1. It is assuming all drv paths got the = treatmeant but this is not necessarily the case. For ones that didn't (using __unsafeDiscoardOutputDependency), we cannot start processing their outputs because those are not necessarily dependencies of the derivation.

  2. The code is unnecessary, because the treatment of = in the evaluator will already ensure all those paths are in inputPaths/inputDrvs, and

    /* TODO (impure derivations-induced tech debt):
    Tracking input derivation outputs statefully through the
    goals is error prone and has led to bugs.
    For a robust nix, we need to move towards the `else` branch,
    which does not rely on goal state to match up with the
    reality of the store, which is our real source of truth.
    However, the impure derivations feature still relies on this
    fragile way of doing things, because its builds do not have
    a representation in the store, which is a usability problem
    in itself */
    if (auto outPath = get(inputDrvOutputs, { depDrvPath, j })) {
    worker.store.computeFSClosure(*outPath, inputPaths);
    }
    else {
    auto outMap = worker.evalStore.queryDerivationOutputMap(depDrvPath);
    auto outMapPath = outMap.find(j);
    if (outMapPath == outMap.end()) {
    throw Error(
    "derivation '%s' requires non-existent output '%s' from input derivation '%s'",
    worker.store.printStorePath(drvPath), j, worker.store.printStorePath(depDrvPath));
    }
    worker.store.computeFSClosure(outMapPath->second, inputPaths);
    }
    }
    }
    }
    /* Second, the input sources. */
    worker.store.computeFSClosure(drv->inputSrcs, inputPaths);
    ensures that the closure of those is already in inputPaths

As such we should be able to do this:

  1. Write test confirming the current failure in tests/export-graph.nix. See there is one for a drvPath without __unsafeDiscardOutputDependency aready.

  2. Delete the code in Store::exportReferences that seems pointless, and confirm the bug is fixed.

See 2897286 for what I think introduced the bug.

@Ericson2314
Copy link
Member Author

Ericson2314 commented Jan 24, 2023

I was wrong about this. The code in Store::exportReferences is not redundant, because it is not expanding the ambient closure (inputPaths) but just checking that the new closure is within that closure (i.e. that exportReferences does not allow one to gain access to new dependencies).

Instead, as I think @edolstra was saying, we probably need to reconsider 2897286 or otherwise disambiguate these cases.

@roberth
Copy link
Member

roberth commented Oct 11, 2023

So exportReferencesGraph calls Store::exportReferences, which in its second pass treats all derivation references as DrvDeep.
Unfortunately, whether a reference was meant to be "deep" or not has gradually been lost.
Specifically, it is not recorded in the database or narinfo, but even before then, in the .drv itself, the info is degraded, as deepness has the effect of

  • adding the whole closure to inputDrvs
  • adding it to inputSrcs as well

I can see two reproducibility-preserving ways forward. I'm leaning towards (b).

a. Extend data model, conserve interface

This assumes that we want to keep the user interface the same as it is today. We have two kinds of derivation string contexts that can be converted back and forth (given addDerivationOutputDependency, TBD, addition).

In order to preserve composition, the boolean decision to be deep or not should be local, and it should be preserved as long as necessary. This will involve opt-in improvements to the derivation format, and new fields in the references database as well as in narinfo.

For instance a derivation variable ("accurateDerivationReferences","1") could trigger the automatic creation of ("deepDerivationReferences","/nix/store/one.drv /nix/store/two.drv"). Whether this replaces or supplements the existing behavior could be discussed. It should also trigger the automatic creation of a requiredFeatures feature, although that does require manual intervention by everyone with remote builders...

All in all this requires a lot of change throughout the data model, but the user interface remains mostly unchanged. Certainly if Nixpkgs enables accurateDerivationReferences by default.

b. Soft-deprecate DrvDeep and fix exportReferencesGraph by opt-in

This does not require changes to the data model.
The crux is that DrvDeep is not necessary when you can specify this behavior at exportReferencesGraph. The composition argument is not a big deal if we accept that the deep vs non-deep choice is always made in a derivation that uses exportReferencesGraph. This seems to be a reasonable assumption, as the main (only?) use case is for image building, which tends to already use this feature, because it's the only way to build a store path database along with the contents.

Concretely, add a new derivation variable discardDerivationOutputDependency, to be used in conjunction with exportReferencesGraph, which when set to 1 disables the implicit reinterpretation as DrvDeep, in favor of reinterpreting as a regular reference. Note that by making a reference to a DrvDeep drvPath already adds the whole closure as direct inputDrvs, so the outputs are still added, pretty much as it would with the current behavior. (any caveats?)

This change seems preferable, because we don't have to deal with changes all over the (public!) data model, while we don't seem to lose expressiveness or compatibility with existing expressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants