
Concerns regarding testability of DID Resolution and Dereferencing #549

Closed
msporny opened this issue Jan 17, 2021 · 14 comments
Labels: pr exists (There is an open PR to address this issue)

@msporny (Member) commented Jan 17, 2021

A few items of concern:

  • We currently have 153 normative MUST statements in the specification. That's a lot of tests that the group will have to write over the next month or two.
  • 43 of those tests (roughly a quarter of the normative statements in the specification) come from the Resolution and Dereferencing section.
  • We have no conformance class for DID resolution.
  • We say the following in the Resolution section: "This section defines the inputs and outputs of DID resolution and DID URL dereferencing. These functions are defined in an abstract way. Their exact implementation is out of scope for this specification, but some considerations for implementors are discussed in [DID-RESOLUTION]."
  • We have these resolutions:

At this point, we need one or more people to step forward to write the tests for the DID Resolution and Dereferencing section and we need at least two companies that will commit to implement DID resolvers that will pass the tests. If we can't get that sort of commitment (I don't even know how we'd write the tests with our current test suite -- maybe @OR13 has some suggestions), we should mark the section at risk and say that the Resolution and Dereferencing section may be moved into a separate NOTE published by the group.

My recollection is that the group desired to know how the interfaces for resolution and dereferencing would work. I believe the group has achieved that with the current specification and we could safely move Resolution and Dereferencing into a NOTE for publication. This would set the stage for a future DID Resolution WG without having any substantive negative impact on the understanding of how Resolution and Dereferencing are meant to work. Publication as a NOTE would have the added benefit of relieving a good chunk of CR implementation pressure.

So, things we need to know from the group in the next week or two:

  1. Are you planning to implement software that will pass the normative statements related to Resolution and Dereferencing in the specification?
  2. Will you commit to writing a significant portion of the Resolution and Dereferencing tests?
  3. Would you be supportive of or object to moving the Resolution and Dereferencing section into a separate document that will be published by the DID WG as a Working Group NOTE?

It would be good to hear responses from @peacekeeper and @jricher, as well as implementers.

msporny self-assigned this Jan 17, 2021
@msporny (Member, Author) commented Jan 17, 2021

On behalf of Digital Bazaar:

Are you planning to implement software that will pass the normative statements related to Resolution and Dereferencing in the specification?

We are not planning to implement software that will pass the normative statements related to Resolution and Dereferencing at this time. We prefer that such work is done in a future DID Resolution WG. We do have DID Resolution software that we have authored called did-io.

Will you commit to writing a significant portion of the Resolution and Dereferencing tests?

We are not committing to writing any Resolution or Dereferencing tests at this time.

Would you be supportive of or object to moving the Resolution and Dereferencing section into a separate document that will be published by the DID WG as a Working Group NOTE?

We would be supportive of moving the sections into a separate document as a NOTE as that will help signal to W3C Membership that a future WG for DID Resolution would be a logical next step and de-risk the Candidate Recommendation process.

@jandrieu (Contributor)

I've never understood the link between normative statements and testability. Terms like SHOULD, IMO, adequately express obligations that may be untestable but are nevertheless defined with enough clarity for those obligations to be fulfilled, potentially by human-mediated processes that are fundamentally untestable.

This is especially clear when we have already agreed that DID Core will define an interface ONLY for resolution and dereferencing, while leaving the implementation details to a later spec.

If we can't even define the contract for resolution and dereferencing, then we have, at best, only an illusion of interoperability defined in DID Core.

@iherman (Member) commented Jan 18, 2021

I agree with @jandrieu (and that is also a comment on #550). The 'at risk' flag refers to the possibility of putting the whole of §8 in a NOTE, which would make the core document fairly difficult to understand and, I am afraid, would harm acceptance. If we have difficulty getting implementations, I would prefer the alternative of marking this section as non-normative but keeping it in the document. It may sound like just a nuance, but I think it is more than that: the reader of this specification would have a much stronger incentive to follow the statements in §8, and, eventually, implementations will follow. An apparently independent, separate NOTE carries less weight, imho.

@iherman (Member) commented Jan 18, 2021

By the way, if there are (at minimum two) implementations that declare, in some way or other, that they implement what is described (the language API is not defined; the whole definition is abstract anyway), then we may accept that as a CR exit criterion per se. There is no obligation to back up everything with explicit, externally executable tests.

The goal of the CR is not to test implementations. The goal of the CR is to prove that the specification is (a) consistent and clear and (b) that all features are implementable (or used when it comes to, say, a vocabulary).

@msporny (Member, Author) commented Jan 18, 2021

I've never understood the link between normative statements and testability,

Let me try to highlight below why machine-testability of statements is an important consideration.

Over the past decade or so, many of the WGs I have participated in have tried to make sure that at least the MUST statements have tests in the test suite, and that the SHOULD/MAY statements are written in a way that is machine-testable.

Humans make terrible spec enforcers for at least the following reasons: 1) the ones that know what they're doing are in high demand, 2) some of them are not consistent when language is vague, 3) people are clever and will get around human tests if they want to, and 4) the ones that are involved now tend to move on to other things/retire/die/etc.

This is based on deployment experience: non-testable statements tend to be completely ignored by implementers (because there is no negative consequence for ignoring them), and the real interoperability ends up forming around the tests that are written for the test suite.

Making sure normative statements are testable is the easiest path to ensuring interoperability. You can get there through other means, but depending on the good will of implementers is rarely a path to interoperability.

terms like SHOULD, IMO, adequately express obligations that may be untestable, but nevertheless are defined with enough clarity for those obligations to be fulfilled, potentially by human-mediated processes that are fundamentally untestable.

Doing so is asking for trouble. As a concrete example: I feel extremely uneasy when reviewing DID Methods for inclusion in the DID Spec Registries. The DID WG has put this responsibility largely on @OR13's and my shoulders, and I know I'm not doing enough to push back on bad DID Methods. I'm not pushing back hard because the human-enforced rules for getting into the registry are vague, and meeting the minimum bar is really easy. For example, nothing stops you from writing total and complete garbage in your Privacy Considerations section: you can lie, you can say things that are fantastically dangerous from a privacy perspective, and there is no mention of needing a 3rd-party privacy audit. That's what human enforceability gives you: a protection that is small, vague, and largely unenforceable.

@msporny (Member, Author) commented Jan 18, 2021

@iherman wrote:

The 'at risk' flag refers to the possibility of putting the whole of §8 in a note, which would make the core document fairly difficult to understand and, I am afraid, would harm acceptance.

I disagree; I don't expect it would harm acceptance any more than marking the entire section as non-normative would. I will also note that a non-normative section containing 40+ normative statements is confusing to implementers. If we're going to do that, we should downgrade all of its language to be non-normative, which would be worse than publishing a NOTE that contains the normative language and provides guidance.

The goal of the CR is not to test implementations. The goal of the CR is to prove that the specification is (a) consistent and clear and (b) that all features are implementable (or used when it comes to, say, a vocabulary).

While you're technically correct, I'd rather we stay away from spec lawyering and hold ourselves to a higher bar with respect to the implementability of the specification. And now, to quote from the W3C Process (for the benefit of those who don't know it by heart):

https://www.w3.org/2020/Process-20200915/#transition-cr

6.2.7. Transitioning to Candidate Recommendation
To publish a Candidate Recommendation, in addition to meeting the requirements for advancement a Working Group, or in the case of a candidate Amended Recommendation (a document intended to become an Amended Recommendation), the W3C:
...
must document how adequate implementation experience will be demonstrated,
...
While no exhaustive list of requirements is provided here, when assessing that there is adequate implementation experience the Director will consider (though not be limited to):

is each feature of the current specification implemented, and how is this demonstrated?
...

My expectation is that we will demonstrate that each feature of the current specification is implemented by pointing to tests for each normative statement. Where we can't point to a test, we'll point to people following the normative requirement in practice (e.g., point to at least two DID Methods that contain Privacy Considerations sections, and to @OR13 and @msporny requiring those sections in a DID Method before allowing the PR that registers it to go through).

I expect any normative statement for which we can't do the things above will have a hard time meeting the "must document how adequate implementation experience will be demonstrated" bar in the W3C Process.

@iherman (Member) commented Jan 18, 2021

Just for the sake of argument :-)

We have, in our specification, five different verification relationships defined normatively as properties. We can (and will) have tests for all of those, in the sense of having tests that check their values, constraints, etc., etc. However, I believe we will also have to show that all those terms are “meaningful”, i.e., that they are used in real implementations (and not only toy implementations, obviously) out there, and that for at least two implementations it makes a difference whether a DID Document uses, say, capabilityInvocation for what is specified in the specification and not something else. Otherwise, if the term is not used, the director may question whether the term should be normatively defined. I doubt that this can be covered by a test alone.

That “usage” is not a matter of (mechanical) tests. It will be based, at the end of the day, on human reporting. And that is all right, because we are not testing the implementations and not creating some sort of certification flag for implementations; i.e., there is no incentive for humans to cheat the system. All implementations have a common goal here: to make sure that the specification is okay.

This is all to say that a purely test-centric view may not cover all the needs for this spec. Whether this is true for the content of §8 is not clear to me, but we may need some flexibility in how we define our exit criteria.
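The property-level tests Ivan alludes to (checking values and constraints of a verification relationship such as capabilityInvocation) are the mechanically testable part. A minimal sketch of such a check, with illustrative function and error names that are not from any real test suite, might look like:

```python
# Hypothetical constraint check for a DID Core verification relationship.
# Per DID Core, each entry in "capabilityInvocation" is either a string
# (a reference to a verification method by id) or an embedded
# verification method map with at least id, type, and controller.

def check_capability_invocation(did_document: dict) -> list[str]:
    """Return a list of constraint violations (empty means the check passed)."""
    errors = []
    entries = did_document.get("capabilityInvocation", [])
    if not isinstance(entries, list):
        return ["capabilityInvocation must be a list (set of entries)"]
    for i, entry in enumerate(entries):
        if isinstance(entry, str):
            continue  # a reference to a verification method by id
        if isinstance(entry, dict):
            # An embedded verification method needs at least these properties.
            for required in ("id", "type", "controller"):
                if required not in entry:
                    errors.append(f"entry {i} missing required property '{required}'")
        else:
            errors.append(f"entry {i} must be a string or a map")
    return errors

doc = {
    "id": "did:example:123",
    "capabilityInvocation": [
        "did:example:123#key-1",
        {"id": "did:example:123#key-2", "type": "Ed25519VerificationKey2020",
         "controller": "did:example:123"},
    ],
}
print(check_capability_invocation(doc))  # → []
```

What such a check cannot show is the "meaningfulness" Ivan describes: that two independent implementations actually behave differently based on the relationship's semantics. That part remains human-reported.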

@msporny (Member, Author) commented Jan 18, 2021

I believe we will also have to show that all those terms are “meaningful”, i.e., that they are used in real implementations (and not only toy implementations, obviously) out there, and that for at least two implementations it makes a difference whether a DID Document uses, say, capabilityInvocation for what is specified in the specification and not something else. Otherwise, if the term is not used, the director may question whether the term should be normatively defined. I doubt that this can be covered by a test alone.

I believe we can do this for all core properties EXCEPT FOR alsoKnownAs. The same for DID parameters, except perhaps hl and versionId -- that is, there are implementations we can point to today that use those terms. I don't think we can do the same for many of the resolution metadata properties.

there is no incentive for humans to cheat the system

I wish this were true -- everyone has a pet feature that they want in the specification. Some of these pet features don't have multiple working implementations behind them. I would prefer that the group requested demonstrations of working code for every feature by the end of CR as exit criteria. I understand that not everyone wants to set the bar that high.

@iherman (Member) commented Jan 19, 2021

Some of these pet features don't have multiple working implementations behind them.

... then they lose. It is up to us to set the bar even higher: we can require not two but three or four implementations to use a feature when it comes to human reporting.

But that is irrelevant as far as this issue is concerned. My only reservation is that the "at risk" note is formulated such that, if the requirements are not fulfilled, the whole of §8 is removed from the spec. I would like to be a bit milder and keep the door open to leaving §8 in the spec, albeit as a non-normative section. At this moment, that is all.

@OR13 (Contributor) commented Jan 19, 2021

I plan on writing tests for "resolveRepresentation". I consider it a huge failure to have a spec that recognizes 3+ representation formats and then fails to explain (with tests) how they actually function. Sure, "complex" resolution topics can be punted to a future WG, but not the basics associated with why we created an abstract data model. If we don't test those, we should not have an abstract data model and 3 representations.

It is trivial to provide tests for resolution. You have:

  1. an input IRI / DID URL
  2. a function name (resolveRepresentation)
  3. an output DID Document

and normative statements relating the input to fields in the output.
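Those ingredients can be wired into a test-harness sketch. The tuple shape follows DID Core's abstract resolveRepresentation contract (resolution metadata, a representation byte stream, document metadata); the in-memory registry, regex, and error strings here are illustrative stand-ins, not a real resolver:

```python
import json
import re

# Stand-in resolver for the abstract resolveRepresentation contract:
# input DID → (didResolutionMetadata, didDocumentStream, didDocumentMetadata).
REGISTRY = {"did:example:123": {"id": "did:example:123"}}
DID_SYNTAX = re.compile(r"^did:[a-z0-9]+:.+$")

def resolve_representation(did, resolution_options=None):
    if not DID_SYNTAX.match(did):
        return ({"error": "invalidDid"}, b"", {})
    if did not in REGISTRY:
        return ({"error": "notFound"}, b"", {})
    stream = json.dumps(REGISTRY[did]).encode()
    return ({"contentType": "application/did+json"}, stream, {})

# Tests that relate the input DID to fields in the output, as described above:
meta, stream, _ = resolve_representation("did:example:123")
assert meta["contentType"] == "application/did+json"
assert json.loads(stream)["id"] == "did:example:123"

meta, _, _ = resolve_representation("not-a-did")
assert meta["error"] == "invalidDid"
```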

In the case of "dereferencing", it's also trivial to test.

Here is a Deno package manager that uses DID Documents, along with service and relativeRef:

https://github.com/OR13/deno-did-pm
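The service/relativeRef dereferencing that package relies on can be sketched in a few lines, per DID Core's service and relativeRef DID parameters: select the service whose id fragment matches the "service" parameter, then append relativeRef to its serviceEndpoint. The document and helper below are illustrative; error handling is minimal:

```python
from urllib.parse import urlparse, parse_qs

# Illustrative DID document with a single service entry.
DOC = {
    "id": "did:example:123",
    "service": [{
        "id": "did:example:123#files",
        "type": "LinkedDomains",
        "serviceEndpoint": "https://example.com/pkg",
    }],
}

def dereference(did_url: str, doc: dict) -> str:
    """Resolve a DID URL with ?service=...&relativeRef=... to a concrete URL."""
    query = parse_qs(urlparse(did_url).query)
    wanted = query["service"][0]
    relative_ref = query.get("relativeRef", [""])[0]
    for svc in doc.get("service", []):
        if svc["id"].endswith("#" + wanted):
            return svc["serviceEndpoint"] + relative_ref
    raise KeyError("service not found")

print(dereference("did:example:123?service=files&relativeRef=/index.json", DOC))
# → https://example.com/pkg/index.json
```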

What we have in the did core spec should be well covered by tests, or moved to another document.

@msporny (Member, Author) commented Jan 19, 2021

it is trivial to provide tests for resolution

Yes, for a concrete function/API. We don't have a concrete API; we have abstract functions, and you can't test abstract functions unless you make them concrete. If the group wants to define a concrete interface, that's fine, but some might argue that doing so is going too far. We had specifically stated that implementation of the abstract functions is up to another specification; now we're saying that we're going to make those functions concrete (and, again, people might object to that). I know that I wasn't expecting us to write concrete tests for resolution/dereferencing. I'm not going to object, but this is exactly what I was concerned about happening when we started down this slippery slope.

We're at a point now where we need someone to write the tests, and we need two entities to write DID resolver implementations. It's basically going to come down to that. If we get that, we're going to have to change the spec to normatively state that these functions are no longer abstract but concrete, and I'm not sure the group agreed to go that far.
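To illustrate the abstract-versus-concrete distinction being debated here: the spec fixes only the contract (inputs, outputs, metadata structures), while a test suite needs a concrete binding to call. A minimal sketch, with illustrative class and field names (an HTTP binding would be an equally valid concretization):

```python
from abc import ABC, abstractmethod

class AbstractResolver(ABC):
    """The abstract contract: the spec defines this shape but no calling convention."""
    @abstractmethod
    def resolve(self, did: str, options: dict) -> tuple[dict, dict, dict]:
        """Return (didResolutionMetadata, didDocument, didDocumentMetadata)."""

class ExampleResolver(AbstractResolver):
    """A concrete binding that a shared test suite could actually invoke."""
    def resolve(self, did, options):
        if not did.startswith("did:"):
            return ({"error": "invalidDid"}, {}, {})
        return ({}, {"id": did}, {})

# Only the concrete class is testable; the abstract one cannot be instantiated.
resolver = ExampleResolver()
assert resolver.resolve("did:example:456", {})[1]["id"] == "did:example:456"
```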

What we have in the did core spec should be well covered by tests, or moved to another document.

Agreed. We need a volunteer to write those tests and take them through the CR process, and we need two organizations to volunteer to write implementations. I hesitate to put the burden on @OR13 given everything else that's on his plate. If this is not done and we don't have at least two interoperable implementations (and we don't have the at-risk warning in place), we will have to go through CR again, and we can only do that 2-3 times before the axe comes down on the group.

@msporny (Member, Author) commented Jan 24, 2021

PR #550 has been merged, which addresses this issue by marking resolution at risk and providing multiple ways in which the WG could address this concern during CR.

If we want to test the section, we have to specify a concrete interface that can be tested. I'll put in a PR to do that as well.

@msporny (Member, Author) commented Feb 1, 2021

PRs #587, #591, #592, and #601 have been raised to address this issue. This issue will be closed once one of those PRs is merged.

@msporny (Member, Author) commented Feb 11, 2021

PR #601 has been merged, closing.

msporny closed this as completed Feb 11, 2021

4 participants