Can we nuance our mental model on DID control slightly? #233

Closed
dhh1128 opened this issue Mar 18, 2020 · 85 comments
Labels
pending close Issue will be closed shortly if no objections

Comments

@dhh1128
Contributor

dhh1128 commented Mar 18, 2020

PR #213 has generated an interesting comment stream, and I think some useful clarity. I am happy to have multiple smart people agree in writing to the concept that a DID can identify anything, because this flexibility seemed to have been excluded by some verbiage I was hearing.

Now I'd like to explore a subtlety around the concept of control. I will frame this in terms of a use case that I'm familiar with in cybersecurity and malware research, but I think you'll quickly see how it might apply to use cases brought up by others.

Malware researchers typically identify malware (viruses, worms, infected or malicious files) by a sha256 hash. The first time a particular sample is seen in the wild, a researcher hashes the sample and goes to virustotal.com or some similar site to see if anybody else has seen it before. If no, the sample is uploaded to the site's DB for all the world to look at. If it is already known, then the researcher has just made a second (or a third, or a tenth) independent discovery.

Now, suppose I wrote a DID method that was all about identifying malware with DIDs. The logical identifier format would be did:mymethod:hash-of-sample. With me so far?

Okay, now what are the control semantics?

What I have heard so far is that DIDs are always created by a controller, who can then (even in the genesis DID doc) choose to retain control or give it away (e.g., by specifying no control after the creation transaction). This makes sense for many situations.

However, that doesn't quite fit this scenario, because A) the researcher who reports the malware is never, at any time, in a "control" relationship with the sample's identifier, and would not want to be considered so; B) the identifier cannot have control semantics, even at its genesis transaction, because its derivation mechanism disallows it; C) the identifier doesn't have a DID doc. What's being identified here is content that exists, that is explicitly uncontrollable to begin with. Anybody who discovers the content will discover the same identifier. Two researchers could register the same content on two different systems of record and both would be equally valid and not in conflict.
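A minimal sketch of the derivation described above, assuming the hypothetical `did:mymethod` method uses the lowercase hex SHA-256 digest of the sample bytes as its method-specific identifier. The function name `malware_did` and the hex encoding are illustrative assumptions, not part of any registered DID method.

```python
import hashlib


def malware_did(sample: bytes) -> str:
    """Derive a did:mymethod identifier from the raw bytes of a sample.

    Assumption: the method-specific id is the lowercase hex SHA-256 digest.
    """
    digest = hashlib.sha256(sample).hexdigest()
    return f"did:mymethod:{digest}"


# Two researchers who independently obtain the same bytes derive the same
# identifier without coordinating, and neither holds any key for it.
sample = b"placeholder bytes standing in for a malware sample"
alice_id = malware_did(sample)
bob_id = malware_did(bytes(sample))
print(alice_id == bob_id)
```

Nothing in the derivation involves a key pair or a registration step, which is why the scenario has a discoverer rather than a controller.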

So my question is this:

Would we be comfortable saying that DIDs can be used to identify such things, too? And if yes (which I hope is an easy answer), are we willing to not describe such a scenario as "the controller creates the DID" but rather "the DID identifies something inherently uncontrollable, so it never has a controller, even during creation; rather, it has a discoverer" (or something to that effect)?

@dhh1128
Contributor Author

dhh1128 commented Mar 18, 2020

Tagging @ewelton and @mitfik and @swcurran

@ewelton

ewelton commented Mar 18, 2020

This would impact issue #122 as well - in fact, it would force the selection of 1b and 2a.

@jandrieu
Contributor

jandrieu commented Mar 18, 2020

There is always a controller. So 1b doesn't work.

At the bare minimum, whoever created the DID is the controller. It does not imply they are, in any way, a controller of the DID Subject. All it means is that they controlled the initial DID Document, and presumably--depending on the method--retain the ability to further modify the document.

In the malware use case, I believe a better way to model that is that the initial reporter generates a DID and issues a credential saying that malware with a given hash has been given such and such a DID, perhaps with other corroborating claims in that credential. No control relationship needs to be established between the DID Controller and the DID Subject. But no matter what the relationship between Subject and Controller is, there is ALWAYS a Controller, whether or not there is a controller property.

Alternatively, the malware discoverer could just issue a credential with that hash, without using DIDs. I'm not sure DIDs buy the use case much.

@dhh1128
Contributor Author

dhh1128 commented Mar 18, 2020

@jandrieu : Let me challenge "there is always a controller" a slightly different way. If a DID doc is created (said differently, if any metadata is associated with a DID), then I agree that there is always a controller, at least at the outset. Whoever creates the DID doc (chooses the metadata) is the controller. They can make decisions about whether to continue being the controller, or to disclaim control over the content by removing control methods. Furthermore, I agree that all our thinking up until now has assumed that DIDs and DID docs are inseparable concepts, because <assumption>of course we want metadata</assumption>.

But what I'm suggesting is a scenario where an identifier needs to be created (perhaps better said, it gets discovered) without a DID doc--zero metadata--from the get-go. Nobody is allowed to define any metadata about the identifier -- even at the outset. The need here is a pure identifier that has the decentralization characteristic of DIDs, but not the resolution characteristic. It's almost like a hashlink, except it makes no claim about location (or any other properties), only about existence. What is known about malware (metadata) could vary in thousands of different ways, and be stored in databases all over the world, and nobody intends to be an authoritative source for any of it. They just want to agree that they're talking about the same thing. Full stop. And since the mechanism for generating the identifier derives it from the subject, there can never be any controller making decisions, by definition. Uncontrollable things exist, and we identify them. Since there is no resolution, there is no controller. That's what the controller controls--the DID doc/resolution, n'est-ce pas?

Now, you could say, "No. We must have a DID doc. 99.999% of DIDs are worthless without it. For the weird corner case you're bringing up, if it really exists at all, just keep the convention and live with the weirdness." That would mean that we analyze the first person to create the guaranteed-empty DID doc for malware X as its "controller" for the purposes of the DID ecosystem. But two researchers could discover it independently, with no way of proving who was first. So we have a theoretical controller role that is unavoidably ambiguous, just because we want to keep the concept of controller. If we instead say, "Yep, there's cases where something exists but its metadata is not controlled, and DIDs can point to them. In such cases, it becomes impossible to create a DID doc (because if you do that, by definition you're exercising control), but it's still a sort of DID because it's a decentralized identifier" then we get to broaden the conceptual tent of DIDs a bit.

@ewelton

ewelton commented Mar 18, 2020

It is not clear to me yet how this interacts with methods - perhaps some methods are capable of representing passive, persistent objects, and other methods are not - because the owner of a set of keys may always be able to update "the DID document".

On the other hand - they can't update the did itself once it is minted - and that is what matters. So the controller (of the registration) might only be able to update information about the subject's registration record.

Another thing that I worry about is the did:<method>:<data> model: if the <data> is related to the genesis key pair, then that method might not be capable of representing the "virus hash" you described above. The virus hash could only be represented as an assertion (or a VC).

In other words, for methods where the <data> part of the DID is derived from the genesis key pair, then the document belongs to the discoverer and the ability to represent an arbitrary thing, in a self-certifying manner, is simply not possible in that method. Those methods are constrained to represent "loci of control", which is an undeniably critical group - and the one which has come to dominate our thinking and discussion of DIDs.

This is especially apparent in the context of "verification methods" (i.e. #190 ) when combined with the new abstract-data-model/registry approach. The ADM/Registry model forces a "union of all possible realities" model and results in very complicated modeling. For example, we will need to put this sort of information in the registry:

  • capabilityDelegation field, if present, means <x>, but it MUST NOT be present when the method supports non-key hash registration in the data component of the DID and the data component of the DID is not directly related to the control mechanics of the underlying method. If the method does not support arbitrary hash registration, then the capabilityDelegation field MAY be used, subject to the definition above.
  • assertionMethod field, if present, means <x>, but it MUST NOT be present when the method supports non-key hash registration in the data component of the DID and the data component of the DID is not directly related to the control mechanics of the underlying method. If the method does not support arbitrary hash registration, then the assertionMethod field MAY be used, subject to the definition above.

or the ADM/Registry needs structure like

  • for methods which allow arbitrary hash registration and for those DIDs which do not correspond to genesis keypairs (e.g. method1, method2, etc.), then
    • capabilityDelegation field MUST NOT be used
    • assertionMethod field MUST NOT be used
  • for methods which allow only DIDs derived from genesis key pairs
    • capabilityDelegation field is OPTIONAL and means <x>
    • assertionMethod field is OPTIONAL and means <x>
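To make the cost of that structure concrete, here is a rough sketch of what a per-class check might look like. The class names `KEY_DERIVED` and `HASH_REGISTERED`, and the function `check_document`, are hypothetical; nothing like this exists in did-core or in any registry.

```python
from typing import List

# Hypothetical method classes; the names are illustrative only.
KEY_DERIVED = "key-derived"          # DID data derived from a genesis key pair
HASH_REGISTERED = "hash-registered"  # DID data is an arbitrary content hash

# Properties that the rules above restrict.
CONTROL_PROPERTIES = ("capabilityDelegation", "assertionMethod")


def check_document(method_class: str, did_document: dict) -> List[str]:
    """Return the rule violations for a DID document under a method class."""
    violations = []
    if method_class == HASH_REGISTERED:
        # Control-related properties are forbidden for hash-registered DIDs.
        for prop in CONTROL_PROPERTIES:
            if prop in did_document:
                violations.append(f"{prop} MUST NOT be used")
    elif method_class != KEY_DERIVED:
        # For key-derived DIDs both properties are OPTIONAL, so nothing to check.
        violations.append(f"unknown method class: {method_class}")
    return violations


doc = {"id": "did:example:123", "assertionMethod": []}
print(check_document(HASH_REGISTERED, doc))
print(check_document(KEY_DERIVED, doc))
```

Every consumer of a DID document would need to know the method class before it could interpret the properties, which is the complexity being pointed at here.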

use of an @context field simplifies this substantially by allowing a DID-document to declare the semantic which applies - as in "This subject represents a locus of control" or "This subject was discovered and represents an external entity" - and, of course, depending upon the method - it may or may not be possible to render the DID document formally immutable - in which case, an actor capable of updating the DID document could "morph" the "sort of thing" the DID represents.

What this suggests to me - in answer to @dhh1128 's question

Would we be comfortable saying that DIDs can be used to identify such things, too? And if yes (which I hope is an easy answer), are we willing to not describe such a scenario as "the controller creates the DID" but rather "the DID identifies something inherently uncontrollable, so it never has a controller, even during creation; rather, it has a discoverer" (or something to that effect)?

Is that it is far from clear whether or not DIDs are suitable as generic identifiers for self-certified content. Perhaps DIDs are always and only statements by actors, about people, organizations, and things - which means

Alternatively we could move the semantics partially into methods. Perhaps we could have did-core define a set of "classes" of DIDs - each with its own ADM/Registry/@context and let methods subscribe to them somehow - perhaps with an @class attribute which names the appropriate did-core semantic model. If we did that we could possibly

The most radical suggestion would be to step out of the battle altogether, and give DID-documents a sort of "sovereignty" and let them announce what they are and how to process them using some sort of attribute that identified and advertised feature and property sets. The proposed attribute would let the creator of the DID-document assert things like

  • this DID represents a locus of control
  • this DID is an actor/agent's description of a context
  • this DID represents a discovered digital artifact as not controlled

and on and on - at the discretion of the environment and suitable to the needs of adopters.

We could even say "if a DID-document says nothing, then it is assumed to follow the rules in did-core" and provide a fallback Abstract Data Model that clearly defines what it ought to be.
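A small sketch of that fallback rule, assuming the self-declaration lives in an attribute named `@class` (borrowed from the suggestion earlier in this comment) and that the class names below exist. Both are assumptions made purely for illustration.

```python
DEFAULT_MODEL = "did-core"

# Hypothetical semantic models a DID document could announce.
MODELS = {
    "did-core": "process with the fallback Abstract Data Model",
    "locus-of-control": "treat the subject as a locus of control",
    "discovered-artifact": "treat the subject as a discovered, uncontrolled artifact",
}


def semantic_model(did_document: dict) -> str:
    """Return the model a document declares, or the did-core default."""
    declared = did_document.get("@class", DEFAULT_MODEL)
    if declared not in MODELS:
        raise ValueError(f"unrecognized semantic model: {declared}")
    return declared


silent_doc = {"id": "did:example:123"}
artifact_doc = {"id": "did:example:456", "@class": "discovered-artifact"}
print(semantic_model(silent_doc), semantic_model(artifact_doc))
```

A document that says nothing is processed under did-core rules, and a document that declares a class opts in to that class's rules.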

@ewelton

ewelton commented Mar 18, 2020

@jandrieu re: 1b - I believe 1b is specific to the controller attribute in the DID-Doc, not the qualitative ability to control the DID-doc, simply the explicit representation of it in the DID-doc.

If there is always a controller, then the hypothesis that starts this is not possible. DIDs can not represent immutable content, they can only represent loci of control - and as such they can not really refer to things - they can only refer to the controller's name for things.

In other words - "The Moon" can not be the subject of a DID I create, "What Eric Thinks of as The Moon" is a proper scope, but "The Moon" is not.

@dhh1128
Contributor Author

dhh1128 commented Mar 18, 2020

Quoting Joe from issue #122 :

DID Controller is a functional definition. Any entity that can actually control the DID Document is a controller.

So if my theoretical DID method exists, in which it's impossible to place any metadata in the DID Document, or to create one at all, that would imply that there cannot be a controller, because nobody can perform the function that satisfies the definition.

This begs the question, of course, is the DID method that I posited allowed to exist? I can think of many use cases for it. It's highly decentralized (would score great on many rubrics), but by its lack of resolution support, it is definitely an odd duck.

@jandrieu
Contributor

What you are talking about is not a DID. It's just an identifier.

Obviously, there is still a discussion going on about what constitutes meta-data. And, to my mind, I want ALL meta-data out of the DID Document. What needs to be in the DID Document is the cryptographic material for secure interaction (everything else is meta). In some cases, that material can be deterministically derived from the DID itself, like with did:key, in which case resolving the DID is how you transform the raw DID into the DID Document.
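As an illustration of deterministic derivation, here is a simplified sketch modeled on did:key for Ed25519 keys. It assumes the identifier is a base58btc multibase value (`z` prefix) wrapping the multicodec prefix `0xed 0x01` plus a 32-byte key. The document shape, the verification method type, and the function names are assumptions; this is not a conforming resolver.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ED25519_PREFIX = bytes([0xED, 0x01])  # multicodec prefix for ed25519-pub


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc (no multibase prefix)."""
    num = int.from_bytes(data, "big")
    out = ""
    while num:
        num, rem = divmod(num, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    """Decode a base58btc string (no multibase prefix) to bytes."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def resolve_did_key(did: str) -> dict:
    """Build a DID document purely from the DID string; no registry lookup."""
    prefix = "did:key:"
    if not did.startswith(prefix + "z"):
        raise ValueError("expected a did:key with a base58btc ('z') value")
    multibase_value = did[len(prefix):]
    raw = b58decode(multibase_value[1:])
    if len(raw) != 34 or raw[:2] != ED25519_PREFIX:
        raise ValueError("only Ed25519 keys are handled in this sketch")
    key_id = f"{did}#{multibase_value}"
    return {
        "id": did,
        "verificationMethod": [{
            "id": key_id,
            "type": "Ed25519VerificationKey2020",  # assumed type name
            "controller": did,
            "publicKeyMultibase": multibase_value,
        }],
        "authentication": [key_id],
    }


example_key = bytes(range(32))  # placeholder bytes, not a real public key
example_did = "did:key:z" + b58encode(ED25519_PREFIX + example_key)
document = resolve_did_key(example_did)
print(document["id"] == example_did)
```

Resolving the same DID always yields the same document, so the only information in the document is the cryptographic material already encoded in the identifier.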

I think a big part of what's happening right now is people wanting to do EVERYTHING with DIDs, and I agree DIDs can refer to ANY subject. But that doesn't mean they are the right tool for every single identifier use case nor is it appropriate to pollute the core spec to support convenience features. They can be addressed in DID-AWESOME instead of DID-CORE.

If your identifier is most appropriately generated by hashing the object, GREAT. Just use that as an identifier. No DID required.

The fundamentally topological shift in DIDs over other forms of identifiers, including cryptographically verifiable ones like public keys, is the level of indirection between the DID and the cryptographic material, allowing for appropriate maintenance like rotation without invalidating the DID and auditing of transitions in material over the lifetime of the DID. Without that level of indirection, which is the fundamental link between DIDs and DID Documents, then you don't have DIDs, you just have an identifier.
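A toy sketch of that indirection, assuming a hypothetical `did:example` method whose identifier is fixed at creation while the key material behind it can change. The byte strings stand in for public keys, and the signature check that a real method would require before accepting a rotation is deliberately omitted.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def fingerprint(public_key: bytes) -> str:
    """A key-derived identifier: it changes whenever the key changes."""
    return hashlib.sha256(public_key).hexdigest()


@dataclass
class DidRecord:
    did: str
    key_history: List[bytes] = field(default_factory=list)

    @property
    def current_key(self) -> bytes:
        return self.key_history[-1]

    def rotate(self, new_key: bytes) -> None:
        # A real method would verify authorization by the current key here.
        self.key_history.append(new_key)


def create(genesis_key: bytes) -> DidRecord:
    # The DID is minted once from the genesis key and never changes afterwards.
    return DidRecord("did:example:" + fingerprint(genesis_key)[:32], [genesis_key])


record = create(b"placeholder-key-1")
did_before = record.did
record.rotate(b"placeholder-key-2")
print(record.did == did_before, len(record.key_history))
```

The identifier survives the rotation and the history remains auditable, whereas a bare key fingerprint used as an identifier would have changed.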

@jandrieu
Contributor

@ewelton wrote

In other words - "The Moon" can not be the subject of a DID I create, "What Eric Thinks of as The Moon" is a proper scope, but "The Moon" is not.

That's all it ever could be.

The singular notion of "The Moon" doesn't exist. That is just what English speaking people, aka Eric, sometimes use to refer to the Earth's natural satellite. Other people use other terms.

This is the fundamental shift that VCs guarantee. All you can ever say are statements that "some issuer asserts some 'fact'", which is exactly the structure above. This is epistemologically rigorous. Imagining that "The Moon" is, in absolute knowable truth, the subject of a given DID is not. In order for such a statement to exist, we would first have to rigorously understand what "The Moon" really means to you. Then what it really means to me. Then we might be able to convince ourselves that we are talking about the same thing.

It's the same with DIDs. The only way to know if the subject is what you think it is (unless you are the controller) is to gather enough assertions about that DID to convince you of what the Subject is. And EVEN then, all you have done is convince yourself.

Reality is fundamentally unknowable. All we can do is invest resources convincing ourselves of enough shared agreement to interact reasonably.

So, this isn't about a search for Truth with a capital "T". That's a fool's errand. Rather, DIDs are a rigorous mechanism to establish cryptographically secured interactions with an arbitrary Subject. Figuring out what that Subject is or is not happens at another layer, including the mechanisms that embody what it means to "interact" with the Subject.

@ewelton

ewelton commented Mar 18, 2020

@jandrieu I believe there is more to it than 'just an identifier' - it is more than a UUID, because it is linked to the thing itself. It is suitable only to 'hashable' objects, and not physical objects. You can't hash a tree, you can't hash the moon - and, you can argue that you can not refer directly to "the moon" - there is a huge tradition in philosophical semantics about exactly this - and DIDs, in a sense are taking a deep philosophical stance.

So far - what seems like it works is this:
1 - DIDs can not be used to identify digital content in a shared namespace
2 - DIDs can not refer to things
3 - DIDs can be a specific actor/agent's name for a thing

In a sense, it does not matter where this falls - just as long as it falls somewhere and leads to clear and precise (and simple) language. So "the subject is the king of england", for example, would not be quite optimal; "actor-x's name for the king of england is did:123" would be the right way to say it.

@jandrieu
Contributor

jandrieu commented Mar 18, 2020

So "the subject is the king of england", for example, would not be quite optimal; "actor-x's name for the king of england is did:123" would be the right way to say it.

Yes. That's what DIDs always say. But since we ALSO don't know who the Controller is, the statement "Controller's name for a thing is XYZ" is rigorously restatable as "A thing is the subject of DID XYZ"

The assertion that DID XYZ refers to the King of England goes in a VC if you want it to be rigorous, in which case you get the lovely construct that "Issuer ABC says DID XYZ is the King of England".
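A bare-bones sketch of that construct, using a plain data structure rather than a real Verifiable Credential. The `Assertion` type, its field names, and the `did:example` identifiers are all hypothetical, and the proof that a real VC would carry is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Assertion:
    issuer: str   # DID of the party making the statement
    subject: str  # DID the statement is about
    claim: str    # what the issuer says the subject is

    def render(self) -> str:
        return f"Issuer {self.issuer} says {self.subject} is {self.claim}"


def claims_about(assertions: List[Assertion], subject: str) -> List[str]:
    """Collect every issuer-scoped statement made about one DID."""
    return [a.render() for a in assertions if a.subject == subject]


assertions = [
    Assertion("did:example:ABC", "did:example:XYZ", "the King of England"),
    Assertion("did:example:DEF", "did:example:XYZ", "the King of Great Britain"),
]
for line in claims_about(assertions, "did:example:XYZ"):
    print(line)
```

Each statement stays attributed to its issuer, so a reader weighs the issuers rather than treating any single claim as the truth about the subject.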

@ewelton

ewelton commented Mar 18, 2020

@jandrieu Nah, I don't quite agree with that. I would agree that saying "A thing is the subject of DID XYZ", while technically rigorous, leads to exactly the sort of miscommunication the community has been having.

I'm not sure I follow the VC comment. Who is ABC and how is ABC related to the construct?

What I'm trying to get to is making it clear, in everyday language, so that it is always apparent that A thing might have dozens of DIDs, because DIDs are "scoped" by controllers - and DIDs can not always serve as points of coordination in a discussion.

What we want for the VC case, and what is being discussed here, given the limitations of DIDs and the incorrect statements about their scope for the last few years, is a new form of identifier that can be shared by communities, and around which we can clearly say "The controller of DID XYZ says DID XYZ refers to N" and "The controller of DID ABC says DID ABC refers to N" and then let DID ABC and DID XYZ rest happy that they are talking about the same N, so that they can have fruitful discussions about attributes of N, such as "cn=King of England" vs. "cn=King of Great Britain"

@dhh1128
Contributor Author

dhh1128 commented Mar 18, 2020

Okay. I think Joe's given a succinct articulation of a position on the proper scope of DIDs. Thank you, Joe. I love the crispness.

I would like to ask for two things to resolve this issue:

  1. A survey of the group to see if they agree with Joe's rule of thumb.
  2. Assuming yes, a new PR against the DID spec that summarizes the thinking, so future readers of the spec don't wonder whether DIDs apply to their use case. (I am happy to volunteer to raise such a PR, contingent on #1 and on the rest of my comment. Or someone else can.)

Before we poll the group, however, I would like to offer an alternative formulation to Joe's. I don't know if I can be as crisp as he was, but I'm going to try. Going into this, let me acknowledge that the following is heresy, according to the spec; I'm only articulating it because I wonder if we're missing an opportunity here, if we could let go of tightly held notions a little. Here's the alternative worldview:


Lots of identifier schemes already exist. They have various properties. DIDs are unique in that they accomplish ALL of the following goals simultaneously:

  1. Decentralize: allow identifiers to be created by anyone, without permission or coordination.
  2. Eliminate ambiguity: make the referent of the identifier completely uncontroversial.
  3. Provide extensibility: define a methodology whereby new subcategories can be defined without a central authority, yet guarantee that common processing remains viable.

UUIDs accomplish goal 1, but not goal 2 or 3. A given UUID can mean anything, to anybody. Fred can create it, and Jill can repurpose it. They can argue about who's right, or whether they're both right. There is no strong binding to anything in particular. Most decentralized identifiers (e.g., the names of newly born children) are similar.

IP addresses accomplish goal 2, and sometimes goal 3, but not goal 1. Most centralized systems (twitter handles, phone numbers, domain names) are similar.

DIDs accomplish goal 1 in lots of clever ways that I won't go into here.

DIDs accomplish goal 2 in one of these ways:

a. They use cryptography to bind the identifier to a controller. The controller then defines what the identifier refers to. This was the original use case for DIDs, and the one we've thought about the most.

b. They define some other intrinsic property that is objectively observable, that derives the value of the identifier, such that it is impossible for the binding to be ambiguous. A DID that identifies each element in the periodic table by its atomic number would eliminate ambiguity without having cryptographic control, while still remaining decentralized, and while still being enough of a DID to be processed by DID handlers.

Notice that in this formulation, cryptographic control is a means to an end (eliminating ambiguity), not an end in and of itself. Notice also that cryptographic control is just a special case of the other approach (objectively observable property that makes the binding unambiguous). I think that's the crux of the difference between this worldview and the other one.
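One way to picture the "special case" claim is a single check that serves both bindings. In this sketch the identifier is derived from something observable: for option (a) that is the controller's public key, for option (b) it is the content itself. The method names `did:example-key` and `did:example-hash` and the function `is_bound` are hypothetical, and the proof-of-possession signature that a control-based method would also require is left out.

```python
import hashlib


def derive(prefix: str, observed: bytes) -> str:
    """Derive an identifier from an objectively observable byte string."""
    return prefix + hashlib.sha256(observed).hexdigest()


def is_bound(did: str, prefix: str, observed: bytes) -> bool:
    """Check that a DID is unambiguously bound to what was observed."""
    return did == derive(prefix, observed)


# Option (a): the observable property is the controller's public key.
public_key = b"placeholder public key bytes"
controlled_did = derive("did:example-key:", public_key)

# Option (b): the observable property is the identified content itself.
content = b"placeholder content bytes"
inherent_did = derive("did:example-hash:", content)

print(is_bound(controlled_did, "did:example-key:", public_key))
print(is_bound(inherent_did, "did:example-hash:", content))
```

Both checks are the same function with different inputs, which is the sense in which control-based binding can be read as one instance of the more general inherent binding.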

DIDs accomplish goal 3 through the use of the DID method extension mechanism.


Now that I've articulated an alternate worldview, here's the argument I'd offer in its favor: Although the world needs control-based binding for DIDs in the worst way, it also needs the other kind of binding (which I might call inherent binding). Both bindings are worthy of the moniker "decentralized identifier." UUIDs are not a good alternative because they lack the solution for ambiguity. URLs are not a good alternative because they lack decentralization of domain names. If we force the conception of DIDs to be narrow, we're setting ourselves up for a situation where another type of decentralized identifier comes along that has just as much claim to the word "decentralized", but that thinks about control differently. Result = muddiness and doubt about adoption. If we bring this ugly stepchild into the DID tent and let it take a bath, I suspect it will turn out to be cute and a good family member, in time. I don't think it would take much more than 2 or 3 paragraphs to talk about "uncontrolled decentralized identifiers" in the spec; they're way simpler than the controlled variant.

@dhh1128
Contributor Author

dhh1128 commented Mar 18, 2020

Tagging a few people who may have opinions about this interesting conversation: @peacekeeper @dlongley @msporny @burnburn @brentzundel @talltree . Please bring in others as appropriate.

@ewelton

ewelton commented Mar 18, 2020

@jandrieu I think I passed over #233 (comment) while I was writing my response. And to @dhh1128 's

Now that I've articulated an alternate worldview, here's the argument I'd offer in its favor: Although the world needs control-based binding for DIDs in the worst way, it also needs the other kind of binding. UUIDs are not a good alternative because they lack the solution for ambiguity. URLs are not a good alternative because they lack decentralization of domain names. If we force the conception of DIDs to be narrow, we're setting ourselves up for a situation where another type of decentralized identifier comes along that has just as much claim to the word "decentralized", but that thinks about control differently. Result = muddiness and doubt about adoption. If we bring this ugly stepchild into the DID tent and let it take a bath, I suspect it will turn out to be cute and a good family member, in time. I don't think it would take much more than 2 or 3 paragraphs to talk about "uncontrolled decentralized identifiers" in the spec; they're way simpler than the controlled variant.

I think this is exactly right. It is what I was trying to capture when I said that what we want

is a new form of identifier that can be shared by communities, and around which we can clearly say "The controller of DID XYZ says DID XYZ refers to N" and "The controller of DID ABC says DID ABC refers to N" and then let DID ABC and DID XYZ rest happy that they are talking about the same N, so that they can have fruitful discussions about attributes of N, such as "cn=King of England" vs. "cn=King of Great Britain"

In other words - there is a missing piece to the puzzle. DIDs are not necessarily up to the task, unless there is some tweaking to the spec - some core, fundamental tweaking and clarity.

So far, attempts to discuss the missing puzzle piece get blocked by discussion of controllers, subjects, and very obtuse technical issues. Those discussions have cut off the forest and the larger view has been lost. I like the idea of "bringing it into the DID tent, and giving it a bath"

@jandrieu
Contributor

DIDs don't solve #2

In fact, I don't think #2 is possible in any construction. We can only clarify the DID and when we refer to the DID we can use an unambiguous string of characters.

However, any statements can get attached to that identifier, by any author, and there is no way to know--at the DID level--which statement is "correct". Even if one of the statements is signed by the Controller, you can't be certain that it is "correct". Heck, you can't even prove the controller is the Subject.

What you are bumping up against is essentially Goedel's incompleteness theorem. You can't disambiguate everything. There will always be statements that cannot be proven, no matter how convoluted our schemes may be.

All we can do is anchor assertions by specific issuers to understand (and document) what they are willing to assert about a Subject, as identified by a DID. Statements about the same DID can be taken to be intended as statements about the same Subject, but even then the statements themselves may be wrong.

Content-based hashes of arbitrary content are NOT DIDs because they cannot be resolved directly to some form of cryptographic material. You could, of course, create an IPFS DID Document and have a DID method that uses its content-based address, but that hash is of the DID Document, not of the resource.

IMO, if we are going to get closure on this spec, we need to stop trying to add everything that seems like it might be convenient, and we need to stop trying to construct crazy edge cases--ESPECIALLY if you have no use cases for it (as you put it @dhh).

Maybe others with more experience in standards development can chime in. I know that VCs almost didn't get done because of mid-process shifts to support ZKPs. The consensus was that was a good thing. But it still risked not finishing within the required deadline. Kitchen sink engineering a solution that solves everyone's problems is, IMO, an anti-pattern in a standardization process.

We need to be here locking down the simplest feature set for maximum interoperability to do the fundamental thing that DIDs do: enable cryptographically robust management of identifiers without reliance on central registry entities to keep track of who controls what. EVERYTHING else is superfluous and deserves a critical evaluation about whether or not we can remove it and still achieve the fundamental requirement of this work. EVERY add-on is another lengthy drawn out debate, additional implementation complexity, and yet another point of confusion for anyone who wants to adopt the tech. So, let's stop with the add-ons and start focusing on what we can do to minimize the complexity rather than exploring how we can extend DIDs to do extra magic. If DIDs can do that magic, it is perfectly fine to add that at another layer or in the next iteration of the spec.

@ewelton

ewelton commented Mar 18, 2020

@jandrieu would you then be backing this

1 - DIDs can not be used to identify digital content in a shared namespace
2 - but allow #199 (comment)
3 - clarify that DIDs can not refer to things, only a specific actor/agent's name for a thing
4 - summarize the structure as #199 (comment)

does that seem right?

@jandrieu
Contributor

Um... no.

DIDs can identify ANYTHING. I've said this before, so I'm surprised you'd suggest I'd back that set of statements.

@jandrieu
Contributor

My particular point here is that there are mathematical guarantees we can affirm with DIDs. That's what the cryptography gives us. Anything more than that which we can mathematically guarantee should be achieved at another level.

@ewelton

ewelton commented Mar 18, 2020

@jandrieu ok, so it seems like we're stuck.

It may not be possible to discuss DIDs.

Either a DID subject refers to ANYTHING, and is NOT a name for a thing scoped by a controller, or it is a name scoped by a controller. But I have a feeling that if I say that it represents ANYTHING then you will say that it is scoped by a controller. I am getting dizzy.

If I was trying to describe DIDs to clients and customers (which I have stopped doing by the way) I need to be able to say something - if I say that "the subject is the King of England" to them without clarifying that there is a controller involved, they get the wrong idea. So I try to say "the subject is scoped by the controller" and then you say "no, I am surprised you said that" - I really am totally at a loss.

A DID subject is both scoped by a controller and not scoped by a controller and it is sometimes anything and sometimes restricted. I just don't get which set of constraints is in play - other than just not what anyone else is saying.

@dhh1128
Contributor Author

dhh1128 commented Mar 19, 2020

DIDs don't solve #2...any statements can get attached to that identifier, by any author, and there is no way to know--at the DID level--which statement is "correct"... Even if one of the statements is signed by the Controller, you can't be certain that it is "correct". What you are bumping up against is essentially Goedel's incompleteness theorem. You can't disambiguate everything. There will always be statements that cannot be proven, no matter how convoluted our schemes may be.

Perhaps you read #2 a bit too fast?

I'm not interested in proving the correctness of arbitrary statements about an identifier. I agree that anybody can claim any attributes they want about anything, and that it's not useful/desirable for DIDs to facilitate that. In fact, the example scheme I proposed explicitly precludes the association of any statements with the identifier other than existence/scope of reference (the subject). I'm saying that it's a defining characteristic of DIDs that they prove the correctness of exactly one type of statement, which is an assertion about scope of reference -- and I'm claiming that is a generalization of the variant you like, which is scope as proved by cryptographic evidence. Control is only interesting as a mechanism of achieving the real goal, which is knowing with confidence what you're talking about. Your own verbiage "Even if one of the statements is signed by the Controller" presupposes that it's possible to ascertain truth about this subtopic; signing is just the mechanism for proving that the scope of reference is what the Controller, not some other entity, asserts. I think this is exactly what you meant when you said the DID subject can't be the moon, but can be what the controller thinks of as the moon.

While it is true that eliminating all ambiguity is impossible, and on a philosophical level, we can't even prove that we exist rather than being figments of one another's imaginations, I am very surprised to hear anybody claim that DIDs don't provide practical clarity about what the referent is. Elsewhere you have claimed that the referent is whatever the controller wants it to be. That's an unambiguous binding. Yes, it can change. Yes, the controller can do a lousy or inconsistent job of definition. But the fact remains that whatever scope of reference is embodied in the controller's choices constitutes exactly and uncontroversially the referent for a DID at a point in time, if the binding is based on cryptographic control.

Maybe others with more experience in standards development can chime in. I know that VCs almost didn't get done because of mid-process shifts to support ZKPs

I agree that bringing this up and tackling it is a tradeoff. Eric is not alone in believing that if we don't broaden our conception, important use cases are lost. But that could be the right answer, and I would accept it if it's the will of the community (even though I continue to disagree with your other argument). So I, too, am curious to hear how other people would weigh it.

@jandrieu
Contributor

@ewelton I don't think we are stuck. We are just dealing with the fundamentals of what is knowable and what is provable. As such, we bump into issues of epistemology and Goedel's incompleteness theorem. There are bounds on what we can know and bounds on what we can prove. Any technology that purports to exceed those bounds should be considered with the same skepticism as claims of a perpetual motion machine.

That said, it is a different issue how we talk to regular folks. In the same way that it is hard to explain why perpetual motion machines will never work, it will be hard to explain the boundaries of what is knowable and provable.

@ewelton

ewelton commented Mar 19, 2020

@jandrieu I understand how you frame it and why you say what you are saying. But there are practical solutions to the problem @dhh1128 raised. More importantly, we just need to pick one and move forward.

What you are saying is true, but I feel you are simply missing the point of what we are saying, and are convinced that this is because we fail to appreciate your point.

The subject of a DID has no semantics - and, importantly, if the hash is cryptographically bound to the genesis key pair, then it CAN NOT serve the role of identifying digital content in a self-certifying manner. Instead, it can only be the "name" of a record that contains the target identifier.

What we are exploring is a way to augment that environment - to make self-certifying content identifiers first-class citizens. This exploration is not about mathematical provability or Cantor's Paradise.

In terms of did methods - we are starting to see 'strange methods' like did:key - which, one might argue, have a different relationship with 'controllerhood' than do blockchain-resident did methods with long running did-documents that can evolve over time and can engage in complex expressions of verification methods and service_endpoints.

The option on the table is to recognize some of those differences - and instead of raging against them, decide if that variation can be co-opted and exploited.

In a sense it does not matter which is chosen - as long as it is chosen soon, and precisely. There is a strong argument for disallowing this sort of "content-hash" immutable element - like did:immutable:<hash> - and there are arguments for it. It is not the case that it is fundamentally impossible due to the Principle of Least Action driving the inherent increase in Entropy we commonly experience as the Arrow of Time - it is a pragmatic decision for the spec.

@pknowl

pknowl commented Mar 20, 2020

An object in a decentralized network needs an identifier. The DID name itself "Decentralized Identifier" suggests that there should be room to include a solution in the DID spec.

@pknowl

pknowl commented Mar 20, 2020

... and, for the record, semantic objects should never be governed in a decentralized network. That is why schema.org, etc. are open-access and free of governance. If semantics are governed they simply won't be adopted.

@jandrieu
Contributor

I may be missing the point. I certainly don't understand what @dhh is trying to get at with disambiguating. But I also don't understand your previous comments. We can talk about DIDs and your suggestion that I would support those three items you listed made it seem like you didn't understand my point. If you do, great.

I really don't understand how #2 is accomplished, in any identification architecture.

  2. Eliminate ambiguity: make the referent of the identifier completely uncontroversial.

@dhh later expands that to

I'm saying that it's a defining characteristic of DIDs that they prove the correctness of exactly one type of statement, which is an assertion about scope of reference -- and I'm claiming that is a generalization of the variant you like, which is scope as proved by cryptographic evidence.

I'm still not following. The referent is not scoped by the DID. Rather, a link to a certain set of cryptographic material is provided by a DID Method after resolution.

That's it. That's what DIDs do. Resolve a DID and you'll get some cryptographic material that can be used to interact securely with "The Subject" whatever/whoever that is. Maybe it is the controller. Maybe it is not. It isn't well scoped at all. It can even change over time. It is completely ambiguous what it refers to.

The only DID that resolves ambiguity is this hypothetical did:immutable. Which doesn't seem like a DID at all to me. So, yes, you can change the definition of DIDs to add something like did:immutable. But you can't say DIDs have a primary function of removing ambiguity--and then use that to justify an argument FOR did:immutable--because no other DIDs do that.

Don't get me wrong: immutable ids are cool. iid:[hashtype][hash] seems like a reasonable thing to standardize. github.com/w3c-ccg/multihash seems like it's half-way there.

I just don't think that's a DID in any sense that this community has been working on.

Maybe I am missing something. In any case, I'm definitely not following the logic on how did:immutable and its kin are anything like other DIDs.

Also... I'm not raging. I'm just disagreeing. DIDs are a thing. They aren't everything. They don't solve all the identifier problems. They are not the right identifier for every kind of thing that might need an identifier. They are a particular type of identifier that might be useful for certain things. Their key distinction is the ability to find the current authoritative cryptographic material for interacting with the Subject of the DID.

Before DIDs, there was not a particularly good way to find such material, not in any definitive way, without reliance on a third party. PGP's web of trust was the best prior art in this area. DIDs are a huge advancement in the usability of cryptography for a large number of use cases. It would be great if we could just focus on getting this fundamental innovation in the books, so we can turn our attention to building the amazing services on top of DIDs that so many of us are excited about.

@pknowl

pknowl commented Mar 20, 2020

The name DID should really be DEI (Decentralized Entity Identifier). DID suggests that you can identify anything in a decentralized network. If an object identifier cannot be accommodated, the name DID is misleading which is a shame. We would also have to build out an entirely new standard for a DOI (Decentralized Object Identifier) which of course can be done.

In an ideal world of DIDs for everything in a decentralized network, you would have did:e:<hash> for Entity and did:o:<hash> for Object.

Which way are we going to go?

@dhh1128
Contributor Author

dhh1128 commented Mar 20, 2020

@jandrieu : I think we are talking past each other because we are talking about different manifestations of ambiguity -- and it might be because of my own clumsy language. If so, I apologize. Let me try again. And let me step away from DIDs for a minute; maybe a different context will help.

Suppose, one day, that Alice invents a brand new word: "habapookajar." She's at a party, and she applies it as an adjective to a person wearing expensive Italian clothes. Those who overhear her are pretty sure it means something sort of like "sophisticated" -- but they're not quite sure. Her meaning is ambiguous. Even if they ask Alice what she means, there's no guarantee she'll tell them the truth, or be able to give them a definition that perfectly embodies her intentions.

This is ambiguity, and I believe we're in alignment in suggesting that it's fundamentally unresolvable. Let's call that "type 1" ambiguity for a moment.

But at least we know who's the definitive authority on the meaning: Alice. Whatever she says it means, we have to accept. There's no ambiguity about that, right?

Or is there? Suppose there's another party a week later, and Bob is overheard using this word. Someone asks him if he got it from Alice, and he says "No, I invented it. Who's Alice?"

Although all ambiguity has things in common, this new ambiguity feels like it's worth putting into a second bucket. Let's call it "type 2" ambiguity. This is not ambiguity about what the word means; it's ambiguity about how to approach learning the word's meaning; we don't even know where to start.

No identity systems can resolve type 1 ambiguity.

A centralized system resolves type 2 ambiguity because the system is the acknowledged authority on the question of what the identifier refers to. That doesn't make the identifier's meaning perfectly clear (nothing can) -- but it removes any ambiguity about how to learn more. But type 2 ambiguity has always been a big problem in decentralized systems, because there is no such authority.

Part of the genius of DIDs is that they solve this problem. That's a hugely valuable innovation. We've explained that innovation in terms of cryptographic control, and if we choose to, we can continue to explain it that way. We can say that the problem is proving control, and the solution is cryptography.

But what I'm suggesting is that we can define the problem in a slightly more general way, and that this might have nice consequences. It would be a tradeoff, as you say.

Old problem statement: How do I prove control of the identifier?
Old answer: With cryptography.

New problem statement: How do I eliminate type 2 ambiguity?
New answer: So far we've imagined two ways. One is to prove control with cryptography. Another way is to derive an identifier from objectively observable properties that remove all ambiguity. Maybe we'll realize there are other ways, too.
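To make the second approach concrete, here is a minimal sketch, assuming SHA-256 is the agreed-upon observable property and borrowing the hypothetical did:immutable method name floated earlier in this thread. The helper name content_did is made up for illustration; none of this is defined by the DID spec.

```python
import hashlib

# Hypothetical method name taken from this thread; not a registered DID method.
METHOD = "immutable"


def content_did(sample: bytes) -> str:
    """Derive an identifier purely from the bytes being identified."""
    digest = hashlib.sha256(sample).hexdigest()
    return f"did:{METHOD}:{digest}"


# Two researchers who never coordinate hash the same sample independently.
alice_id = content_did(b"suspicious payload")
bob_id = content_did(b"suspicious payload")
other_id = content_did(b"a different payload")

print(alice_id == bob_id)    # same bytes lead to the same identifier
print(alice_id == other_id)  # different bytes lead to a different identifier
```

Neither party "created" the identifier here; both discovered it, which is the sense in which type 2 ambiguity goes away without any controller.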

I admit that this new formulation is a departure from the official party line. The arguments in favor of it that I'd offer are:

  1. It explains cryptographic control's desirability from first principles, not as an end unto itself. That feels deeply true/correct to me.

  2. It is open-ended, but not infinitely broad. It claims for DIDs all conceptual identifier territory that intends to be decentralized but not type 2 ambiguous. UUIDs and numerous naming schemes fall outside the scope for clear reasons, but other mechanisms could be discovered that have the defining properties. Maybe we'll learn something. It would be nice not to have to start a new standard when the one we've already built anticipates such possibilities.

  3. It allows me to leverage the hard work that's been done on DIDs to solve a whole new set of problems that are currently ruled out by the insistence that DIDs must be based on control of the identifier. This set of problems has been simmering in the background of DIDs for several years now, with people never quite able to explain why they felt misaligned. It's now late in the process, but I finally feel some clarity about why the disconnect and what a solution might be. The identifier variant that derives from objective properties is a type of identifier that's "discovered" rather than "created", and I intuit (but cannot prove) that we may come to love that type of identifier and want it under the DID umbrella.

I don't think these three arguments are a slam dunk in favor of what I'm proposing. But I'm hoping that at least my worldview and my comments about ambiguity make better sense?

@pknowl

pknowl commented Mar 20, 2020

Spelt out, the two options are doi:<hash> or did:o:<hash> for an object identifier. That should probably be put to a community vote.

@pknowl

pknowl commented Mar 26, 2020

Thanks, @ewelton . That is a sound argument but, going back to my original argument of DIDs for everything in a decentralised network which allows us to move into a synergistic future with better naming conventions and smarter identifiers, I'm keen to keep investigating.

@kdenhartog - Are you able to answer Eric's first question ...

1.) How is control surrender enforced?

@mitfik - Are you able to answer Eric's second question ...

2.) How does resolution work (e.g. what is the relationship w/ an underlying registry)?

Let's hammer that out before coming back to method naming.

Just so I don't have to scroll back later on, can someone also give me a definitive answer on whether a DID method type should depict a function or a target? Thanks.

@jandrieu
Contributor

jandrieu commented Mar 26, 2020

We've said a lot of words here. I have tried to keep this brief (and failed). HOWEVER, I am responding with a different illustration of what I see as the defining mismatch between content-based identifiers and DIDs.

This thread has shifted my sense of how we communicate what a DID is. Regardless of whether we adopt this new kind of DID as something we, as a standards effort, want to incorporate, we should definitely update the language in the spec so the mismatch can be minimized for future readers. People have a hard time understanding how DIDs do what they do, which is vital to understand if they are appropriate for a given reader's needs. However the technical questions resolve, we definitely have a documentation problem.

Here's what clicked for me as I was trying to understand how we are talking past each other.

DIDs are a framework for cryptographically proving control over an identifier without relying on a trusted third party.

This is what's new. This is what's different.

This proposal to "nuance" our mental model abandons that and would create a new class of DID which is essentially uninteroperable with other DIDs. I'll call these CIDs for content identifiers, which have all the characteristics described by others. As I've stated many times, they sound awesome. They will be useful. It makes sense to standardize a way to use them.

Consider the use cases document:
https://w3c.github.io/did-use-cases/

First, two of the first four essential characteristics of DIDs are not met by CIDs:

  3. cryptographically verifiable: it should be possible to prove control of the identifier cryptographically;
  4. resolvable: it should be possible to discover metadata about the identifier.

#3 is not met because the hash provides NO way to demonstrate control. It only demonstrates knowledge of the associated content.

#4 is not met because there is no derivable meta-data about the identifier. A CID has no mechanism to lead you to additional details that would allow the core functionality that define DIDs. In particular, there is no way to bootstrap a control framework just from a hash.

Maybe I'm missing something on #4, but to my understanding, revealed knowledge cannot establish control in the way that secret knowledge can. If you must reveal the knowledge to satisfy the cryptography, as you do with hashes, you cannot prove anything cryptographically without ceding equivalent control to the recipient of the proof. It leaks control and therefore isn't suitable as a control framework.
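A small sketch of that leak, assuming the only "proof" available for a hash-based identifier is revealing the preimage. The names identifier_for, prove_knowledge, and verify_knowledge are hypothetical and not drawn from any spec.

```python
import hashlib


def identifier_for(content: bytes) -> str:
    """Content-based identifier: just the SHA-256 hex digest."""
    return hashlib.sha256(content).hexdigest()


def prove_knowledge(content: bytes) -> bytes:
    # With a bare hash, the only available "proof" is the content itself.
    return content


def verify_knowledge(identifier: str, proof: bytes) -> bool:
    return identifier_for(proof) == identifier


content = b"the original sample"
cid = identifier_for(content)

# The original party proves knowledge to a verifier...
proof_seen_by_verifier = prove_knowledge(content)
print(verify_knowledge(cid, proof_seen_by_verifier))

# ...and the verifier can now replay the exact same proof to anyone else.
replayed = proof_seen_by_verifier
print(verify_knowledge(cid, replayed))
```

After one round, the verifier is indistinguishable from the original party. With a private key, by contrast, the signer proves possession without handing the secret over.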

Second, of the 13 actions enabled by DIDs, only the first two are supported by CIDs:

3.1 Create
3.2 Present
3.3 Authenticate
3.4 Sign
3.5 Resolve
3.6 Dereference
3.7 Verify Signature
3.8 Rotate
3.9 Modify Service Endpoint
3.10 Forward / Migrate
3.11 Recover
3.12 Audit
3.13 Deactivate

CIDs can't be used to Authenticate, Sign, Resolve, Dereference, Verify Signature, Rotate, Modify Service Endpoint, Forward / Migrate, Recover, Audit, or Deactivate.

Third, the reason DIDs are useful in decentralized identity is precisely because of the ability to demonstrate control. Not because they identify only a particular class of thing or because they can disambiguate anything.

(FWIW, even @dhh's second definition of disambiguate wrt Alice's definition is unknowable and unprovable. Because people other than Alice can use the DID as a subject without getting confirmation from Alice that they are using it in the way that she means it. And even if they did, there is still the risk of semantic drift as Alice's sense of what she means evolves over time.)

The way DIDs bootstrap digital identity, in the most typical use case where Subject==Holder==Controller (whether or not the issuer is identified by DID) is as follows:

Two stages.

First, you get the credential.
Stage I

  1. You onboard at an issuer--they first prove who you are to their satisfaction.
  2. You prove control over a given DID (often called DID-AUTH) using the secret associated with the cryptographic material specified in the DID Document
  3. The issuer generates a VC with that DID as the subject and gives it to you, signing it in a provable manner.

Second, you use the credential.
Stage II

  1. A Verifier presents a challenge in a request for a credential
  2. You construct a Verifiable Presentation which includes both the challenge and the VC, signed by the same secret material used to prove control in Stage I.2
  3. The Verifier checks that the holder and subject are identified by the same DID.
  4. The Verifier checks that the presentation (with the challenge) is signed with secret material indicated by implication in the DID document. Most commonly, the VP is proven to be signed by a private key that matches a public key in the DID Document.
  5. The Verifier checks that the signature of the credential matches the known cryptographic material from the issuer (this can be from a DID Document or from any other pre-arranged mechanism to exchange keys or the like).

At this point, the Verifier knows that the current presenter of the VC has proven control over the same secret information as the subject, and therefore, with a specific level of assurance they can accept that the current presenter is one of the following:

  1. the Subject
  2. a delegate of the subject with cryptographic authorization (someone who has control over a proof mechanism listed in the authentication section of the DID Document or who simply has been given the private keys of the Subject for this purpose)
  3. a bad actor who has compromised the keys (or proof mechanism) of the Subject

We always have to allow for #3. That's the weakness in the system. However, the entirety of modern cryptography has this weakness, which is why keys MUST be kept secret if they are to have any use whatsoever.

It is the ability to perform this proof of control that ties the issuance of a VC to its presentation so that a Verifier can have some proof that the party presenting the credential is, in fact, the entity given that credential, which to the best knowledge of the issuer was believed to be the subject of that credential.
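The Stage II checks above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Credential and Presentation shapes, the "|"-joined payloads, and the verify callable are all hypothetical, and actual signature verification is deliberately left to the caller because it depends on the DID method and key type. The challenge comparison is implied by Stage II.1 rather than listed as its own step.

```python
from dataclasses import dataclass
from typing import Callable

# verify(signer, payload, signature) -> bool. Hypothetical interface: a real
# implementation would resolve the signer's DID Document (or use the issuer's
# pre-arranged keys) and run an actual public-key signature check.
SignatureCheck = Callable[[str, bytes, bytes], bool]


@dataclass
class Credential:
    issuer: str
    subject: str      # DID of the subject
    claims: bytes
    signature: bytes  # made by the issuer in Stage I.3


@dataclass
class Presentation:
    holder: str       # DID of the presenter
    challenge: bytes
    credential: Credential
    signature: bytes  # made by the holder in Stage II.2


def credential_payload(vc: Credential) -> bytes:
    # Illustrative serialization only; real VCs use a defined data model.
    return b"|".join([vc.issuer.encode(), vc.subject.encode(), vc.claims])


def presentation_payload(vp: Presentation) -> bytes:
    vc = vp.credential
    return b"|".join([vp.challenge, credential_payload(vc), vc.signature])


def verifier_accepts(vp: Presentation, issued_challenge: bytes,
                     verify: SignatureCheck) -> bool:
    if vp.challenge != issued_challenge:
        return False  # the presentation must answer this Verifier's challenge
    if vp.holder != vp.credential.subject:
        return False  # Stage II.3: holder and subject share one DID
    if not verify(vp.holder, presentation_payload(vp), vp.signature):
        return False  # Stage II.4: presentation signed with the holder's key
    # Stage II.5: credential signed with the issuer's key
    return verify(vp.credential.issuer,
                  credential_payload(vp.credential),
                  vp.credential.signature)
```

Note that every branch depends on someone holding a secret that matches published key material, which is exactly the piece a bare content hash cannot supply.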

You could, of course, use a third party to demonstrate proof of control. You just ask Facebook who they believe is the current presenter. They'll use their own authentication approach then present their result. The whole point of DIDs is to enable this sort of bootstrapping of verifiability WITHOUT relying on the likes of Facebook. That's what makes DIDs unique and valuable.

CIDs can't be used in this fashion. As such, they just don't do--CAN'T DO--the fundamental thing that DIDs were created to do.

Yes, we can attempt to interpret the "decentralized" part of the DID name in the hope of supporting all the kinds of identifiers that can be rigorously created without a trusted third party, but, when we can't even agree on the meaning of the word "decentralized", that seems like a particular kind of madness. No offense to @dhh1128 @pknowl @ewelton or any other proponents of this idea. It's just that shoehorning an incompatible, non-interoperable notion of DIDs because of lexical similarity with an ill-defined term just doesn't stack up for me.

That said, I do like CIDs. They have been implemented as URNs in several forms from urn:hash to urn:sha. The particular variation proposed here might deserve its own namespace, such as urn:cid or perhaps if it builds on multihash, urn:multihash.

However, since

  • you can't use CIDs to perform proof of control to bootstrap decentralized identity in the way described above
  • CIDs lack 2 out of 4 "essential" characteristics of a DID
  • you can't use CIDs to perform 11 of 13 actions of DIDs as captured in the Use Cases and Requirements document

I can't help but come to the conclusion that CIDs are not DIDs.

If it doesn't look like a duck and doesn't quack like a duck, it's probably not a duck.

It might be a bird. It might taste delightful when prepared in the Peking style, but it still probably isn't a duck.

@ChristopherA
Contributor

ChristopherA commented Mar 26, 2020

The did:o: identifiers would not sit in any identity registries.

There is some precedent for this. The DNS RFCs specifically exclude the .onion root domain (and a few others) from fully complying with the DNS standard. See specifically https://tools.ietf.org/html/rfc7686

-- Christopher Allen

@jandrieu
Contributor

I'm sorry, is the proposal here to have a did:o namespace that then has multiple methods underneath it?

For example

did:o:sha:123...
did:o:multihash:abc...
did:o:myHash:xyz

Is that what you're suggesting @pknowl?
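For reference, a rough sketch of how the generic DID syntax would split those examples, assuming the did:<method-name>:<method-specific-id> grammar in which the method-specific id may itself contain colons. The split_did helper is hypothetical and does no character-level validation.

```python
def split_did(did: str) -> tuple[str, str]:
    """Split a DID into (method name, method-specific id)."""
    # Split on the first two colons only; later colons stay in the id.
    scheme, method, specific_id = did.split(":", 2)
    if scheme != "did" or not method or not specific_id:
        raise ValueError(f"not a DID: {did!r}")
    return method, specific_id


for example in ("did:o:sha:123", "did:o:multihash:abc", "did:o:myHash:xyz"):
    print(split_did(example))
```

Under that reading the method is always o, and sha, multihash, or myHash would be a convention internal to the method-specific id rather than separate methods.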

@ewelton

ewelton commented Mar 26, 2020

@jandrieu I want to clarify - I am not a proponent of adding content based identifiers into the current model of DIDs. This is because of the two reasons I enumerated - lack of solution to resolution, and no way to fully "surrender control" - and "reproducing" simple urns but calling them DIDs is silly - and besides did:o:sha:123 doesn't assist resolution at all, because it is missing location information.

One of the mistakes made in the DID model is the strange handling of resolution - DIDs contain some location information but rely on a bunch of secret hidden magic to make them resolvable. Resolution is critical, and leaving it out of scope is just part of what I consider "a long series of mistakes" beginning around mid 2019.

Current DIDs have become defined the way you define them as the result of evolution of the community. DIDs were more open to flexibility and interpretation in the past. Alternative approaches to DIDs lost out in the sea of privacy, control, and decentralization voices - and that is fine. The rubric idea became myopically focused on decentralization, so we lost most of the structure for navigating the alternatives. The use cases became focused on what I consider a niche world. The collapse of semantic flexibility meant we got onto the road of "the one true DID"

So, to be clear - I believe that there are legitimate use cases for these sorts of "non-controlled" and "verifiable" content-based identifiers. And I believe that 1 year ago would have been a great time to sweep them into DIDland so that we could build them into the resolution infrastructure. And I believe that the flexible semantics we had 1 year ago gave a very clean path to model this larger landscape inclusively and to the benefit of the global community.

However, as of today, DIDs are more focused - they are a much more specific thing, and that means that a spec will be produced and we'll get some nifty tools out. It also means that I think that getting these sorts of capabilities into the DID landscape, for the goals @pknowl identifies, might not be viable today - the window has closed and it is time to work with the DIDs we have, not the DIDs we want. Maybe there is a way to shoehorn them into the authoritative model of DIDness, but it will take a cleverer person than me to do it.

Don't get me wrong - there has been a lot of great work and thought behind DIDs-of-today - but DIDs are neither revealed truth nor natural law, they are the result of a negotiated specification that reflects the loudest and most energetic voices. Since those have focused on privacy paternalism, control, anti-correlation, and a particular interpretation of decentralization - that is what we have. I am excited to see a lot of the work that is going on, but these DIDs are just not that relevant to my use cases - there are alternatives which I can use today to deliver "improved-sovereignty" and "improved government and business processes" through the use of non-DID grounded credentials and capabilities. When DIDs are mature and in broad adoption, it will be easy to incorporate them into my world and further improve sovereignty - and I am looking forward to that.

What makes DIDs strong for some people makes them weak for others - and that is normal. What is most important is that the spec stabilizes and is released. There is always room for adaptation in the next round of specs, and via alternative specs - so I support this effort to the extent that it does not derail or retard the delivery of a clear specification - whatever it winds up saying.

@pknowl

pknowl commented Mar 26, 2020

Many thanks for pointing me to that link, @ChristopherA . Very much appreciated.

@jandrieu - For our purposes, we're not interested in location, we just need to know that the content is immutable. Perhaps resolution characteristics and MIME-type would be held in the associated DID document. I would expect the did:o: namespace to be very simple ...

did:o:<hashofcontent>

For example, if a non-governed object were moved from Drive A to Drive B, the identifier should remain the same even though the location has changed.

@mitfik will certainly have some deeper insight into requirements and resolution.

@pknowl

pknowl commented Mar 26, 2020

@ewelton - I'm also acutely aware that if we get the naming convention right at this stage for non-governed objects, the Semantics side of the model would remain stable despite the release of future versions of the DID specification. This is just as much about sustainability to the network going forward as it is to non-governed objects requiring a stable identifier under the DID umbrella.

@ChristopherA
Contributor

Actually, the precedent of allowing for some “special purpose domains” that do not need to fully adhere to the DNS RFCs is described more fully in Section 3 of RFC 6761.

https://tools.ietf.org/html/rfc6761#section-3

The .onion domain RFC https://tools.ietf.org/html/rfc7686 describes more why this top level domain meets the criteria.

I’d like to suggest that we support a similar carve out (like in RFC 6761) for how to register a “special purpose method”, but specifically do not add to our agenda to tackle specifying the nature of any such method.

This allows the did:o, etc. people to proceed with their ideas, and allows others who do not meet the full criteria of the 1.0 standard to still be able to experiment.

We could begin with registering those methods that don’t support full CRUD by marking them as “special purpose method” in the registry, and the method only has to show why it qualifies as such a method.

— Christopher Allen

@ewelton

ewelton commented Mar 26, 2020

@ChristopherA That does seem like a particularly useful way of sorting out some of the "stranger" methods, and perhaps keeping the door open a crack for at least playing around with novel ideas. If some of those ideas catch hold, they could make it into a future version of the spec itself - but they do not have to challenge the progress achieved by focusing DIDs, and they do not need to distract by requiring additions to the use cases.

+1 !

@peacekeeper
Contributor

@kdenhartog

For example, I believe that did:atom:carbon is valid today [..]

I agree with your comment.

Just wanted to point out that there's an interesting difference between did:atom:carbon and did:atom:6. In the second example ("6"), the identifier is an "intrinsic property that is objectively observable" (quoting @dhh1128 here), whereas in the first example ("carbon"), that is not the case.

@dhh1128
Copy link
Contributor Author

dhh1128 commented Mar 28, 2020

I've gone quiet on this long thread that I started, but I wanted to say thank you to all the smart people who chimed in.

Re. the final pair of comments from @kdenhartog and @peacekeeper : yes to the distinction Markus was trying to highlight. When you have a property that is objectively observable as the basis of an identifier, and everybody knows what property to look for, then you have the interesting phenomenon that multiple observers will automatically be led to agree on the identifier for the object -- even for new objects not yet discovered. This has some very desirable benefits in a decentralized ecosystem. Perhaps Joe is right that this doesn't belong inside the DID umbrella; I'm content to let consensus rule, but just wanted to make the strongest case I could for it.

As the original opener of the issue, I am happy enough with the ensuing discussion to let it be closed now. But we can also keep it open longer if procedure or the preferences of others pushes us that way.

@peacekeeper
Contributor

I think for those who would like to update the mental model in ways that have been discussed in this thread, a concrete next step would be to:

  • Propose that the "create" operation be made optional, just like a while ago we made "update" and "deactivate" optional, OR:
  • Demonstrate in some draft version of a DID method spec how the "create" operation would be defined.

@talltree
Contributor

@peacekeeper A DID using this method-to-be-named would still have a definition of the Create operation, no? It's just that the Create operation in the DID method spec would describe the special way in which DIDs using this method are created.

RE naming, I thought the original proposal was for DIDs using this method to use the multihash format. If so, why not just call it did:multihash:?

@pknowl

pknowl commented Mar 29, 2020

@talltree I'm keen to name this method type did:o:, a name that can be cast in stone unhindered by future revisions to DID specifications and methodology. An "object is an object", so why not be bold from the outset?

The other argument for sticking with the "O" method type is that there will be a huge number of these identifiers woven into the fabric of the decentralized network. 50% of all identifiers (i.e. anything non-governed within the data capture side of the model) will contain this method type. To help people digest, adopt and ultimately scale this new identifier type, users could simply refer to them as "DID-Os".

@peacekeeper
Contributor

+1 to did:multihash over both did:immutable and did:o. The method name should be a hint to how the DIDs are created and resolved, rather than indicating what is being identified.

I think this is another interesting aspect in this thread. Almost all DID methods I am aware of don't restrict what is being identified. This one seems to have such a restriction, i.e. it can only identify what can be hashed.

@pknowl

pknowl commented Mar 29, 2020

@peacekeeper I suppose the method name should reflect how the community sees the DID space evolving. I, for one, hope that the argument for the development of did:e: (entity identifiers) and did:o: (object identifiers) will be supported by the DIDWG in the future. I'm not saying we need to get there tomorrow but, now that a light has been shone, it will be difficult to ignore.

We have a rare opportunity to name the object identifier correctly right off the bat whilst hinting at an elegant DID syntax evolution for the future. Why wait for governed identifiers to align to the methodology? If the identifier name is set to did:multihash:, it will inevitably have to be renamed to did:o: in the future.

If I'm missing something and did:multihash: will simply be easier to get over the line for DID v1.0 then I'll concede for the greater good but that shouldn't stop the DIDWG from investigating did:e:/did:o: further upstream in a bid to resolve the potential method-type scaling issue highlighted in this thread.

@pknowl

pknowl commented Mar 30, 2020

@mitfik has just messaged me saying that he has a feeling that a non-governed object identifier may need to contain more than just a simple 'multihash'. On that note, I propose that the community hold off on a casting vote until the tech guys have had a chance to further investigate what identifier characteristics should be included.

@ewelton

ewelton commented Mar 30, 2020

@peacekeeper

I think this is another interesting aspect in this thread. Almost all DID methods I am aware of don't restrict what is being identified. This one seems to have such a restriction, i.e. it can only identify what can be hashed.

This is critical as I see it, because it is the presence of a controller that defines the semantic space within which the identified exists. I see that as a key strength of controlled DIDs. When you and I talk about the same thing using different DIDs, the only way we can coordinate is by presenting evidence from attached and found information - external claims, credentials, and the like which are linked to the controlled document. That is very valuable, however....

The reason these were of interest was that, as with urn:multihash:1234, there is a restriction on what is identified - namely, that which can be hashed. It is this property that gives them nearly zero semantic ambiguity - on the order of 1 in 2^80 or better, tunable by the choice of hash, of course. This means that we can talk about the same thing, using an identifier, without pinning it on a negotiation.

This is useful, for example, when pointing to a credential schema or context or other primitive from which one scaffolds deterministic processing in a decentralized data economy - it provides an "open authority" without simply using DIDs to create "a new root of central authority." I find the concept of a Bitcoin Anchored Semantic every bit as Centrally Controlled as schema.org.

Hashlinks give us a lot of the power needed - and in particular they give us the thing that is missing from simply using did:whatnot:<hash> - namely, hints about location and thus a pathway to resolution. What nothing gives us yet is a specification about what sort of descriptor could come back, and that definitely has value - giving programmers a coordination point that was not bound to specific implementations, but bound to the concept of uncontrolled, self-certifying identifiers.

I also remain concerned about the maintenance of hidden control - the 'create' method would effectively be a 'register' method - but register it in what infrastructure? - which gets, again, to resolution. And it is the infrastructure of the registry which defines the possibility of true "surrender of control" vs. "good samaritan waiving" - I think it makes sense to wait to name this concept until those elements are clear:

  • how does create/register work
  • how does read work
  • how is control surrender enforced

If we cannot do these, then we have defined something equivalent to regular DIDs with a claim "this DID that I control is about urn:multihash:1234" - and those DIDs are fine, but they cannot be the foundation for scaffolding semantic processing on a decentralized data economy - for that we need a decentralized identifier with broader capabilities than DIDs.
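
One possible shape for the three open questions above, offered only as a sketch under stated assumptions: the method name `example`, the hex SHA-256 encoding, and the bare `{"id": ...}` descriptor are all hypothetical, since the thread has not settled what a resolver should return. Here create and read are pure functions with no registry behind them.

```python
import hashlib

METHOD = "example"  # hypothetical method name, not a registered DID method
HEX_DIGITS = set("0123456789abcdef")


def create(content: bytes) -> str:
    """'Create' is just hashing; nothing is written to any registry."""
    return f"did:{METHOD}:{hashlib.sha256(content).hexdigest()}"


def read(did: str) -> dict:
    """Generate a minimal descriptor from the identifier itself.

    The descriptor shape is an assumption. It lists no keys and no
    controller, so there is nothing an update or deactivate could act on.
    """
    parts = did.split(":")
    if len(parts) != 3 or parts[0] != "did" or parts[1] != METHOD:
        raise ValueError(f"not a did:{METHOD} identifier: {did!r}")
    digest = parts[2]
    if len(digest) != 64 or not set(digest) <= HEX_DIGITS:
        raise ValueError(f"malformed digest in {did!r}")
    return {"id": did}


def matches(did: str, content: bytes) -> bool:
    """Anyone holding the content can check it against the identifier."""
    return create(content) == did
```

In this sketch control surrender is not enforced by policy; it follows from there being no stored state to control. It does not address the location-hint problem that hashlinks solve.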

@kdenhartog
Member

I think for those who would like to update the mental model in ways that have been discussed in this thread, a concrete next step would be to:

* Propose that the "create" operation be made optional, just like a while ago we made "update" and "deactivate" optional, OR:

* Demonstrate in some draft version of a DID method spec how the "create" operation would be defined.

I'd say there are probably a few things we could take from this thread as well to make as additions to the DID Core spec. Some of the arguments against this method have pointed to a few things that are left as tribal knowledge that I'm wondering if we could get normative, testable statements for.

For example, one of @jandrieu's points struck me as pretty strong. On creation of a DID it SHOULD (could be upgraded to MUST) be possible to prove limited control of the identifier via a cryptographic mechanism.

Another one I've been toying around with is the idea of a minimum number of possible namespace entries. E.g. the method specific identifier must be able to identify at least 2^80 unique identifiers. I'm not sure this really adds much enforcement to the idea of the identifier not needing an authority to authorize access to the namespace.
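
As a rough illustration of how a "at least 2^80 identifiers" rule could become a testable statement, the sketch below computes the shortest method-specific identifier length that can encode a given number of bits in a given alphabet. The function name and the idea of testing by length are assumptions for illustration, not proposed spec text.

```python
def min_id_length(bits: int, alphabet_size: int) -> int:
    """Smallest length n such that alphabet_size**n >= 2**bits.

    Uses integer arithmetic only, so there is no floating-point rounding.
    """
    if bits < 0 or alphabet_size < 2:
        raise ValueError("need bits >= 0 and alphabet_size >= 2")
    target = 1 << bits
    length, capacity = 0, 1
    while capacity < target:
        capacity *= alphabet_size
        length += 1
    return length


# A hex-encoded identifier needs 20 characters to cover 2**80 values.
assert min_id_length(80, 16) == 20
```

A conformance test could then reject a method whose identifier syntax allows fewer characters than this bound for its alphabet, though that only measures namespace size and says nothing about who authorizes entries.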

I also like @ewelton's point about adding at least non-normative statements, and normative statements if possible, around surrendering control, because I feel that was part of the crux of what makes this possible.

@peacekeeper do you have any ideas around other things that might be worth adding for this?

@kdenhartog
Member

Thanks, @ewelton . That is a sound argument but, going back to my original argument of DIDs for everything in a decentralised network which allows us to move into a synergistic future with better naming conventions and smarter identifiers, I'm keen to keep investigating.

@kdenhartog - Are you able to answer Eric's first question ...

1.) How is control surrender enforced?

It's surrender at the point of creation by the intrinsic nature of the method. In other words, control of the knowledge is all that's necessary to create the method. Representation and proof of control is unnecessary after creation, just as it's unnecessary after all keys have been revoked in all other methods.

@kdenhartog
Member

I'm sorry, is the proposal here to have a did:o namespace that then has multiple methods underneath it?

For example

did:o:sha:123...
did:o:multihash:abc...
did:o:myHash:xyz

Is that what you're suggesting, @pknowl?

I hope not, that makes the method name even more likely to centralize around a naming authority.

@kdenhartog
Member

I've gone quiet on this long thread that I started, but I wanted to say thank you to all the smart people who chimed in.

Re. the final pair of comments from @kdenhartog and @peacekeeper : yes to the distinction Markus was trying to highlight. When you have a property that is objectively observable as the basis of an identifier, and everybody knows what property to look for, then you have the interesting phenomenon that multiple observers will automatically be led to agree on the identifier for the object -- even for new objects not yet discovered. This has some very desirable benefits in a decentralized ecosystem. Perhaps Joe is right that this doesn't belong inside the DID umbrella; I'm content to let consensus rule, but just wanted to make the strongest case I could for it.

As the original opener of the issue, I am happy enough with the ensuing discussion to let it be closed now. But we can also keep it open longer if procedure or the preferences of others pushes us that way.

It looks like the author of this issue feels satisfied by the discussion that occurred. I would guess that next steps can go one of two ways (potentially both): @mitfik, @pknowl and I can draft a strawman DID method to explore what these immutable, surrender-control-on-creation DIDs would look like, or we can begin to propose language to constrain what DID methods are possible.

Any opinions on which way to go?

@pknowl

pknowl commented Apr 6, 2020

Thanks, @kdenhartog . I believe this is now in the capable hands of @mitfik and a couple others in the HCF tech group to start working on a strawman/draft spec. The workload has suddenly gone through the roof at this end which is why this stream has slowed down. That said, I think we have everything we need for now.

@kdenhartog
Member

I propose we close this issue then, since the DID method can be shared via the DID method registry. Any objections?

@kdenhartog added the "pending close" label (Issue will be closed shortly if no objections) on Apr 7, 2020
@brentzundel
Member

No activity since marked pending close, closing.
