Deterministic canonical form for CBOR/DagCBOR not implementable, prevents other representations #585
@cabo, I'd value your input. My thoughts are that it is important in our spec to have a deterministically encoded representation to get consistent encoders/decoders. The group seems to think this is a daunting task and not worth the effort. I think that, given our spec isn't that complex and we aren't dealing with numbers, with a little more work we can do it, and it will pay off in spades down the road, just like all the work we put into having an Abstract Data Model will (I hope!). @cabo, I also appreciate your work on the updated CBOR spec, RFC 8949. @msporny regarding first constraint:
@msporny, btw, this also leaves the door open for CBOR-LD and dagCBOR to co-exist.
Most definitely, we wrote Section 4.2 of RFC 8949 based on what we had learned in seven years of using CBOR in a wide variety of environments. You want to point to that (Section 4.2.1 specifically), not try to rephrase it.
Re the additional rules: You can't really have a SHOULD in deterministic encoding. I don't understand what that rule is trying to do -- why not say you MUST leave out entries that would have an undefined value?
If the CBOR encoding defines some integer labels (as opposed to text string labels), and both are equivalent, I'd probably err on the side of using the shorter ones (i.e., integer). But that requires correctly curating the numeric labels: You cannot add one after the fact after you already have had a text string version, because that would cause older implementations that only know the text strings to produce different output than newer ones that know the integer labels. In SenML (RFC 8428), which also supports JSON and CBOR, we went radical about this: The set of integer labels is fixed and cannot grow; new labels are always text only.
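To make the Section 4.2.1 rules concrete, here is a minimal Python sketch of a deterministic encoder. It covers only integers, byte strings, text strings, arrays, and maps, and it applies two of the core requirements: shortest-form heads and map keys sorted by the bytewise lexicographic order of their encodings. The function names `encode_head` and `encode` are made up for this illustration; floats, tags, and simple values are deliberately left out, and nothing here is taken from the DID Core text.

```python
import struct


def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head (major type plus argument) in its shortest form."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    if value < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", value)
    raise ValueError("value does not fit in a 64-bit CBOR argument")


def encode(item) -> bytes:
    """Deterministically encode a small subset of CBOR data items."""
    if isinstance(item, bool):
        raise TypeError("simple values are out of scope for this sketch")
    if isinstance(item, int):
        return encode_head(0, item) if item >= 0 else encode_head(1, -1 - item)
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return encode_head(3, len(raw)) + raw
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        # Sort entries by the encoded bytes of their keys, not by insertion order.
        pairs = sorted(
            ((encode(k), encode(v)) for k, v in item.items()),
            key=lambda pair: pair[0],
        )
        return encode_head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(item).__name__}")


# Insertion order differs, encoded bytes do not.
first = encode({"b": 1, "a": 2})
second = encode({"a": 2, "b": 1})
assert first == second
print(first.hex())  # a2616102616201
```

Because the sort runs over encoded keys, an integer label always sorts ahead of a text label, which is one reason the choice between integer and text labels has to be settled before any output is compared or signed.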
@cabo, many thanks for your input, much appreciated and fine work on the RFC 8949!
Yes, that makes sense. At the time we were having a debate about
The issue was discussed in a meeting on 2021-01-28. List of resolutions:
2. CBOR sections
See GitHub pull request #552.
Manu Sporny: Let's talk about the CBOR section and the DagCBOR section. Jonathan, can you give an overview on those sections now?
Jonathan Holt: On our call on Tuesday, we're working on a security document. We need to have deterministic encoding of the DID document, especially if the method will be signing and having a deterministic ordering is important.
Jonathan Holt: Including 64-bit integers and floats, but the language that's now in the dagCBOR section, but should be in the CBOR section. So here's a new PR to fix it.
Manu Sporny: Thanks for that overview, Jonathan. There are numerous concerns around deterministic canonical form for CBOR. Just so everyone is on the same page for deterministic canonical form. Typically when you digitally sign things you want to have them in a deterministic canonical form.
Jonathan Holt: I think the digital signatures are not in scope for the charter. I agree with that. Data modeling is. How we get to data modeling to ordering is relevant for us to sign.
Orie Steele: I think I agree with most of what Jonathan said. We have an ADM and serializations of that ADM in various different forms. If we're limiting ourselves to just JSON forms, there are multiples in JSON alone and the same applies to CBOR. Thinking of a canonical representation of an ADM, I'd like to dispel the idea that that is possible. I don't believe it is. If it were, we'd have a holy war and all the representations would fight to "be it".
Orie Steele: The people who proposed the ADM never finished the work to solve the registration problem and now jonathan is encountering that. It should be trivial to register the mime type, we should say, here's where you reference the external spec that makes it trivial to implement, and this should not be hard. There's tension over what goes in DID core and what goes in the registries. DID core will get frozen, and you should put things you're
Orie Steele: If you can't create a new mime type after DID core is done then the ADM was a mistake.
Manu Sporny: To propose two questions: Do we want to specify a canonical form/rules for the ADM or the information model. I expect everyone to say no to that, no one signed up to do that.
Jonathan Holt: I don't think canonicalization of the ADM makes sense to me, but certainly what you're signing is a representation in a particular format. Getting a one way in and one way out -- as suggested by the RFC ... our protocol should say how to sign the CBOR and get into a particular format.
Jonathan Holt: Also from the perspective of the order here, the conversations we had ... ordering matters and it matters for signatures, but what I didn't highlight -- is that it's up to the DID Doc producer to put it in the right order.
Dave Longley: In response to Manu's question, -1 for canonicalising the ADM; I don't understand what that would mean.
Dave Longley: getting text in the spec that says here is how you can add more representations, and into the registries
Markus Sabadello: Moving the other representations into the DID spec registries -- I wanted to do that, would that be ok with Jonathan? We have registered properties, parameters, DID methods, so on. If we have a process for representations, I would be ok with that.
Jonathan Holt: If we can flesh out the governance, I may be ok with that. It's just dangling out there right now.
Ivan Herman: Just for my understanding, as far as I understood, the only reason we're talking about canonicalization here, is for the purpose of signature. If that is the case, and we're not defining signature for the time being. We don't say how you would sign the JSON representation, and if we don't talk about signature, then there is no reason to have canonicalization in the document.
Manu Sporny: Seeing some of the feedback in IRC and where the discussion seems to be headed. Two proposals I'd like to emote in IRC to look at before we take them up.
Orie Steele: The second part of Manu's proposal isn't clear enough to me, if we can be clear about the registration process and that representations are free to define canonical forms, etc. that would help.
Manu Sporny: I think everyone wants the process to be more detailed. I thought we agreed to not put registration processes in DID core. Because those are hard to change. I thought consensus was that the registration processes would go in the DID registries document. I'd be fine with specifying how to define representations in that doc, doing it in DID core would be a problem.
Drummond Reed: I think if we want to put the process over in the registries, I think that's what we want to do. I totally agree that we need to document it and I want to help work on it and that's where it belongs.
Drummond Reed: I agree with what Dave Longley just said that DID core just needs to say go look at the registries doc for the process.
Orie Steele: I recall -- Drummond and Manu are correct that the consensus is that the DID spec registries would define the process and that's where the work needs to get done. And it just hasn't happened. And so that's why it's hard to see how it will work.
Jonathan Holt: I think I can defer to Mike Jones and Justin Richer on this. Unlike JSON which isn't as strict, CBOR facilitates more strictness in the RFCs to facilitate this problem with base64 encoding with JWT for instance. It's natively supported in the RFC. It's specified that protocols should consider deterministic encoding of the representation.
Jonathan Holt: I'm also reading that other RFCs such as for COSE, the RFC punts that back up to CBOR RFC 7049 and the updated one. It's saying why it's a bad idea ... it battles with the JOSE spec. There's a lot of language, and I wish I had the expertise of Jim Schaad and Carsten, to get some weigh-in on the implications of not addressing this right now.
Michael Jones: With respect to COSE, because there isn't a standard canonical CBOR, what COSE does when it wants to sign something is just put it in a binary string and encapsulate it. It's kind of the equivalent of what JOSE does with base64. COSE sidesteps this by representing it as a binary string.
Manu Sporny: I'm going to put in the poll, but before doing that. Just real quick. On the CBOR language that is being referred to. It does not guarantee a canonical form. It was never meant to be that -- that's why it says "These are things you might want to keep in mind". It says "If you want a canonical form, you might want to try and do at least these things". But it's up to other specs to do that and as Mike says other specs just print out a binary string and sign it.
Jonathan Holt: It is possible, and I'd like to tease out, what parts of it do you have problems with. I'd love to address those concerns.
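The byte-string approach Michael Jones describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not COSE itself: HMAC-SHA-256 stands in for a real signature algorithm, the helper names `sign_payload` and `verify_payload` are invented here, and real COSE signs a structure that embeds the payload byte string together with protected headers. The point it shows is that the signer and verifier both operate on the exact bytes they were handed, so no canonical form is needed.

```python
import hashlib
import hmac

# Two valid CBOR encodings of the same map {"a": 1, "b": 2}; only key order differs.
ENCODING_1 = bytes.fromhex("a2616101616202")
ENCODING_2 = bytes.fromhex("a2616202616101")


def sign_payload(key: bytes, payload: bytes) -> bytes:
    """Authenticate the payload bytes exactly as given; never parse or re-encode."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_payload(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Check the tag against the received bytes, again without re-encoding."""
    return hmac.compare_digest(sign_payload(key, payload), tag)


key = b"demo-key-not-for-production"  # hypothetical shared key for the sketch
tag = sign_payload(key, ENCODING_1)

# The receiver verifies the opaque bytes it received, so this succeeds.
assert verify_payload(key, ENCODING_1, tag)

# Decoding and re-encoding with a different key order breaks verification,
# which is the failure mode a deterministic encoding is meant to prevent.
assert not verify_payload(key, ENCODING_2, tag)
```

The trade-off is that the signed payload must be carried and stored as opaque bytes; any intermediary that decodes and re-serializes it invalidates the signature.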
Jonathan Holt: You know I'm going to object. I'm really harping on this canonicalization, it makes it so much easier if we have a canonical representation in CBOR. I think the ADM, it's just too abstract. So having a concise binary object representation helps facilitate the lossless encoding and decoding into other formats. It behooves us to tackle this, as it opens the door.
Ivan Herman: I could say similar things about other formats. The reasons why I started work on doing various types of constraint languages, e.g., for JSON Schema and for JSON-LD -- and I've put them into the registry repo right now... part of that to be discussed. Having your work put there would be what I would expect to happen. That can be done once the CR is published because this is not something absolutely necessary to go ahead with the CR.
Brent Zundel: I'm getting pretty concerned that we're getting close to things that are officially out of scope for our group. It could be argued that explaining a deterministic algorithm for signatures could be out of scope because it's too close to signatures. If we're not past the point of our scope we're very close to it.
Ivan Herman: I have a question on the proposal. I thought what we'd do in the registries, is not only the canonical forms, but also any kind of additional representations.
Manu Sporny: Yes, that is correct, do you feel that the proposal doesn't say that?
Ivan Herman: If I want to have a YAML representation of the model, I should be able to do that in the registry. That, for me, is not clearly in the proposal.
Manu Sporny: Yes. That's the intent.
Jonathan Holt: How about this compromise, only the core model is in the DID core spec, and the representations are all in the registries.
Ted Thibodeau Jr.: I think Jonathan, that's roughly the intent at this time. Part of the pushback against you right now is that you have acknowledged that you're not an expert on the thing you want in the spec and we're up against tight timelines right now. Without the expertise to write the PR for what you want to add, I don't see that as possible.
Manu Sporny: I think we should take up another proposal to clarify what's going on.
Michael Jones: This talk of all the representations being in the registry doesn't match what we've actually done in the spec. The JSON and JSON-LD and the dagCBOR representations are all defined in the core spec, not in any registries. I propose we don't change that and don't make any resolutions so it appears that's not true.
Manu Sporny: I would like the group to focus on getting one proposal passed at a time.
Michael Jones: Yeah, specifications are specifications and registries are registries. Registries are lists of things. Specs have normative text. Talking about moving large blocks of text into a registry is nonsensical.
Dave Longley: I put a proposal in IRC. Can we solve the second class citizen issue by being clear in the core spec
Manu Sporny: I don't think that would address the issue, Dave. But let's run proposals.
Manu Sporny: Do we need to run the opposite proposal? Where we say we're going to keep the core representations in the spec?
Michael Jones: This is very strangely worded. You make a representation in a specification. You might also list that specification in a registry. You don't add a representation directly to a registry. A registry is a list not a spec.
Jonathan Holt: I haven't seen this in any protocol/place where some representation isn't able to handle this, the deterministic section in CBOR says it's up to authors. We are supposed to clearly state how to handle a representation. Not kicking the can down the road into some registry process.
Drummond Reed: My understanding is that everything that is defined in DID core is listed in the registry. Everything in the registry is official. It doesn't really matter whether a representation is in DID core or outside of DID core. All are siblings, all are in the registry.
Manu Sporny: That's correct.
Ivan Herman: That's correct.
Jonathan Holt: It's a fair compromise, I think we need to flesh out the governance of the registry -- in which case it will be seamless, but it's punting it and I don't like that.
Brent Zundel: Thanks for coming, thanks to scribe, thanks for the input.
PR #593 addresses this issue. This issue will be closed once that PR is merged.
PR #606 also addresses aspects of this issue.
PR #606 has been merged, closing.
At present, the deterministic canonical form for CBOR is meant to apply to all CBOR-based formats. The rules in the specification today are as follows:
These rules are partially copied from RFC7049 (the CBOR specification) in the following ways:
The first rule prevents CBOR formats that would want to use a more compact form by transforming map keys to small integers, which is common for a CBOR format.
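The size difference behind this concern, and the versioning hazard raised earlier in the thread about adding integer labels after the fact, can be shown with a small Python sketch. Everything named here is hypothetical: DID Core defines no integer labels, and `LABELS_V1`, `LABELS_V2`, `encode_key`, and `encode_keys` exist only for this illustration. The encoder handles just integer labels 0..23 and text labels under 24 bytes, where the CBOR head fits in one byte.

```python
def encode_key(key) -> bytes:
    """Encode a map key that is a small unsigned int or a short text string."""
    if isinstance(key, int):
        if not 0 <= key < 24:
            raise ValueError("sketch only handles integer labels 0..23")
        return bytes([key])  # major type 0, value carried in the initial byte
    raw = key.encode("utf-8")
    if len(raw) >= 24:
        raise ValueError("sketch only handles text labels under 24 bytes")
    return bytes([0x60 | len(raw)]) + raw  # major type 3, length in the initial byte


# Hypothetical label tables; DID Core does not define integer labels.
LABELS_V1 = {"id": 1}
LABELS_V2 = {"id": 1, "authentication": 2}  # label added after the fact


def encode_keys(keys, labels) -> bytes:
    """Encode each key, substituting an integer label when the table has one."""
    return b"".join(encode_key(labels.get(k, k)) for k in keys)


keys = ["id", "authentication"]
text_only = encode_keys(keys, {})
older = encode_keys(keys, LABELS_V1)
newer = encode_keys(keys, LABELS_V2)

print(len(text_only), len(older), len(newer))  # 18 16 2

# Implementations with different label tables disagree on the bytes,
# so their output can no longer be compared or signed interchangeably.
assert older != newer
```

The savings are real, but they only stay deterministic if the label table is frozen, which is the approach described for SenML earlier in the thread.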
The second rule imposes a mandate that other CBOR formats might want to avoid (for example, by dropping the value instead of preserving it as undefined).
The rest of the rules don't properly transfer the meaning of RFC 7049 by leaving out text that is important to ensure implementers can create a canonical form.
All of the rules also presume that all CBOR serializations were serialized in the same way, which is an assumption that cannot be made in the DID specification. One might go from JSON->ADM->CBOR while others may go from CBOR->ADM->CBOR -- JSON and CBOR implementations are not guaranteed to keep order when sets are used, and it is expected that many will use sets in their implementations. The rules above therefore create a false sense of security: they suggest a canonical CBOR form that they cannot guarantee.
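A short Python sketch of the ordering problem: two pipelines hold the same set of values but iterate it in different orders, and the resulting CBOR arrays differ byte for byte. The helper names and the example values are invented for this illustration. The final `encode_set` function shows one extra rule a representation could adopt (sorting elements by their encoded bytes); that rule is an assumption of this sketch and is not part of RFC 8949's deterministic encoding, which preserves array order, nor of the DID specification.

```python
def encode_text(value: str) -> bytes:
    """Encode a text string shorter than 24 bytes (one-byte CBOR head)."""
    raw = value.encode("utf-8")
    if len(raw) >= 24:
        raise ValueError("sketch only handles text under 24 bytes")
    return bytes([0x60 | len(raw)]) + raw


def encode_array(items) -> bytes:
    """Encode a short array of text strings in the order given."""
    items = list(items)
    if len(items) >= 24:
        raise ValueError("sketch only handles arrays under 24 items")
    return bytes([0x80 | len(items)]) + b"".join(encode_text(i) for i in items)


def encode_set(items) -> bytes:
    """Hypothetical extra rule: sort elements by their encoded bytes first."""
    encoded = sorted(encode_text(i) for i in items)
    if len(encoded) >= 24:
        raise ValueError("sketch only handles arrays under 24 items")
    return bytes([0x80 | len(encoded)]) + b"".join(encoded)


# The same set of values, as two implementations might happen to iterate it.
via_json = ["did:example:a", "did:example:b"]
via_cbor = ["did:example:b", "did:example:a"]

assert set(via_json) == set(via_cbor)
assert encode_array(via_json) != encode_array(via_cbor)  # same data, different bytes
assert encode_set(via_json) == encode_set(via_cbor)      # only with the extra rule
```

Without a rule of this kind written into the representation, sorting map keys alone does not make two encodings of the same abstract document identical.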
The specification shouldn't be proposing that it provides a canonical form across representations, or even one representation. The group wasn't chartered to do that work, we have not been doing that work, and we're now two weeks out from going into the Candidate Recommendation phase (so we don't have the appropriate time left to do that work).
We should mark this section at risk, at the very least. Ideally, we'd remove any mention that we guarantee any sort of deterministic canonical form from the specification. We don't want to give people that impression.