This repository has been archived by the owner on Jun 14, 2018. It is now read-only.

Formalize schema, standardize on normalization process, and improve solc support #3

Merged
merged 20 commits into develop from json-schema
May 19, 2017

Conversation

gnidan
Contributor

@gnidan gnidan commented May 16, 2017

"Don't avert your eyes; normalize!"

Ref: #1

Herein:

N.B. this work involves several breaking changes and will require a major release.

  • Define json-schema specification of contract objects
  • Match bytecode/deployedBytecode convention from newer solc, replacing runtimeBinary, srcmapRuntime, etc.
  • Rename snake_case properties to their camelCase equivalents:
    • contract_name -> contractName
    • updated_at -> updatedAt
    • schema_version -> schemaVersion
  • Update Schema module to expose the following behaviors:
    • validate(obj) returns bool
    • normalize(objDirty) returns obj
  • (Remove generateObject)
  • Ensure contract objects are safely and correctly generated from all kinds of different sources ("dirty" objects), namely:
    • Previous versions' contract object JSON files (with different keys, etc.)
    • truffle-contract abstraction classes or instances
    • solc output either from .compile or .compileStandard
    • Contract objects that already match the standard
  • Use ISO8601 timestamp for updatedAt
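The proposed module surface might be sketched as follows. This is hypothetical: the key renames and the `validate(obj)`/`normalize(objDirty)` signatures come from the list above, but the implementation is illustrative only.

```javascript
// Hypothetical sketch of the Schema module surface; the renames and the
// validate/normalize signatures come from this PR, the implementation does not.
const renames = {
  contract_name: "contractName",
  updated_at: "updatedAt",
  schema_version: "schemaVersion"
};

// normalize(objDirty) returns a cleaned-up contract object.
function normalize(objDirty) {
  const normalized = {};
  for (const [key, value] of Object.entries(objDirty)) {
    normalized[renames[key] || key] = value;
  }
  // updatedAt is an ISO8601 timestamp, per the list above.
  normalized.updatedAt = normalized.updatedAt || new Date().toISOString();
  return normalized;
}

// validate(obj) returns a bool (stand-in check; the real module uses json-schema).
function validate(obj) {
  return typeof obj.contractName === "string" && typeof obj.updatedAt === "string";
}
```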

Normalization approach:

  • Reimplement contract object normalization with declarative description of translation:
    • Define properties as single canonical output key, sourced from one of multiple alternative input keys.
    • Ensure that normalization happens as part of a single-pass operation over this translation description.
  • Eliminate "flat-object"-style input (no more top-level address, network_id, links, events!)
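A minimal sketch of the declarative translation described above. The key lists are examples drawn from this PR ("runtimeBytecode", "evm.bytecode.object"); the actual module's property definitions may differ.

```javascript
// Illustrative translation table: each canonical output key lists the
// alternative input keys it may be sourced from, in priority order.
const properties = {
  contractName: { sources: ["contractName", "contract_name"] },
  bytecode: { sources: ["bytecode", "evm.bytecode.object"] },
  deployedBytecode: {
    sources: ["deployedBytecode", "runtimeBytecode", "evm.deployedBytecode.object"]
  }
};

// A dotted source string denotes a chained lookup into nested solc output.
function getter(path) {
  return (obj) => path.split(".").reduce((o, key) => (o == null ? o : o[key]), obj);
}

function normalize(objDirty) {
  const normalized = {};
  // Single pass over the translation description: for each canonical output
  // key, take the first alternative input key that yields a value.
  for (const [key, { sources }] of Object.entries(properties)) {
    for (const source of sources) {
      const value = getter(source)(objDirty);
      if (value !== undefined) {
        normalized[key] = value;
        break;
      }
    }
  }
  return normalized;
}
```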

gnidan added 3 commits May 15, 2017 14:36
- Add runtimeBytecode -> deployedBytecode normalization
- Ensure deployedBytecode starts with 0x in generated output
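The 0x-prefix guarantee from the commits above could be sketched like this (function name hypothetical, not the actual implementation):

```javascript
// Hypothetical sketch of the 0x-prefix normalization described above.
function ensureHexPrefix(bytecode) {
  if (bytecode == null) {
    return bytecode;
  }
  // solc emits bare hex ("6060..."); generated output stores it 0x-prefixed.
  return bytecode.startsWith("0x") ? bytecode : "0x" + bytecode;
}
```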
Contributor Author

@gnidan gnidan left a comment

(some questions / thoughts)

"definitions": {
"ContractName": {
"type": "string",
"pattern": "^[a-zA-Z_][a-zA-Z0-9_]*$"
Contributor Author

Not sure if contract names should be so restricted?

Contributor

This restriction seems fine. Might be worth seeing what Solidity restricts them to.

},
"Address": {
"type": "string",
"pattern": "^0x[a-fA-F0-9]{40}$"
Contributor Author

There's bound to be lots of these specific-number-of-bytes-in-hex regexes... not sure if there's any way around that.

Contributor

Not sure if there needs to be. This looks good to me.
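If deduplication ever does become worthwhile, the repeated byte-length patterns could be generated by a tiny helper along these lines (hypothetical, not part of this PR):

```javascript
// Hypothetical helper: builds the "0x plus N bytes of hex" pattern
// used throughout the schema (addresses, hashes, etc.).
function bytesPattern(n) {
  return `^0x[a-fA-F0-9]{${2 * n}}$`;
}
// bytesPattern(20) matches a 20-byte address; bytesPattern(32) a 32-byte hash.
```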

"type": "string"
},
"AST": {
"type": "object"
Contributor Author

is the AST representation in scope for something to be part of this spec?

Contributor

Definitely not in scope. At least, don't write your own validator for it; I'd be surprised if Solidity doesn't already have one. If it does, treat it like the ABI above; if it doesn't, leave it alone for now.

"pattern": "^0x[a-fA-F0-9]{40}$"
},
"ABI": {
"type": "array"
Contributor Author

Is the ABI part of the spec? ethpm/ethpm-spec#20 indicates that the recommendation is "opaque JSON"

Contributor

"doing what ethpm does" is probably a good strategy. Our validator within the truffle-contract-schema can include the abi schema and validate that as well.

}
},
"Event": {
"type": "object"
Contributor Author

Each event is a subset of the ABI; we probably want to include that in the spec here. If so, would we then want to include the ABI in the spec as well?

Contributor

I wouldn't; see above. Although you could copy the event portion of the ABI json-schema, since we're using that within our own. I think either way (either copying the ABI's schema for events, or doing nothing) is fine for now.

Contributor Author

Wait, I'm confused - I don't believe there is a json-schema for the ABI, so there's no event portion to copy. I'll double-check in EthPM's Gitter to see what Piper concluded.

Contributor Author

Oh, I guess @tcoulter the question I would have is "how much does truffle treat the ABI/events as a black box, and how much does truffle need to know the internals?"

I would guess "it very much needs to know the internals", and so I think it's probably worth speccing those out fully, even if there's some risk they'll change upstream.

Contributor

It looks like Piper linked to a Python version of the json schema. That's what I was referring to; honestly, I didn't click through. If there's no JavaScript json schema, then let's ignore it.

Contributor

I would guess "it very much needs to know the internals", and so I think it's probably worth speccing those out fully, even if there's some risk they'll change upstream.

If you think you can spec those out without taking too much time, go for it. If you think it'll take a while, leave it as a black box.

Contributor Author

If you think you can spec those out without taking too much time, go for it. If you think it'll take a while, leave it as a black box.

I'll look into that after doing code stuff. I don't think it will take a lot of time, but I'm treating this pass at the schema as a first version anyway, so certainly can leave things out of scope.
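For context (not taken from this PR), a single event entry in solc's generated ABI looks like the following standard Transfer-style event; this is the shape an Event definition in the schema would need to describe:

```json
{
  "type": "event",
  "name": "Transfer",
  "anonymous": false,
  "inputs": [
    { "indexed": true, "name": "from", "type": "address" },
    { "indexed": true, "name": "to", "type": "address" },
    { "indexed": false, "name": "value", "type": "uint256" }
  ]
}
```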

@gnidan gnidan requested a review from tcoulter May 16, 2017 01:07
@gnidan
Contributor Author

gnidan commented May 16, 2017

@tcoulter – wanted to follow up and mention one other thing with this, namely, code-generating the json-schema.

Not pushing for it right now, naturally, but I think there is potentially some value in making the canonical json-schema the generated output of a build process (like what @pipermerriam is doing in Python).

One of the main reasons I would suggest this is because of all the byte regexes. Unfortunately, there does not seem to be a way to deduplicate object keys the way `"$ref": ...` deduplicates object values. Some basic preprocessing would make these regexes a lot less error-prone, and would probably remove a lot of the cruft required for allOf inheritance (see bytecode/deployedBytecode).

YAML does the kind of preprocessing I'm imagining, through its notion of anchors. For example:

definitions:
  Bytecode: &bytecode
    type: string
    pattern: "^...$"
properties:
  bytecode:
    <<: *bytecode
    description: Contract creation bytecode
  deployedBytecode:
    <<: *bytecode
    description: Contract bytecode on chain

YAML also does nice multiline strings with various kinds of escaping, which I like, and even supports comments. But I don't know, is YAML a bad word? Obviously we'd generate JSON from the YAML as the actual schema artifact.
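For reference, the anchor/merge example above would expand to JSON along these lines (YAML merge keys splice the shared mapping into each property; the `"^...$"` pattern is left elided as in the example):

```json
{
  "definitions": {
    "Bytecode": { "type": "string", "pattern": "^...$" }
  },
  "properties": {
    "bytecode": {
      "type": "string",
      "pattern": "^...$",
      "description": "Contract creation bytecode"
    },
    "deployedBytecode": {
      "type": "string",
      "pattern": "^...$",
      "description": "Contract bytecode on chain"
    }
  }
}
```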

Can skip YAML and just do JSON manipulation also, probably a million different pre-built ways, too, but I include YAML specifically because of the features I listed.

Really, I don't feel much urgency in doing this kind of thing; just wanted to include these thoughts somewhere transparent, and curious what your initial thoughts might be.

@gnidan
Contributor Author

gnidan commented May 16, 2017

(Oh actually it seems that json-schema has some notion of custom formats. That might be an easier thing to do before introducing preprocessing)
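A custom format could replace each inline byte regex with a named format. The fragment below is a sketch; the format name "address" is hypothetical and would need to be registered with the validator (e.g. via Ajv's addFormat) before validation:

```json
{
  "definitions": {
    "Address": {
      "type": "string",
      "format": "address"
    }
  }
}
```

The validator would then map `"address"` to the pattern `^0x[a-fA-F0-9]{40}$` in one place, instead of repeating the regex throughout the schema.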

@gnidan
Contributor Author

gnidan commented May 16, 2017

@tcoulter Does it make sense to call [an object represented by a contract JSON file] a "Contract Object"?

(Distinguished from the "[Contract] Abstraction", because it doesn't have methods or behavior?)

@gnidan gnidan mentioned this pull request May 16, 2017
@gnidan gnidan changed the base branch from develop to fix-develop-tests May 16, 2017 18:27
gnidan added 2 commits May 16, 2017 14:32
- Use separate methods for different sources (solc, abstraction, regular opts)

- Make `networks`, `address`, etc. (properties related to instance or
  truffle specifically) explicitly not a part of normalization process

- Move code around in `generateObject` to ensure truffle-option copying
  happens (now that it's gone from normalize)

- Switch to camelCase for fields in schema

- Add initial test for solc standard output

- Remove `x-` check for `normalizeInput`, as that's no longer part of that.
@tcoulter
Contributor

I'm into code generation of the schema if the schema becomes unwieldy or hard to manage, or if we encounter bugs that code generation solves. I'd say do it by hand as a first pass, unless code generation would save you time initially rather than cost more.

(Oh actually it seems that json-schema has some notion of custom formats. That might be an easier thing to do before introducing preprocessing)

👍

Does it make sense to call [an object represented by a contract JSON file] a "Contract Object"?

I like that.

index.js Outdated
reject(validate.errors);
}
});
},
Contributor

I almost wonder if this is too explicit (and I would counter my own argument by asking if it matters one way or another). Still, I like validate() over validateContractObject(), especially in the case where there's only one type of thing that ever gets validated. If there are two types of things, I would definitely call for the explicit function name. What's your take on simply naming it validate()?

Additionally, why is this wrapped inside a promise? It looks as though Ajv's validate function is synchronous, meaning you can have it throw instead of promisified. I recommend only wrapping it in a promise if it's doing asynchronous work or if there's a high likelihood we'll add asynchronous code to the function in the future.

Contributor Author

Still, I like validate() over validateContractObject()

Agreed, I'll switch it.

Additionally, why is this wrapped inside a promise? It looks as though Ajv's validate function is synchronous, meaning you can have it throw instead of promisified. I recommend only wrapping it in a promise if it's doing asynchronous work or if there's a high likelihood we'll add asynchronous code to the function in the future.

I figured that I didn't want to make any assumptions about the underlying schema validation library. If Ajv turns out not to be the best choice and we have to switch to something that uses promises, we'd already be covered.

But it might be heavy-handed; I'm kind of just playing around to see what looks good / makes sense. Your call!
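A synchronous version along the lines @tcoulter suggests might look like this. It's a sketch with a stand-in check; the real module would run the compiled json-schema validator (e.g. Ajv, whose compiled validate function is synchronous) and throw its errors.

```javascript
// Sketch of a synchronous validate(). The contractName check is a stand-in
// for full json-schema validation; throwing replaces the promise rejection.
const contractNamePattern = /^[a-zA-Z_][a-zA-Z0-9_]*$/;

function validate(contractObject) {
  if (
    typeof contractObject.contractName !== "string" ||
    !contractNamePattern.test(contractObject.contractName)
  ) {
    // Throw synchronously instead of rejecting a promise.
    throw new Error("Invalid contractName: " + contractObject.contractName);
  }
  return true;
}
```

A caller that really wants a promise can still wrap this: `Promise.resolve().then(() => validate(obj))`.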

gnidan added 7 commits May 18, 2017 15:13
- Use property compatibility / data transformation mapping to identify
  where data should come from in a uniform manner.
- Drop notion of "additional sources" - define properties to list
  sources explicitly including their canonical name, or default to
  source only canonical name when `"sources":` is omitted

- Allow string values for sources in addition to functions. For
  string values, assume possible chain operation (e.g. "evm.bytecode.object")
  and split/chain accordingly
- Change `objectable` (whatever that means) to `objDirty`
- Move `properties` to the top and give it a docstring
- Move getter/chain combinators lower, also document
@gnidan gnidan mentioned this pull request May 19, 2017
@gnidan gnidan changed the title Define json-schema for contract abstractions Formalize schema, standardize on normalization process, and improve solc support May 19, 2017
@gnidan gnidan changed the base branch from fix-develop-tests to develop May 19, 2017 20:42
gnidan added 4 commits May 19, 2017 17:32
- Rename schema_version import to pkgVersion
- Inline copyCustomOptions (not needed separately)
- Eliminate contractName default
@tcoulter
Contributor

#pushit

Great work @gnidan

@gnidan gnidan merged commit ee6baba into develop May 19, 2017
@gnidan
Contributor Author

gnidan commented May 19, 2017

wheeeee!!!!

@gnidan gnidan deleted the json-schema branch May 19, 2017 23:13