Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-string data keys #123

Closed
davishmcclurg opened this issue Jan 21, 2023 · 10 comments
Closed

Non-string data keys #123

davishmcclurg opened this issue Jan 21, 2023 · 10 comments

Comments

@davishmcclurg
Copy link
Owner

Non-string data keys cause errors when generating pointers:

?> schema = {
?>   'type' => 'object'
>> }
=> {"type"=>"object"}
>>
?> data = {
?>   true => 'value'
>> }
=> {true=>"value"}
>>
>> JSONSchemer.schema(schema).valid?(data)
/Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:625:in `escape_json_pointer_token': undefined method `gsub' for true:TrueClass (NoMethodError)

        token.gsub(JSON_POINTER_TOKEN_ESCAPE_REGEXP, JSON_POINTER_TOKEN_ESCAPE_CHARS)
             ^^^^^
	from /Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:552:in `block in validate_object'
	from /Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:551:in `each'
	from /Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:551:in `validate_object'
	from /Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:299:in `validate_type'
	from /Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:211:in `validate_instance'
	from /Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:91:in `each'
	from /Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:91:in `none?'
	from /Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:91:in `valid_instance?'
	from /Users/dharsha/repos/json_schemer/lib/json_schemer/schema/base.rb:81:in `valid?'
	from (irb):9:in `<main>'
	from bin/console:14:in `<main>'

Calling to_s in escape_json_pointer_token solves the issue, but I'm not sure if it makes sense since non-string keys aren't JSON and can produce conflicts with other valid keys.

For now, I'm opening this for discussion and to see if more people run into these errors.

Introduced: #122

@davishmcclurg
Copy link
Owner Author

There's a little more info here: #122 (comment)

@legendetm
Copy link

Having similar problem, when key is Integer

schemer = JSONSchemer.schema(JSON.parse(<<~JSON))
  {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "analytics",
    "title": "Analytics",
    "description": "Analytics",
    "type": "object",
    "properties": {
      "superscripts": {
        "type": "object",
        "propertyNames": {
          "type": "integer"
        },
        "additionalProperties": {
          "type": "string"
        }
      }
    }
  }
JSON
schemer.valid?({ 'superscripts' => { 1 => 'Note' } })
# NoMethodError: undefined method `gsub' for 1:Integer
# from json_schemer-0.2.24/lib/json_schemer/schema/base.rb:626:in `escape_json_pointer_token'

Needed to revert back to version 0.2.21.

@pmusaraj
Copy link

pmusaraj commented Mar 9, 2023

I am running into this as well. Would it make sense for a non-string key to simply raise a validation error? Something like this:

# on line 553 of schema/base.rb (for example)
if !key.kind_of?(String)
  yield error(instance, 'invalid_keys')
end
escaped_key = escape_json_pointer_token(key)

@legendetm
Copy link

Hi pmusaraj, the schema allows me to define, that propertyName type is integer and I would expect this to work, not to raise an error.

@pmusaraj
Copy link

@legendetm hmm, as @davishmcclurg pointed out above, only string keys are valid JSON.

@legendetm
Copy link

legendetm commented Mar 21, 2023

When I read the core specification, it says the "name that the schema is testing will always be a string" what also may mean, that the key is stringified for testing. This does not say, that the key must be string.

propertyNames
The value of "propertyNames" MUST be a valid JSON Schema.

If the instance is an object, this keyword validates if every property name in the instance validates against the provided schema. Note the property name that the schema is testing will always be a string.

Omitting this keyword has the same behavior as an empty schema.

But when viewing understanding object reference, then this states that object keys must always be strings.

Since object keys must always be strings anyway, it is implied that the schema given to propertyNames is always at least: { "type": "string" }

In all, this can be confusing and you can find multiple examples in wild where the type for propertyNames is set to another type. It would be better, if the schema would be also considered invalid when non string type is specified. As currently the schema is considered valid, but cannot be validated against JSON.

Considering the above, I will update my schema to use "pattern": "[0-9]+" instead of "type": "integer".

I checked how different online JSON schema validators validate, and this seems to be handled in multiple different ways:

  1. "type": "integer" was mostly invalid.
  2. "pattern": "[0-9]+" and actual key was number 1, was mostly working (key in JSON was stringified)
  3. "pattern": "[0-9]+" and actual key was string "1", was working everywhere.

@davishmcclurg
Copy link
Owner Author

I am running into this as well. Would it make sense for a non-string key to simply raise a validation error?

@pmusaraj that's an interesting idea. I'm concerned about conflating schema validation errors and input errors, though. I think it makes sense to only yield errors for actual validation errors and not for bad input. Maybe it would be better to raise an error similar to InvalidSymbolKey:

raise InvalidSymbolKey, 'schemas must use string keys' if schema.is_a?(Hash) && !schema.empty? && !schema.first.first.is_a?(String)


When I read the core specification, it says the "name that the schema is testing will always be a string" what also may mean, that the key is stringified for testing. This does not say, that the key must be string.

@legendetm I think the "key must be string" portion is implied because JSON schemas operate on JSON, which only allows string keys:

>> JSON.parse('{1:"x"}')
/Users/dharsha/.asdf/installs/ruby/3.2.0/lib/ruby/3.2.0/json/common.rb:216:in `parse': unexpected token at '{1:"x"}' (JSON::ParserError)
        from /Users/dharsha/.asdf/installs/ruby/3.2.0/lib/ruby/3.2.0/json/common.rb:216:in `parse'
        from (irb):1:in `<main>'
        from /Users/dharsha/.asdf/installs/ruby/3.2.0/lib/ruby/gems/3.2.0/gems/irb-1.6.2/exe/irb:11:in `<top (required)>'
        from /Users/dharsha/.asdf/installs/ruby/3.2.0/bin/irb:25:in `load'
        from /Users/dharsha/.asdf/installs/ruby/3.2.0/bin/irb:25:in `<main>'
>> JSON.parse('{"1":"x"}')
=> {"1"=>"x"}

At the level the JSON schema specification operates on, there's no such thing as a non-string key.

Considering the above, I will update my schema to use "pattern": "[0-9]+" instead of "type": "integer".

That seems like the right approach 👍

@myxoh
Copy link

myxoh commented Apr 7, 2023

Updating from 0.2.22 to 0.2.24 is breaking some of our specs that were generated using symbols as keys so:
An example minimul reproduceable spec:

This worked in 0.2.23 but stops working on 0.2.24

  it 'tests a schema' do
    schema = JSONSchemer.schema(
      { title: 'A titled schema', additional_properties: false, type: 'object',
        properties: { title: { type: 'string ' } } }.stringify_keys, insert_property_defaults: true
    )
    expect(schema).to be_valid({ title: 'some title' })
  end

While I think the new behaviour is understandable - I don't think this should have been a point change - it completely breaks the contract!

I think it makes sense to have more clear errors (Although I do think there's certainly value in supporting symbol keys when it comes to ease of typing fixtures in Ruby).

But ideally the library should avoid releasing point versions with breaking changes like this - this should definetely be added as a spec to make sure future changes can't break the behaviour.

I'd be happy to contribute with a spec

@davishmcclurg
Copy link
Owner Author

Thanks for the comment @myxoh! It seems like this is causing people enough pain to revert to the previous behavior. It's unfortunate because symbol keys are not well supported, ie it only works if both the schema and data use the same type of keys.

I think it makes sense to have more clear errors (Although I do think there's certainly value in supporting symbol keys when it comes to ease of typing fixtures in Ruby).

I agree it would be nice to support symbols properly. Though I'm not sure exactly what that looks like. I've been trying to avoid HashWithIndifferentAccess but maybe that would make the most sense.

I'd be happy to contribute with a spec

That would be great. I believe the fix (if you don't mind contributing that as well) is to add a to_s in escape_json_pointer_token:

token.gsub(JSON_POINTER_TOKEN_ESCAPE_REGEXP, JSON_POINTER_TOKEN_ESCAPE_CHARS)

davishmcclurg added a commit that referenced this issue May 22, 2023
This uses a refinement to support indifferent access for user-provided
hashes. The interface is purposefully limited to prevent future use of
hash methods that may not work as expected. Refinements are tricky,
though, since you lose them whenever they're out of scope (eg, passing a
hash to `include?` doesn't call the refinement's `==` method).

I initially tried using activesupport's `HashWithIndifferentAccess` but
it doesn't work for data hashes because `insert_property_defaults` needs
to operate on the actual hash. Requiring all of activesupport just for
`HashWithIndifferentAccess` also felt like a pain.

The refinement works slightly different from `HashWithIndifferentAccess`
because it leaves the keys as whatever type they already are. That might
be a little confusing in error objects and `insert_property_defaults`
but at least it's consistent with whatever is passed in.

Symbol keys are mostly being tested by running the
json-schema-test-suite tests again with symbol keys (using
`symbolize_names: true` when parsing JSON). I also pulled those tests
into their own file and cleaned things up.

Addresses:

- #91
- #123
davishmcclurg added a commit that referenced this issue May 22, 2023
This uses a refinement to support indifferent access for user-provided
hashes. The interface is purposefully limited to prevent future use of
hash methods that may not work as expected. Refinements are tricky,
though, since you lose them whenever they're out of scope (eg, passing a
hash to `include?` doesn't call the refinement's `==` method).

I initially tried using activesupport's `HashWithIndifferentAccess` but
it doesn't work for data hashes because `insert_property_defaults` needs
to operate on the actual hash. Requiring all of activesupport just for
`HashWithIndifferentAccess` also felt like a pain.

The refinement works slightly different from `HashWithIndifferentAccess`
because it leaves the keys as whatever type they already are. That might
be a little confusing in error objects and `insert_property_defaults`
but at least it's consistent with whatever is passed in.

Symbol keys are mostly being tested by running the
json-schema-test-suite tests again with symbol keys (using
`symbolize_names: true` when parsing JSON). I also pulled those tests
into their own file and cleaned things up.

Addresses:

- #91
- #123
davishmcclurg added a commit that referenced this issue Jul 27, 2023
This deep stringifies input hashes in order to support symbol (and other
non-string) keys. Symbol keys have been a common issue for people
forever (as evidenced by `InvalidSymbolKey`) and I've been hesitant to
address it because duplicating hashes and stringifying keys hurts
performance, but I think it's probably worth it.

The tricky thing here is that `insert_property_defaults`,
`before_property_validation`, and `after_property_validation` need
access to the actual hashes instead of the stringified version. To work
around that, the original instance is passed around in `Context` and can
be accessed by location using `original_instance`. Values passed in
hooks need to be re-stringified in case the user added non-string keys.

If this becomes a performance bottleneck, it may make sense to add a way
to turn it off for people that know they're using string keys.

Related:

- #91
- #123
davishmcclurg added a commit that referenced this issue Jul 29, 2023
This deep stringifies input hashes in order to support symbol (and other
non-string) keys. Symbol keys have been a common issue for people
forever (as evidenced by `InvalidSymbolKey`) and I've been hesitant to
address it because duplicating hashes and stringifying keys hurts
performance, but I think it's probably worth it.

The tricky thing here is that `insert_property_defaults`,
`before_property_validation`, and `after_property_validation` need
access to the actual hashes instead of the stringified version. To work
around that, the original instance is passed around in `Context` and can
be accessed by location using `original_instance`. Values passed in
hooks need to be re-stringified in case the user added non-string keys.

If this becomes a performance bottleneck, it may make sense to add a way
to turn it off for people that know they're using string keys.

Related:

- #91
- #123
davishmcclurg added a commit that referenced this issue Jul 31, 2023
Features:

- Draft 2020-12 support
- Draft 2019-09 support
- Output formats
- Annotations
- OpenAPI 3.1 schema support
- OpenAPI 3.0 schema support
- `insert_property_defaults` in conditional subschemas
- Error messages
- Non-string schema and data keys

See individual commits for more details.

Closes:

- #27
- #44
- #55
- #91
- #94
- #116
- #123
davishmcclurg added a commit that referenced this issue Aug 1, 2023
Features:

- Draft 2020-12 support
- Draft 2019-09 support
- Output formats
- Annotations
- OpenAPI 3.1 schema support
- OpenAPI 3.0 schema support
- `insert_property_defaults` in conditional subschemas
- Error messages
- Non-string schema and data keys

See individual commits for more details.

Closes:

- #27
- #44
- #55
- #91
- #94
- #116
- #123
@davishmcclurg davishmcclurg mentioned this issue Aug 1, 2023
davishmcclurg added a commit that referenced this issue Aug 19, 2023
This deep stringifies input hashes in order to support symbol (and other
non-string) keys. Symbol keys have been a common issue for people
forever (as evidenced by `InvalidSymbolKey`) and I've been hesitant to
address it because duplicating hashes and stringifying keys hurts
performance, but I think it's probably worth it.

The tricky thing here is that `insert_property_defaults`,
`before_property_validation`, and `after_property_validation` need
access to the actual hashes instead of the stringified version. To work
around that, the original instance is passed around in `Context` and can
be accessed by location using `original_instance`. Values passed in
hooks need to be re-stringified in case the user added non-string keys.

If this becomes a performance bottleneck, it may make sense to add a way
to turn it off for people that know they're using string keys.

Related:

- #91
- #123
davishmcclurg added a commit that referenced this issue Aug 19, 2023
Features:

- Draft 2020-12 support
- Draft 2019-09 support
- Output formats
- Annotations
- OpenAPI 3.1 schema support
- OpenAPI 3.0 schema support
- `insert_property_defaults` in conditional subschemas
- Error messages
- Non-string schema and data keys
- Schema bundling

See individual commits for more details.

Closes:

- #27
- #44
- #55
- #91
- #94
- #116
- #123
- #136
davishmcclurg added a commit that referenced this issue Aug 20, 2023
This deep stringifies input hashes in order to support symbol (and other
non-string) keys. Symbol keys have been a common issue for people
forever (as evidenced by `InvalidSymbolKey`) and I've been hesitant to
address it because duplicating hashes and stringifying keys hurts
performance, but I think it's probably worth it.

The tricky thing here is that `insert_property_defaults`,
`before_property_validation`, and `after_property_validation` need
access to the actual hashes instead of the stringified version. To work
around that, the original instance is passed around in `Context` and can
be accessed by location using `original_instance`. Values passed in
hooks need to be re-stringified in case the user added non-string keys.

If this becomes a performance bottleneck, it may make sense to add a way
to turn it off for people that know they're using string keys.

Related:

- #91
- #123
davishmcclurg added a commit that referenced this issue Aug 20, 2023
Features:

- Draft 2020-12 support
- Draft 2019-09 support
- Output formats
- Annotations
- OpenAPI 3.1 schema support
- OpenAPI 3.0 schema support
- `insert_property_defaults` in conditional subschemas
- Error messages
- Non-string schema and data keys
- Schema bundling

See individual commits for more details.

Closes:

- #27
- #44
- #55
- #91
- #94
- #116
- #123
- #136
@davishmcclurg
Copy link
Owner Author

Released in 2.0.0. Data keys are deep-stringified to support symbols. Symbol values are not stringified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants