Support generating JsonSchema for Polymorphic fields. #3986

Closed
wants to merge 2 commits into from

Conversation

christophstrobl
Member

Using polymorphic elements within the domain model can lead to inaccurate schema representation for Object and generic <T> types, which are likely to be represented as { type : 'object' } without further specification.
MongoJsonSchemaCreator.specify(...) allows defining additional types that should be considered when rendering the schema.

public class Root {
	Object value;
}

public class A {
	String aValue;
}

public class B {
	String bValue;
}

MongoJsonSchemaCreator.create()
    .specify("value").types(A.class, B.class)
    ...
{
    'type' : 'object',
    'properties' : {
        'value' : {
            'type' : 'object',
            'properties' : {
                'aValue' : { 'type' : 'string' },
                'bValue' : { 'type' : 'string' }
            }
        }
    }
}
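
To make the rendered result above easier to follow, here is a minimal sketch of the idea in plain Java reflection. It is not how MongoJsonSchemaCreator is implemented; the class `SpecifiedTypesSketch` and the method `objectSchemaFor` are hypothetical names, and only `String` fields are mapped to a schema type.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

final class SpecifiedTypesSketch {

    static class A { String aValue; }
    static class B { String bValue; }

    /** Builds the schema fragment for one property that may hold any of the candidate types. */
    static Map<String, Object> objectSchemaFor(Class<?>... candidates) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Class<?> candidate : candidates) {
            for (Field field : candidate.getDeclaredFields()) {
                if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                    continue; // skip compiler-generated and static members
                }
                // Assumption: only String is mapped; everything else falls back to 'object'.
                String type = field.getType() == String.class ? "string" : "object";
                properties.put(field.getName(), Map.of("type", type));
            }
        }
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("type", "object");
        schema.put("properties", properties);
        return schema;
    }

    public static void main(String[] args) {
        // Fragment for the 'value' property when A and B are the candidate types.
        System.out.println(objectSchemaFor(A.class, B.class));
    }
}
```

The fragment lists the properties of every candidate type side by side, which is why both aValue and bValue show up under 'value'.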

MongoDB's schema-free approach allows storing documents of different structure in one collection.
Those may be modelled with a common base class.
Regardless of the chosen approach, MongoJsonSchemaCreator.combine(...) can help avoid manually combining multiple schemas into one.

public abstract class Root {
	String rootValue;
}

public class A extends Root {
	String aValue;
}

public class B extends Root {
	String bValue;
}

MongoJsonSchemaCreator.combined(A.class, B.class) 
{
    'type' : 'object',
    'properties' : {
        'rootValue' : { 'type' : 'string' },
        'aValue' : { 'type' : 'string' },
        'bValue' : { 'type' : 'string' }
    }
}
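
The combined result contains rootValue because each subtype inherits it from the base class. A rough illustration with plain reflection follows. It is a sketch under stated assumptions, not the library's code: `HierarchyMergeSketch` and `mergedProperties` are made-up names, and only `String` fields are mapped.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

final class HierarchyMergeSketch {

    static abstract class Root { String rootValue; }
    static class A extends Root { String aValue; }
    static class B extends Root { String bValue; }

    /** Collects declared and inherited fields of all given types into one property map. */
    static Map<String, String> mergedProperties(Class<?>... types) {
        Map<String, String> properties = new TreeMap<>(); // sorted for stable output
        for (Class<?> type : types) {
            // Walk up the hierarchy so that fields of the common base class are included.
            for (Class<?> current = type; current != null && current != Object.class;
                    current = current.getSuperclass()) {
                for (Field field : current.getDeclaredFields()) {
                    if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                        continue;
                    }
                    // Assumption: only String is mapped; other types fall back to 'object'.
                    properties.put(field.getName(),
                            field.getType() == String.class ? "string" : "object");
                }
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(mergedProperties(A.class, B.class));
    }
}
```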

Equally named properties need to refer to the same JSON schema in order to be combined.
The following example shows a definition that cannot be combined automatically because of a data type mismatch.
In this case, a ConflictResolutionFunction has to be provided to MongoJsonSchemaCreator.

public class A extends Root {
	String value;
}

public class B extends Root {
	Integer value;
}

This commit introduces CombinedJsonSchema and CombinedJsonSchemaProperty that can be used to merge properties of multiple objects into one as long as the additions do not conflict with one another (e.g. due to usage of different types).
To resolve the previously mentioned errors, a ConflictResolutionFunction must be provided.
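
The conflict case can be pictured with a small stand-alone sketch. The `Resolver` interface below is a hypothetical stand-in and does not reflect the actual signature of ConflictResolutionFunction; property schemas are reduced to plain type names for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConflictResolutionSketch {

    /** Hypothetical callback; not the signature of Spring Data's ConflictResolutionFunction. */
    interface Resolver {
        String resolve(String property, String existingType, String incomingType);
    }

    /** Resolver that refuses to guess, mirroring the "cannot be combined automatically" case. */
    static final Resolver FAIL = (property, existing, incoming) -> {
        throw new IllegalStateException(
                "Conflicting types for '" + property + "': " + existing + " vs " + incoming);
    };

    /** Merges two property maps and asks the resolver only when the types differ. */
    static Map<String, String> merge(Map<String, String> left, Map<String, String> right,
            Resolver resolver) {
        Map<String, String> merged = new LinkedHashMap<>(left);
        for (Map.Entry<String, String> entry : right.entrySet()) {
            String existing = merged.get(entry.getKey());
            if (existing == null || existing.equals(entry.getValue())) {
                merged.put(entry.getKey(), entry.getValue());
            } else {
                merged.put(entry.getKey(),
                        resolver.resolve(entry.getKey(), existing, entry.getValue()));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("value", "string"); // A.value is a String
        Map<String, String> b = Map.of("value", "int");    // B.value is an Integer
        // Explicit decision: keep the type that was seen first.
        System.out.println(merge(a, b, (property, existing, incoming) -> existing)); // prints {value=string}
    }
}
```

Passing `FAIL` instead of the lambda makes the mismatch surface as an exception rather than being silently resolved.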
@christophstrobl christophstrobl requested a review from mp911de March 8, 2022 06:27
@mp911de mp911de self-assigned this Mar 18, 2022
@mp911de mp911de marked this pull request as ready for review March 18, 2022 08:25
@mp911de mp911de added this to the 3.4 M4 (2021.2.0) milestone Mar 18, 2022
mp911de pushed a commit that referenced this pull request Mar 18, 2022
This commit introduces MergedJsonSchema and MergedJsonSchemaProperty that can be used to merge properties of multiple objects into one as long as the additions do not conflict with another (eg. due to usage of different types).
To resolve previously mentioned errors it is required to provide a ConflictResolutionFunction.

Closes #3870
Original pull request: #3986.
mp911de added a commit that referenced this pull request Mar 18, 2022
Refine API naming towards merge/property instead of combine/specify. Tweak documentation. Introduce Resolution.ofValue(…) for easier creation.

See #3870
Original pull request: #3986.
@mp911de
Member

mp911de commented Mar 18, 2022

That's merged, polished and forward-ported now.

@mp911de mp911de closed this Mar 18, 2022
@mp911de mp911de deleted the issue/3870 branch March 18, 2022 13:16
@Pastissad

Hello,

Thank you for working on that issue.

If I understand correctly, with that implementation, you'd still need to manually specify subclasses or interfaces implementations that you might encounter in your model.
That seems a little counter productive with the idea of having a json schema automatically generated.
As per the documentation, the base usage would be to generate an encryption schema for the model at startup and push it to mongocryptd. If you need to maintain that custom specification in addition to your model, that kind of defeats the purpose of having it generated automatically to begin with.

I'd have a hard time selling that solution to a team with the instruction to remember to keep that schema generation configuration up to date or risk exposing sensitive data due to the lack of expected encryption.

Isn't there a solution that would avoid doing that manually ?

@mp911de
Member

mp911de commented Apr 1, 2022

> That seems a little counter productive with the idea of having a json schema automatically generated.

The utility helps with generating a proper descriptor along its Document structure to express schema rules. From your comment I understand that you want to use schema generation to feed the encryption configuration.

Generally speaking, using polymorphism in the context of encryption opens up a class of bugs where encryption details may not be considered because subtypes are no longer considered if the configuration goes wrong (just think of package renames where the base package to scan isn't changed). From that perspective, one should rather not use subtypes that contain details to be encrypted or move the encrypted fields into the parent type.

> Isn't there a solution that would avoid doing that manually ?

Each type that can have subtypes for a property would require scanning. When it comes to scanning, the typical arrangements consider base packages and include/exclude filters. In a typical MongoDB setup, subdocuments are not annotated so each property that points to a complex type needs to initiate a class path scan to identify subtypes. Because of automation, any missing types can easily go unnoticed and that isn't something that you immediately identify as missing encryption.

Therefore we recommend the usage of the schema generation utility to generate a schema document and render it into a file. Keep the schema file along with your resources on the classpath and test it to make sure that the fields you want to encrypt are going to be encrypted.
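
A test along those lines could look roughly like the following sketch. It assumes the schema file has already been parsed into nested maps (parsing is left out), that encrypted properties carry the `encrypt` keyword used by MongoDB's automatic encryption schemas, and that the field paths are known to the test. `EncryptedFieldCheckSketch`, `missingEncryption` and the sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class EncryptedFieldCheckSketch {

    /** Returns the nested map stored under the key, or null if absent or not a map. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> child(Map<String, Object> node, String key) {
        Object value = node == null ? null : node.get(key);
        return value instanceof Map ? (Map<String, Object>) value : null;
    }

    /** Lists the dotted property paths that lack an 'encrypt' entry in the schema. */
    static List<String> missingEncryption(Map<String, Object> schema, List<String> requiredPaths) {
        List<String> missing = new ArrayList<>();
        for (String path : requiredPaths) {
            Map<String, Object> node = schema;
            for (String segment : path.split("\\.")) {
                // Descend through 'properties' at every nesting level.
                node = child(child(node, "properties"), segment);
            }
            if (node == null || !node.containsKey("encrypt")) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for the parsed schema file; 'ssn' is encrypted, 'address.street' is not.
        Map<String, Object> schema = Map.of(
                "type", "object",
                "properties", Map.of(
                        "ssn", Map.of("encrypt", Map.of("bsonType", "string")),
                        "address", Map.of("type", "object",
                                "properties", Map.of("street", Map.of("type", "string")))));
        System.out.println(missingEncryption(schema, List.of("ssn", "address.street"))); // prints [address.street]
    }
}
```

A test would fail whenever the returned list is not empty, so a subtype that drops out of the generated schema is noticed before deployment.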
