Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2 - loading the document requires knowing the format in advance #1918

Closed
baywet opened this issue Nov 12, 2024 · 5 comments · Fixed by #1975
Closed

v2 - loading the document requires knowing the format in advance #1918

baywet opened this issue Nov 12, 2024 · 5 comments · Fixed by #1975
Labels
priority:p0 Blocking issue/ loss of critical functions. An ICM may be filed to communicate urgency. SLA<=48hrs
Milestone

Comments

@baywet
Copy link
Member

baywet commented Nov 12, 2024

new OpenApiStreamReader().Read(stream, out var diagnostic);

(or its async variant)

Did not require knowing the format of the document (JSON/YAML) in advance. This was handy as when building general purpose tools, it's not something that I know in advance. Parsing the file extension/url extension/response content type header might help discovering it, but it's not reliable.

The equivalent method

OpenApiDocument.LoadAsync(input, "unknown", settings)

Requires passing the format, which I believe is a regression in functionality. We should work to restore the same level of functionality.

Also, there are a number of places in the documentation and doc comments that still refer to the old initialization and require to be updated before GA.

https://github.com/search?q=repo%3Amicrosoft%2FOpenAPI.NET%20OpenApiStreamReader&type=code

@baywet baywet added the priority:p0 Blocking issue/ loss of critical functions. An ICM may be filed to communicate urgency. SLA<=48hrs label Nov 12, 2024
@darrelmiller
Copy link
Member

We could create an overload that looks like this:

OpenApiDocument.LoadAsync(input, settings);

We first try and parse as JSON. If the JSON fails, we can say "either you have bad JSON, or you gave me YAML, and you have not registered the YAML reader, in which case you need to take a dependency on the OpenAPI.Readers package, and register the YAML reader."

  1. Try and parse as JSON
  2. If it fails, check to see if YAML parser is registered
  3. If it is try parsing with the YAML parser

@MaggieKimani1
Copy link
Contributor

MaggieKimani1 commented Nov 25, 2024

We could create an overload that looks like this:

OpenApiDocument.LoadAsync(input, settings);

We first try and parse as JSON. If the JSON fails, we can say "either you have bad JSON, or you gave me YAML, and you have not registered the YAML reader, in which case you need to take a dependency on the OpenAPI.Readers package, and register the YAML reader."

  1. Try and parse as JSON
  2. If it fails, check to see if YAML parser is registered
  3. If it is try parsing with the YAML parser

This approach seems a bit problematic especially for large YAML documents. Once JSON parsing fails here:

try
{
jsonNode = LoadJsonNodes(input);
}
catch (JsonException ex)
{

Assuming we attempt the YAML parsing within the catch block, we can't reset the TextReader's input position once it shifts as it doesn't support seeking and the input has either been partially or fully consumed at this point leaving no content for the YAML parser.

We could try buffering the content into memory for reuse but this becomes expensive especially for large YAML files.
I'd suggest we check whether the YAML parser has been registered already in the factory then use it right off the bat before using the JSON parser.
Thoughts?

@baywet
Copy link
Member Author

baywet commented Nov 25, 2024

Thank you for the additional information.

I don't understand why we to direct the parsing between two different parsing logics:

The YAML 1.23 specification was published in 2009. Its primary focus was making YAML a strict superset of JSON

source

We could simply feed everything to the YAML parser, couldn't we? It it happens to "only be JSON" there's extra parsing logic that would be used. It might have a performance impact since yaml parsing ultimately converts back to JSON nodes.

It's also worth noting this is what the library used to do before the recent changes.

Then for people who really care about the performance impact, we could still leave in place the overloads that accept a format argument, bypassing this overhead. Also for any method that accepts a URI, we could look at the response content type and call the right format parser.

Thoughts?

@darrelmiller
Copy link
Member

@baywet Parsing Graph beta as JSON takes ~450ms. Using YamlSharp it take around 5secs. I think that is a difference we should try harder to take advantage of. Especially because many consumers of the library will not tell us what format they are sending.

@Maggie. Maybe we should revisit sniffing the content. We might be able to "peek" at the first few bytes without consuming the content. Also, after reviewing the full set of APIs yesterday I have a few suggested changes. One of them includes making the TextReader go away. I don't think it is useful for JSON. Scenarios. We may be able to support it just in the OpenAPIYamlReader.

@MaggieKimani1
Copy link
Contributor

@baywet Parsing Graph beta as JSON takes ~450ms. Using YamlSharp it take around 5secs. I think that is a difference we should try harder to take advantage of. Especially because many consumers of the library will not tell us what format they are sending.

@Maggie. Maybe we should revisit sniffing the content. We might be able to "peek" at the first few bytes without consuming the content. Also, after reviewing the full set of APIs yesterday I have a few suggested changes. One of them includes making the TextReader go away. I don't think it is useful for JSON. Scenarios. We may be able to support it just in the OpenAPIYamlReader.

Then I think we can continue reviewing the work I did in this PR which includes sniffing the content to detect the format #1929

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:p0 Blocking issue/ loss of critical functions. An ICM may be filed to communicate urgency. SLA<=48hrs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants