
Remove serializer todos and add comments #37170

Merged (2 commits, Jun 2, 2020)

Conversation

steveharter
Member

Non-functional changes. Remove the "todos" and add appropriate comments.

fixes #32356
fixes #32358

@steveharter steveharter added this to the 5.0 milestone May 29, 2020
@steveharter steveharter requested a review from layomia May 29, 2020 18:50
@steveharter steveharter requested a review from jozkee as a code owner May 29, 2020 18:50
@steveharter steveharter self-assigned this May 29, 2020
@@ -96,15 +96,12 @@ public JsonPropertyInfo GetPolymorphicJsonPropertyInfo()
/// </summary>
public JsonConverter InitializeReEntry(Type type, JsonSerializerOptions options, string? propertyName = null)
{
JsonClassInfo newClassInfo = options.GetOrAddClass(type);

// todo: check if type==newtype and skip below?
Member Author

Skipping the logic below would have only a minimal impact on perf, so it is not worth the risk of adding that check.

@@ -111,8 +130,6 @@ private static async Task WriteAsyncCore<TValue>(Stream utf8Json, TValue value,
bufferWriter.Clear();
} while (!isFinalBlock);
}

// todo: verify that we do want to call FlushAsync here (or above). It seems like leaving it to the caller would be best.
Contributor

Any notes on this one?

Member Author

This was added during the original design/coding, when we discussed a possible callback model (the caller must call flush) instead of auto-flush. I don't believe we received any further feedback asking for that capability.

{
// We flush the Stream when the buffer is >=90% of capacity.
// With the default buffer size of 16K, if a single JSON value is >=1.6K it is possible for
// the buffer to max out before it can be flushed, causing the buffer to expand.
Member

I'm still trying to understand the magic behind the 90% number. The sentence would make just as much sense if it said "We flush the Stream when the buffer is >= 80% of capacity ... if a single JSON value is >= 3.2K it is possible..."... so how was the 90% chosen? Below it says it's a compromise... was this just based on experimentation and this value found to be ideal for a wide-variety of scenarios?

Member Author (steveharter, Jun 1, 2020)

I'll look at adding some more explanation.

Note that we do check for flushing as often as is reasonable: we check after every property and element written to the buffer, not after whole objects or whole arrays.

Background on tradeoffs:

  • The initial buffer size (smaller takes up less RAM). We are currently optimizing for payloads < 16K (the default buffer size).
  • Minimizing buffer doubling. There are two reasons why the buffer is doubled:
    • To get the buffer big enough to handle the largest single element/property.
    • To deal with the buffer being partially used by smaller prior elements/properties that have not yet been flushed. The current algorithm flushes at >= .9.
  • Buffer utilization %: flushing on every small property write would be wasteful for memory utilization, plus the underlying Stream may not appreciate having Flush() called so often.
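To make the tradeoff concrete, here is a toy model of the buffered write loop (my own sketch with hypothetical names, not the actual C# serializer code): each pre-encoded value is appended, the buffer is doubled when a value would not fit, and a flush happens once usage crosses the threshold.

```python
FLUSH_THRESHOLD = 0.9  # flush once the buffer is >= 90% of capacity

def write_values(values, flush, capacity=16 * 1024):
    """Toy model of the serializer's buffered write loop.

    `values` are pre-encoded byte chunks; `flush` stands in for the
    underlying Stream flush. Returns the final buffer capacity so the
    amount of doubling is observable.
    """
    buffer = bytearray()
    for encoded in values:
        while len(buffer) + len(encoded) > capacity:
            capacity *= 2                  # buffer doubling
        buffer += encoded
        if len(buffer) >= FLUSH_THRESHOLD * capacity:
            flush(bytes(buffer))           # checked after every value, not per object
            buffer.clear()
    if buffer:
        flush(bytes(buffer))               # flush the remaining tail
    return capacity

# Four 8 KB chunks into a 16 KB buffer: two flushes, no doubling.
chunks = []
final_capacity = write_values([b"x" * 8192] * 4, chunks.append)
```

With 10 KB chunks the second write no longer fits, so the buffer doubles once to 32 KB before the threshold is reached, which illustrates the "partially used buffer" cause of doubling from the list above.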

Consider a scenario of writing out large sets of fixed-size elements, such as byte arrays. Assuming a single JSON element already fits into the current buffer:

  • If the threshold is 1-(1/2^0) == 0 there would be no doubling (flush every time)
  • If the threshold is 1-(1/2^1) == .5 there would be at most 1 doubling
  • If the threshold is 1-(1/2^2) == .75 there would be at most 2 doublings
  • If the threshold is 1-(1/2^3) == .875 there would be at most 3 doublings
  • If the threshold is 1-(1/2^4) == .9375 there would be at most 4 doublings

So at .9 we sit in the "4 doublings" bucket, which also leaves some wiggle room for JSON token overhead and other smaller, "normal" properties/elements. We haven't received any direct reports of this threshold causing perf issues.
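The pattern in the list above has a closed form: doubling stops once (1 - threshold) * capacity is at least the initial element size, so the worst case is ceil(log2(1 / (1 - threshold))) doublings. A quick check (my own sketch, not code from the repo):

```python
import math

def max_doublings(threshold):
    """Worst-case buffer doublings for elements that fit in the initial
    buffer: growth stops once (1 - threshold) * capacity reaches the
    initial buffer size, i.e. after ceil(log2(1 / (1 - threshold)))
    doublings. Requires 0 <= threshold < 1.
    """
    return math.ceil(math.log2(1 / (1 - threshold)))

for t in (0, 0.5, 0.75, 0.875, 0.9, 0.9375):
    print(f"threshold {t}: at most {max_doublings(t)} doublings")
```

Note that .9 and .9375 both land in the 4-doublings bucket; .9 simply leaves a bit more headroom below the next power-of-two boundary.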

Some other options considered at the time the original todo was created:

  • Add a flush threshold knob (in addition to the existing buffer size knob) on the options class.
  • Add a heuristic and auto-control the % based on that (such as guessing based on largest prior size).

Both seem like overkill because:

  • There is already a knob on the options class to specify the default buffer size. It is useful to increase this if you know certain payloads are going to be large to avoid some doubling or to adjust to the underlying Stream's optimal buffer size (if it also has a buffer).
  • Since the buffer size is automatically doubled, the buffer will ultimately expand a max of 4 times (for fixed-size elements/properties), which seems like a reasonable number.

Finally we also considered a callback mechanism to have the caller do the flush, instead of "auto-flushing" like we do today. So far we haven't received any requests for that (AFAIK).

Member

Thanks for the details.

@steveharter steveharter merged commit 8fd2f08 into dotnet:master Jun 2, 2020
@steveharter steveharter deleted the SerApiTodo branch June 2, 2020 18:44
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020