Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Allow specifying indent size and whitespace character when writing JSON with Utf8JsonWriter #63882

Closed
noqn opened this issue Jan 17, 2022 · 26 comments · Fixed by #95292
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Milestone

Comments

@noqn
Copy link

noqn commented Jan 17, 2022

Background and motivation

This proposal would enable Utf8JsonWriter to write indented JSON with either spaces or tabs, and with a variable number of whitespace characters per indentation.

To preface, indentation size was brought up around two years ago in #1174. Although the area owners were open for input, there were little discussion at the time and the issue was closed. Now that more eyes are on System.Text.Json I am hoping to revisit the topic.


As is well known, people form strong opinions about indentation. In a context where the serialized JSON will be inspected or manually edited by the users, there will inevitably be a demand for the application to conform to their indentation preferences. Adding this functionality to System.Text.Json would in turn let application developers provide indentation settings to their users.

This would also be useful for when the serialized JSON must match the indentation of the source files which were deserialized e.g., for version control purposes.

API Proposal

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions 
    {
        public char IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }
    
    public struct JsonWriterOptions 
    {
        public char IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }
}

IndentCharacter would define which whitespace character is used and IndentSize would define the number of whitespace characters that would be written per indentation level.

To match the behavior of JsonWriterOptions.MaxDepth, the default values of IndentCharacter ('\0') and IndentSize (0) would represent fallback values (which would be properly assigned in the Utf8JsonWriter constructor).

API Usage

var jsonWriterOptions = new JsonWriterOptions() 
{
    Indented = true,
    IndentCharacter = '\t',
    IndentSize = 1
};

var jsonSerializerOptions = new JsonSerializerOptions()
{
    WriteIndented = true,
    IndentSize = 4
};

Alternative Designs

For the sake of comparison, this is how Json.NET does it:

namespace Newtonsoft.Json 
{
    public class JsonTextWriter ...
    {
        public int Indentation { get; set; }
        public char IndentChar { get; set; }
    }
}

Json.NET's Indentation property combines the behavior of Indented and IndentSize by using the default value of 0 to disable formating altogether. Such a solution would however clash with our current API.

@noqn noqn added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jan 17, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Jan 17, 2022
@ghost
Copy link

ghost commented Jan 17, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

This proposal would enable Utf8JsonWriter to write indented JSON with either spaces or tabs, and with a variable number of whitespace characters per indentation.

To preface, indentation size was brought up around two years ago in #1174. Although the area owners were open for input, there were little discussion at the time and the issue was closed. Now that more eyes are on System.Text.Json I am hoping to revisit the topic.


As is well known, people form strong opinions about indentation. In a context where the serialized JSON will be inspected or manually edited by the users, there will inevitably be a demand for the application to conform to their indentation preferences. Adding this functionality to System.Text.Json would in turn let application developers provide indentation settings to their users.

This would also be useful for when the serialized JSON must match the indentation of the source files which were deserialized e.g., for version control purposes.

API Proposal

namespace System.Text.Json
{
     public sealed class JsonSerializerOptions 
     {
+         public bool IndentWithTabs { get; set; }
+         public int? IndentSize { get; set; }
     }

     public struct JsonWriterOptions 
     {
+         public bool IndentWithTabs { get; set; }
+         public int? IndentSize { get; set; }
     }
}

IndentWithTabs would define which whitespace character is used (false for spaces, true for tabs), and IndentSize would define the number of whitespace characters that would be inserted per indentation.

Declaring IndentSize as nullable allows us to use a fallback value (2) if IndentSize is not assigned to. Similarily, if IndentWithTabs is not assigned to, Utf8JsonWriter will use spaces. This ensures consistent behavior in applications where these properties had not been assigned to.

API Usage

var jsonWriterOptions = new JsonWriterOptions() 
{
    Indented = true,
    IndentWithTabs = true,
    IndentSize = 1
};

var jsonSerializerOptions = new JsonSerializerOptions()
{
    WriteIndented = true,
    IndentSize = 4
};

Alternative Designs

For the sake of comparison, this is how Json.NET does it:

namespace Newtonsoft.Json 
{
    public class JsonTextWriter ...
    {
        public int Indentation { get; set; }
        public char IndentChar { get; set; }
    }
}

Json.NET defines the indentation whitespace character with a char property. Since spaces and tabs are the only valid (non-newline) JSON whitespace characters, we can more elegantly cover all allowed values with a simple boolean.

Json.NET's Indentation property combines the behavior of Indented and IndentSize by using the default value of 0 to disable formating altogether. Such a solution would clash with our current API.

Risks

In #1174, the area owners of System.Text.Json understandably voiced concern over introducing complexity or performance degradation to the library. I would like to demonstrate that this proposal is achievable without such compromises.

The gist of it:

  • Add private properties byte IndentChar and int IndentSize to Utf8JsonWriter for getting the actual values from the options struct.
  • Replace references to JsonConstants.SpacesPerIndent with IndentSize.
  • Add a byte parameter to JsonWriterHelper.WriteIndentation for defining the whitespace character and pass IndentChar to it.
  • Fix Debug.Assert statements presuming we will always indent with 2 spaces.

(I'd also propose to rename JsonConstants.SpacesPerIndent to DefaultIndentSize.)

namespace System.Text.Json
{
    public class Utf8JsonWriter
    {
+       private int IndentSize => _options.IndentSize ?? JsonConstants.SpacesPerIndent;
+       private byte IndentChar => _options.IndentWithTabs ? JsonConstants.Tab : JsonConstants.Space;

-       private int Indentation => CurrentDepth * JsonConstants.SpacesPerIndent;
+       private int Indentation => CurrentDepth * IndentSize;

        private void WriteValueIndentedMethods(...)
        {
            int indent = Indentation;
-           Debug.Assert(indent <= 2 * JsonConstants.MaxWriterDepth);
+           Debug.Assert(indent <= IndentSize * JsonConstants.MaxWriterDepth);
            ...
-           JsonWriterHelper.WriteIndentation(output.Slice(BytesPending), indent);
+           JsonWriterHelper.WriteIndentation(output.Slice(BytesPending), indent, IndentChar);
        }
    }

    internal static class JsonWriterHelper
    {
-       public static void WriteIndentation(Span<byte> buffer, int indent)
+       public static void WriteIndentation(Span<byte> buffer, int indent, byte indentChar)
        {
-           Debug.Assert(indent % JsonConstants.SpacesPerIndent == 0);
            Debug.Assert(buffer.Length >= indent);

            if (indent < 8)
            {
-               int i = 0;
-               while (i < indent)
-               {
-                   buffer[i++] = JsonConstants.Space;
-                   buffer[i++] = JsonConstants.Space;
-               }
+               for (int i = 0; i < indent; i++)
+               {
+                   buffer[i] = indentChar;
+               }
            }
            else
            {
-               buffer.Slice(0, indent).Fill(JsonConstants.Space);
+               buffer.Slice(0, indent).Fill(indentChar);
            }
        }
    }
}
Author: noqn
Assignees: -
Labels:

api-suggestion, area-System.Text.Json, untriaged

Milestone: -

@eiriktsarpalis
Copy link
Member

Json.NET defines the indentation whitespace character with a char property. Since spaces and tabs are the only valid (non-newline) JSON whitespace characters, we can more elegantly cover all allowed values with a simple boolean.

I personally find the Json.NET API design to be cleaner, but we'd need to check if we can use that without sacrificing performance. We also need to make sure that the existing Indented property interacts correctly with the proposed IndentSize property (setting one should generally update the other).

In #1174, the area owners of System.Text.Json understandably voiced concern over introducing complexity or performance degradation to the library. I would like to demonstrate that this proposal is achievable without such compromises.

Would it be possible to share benchmark results comparing performance of a prototype vs the current implementation?

Slightly related: #54715

@eiriktsarpalis eiriktsarpalis added the needs-author-action An issue or pull request that requires more info or actions from the author. label Feb 3, 2022
@ghost
Copy link

ghost commented Feb 3, 2022

This issue has been marked needs-author-action since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.

@noqn
Copy link
Author

noqn commented Feb 8, 2022

Json.NET defines the indentation whitespace character with a char property. Since spaces and tabs are the only valid (non-newline) JSON whitespace characters, we can more elegantly cover all allowed values with a simple boolean.

I personally find the Json.NET API design to be cleaner, but we'd need to check if we can use that without sacrificing performance.

If we move away from bool IndentWithTabs, should we use a byte or char property for the indent character?

  • byte IndentCharacter might be preferable for performance since Utf8JsonWriter could use it as-is without casting.
  • char IndentCharacter on the other hand would be clearer and more convenient for the user.
// char IndentCharacter
var options = new JsonWriterOptions
{
    Indented = true,
    IndentCharacter = '\t'
};
// byte IndentCharacter
var options = new JsonWriterOptions
{
    Indented = true,
    IndentCharacter = (byte) '\t',
};

A third option is a char property in the public API that is backed by an internal byte property.
Utf8JsonWriter could access the byte property, without compromising the clarity of the public API.

Something like this:

public struct JsonWriterOptions
{
    internal byte IndentByte { get; private set; }

    public char IndentCharacter
    { 
        get => (char)IndentByte;
        set
        {
            if (value is not ' ' and not '\t' and not '\0') 
            {
                throw new ArgumentException("...");
            }

            IndentByte = (byte)value;
        }
    }
}

Perhaps a terrible idea, idk.

We also need to make sure that the existing Indented property interacts correctly with the proposed IndentSize property (setting one should generally update the other).

I think this might introduce confusion, for example

var options = new JsonWriterOptions();

options.Indented = true;  // IndentSize: 2 (JsonConstants.SpacesPerIndent)
options.IndentSize = 4;   // IndentSize: 4
options.Indented = false; // IndentSize: 0?
options.Indented = true;  // IndentSize: 2?

It would be preferable if setting one does not affect the other. If Indented is false, the value of IndentSize won't matter anyways.

Would it be possible to share benchmark results comparing performance of a prototype vs the current implementation?

Slightly related: #54715

I ran the Utf8JsonWriter suite in dotnet/performance: https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/System.Text.Json/Utf8JsonWriter

Here's the output of the ResultComparison tool

68 executed benchmarks 

summary:
better: 8, geomean: 1.036
worse: 10, geomean: 1.177
total diff: 18
Slower diff/base Base Median (ns) Diff Median (ns) Modality
System.Text.Json.Tests.Perf_Ctor.Ctor(Formatted: False, SkipValidation: False) 1.46 15.00 21.83
System.Text.Json.Tests.Perf_Ctor.Ctor(Formatted: True, SkipValidation: False) 1.44 15.25 22.03
System.Text.Json.Tests.Perf_Ctor.Ctor(Formatted: False, SkipValidation: True) 1.42 15.27 21.70
System.Text.Json.Tests.Perf_Ctor.Ctor(Formatted: True, SkipValidation: True) 1.41 15.44 21.83
System.Text.Json.Tests.Perf_Booleans.WriteBooleans(Formatted: False, SkipValidat 1.04 1596606.25 1667521.88
System.Text.Json.Tests.Perf_Strings.WriteStringsUtf16(Formatted: True, SkipValid 1.04 7734531.25 8055377.42
System.Text.Json.Tests.Perf_Strings.WriteStringsUtf16(Formatted: False, SkipVali 1.03 12569154.17 12942183.33
System.Text.Json.Tests.Perf_Strings.WriteStringsUtf16(Formatted: False, SkipVali 1.03 6868138.89 7044231.43
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: True, SkipValidatio 1.02 2100920.17 2151054.78
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf8(Formatted: False, SkipValidatio 1.02 1606743.59 1644223.53
Faster base/diff Base Median (ns) Diff Median (ns) Modality
System.Text.Json.Tests.Perf_Booleans.WriteBooleans(Formatted: True, SkipValidati 1.05 2115260.16 2007234.77
System.Text.Json.Tests.Perf_Strings.WriteStringsUtf8(Formatted: True, SkipValida 1.05 4736417.31 4510072.73
System.Text.Json.Tests.Perf_Strings.WriteStringsUtf8(Formatted: True, SkipValida 1.04 9439903.85 9036607.69
System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: False, SkipValidati 1.04 1652555.92 1592353.50
System.Text.Json.Tests.Perf_Strings.WriteStringsUtf8(Formatted: True, SkipValida 1.03 63244625.00 61240750.00
System.Text.Json.Tests.Perf_Strings.WriteStringsUtf8(Formatted: True, SkipValida 1.03 4789055.77 4666719.81
System.Text.Json.Tests.Perf_Deep.WriteDeepUtf16(Formatted: False, SkipValidation 1.02 4623970.37 4514632.73
System.Text.Json.Tests.Perf_Strings.WriteStringsUtf8(Formatted: False, SkipValid 1.02 8717124.14 8523317.24

The Utf8JsonWriter constructor is slower due to setting its JsonWriterOptions' IndentSize and IndentCharacter to fallback values if they were left at default (what is done to JsonWriterOptions.MaxDepth).

The remaining 14 benchmarks with a performance difference >2% (out of 68 total) are split fairly evenly between slower and faster than the main branch. I presume my system is not producing very stable results in those benchmarks...

I imagined that the WriteDeepUtf8 benchmark might reflect accurately on changes to indentation logic, so I ran it again on the main branch, two branches which implements IndentSize and IndentCharacter respectively, and one which implements both.

Method Branch Formatted SkipValidation Mean Ratio Mean Error StdDev Median Ratio Median Min Max Allocated
WriteDeepUtf8 Main False False 1.000 4.724 ms 0.0111 ms 0.0104 ms 1.000 4.721 ms 4.708 ms 4.740 ms 345 B
WriteDeepUtf8 IndentChar + IndentSize False False 0.981 4.632 ms 0.0089 ms 0.0083 ms 0,981 4.630 ms 4.621 ms 4.646 ms 353 B
WriteDeepUtf8 IndentSize False False 0.968 4.571 ms 0.0057 ms 0.0053 ms 0,968 4.570 ms 4.562 ms 4.581 ms 353 B
WriteDeepUtf8 IndentChar False False 0.963 4.547 ms 0.0090 ms 0.0080 ms 0,963 4.546 ms 4.538 ms 4.567 ms 353 B
Method Branch Formatted SkipValidation Mean Ratio Mean Error StdDev Median Ratio Median Min Max Allocated
WriteDeepUtf8 Main False True 1.000 4.269 ms 0.0045 ms 0.0042 ms 1.000 4.269 ms 4.260 ms 4.277 ms 128 B
WriteDeepUtf8 IndentChar + IndentSize False True 1.005 4.292 ms 0.0057 ms 0.0054 ms 1,006 4.293 ms 4.284 ms 4.301 ms 136 B
WriteDeepUtf8 IndentSize False True 1.036 4.423 ms 0.0090 ms 0.0084 ms 1,036 4.424 ms 4.410 ms 4.438 ms 148 B
WriteDeepUtf8 IndentChar False True 1.051 4.485 ms 0.0064 ms 0.0057 ms 1,050 4.484 ms 4.475 ms 4.493 ms 137 B
Method Branch Formatted SkipValidation Mean Ratio Mean Error StdDev Median Ratio Median Min Max Allocated
WriteDeepUtf8 Main True False 1.000 41.355 ms 5.9859 ms 6.8933 ms 1.000 36.945 ms 36.344 ms 56.320 ms 480 B
WriteDeepUtf8 IndentChar + IndentSize True False 0.992 41.013 ms 6.3328 ms 7.2929 ms 0,985 36.394 ms 35.840 ms 57.018 ms 824 B
WriteDeepUtf8 IndentSize True False 1.002 41.450 ms 5.8866 ms 6.7790 ms 1,004 37.086 ms 36.137 ms 51.664 ms 488 B
WriteDeepUtf8 IndentChar True False 0.973 40.231 ms 4.4017 ms 5.0690 ms 0,990 36.563 ms 35.930 ms 47.736 ms 824 B
Method Branch Formatted SkipValidation Mean Ratio Mean Error StdDev Median Ratio Median Min Max Allocated
WriteDeepUtf8 Main True True 1.000 37.772 ms 1.8500 ms 2.1305 ms 1.000 36.752 ms 35.870 ms 43.240 ms 600 B
WriteDeepUtf8 IndentChar + IndentSize True True 1.081 40.835 ms 5.4765 ms 6.3067 ms 0,988 36.303 ms 36.083 ms 50.221 ms 608 B
WriteDeepUtf8 IndentSize True True 1.084 40.940 ms 5.3766 ms 6.1917 ms 1,007 36.997 ms 36.669 ms 50.426 ms 608 B
WriteDeepUtf8 IndentChar True True 1.090 41.183 ms 5.9331 ms 6.8325 ms 1,004 36.904 ms 36.165 ms 55.615 ms 608 B

Thoughts?

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Feb 8, 2022
@ds5678
Copy link

ds5678 commented Mar 4, 2022

+1

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Mar 7, 2022
@layomia layomia added this to the Future milestone Mar 7, 2022
noqn added a commit to noqn/runtime that referenced this issue Mar 19, 2022
@noqn
Copy link
Author

noqn commented Mar 19, 2022

@layomia @eiriktsarpalis any more input?

Feel free to view the implementation here: noqn@e7a73b3

Current proposal:

namespace System.Text.Json
{
     public sealed class JsonSerializerOptions 
     {
+         public char IndentCharacter { get; set; }
+         public int IndentSize { get; set; }
     }

     public struct JsonWriterOptions 
     {
+         public char IndentCharacter { get; set; }
+         public int IndentSize { get; set; }
     }
}

Is there any chance this could be added to #63762? It seems realistic to implement in time for .NET 7

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Mar 23, 2022

Let me know if you want me to post the full benchmarks.

Yes please. Please also share benchmark results using statistical tests. We need to make sure no regressions are introduced in Utf8JsonWriter since it's part of the serialization hot path.

Is there any chance this could be added to #63762? It seems realistic to implement in time for .NET 7

It's unlikely we'll be able to drive API review for this in time for 7. If anything we'll probably end up cutting features from #63762.

@eiriktsarpalis eiriktsarpalis removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 23, 2022
@noqn
Copy link
Author

noqn commented Mar 29, 2022

Let me know if you want me to post the full benchmarks.

Yes please. Please also share benchmark results using statistical tests. We need to make sure no regressions are introduced in Utf8JsonWriter since it's part of the serialization hot path.

Thanks, here are the WriteDeep benchmarks results with DataSize decreased (from 100_000) to 10_000 for more stable results:


Method Job Toolchain Formatted SkipValidation Mean Error StdDev Median Min Max Ratio MannWhitney(2%) Allocated Alloc Ratio
WriteDeepUtf8 Job-ZOSJRV \dotnet-runtime-upstream\...\corerun.exe False False 473.2 μs 1.43 μs 1.26 μs 472.9 μs 471.6 μs 475.9 μs 1.00 Base 337 B 1.00
WriteDeepUtf8 Job-KXWMUA \dotnet-runtime\...\corerun.exe False False 475.2 μs 0.42 μs 0.39 μs 475.2 μs 474.6 μs 475.8 μs 1.00 Same 346 B 1.03
WriteDeepUtf16 Job-ZOSJRV \dotnet-runtime-upstream\...\corerun.exe False False 508.3 μs 0.58 μs 0.51 μs 508.4 μs 507.5 μs 509.3 μs 1.00 Base 337 B 1.00
WriteDeepUtf16 Job-KXWMUA \dotnet-runtime\...\corerun.exe False False 485.1 μs 0.94 μs 0.88 μs 484.8 μs 483.8 μs 486.8 μs 0.95 Faster 345 B 1.02
WriteDeepUtf8 Job-ZOSJRV \dotnet-runtime-upstream\...\corerun.exe False True 434.5 μs 0.54 μs 0.48 μs 434.6 μs 433.8 μs 435.2 μs 1.00 Base 120 B 1.00
WriteDeepUtf8 Job-KXWMUA \dotnet-runtime\...\corerun.exe False True 440.0 μs 0.61 μs 0.51 μs 440.0 μs 439.2 μs 440.8 μs 1.01 Same 129 B 1.07
WriteDeepUtf16 Job-ZOSJRV \dotnet-runtime-upstream\...\corerun.exe False True 452.9 μs 0.50 μs 0.44 μs 453.0 μs 451.9 μs 453.5 μs 1.00 Base 121 B 1.00
WriteDeepUtf16 Job-KXWMUA \dotnet-runtime\...\corerun.exe False True 454.7 μs 0.55 μs 0.52 μs 454.8 μs 453.7 μs 455.5 μs 1.00 Same 129 B 1.07
WriteDeepUtf8 Job-ZOSJRV \dotnet-runtime-upstream\...\corerun.exe True False 2,777.1 μs 14.37 μs 13.44 μs 2,775.2 μs 2,754.8 μs 2,799.0 μs 1.00 Base 341 B 1.00
WriteDeepUtf8 Job-KXWMUA \dotnet-runtime\...\corerun.exe True False 2,771.5 μs 13.58 μs 12.70 μs 2,768.8 μs 2,753.5 μs 2,794.0 μs 1.00 Same 356 B 1.04
WriteDeepUtf16 Job-ZOSJRV \dotnet-runtime-upstream\...\corerun.exe True False 2,770.8 μs 9.08 μs 8.49 μs 2,772.1 μs 2,758.6 μs 2,782.5 μs 1.00 Base 348 B 1.00
WriteDeepUtf16 Job-KXWMUA \dotnet-runtime\...\corerun.exe True False 2,773.4 μs 10.95 μs 10.24 μs 2,769.9 μs 2,762.4 μs 2,792.4 μs 1.00 Same 356 B 1.02
WriteDeepUtf8 Job-ZOSJRV \dotnet-runtime-upstream\...\corerun.exe True True 2,761.1 μs 14.45 μs 13.52 μs 2,763.8 μs 2,743.0 μs 2,782.1 μs 1.00 Base 122 B 1.00
WriteDeepUtf8 Job-KXWMUA \dotnet-runtime\...\corerun.exe True True 2,763.5 μs 11.53 μs 10.79 μs 2,767.3 μs 2,743.9 μs 2,779.9 μs 1.00 Same 133 B 1.09
WriteDeepUtf16 Job-ZOSJRV \dotnet-runtime-upstream\...\corerun.exe True True 2,764.1 μs 10.26 μs 9.60 μs 2,764.5 μs 2,748.2 μs 2,782.8 μs 1.00 Base 132 B 1.00
WriteDeepUtf16 Job-KXWMUA \dotnet-runtime\...\corerun.exe True True 2,761.1 μs 9.51 μs 8.90 μs 2,759.8 μs 2,749.1 μs 2,774.9 μs 1.00 Same 133 B 1.01

@eiriktsarpalis
Copy link
Member

Thanks. We can take a look at this in API review. Could you update your OP with updated API proposal?

@noqn
Copy link
Author

noqn commented Mar 29, 2022

Thanks. We can take a look at this in API review. Could you update your OP with updated API proposal?

Awesome, I've updated the proposal.

@noqn
Copy link
Author

noqn commented Jun 7, 2022

@eiriktsarpalis Can this be tagged with api-ready-for-review or did I miss something?

@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jun 7, 2022
@eiriktsarpalis
Copy link
Member

@noqn done.

@amercer-fdi
Copy link

Any movement on this? It's a strange inconsistency that when I open Visual Studio it formats JSON to 4 spaces but I can only Serialize to 2 spaces.

@noqn
Copy link
Author

noqn commented Mar 12, 2023

@eiriktsarpalis can this be moved to the 8.0 milestone? It has been stuck in the API review backlog for 9 months due to being in "future". Thanks.

@bartonjs
Copy link
Member

bartonjs commented May 18, 2023

Video

  • Because char+int is limiting (it doesn't allow anything with varied indentation, like "-> ", or anything outside the Basic Multilingual Plane) we combined them into one field.
  • While we typed the field as string, JsonEncodedText might be more appropriate
  • The property is proposed as non-nullable, defaulting to whatever the current indentation is (two spaces? four?)
namespace System.Text.Json
{
    public sealed class JsonSerializerOptions 
    {
        public string IndentText { get; set; }
    }
    
    public struct JsonWriterOptions 
    {
        public string IndentText { get; set; }
    }
}

@bartonjs bartonjs removed the api-ready-for-review API is ready for review, it is NOT ready for implementation label May 18, 2023
@noqn
Copy link
Author

noqn commented May 19, 2023

Hi, thanks for reviewing 😄

  • Because char+int is limiting (it doesn't allow anything with varied indentation, like "-> ", or anything outside the Basic Multilingual Plane) we combined them into one field.

I have major concerns about implementing IndentText without guardrails. JsonSerializer is strict about not writing invalid JSON and I firmly believe the user should only be allowed to indent with valid JSON whitespace characters (tabs and spaces - presuming we exclude CR & LF).

With that in mind, char IndentCharacter being limited to BMP is a non-issue if we would've put proper guardrails on IndentText anyways. Since that felt like the main argument in favor of the IndentText proposal, I think it's worth reconsidering the old proposal.

As someone mentioned in the review, it would be hard to parse (and adjust) the exact indentation with IndentText:

var jsonWriterOptions = new JsonWriterOptions() 
{
    Indented = true,
    IndentText = "        "
};

As opposed to:

var jsonWriterOptions = new JsonWriterOptions() 
{
    Indented = true,
    IndentSize = 8
};

Overall I feel like the old proposal is a more pleasing.

In terms of performance, defining a single indent char allows us to keep utilizing Span.Fill when writing the indentations. But this might be negligible, idk.


Someone mentioned in the review about defining the indent character as an enum. I like this since compared to the old proposal it makes it more clear to the users which chars are permitted. I'll write this down as an alternative proposal:

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions 
    {
        public JsonIndentCharacter IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }
    
    public struct JsonWriterOptions 
    {
        public JsonIndentCharacter IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }

    public enum JsonIndentCharacter
    {
        Space,
        Tab
    }
}

Usage:

var jsonSerializerOptions = new JsonSerializerOptions()
{
    WriteIndented = true,
    IndentSize = 1
    IndentCharacter = JsonIndentCharacter.Tab
};

@JamesNK
Copy link
Member

JamesNK commented May 19, 2023

Because char+int is limiting (it doesn't allow anything with varied indentation, like "-> ", or anything outside the Basic Multilingual Plane) we combined them into one field.

FYI Newtonsoft.Json has Indentation and IndentChar. I don't remember anyone asking for something more complicated.

@eiriktsarpalis
Copy link
Member

var jsonWriterOptions = new JsonWriterOptions() 
{
    Indented = true,
    IndentText = "        "
};

That certainly seems a bit awkward to me. @bartonjs I think we should reconsider the design here.

@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-approved API was approved in API review, it can be implemented labels May 19, 2023
@bartonjs
Copy link
Member

bartonjs commented Jul 27, 2023

Video

After a long discussion, we seem to have returned to using a single string property.

We did update the shape to account for the new source generator attribute, though.

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions 
    {
        public string IndentText { get; set; }
    }
    
    public struct JsonWriterOptions 
    {
        public string IndentText { get; set; }
    }
}

namespace System.Text.Json.Serialization
{
    public partial class JsonSourceGenerationOptionsAttribute
    {
        public string IndentText { get; set; }
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 27, 2023
@eiriktsarpalis eiriktsarpalis modified the milestones: Future, 9.0.0 Jul 27, 2023
@eiriktsarpalis eiriktsarpalis removed the help wanted [up-for-grabs] Good issue for external contributors label Jul 27, 2023
@Neme12
Copy link

Neme12 commented Jul 29, 2023

I don't think 0 indent size (or empty IndentText) should be considered an invalid value or be translated to the default values. What if you want to write everything on a separate line, as if it was indented, but without any indentation? That would translate to WriteIndented being true and an indent size of 0. That seems like a perfectly valid and reasonable configuration.

I think it's fine if null IndentText represents the default, but an empty string should definitely be valid.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 27, 2023
@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-approved API was approved in API review, it can be implemented labels Dec 6, 2023
@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Dec 6, 2023

After reviewing the implementation PR kindly provided by @manandre, I'm having more concerns about the API shape as approved (i.e. configuration accepting an indentation string vs. a char/length pair):

  1. The JSON spec only permits four whitespace characters: space, tab, CR and LF. With line breaks being an unlikely choice for an indentation string, from a flexibility perspective the only configuration that a string-based API admits is mixing and matching space and tab characters for indentation -- a scenario I find unlikely to be required by users.
  2. The IndentationString approach forces a lot of complexity on the serialization hot path: the current implementation uses a single span.Fill() call to write all indentation for a given line. In the implementation PR this has been replaced with this type which analyzes the indentation string and uses different strategies depending on the value being used. While we're still waiting on benchmark numbers, there's a good chance that this could regress performance for indented JSON serialization. A design using char/length configuration does not introduce such complexity.

Because of the above, I would again propose that we change the API shape to be as follows:

namespace System.Text.Json
{
    public sealed class JsonSerializerOptions
    {
        public char IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }

    public struct JsonWriterOptions
    {
        public char IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }
}

namespace System.Text.Json.Serialization
{
    public partial class JsonSourceGenerationOptionsAttribute
    {
        public char IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }
}

@terrajobst
Copy link
Contributor

terrajobst commented Dec 12, 2023

  • Looks good as proposed
  • The default is going to be a single space and indent size of 2
  • We should probably limit the inputs to something reasonable (e.g. character can be tab, space but no line breaks and intent size should be something like < 128).
namespace System.Text.Json
{
    public sealed class JsonSerializerOptions
    {
        public char IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }

    public struct JsonWriterOptions
    {
        public char IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }
}

namespace System.Text.Json.Serialization
{
    public partial class JsonSourceGenerationOptionsAttribute
    {
        public char IndentCharacter { get; set; }
        public int IndentSize { get; set; }
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Dec 12, 2023
@noqn
Copy link
Author

noqn commented Dec 13, 2023

Not meant to snatch this from under @manandre's nose, but since I had an implementation for the now-approved API laying around for ~2 years I figured I could just brush it off and make a pull request : )

@eiriktsarpalis
Copy link
Member

Thanks, but I think the nice thing to do here would be to give @manandre a chance to finish their work.

@noqn
Copy link
Author

noqn commented Dec 15, 2023

Hi. hope I don't come across as too blunt, but you asked me two years ago to implement and benchmark this proposal before it was even allowed for review, against the API review process guideline.

I've now published a pull request, which currently sits with all checks passed. Respectfully, there is no reason to wait around for another implementation that would have to be rewritten from scratch anyways, when I've had this implementation in working condition for two years. I'd greatly appreciate it if you started reviewing the pull request I've delivered. Thanks.

@eiriktsarpalis
Copy link
Member

Hi. hope I don't come across as too blunt, but you asked me two years ago to implement and benchmark this proposal

Respectfully, there is no reason to wait around for another implementation that would have to be rewritten from scratch anyways, when I've had this implementation in working condition for two years.

I understand, and we very much appreciate your prototyping effort which contributed to the final proposal being marked for review. However please keep in mind that:

  1. The team deals with literally hundreds of issues so it's really not easy for us to have visibility on discussions that transpired almost two years ago in this thread and whether production-ready prototypes exist in other peoples' forks for us to use.
  2. The API was first approved back in May, however the first PR attempting to implement it was Allow specifying IndentCharacter and IndentSize when writing JSON #95292.
  3. Multiple team members have expended review cycles and provided feedback on Allow specifying IndentCharacter and IndentSize when writing JSON #95292 which the author has been responding to. It is more fair, as well as better use of the team's time if we continue iterating on that PR.
  4. Ultimately, what matters is that the feature gets shipped in a way that works for everyone. Such features are almost always collective efforts in that they include the design discussions, prototyping work and final implementation. In any case, I'll personally make sure your name is credited when the feature gets announced.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 15, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Projects
None yet
10 participants