-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API Proposal]: Allow specifying indent size and whitespace character when writing JSON with Utf8JsonWriter #63882
Comments
Tagging subscribers to this area: @dotnet/area-system-text-json Issue DetailsBackground and motivationThis proposal would enable Utf8JsonWriter to write indented JSON with either spaces or tabs, and with a variable number of whitespace characters per indentation. To preface, indentation size was brought up around two years ago in #1174. Although the area owners were open for input, there were little discussion at the time and the issue was closed. Now that more eyes are on System.Text.Json I am hoping to revisit the topic. As is well known, people form strong opinions about indentation. In a context where the serialized JSON will be inspected or manually edited by the users, there will inevitably be a demand for the application to conform to their indentation preferences. Adding this functionality to System.Text.Json would in turn let application developers provide indentation settings to their users. This would also be useful for when the serialized JSON must match the indentation of the source files which were deserialized e.g., for version control purposes. API Proposalnamespace System.Text.Json
{
public sealed class JsonSerializerOptions
{
+ public bool IndentWithTabs { get; set; }
+ public int? IndentSize { get; set; }
}
public struct JsonWriterOptions
{
+ public bool IndentWithTabs { get; set; }
+ public int? IndentSize { get; set; }
}
}
Declaring API Usagevar jsonWriterOptions = new JsonWriterOptions()
{
Indented = true,
IndentWithTabs = true,
IndentSize = 1
};
var jsonSerializerOptions = new JsonSerializerOptions()
{
WriteIndented = true,
IndentSize = 4
}; Alternative DesignsFor the sake of comparison, this is how Json.NET does it: namespace Newtonsoft.Json
{
public class JsonTextWriter ...
{
public int Indentation { get; set; }
public char IndentChar { get; set; }
}
} Json.NET defines the indentation whitespace character with a char property. Since spaces and tabs are the only valid (non-newline) JSON whitespace characters, we can more elegantly cover all allowed values with a simple boolean. Json.NET's RisksIn #1174, the area owners of System.Text.Json understandably voiced concern over introducing complexity or performance degradation to the library. I would like to demonstrate that this proposal is achievable without such compromises. The gist of it:
(I'd also propose to rename namespace System.Text.Json
{
public class Utf8JsonWriter
{
+ private int IndentSize => _options.IndentSize ?? JsonConstants.SpacesPerIndent;
+ private byte IndentChar => _options.IndentWithTabs ? JsonConstants.Tab : JsonConstants.Space;
- private int Indentation => CurrentDepth * JsonConstants.SpacesPerIndent;
+ private int Indentation => CurrentDepth * IndentSize;
private void WriteValueIndentedMethods(...)
{
int indent = Indentation;
- Debug.Assert(indent <= 2 * JsonConstants.MaxWriterDepth);
+ Debug.Assert(indent <= IndentSize * JsonConstants.MaxWriterDepth);
...
- JsonWriterHelper.WriteIndentation(output.Slice(BytesPending), indent);
+ JsonWriterHelper.WriteIndentation(output.Slice(BytesPending), indent, IndentChar);
}
}
internal static class JsonWriterHelper
{
- public static void WriteIndentation(Span<byte> buffer, int indent)
+ public static void WriteIndentation(Span<byte> buffer, int indent, byte indentChar)
{
- Debug.Assert(indent % JsonConstants.SpacesPerIndent == 0);
Debug.Assert(buffer.Length >= indent);
if (indent < 8)
{
- int i = 0;
- while (i < indent)
- {
- buffer[i++] = JsonConstants.Space;
- buffer[i++] = JsonConstants.Space;
- }
+ for (int i = 0; i < indent; i++)
+ {
+ buffer[i] = indentChar;
+ }
}
else
{
- buffer.Slice(0, indent).Fill(JsonConstants.Space);
+ buffer.Slice(0, indent).Fill(indentChar);
}
}
}
}
|
I personally find the Json.NET API design to be cleaner, but we'd need to check if we can use that without sacrificing performance. We also need to make sure that the existing
Would it be possible to share benchmark results comparing performance of a prototype vs the current implementation? Slightly related: #54715 |
This issue has been marked |
If we move away from
// char IndentCharacter
var options = new JsonWriterOptions
{
Indented = true,
IndentCharacter = '\t'
}; // byte IndentCharacter
var options = new JsonWriterOptions
{
Indented = true,
IndentCharacter = (byte) '\t',
}; A third option is a Something like this: public struct JsonWriterOptions
{
internal byte IndentByte { get; private set; }
public char IndentCharacter
{
get => (char)IndentByte;
set
{
if (value is not ' ' and not '\t' and not '\0')
{
throw new ArgumentException("...");
}
IndentByte = (byte)value;
}
}
} Perhaps a terrible idea, idk.
I think this might introduce confusion, for example var options = new JsonWriterOptions();
options.Indented = true; // IndentSize: 2 (JsonConstants.SpacesPerIndent)
options.IndentSize = 4; // IndentSize: 4
options.Indented = false; // IndentSize: 0?
options.Indented = true; // IndentSize: 2? It would be preferable if setting one does not affect the other. If
I ran the Utf8JsonWriter suite in dotnet/performance: https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/System.Text.Json/Utf8JsonWriter Here's the output of the ResultComparison tool
The The remaining 14 benchmarks with a performance difference >2% (out of 68 total) are split fairly evenly between slower and faster than the main branch. I presume my system is not producing very stable results in those benchmarks... I imagined that the WriteDeepUtf8 benchmark might reflect accurately on changes to indentation logic, so I ran it again on the main branch, two branches which implements
Thoughts? |
+1 |
@layomia @eiriktsarpalis any more input? Feel free to view the implementation here: noqn@e7a73b3 Current proposal: namespace System.Text.Json
{
public sealed class JsonSerializerOptions
{
+ public char IndentCharacter { get; set; }
+ public int IndentSize { get; set; }
}
public struct JsonWriterOptions
{
+ public char IndentCharacter { get; set; }
+ public int IndentSize { get; set; }
}
} Is there any chance this could be added to #63762? It seems realistic to implement in time for .NET 7 |
Yes please. Please also share benchmark results using statistical tests. We need to make sure no regressions are introduced in
It's unlikely we'll be able to drive API review for this in time for 7. If anything we'll probably end up cutting features from #63762. |
Thanks, here are the WriteDeep benchmarks results with DataSize decreased (from 100_000) to 10_000 for more stable results:
|
Thanks. We can take a look at this in API review. Could you update your OP with updated API proposal? |
Awesome, I've updated the proposal. |
@eiriktsarpalis Can this be tagged with |
@noqn done. |
Any movement on this? It's a strange inconsistency that when I open Visual Studio it formats JSON to 4 spaces but I can only Serialize to 2 spaces. |
@eiriktsarpalis can this be moved to the 8.0 milestone? It has been stuck in the API review backlog for 9 months due to being in "future". Thanks. |
namespace System.Text.Json
{
public sealed class JsonSerializerOptions
{
public string IndentText { get; set; }
}
public struct JsonWriterOptions
{
public string IndentText { get; set; }
}
} |
Hi, thanks for reviewing 😄
I have major concerns about implementing With that in mind, As someone mentioned in the review, it would be hard to parse (and adjust) the exact indentation with var jsonWriterOptions = new JsonWriterOptions()
{
Indented = true,
IndentText = " "
}; As opposed to: var jsonWriterOptions = new JsonWriterOptions()
{
Indented = true,
IndentSize = 8
}; Overall I feel like the old proposal is a more pleasing. In terms of performance, defining a single indent char allows us to keep utilizing Someone mentioned in the review about defining the indent character as an enum. I like this since compared to the old proposal it makes it more clear to the users which chars are permitted. I'll write this down as an alternative proposal: namespace System.Text.Json
{
public sealed class JsonSerializerOptions
{
public JsonIndentCharacter IndentCharacter { get; set; }
public int IndentSize { get; set; }
}
public struct JsonWriterOptions
{
public JsonIndentCharacter IndentCharacter { get; set; }
public int IndentSize { get; set; }
}
public enum JsonIndentCharacter
{
Space,
Tab
}
} Usage: var jsonSerializerOptions = new JsonSerializerOptions()
{
WriteIndented = true,
IndentSize = 1
IndentCharacter = JsonIndentCharacter.Tab
}; |
FYI Newtonsoft.Json has |
That certainly seems a bit awkward to me. @bartonjs I think we should reconsider the design here. |
After a long discussion, we seem to have returned to using a single string property. We did update the shape to account for the new source generator attribute, though. namespace System.Text.Json
{
public sealed class JsonSerializerOptions
{
public string IndentText { get; set; }
}
public struct JsonWriterOptions
{
public string IndentText { get; set; }
}
}
namespace System.Text.Json.Serialization
{
public partial class JsonSourceGenerationOptionsAttribute
{
public string IndentText { get; set; }
}
} |
I don't think 0 indent size (or empty I think it's fine if |
After reviewing the implementation PR kindly provided by @manandre, I'm having more concerns about the API shape as approved (i.e. configuration accepting an indentation string vs. a char/length pair):
Because of the above, I would again propose that we change the API shape to be as follows: namespace System.Text.Json
{
public sealed class JsonSerializerOptions
{
public char IndentCharacter { get; set; }
public int IndentSize { get; set; }
}
public struct JsonWriterOptions
{
public char IndentCharacter { get; set; }
public int IndentSize { get; set; }
}
}
namespace System.Text.Json.Serialization
{
public partial class JsonSourceGenerationOptionsAttribute
{
public char IndentCharacter { get; set; }
public int IndentSize { get; set; }
}
} |
namespace System.Text.Json
{
public sealed class JsonSerializerOptions
{
public char IndentCharacter { get; set; }
public int IndentSize { get; set; }
}
public struct JsonWriterOptions
{
public char IndentCharacter { get; set; }
public int IndentSize { get; set; }
}
}
namespace System.Text.Json.Serialization
{
public partial class JsonSourceGenerationOptionsAttribute
{
public char IndentCharacter { get; set; }
public int IndentSize { get; set; }
}
} |
Not meant to snatch this from under @manandre's nose, but since I had an implementation for the now-approved API laying around for ~2 years I figured I could just brush it off and make a pull request : ) |
Thanks, but I think the nice thing to do here would be to give @manandre a chance to finish their work. |
Hi. hope I don't come across as too blunt, but you asked me two years ago to implement and benchmark this proposal before it was even allowed for review, against the API review process guideline. I've now published a pull request, which currently sits with all checks passed. Respectfully, there is no reason to wait around for another implementation that would have to be rewritten from scratch anyways, when I've had this implementation in working condition for two years. I'd greatly appreciate it if you started reviewing the pull request I've delivered. Thanks. |
I understand, and we very much appreciate your prototyping effort which contributed to the final proposal being marked for review. However please keep in mind that:
|
Background and motivation
This proposal would enable Utf8JsonWriter to write indented JSON with either spaces or tabs, and with a variable number of whitespace characters per indentation.
To preface, indentation size was brought up around two years ago in #1174. Although the area owners were open for input, there were little discussion at the time and the issue was closed. Now that more eyes are on System.Text.Json I am hoping to revisit the topic.
As is well known, people form strong opinions about indentation. In a context where the serialized JSON will be inspected or manually edited by the users, there will inevitably be a demand for the application to conform to their indentation preferences. Adding this functionality to System.Text.Json would in turn let application developers provide indentation settings to their users.
This would also be useful for when the serialized JSON must match the indentation of the source files which were deserialized e.g., for version control purposes.
API Proposal
IndentCharacter
would define which whitespace character is used andIndentSize
would define the number of whitespace characters that would be written per indentation level.To match the behavior of
JsonWriterOptions.MaxDepth
, the default values ofIndentCharacter
('\0'
) andIndentSize
(0
) would represent fallback values (which would be properly assigned in theUtf8JsonWriter
constructor).API Usage
Alternative Designs
For the sake of comparison, this is how Json.NET does it:
Json.NET's
Indentation
property combines the behavior ofIndented
andIndentSize
by using the default value of 0 to disable formating altogether. Such a solution would however clash with our current API.The text was updated successfully, but these errors were encountered: