Including FastFloat in parsing process #62301

CarlVerret · 2021-12-02T21:58:41Z

This PR adds a new fast path to floating-point number parsing. It is used when the existing fast path (Clinger's) does not apply, but before the existing 'slow path' is used. The corresponding issue is #48646.

To assess the performance effect, we have three realistic data sets (canada.txt, mesh.txt and synthetic.txt).

This PR
|         Method |      FileName |      Mean |     Error |    StdDev |    Median |       Min |       Max | MFloat/s |
|--------------- |-------------- |----------:|----------:|----------:|----------:|----------:|----------:|---------:|
| Double.Parse() |    canada.txt | 14.301 ms | 0.0465 ms | 0.0389 ms | 14.308 ms | 14.239 ms | 14.372 ms |     7.80 |
| Double.Parse() |      mesh.txt |  5.081 ms | 0.0130 ms | 0.0122 ms |  5.086 ms |  5.058 ms |  5.102 ms |    14.44 |
| Double.Parse() | synthetic.txt | 18.323 ms | 0.0356 ms | 0.0315 ms | 18.325 ms | 18.265 ms | 18.364 ms |     8.21 |

Upstream main
|         Method |      FileName |      Mean |     Error |    StdDev |    Median |       Min |       Max | MFloat/s |
|--------------- |-------------- |----------:|----------:|----------:|----------:|----------:|----------:|---------:|
| Double.Parse() |    canada.txt | 29.391 ms | 0.0505 ms | 0.0448 ms | 29.385 ms | 29.322 ms | 29.464 ms |     3.79 |
| Double.Parse() |      mesh.txt |  4.924 ms | 0.0153 ms | 0.0128 ms |  4.924 ms |  4.905 ms |  4.943 ms |    14.89 |
| Double.Parse() | synthetic.txt | 39.016 ms | 0.0423 ms | 0.0353 ms | 39.012 ms | 38.961 ms | 39.070 ms |     3.85 |

The mesh.txt data set is a scenario where we rely on Clinger's fast path so we would not expect this PR to improve performance in that case. And indeed, we see no significant difference.
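For context, Clinger's fast path applies when the decimal significand converts to a double without rounding (at most 2^53) and the power of ten is itself exactly representable (10^0 through 10^22). A single multiplication or division then gives the correctly rounded result. The following is a minimal Python sketch of that idea, not the runtime's C# code; the function name and the pre-split significand/exponent inputs are assumptions made for illustration.

```python
from typing import Optional

# Powers of ten that are exactly representable as IEEE 754 doubles: 10^0 .. 10^22.
EXACT_POW10 = [float(10 ** k) for k in range(23)]
# Every integer up to 2^53 converts to a double without rounding.
MAX_EXACT_SIGNIFICAND = 2 ** 53


def clinger_fast_path(significand: int, exponent10: int) -> Optional[float]:
    """Return significand * 10**exponent10 if the fast path applies, else None."""
    if significand < 0 or significand > MAX_EXACT_SIGNIFICAND:
        return None  # significand would be rounded on conversion
    if not -22 <= exponent10 <= 22:
        return None  # power of ten is not exactly representable
    value = float(significand)  # exact conversion
    if exponent10 >= 0:
        return value * EXACT_POW10[exponent10]  # one correctly rounded multiply
    return value / EXACT_POW10[-exponent10]     # one correctly rounded divide


print(clinger_fast_path(12345, -2))
print(clinger_fast_path(17976931348623157, 292))
```

The second call returns None because both the 17-digit significand and the exponent fall outside the bounds; inputs of that kind are the ones that previously went to the slow path.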

For canada.txt and synthetic.txt, this PR brings substantial gains: about 2x. Further gains are possible, as demonstrated by the csFastFloat library, but they would likely be more invasive and are probably best done in separate pull requests.
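The "about 2x" figure can be checked directly from the mean times in the two tables above. A small Python calculation using only those reported means (the dictionary and function names are illustrative):

```python
# Mean times in milliseconds, copied from the two benchmark tables above.
this_pr = {"canada.txt": 14.301, "mesh.txt": 5.081, "synthetic.txt": 18.323}
upstream = {"canada.txt": 29.391, "mesh.txt": 4.924, "synthetic.txt": 39.016}


def speedup(name: str) -> float:
    """Ratio of upstream mean time to this PR's mean time (above 1 means faster)."""
    return upstream[name] / this_pr[name]


for name in this_pr:
    print(f"{name}: {speedup(name):.2f}x")
```

This gives roughly 2.06x for canada.txt and 2.13x for synthetic.txt, while mesh.txt comes out near 0.97x, within a few percent of upstream.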

Links:
csFastFloat repo: https://github.com/CarlVerret/csFastFloat
Test data can be found here: https://github.com/CarlVerret/csFastFloat/tree/master/Benchmark/data

Benchmarks were run on my personal computer:
Intel Core i5-5900 2.90GHz / 12 GB RAM
under Windows 11

@EgorBo @tannergooding @lemire

ghost · 2021-12-02T21:58:49Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR is a contribution to runtime regarding issue #48646, adding Lemire's Fast-float algorithm to the double/single/half parsing process.

Steps in the draft will be:

1. Integration of the fast-float algorithm with minimal impact on the existing code for the 3 data types.
2. Elimination of bounds checks on table lookups (powers of 5/10).
3. String parsing optimisation with SIMD vectorization.

@EgorBo @tannergooding @lemire

Author:	CarlVerret
Assignees:	-
Labels:	`area-System.Numerics`, `community-contribution`
Milestone:	-

dnfadmin · 2021-12-02T21:58:53Z

All CLA requirements met.

EgorBo · 2021-12-03T09:00:53Z

Awesome!
Looks like some asserts are fired:

Process terminated. Assertion failed.
   at System.Number.DigitsToUInt64(Byte* p, Int32 count) in /_/src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs:line 1021

CarlVerret · 2021-12-03T21:54:31Z

> Awesome! Looks like some asserts are fired:
>
> Process terminated. Assertion failed.
>    at System.Number.DigitsToUInt64(Byte* p, Int32 count) in /_/src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs:line 1021

Yes, it looks like something is trying to read more than 19 digits...
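The 19-digit limit follows from the range of a 64-bit unsigned integer: every 19-digit decimal number fits, but a 20-digit one may not. A small Python check of that bound (the helper name is hypothetical, and this mirrors only the arithmetic, not the runtime's DigitsToUInt64):

```python
UINT64_MAX = 2 ** 64 - 1  # 18446744073709551615, a 20-digit number


def fits_in_uint64(digits: str) -> bool:
    """True if the decimal digit string fits in an unsigned 64-bit integer."""
    return int(digits) <= UINT64_MAX


print(fits_in_uint64("9" * 19))  # largest 19-digit value
print(fits_in_uint64("9" * 20))  # largest 20-digit value
```

The first call reports that the value fits and the second that it does not, which is why a digit accumulator of this kind would assert on its input length.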

lemire · 2021-12-06T18:43:24Z

@EgorBo @CarlVerret

A lot of the original code has been transformed to look like heap allocations (new ...). This is especially true of the constants...

https://github.com/CarlVerret/csFastFloat/blob/master/csFastFloat/Constants/Constants.cs

I am no C# expert but representing constant arrays as heap-allocated content seems likely to trigger heap allocations during the execution of the code which will surely harm the performance.

Of course, maybe this gets optimized away. But are we sure it is the case?

tannergooding · 2021-12-06T18:50:32Z

@lemire is correct and these should stay as static readonly until the language support for the ReadOnlySpan<T> pattern is added for other primitive types.

lemire · 2021-12-06T20:21:07Z

Reading the comments, I understand that @EgorBo suggested writing the code for the upcoming language support. Of course, this might imply that the current draft PR should be expected to have suboptimal performance with the current language support.

CarlVerret · 2021-12-07T13:12:03Z

I ran the existing benchmarks for both double and float, comparing the main branch with this PR.

Double

This PR:

| Method |                    value |      Mean |    Error |   StdDev |    Median |       Min |       Max | Allocated |
|------- |------------------------- |----------:|---------:|---------:|----------:|----------:|----------:|----------:|
|  Parse | -1,7976931348623157e+308 | 149.68 ns | 0.929 ns | 0.869 ns | 149.31 ns | 148.28 ns | 150.93 ns |         - |
|  Parse |  1,7976931348623157e+308 | 149.58 ns | 1.713 ns | 1.603 ns | 150.13 ns | 146.93 ns | 152.03 ns |         - |
|  Parse |                    12345 |  63.43 ns | 0.109 ns | 0.091 ns |  63.44 ns |  63.30 ns |  63.58 ns |         - |


Main:

| Method |                    value |      Mean |    Error |   StdDev |    Median |       Min |       Max | Allocated |
|------- |------------------------- |----------:|---------:|---------:|----------:|----------:|----------:|----------:|
|  Parse | -1,7976931348623157e+308 | 388.55 ns | 3.163 ns | 2.959 ns | 387.14 ns | 385.30 ns | 395.41 ns |         - |
|  Parse |  1,7976931348623157e+308 | 388.25 ns | 3.735 ns | 3.311 ns | 386.89 ns | 385.13 ns | 396.69 ns |         - |
|  Parse |                    12345 |  62.62 ns | 0.338 ns | 0.316 ns |  62.49 ns |  62.33 ns |  63.18 ns |         - |


Single

This PR:

| Method |          value |      Mean |    Error |   StdDev |    Median |       Min |       Max | Allocated |
|------- |--------------- |----------:|---------:|---------:|----------:|----------:|----------:|----------:|
|  Parse | -3,4028235E+38 | 111.35 ns | 0.886 ns | 0.829 ns | 111.33 ns | 110.06 ns | 113.02 ns |         - |
|  Parse |          12345 |  64.66 ns | 0.333 ns | 0.295 ns |  64.59 ns |  64.21 ns |  65.27 ns |         - |
|  Parse |  3,4028235E+38 | 111.72 ns | 0.838 ns | 0.784 ns | 111.52 ns | 110.68 ns | 112.92 ns |         - |

Main:

| Method |          value |      Mean |    Error |   StdDev |    Median |       Min |       Max | Allocated |
|------- |--------------- |----------:|---------:|---------:|----------:|----------:|----------:|----------:|
|  Parse | -3,4028235E+38 | 150.71 ns | 1.136 ns | 1.063 ns | 150.64 ns | 149.16 ns | 152.24 ns |         - |
|  Parse |          12345 |  63.86 ns | 0.550 ns | 0.515 ns |  63.85 ns |  63.31 ns |  64.95 ns |         - |
|  Parse |  3,4028235E+38 | 148.15 ns | 0.646 ns | 0.604 ns | 147.80 ns | 147.51 ns | 149.51 ns |         - |

stephentoub · 2021-12-07T15:09:22Z

> I understand that @EgorBo suggested writing the code for the upcoming language support. Of course, this might imply that the current draft PR should be expected to have suboptimal performance with the current language support.

I agree with @tannergooding and @lemire that we should stick with static readonly array fields until compiler support is actually in place. At that point, there are many other places we'll be updating, and we can just update these as part of that.

CarlVerret · 2022-02-03T23:25:52Z

Thank you Tanner. It has been a nice learning opportunity for me.

tannergooding · 2022-02-04T00:03:11Z

Thank you as well, excited to get this in once CI is passing (the OSX failure was unrelated; I've not looked at the other 2 yet).

CarlVerret · 2022-02-07T15:01:15Z

> Thank you as well, excited to get this in once CI is passing (the OSX failure was unrelated; I've not looked at the other 2 yet).

Is there something I can do to help with the failing checks?

tannergooding · 2022-02-07T15:37:12Z

We need to ensure they aren't related and ideally get CI passing.

It looks like there is an arm32 timeout that should have been resolved in #63357 and a WASM failure that doesn't look familiar (possibly #64759?).

If you push a merge commit integrating the current dotnet/main then it should pick up the fixes that have gone in and will ensure the rest of CI can finish.

I'm decently confident that the remaining failures are not related to your changes here and are instead CI issues on our side, so I can also take care of pushing the relevant merge commit if you would prefer

danmoseley · 2022-02-07T15:53:50Z

Simply clicking close then reopen will restart validation on rebased code of course. Assuming no conflicts.

danmoseley · 2022-02-07T15:54:54Z

@tannergooding historically you ran I think an extra JavaScript test corpus - is that necessary too?

tannergooding · 2022-02-07T16:09:48Z

@danmoseley, those run implicitly now: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Runtime/tests/System/DoubleTests.cs#L378-L419

There are always more test suites we could run on top, but I wouldn't expect they would catch anything additional here.

> Simply clicking close then reopen will restart validation on rebased code of course. Assuming no conflicts.

I've seen issues with that in the past and have found pushing a merge commit to be much more reliable.

CarlVerret · 2022-02-07T16:51:58Z

Just to make sure I'm doing the right thing here: you mean pulling main into my branch, then pushing my branch to update this PR?

tannergooding · 2022-02-07T17:01:44Z

> you mean pulling the main into my branch then push back my branch to update this PR?

Right, something roughly equivalent to:

git fetch --all
git merge dotnet/main
git push

Where dotnet/main could also be called upstream/main or something similar (whatever matches git remote -v and gives you https://github.com/dotnet/runtime)

tannergooding · 2022-02-08T19:33:41Z

OSX failures are #65000.

Going to merge this, thanks a lot for the contribution @CarlVerret (and @lemire for the original algorithm/base implementation)!

CarlVerret · 2022-02-08T21:17:55Z

@tannergooding maybe just a little question regarding this PR. Will this contribution be available only in a future version (7.x) of .NET, or will it also be included in previous versions (5.x, 6.x, etc.)? Is there an ETA for a public release of the runtime including this PR?

tannergooding · 2022-02-08T21:26:14Z

Only in future versions of .NET (7.x and later). We don't tend to backport perf improvements to past releases (even LTS) due to the risk.

This should be available in nightly runtime builds as soon as later today (https://github.com/dotnet/runtime/blob/main/docs/project/dogfooding.md has some more info on how to try out nightly builds) and in the nightly SDK likely in a few days (same link has info on this as well).

It should ship publicly in .NET 7 Preview 2

CarlVerret · 2022-02-08T21:35:30Z

Fine, thanks! I was asking because csFastFloat was built on .NET 5 and I'd like to run my benchmarks again to compare with 7.x. The goal is to evaluate the impact of another contribution we might suggest for the parsing process.

danmoseley · 2022-02-08T21:41:35Z

I probably missed it, but do the inputs to our existing benchmarks in dotnet/performance suffice here? Or should we extend with some of the inputs that proved this change? (your point taken about functional tests)

tannergooding · 2022-02-08T21:44:07Z

> but do the inputs to our existing benchmarks in dotnet/performance suffice here?

There are lots of improvements we could potentially make to the perf tests, but the current should be at least somewhat representative.

I'd be more interested in what things like ML.NET show with this change.

EgorBo · 2022-02-15T16:29:18Z

Nice improvements: dotnet/perf-autofiling-issues#3531 and dotnet/perf-autofiling-issues#3543 cc @CarlVerret

EgorBo · 2022-02-15T16:48:31Z

Potential regression: https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Buffers.Text.Tests.Utf8ParserTests.TryParseSingle(value%3a%2012345).html

tannergooding · 2022-02-15T16:52:16Z

Noting that the improvement is for long strings and is going from 300ns to 100ns while the regression is for a short string and is going from 35ns to 37ns.

This regression is acceptable, but we should do some minimal investigation to see what is slower here (it's probably just a branch or some other minor thing, and if it's trivially "fixable" then we should do so, but it's not worth spending a lot of time on it).

CarlVerret · 2022-02-15T17:38:16Z

While working on the PR, I tried to identify what could cause this small regression, without any luck. I would be really interested in any insight you can provide on how to fix that kind of variation.

tannergooding · 2022-02-15T17:44:00Z

I'll run this through AMD uProf (https://developer.amd.com/amd-uprof/) and see if I can see anything interesting.

Including FastFloat in parsing process

0b22f35

ghost added the community-contribution label Dec 2, 2021

dotnet-issue-labeler bot added the area-System.Numerics label Dec 2, 2021

EgorBo reviewed Dec 3, 2021

src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs

gfoidl reviewed Dec 3, 2021

src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs

CarlVerret and others added 2 commits December 3, 2021 20:37

Merge branch 'dotnet:main' into 48646-FastFloat-integration

9e04dd0

PR step 1 - adjusting code in regards to received comments.

ba592e0

CarlVerret commented Dec 4, 2021

src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs

DigitsToUInt64 : Parsing batches of 8 digits with SWAR

c1bf2df
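For readers unfamiliar with the technique named in this commit, SWAR ("SIMD within a register") treats eight ASCII digits as one 64-bit word and combines them with a few multiplications instead of a per-digit loop. The Python sketch below follows the widely circulated formulation used by the fast_float family of parsers. Whether the runtime's DigitsToUInt64 uses exactly these steps is an assumption, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wrap-around in Python


def parse_eight_digits_swar(chunk: bytes) -> int:
    """Convert exactly eight ASCII digits to an integer using word-wide arithmetic."""
    if len(chunk) != 8 or not chunk.isdigit():
        raise ValueError("expected exactly eight ASCII digits")
    # Load the bytes as a little-endian 64-bit word: the first digit is the low byte.
    val = int.from_bytes(chunk, "little")
    # Subtract ASCII '0' (0x30) from all eight bytes at once.
    val -= 0x3030303030303030
    # Each byte now becomes 10 * digit[i] + digit[i + 1], i.e. a two-digit pair.
    val = (val * 10 + (val >> 8)) & MASK64
    mask = 0x000000FF000000FF
    mul1 = 0x000F424000000064  # 100 + (1000000 << 32)
    mul2 = 0x0000271000000001  # 1 + (10000 << 32)
    # Combine the four two-digit pairs; the result lands in the upper 32 bits.
    val = ((val & mask) * mul1 + ((val >> 16) & mask) * mul2) & MASK64
    return val >> 32


print(parse_eight_digits_swar(b"12345678"))
```

The word is read in little-endian order because this formulation depends on the first digit occupying the lowest byte; on a big-endian machine the loaded word would need to be byte-swapped first.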

gfoidl reviewed Dec 4, 2021

CarlVerret and others added 2 commits December 4, 2021 16:29

Update src/libraries/System.Private.CoreLib/src/System/Number.NumberT…

93a4caa

…oFloatingPointBits.cs Co-authored-by: Günther Foidl <[email protected]>

Update src/libraries/System.Private.CoreLib/src/System/Number.NumberT…

458476e

…oFloatingPointBits.cs Co-authored-by: Günther Foidl <[email protected]>

CarlVerret marked this pull request as ready for review December 5, 2021 13:56

CarlVerret marked this pull request as draft December 5, 2021 18:54

lemire reviewed Dec 6, 2021

src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs

lemire reviewed Dec 6, 2021

src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs

CarlVerret added 4 commits December 6, 2021 19:41

MaxMantissaFastPath fix

a47557d

Merge branch '48646-FastFloat-integration' of https://github.com/Carl…

a288f30

…Verret/runtime into 48646-FastFloat-integration

merge problem...

1b5fd51

Revert "merge problem..."

5ff2eef

This reverts commit 1b5fd51.

lemire reviewed Dec 7, 2021

src/libraries/System.Private.CoreLib/src/System/Number.NumberToFloatingPointBits.cs

Handle endianness swapping for BigEndian systems.

c86080b

tannergooding approved these changes Feb 3, 2022

runfoapp bot mentioned this pull request Feb 4, 2022

[wasm][aot] System.Text.Json tests fail while linking due to OOM #61524

Closed

Merge branch 'dotnet:main' into 48646-FastFloat-integration

5d0efb9

This was referenced Feb 8, 2022

System.Net.NetworkInformation.Tests.PingTest.SendPingWithHostAndTimeoutAndBuffer failing on OSX #64963

Closed

ThreadPoolTests.CooperativeBlockingCanCreateThreadsFaster timed out #64964

Closed

tannergooding merged commit a702712 into dotnet:main Feb 8, 2022

CarlVerret deleted the 48646-FastFloat-integration branch February 8, 2022 21:20

EgorBo mentioned this pull request Feb 17, 2022

[Perf] Changes at 2/8/2022 11:58:21 PM dotnet/perf-autofiling-issues#3563

Closed

ghost locked as resolved and limited conversation to collaborators Mar 17, 2022
