Use csFastFloat instead of double.Parse #1745

Closed
CarlVerret opened this issue Mar 16, 2021 · 6 comments
@CarlVerret

Hi Josh!

As CsvHelper is one of the fastest libraries available, we thought you'd be interested in making it even faster!

We recently published csFastFloat, a fast and accurate float parser. It is almost 7 times faster than the standard library in some cases while providing exact results. It is a C# port of Daniel Lemire's fast_float, originally written in C++.

Our benchmark demonstrates that replacing double.Parse with FastDoubleParser results in a real performance improvement. Results are shown in millions of floats parsed per second.

We parsed both single-column and multi-column files using CsvHelper (with a custom DefaultTypeConverter):

  • Canada.txt and mesh.txt are common data files used to test float parsing.
  • synthetic.csv is composed of 150 000 random floats.
  • World cities population data (100k/300k) are real data obtained from OpenDataSoft.
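
For readers who want to see the shape of such a converter, here is a minimal sketch. It is not the benchmark's actual code: the class name `FastDoubleConverter` is hypothetical, and the `ConvertFromString` signature assumes a CsvHelper release from around the time of this issue (newer releases add nullable annotations). It relies on `FastDoubleParser.ParseDouble` from the csFastFloat package.

```csharp
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.TypeConversion;
using csFastFloat;

// Hypothetical converter: routes double fields through csFastFloat
// instead of the default double.Parse-based conversion.
public class FastDoubleConverter : DefaultTypeConverter
{
    public override object ConvertFromString(string text, IReaderRow row, MemberMapData memberMapData)
    {
        // Throws on malformed input, as double.Parse would;
        // empty or missing fields are not handled in this sketch.
        return FastDoubleParser.ParseDouble(text);
    }
}
```

Assuming a CsvHelper version that exposes the converter cache on the reader's context, it could be registered with something like `csv.Context.TypeConverterCache.AddConverter<double>(new FastDoubleConverter());` before reading records.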

csFastFloat is available as a NuGet package.
The benchmark repo can be found here.

I'll be pleased to submit a Pull Request.

BenchmarkDotNet=v0.12.1, OS=ubuntu 20.04 (container)
AMD EPYC 7262, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.102
  [Host]        : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT

Job=.NET Core 5.0  Runtime=.NET Core 5.0  

|                                Method |               fileName | fileSize |      Mean |       Min | Ratio | MFloat/s |
|-------------------------------------- |----------------------- |--------- |----------:|----------:|------:|---------:|
|          'Double.Parse() - singlecol' |    TestData/canada.txt |     2088 |  84.20 ms |  83.62 ms |  1.00 |     1.33 |
| 'FastFloat.ParseDouble() - singlecol' |    TestData/canada.txt |     2088 |  41.44 ms |  41.00 ms |  0.49 |     2.71 |
|                                       |                        |          |           |           |       |          |
|          'Double.Parse() - singlecol' |      TestData/mesh.txt |      691 |  29.95 ms |  29.75 ms |  1.00 |     2.45 |
| 'FastFloat.ParseDouble() - singlecol' |      TestData/mesh.txt |      691 |  20.19 ms |  20.00 ms |  0.67 |     3.65 |
|                                       |                        |          |           |           |       |          |
|          'Double.Parse() - singlecol' | TestData/synthetic.csv |     2969 | 111.79 ms | 109.92 ms |  1.00 |     1.36 |
| 'FastFloat.ParseDouble() - singlecol' | TestData/synthetic.csv |     2969 |  54.57 ms |  53.86 ms |  0.49 |     2.79 |
|                                       |                        |          |           |           |       |          |
|           'Double.Parse() - multicol' |  TestData/w-c-100K.csv |     4842 | 187.54 ms | 185.87 ms |  1.00 |     1.08 |
|        'FastFloat.Parse() - multicol' |  TestData/w-c-100K.csv |     4842 | 166.85 ms | 163.05 ms |  0.89 |     1.23 |
|                                       |                        |          |           |           |       |          |
|           'Double.Parse() - multicol' |  TestData/w-c-300K.csv |    14526 | 593.10 ms | 579.37 ms |  1.00 |     1.04 |
|        'FastFloat.Parse() - multicol' |  TestData/w-c-300K.csv |    14526 | 502.65 ms | 494.41 ms |  0.85 |     1.21 |
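
The MFloat/s column appears to be derived from the Min column rather than the Mean. This is an inference from the numbers, not something stated above. It can be checked for synthetic.csv, the only file whose float count (150 000) is given. A small sketch, where the helper name `MFloatPerSecond` is made up for illustration:

```csharp
using System;
using System.Globalization;

// Assumption: MFloat/s = float count / (Min time in seconds) / 1e6.
static double MFloatPerSecond(int floatCount, double minMilliseconds) =>
    floatCount / (minMilliseconds / 1000.0) / 1_000_000.0;

// synthetic.csv, Double.Parse(): Min = 109.92 ms
Console.WriteLine(MFloatPerSecond(150_000, 109.92).ToString("F2", CultureInfo.InvariantCulture)); // 1.36

// synthetic.csv, FastFloat.ParseDouble(): Min = 53.86 ms
Console.WriteLine(MFloatPerSecond(150_000, 53.86).ToString("F2", CultureInfo.InvariantCulture)); // 2.79
```

Both values match the table. Using the Mean instead would give 1.34 for the first row, which does not.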

@JoshClose
Owner

That's quite impressive. I'm not sure I want to take on a dependency though. This would be a great thing to add to the contrib library. Someone could use a FastDoubleConverter if they wanted more speed. The contrib lib doesn't exist yet because no one has wanted to add the features they want in there. There is a repo though. https://github.com/CsvHelperContrib/CsvHelperContrib

Out of curiosity, why not submit a pull request to .NET and speed up the native implementation?

@CarlVerret
Author

Thanks. It is already filed: dotnet/runtime#48646

@JoshClose
Owner

I'll consider adding a converter for this in the contrib library. I don't have time at the moment, but I'll keep this open. I'm also watching the .NET framework issue you referenced.

@CarlVerret
Author

Let me know if I can be of any help!

@jzabroski

@CarlVerret It looks like this has been merged into the runtime, so this issue can be closed as resolved for .NET 7: dotnet/runtime#62301

@JoshClose
Owner

Awesome!
