Optimize floating-point formatting #1426
Related to #1559 (There is a three times faster double-to-string implementation).
Has this materialized? I use a LOT of those conversions, so this would be good news.
No, but {fmt}'s floating-point formatter is already quite fast (see e.g. the benchmark results in https://github.com/fmtlib/fmt/releases/tag/6.1.0), so if you switch from
Thanks.
I recommend using 6.1.2 or master, which includes the same optimizations as 6.1.0 plus some fixes.
Yes.
@vitaut, maybe fmt could use the new charconv in the STL? It uses Ryu, which (according to Stephan T. Lavavej, its implementer) is 2x to 3x as fast as double-conversion (which you seem to be more or less on par with).
It could be possible once charconv becomes widely available. Note that Ryu is only marginally faster than optimized Grisu (or slower than some implementations) on independent benchmarks with random input, and it has several times larger data tables. Grisu also has the nice property of being faster on short output. In real applications the difference will be negligible because of other factors.
Ok, awesome. I've started making the switch to fmtlib. Thanks a BUNCH for this library!
{fmt}'s Grisu3 is now on par with or better than Milo Yip's and the original Grisu2 implementations despite providing shortness guarantees:
@data-man, sorry for deleting your comment - trying to avoid links to spammy repos.
Another option is to integrate https://github.com/jk-jeon/dragonbox by @jk-jeon. The only problem is that it would introduce another license.
FMT currently has the permissive MIT license, which means just about everyone in the world can use it. If more licenses are piled on, this project will become largely useless to many people in the world (I'm looking at you, GPL3).
@SanderBouwhuis I don't have any good knowledge of licensing, but the BSL (Boost Software License) is an even more permissive license than MIT. Still, having to depend on a library with a different license can be a problem and might degrade the current ease of integration.
@vitaut I'm afraid dragonbox is not a perfect fit for fmt, as it does not provide an option for fixed-precision conversion. Unlike Grisu3, it does not perform any iterative search for the shortest representation; rather, it somehow finds the shortest decimal number with just one trial-and-error, so it would be more difficult (if not impossible) to modify it for fixed-precision conversion. If you want separate implementations for shortest-roundtrip and fixed-precision, I can consider rewriting the whole thing (excluding the features fmt does not want) and licensing it under MIT, but having two implementations feels like a big trade-off.
That's OK, {fmt} can continue using Grisu3+Dragon4 for this.
If dragonbox or some part of it could be distributed under the same license as {fmt} (MIT with an optional exception for binary distribution), that would be awesome.
Great 😁 Let's start discussing what's needed and what's not needed, then. Let's cross out the big things first. Could you let me know which of these you think are worth keeping?
(I didn't implement the compressed cache version for float, but it would not be difficult, so you can consider using that version as well. I expect the size-vs-speed trade-offs to be smaller, percentage-wise, than those for double.)

There is also material I wrote for possible extensions to 16-bit, 128-bit, or 80-bit floating-point numbers. The thing is, it is not possible to just look at the type and figure out what encoding format the type uses internally, because that depends very, very much on the specific platform; 128-bit and 80-bit floats especially are kind of messy. Hence, the idea was to eliminate explicit mentions to

But if you care only about "sane" platforms where

Given that lots of things in the list above will be removed, I expect it would not be so painful to port the code so that the list you wrote here can stay the same.

And about licensing: I believe it should be okay to copy-paste-and-modify my own code and distribute it under a different license, but to be sure, I'll ask a question on Software Engineering Stack Exchange. Thanks!
I did some experiments on how much fmt can be made faster by adapting dragonbox. The following (plus some namespace hacks to avoid name clashes) is the change I made to fmt for this experiment:

if (precision < 0) {
#if 0
fp fp_value;
auto boundaries = specs.binary32
? fp_value.assign_float_with_boundaries(value)
: fp_value.assign_with_boundaries(value);
fp_value = normalize(fp_value);
// Find a cached power of 10 such that multiplying value by it will bring
// the exponent in the range [min_exp, -32].
const fp cached_pow = get_cached_power(
min_exp - (fp_value.e + fp::significand_size), cached_exp10);
// Multiply value and boundaries by the cached power of 10.
fp_value = fp_value * cached_pow;
boundaries.lower = multiply(boundaries.lower, cached_pow.f);
boundaries.upper = multiply(boundaries.upper, cached_pow.f);
assert(min_exp <= fp_value.e && fp_value.e <= -32);
--boundaries.lower; // \tilde{M}^- - 1 ulp -> M^-_{\downarrow}.
++boundaries.upper; // \tilde{M}^+ + 1 ulp -> M^+_{\uparrow}.
// Numbers outside of (lower, upper) definitely do not round to value.
grisu_shortest_handler handler{buf.data(), 0,
boundaries.upper - fp_value.f};
auto result =
grisu_gen_digits(fp(boundaries.upper, fp_value.e),
boundaries.upper - boundaries.lower, exp, handler);
if (result == digits::error) {
exp += handler.size - cached_exp10 - 1;
fallback_format(value, -1, specs.binary32, buf, exp);
return exp;
}
buf.try_resize(to_unsigned(handler.size));
#else
// full cache version
auto dec = jkj::dragonbox::to_decimal(value,
jkj::dragonbox::policy::sign::ignore);
// compressed cache version
/*auto dec = jkj::dragonbox::to_decimal(value,
jkj::dragonbox::policy::sign::ignore,
jkj::dragonbox::policy::cache::compressed);*/
int length = detail::count_digits(dec.significand);
buf.try_resize(to_unsigned(length));
uint32_t dec_significand;
int i = 0;
if (dec.significand >= 100000000) {
dec_significand = uint32_t(dec.significand / 100000000);
uint32_t r = uint32_t(dec.significand) - 100000000 * dec_significand;
uint32_t r1 = r / 10000;
uint32_t r2 = r % 10000;
uint32_t a = r1 / 100;
uint32_t b = r1 % 100;
uint32_t c = r2 / 100;
uint32_t d = r2 % 100;
copy2(buf.data() + length - 8, data::digits[a]);
copy2(buf.data() + length - 6, data::digits[b]);
copy2(buf.data() + length - 4, data::digits[c]);
copy2(buf.data() + length - 2, data::digits[d]);
i += 8;
}
else {
dec_significand = uint32_t(dec.significand);
}
while (dec_significand >= 10000) {
uint32_t r = dec_significand % 10000;
dec_significand /= 10000;
copy2(buf.data() + length - i - 4, data::digits[r / 100]);
copy2(buf.data() + length - i - 2, data::digits[r % 100]);
i += 4;
}
if (dec_significand >= 100) {
uint32_t r = dec_significand % 100;
dec_significand /= 100;
copy2(buf.data() + length - i - 2, data::digits[r]);
i += 2;
}
if (dec_significand >= 10) {
copy2(buf.data() + length - i - 2, data::digits[dec_significand]);
}
else {
*buf.data() = char('0' + dec_significand);
}
return dec.exponent;
#endif
}

The following is the result (source code: https://github.com/jk-jeon/dtoa-benchmark/tree/fmt_dragonbox_test):

So the speed-up is about 30% for the compressed version and about 50% for the full version. I think the main reason it is slower than the corresponding upstream versions of dragonbox is unnecessary intermediate buffer copies for prettifying, but fixing that would probably require changing the code a lot.
@jk-jeon, it would be great if you added my implementation to your benchmarks.
Good. I included yours and reran the benchmark with clang-cl (which optimizes virtually all algorithms in the benchmark better than msvc does). But I'm not sure why yours does not perform as advertised in your repo. Did you use some intrinsics that GCC knows but Clang doesn't? Or it might be because your code wrongly assumes that the compiler is msvc.
FYI, here is the result of the same benchmark compiled with msvc:
Thanks a lot.
Hmm, well, I was definitely not trying to cheat... I ran with gcc 9.3.0 (Ubuntu 9.3.0-10ubuntu2) on an i7-4600U CPU @ 2.10GHz:
So I see fmt is now ~10x faster than sprintf, rather than the ~7.7x before (gcc 9.2, Fedora 30, i7-7820HQ).
I don't use MSVC for my daily work, much less clang-cl (an additional toolchain for MSVC 2019).
clang version 10.0.0 (Fedora 10.0.0-2.fc32) on Fedora 32, i7-7820HQ CPU:
gcc version 10.2.1 20200723 (Red Hat 10.2.1-1) on the same machine:
So the clang code is ~1.5x slower than the gcc code.
Ah, I remember clang has a bug regarding modulo operations. That is probably the prime reason, I guess. Could you update your code and rerun the benchmark with clang? The fix should be simple; just replace
Also,
The first try. It's too late for me now (3am local time).
@erthink Yeah, c u 2mr.
EDIT: never mind, it was my mistake. The performance improvement was about
These two look essential:
{fmt} always uses round to nearest even mode (e.g. when resolving ties). The full cache is not essential but could be included as an option if it's not too difficult. Right now only IEEE 754 binary32 and binary64 are handled since those are most widely used. I wouldn't worry about exotic formats right now and let them be handled by the fallback.
That's a cool speedup. The prettification code and its integration with the digit generator have a lot of room for improvement and can be optimized separately.
If you are the author, you can definitely distribute your code under any license. Your contribution to {fmt} will of course be acknowledged.
I think it depends on how you provide the option to the users. Maybe the easiest (and dirtiest) way is to use a macro and conditional compilation. But I'm concerned about ODR violation hell if we choose that path. Exactly for this reason, I personally dislike any library using macros and conditional compilation for the configuration, but it would be very easy to include the full cache version if we do that.
(To be precise, the speed-up was about 20% for the compressed version and about 30-40% for the full version when compiled with clang, and I think these are probably the more relevant numbers, as clang currently optimizes the code better, which means the improvement for msvc might shrink in the future.) Currently, {fmt} first generates all the digit characters and then prettifies them, but I think there is not much reason to do that if you adapt dragonbox. But that's probably another topic.
Yeah, I was over-worried lol. Anyway, I think I can now start porting the code :)
@jk-jeon
@SanderBouwhuis I think using multiple headers is no better than the macro/conditional-compilation solution as far as the ODR issue is concerned. But maybe I'm too sensitive to that issue.
@vitaut I have a question regarding 128-bit integer types. It seems that currently,
Is this correct? Dragonbox requires some 128-bit integer arithmetic, so I will define a struct for that. Can I assume that if
More precisely, here is the definition of

#if (defined(__GNUC__) || defined(__clang__)) && defined(__SIZEOF_INT128__) && defined(__x86_64__)

with

#if FMT_USE_INT128

and

struct uint128 {
uint128() = default;
#if (defined(__GNUC__) || defined(__clang__)) && defined(__SIZEOF_INT128__) && defined(__x86_64__)
unsigned __int128 internal_;
constexpr uint128(std::uint64_t high, std::uint64_t low) noexcept :
internal_{ ((unsigned __int128)low) | (((unsigned __int128)high) << 64) } {}
constexpr uint128(unsigned __int128 u) : internal_{ u } {}
constexpr std::uint64_t high() const noexcept {
return std::uint64_t(internal_ >> 64);
}
constexpr std::uint64_t low() const noexcept {
return std::uint64_t(internal_);
}
uint128& operator+=(std::uint64_t n) & noexcept {
internal_ += n;
return *this;
}
#else
std::uint64_t high_;
std::uint64_t low_;
constexpr uint128(std::uint64_t high, std::uint64_t low) noexcept :
high_{ high }, low_{ low } {}
constexpr std::uint64_t high() const noexcept {
return high_;
}
constexpr std::uint64_t low() const noexcept {
return low_;
}
uint128& operator+=(std::uint64_t n) & noexcept {
#if defined(_MSC_VER) && defined(_M_X64)
auto carry = _addcarry_u64(0, low_, n, &low_);
_addcarry_u64(carry, high_, 0, &high_);
return *this;
#else
auto sum = low_ + n;
high_ += (sum < low_ ? 1 : 0);
low_ = sum;
return *this;
#endif
}
#endif
};
I think a macro is fine, there is a bunch of configuration macros already which are mainly for targeting resource-constrained systems. It's possible to turn ODR violations into compile (or link) errors if desired but we can address it later. In general users are expected to not mix configurations.
Looks correct.
Yes.
Yes. Thanks for working on this!
@vitaut I think I'm almost done, but here is another question. As I understand it, the function
A couple of other small issues:
Sorry for spamming.
It falls back to Line 1131 in 45da432
but we should probably separate
Yes, please.
Sure.
No, it's accidental. Feel free to change the MSVC version to return
Thanks to @jk-jeon, I guess we are done here. I'll open a new issue to separate shortest and fixed FP formatting (and make integration of the former more efficient).
{fmt} is now ~10% faster than double-conversion and 15x faster than libc++ std::ostringstream on floating-point formatting according to the dtoa-benchmark, but there is still a lot of room for improvement. #1097 gives some optimization ideas.