Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-musl f32 functions? #298

Open
jdh8 opened this issue Aug 11, 2024 · 13 comments
Open

Non-musl f32 functions? #298

jdh8 opened this issue Aug 11, 2024 · 13 comments

Comments

@jdh8
Copy link

jdh8 commented Aug 11, 2024

Are you interested in non-musl implementations, especially f32 functions faster on modern architectures? I am writing math functions in Rust from scratch, based on my own research since 2017:
https://github.com/jdh8/metallic-rs

My functions are tested to be faithfully rounded (error < 1 ulp). Unlike most C libraries, I assume that f64 instructions are available even for f32 functions and also try to use FMA if possible. I also keep LUTs to a minimum with WebAssembly in mind.

Benchmarks show that my implementations are faster than musl (libm) and glibc (std on Linux) except for libm::logf
https://github.com/jdh8/metallic-rs/actions/runs/10342894582/job/28626374065

@tgross35
Copy link
Contributor

@jdh8 do you have any specific examples of which specific functions you are talking about? Supplying math symbols that are not exposed in std is more or less a non-goal of our libm.

@jdh8
Copy link
Author

jdh8 commented Aug 12, 2024

I have published metallic, so its docs are publicly available.
https://docs.rs/metallic/latest/metallic/index.html

For now I have implemented cbrt, exp, exp2, exp10, exp_m1, frexp, hypot, ldexp, ln, ln_1p, log2, log10, round for f32.

Functions not yet in std

exp10

f32::exp10/metallic::f32::exp10
                        time:   [1.4450 ns 1.4468 ns 1.4491 ns]
f32::exp10/libm::exp10f time:   [8.7490 ns 8.7607 ns 8.7763 ns]

frexp

f32::frexp/metallic::f32::frexp
                        time:   [522.51 ps 523.83 ps 525.21 ps]
f32::frexp/libm::frexpf time:   [2.1766 ns 2.1782 ns 2.1802 ns]

ldexp

f32::ldexp/metallic::f32::ldexp
                        time:   [731.73 ps 732.56 ps 733.40 ps]
f32::ldexp/libm::ldexpf time:   [8.5147 ns 8.5303 ns 8.5475 ns]

Examples of functions already in std

cbrt

f32::cbrt/metallic::f32::cbrt
                        time:   [1.2175 ns 1.2204 ns 1.2240 ns]
f32::cbrt/libm::cbrtf   time:   [1.5684 ns 1.5699 ns 1.5717 ns]
f32::cbrt/f32::cbrt     time:   [15.171 ns 15.177 ns 15.184 ns]

exp

f32::exp/metallic::f32::exp
                        time:   [1.4475 ns 1.4538 ns 1.4635 ns]
f32::exp/libm::expf     time:   [9.4960 ns 9.5048 ns 9.5178 ns]
f32::exp/f32::exp       time:   [4.2191 ns 4.2225 ns 4.2269 ns]

hypot

f32::hypot/metallic::f32::hypot
                        time:   [770.30 ps 771.21 ps 772.27 ps]
f32::hypot/libm::hypotf time:   [3.4183 ns 3.4193 ns 3.4204 ns]
f32::hypot/f32::hypot   time:   [3.7229 ns 3.7271 ns 3.7328 ns]

log2

f32::log2/metallic::f32::log2
                        time:   [1.6325 ns 1.6335 ns 1.6347 ns]
f32::log2/libm::log2f   time:   [5.7100 ns 5.7170 ns 5.7267 ns]
f32::log2/f32::log2     time:   [3.2606 ns 3.2647 ns 3.2721 ns]

@burrbull
Copy link
Contributor

Is your crate no_std compatible?

@beetrees
Copy link
Contributor

Which platform are you doing the performance comparison on? Looking at the performance numbers, you've math builtins don't appear to be coming from Rust libm (and are presumably coming from the system libc); it would be good to know which libc your system is using for comparison purposes.

You've stated that the accuracy of all the functions is error < 1 ULP; is this just an upper bound or can this amount of error actually occur for each functions? Using this paper it's possible to compare to the accuracy of the MUSL versions of cbrtf, expf, hypotf, log2f and exp10f (note that Rust libm's expf and log2f implementations are out-of-date in comparison to MUSL upstream); notably all MUSL's functions except for exp10f have a maximum error smaller than 1 ULP (with the caveat that it's impossible to exhaustively test the bivariate hypotf; see the paper for details), with cbrtf being correctly rounded (error <= 0.5 ULP) in all cases.

I ask because both the CORE-MATH project and LLVM libc both provide correctly rounded (error <= 0.5 ULP) implementations of the functions that have competitive performance (a performance comparison with glibc is displayed on the CORE-MATH project website). All else being equal, it would be good to move towards Rust libm having correctly-rounded implementations where possible.

@jdh8
Copy link
Author

jdh8 commented Aug 12, 2024

I'm using ubuntu-latest, the GitHub Actions runner image.
https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/.github/workflows/rust.yml#L23

I just tested if the errors are < 1 ulp. I have not yet computed the maximum error for each function.
https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/tests/f32/main.rs#L36

Wow, correctly rounded transcendental functions exist! I dare not dream about this because of the Table Maker's Dilemma.

@beetrees
Copy link
Contributor

beetrees commented Aug 13, 2024

Re. the correctly rounded transcendental functions and the table maker's dilemma: this paper goes in to how the dilemma is dealt with. (For single-argument f32 functions in particular, it's possible to do an exhaustive computation using every possible input.)

@jdh8
Copy link
Author

jdh8 commented Aug 13, 2024

True, exhaustive computation for univariate f32 functions is pretty feasible as I've already tested them exhaustively with their f64 counterparts.

If correct rounding is a goal, I don't think I can make anything better than CORE-MATH. Maybe we should port CORE-MATH or LLVM libc instead? I've read several math functions from CORE-MATH. These functions are already using my assumptions such as calling f64 instructions in f32 functions. (C has floating-point contraction so it does not need special care for FMA.)

@beetrees
Copy link
Contributor

beetrees commented Aug 14, 2024

I've also been thinking of porting from CORE-MATH: the implementations seem to be self contained (making it easier to port) and well tested against the GNU MPFR library. FYI per rust-lang/rust#128386 (comment) I believe @alexpyattaev is currently looking into porting a new implementation of f64 log.

@burrbull
Copy link
Contributor

For note, Rust has good wrappers for MPFR. See rug

@jdh8
Copy link
Author

jdh8 commented Aug 14, 2024

I gain my confidence back! There is still room for optimization because we can ignore the floating-point environment. I'm upgrading my library to correct rounding.

@JSorngard
Copy link
Contributor

I also have a project that's semi-related to this issue: I have developed an implementation of the Lambert W function: https://crates.io/crates/lambert_w. Would integrating this capability be interesting? It is no_std and has 50 bits of accuracy according to the paper it is based on. It currently only has support for f64, hence "semi-related".

@alexpyattaev
Copy link

Thanks, but as far as I know Lambert's W is not in standard library, and this issue is about functions in std. Whether it belongs in std or not is a separate question entirely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants