Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is Crystal slower than Ruby in generating randoms? #6340

Closed
lxxxvi opened this issue Jul 6, 2018 · 40 comments
Closed

Is Crystal slower than Ruby in generating randoms? #6340

lxxxvi opened this issue Jul 6, 2018 · 40 comments

Comments

@lxxxvi
Copy link

lxxxvi commented Jul 6, 2018

Hey there. I've just got a question...

Is Crystal slower than Ruby in generating randoms?

While playing around with Crystal I noticed that Ruby is quite faster than Crystal in generating randoms.

I wrote a Crystal script that performs 10 million randoms (hex, base64, rand and random_bytes) and measures the times.

I did the same in Ruby and found that the Ruby script is significantly faster than Crystal.

Is this expected? Are Crystal randoms more secure and thus more expensive?

Generator Amount Time in Crystal Time in Ruby
hex 10M ~ 47.4s ~ 8.8s
base64 10M ~ 52.1s ~ 10.6s
rand 10M ~ 43.8s ~ 6.1s
random_bytes 10M ~ 47.8s ~ 5.8s

(In Crystal I used Random, in Ruby I used SecureRandom)

$ crystal -v
Crystal 0.25.1 (2018-06-29)

LLVM: 5.0.2
Default target: x86_64-apple-macosx
$ ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]

Environment

  • macOS High Sierra 10.13.5 (17F77)
  • 1.4 GHz Intel Core i7
  • 8 GB 1867 MHz LPDDR3

(This description and all scripts are available at https://github.com/lxxxvi/ruby-vs-crystal-random-performance )

@ysbaddaden
Copy link
Contributor

  1. You must use --release to benchmark anything, this is especially true for CPU-heavy computations.
  2. You must compare what's comparable. For example the global rand function or SecureRandom (Ruby) vs Random::Secure (Crystal).

@kostya
Copy link
Contributor

kostya commented Jul 6, 2018

did you compile it with --release? for me 0.25.1 is 10-20% faster than ruby.

@ysbaddaden
Copy link
Contributor

Comparing what's comparable, namely the default random and the secure random source (/dev/urandom):

# ruby
require "benchmark"
require "securerandom"

N = 1_000_000
RANGE = 0...999_999

Benchmark.bm do |x|
  x.report("rand()") { N.times { rand(RANGE) } }
  x.report("SecureRandom.rand") { N.times { SecureRandom.rand(RANGE) } }
  x.report("SecureRandom.random_bytes") { N.times { SecureRandom.random_bytes(16) } }
end
# crystal
require "benchmark"
require "random/secure"

N = 1_000_000
RANGE = 0...999_999

Benchmark.bm do |x|
  x.report("rand()") { N.times { rand(RANGE) } }
  x.report("Random::Secure.rand") { N.times { Random::Secure.rand(RANGE) } }
  x.report("Random::Secure.random_bytes") { N.times { Random::Secure.random_bytes(16) } }
end

No comment:

$ ruby bench.rb
                           user     system      total        real
rand()                     0.134299   0.000000   0.134299 (  0.134408)
SecureRandom.rand          1.159312   2.414552   3.573864 (  3.580610)
SecureRandom.random_bytes  0.782822   3.070608   3.853430 (  3.860538)

$ crystal run --release bench.cr
                                  user     system      total        real
rand()                        0.010000   0.000000   0.010000 (  0.012934)
Random::Secure.rand           0.130000   1.200000   1.330000 (  1.337170)
Random::Secure.random_bytes   0.190000   1.990000   2.180000 (  2.128027)

@lxxxvi
Copy link
Author

lxxxvi commented Jul 6, 2018

@ysbaddaden Thanks for helping me comparing it properly, I appreciate that.

I ran your benchs, but used N = 10_000_000 and got different results.

$ ruby bench.rb

                                  user     system      total        real
rand()                        1.485554   0.004912   1.490466 (  1.497553)
SecureRandom.rand             6.764457   0.008698   6.773155 (  6.783643)
SecureRandom.random_bytes     4.735023   0.004706   4.739729 (  4.745405)


$ crystal run --release bench.cr

                                  user      system       total         real
rand()                        0.050000    0.000000    0.050000 (   0.053843)
Random::Secure.rand           5.330000    8.670000   14.000000 (  14.022774)
Random::Secure.random_bytes   5.900000   18.360000   24.260000 (  24.136340)

What could I be doing wrong now?

@asterite
Copy link
Member

asterite commented Jul 6, 2018

@lxxxvi What's your OS? I'm on OSX and my results are:

# ruby
                               user     system      total        real
rand()                     1.860000   0.000000   1.860000 (  1.875924)
SecureRandom.rand         15.350000   0.030000  15.380000 ( 15.447070)
SecureRandom.random_bytes 14.920000   0.040000  14.960000 ( 15.049611)

# crystal
                                  user     system      total        real
rand()                        0.090000   0.000000   0.090000 (  0.089923)
Random::Secure.rand           5.400000   9.760000   15.160000 (  15.251411)
Random::Secure.random_bytes   6.340000   21.350000   27.690000 (  27.635482)

So on OSX Crystal is definitely slower than Ruby, at least for random_bytes.

@yxhuvud
Copy link
Contributor

yxhuvud commented Jul 6, 2018

Hmm, is there some sort of precision cutoff going on for the crystal values? The values seem suspiciously round.

@asterite
Copy link
Member

asterite commented Jul 6, 2018

@yxhuvud Indeed!

It seems Ruby 2.5.1 uses libc getrusage instead of times. Before that, Ruby used times and it gave the same bad precision. Maybe we should use getrusage, but it seems it's not always defined.

@asterite
Copy link
Member

asterite commented Jul 6, 2018

@lxxxvi Also, what version of Ruby are you using? It seems 2.5.1 now uses urandom if available, previoulsy it used openssl... I'm installing 2.5.1 right now to run the benchmark again.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Jul 6, 2018

I used the very slow random_bytes(Int) which allocates a 16 bytes slice on the HEAP on each call, which will trigger the GC at some point, because there is no Ruby equivalent to Crystal's recommendation to fill a Slice (HEAP or stack allocated):

N = 10_000_000
bytes = Bytes.new(16)
N.times { Random::Secure.random_bytes(bytes) }
$ crystal run --release bench.cr
                                  user     system      total        real
Random::Secure.random_bytes   1.060000   19.240000   20.300000 (  20.316772)

(can't compare with Ruby anymore)

@lxxxvi
Copy link
Author

lxxxvi commented Jul 6, 2018

@asterite

I use(d) these versions (see the issue's description)

$ crystal -v
Crystal 0.25.1 (2018-06-29)

LLVM: 5.0.2
Default target: x86_64-apple-macosx
$ ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]

Environment

  • macOS High Sierra 10.13.5 (17F77)
  • 1.4 GHz Intel Core i7
  • 8 GB 1867 MHz LPDDR3

@asterite
Copy link
Member

asterite commented Jul 6, 2018

It seems Ruby is using arc4random_buf, which might be faster than reading from urandom:

https://github.com/ruby/ruby/blob/trunk/random.c#L472-L479

So maybe that's the reason Ruby is faster?

/cc @ysbaddaden

@asterite
Copy link
Member

asterite commented Jul 6, 2018

(I wonder how can we detect if a C function is present, when compiling... otherwise I don't know how can we conditionally use arc4random_buf if it's available)

@asterite
Copy link
Member

asterite commented Jul 6, 2018

Doing this in Crystal:

lib LibC
  fun arc4random_buf(buf : UInt8*, bytes : SizeT)
end

time = Time.now
10_000_000.times do
  bytes = Bytes.new(16)
  LibC.arc4random_buf(bytes, 16)
end
puts Time.now - time

Takes 0.24 seconds. Assuming Ruby uses that (I think so), doing it in Ruby:

require "securerandom"

time = Time.now
10_000_000.times do
  SecureRandom.random_bytes(16)
end
puts Time.now - time

Takes 13.8 seconds.

So here's our chance to optimize :-)

By the way, arc4random_buf seems to be cryptographically secure, according to the man:

These functions use a cryptographic pseudo-random number generator to generate high quality random bytes very quickly. One data pool is used for all consumers in a process, so that consumption under program flow can act as additional stirring. The subsystem is re-seeded from the kernel random number subsystem on a regular basis, and also upon fork(2).

@lxxxvi
Copy link
Author

lxxxvi commented Jul 6, 2018

@asterite That looks promising 👍

What could be the reason that Random::Secure.rand isn't faster?

Looking at your benchmark results, the times for SecureRandom.rand vs Random::Secure.rand is quite interesting too... Crystal is faster in user but slower in system, overall it's about as fast (or slow) as Ruby.

@asterite
Copy link
Member

asterite commented Jul 6, 2018

@lxxxvi Random::Secure.rand also uses urandom in Crystal

@chris-huxtable
Copy link
Contributor

chris-huxtable commented Jul 6, 2018

@asterite - arc4random and friends were originally built by the folks at OpenBSD. Its designed to be a easier to use, better, and faster source of cryptographically secure random. MacOS has had it for ages though they moved away from rc4 and use yarrow internally (OpenBSD also moved from rc4 to ChaCHa internally). The code for this should already be in the stdlib as the OpenBSD Random::Secure (implementation) already uses it.

Adding:

{% elsif flag?(:darwin) %}
  require "./unix/arc4random"

to this and changing a few other things in lib c should cover it.

I would also like to see arc4random_uniform like methods be added to the stdlib as its free of modulo bias an easy mistake often made. Perhaps when I have a bit more time I will open a PR.

arc4random for anyone thats interested:

@ysbaddaden
Copy link
Contributor

Well, arc4random is only deemed secure on OpenBSD (because ChaCha, and slow). See libsodium sysrandom, which I trust the most.

Different systems have different secure sources. Linux 4.x added the getrandom syscall, which may be faster than going through /dev/urandom for example (already implemented in Crystal).

Anyway:

  1. A secure random source doesn't have to be fast, but of cryptography quality (hence slow);
  2. In most use cases, non crytographic, a good PRNG (PCG, xoshiro**, siphash, ...) seeded from Random::Secure is enough, and will boost performances.

Of course if /dev/urandom and arc4random are identical on macOS and the later is faster, I could revise, but I need proof of it, for all supported macOS releases 😄

@chris-huxtable we already have uniformisation, for any PRNG. See https://github.com/crystal-lang/crystal/blob/master/src/random.cr

@chris-huxtable
Copy link
Contributor

@ysbaddaden - Syscalls (arc4random,getrandom) also have the added benefit of working when in a chroot and when you run out of file descriptors (which can be done intentionally by an attacker).

@ysbaddaden
Copy link
Contributor

Sure, but the main issue remains: is arc4random the same as urandom on all supported macOS versions ?

That being said, we only ever open a single fd to urandom.

@chris-huxtable
Copy link
Contributor

chris-huxtable commented Jul 7, 2018

@ysbaddaden - macOS has had arc4random for quite some time. Unfortunately the online man pages have been scrubbed with the Swift overhaul of the ADC. As such I have found nothing from Apple. From third parties one could deduce that arc4random has been on they system since macOS 10.6 (2009) and arc4random_buf since macOS 10.7 (2011). Additionally they now (since 10.12) use an AES cipher (according to man arc4random).

With regards to arc4random equivalence with /dev/urandom/. The man pages say:

The same random data is also available from getentropy(2). Using the getentropy(2) system call interface will provide resiliency to file descriptor exhaustion, chroot, or sandboxing which can make /dev/random unavailable. Additionally, the arc4random(3) API provides a fast userspace random number generator built on the random data source and is preferred over directly accessing the system's random device.
[...]
The random device implements the Yarrow pseudo random number generator algorithm and maintains its entropy pool. The kernel automatically seeds the algorithm with additional entropy during normal execution.

I would note that man getentropy says:

However, it should be noted that getentropy() is primarily intended for use in the construction and seeding of userspace PRNGs like arc4random(3) [...].

and man random says:

Applications requiring cryptographic quality randomness should use arc4random(3).

While this doesn't explicitly say it is based on how OpenBSD does it and how I remember reading something to effect a number of years ago (I could be mistaking the source/context), I think its a safe bet.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Jul 9, 2018

I spent too much time already on this issue. I don't place bets on a CSPRNG.

  • Maybe macOS 10.12+ is fine, but older releases are very bad.
  • Maybe we can drop support for macOS < 10.12, after all Apple only supports its latest shiny release.
  • Maybe a userspace CSPRNG is secure enough (or maybe not).

What's sure is that the "kernel automatically seeds the algorithm with additional entropy during normal execution" for the random device (i.e. /dev/urandom). What about arc4random? Is it seeded once or is its entropy regularly updated? Who knows? 😢

@konovod
Copy link
Contributor

konovod commented Jul 9, 2018

According to source it is reseeded regularly, after each 6.4Mb of random data. It uses genentropy, but this basically is the same as /dev/urandom.

@RX14
Copy link
Member

RX14 commented Jul 9, 2018

6.4MiB of random is a LOT of random

@ysbaddaden
Copy link
Contributor

@konovod sadly, the most recent version of the file is dated of 2008 and specifically refers to RC4 😭

@konovod
Copy link
Contributor

konovod commented Jul 9, 2018

https://opensource.apple.com
shows the list of OSX versions. Following the link to version you can find files that match corresponding version.
As for 2008 and RC4 - well, comments are outdated. The interesting part is under #if defined(__APPLE__) && !defined(VARIANT_STATIC) and it includes AES.

@asterite
Copy link
Member

asterite commented Jul 9, 2018

Is there any John Apple we can ask?

@yxhuvud
Copy link
Contributor

yxhuvud commented Jul 9, 2018

Apple running hideously old versions of open source code is sadly not uncommon :(

@chris-huxtable
Copy link
Contributor

I may be wrong here but, it also appears that it is reseeded after a fork.
source

in _arc4_fork_child:

if (rng_state != NULL) {
    arc4_count = 0;

in arc4random and arc4random_buf:

if (arc4_count <= 0) {
    arc4_stir();
}

@chris-huxtable
Copy link
Contributor

@ysbaddaden - Unfortunately when Apple updated the ADC for Swift a number of years ago they killed off a ton of c and systems documentation. :(

@98-f355-f1
Copy link

For Crystal: N = 1_000_000
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ crystal build --release bench.cr
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ ./bench
user system total real
rand() 0.000000 0.000000 0.000000 ( 0.007522)
Random::Secure.rand 0.280000 1.280000 1.560000 ( 1.560843)
Random::Secure.random_bytes 0.360000 1.150000 1.510000 ( 1.515625)

For Ruby
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ ruby bench.rb
user system total real
rand() 0.156250 0.000000 0.156250 ( 0.146042)
SecureRandom.rand 0.781250 1.265625 2.046875 ( 2.049006)
SecureRandom.random_bytes 0.484375 1.234375 1.718750 ( 1.733307)

N = 10_000_000
Ruby:
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ ruby bench.rb
user system total real
rand() 1.375000 0.000000 1.375000 ( 1.374348)
SecureRandom.rand 7.031250 12.984375 20.015625 ( 20.036401)
SecureRandom.random_bytes 4.453125 12.796875 17.250000 ( 17.265022)

Crystal:
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ crystal build --release bench.cr
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ ./bench
user system total real
rand() 0.070000 0.000000 0.070000 ( 0.070488)
Random::Secure.rand 2.770000 12.530000 15.300000 ( 15.299145)
Random::Secure.random_bytes 2.890000 12.150000 15.040000 ( 15.056012)

Crystal seems to be the winner here.... it cut through the rand() like butter.

System: Intel I5, Windows 10, the tests were conducted using the WSL Linux Substem for both Ruby and Crystal.

@RX14
Copy link
Member

RX14 commented Jul 17, 2018

@98-f355-f1 yeah thats expected on linux systems, osx benchmark results will be slower

@rdp
Copy link
Contributor

rdp commented Nov 15, 2019

All I get is SecureRandom.randtest.rb:10:in block (3 levels) in

': private method rand' called for SecureRandom:Module (NoMethodError) :)

@ysbaddaden
Copy link
Contributor

Closing. Security trumps performance on this topic.

Also, Ruby made changes, and:

  1. restricted arc4random_buf(3) to a few BSD (OpenBSD, NetBSD, FreeBSD 12) — see ruby/ruby@b120f5e#diff-a962031992b7b5934db41bf925b3487e;
  2. leverages SecRandomCopyBytes on macOS 10.7+ which wraps /dev/random under the hood — see ruby/ruby@68f9d7b#diff-a962031992b7b5934db41bf925b3487e;
  3. uses getrandom(2) when available —which may include FreeBSD 12?

Note that libsodium now uses getrandom(2) on FreeBSD 12.

@jkthorne
Copy link
Contributor

for those following this issue if you are interested here are the 2 most active libsodium libraries that include multiple ChaCha algorithms and other cryspographic algorithms.

https://github.com/didactic-drunk/sodium.cr
https://github.com/watzon/nacl

@rdp
Copy link
Contributor

rdp commented Nov 15, 2019

Just poking around, this change seems to effect about a 2x speedup?

diff --git a/src/crystal/system/unix/urandom.cr b/src/crystal/system/unix/urandom.cr
index f2998b772..82e9688d4 100644
--- a/src/crystal/system/unix/urandom.cr
+++ b/src/crystal/system/unix/urandom.cr
@@ -11,7 +11,7 @@ module Crystal::System::Random
     return unless urandom.info.type.character_device?
 
     urandom.close_on_exec = true
-    urandom.read_buffering = false
+    urandom.read_buffering = true
     @@urandom = urandom
   end

benchs (OS X, git master)

$ ./bench.old 
Random::Secure
                                  user     system      total        real
rand()                        0.011046   0.000009   0.011055 (  0.011061)
Random::Secure.rand           0.272780   0.349765   0.622545 (  0.622883)
Random::Secure.random_bytes   0.266101   0.633775   0.899876 (  0.881925)
$ ./bench.new
Random::Secure
                                  user     system      total        real
rand()                        0.011446   0.000007   0.011453 (  0.011462)
Random::Secure.rand           0.020545   0.075565   0.096110 (  0.096123)
Random::Secure.random_bytes   0.021922   0.364581   0.386503 (  0.368568)

Though I don't understand fully what's going on...doesn't seem to affect startup time, can be synchronized with a mutex and doesn't really drop in speed. FWIW...

@ysbaddaden
Copy link
Contributor

Please see other pull requests about why we don't buffer urandom.

@asterite
Copy link
Member

A comment in the source code might have been nice...

@rdp
Copy link
Contributor

rdp commented Nov 16, 2019

For followers, here's the PR referred to: #5848

Here's another attempt to speedup, by using OpenSSL's rand function (which I noticed in the ruby source). No speedup in linux, but OS X:

$ ./bench.urandom
                                  user     system      total        real
rand()                       0.019039   0.000025   0.019064 (  0.019074)
Random::Secure.rand          0.743042   1.436121   2.179163 (  2.203013)
Random::Secure.random_bytes  0.795708   3.055914   3.851622 (  3.904940)

$ ./bench.openssl
                                  user     system      total        real
rand()                        0.019009   0.000080   0.019089 (  0.019695)
Random::Secure.rand           0.202165   0.000737   0.202902 (  0.208178)
Random::Secure.random_bytes   0.517045   0.003730   0.520775 (  0.526210)

Implementation:

diff --git a/src/crystal/system/random.cr b/src/crystal/system/random.cr
index e03cc6ca6..08c9d00ac 100644
--- a/src/crystal/system/random.cr
+++ b/src/crystal/system/random.cr
@@ -15,7 +15,11 @@ end
 {% elsif flag?(:openbsd) %}
   require "./unix/arc4random"
 {% elsif flag?(:unix) %}
-  require "./unix/urandom"
+  {% if flag?(:without_openssl) %}
+    require "./unix/urandom"
+  {% else %}
+    require "./unix/openssl"
+  {% end %}
 {% elsif flag?(:win32) %}
   require "./win32/random"
 {% else %}
diff --git a/src/crystal/system/unix/openssl.cr b/src/crystal/system/unix/openssl.cr
new file mode 100644
index 000000000..547f8884b
--- /dev/null
+++ b/src/crystal/system/unix/openssl.cr
@@ -0,0 +1,18 @@
+require "openssl"
+
+module Crystal::System::Random
+
+  def self.random_bytes(buf : Bytes) : Nil
+      out = LibCrypto.rand_bytes(buf.to_unsafe.as(UInt8*), buf.size)
+      if out != 1
+        raise OpenSSL::Error.new("generating cryptographic rand_bytes failed")
+      end
+  end
+
+  def self.next_u : UInt8
+      buf = uninitialized UInt8[1]
+      random_bytes(buf.to_slice)
+      buf.unsafe_as(UInt8)
+ end
+
+end

Hard to exactly detect the startup penalty but it seems to be around 0.003s on my local box. FWIW...

And here's the long discussion of why ruby switched away from this kind of OpenSSL random default (blog links at top are interesting) https://bugs.ruby-lang.org/issues/9569 for followers...

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Nov 16, 2019

No, please. There are years of arguments against OpenSSL's rand function. Ruby finally changed to use proper secure sources, and only kept a fallback on OpenSSL that shall never happen in practice —that we didn't even bother to implement, on purpose.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Nov 16, 2019

Don't bother tweaking Random::Secure. It doesn't have to be fast, it has to be a CSPRNG, meant for cryptography and secure usages such as generating secret keys, or seeding a PRNG. Free performance is great, but it musn't impact the cryptography quality. It's already the best we can achieve today, secure & performance wise —except for FreeBSD 12 where we could leverage getrandom(2).

If you need high quality random numbers, for example to generate UUIDs, and never expose the numbers (or don't care that the seed can be reversed), you can use the default rand (PCG32) or preferably another PRNG with more state (e.g. xoshiro256**) if you generate lots of them. They're much faster than Random::Secure, but they're unsuitable for cryptography and secure usages.

straight-shoota added a commit to straight-shoota/crystal that referenced this issue Nov 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests