Only init OpenCL once per Initialization #139

Merged May 17, 2023 (16 commits)

@fasmat fasmat commented May 12, 2023

This changes the API of internal/postrs to preserve an instance of post-rs's Initializer until explicitly freed.

  • ScryptPositions has been removed and replaced by a new struct Scrypt that has to be instantiated with NewScrypt. This instance holds a reference to the Rust-exposed initializer, which has to be explicitly freed by calling Scrypt::Close. The change has been propagated up through the different layers of abstraction into initialization.Initializer.
  • All code related to gpu-post has been removed because it became incompatible with the recent changes.
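
The lifecycle described above (create once, reuse for every batch, free explicitly) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `nativeInitializer`, `initCount`, `ErrClosed`, the method signatures, and the 16-byte label size are all assumptions made for the example, and the real implementation reaches the Rust initializer through cgo.

```go
package main

import (
	"errors"
	"fmt"
)

// nativeInitializer is a hypothetical stand-in for the Rust-side initializer
// that the real code holds through cgo.
type nativeInitializer struct{}

// initCount records how often the expensive set-up ran (illustration only).
var initCount int

func newNativeInitializer() *nativeInitializer {
	initCount++ // in the real code, this is roughly where OpenCL would be set up
	return &nativeInitializer{}
}

// ErrClosed is returned when a Scrypt is used after Close (hypothetical name).
var ErrClosed = errors.New("scrypt: already closed")

// Scrypt keeps one initializer alive across many Positions calls.
type Scrypt struct {
	init *nativeInitializer
}

// NewScrypt performs the expensive set-up exactly once per instance.
func NewScrypt() *Scrypt {
	return &Scrypt{init: newNativeInitializer()}
}

// Positions returns placeholder labels for the inclusive range [start, end].
// The 16 bytes per label used here is an assumption for the sketch.
func (s *Scrypt) Positions(start, end uint64) ([]byte, error) {
	if s.init == nil {
		return nil, ErrClosed
	}
	if end < start {
		return nil, fmt.Errorf("invalid range [%d, %d]", start, end)
	}
	return make([]byte, (end-start+1)*16), nil
}

// Close releases the initializer; later calls on s fail with ErrClosed.
func (s *Scrypt) Close() error {
	if s.init == nil {
		return ErrClosed
	}
	s.init = nil
	return nil
}

func main() {
	s := NewScrypt()
	defer s.Close()

	// Several batches share the same initializer.
	for batch := uint64(0); batch < 3; batch++ {
		if _, err := s.Positions(batch*10, batch*10+9); err != nil {
			panic(err)
		}
	}
	fmt.Println("initializations:", initCount) // prints "initializations: 1"
}
```

Pairing the constructor with `defer s.Close()` keeps the release next to the acquisition, which matters here because memory held on the Rust side is not reclaimed by the Go garbage collector.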

Do not merge before spacemeshos/post-rs#59

Closes #138

@fasmat fasmat requested review from pigmej and poszu May 12, 2023 12:09
@fasmat fasmat self-assigned this May 12, 2023
@fasmat fasmat marked this pull request as ready for review May 13, 2023 09:32
@fasmat fasmat marked this pull request as draft May 13, 2023 09:58
@fasmat fasmat force-pushed the only-init-once-for-opencl branch from 87cca30 to 4f1d8c3 Compare May 15, 2023 09:14
@fasmat fasmat marked this pull request as ready for review May 15, 2023 10:00
pigmej commented May 15, 2023

Now the speed is as expected, but it still crashed with:

2023/05/15 10:52:28     DEBUG   initialization: file #2 current position: 38797312, remaining: 28311552
2023/05/15 10:52:30     DEBUG   initialization: file #2 current position: 39845888, remaining: 27262976
2023/05/15 10:52:32     DEBUG   initialization: file #2 current position: 40894464, remaining: 26214400
2023/05/15 10:52:34     DEBUG   initialization: file #2 current position: 41943040, remaining: 25165824
2023/05/15 10:52:36     DEBUG   initialization: file #2 current position: 42991616, remaining: 24117248
2023/05/15 10:52:39     DEBUG   initialization: file #2 current position: 44040192, remaining: 23068672
2023/05/15 10:52:41     DEBUG   initialization: file #2 current position: 45088768, remaining: 22020096
2023/05/15 10:52:43     DEBUG   initialization: file #2 current position: 46137344, remaining: 20971520
2023/05/15 10:52:45     DEBUG   initialization: file #2 current position: 47185920, remaining: 19922944
2023/05/15 10:52:47     DEBUG   initialization: file #2 current position: 48234496, remaining: 18874368
2023/05/15 10:52:50     DEBUG   initialization: file #2 current position: 49283072, remaining: 17825792
2023/05/15 10:52:52     DEBUG   initialization: file #2 current position: 50331648, remaining: 16777216
2023/05/15 10:52:54     DEBUG   initialization: file #2 current position: 51380224, remaining: 15728640
2023/05/15 10:52:56     DEBUG   initialization: file #2 current position: 52428800, remaining: 14680064
2023/05/15 10:52:58     DEBUG   initialization: file #2 current position: 53477376, remaining: 13631488
2023/05/15 10:53:01     DEBUG   initialization: file #2 current position: 54525952, remaining: 12582912
2023/05/15 10:53:03     DEBUG   initialization: file #2 current position: 55574528, remaining: 11534336
2023/05/15 10:53:05     DEBUG   initialization: file #2 current position: 56623104, remaining: 10485760
2023/05/15 10:53:07     DEBUG   initialization: file #2 current position: 57671680, remaining: 9437184
2023/05/15 10:53:09     DEBUG   initialization: file #2 current position: 58720256, remaining: 8388608
2023/05/15 10:53:12     DEBUG   initialization: file #2 current position: 59768832, remaining: 7340032
2023/05/15 10:53:14     DEBUG   initialization: file #2 current position: 60817408, remaining: 6291456
2023/05/15 10:53:16     DEBUG   initialization: file #2 current position: 61865984, remaining: 5242880
2023/05/15 10:53:18     DEBUG   initialization: file #2 current position: 62914560, remaining: 4194304
2023/05/15 10:53:21     DEBUG   initialization: file #2 current position: 63963136, remaining: 3145728
Found new smallest nonce: Some(VrfNonce { index: 199214068, label: [0, 0, 0, 1, 10, 251, 93, 246, 161, 28, 201, 140, 231, 38, 180, 178, 27, 93, 64, 199, 96, 105, 160, 107, 8, 151, 154, 192, 67, 170, 150, 173] })
2023/05/15 10:53:23     INFO    initialization: file #2, found nonce: 199214068, value: 000000010afb5df6a11cc98ce726b4b2
2023/05/15 10:53:23     INFO    initialization: file #2, found new best nonce
2023/05/15 10:53:23     DEBUG   initialization: file #2 current position: 65011712, remaining: 2097152
2023/05/15 10:53:25     DEBUG   initialization: file #2 current position: 66060288, remaining: 1048576
2023/05/15 10:53:27     INFO    initialization: file #2 completed; number of labels written: 67108864
2023/05/15 10:53:27     INFO    initialization: starting to write file #3; target number of labels: 67108864, start position: 201326592
Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3090
device memory: 24259 MB, max_mem_alloc_size: 6064 MB, max_compute_units: 82, max_wg_size: 1024
preferred_wg_size_multiple: 32, kernel_wg_size: 256
Using: global_work_size: 12128, local_work_size: 32
Allocating buffer for input: 32 bytes
Allocating buffer for output: 388096 bytes
Allocating buffer for lookup: 6358564864 bytes
2023/05/15 10:53:27     DEBUG   initialization: file #3 current position: 0, remaining: 67108864
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: OclError(OclCore(Api(

################################ OPENCL ERROR ###############################

Error executing function: clEnqueueNDRangeKernel("scrypt")

Status error code: CL_MEM_OBJECT_ALLOCATION_FAILURE (-4)

Please visit the following url for more information:

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html#errors

#############################################################################
)))', ffi/src/initialization.rs:146:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
SIGABRT: abort
PC=0x7fd345a2fa7c m=9 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 1 [syscall]:
runtime.cgocall(0x523d16, 0xc0001495f8)
        /root/go/src/runtime/cgocall.go:158 +0x5c fp=0xc0001495d0 sp=0xc000149598 pc=0x40595c
github.com/spacemeshos/post/internal/postrs._Cfunc_initialize(0x7fd13cdcf8a0, 0xc000000, 0xc0fffff, 0x7fd13d0020d0, 0xc0002820b0)
        _cgo_gotypes.go:307 +0x4c fp=0xc0001495f8 sp=0xc0001495d0 pc=0x4f562c
github.com/spacemeshos/post/internal/postrs.cScryptPositions.func2(0x7fd13cdcf8a0, 0x1?, 0x7fd13d0020d0?, 0x657be0?, 0x0?)
        /root/post/internal/postrs/initializer.go:109 +0x7f fp=0xc000149658 sp=0xc0001495f8 pc=0x4f6bdf
github.com/spacemeshos/post/internal/postrs.cScryptPositions(0xc00013a000?, 0xc000149700?, 0xc000000, 0xc0fffff)
        /root/post/internal/postrs/initializer.go:109 +0xf1 fp=0xc000149708 sp=0xc000149658 pc=0x4f6831
github.com/spacemeshos/post/internal/postrs.(*Scrypt).Positions(0xc000280030, 0xc000000, 0xc0fffff)
        /root/post/internal/postrs/api.go:159 +0x7e fp=0xc000149760 sp=0xc000149708 pc=0x4f60fe
github.com/spacemeshos/post/oracle.(*WorkOracle).Positions(0x4000000?, 0x55dbd9?, 0x20?)
        /root/post/oracle/oracle.go:164 +0x33 fp=0xc0001497b8 sp=0xc000149760 pc=0x506513
github.com/spacemeshos/post/initialization.(*Initializer).initFile(0xc000176000, {0x588b18, 0xc00016c340}, 0xc000175a30?, 0x100000, 0xc000000, 0x4000000, {0xc00001a1a0, 0x20, 0x20})
        /root/post/initialization/initialization.go:479 +0xab0 fp=0xc0001499d8 sp=0xc0001497b8 pc=0x51f450
github.com/spacemeshos/post/initialization.(*Initializer).Initialize(0xc000176000, {0x588b18, 0xc00016c340})
        /root/post/initialization/initialization.go:266 +0x58a fp=0xc000149cb8 sp=0xc0001499d8 pc=0x51d9ea
main.main()
        /root/post/cmd/postcli/main.go:133 +0x3e5 fp=0xc000149f80 sp=0xc000149cb8 pc=0x522c85
runtime.main()
        /root/go/src/runtime/proc.go:250 +0x212 fp=0xc000149fe0 sp=0xc000149f80 pc=0x439052
runtime.goexit()
        /root/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc000149fe8 sp=0xc000149fe0 pc=0x465e41

goroutine 2 [force gc (idle), 6 minutes]:

pigmej commented May 15, 2023

That's probably caused by the Rust side, though.
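
For readers following the numbers in the log above, the small sketch below reproduces the arithmetic. It assumes that files are numbered from zero, that each batch covers 1048576 labels (the step between consecutive "current position" lines), that the two hex arguments to `Positions` in the stack trace are the inclusive start and end index of a batch, and that the "MB" figures in the device line are MiB. The helper names are made up for this example. It is a reading of the log, not a diagnosis of the crash.

```go
package main

import "fmt"

const (
	labelsPerFile = 67108864 // "target number of labels" in the log
	batchSize     = 1048576  // step between consecutive "current position" lines
)

// batchRange returns the inclusive first and last label index of the given
// batch within the given file (both zero-based; hypothetical helper).
func batchRange(file, batch uint64) (uint64, uint64) {
	start := file*labelsPerFile + batch*batchSize
	return start, start + batchSize - 1
}

// mibToBytes converts a size in MiB to bytes (hypothetical helper).
func mibToBytes(mib uint64) uint64 {
	return mib << 20
}

func main() {
	start, end := batchRange(3, 0)
	// Matches "start position: 201326592" and the 0xc000000, 0xc0fffff
	// arguments in the stack trace.
	fmt.Printf("%d %#x %#x\n", start, start, end) // 201326592 0xc000000 0xc0fffff

	// The lookup buffer in the log (6358564864 bytes) is the same size as the
	// reported max_mem_alloc_size of 6064 MB.
	fmt.Println(mibToBytes(6064)) // 6358564864
}
```

So the panic occurs on the first batch of file #3, right after the provider is set up again and a lookup buffer is requested that is exactly as large as the device's reported maximum single allocation.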

@fasmat fasmat force-pushed the only-init-once-for-opencl branch 6 times, most recently from 9af84de to f75bc4c Compare May 15, 2023 15:45
@fasmat fasmat force-pushed the only-init-once-for-opencl branch from f75bc4c to 3a247c9 Compare May 15, 2023 15:46
@fasmat fasmat force-pushed the only-init-once-for-opencl branch from 88a6395 to bfc87c9 Compare May 15, 2023 16:12
@fasmat fasmat merged commit f02d95c into develop May 17, 2023
@fasmat fasmat deleted the only-init-once-for-opencl branch May 17, 2023 13:53
Linked issue: Post-rs integration creates new initializer per every batch