Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for background currency fetch thread to finish before exiting #298

Merged
merged 3 commits into from
Jan 27, 2024

Conversation

eminence
Copy link
Contributor

When numbat is run with a quick-running --expression, it can finish evaluating the expression before the background currency fetch thread has finished.

On Windows, this is problematic because the main thread does some post-main winsock shutdown tasks, which conflict with the background task that's still trying to fetch exchange rate info from the network.

This commit keeps a reference to the currency fetching thread and joins on it before exiting.

When numbat is run with a quick-running --expression, it can finish
evaluating the expression before the background currency fetch thread
has finished.

On Windows, this is problematic because the main thread does some
post-main winsock shutdown tasks, which conflict with the background
task that's still trying to fetch exchange rate info from the network.

This commit keeps a reference to the currency fetching thread and
joins on it before exiting.
@eminence
Copy link
Contributor Author

eminence commented Jan 24, 2024

This should resolve the sporadic CI test failures reported in #290

When reproducing the failures, I saw two types: the first is a crash from numbat.exe, and the second was a hung numbat.exe.

In the case of the hung numbat, the main thread was blocked with a callstack that looks like this:

[0x0]   ntdll!NtWaitForAlertByThreadId+0x14   0xf42cf6eaa8   0x7ff8a339387d   
[0x1]   ntdll!RtlpWaitOnAddressWithTimeout+0x81   0xf42cf6eab0   0x7ff8a3393732   
[0x2]   ntdll!RtlpWaitOnAddress+0xae   0xf42cf6eae0   0x7ff8a339354d   
[0x3]   ntdll!RtlpWaitOnCriticalSection+0xfd   0xf42cf6eb50   0x7ff8a335fcb4   
[0x4]   ntdll!RtlpEnterCriticalSectionContended+0x1c4   0xf42cf6ec30   0x7ff8a335fae2   
[0x5]   ntdll!RtlEnterCriticalSection+0x42   0xf42cf6ec90   0x7ff8a01aea53   
[0x6]   mswsock!DeleteSockets+0x23   0xf42cf6ecc0   0x7ff8a2ccfa66   
[0x7]   ws2_32!WahEnumerateHandleContexts+0xf786   0xf42cf6ecf0   0x7ff8a019d40b   
[0x8]   mswsock!WSPCleanup+0x7b   0xf42cf6ed40   0x7ff8a2cb2c98   
[0x9]   ws2_32!DPROVIDER::WSPCleanup+0x48   0xf42cf6ed90   0x7ff8a2cb2c1c   
[0xa]   ws2_32!CleanupProtocolProviders+0x2c   0xf42cf6edc0   0x7ff8a2cbc758   
[0xb]   ws2_32!DCATALOG::EnumerateCatalogItems+0x50   0xf42cf6ee00   0x7ff8a2cbfae3   
[0xc]   ws2_32!DPROCESS::~DPROCESS+0x83   0xf42cf6ee40   0x7ff8a2cc03da   
[0xd]   ws2_32!DPROCESS::`scalar deleting destructor'+0xe   0xf42cf6ee70   0x7ff8a2cc04f6   
[0xe]   ws2_32!WSACleanup+0xe6   0xf42cf6eea0   0x7ff64afae102   
[0xf]   numbat!std::sys::windows::net::cleanup+0x19   0xf42cf6ef20   0x7ff64afe8db4   
[0x10]   numbat!std::sys::windows::cleanup+0x19   0xf42cf6ef20   0x7ff64afe8db4   
[0x11]   numbat!std::rt::cleanup::closure$0+0x151   0xf42cf6ef20   0x7ff64afe8db4   
[0x12]   numbat!std::sync::once::impl$2::call_once::closure$0<std::rt::cleanup::closure_env$0>+0x172   0xf42cf6ef20   0x7ff64afe8db4   
[0x13]   numbat!std::sys_common::once::queue::Once::call+0x3e4   0xf42cf6ef80   0x7ff64afa29ab   
[0x14]   numbat!std::sync::once::Once::call_once+0x47   0xf42cf6f080   0x7ff64afadf1c   
[0x15]   numbat!std::rt::cleanup+0x4b   0xf42cf6f080   0x7ff64afadf1c   
[0x16]   numbat!std::process::exit+0xc   0xf42cf6f0c0   0x7ff64a909770   
[0x17]   numbat!numbat::main+0x290   0xf42cf6f0f0   0x7ff64a8c77cb   
[0x18]   numbat!core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >+0xb   0xf42cf6f4e0   0x7ff64a8f40de   
[0x19]   numbat!core::hint::black_box+0xa   0xf42cf6f520   0x7ff64a8da881   
[0x1a]   numbat!std::sys_common::backtrace::__rust_begin_short_backtrace<void (*)(),tuple$<> >+0xe   0xf42cf6f520   0x7ff64a8da881   
[0x1b]   numbat!std::rt::lang_start::closure$0<tuple$<> >+0x11   0xf42cf6f560   0x7ff64afa2a78   
[0x1c]   numbat!std::rt::lang_start_internal::closure$2+0xb   0xf42cf6f5a0   0x7ff64a8da85a   
[0x1d]   numbat!std::panicking::try::do_call+0xb   0xf42cf6f5a0   0x7ff64a8da85a   
[0x1e]   numbat!std::panicking::try+0xb   0xf42cf6f5a0   0x7ff64a8da85a   
[0x1f]   numbat!std::panic::catch_unwind+0xb   0xf42cf6f5a0   0x7ff64a8da85a   
[0x20]   numbat!std::rt::lang_start_internal+0xb8   0xf42cf6f5a0   0x7ff64a8da85a   
[0x21]   numbat!std::rt::lang_start<tuple$<> >+0x3a   0xf42cf6f6a0   0x7ff64a90c689   
[0x22]   numbat!main+0x19   0xf42cf6f710   0x7ff64afe717c   
[0x23]   numbat!invoke_main+0x22   0xf42cf6f740   0x7ff8a1a17344   
[0x24]   numbat!__scrt_common_main_seh+0x10c   0xf42cf6f740   0x7ff8a1a17344   
[0x25]   kernel32!BaseThreadInitThunk+0x14   0xf42cf6f780   0x7ff8a33826b1   
[0x26]   ntdll!RtlUserThreadStart+0x21   0xf42cf6f7b0   0x0   

Here we can see the main thread has exited, and now various exit handlers are running, including some windows socket shutdown functions.

Meanwhile, the currency fetching thread is still running:

[0x0]   ntdll!NtWaitForAlertByThreadId+0x14   0xf42d6f8968   0x7ff8a339387d   
[0x1]   ntdll!RtlpWaitOnAddressWithTimeout+0x81   0xf42d6f8970   0x7ff8a3393732   
[0x2]   ntdll!RtlpWaitOnAddress+0xae   0xf42d6f89a0   0x7ff8a339354d   
[0x3]   ntdll!RtlpWaitOnCriticalSection+0xfd   0xf42d6f8a10   0x7ff8a335fcb4   
[0x4]   ntdll!RtlpEnterCriticalSectionContended+0x1c4   0xf42d6f8af0   0x7ff8a335fae2   
[0x5]   ntdll!RtlEnterCriticalSection+0x42   0xf42d6f8b50   0x7ff8a2cb2696   
[0x6]   ws2_32!WPUModifyIFSHandle+0x96   0xf42d6f8b80   0x7ff8a01964ca   
[0x7]   mswsock!WSPSocket+0x28a   0xf42d6f8be0   0x7ff8a2cb57e0   
[0x8]   ws2_32!WSASocketW+0x130   0xf42d6f8ce0   0x7ff8a2cb565e   
[0x9]   ws2_32!socket+0x6e   0xf42d6f8d80   0x7ff8a2cb2d42   
[0xa]   ws2_32!GetProtocolStateForFamilyCb+0x12   0xf42d6f8dd0   0x7ff8a33738c0   
[0xb]   ntdll!RtlRunOnceExecuteOnce+0x90   0xf42d6f8e00   0x7ff8a0e932ab   
[0xc]   KERNELBASE!InitOnceExecuteOnce+0xb   0xf42d6f8e40   0x7ff8a2cb4bdc   
[0xd]   ws2_32!GetProtocolStateForFamily+0x50   0xf42d6f8e70   0x7ff8a2cb4b44   
[0xe]   ws2_32!IsProtoRunning+0x38   0xf42d6f8ef0   0x7ff8a2cb344e   
[0xf]   ws2_32!LookupAddressForName+0x96   0xf42d6f8f20   0x7ff8a2cb5eac   
[0x10]   ws2_32!GetAddrInfoW+0x38c   0xf42d6f9030   0x7ff8a2cb37d5   
[0x11]   ws2_32!getaddrinfo+0xc5   0xf42d6f91d0   0x7ff64afb8bfe   
[0x12]   numbat!std::sys::windows::c::getaddrinfo+0x8   0xf42d6f9250   0x7ff64afaa886   
[0x13]   numbat!std::sys_common::net::impl$6::try_from::closure$0+0x35   0xf42d6f9250   0x7ff64afaa886   
[0x14]   numbat!std::sys::common::small_c_string::run_with_cstr+0x91   0xf42d6f9250   0x7ff64afaa886   
[0x15]   numbat!std::sys_common::net::impl$6::try_from+0xbe   0xf42d6f9250   0x7ff64afaa886   
[0x16]   numbat!std::net::socket_addr::impl$10::to_socket_addrs+0x96   0xf42d6f9470   0x7ff64ad9e111   
[0x17]   numbat!attohttpc::happy::connect+0x121   0xf42d6f9510   0x7ff64ada4b1a   
[0x18]   numbat!enum2$<attohttpc::streams::BaseStream>::connect_tcp+0x6a   0xf42d6f9c90   0x7ff64ada4fc8   
[0x19]   numbat!enum2$<attohttpc::streams::BaseStream>::connect_tls+0x58   0xf42d6f9de0   0x7ff64ada36b7   
[0x1a]   numbat!enum2$<attohttpc::streams::BaseStream>::connect+0x3d7   0xf42d6faba0   0x7ff64ad7d871   
[0x1b]   numbat!attohttpc::request::PreparedRequest<attohttpc::request::body::Empty>::send<attohttpc::request::body::Empty>+0x2e1   0xf42d6fbba0   0x7ff64ad7b52e   
[0x1c]   numbat!attohttpc::request::builder::RequestBuilder<attohttpc::request::body::Empty>::send<attohttpc::request::body::Empty>+0x9e   0xf42d6fdc00   0x7ff64ad78310   
[0x1d]   numbat!numbat_exchange_rates::fetch_ecb_xml+0x40   0xf42d6fe450   0x7ff64ad783c9   
[0x1e]   numbat!numbat_exchange_rates::fetch_exchange_rates+0x29   0xf42d6ff380   0x7ff64ace0f58   
[0x1f]   numbat!numbat::currency::impl$0::fetch::closure$0+0x18   0xf42d6ff440   0x7ff64ace3aa8   
[0x20]   numbat!std::sync::once_lock::impl$0::get_or_init::closure$0<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,numbat::currency::impl$0::fetch::closure_env$0>+0x18   0xf42d6ff4b0   0x7ff64ace34c2   
[0x21]   numbat!std::sync::once_lock::impl$0::initialize::closure$0<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,std::sync::once_lock::impl$0::get_or_init::closure_env$0<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,numbat::currency::impl$0::fetch::closure_env$0>,never$>+0x72   0xf42d6ff530   0x7ff64ac8d4c5   
[0x22]   numbat!std::sync::once::impl$2::call_once_force::closure$0<std::sync::once_lock::impl$0::initialize::closure_env$0<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,std::sync::once_lock::impl$0::get_or_init::closure_env$0<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,numbat::currency::impl$0::fetch::closure_env$0>,never$> >+0xb5   0xf42d6ff6c0   0x7ff64afe8db4   
[0x23]   numbat!std::sys_common::once::queue::Once::call+0x3e4   0xf42d6ff750   0x7ff64ac8cda3   
[0x24]   numbat!std::sync::once::Once::call_once_force<std::sync::once_lock::impl$0::initialize::closure_env$0<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,std::sync::once_lock::impl$0::get_or_init::closure_env$0<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,numbat::currency::impl$0::fetch::closure_env$0>,never$> >+0x93   0xf42d6ff850   0x7ff64ace32a9   
[0x25]   numbat!std::sync::once_lock::OnceLock<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > > >::initialize<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,std::sync::once_lock::impl$0::get_or_init::closure_env$0<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,numbat::currency::impl$0::fetch::closure_env$0>,never$>+0x39   0xf42d6ff900   0x7ff64ace3d53   
[0x26]   numbat!std::sync::once_lock::OnceLock<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > > >::get_or_try_init<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,std::sync::once_lock::impl$0::get_or_init::closure_env$0<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,numbat::currency::impl$0::fetch::closure_env$0>,never$>+0x73   0xf42d6ff960   0x7ff64ace392e   
[0x27]   numbat!std::sync::once_lock::OnceLock<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > > >::get_or_init<std::sync::mutex::Mutex<enum2$<core::option::Option<std::collections::hash::map::HashMap<alloc::string::String,f64,std::collections::hash::map::RandomState> > > >,numbat::currency::impl$0::fetch::closure_env$0>+0xe   0xf42d6ff9e0   0x7ff64ac98df0   
[0x28]   numbat!numbat::currency::ExchangeRatesCache::fetch+0x10   0xf42d6ffa30   0x7ff64ababc79   
[0x29]   numbat!numbat::Context::prefetch_exchange_rates+0x9   0xf42d6ffa80   0x7ff64a89df49   
[0x2a]   numbat!numbat::impl$1::run::closure$0+0x9   0xf42d6ffac0   0x7ff64a8f40f9   
[0x2b]   numbat!core::hint::black_box   0xf42d6ffaf0   0x7ff64a8e5c19   
[0x2c]   numbat!std::sys_common::backtrace::__rust_begin_short_backtrace<numbat::impl$1::run::closure_env$0,tuple$<> >+0x9   0xf42d6ffaf0   0x7ff64a8e5c19   
[0x2d]   numbat!std::thread::impl$0::spawn_unchecked_::closure$1::closure$0<numbat::impl$1::run::closure_env$0,tuple$<> >+0x9   0xf42d6ffb20   0x7ff64a8c4d99   
[0x2e]   numbat!core::panic::unwind_safe::impl$23::call_once<tuple$<>,std::thread::impl$0::spawn_unchecked_::closure$1::closure_env$0<numbat::impl$1::run::closure_env$0,tuple$<> > >+0x9   0xf42d6ffb50   0x7ff64a8ff0b3   
[0x2f]   numbat!std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::impl$0::spawn_unchecked_::closure$1::closure_env$0<numbat::impl$1::run::closure_env$0,tuple$<> > >,tuple$<> >+0x13   0xf42d6ffb80   0x7ff64a8ff183   
[0x30]   numbat!std::panicking::try::do_catch<core::panic::unwind_safe::AssertUnwindSafe<std::thread::impl$0::spawn_unchecked_::closure$1::closure_env$0<numbat::impl$1::run::closure_env$0,tuple$<> > >,tuple$<> >+0xc3   0xf42d6ffbc0   0x7ff64a8ff056   
[0x31]   numbat!std::panicking::try<tuple$<>,core::panic::unwind_safe::AssertUnwindSafe<std::thread::impl$0::spawn_unchecked_::closure$1::closure_env$0<numbat::impl$1::run::closure_env$0,tuple$<> > > >+0x26   0xf42d6ffc10   0x7ff64a8e5aba   
[0x32]   numbat!std::panic::catch_unwind+0x5   0xf42d6ffc80   0x7ff64a8c760e   
[0x33]   numbat!std::thread::impl$0::spawn_unchecked_::closure$1<numbat::impl$1::run::closure_env$0,tuple$<> >+0xfa   0xf42d6ffc80   0x7ff64a8c760e   
[0x34]   numbat!core::ops::function::FnOnce::call_once<std::thread::impl$0::spawn_unchecked_::closure_env$1<numbat::impl$1::run::closure_env$0,tuple$<> >,tuple$<> >+0xe   0xf42d6ffd90   0x7ff64afb7afc   
[0x35]   numbat!std::sys::windows::thread::impl$0::new::thread_start+0x4c   0xf42d6ffdd0   0x7ff8a1a17344   
[0x36]   kernel32!BaseThreadInitThunk+0x14   0xf42d6ffe60   0x7ff8a33826b1   
[0x37]   ntdll!RtlUserThreadStart+0x21   0xf42d6ffe90   0x0   

@sharkdp
Copy link
Owner

sharkdp commented Jan 24, 2024

Thank you very much for looking into this.

That also explains why the test was flaky, because it probably depended on the timings of the HTTP requests.

I do wonder, however, if this is the right fix. Imagine I have a slow internet connection (this is actually the case sometimes when I am behind a company proxy). Now my quick numbat -e '…' calculations would take at least as long as the HTTP request (even if they don't include any currencies), correct?

If that is the case, I'm not really sure how to proceed. We would probably have to send some kind of signal to the thread to close down immediately, without further blocking on the HTTP request. And that would probably involve a non-blocking solution + select. Or we could use signals? But then that might not be portable...

Maybe we should first find out if this is an actual, real-world problem :-)

@eminence
Copy link
Contributor Author

Now my quick numbat -e '…' calculations would take at least as long as the HTTP request (even if they don't include any currencies), correct?

Yes, this is correct. In my tests, it takes about 500ms for the HTTP thread to finish. Obviously this could be much slower on a worse internet connection.

Killing the background thread would be one option, but there isn't a simple way to do this.

Sending a signal to the HTTP client to interrupt itself would be nice, but I don't know if attohttp supports this (I haven't looked).

The most simple thing we could do is only spawn the background HTTP thread for interactive REPLs. If you use the -e <expression> option, then only load currency info on demand. But even if we did this, I think we still want to merge this PR, because some of the CI tests don't use -e, and instead they pipe expressions to the REPL via stdin.

Load on-demand for all other use cases.
@eminence
Copy link
Contributor Author

only spawn the background HTTP thread for interactive REPLs

Upon more reflection, this seems like the most sensible thing to do, so I just pushed a commit that implements this. numbat -e <expr> is now fast again

PS, i love using hyperfine for this kind of stuff:

❯ hyperfine -L commit 24af97f,4f93c4 --prepare 'git checkout {commit}; cargo build' './target/debug/numbat -e 1'
Benchmark 1: ./target/debug/numbat -e 1 (commit = 24af97f)
  Time (mean ± σ):      78.5 ms ±   1.5 ms    [User: 67.5 ms, System: 4.0 ms]
  Range (min … max):    75.9 ms …  80.1 ms    10 runs
 
Benchmark 2: ./target/debug/numbat -e 1 (commit = 4f93c4)
  Time (mean ± σ):     484.8 ms ±  19.5 ms    [User: 75.4 ms, System: 7.0 ms]
  Range (min … max):   471.0 ms … 537.1 ms    10 runs
 
Summary
  ./target/debug/numbat -e 1 (commit = 24af97f) ran
    6.18 ± 0.28 times faster than ./target/debug/numbat -e 1 (commit = 4f93c4)

@sharkdp
Copy link
Owner

sharkdp commented Jan 27, 2024

only spawn the background HTTP thread for interactive REPLs

Upon more reflection, this seems like the most sensible thing to do, so I just pushed a commit that implements this. numbat -e <expr> is now fast again

I think this is a great solution. The only case where this would result in a performance regression would be if someone wrote a Numbat program (like in an actual file) that does some longer calculations without currencies and then suddenly requires the exchange rates. Previously, we would have loaded them in the background while the other calculation was running. But now we would need to fetch them on demand.

But I think this is a far-fetched use case.

But I definitely appreciate that we have a fast numbat -e "…" and a background retrieval of currency exchange rates for REPL use cases.

PS, i love using hyperfine for this kind of stuff:

Thank you for the feedback! Please notice that you are performing benchmarks with the debug build. The release build of numbat starts up much faster (~22 ms on my notebook. And much faster than that if you use --no-prelude... below one millisecond).

@sharkdp
Copy link
Owner

sharkdp commented Jan 27, 2024

I pushed a commit to re-enable the CI build for Window. This will now close #290, if I understand correctly.

@sharkdp sharkdp merged commit ca4df99 into sharkdp:master Jan 27, 2024
15 checks passed
@eminence
Copy link
Contributor Author

Agreed with your analysis above. In the future, we could implement a --prefetch-exchange-rates <always, never, auto> option, to allow the background fetching to happen in parallel with a long-running script. But let's wait until someone asks for this.

BTW, I think 32-bit windows CI builds were also impacted by this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants