
Segmentation fault: 11 #280

Closed
nerzh opened this issue Aug 4, 2018 · 16 comments · Fixed by #630

Comments

@nerzh
Contributor

nerzh commented Aug 4, 2018

Hi! My English isn't great...

Bug:

1. git clone [email protected]:grpc/grpc-swift.git
2. cd grpc-swift
3. make && make project
4. ./.build/x86_64-apple-macosx10.10/debug/Echo serve
5.  !!!!!!!!!!! WAIT ~30 MINUTES !!!!!!!!!!!!!!!
6. Open another tab of terminal 
7. ./.build/x86_64-apple-macosx10.10/debug/Echo get
8. Segmentation fault: 11

THE END =)

server

$ ./.build/x86_64-apple-macosx10.10/debug/Echo serve
starting insecure server
Server received request to localhost:8080 calling /echo.Echo/Get from ipv6:[::1]:61747 with metadata ["x-ios-bundle-identifier": "io.grpc.echo", "x-goog-api-key": "YOUR_API_KEY", "user-agent": "grpc-c/6.0.0 (osx; chttp2; glorious)"]

Segmentation fault: 11

client

Olehs-MacBook-Pro:grpc-swift nerzh$ ./.build/x86_64-apple-macosx10.10/debug/Echo get
calling get
get sending: Testing 1 2 3
get received: Swift echo get: Testing 1 2 3

Olehs-MacBook-Pro:grpc-swift nerzh$ ./.build/x86_64-apple-macosx10.10/debug/Echo get
calling get
get sending: Testing 1 2 3
Unknown error occurred.

I repeated this test five times and got the same result: wait some time (~15-30 minutes), send a request with ./.build/x86_64-apple-macosx10.10/debug/Echo get (or another command like ./.build/x86_64-apple-macosx10.10/debug/Echo update, etc.) and get Segmentation fault: 11.

If I run the server in Xcode:

[screenshot attached: 2018-08-04 20 51 08]

Why? What am I doing wrong?

@MrMage
Collaborator

MrMage commented Aug 5, 2018

Stupid question, but did you try rebooting?

@nerzh
Contributor Author

nerzh commented Aug 5, 2018

Yes :) It was so unexpected for me that the standard code template does not work. I first tried it a month ago, then cloned grpc-swift again yesterday and re-ran the Echo example with the same result: run the server, wait ~30 minutes, send a request, segmentation fault.

My Mac:
Target: x86_64-apple-darwin17.7.0
Darwin Olehs-MBP 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64

@maznikoff

The same thing happens with a server running on iOS.

@tikidunpon
Contributor

@woodcrust @maznikoff
I've tried to reproduce this bug with the same kernel (17.7.0), but I can't.
Could you tell me your Swift and Xcode versions? Does make && make project show any errors?

Here is my Mac environment:

$ uname -a
Darwin koichitanakanoMacBook-Pro.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64

$ swift --version
Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)

$ xcodebuild -version
Xcode 9.4.1
Build version 9F2000

@nerzh
Contributor Author

nerzh commented Aug 11, 2018

@tikidunpon I will record a video with QuickTime; I think that will be a better way to demonstrate this bug.

$ swift --version
Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0

$ xcodebuild -version
Xcode 9.4.1
Build version 9F2000

@maznikoff

maznikoff commented Aug 17, 2018

I ran the server on an iOS device and got the crash after 15 minutes of inactivity. The issue can be reproduced on iOS 11.x and 12.x using Xcode 9.4/10.x.
$ uname -a
Darwin Dmitrys-MBP.local 18.0.0 Darwin Kernel Version 18.0.0: Sun Aug 5 20:59:30 PDT 2018; root:xnu-4903.200.354~13/RELEASE_X86_64 x86_64
$ swift --version
Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin18.0.0
$ xcodebuild -version
Xcode 9.4
Build version 9F1027a

(lldb) bt
  thread #2, queue = 'SwiftGRPC.CompletionQueue.runToCompletion.spinloopThread', stop reason = signal SIGABRT
    frame #0: 0x000000021683d104 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x00000002168bca00 libsystem_pthread.dylib`pthread_kill$VARIANT$armv81 + 296
    frame #2: 0x0000000216794d78 libsystem_c.dylib`abort + 140
    frame #3: 0x0000000108dddee0 RETFMLogic`::gpr_mu_lock(mu=0x00000001060100b0) at sync_posix.cc:47
    frame #4: 0x0000000108d52f2c RETFMLogic`pollset_add_fd(pollset=0x00000001060100b0, fd=0x0000000283144620) at ev_poll_posix.cc:835
    frame #5: 0x0000000108d56cf8 RETFMLogic`grpc_pollset_add_fd(pollset=0x00000001060100b0, fd=0x0000000283144620) at ev_posix.cc:264
    frame #6: 0x0000000108de0694 RETFMLogic`tcp_add_to_pollset(ep=0x000000010350a740, pollset=0x00000001060100b0) at tcp_posix.cc:685
    frame #7: 0x0000000108d4e00c RETFMLogic`grpc_endpoint_add_to_pollset(ep=0x000000010350a740, pollset=0x00000001060100b0) at endpoint.cc:36
    frame #8: 0x0000000108d2f2bc RETFMLogic`set_pollset(gt=0x0000000109fe0000, gs=0x0000000109fea0a8, pollset=0x00000001060100b0) at chttp2_transport.cc:2731
    frame #9: 0x0000000108debcf4 RETFMLogic`grpc_transport_set_pops(transport=0x0000000109fe0000, stream=0x0000000109fea0a8, pollent=0x0000000109fe8880) at transport.cc:182
    frame #10: 0x0000000108d49924 RETFMLogic`set_pollset_or_pollset_set(elem=0x0000000109fe92b8, pollent=0x0000000109fe8880) at connected_channel.cc:157
    frame #11: 0x0000000108d217ec RETFMLogic`grpc_call_stack_set_pollset_or_pollset_set(call_stack=0x0000000109fe91c0, pollent=0x0000000109fe8880) at channel_stack.cc:205
    frame #12: 0x0000000108d15cf0 RETFMLogic`grpc_call_set_completion_queue(call=0x0000000109fe8800, cq=0x000000010600c020) at call.cc:489
    frame #13: 0x0000000108dbdcf0 RETFMLogic`publish_call(server=0x0000000103737c50, calld=0x0000000109fe92d0, cq_idx=0, rc=0x000000028265ac00)::call_data*, unsigned long, (anonymous namespace)::requested_call*) at server.cc:444
    frame #14: 0x0000000108dbdac0 RETFMLogic`publish_new_rpc(arg=0x0000000109fe9240, error=0x0000000000000000) at server.cc:501
    frame #15: 0x0000000108dbd838 RETFMLogic`finish_start_new_rpc(server=0x0000000103737c50, elem=0x0000000109fe9240, rm=0x0000000103737d40, payload_handling=GRPC_SRM_PAYLOAD_NONE)::request_matcher*, grpc_server_register_method_payload_handling) at server.cc:557
    frame #16: 0x0000000108dbd668 RETFMLogic`start_new_rpc(elem=0x0000000109fe9240) at server.cc:619
    frame #17: 0x0000000108dbd19c RETFMLogic`got_initial_metadata(ptr=0x0000000109fe9240, error=0x0000000000000000) at server.cc:760
    frame #18: 0x0000000108d57fb0 RETFMLogic`exec_ctx_run(closure=0x0000000109fe9370, error=0x0000000000000000) at exec_ctx.cc:40
    frame #19: 0x0000000108d19550 RETFMLogic`grpc_closure_run(file="/Users/dmitry/Work/retfm-ios-proto/iOS-11/Pods/gRPC-Core/src/core/lib/surface/call.cc", line=1263, c=0x0000000109fe9370, error=0x0000000000000000) at closure.h:258
    frame #20: 0x0000000108d1a7a0 RETFMLogic`post_batch_completion(bctl=0x0000000109feabd0) at call.cc:1262
    frame #21: 0x0000000108d1959c RETFMLogic`finish_batch_step(bctl=0x0000000109feabd0) at call.cc:1275
    frame #22: 0x0000000108d18a28 RETFMLogic`finish_batch(bctlp=0x0000000109feabd0, error=0x0000000000000000) at call.cc:1532
    frame #23: 0x0000000108d57fb0 RETFMLogic`exec_ctx_run(closure=0x0000000109feac50, error=0x0000000000000000) at exec_ctx.cc:40
    frame #24: 0x0000000108d57e5c RETFMLogic`grpc_core::ExecCtx::Flush(this=0x000000016d246358) at exec_ctx.cc:128
    frame #25: 0x0000000108d52d08 RETFMLogic`pollset_work(pollset=0x0000000103737fa0, worker_hdl=0x0000000000000000, deadline=1200009) at ev_poll_posix.cc:1052
    frame #26: 0x0000000108d56ed8 RETFMLogic`pollset_work(pollset=0x0000000103737fa0, worker=0x0000000000000000, deadline=1200009) at ev_posix.cc:249
    frame #27: 0x0000000108da544c RETFMLogic`grpc_pollset_work(pollset=0x0000000103737fa0, worker=0x0000000000000000, deadline=1200009) at pollset.cc:48
    frame #28: 0x0000000108d44754 RETFMLogic`cq_next(cq=0x0000000103737e90, deadline=(tv_sec = 1200, tv_nsec = 8548041, clock_type = GPR_CLOCK_MONOTONIC), reserved=0x0000000000000000) at completion_queue.cc:929
    frame #29: 0x0000000108d43ad0 RETFMLogic`::grpc_completion_queue_next(cq=0x0000000103737e90, deadline=(tv_sec = 1200, tv_nsec = 8548041, clock_type = GPR_CLOCK_MONOTONIC), reserved=0x0000000000000000) at completion_queue.cc:1005
    frame #30: 0x0000000108bf4380 RETFMLogic`cgrpc_completion_queue_get_next_event(cq=0x0000000103737e90, timeout=600) at completion_queue.c:30
    frame #31: 0x0000000108c07e4c RETFMLogic`CompletionQueue.wait(timeout=600, self=0x0000000281051ac0) at CompletionQueue.swift:97
    frame #32: 0x0000000108c18964 RETFMLogic`closure #1 in Server.run(self=0x0000000280656040, handlerFunction=0x0000000108c21c08 RETFMLogic`partial apply forwarder for closure #1 (SwiftGRPC.Handler) -> () in SwiftGRPC.ServiceServer.start() -> () at <compiler-generated>) at Server.swift:79
    frame #33: 0x0000000108a45814 RETFMLogic`thunk for @escaping @callee_guaranteed () -> () at <compiler-generated>:0
    frame #34: 0x00000002166df6c8 libdispatch.dylib`_dispatch_call_block_and_release + 24
    frame #35: 0x00000002166e0484 libdispatch.dylib`_dispatch_client_callout + 16
    frame #36: 0x00000002166bafb0 libdispatch.dylib`_dispatch_lane_serial_drain$VARIANT$armv81 + 548
    frame #37: 0x00000002166bbaf4 libdispatch.dylib`_dispatch_lane_invoke$VARIANT$armv81 + 412
    frame #38: 0x00000002166c3f14 libdispatch.dylib`_dispatch_workloop_worker_thread + 584
    frame #39: 0x00000002168c20f0 libsystem_pthread.dylib`_pthread_wqthread + 312
    frame #40: 0x00000002168c4d00 libsystem_pthread.dylib`start_wqthread + 4

@nerzh
Contributor Author

nerzh commented Aug 28, 2018

@MrMage @maznikoff @tikidunpon @timburks @haberman
I promised to make a video ...
In this video I waited 900 seconds:
https://youtu.be/K_6SKRDHN4o

@tikidunpon
Contributor

@woodcrust
Thanks for making the video. I'm not sure of the cause, but it looks like a keepalive issue.
You can try the following environment variables for debugging:

export GRPC_TRACE=all
export GRPC_VERBOSITY="DEBUG"

This document may also be helpful: https://github.com/grpc/grpc/blob/master/doc/environment_variables.md

@novi

novi commented Oct 10, 2018

I have the same problem. This workaround fixes it for me, but I have no idea how to fix it completely.

@iphone5s

iphone5s commented Jan 9, 2019

I have the same problem.

This workaround fixes it for me, but I have no idea how to fix it completely.

@MrMage
Collaborator

MrMage commented Jul 3, 2019

Closing this now; please open a new issue if the issue persists in Swift gRPC 0.9.0.

@MrMage MrMage closed this as completed Jul 3, 2019
@adirburke

Just FYI still happens in v0.10.0

@MrMage MrMage reopened this Nov 11, 2019
@adirburke

I went down the rabbit hole a bit yesterday. It looks like when the timeout event is received, you just continue the "spinloop", but it seems that the core is destroying the underlyingCompletionQueue.

I couldn't find any documentation on how gRPC should handle the timeout event, so either you can leave the server running or you can spin up a new server. In my opinion the server should stop, and the developer should be able to decide what they want to do (re-run the server, etc.).

@MrMage
Collaborator

MrMage commented Nov 12, 2019

That makes sense as the cause of the error; what I don't understand is why gRPC destroys the queue upon that timeout.

Maybe we should look into e.g. the C++ or Python libraries built on top of gRPC-Core to see how they handle this; maybe a search for the corresponding "wait for event" function could help.

@adirburke

From thread_manager.cc:

```cpp
switch (work_status) {
  case TIMEOUT:
    // If we timed out and we have more pollers than we need (or we are
    // shutdown), finish this thread
    if (shutdown_ || num_pollers_ > max_pollers_) done = true;
    break;
```

```cpp
// If the return value is TIMEOUT:,
// - ThreadManager WILL NOT call DoWork()
// - ThreadManager MAY terminate the thread depending on the current number
//   of active poller threads and min_pollers/max_pollers settings
// - Also, the value of timeout is specific to the derived class
//   implementation
```

@MrMage
Collaborator

MrMage commented Nov 13, 2019

I am about to fix this in #630.
