Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass only interface name to syscall.util.if_nametoindex() #747

Merged
merged 1 commit into from
Feb 17, 2016

Conversation

aperezdc
Copy link
Contributor

@aperezdc aperezdc commented Feb 9, 2016

The syscall.util.if_nametoindex() only needs to be passed the name of the interface. The version of the function which accepts an open socket as second argument is internal to ljsyscall, and not even exported to client code.

Also, this moves the call to syscall.util.if_nametoindex() to the top of RawSocket:new() to simplify error handling: this way one less explicit call to syscall.close() is needed in case of errors.

The syscall.util.if_nametoindex() only needs to be passed the name of the
interface. The version of the function which accepts an open socket as second
argument is internal to ljsyscall, and not even exported to client code.

Also, this moves the call to syscall.util.if_nametoindex() to the top of
RawSocket:new() to simplify error handling: this way one less explicit call
to syscall.close() is needed in case of errors.
@mention-bot
Copy link

By analyzing the blame information on this pull request, we identified @dpino, @alexandergall and @eugeneia to be potential reviewers

@aperezdc
Copy link
Contributor Author

aperezdc commented Feb 9, 2016

Some background on how I got to write this patch: while adding support for running the lwAFTR using TAP interfaces (see Igalia#236), one of my tests was to use a TAP interface for the B4 side and RawSocket for the Internet side, and it would crash with:

apps/tap/tap.lua:46: Failed read on aftr: Bad file descriptor
stack traceback:
        core/main.lua:122: in function <core/main.lua:120>
        [C]: in function 'error'
        apps/tap/tap.lua:46: in function 'method'
        core/app.lua:76: in function 'with_restart'
        core/app.lua:287: in function 'breathe'
        core/app.lua:241: in function 'main'
        program/lwaftr/run_nohw/run_nohw.lua:112: in function 'run'
        program/lwaftr/lwaftr.lua:17: in function 'run'
        core/main.lua:41: in function <core/main.lua:32>
        [C]: in function 'xpcall'
        core/main.lua:151: in main chunk
        [C]: at 0x004511b0
        [C]: in function 'pcall'
        core/startup.lua:1: in main chunk
        [C]: in function 'require'
        [string "require "core.startup""]:1: in main chunk

Using strace I found out that at the Tap application would be issuing a few hundreds of calls to read(), then close() would be called on the FD, and the next call to read() would cause the above error.

Then I added calls to print(debug.traceback()) inside syscall.close() to better understand from where the calls were coming. During the initialization, in RawSocket:new(), which ends up calling if_nametoindex(), file descriptor 5 gets closed, and then the same number gets assigned to the Tap socket FD, and then after a while a __gc metamethod kicks in and closes the same FD number 5 again, which causes the next read() to fail:

close(5)
stack traceback:
        syscall/syscalls.lua:80: in function 'close'
        syscall/linux/util.lua:50: in function 'if_nametoindex'
        apps/socket/raw.lua:20: in function 'new'
        core/app.lua:165: in function <core/app.lua:162>
        core/app.lua:201: in function 'apply_config_actions'
        core/app.lua:110: in function 'configure'
        program/lwaftr/run_nohw/run_nohw.lua:90: in function 'run'
        program/lwaftr/lwaftr.lua:17: in function 'run'
        core/main.lua:41: in function <core/main.lua:32>
        [C]: in function 'xpcall'
        core/main.lua:151: in main chunk
        [C]: at 0x004511b0
        [C]: in function 'pcall'
        core/startup.lua:1: in main chunk
        [C]: in function 'require'
        [string "require "core.startup""]:1: in main chunk
TAP socket FD: 5
...
close(5)
stack traceback:
        syscall/syscalls.lua:80: in function 'close'
        syscall/methods.lua:145: in function '__gc'
        apps/socket/raw.lua:43: in function 'can_receive'
        apps/socket/raw.lua:37: in function 'method'
        core/app.lua:76: in function 'with_restart'
        core/app.lua:287: in function 'breathe'
        core/app.lua:241: in function 'main'
        program/lwaftr/run_nohw/run_nohw.lua:112: in function 'run'
        program/lwaftr/lwaftr.lua:17: in function 'run'
        core/main.lua:41: in function <core/main.lua:32>
        [C]: in function 'xpcall'
        core/main.lua:151: in main chunk
        [C]: at 0x004511b0
        [C]: in function 'pcall'
        core/startup.lua:1: in main chunk
        [C]: in function 'require'
        [string "require "core.startup""]:1: in main chunk
apps/tap/tap.lua:47: Failed read on aftr: Bad file descriptor
...

That's what led me to look at syscall.util.if_nametoindex() to see how it works and understand why it would cause the file descriptor it used to be GCd later on. There's where I noticed that the function was incorrectly passed an additional argument, and decided to move the call to the beginning of RawSocket:new() to simplify error handling, in an attempt to avoid the file descriptor for the socket being GCd later on.

(Related to this, I have also posted a patch for ljsyscall to immediately close sockets on errors when calling the lower-level implementation of if_nametoindex(), see justincormack/ljsyscall#188.)

@eugeneia eugeneia mentioned this pull request Feb 10, 2016
@eugeneia
Copy link
Member

@aperezdc Do we need the justincormack/ljsyscall#188 patch for this patch to work?

@aperezdc
Copy link
Contributor Author

@eugeneia: This patch is enough by itself to fix the issue.

(The other patch for ljsyscall is not technically needed: it avoids having the file descriptor open waiting for the GC to collect it in case of an error, by explicitly closing it. I made the patch to ljsyscall as a small enhancement while investigating this issue.)

@eugeneia
Copy link
Member

Okay, then I am merging this into max-next right away. Great detective work in this PR! I like when there are a lot of references to give context. :-)

eugeneia added a commit that referenced this pull request Feb 10, 2016
@eugeneia eugeneia self-assigned this Feb 11, 2016
@lukego lukego merged commit 5f45d3a into snabbco:next Feb 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants