Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getipaddrs() ignores IPv6 addresses #30604

Closed
mgkuhn opened this issue Jan 5, 2019 · 19 comments
Closed

getipaddrs() ignores IPv6 addresses #30604

mgkuhn opened this issue Jan 5, 2019 · 19 comments

Comments

@mgkuhn
Copy link
Contributor

mgkuhn commented Jan 5, 2019

I would have expected the new getipaddrs() function (#30349) to show on my dual-stack Linux machine both the global scope IP addresses listed by hostname -I, but currently (as of 1.2.0-DEV.100) it only outputs its IPv4 address:

julia> using Sockets

julia> getipaddr()
ip"128.232.65.23"

julia> getipaddrs()
1-element Array{IPv4,1}:
 ip"128.232.65.23"

shell> hostname -I
128.232.65.23 2001:630:212:238:ea40:f2ff:fe0b:d1eb 

The current docstring does not point out that getipaddrs() only returns IPv4 addresses. The returned type does not even have any provisions for IPv6 addresses.

I don't think Julia should add in 2019 new API functions that only support IPv4. Instead, the Julia API should encourage a network programming style that works equally well on machines with IPv4-only, IPv6-only and IPv4+IPv6 interfaces. Many people have dual-stack IPv4/IPv6 network interfaces configured these days, for many home users IPv4 is now only provided by carrier-grade NAT (i.e., the less preferred protocol), and big data centers increasingly use IPv6-only subnets internally (as IPv4 addresses trade for nearly $20 these days).

There are also more types of addresses to distinguish in the filter parameters than just lo, e.g. a server who sends out invitations for reverse connections might want to avoid listing RFC 4149 temporary addresses.

@jpsamaroo
Copy link
Member

I'm spending a bit of time trying to get a PR together to fix this, but before I get too far with a certain approach, how do we want to reconcile this in the API? Would we just widen the output of getipaddrs() to Vector{Union{IPv4,IPv6}}, or would we instead want to have something like getipaddrs6()? The logic of getipaddrs() is already simple enough to extend to do the former approach, and the latter approach is of course just as simple from what I've seen so far. Maybe @StefanKarpinski or @vtjnash would be best to answer this?

@StefanKarpinski
Copy link
Member

Vector{IPAddr} would be fine as a return type; the only builtin instances of IPAddr are IPv4 and IPv6 but if that were to expand in the future, the vector type would not have to change.

@jpsamaroo
Copy link
Member

Now that I think about this, wouldn't that make this into a (minor) breaking change? Assuming we want to be consistent and apply this logic to getipaddr() as well, some code could be relying on getipaddr returning only IPv4 addresses, especially given that a lot of people still think of an "IP address" as IPv4 only.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jan 5, 2019

Returning all IP addresses by default seems better; one could write getipaddrs(IPv4) and getipaddrs(IPv6) to only get IPv4 and IPv6 IP addresses, respectively. The only issue with that is what getipaddr() should do—I think it currently only returns IPv4 addresses, right? Having getipaddr6 seems not very generic or generalizable. We could have getipaddr(IPv4) and getipaddr(IPv6). Would it make sense to return IPv4 addresses before IPv6 addresses and have getipaddr() return getipaddrs()[1]? That way if there are any IPv4 addresses, you'll get one of those and if there aren't you'll get an IPv6 address; only if there aren't any of those do you get an error.

@jpsamaroo
Copy link
Member

Yes, getipaddr only currently returns an IPv4 address. I agree that it isn't very generic, I'm just trying to sidestep a breaking change in my PR, if we consider that desirable. If not, I'm happy to return IPAddrs instead of just IPv4 or IPv6, and remove the ugly "6" suffix.

Do note that the 1-argument version of each function is already taken, so we'd probably want to make a 2-argument function along the same lines. I would prefer that approach.

And AFAICT, libuv already returns IPv4 addresses first, although I'm not sure if that's guaranteed in their API.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jan 5, 2019

I'm just trying to sidestep a breaking change in my PR, if we consider that desirable.

I think the design I outlined would also be non-breaking since it would return an IPv4 address in every situation where it did previously. The behavior would only change if there are no IPv4 addresses which would currently be an error. Summarizing:

  • Make the return type of getipaddrs() be Vector{IPAddr}.
  • Return IPv4 addresses sorted before IPv6 addresses.
  • Maake getipaddr() return getipaddrs()[1] or error if there are none; thus, we prefer using an IPv4 address over IPv6 if there are any and error only if there are no addresses at all.
  • Make getipaddrs(IPv4) and getipaddrs(IPv6) return only IP addresses of the indicated type, returning a vector of the requested element type, i.e. Vector{IPv4} and Vector{IPv6}.
  • Make getipaddr(T) return getipaddrs(T)[1] so that you can require an IPv4/6 type of address.

And AFAICT, libuv already returns IPv4 addresses first, although I'm not sure if that's guaranteed in their API.

We may as well guarantee it ourselves since you're never going to have all that many IP addresses.

@jpsamaroo
Copy link
Member

I definitely misinterpreted what you were saying; I agree with your summary. I'll update the PR a little later, and also fix the test failures (seems I forgot to test the loopback stuff locally).

@mgkuhn
Copy link
Contributor Author

mgkuhn commented Jan 5, 2019

Sounds all good to me. The only additional suggestion I have is that at least the docstring for getipaddr() should deprecate it in favor of getipaddrs(), to raise awareness that using the old function is bad practice.

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Jan 6, 2019

What's the point of preferring IPv4 over IPv6 for ipaddr() if both are available?

If it's because of what's done elsewhere* isn't that just legacy since IPv6 did not exist, and Julia can have its own rules? In some faraway future we may all only have IPv6, and at least by then this "i'm getting IPv4" assumption that people might have will be broken (better to surprise people soon?). Why not prefer IPv6 now? Don't we WANT to use that if both are available?

If it's because what Julia 1.0 did, isn't Julia 1.1 (or 1.2), rather than Julia 1.0.x, to break such assumptions, and we should just document?

  • nst.sourceforge.net/nst/docs/scripts/getipaddr.html

By default, IPv4 (Version: 4) Addresses are displayed. If you include this option, we will display the IPv6 (Version: 6) Addresses instead.

If I'm right, we may also want to have ipaddrs() returrn IPv6 first (and "getipaddr() return getipaddrs()[1]").

@StefanKarpinski
Copy link
Member

to raise awareness that using the old function is bad practice.

Why is it bad practice? If you just want some IP address isn’t it fine to call getipaddr?

@StefanKarpinski
Copy link
Member

What's the point of preferring IPv4 over IPv6 for ipaddr() if both are available?

Partly because it’s what 1.0 does so we don’t want to break working code. Also because we still live in a world where IPv4 is much more common and is the most likely to “just work”.

This overall design is mostly agnostic about which version of IP is used, however, so it does pave the way for switching the default in Julia 2.0 without much disruption.

@jpsamaroo
Copy link
Member

I would also agree that in most cases, getipaddr() is probably not the right thing to use, at least the way I expect most people would use it. I would expect that someone might write a package/application that uses getipaddr as an attempt to determine which IP is connected to the user's local subnet/internet gateway. This will work for most users, but will fail miserably with more complicated or unusual network configurations (like with VPNs or other tunnel mechanisms). It also may encourage the (incorrect) assumption that the "primary" interface can only have a single IP address.

In general, unless you really don't care about anything other than just getting a "working" IP address (I can't think of any examples where this is the case), using getipaddrs instead is the correct function to use to make robust software.

@StefanKarpinski
Copy link
Member

Which leads to this question: if someone does need an IP address, what is the right way to pick it? Or should they generally be using a set of IP addresses instead of a single one?

@jpsamaroo
Copy link
Member

jpsamaroo commented Jan 6, 2019

IMO, a better question to ask, is why would one need a single IP that's registered by their OS, without getting any useful context about it (routes, interface, netmask, etc.)? If you need some arbitrary IP, just hardcode it, that's much less error prone than relying on the OS to provide one (which isn't guaranteed when you exclude loopback).

So yes, I think providing a list of all registered IPs is the only API that makes sense, unless we choose to expose a lot more information about network details via Sockets stdlib (which would be cool, but maybe out of scope of a Julia stdlib?).

@mgkuhn
Copy link
Contributor Author

mgkuhn commented Jan 6, 2019

Are there any other modern programming-language API's with a function to give you just one single IP address of the local machine? All the ones I know return linked lists or arrays of addresses:

That's for good reasons, as multiple interfaces and multiple-addresses per interface are very common: multi-homed machines, IPv4/IPv6 dual-stack, VPNs, virtual machine interfaces, virtual hosts, service-based addresses, etc. are a fact of life and giving programmers an API that pretends otherwise may lead to naive code that breaks in such situations. (The home PC I am typing this on has currently 7 IPv4 and 3 IPv6 addresses ...)

What do you want to know the local IP address(es) for?

The only application example for getipaddr() that I found in Julia itself is in stdlib/Distributed/cluster.jl:init_bind_addr() where workers (that haven't been told on the command-line with --bind-to what address to use) call getipaddr() to pick their own socket address, and then fall back to 127.0.0.1 if they have no IPv4 address at all. This will certainly fail on IPv6-only machines and may cause unexpected surprises on multi-homed machines (multiple interface cards, VPN connections, policy routing, virtual hosts, etc.), where only a subset of addresses may be routable.

What stdlib/Distributed/cluster.jl:init_bind_addr() probably should do instead of calling getipaddr() is to consult split(ENV["SSH_CONNECTION"], ' ')[3], where sshd already tells the worker under which IP address it was reached successfully. As a fall-back, the worker could also just listen on all interface addresses, and provide the caller with no IP address, in which case the caller should then just resolve the hostname that it gave to the ssh client, and iterate through the resulting addresses (e.g. in RFC 3484 order), as this is likely to lead to the same host address that is already known to work for ssh. At no point here does calling getipaddr() seem useful.

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Jan 14, 2019

What's the point of preferring IPv4 over IPv6 for ipaddr() if both are available?

Partly because it’s what 1.0 does so we don’t want to break working code. Also because we still live in a world where IPv4 is much more common and is the most likely to “just work”.

This would only be an argument if IPv6 didn't actually work, while it's been supported by default for years on all platforms.

IPv6 is supported in all operating systems Julia supports (and if not configured or no supported IPv4 would be returned), including FreeBSD (and all we likely want to support, iOS and all supported Android back to Lollipop); in all versions of them (back to Windows Vista):

https://en.wikipedia.org/wiki/Comparison_of_IPv6_support_in_operating_systems

[All except Android also support DHCPv6, and non-support may not matter; neither for ND RDNSS that I'm not familiar with, it seems to be an alternative to DHCHv6.]

I googled prefer IPv4 or IPv6 to see what others do and this may or may not apply https://support.microsoft.com/en-us/help/929852/guidance-for-configuring-ipv6-in-windows-for-advanced-users

Possibly we would want to check the Registry key to choose what to prefer.

There are also some comments there:

https://community.spiceworks.com/topic/1986162-prefer-ipv4-over-ipv6

@StefanKarpinski
Copy link
Member

The fact that IPv6 works is not a justification for making a breaking change.

@jpsamaroo
Copy link
Member

The breaking change we should be making in the future would be the removal of getipaddr entirely, but for now let's only consider the getipaddrs function (which yes, I need to fix in my PR) so that we can get this issue partially addressed in the 1.x timeframe. We can always revisit IPv4 vs IPv6 once the 2.0-DEV merge period begins.

@StefanKarpinski
Copy link
Member

Agree with all of that, @jpsamaroo. I think that for now getipaddrs() should return all public IP addresses with IPv4 sorted first and getipaddr() can just be a non-recommended convenience function for getipaddrs()[1] but with a better error message when the list is empty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants