Add WebSocket support with API for connecting to external devices #744

Open
TheLastProject opened this issue May 24, 2016 · 12 comments
Labels
enhancement New feature or request

Comments

@TheLastProject
Contributor

Reasoning

WhatsApp has a "web client" which allows the user to run a browser application that communicates with WhatsApp on their smartphone. These "web clients" tremendously help usability, because it is much easier to type larger amounts of text on a computer than on a smartphone. This system also protects less tech-savvy Kontalk users from leaking their private key to untrustworthy or improperly secured devices.

The downside of this kind of client is that it is not usable without another device already running Kontalk. However, I feel this is a limited issue, seeing how Kontalk already supports basic federation with XMPP and will improve this in the future. Therefore, users who want to use a web client to chat with Kontalk users can create an XMPP account on another server and use one of the many existing web-based XMPP clients.

Note: In this document, I use the term web client to tell Android and non-Android apart easily, but there is no reason why the client could not be a desktop client, smartwatch client or anything else. The parts that describe the behaviour of the web client are only mentioned for clarity's sake, and are obviously not the responsibility of the Android client.

Necessary additional functionality in the Kontalk Android client

  • WebSocket support
  • Support for scanning a QR code (already desirable for confirming that a friend's public key matches the expected one in the identity dialog; Identity information #456 is the first issue that comes to mind, but this may need its own issue)
  • Simple JSON(?) WebSocket API, exposing the following functionality:
    • Retrieving contact list
    • Retrieving messages belonging to contact (perhaps limited, so that the web client cannot extract the complete conversation history)
    • Sending a message
    • Retrieving media related to a specific message
    • Sending media
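To make the list above concrete, here is a minimal Python sketch of how the phone side might dispatch these five operations over a plain JSON message format. The action names (`get_contacts`, `get_messages`, `send_message`, `get_media`, `send_media`), the field names, the base64 encoding of media and the 50-message cap are all assumptions for illustration, not part of any agreed protocol; the in-memory store stands in for Kontalk's real database.

```python
import base64
import json

MAX_MESSAGES = 50  # assumed cap so a web client cannot pull the whole history


class DeviceApi:
    """Toy stand-in for the Android side; every name here is hypothetical."""

    def __init__(self):
        self.contacts = {"alice@example.org": "Alice"}
        self.messages = {"alice@example.org": []}  # jid -> list of messages
        self.media = {}  # message id -> raw bytes
        self.next_id = 1

    def _store(self, jid, body):
        msg = {"id": self.next_id, "body": body, "outgoing": True}
        self.next_id += 1
        self.messages.setdefault(jid, []).append(msg)
        return msg

    def handle(self, raw: str) -> str:
        """Dispatch one JSON request and return a JSON reply."""
        req = json.loads(raw)
        action = req.get("action")
        if action == "get_contacts":
            result = [{"jid": j, "name": n} for j, n in self.contacts.items()]
        elif action == "get_messages":
            # Clamp the requested limit to [0, MAX_MESSAGES]; newest last.
            limit = max(0, min(int(req.get("limit", MAX_MESSAGES)), MAX_MESSAGES))
            history = self.messages.get(req["jid"], [])
            result = history[len(history) - limit:]
        elif action == "send_message":
            result = self._store(req["jid"], req["body"])
        elif action == "send_media":
            msg = self._store(req["jid"], "[media]")
            self.media[msg["id"]] = base64.b64decode(req["data"])
            result = msg
        elif action == "get_media":
            data = self.media.get(int(req["message_id"]))
            result = None if data is None else base64.b64encode(data).decode("ascii")
        else:
            return json.dumps({"ok": False, "error": f"unknown action {action!r}"})
        return json.dumps({"ok": True, "result": result})
```

Limiting history on the phone side, rather than trusting the web client to ask politely, is one way to honour the "perhaps limited" note in the retrieval bullet.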

Intended workflow

Note: All IP and port numbers are examples, the actual IP and port numbers will vary depending on the client.

  1. The web client starts a WebSocket server on 192.168.0.11 on port 1234.
  2. The web client shows the user a QR code containing the IP and port with the following instructions: "Please go to connect other device on your phone/tablet and scan the QR code above."
  3. The user navigates to an option called connect other device and is presented with a QR code scanner with the following message: "Make sure you are on the same WiFi network as the device you are trying to connect to and scan the QR code to connect to the device".
  4. The user scans the QR code and the Android client starts a secure WebSocket connection to the given IP address and port and shows the following warning: "We are connecting from 192.168.0.10. If the device you are connecting to shows this number on the screen, all is fine. If it shows another number, someone is attacking your device and you must press cancel now."
  5. The web client displays: "Incoming connection from 192.168.0.10. Confirm on your phone/tablet to continue."
    • If the user accepts the warning, the Android client sends a "HELO" packet, notifying the web client that the connection has been set up successfully.
    • If the user presses cancel, the Android client closes the WebSocket connection and reverts to normal state. The web client will display a "connection lost" message.
  6. The Android client starts to display a screen saying "You are currently using Kontalk from another device", with a big disconnect button. This disconnect button (or the back button) is the only way to go back to using Kontalk from the phone again (this should be discussed, as it is not the most user-friendly option, but should simplify secure implementation and make it extremely obvious for the user a client is connected).
  7. The web client can now request the desired information using the APIs and will receive notifications from the Android client when a new chat message is received (in the future, possibly more events).
  8. When the connection closes for any reason, the Android client closes the socket and reverts to normal state. A new connection will have to be made using the QR code scanning method.
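The workflow is essentially a small state machine on the Android side. The sketch below models it in Python under stated assumptions: the class, state and method names are hypothetical, and `connect`, `send` and `close` are injected callables standing in for whatever WebSocket library the client ends up using. What it illustrates is that API access is only possible in the connected state, and that every exit path (cancel, disconnect button, dropped connection) returns to idle.

```python
from enum import Enum, auto
from typing import Callable, Optional, Tuple


class State(Enum):
    IDLE = auto()        # normal Kontalk use; no socket, API disabled
    SCANNING = auto()    # step 3: QR code scanner open
    CONFIRMING = auto()  # steps 4-5: socket open, waiting for the user
    CONNECTED = auto()   # steps 6-7: "using Kontalk from another device"


class LinkSession:
    """Hypothetical model of the Android client's side of the workflow."""

    def __init__(self, connect: Callable[[str, int], None],
                 send: Callable[[str], None], close: Callable[[], None]):
        self.state = State.IDLE
        self.peer: Optional[Tuple[str, int]] = None
        self._connect, self._send, self._close = connect, send, close

    def open_scanner(self) -> None:
        if self.state is State.IDLE:
            self.state = State.SCANNING

    def qr_scanned(self, host: str, port: int) -> None:
        # Step 4: the phone dials out as a client; it never listens.
        if self.state is State.SCANNING:
            self._connect(host, port)
            self.peer = (host, port)
            self.state = State.CONFIRMING

    def user_accepted(self) -> None:
        if self.state is State.CONFIRMING:
            self._send("HELO")  # step 5: tell the web client we are ready
            self.state = State.CONNECTED

    def user_cancelled(self) -> None:
        if self.state is State.CONFIRMING:
            self._teardown()

    def disconnect_pressed(self) -> None:
        if self.state is State.CONNECTED:
            self._teardown()

    def connection_closed(self) -> None:
        # Step 8: any drop returns to normal; a new QR scan is required.
        if self.state in (State.CONFIRMING, State.CONNECTED):
            self._teardown()

    def can_serve_api(self) -> bool:
        return self.state is State.CONNECTED

    def _teardown(self) -> None:
        self._close()
        self.peer = None
        self.state = State.IDLE
```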

In this system, the QR code is used as authentication, making it very simple to authenticate a system and prevent unwanted connections. For security reasons, the Android client will NEVER run a server and will ONLY connect as a client after specifically being instructed to do so, disabling all related functionality AS SOON AS the connection dies.
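As a sketch of what the QR code might carry, the following Python encodes and validates the IP and port from steps 1 and 2. The JSON layout (including the `v` version field) and the `wss://host:port/` URL shape are assumptions; the point is that the Android side refuses anything that is not a literal IP address and a valid port, rather than following arbitrary text from the scan.

```python
import ipaddress
import json


def make_qr_payload(host: str, port: int) -> str:
    """Web-client side: text to encode in the QR code (format is hypothetical)."""
    return json.dumps({"v": 1, "host": host, "port": port})


def parse_qr_payload(text: str) -> str:
    """Android side: validate the scanned text and return the wss:// URL to dial."""
    data = json.loads(text)
    if data.get("v") != 1:
        raise ValueError("unsupported payload version")
    ip = ipaddress.ip_address(data["host"])  # raises ValueError for non-IP text
    port = data["port"]
    if not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError("invalid port")
    host = f"[{ip}]" if ip.version == 6 else str(ip)  # IPv6 needs brackets in URLs
    return f"wss://{host}:{port}/"
```

For the example addresses in the workflow, `parse_qr_payload(make_qr_payload("192.168.0.11", 1234))` yields `wss://192.168.0.11:1234/`.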

Reference material

These are just the first things I found; there may be better references and libraries.

Android WebSocket library
Android WebSocket example
Android JSONObject
Android JsonReader

Notes

  • We will want to use wss:// (secure WebSocket), not ws:// (plain, insecure WebSocket). We probably need to find out how to trust unsigned (self-signed) certificates here. Trusting such certificates does open the door to a MitM attack. By showing the IP address on the screen and requiring confirmation (steps 4 and 5), this risk is mitigated for all users who are not extremely irresponsible (there's only so much we can do).
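For illustration only, this is what "trusting an unsigned certificate" amounts to in Python's standard `ssl` module: TLS still encrypts the traffic, but certificate verification is switched off, which is exactly the MitM opening that the IP confirmation in steps 4 and 5 is meant to cover. The function name and the TLS 1.2 floor are assumptions. The resulting context would be handed to whichever WebSocket library opens the wss:// connection.

```python
import ssl


def insecure_client_context() -> ssl.SSLContext:
    """TLS client context that accepts the web client's self-signed certificate.

    Traffic is still encrypted, but the peer is NOT authenticated by TLS;
    only the user's on-screen confirmation stands between us and a MitM.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed floor; refuse older TLS
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # accept any certificate, self-signed included
    return ctx
```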
  • I am willing to take the development of a web client using this system on myself.
  • This system could, in modified form, also be used to help fix Multiple devices support sharing same key material #122 by sending the key over this protocol. However, that would require both a WebSocket server component in the Kontalk Android clients and additional security measures to prevent web clients that should not be able to access the private key from obtaining it.
@daniele-athome daniele-athome added the pending Issue is pending further analysis label May 26, 2016
@daniele-athome daniele-athome self-assigned this May 26, 2016
@daniele-athome
Member

Thanks @TheLastProject for the very thorough explanation and of course thanks for your volunteering for this.
I think there is one small detail that could jeopardize the whole idea: as far as I know, browsers can't open WebSocket servers. Can you please confirm that? I did some research and it seems to me the WebSocket browser spec allows only for client connections. Although you specify that the "web client" could be anything, I'm guessing the browser would be your primary target, right?

If indeed WebSocket servers can't be used on the browser, we might use the Kontalk server as an encrypted and secure support channel to pass data through. It will open a whole new can of worms though (and it's a totally different idea anyway - just throwing stuff right now).

EDIT: I did some more research, and there seems to be some effort to standardize peer-to-peer communication within the browser with HTML 5 and WebSockets, but it's still rough and not very precise; it does not support raw TCP/IP as far as I can see and it doesn't support encryption either.
Source: https://www.w3.org/TR/2008/WD-html5-20080122/#peer-to-peer

@TheLastProject
Contributor Author

You indeed raise a very valid point. The way I see it, the best way is probably to use a connection broker like Peer.js, which allows two clients to connect to each other through a central server. After the connection is set up, the server is no longer needed. I tested this with two Peer.js clients and the Peer.js connection broker server, which I then killed, and the connection stayed stable. Assuming the Peer.js connection broker also works with other WebRTC clients, this is probably a great solution.

Using a connection broker like that does remove the nice authentication system we have in step 4 and 5, but a possible workaround would be to have the web client generate a random number and put this in the QR code together with their Peer.js ID (making sure the number is only transmitted over the QR code). Then, the web client and Android client could both show this number and the Android client could ask the user to confirm it. This would make it extremely unlikely for the broker server to send a connection to the wrong device without it being noticed, as the "bad device" would also have to generate the exact same number.
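A minimal Python sketch of the random-number idea. Assumptions not stated in the thread: the code is six digits, the QR text is a small JSON object holding the Peer.js ID and the code, and the connected peer sends its code as its first message so the Android client can check it before showing the Continue/Cancel prompt. All function names are hypothetical.

```python
import json
import secrets
from typing import Optional, Tuple


def make_pairing_qr(peer_id: str) -> Tuple[str, str]:
    """Web client: create a one-time code and the QR text carrying it.

    The code travels only inside the QR image, never through the broker.
    """
    code = f"{secrets.randbelow(10**6):06d}"  # six digits: an assumed length
    return code, json.dumps({"peer": peer_id, "code": code})


def read_pairing_qr(qr_text: str) -> Tuple[str, str]:
    """Android client: the peer ID to dial via the broker, and the expected code."""
    data = json.loads(qr_text)
    return data["peer"], data["code"]


def check_first_message(expected_code: str, first_message: str) -> Optional[str]:
    """Android client: verify the code the connected peer sent first.

    Returns the prompt to show (the user still presses Continue or Cancel),
    or None if the peer does not know the code, i.e. the broker connected us
    to the wrong device and the connection should be dropped.
    """
    # Constant-time comparison on bytes (also safe for non-ASCII input).
    if not secrets.compare_digest(expected_code.encode(), first_message.encode()):
        return None
    return f"The other device should now show {expected_code}. Continue or Cancel?"
```

A "bad device" handed our connection by the broker never saw the QR code, so it can only guess one of a million codes.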

@daniele-athome
Member

To broker connections, PeerJS connects to a PeerServer. Note that no peer-to-peer data goes through the server; The server acts only as a connection broker. (peerjs.com)

Nice. We could host a PeerJS instance on the Kontalk server.

Then, the web client and Android client could both show this number and the Android client could ask the user to confirm it.

Wouldn't it be enough to just insert the random number inside the QR code like you said, and avoid the manual number typing step? Did you mean it as an additional confirmation step for security reasons?

@TheLastProject
Contributor Author

Inserting a random number is indeed meant as an additional (and possibly paranoid, but I prefer to err on the side of caution) confirmation step, in case the PeerServer would, for whatever reason, connect someone to the wrong system. You would notice that pretty quickly anyway, but this extra confirmation would allow the user to notice it before the other system is allowed to request any information.

I did however mean showing the number both on the web client and in the Android client and letting the user simply press "Continue" or "Cancel" on the Android client instead of typing the number manually, sorry for my unclarity there.

@daniele-athome
Member

I did however mean showing the number both on the web client and in the Android client and letting the user simply press "Continue" or "Cancel" on the Android client instead of typing the number manually, sorry for my unclarity there.

No problem, ok I got it now.

If we use a broker, does it mean over-the-Internet connections are possible too? Security concerns aside for a moment.
Anyway, I still have to understand if the broker is actually needed for a local network (I mean if the connection is p2p, one of the two browsers must have a listening socket somewhere). I'll dive into WebRTC API and do some tests.

@daniele-athome
Member

Ok let's say we use PeerJS or ICE or whatever method to handshake the connection. It doesn't matter where the two devices are. We can allow over-the-Internet connections (with proper encryption and security measures).

Anyway, I like the idea, so let's proceed with further research. If you'd like, you may draft some specs for the protocol (APIs) that we should use; we'll discuss them together when the time comes, of course after the proper research has been done. Use the wiki in this repository to share your research if you want.

I'll set this to 4.0.0 for now and we'll see how the group chat development efforts will drive the overall implementation of this feature.
I'll go one step further (showing belief in this feature :-) and make a proposal to you (even if it might be a bit premature at this point): if you take on the development of this web interface, you may join our team and I'll set up a project inside the GitHub organization, much like what @abika did for the desktop client.

@daniele-athome daniele-athome added enhancement New feature or request and removed pending Issue is pending further analysis labels May 27, 2016
@daniele-athome daniele-athome added this to the 4.0.0 milestone May 27, 2016
@daniele-athome daniele-athome removed their assignment May 27, 2016
@TheLastProject
Contributor Author

TheLastProject commented May 27, 2016

Personally, I'd like to avoid having to use a broker when possible, but from my research it seems impossible to do it without one. I would love to be proven wrong, though!

I have played with PeerJS before and it indeed would allow us to set up connections between different networks, which I believe is a convenience feature WhatsApp lacks. If we can do this without a broker, though, it would be (slightly) more secure.

I'd happily write up a first version of what I'd like the API to do and put it on the wiki. How should I name the wiki page for this feature? Does [WIP] API for app-to-app communication sound okay? Also, are you sure about using the wiki in this repository? I feel that https://github.com/kontalk/specs/wiki is probably the best wiki to use for this, because if we later get clients for other platforms they should implement the same API for this feature.

Being able to join the team would be great, but I'm very willing to develop the web client in my own repository and later move it to the GitHub organisation when it gets accepted.

As for setting the milestone to 4.0.0, I'd urge you not to. While I agree it would be a cool feature, group chat is already a huge new feature that may well have some small edge-case bugs after release. Planning to start on this in the same period will probably cause a load of unexpected work after 4.0.0, and emergency bugfix releases are never fun.

@daniele-athome
Member

I'd happily write up a first version of what I'd like the API to do and put it on the wiki. How should I name the wiki page for this feature? Does [WIP] API for app-to-app communication sound okay

Sure, whatever you believe is the right title.

Also, are you sure about using the wiki in this repository

No, you're right. Since this will be highly WIP stuff, please use the specs wiki for now (I've granted you commit access so you can modify the wiki) for anything you'd like to note about the APIs, any research and/or insights you might find useful for the development. Think of it as a brainstorming notebook if you want. When it reaches a more defined state, we'll convert the wiki pages into .md documents in the specs repository itself.

@daniele-athome daniele-athome modified the milestones: 4.1.0, 4.0.0 May 27, 2016
@daniele-athome
Member

And I've set it to 4.1.0 for now (absolutely not a requirement, I'm constantly changing the milestones; let's say it will be some time "after group chat").

@TheLastProject
Contributor Author

Thanks for moving the milestone. Group chats definitely matter more.

For what it's worth, I've set up a first revision at https://github.com/kontalk/specs/wiki/[WIP]-API-for-app-to-app-communication. Changes are still needed, including documenting how to start a new session, but at least it shows my general idea for the API. Feedback and/or questions from you (or other Kontalk community members) would be very welcome.

The API definitely looks a lot more complicated than it is, due to the ability to request a specific field instead of all data all the time, but should cover everything to manage the user, contacts and conversations, including sending and receiving messages.
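The "request a specific field" idea can be sketched as a simple projection applied before a record is sent to the web client. The helper name and the `fields` list are assumptions for illustration; the wiki page is where the actual parameter names are defined.

```python
from typing import Any, Dict, Iterable, Optional


def select_fields(record: Dict[str, Any],
                  fields: Optional[Iterable[str]] = None) -> Dict[str, Any]:
    """Return only the requested fields of a record, or a copy of all of them.

    Unknown field names are rejected so typos surface as errors, not silent gaps.
    """
    if fields is None:
        return dict(record)
    wanted = list(fields)
    unknown = [f for f in wanted if f not in record]
    if unknown:
        raise KeyError(f"unknown field(s): {', '.join(unknown)}")
    return {f: record[f] for f in wanted}
```

Asking for `["name"]` on a contact then returns just `{"name": ...}`, which keeps replies small without needing a separate method per field.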

@TheLastProject
Contributor Author

I noticed that my WIP is very similar to JSON-RPC, yet somewhat different. I'll probably go through it again sooner or later to rework it to use JSON-RPC, which should make it easier to parse and, hopefully, to generate with existing libraries.

@TheLastProject
Contributor Author

I updated the wiki page, converting the whole system to JSON-RPC 2.0, which means it should become much easier to generate all this with existing libraries in all major languages, improving interoperability.
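As a rough illustration of the JSON-RPC 2.0 conversion, here is a minimal Python dispatcher following the JSON-RPC 2.0 specification's request, response, notification and standard error-code rules (batch requests are left out). The method names used with it (for example `contacts.list` or `message.received`) are hypothetical; the real ones are whatever the wiki page defines. Notifications are also how the phone could push new incoming messages to the web client, as in step 7 of the workflow.

```python
import inspect
import json
from typing import Any, Callable, Dict, Optional

# Standard JSON-RPC 2.0 error codes.
PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601
INVALID_PARAMS = -32602


def _error(code: int, message: str, req_id: Any = None) -> str:
    err = {"code": code, "message": message}
    return json.dumps({"jsonrpc": "2.0", "error": err, "id": req_id})


def handle_jsonrpc(raw: str, methods: Dict[str, Callable[..., Any]]) -> Optional[str]:
    """Handle one JSON-RPC 2.0 message (batches not supported).

    Returns the JSON response, or None for a notification (no "id" member).
    """
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return _error(PARSE_ERROR, "Parse error")
    if (not isinstance(req, dict) or req.get("jsonrpc") != "2.0"
            or not isinstance(req.get("method"), str)
            or not isinstance(req.get("params", []), (list, dict))):
        return _error(INVALID_REQUEST, "Invalid Request")
    is_notification = "id" not in req
    req_id = req.get("id")
    func = methods.get(req["method"])
    if func is None:
        return None if is_notification else _error(METHOD_NOT_FOUND, "Method not found", req_id)
    params = req.get("params", [])
    try:
        # Check params against the method's signature before calling it.
        sig = inspect.signature(func)
        bound = sig.bind(**params) if isinstance(params, dict) else sig.bind(*params)
    except TypeError:
        return None if is_notification else _error(INVALID_PARAMS, "Invalid params", req_id)
    result = func(*bound.args, **bound.kwargs)
    if is_notification:
        return None
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": req_id})


def notification(method: str, params: Dict[str, Any]) -> str:
    """Phone-to-web-client event, e.g. a newly received chat message."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params})
```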

@daniele-athome daniele-athome removed this from the 4.2.0 milestone Nov 1, 2017

2 participants