Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isReady may remain false in MessageListWeb #3080

Closed
jainkuniya opened this issue Oct 26, 2018 · 4 comments
Closed

isReady may remain false in MessageListWeb #3080

jainkuniya opened this issue Oct 26, 2018 · 4 comments

Comments

@jainkuniya
Copy link
Member

jainkuniya commented Oct 26, 2018

We are using postMessage to communicate between react app and message list (i.e WebView).
postMessage is acting as a bridge between whole app and webView.
Whenever there is a update in app e.g.: typing events or new message etc we send html content to WebView via this postMessage bridge. And similarly we get scroll, touch/click events from WebView (Message List) to react native app via the same bridge postMessage.

https://github.com/zulip/zulip-mobile/blob/master/docs/architecture/react.md#the-exception-messagelistweb

One thing we need to make sure is, bridge should be successfully established before we start communication between two components. Else we will be sending messages from one side while the other side is not yet ready to receive/process messages. Thus this will cause in consistency and the updates will not be observed in the real time.

On current master, we are sending isReady message whenever webView is ready to receive/send/process messages. But the the problem is, what if when webView send isReady message at that time RN part is not ready or this event is not received by RN? We are just ignoring this case currently.
Then according to this RN will always thinks that webView is not ready and will never send any new message to webView, and keep on waiting till it hears isReady from webView.

On current master, one way hand shake is implemented (explained in above paragraph).

Which starts causing problems on Android 9. Issue like:

oh, it's a mix. Sometimes loading bars, sometimes not

(https://chat.zulip.org/#narrow/stream/48-mobile/subject/messages.20don't.20load/near/655849)

  • etc etc (actually whole communication seems to be broken)

Solution: Improve/enhance this hand shaking mechanism. Implement a two way hand shake.
Keep sending a message from one side till a confirmation from other side is heard/received.
I will prefer to keep this one side as RN one, because we have more control on RN one and it is easier to debug in RN.

(Sending PR in few minutes)

@zulipbot claim

jainkuniya added a commit to jainkuniya/zulip-mobile that referenced this issue Oct 26, 2018
Implement a two way hand shake. Send a ping from app to webView till a
ping is received from webView.

Refer zulip#3080 for more details.

Fixes: zulip#3080
jainkuniya added a commit to jainkuniya/zulip-mobile that referenced this issue Oct 26, 2018
Implement a two way hand shake. Send a ping from app to webView till a
ping is received from webView.

Refer zulip#3080 for more details.

Fixes: zulip#3080, zulip#3078
@borisyankov
Copy link
Contributor

Great find, @jainkuniya Also, thanks for the quick reaction.
I have some comments regarding the code itself but will do it as a review in the PR.

Let's work a bit on researching the exact cause and time this issue appeared.
Few people reported this started for them just a few days ago. That is interesting.
It led us (at least me) to believe that some code change we did recently did introduce that.
On the other hand, being Android version-specific makes sense as I have not seen this myself, and I am yet on Android 8, as Samsung is slow to update.

It is a bit strange that our component will render and the HTML, CSS and JS be ready and executing before it is even capable of receiving messages.
We need to investigate this further and be very clear why this happen and what else might be possible to happen (as this is absolutely key functionality for our app).

@gnprice
Copy link
Member

gnprice commented Oct 27, 2018

Hmm, this is an interesting possibility! Thanks @jainkuniya for raising this.

I agree with Boris that it seems a bit surprising that this would happen, but I think well within the range of surprising things that are possible. Also definitely agree that I want to understand clearly what other things can happen.

It remains puzzling to me that this would only just start happening to some people, going from rare to 100% of the time, in the past week. We haven't sent a new release to beta or prod in several weeks, and none of the people reporting symptoms (of #3078 and your fourth bullet) are alpha testers. Also we have those reports on Android versions 7.1.2, 8.0.0, and 9.

One plausible hypothesis, really the only one I have: an update to the Chrome app, which powers the WebView. Apparently 70.0.3538 was released 2018-10-17, i.e. last week; and I'm still on the previous version, and I haven't yet seen these symptoms. They certainly roll these things out gradually, so perhaps the people seeing this are people whose devices have gotten that update.

(That'd still leave open the questions Boris mentioned, which we'll still want to understand -- but it'd give us a lead on answering them, among other things.)

@gnprice gnprice changed the title android 9: isReady is always false in MessageListWeb. isReady may remain false in MessageListWeb Oct 27, 2018
@gnprice
Copy link
Member

gnprice commented Oct 28, 2018

See updates on #3078. Most happily, we have a way to reproduce! It looks like the Chrome 70 update is indeed the trigger.

This thread is a plausible hypothesis for what's causing #3078. The next step is to nail down whether this is really what's happening, by watching closely with the debugger while reproducing #3078.

jainkuniya added a commit to jainkuniya/zulip-mobile that referenced this issue Oct 29, 2018
Implement a two way hand shake. Send a ping from app to webView till a
ping is received from webView.

Refer zulip#3080 for more details.

Fixes: zulip#3080, zulip#3078
jainkuniya added a commit to jainkuniya/zulip-mobile that referenced this issue Oct 29, 2018
Implement a two way hand shake. Send a ping from app to webView till a
ping is received from webView.

Refer zulip#3080 for more details.

Fixes: zulip#3080, zulip#3078
jainkuniya added a commit to jainkuniya/zulip-mobile that referenced this issue Oct 29, 2018
Implement a two way hand shake. Send a ping from app to webView till a
ping is received from webView.

Refer zulip#3080 for more details.

Fixes: zulip#3080, zulip#3078
jainkuniya added a commit to jainkuniya/zulip-mobile that referenced this issue Oct 30, 2018
Implement a two way hand shake. Send a ping from app to webView till a
ping is received from webView.

Refer zulip#3080 for more details.

Fixes: zulip#3080, zulip#3078
gnprice pushed a commit to jainkuniya/zulip-mobile that referenced this issue Oct 30, 2018
In the message list, we use `postMessage` to communicate between the
JS code running inside the message list's WebView and the JS code
running in the React Native environment.  In the moments after the
WebView is first created, the `postMessage` channel may not yet be
functioning; so we watch for an indication that it is, and queue up
early messages to be sent then.  See b06d2cd and its parent for
where we introduced this.

The indication we watch for is that `window.postMessage` inside the
WebView has a `length` property of 1 (meaning it takes 1 required
argument), as a sign that RN has injected its modified `postMessage`
function.  This worked because the browser's own `window.postMessage`
had a length of 2.

Starting with Chrome 70, the browser's `window.postMessage` has a
length of 1, just like RN's modified version.  So our test succeeds
immediately, and we prematurely signal that `postMessage` is ready.
The app never gets the signal, and never sends messages into the
WebView.  This is zulip#3080, with severe symptoms like zulip#3078.

Fix the problem by switching to a round-trip, end-to-end handshake:
keep sending "ready" messages until we hear one echoed back.

Fixes zulip#3080.
Fixes zulip#3078.

[greg: diagnosed `window.postMessage.length` change;
 wrote commit message.]
@gnprice
Copy link
Member

gnprice commented Oct 30, 2018

Copying here FTR: I investigated this a bit more while preparing #3081 for merge, and got to a more specific diagnosis of what changed in Chrome 70 that caused our code to go wrong. Here's what I wrote in a comment there:


OK, and while revising the commit message I thought more about what might actually be going on here -- and then I pinned it down. 😄

The issue was here:

const waitForBridge = () => {
  if (window.postMessage.length === 1) {
    sendMessage({ type: 'ready' });

That window.postMessage.length === 1 test is intended to check whether we have the modified postMessage injected by RN, or still have the one the browser provides. Before Chrome 70, the one the browser provides takes 2 required parameters, and its length is 2.

Starting with Chrome 70, the browser's native postMessage has a length of 1. That's the same value as RN's substitute version has, which we're looking for -- so this test succeeds immediately, we try to send the ready message even though we're not in fact yet ready, it goes to the browser's postMessage, and it doesn't make it to our RN-hosted code waiting outside.

I'll incorporate that into the commit message.


See also my commit message on 6e42cb8.

The window.postMessage.length test is a thing we knew was a hack when we added it; with this bug we basically reaped what we sowed. The new mechanism from #3081 should be much more robust. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants