On iOS Zulip will log out of realm/organization after force closing app and re opening. #3588

dalanmiller · 2019-08-13T09:09:15Z

Currently using a self-hosted instance with both Android and iOS. Android runs fine at the moment but on iOS the app will require logging into the organization every time the app is closed and started or opened after a long period of time.

How can I try to debug this issue or provide more information on what is going on?

ferman · 2019-08-13T18:12:19Z

I am experiencing the same issues. I am on a iPhone 6s on iOS 12.4.

I was just running Zulip on an android phone and it stays logged in perfectly fine.

gnprice · 2019-08-13T22:40:30Z

Thanks @dalanmiller and @ferman for these reports! One followup question for both of you:

Have you been seeing this on devices where the app has been installed for a while, and previously was staying logged in just fine?
Or have you seen it only on fresh installs that have never successfully stayed logged in?

We've also heard about this in a chat thread today, and we heard a report of it Friday in a support thread.

I reproduce this myself when I make a fresh install of v26.1.124, the latest, on an iPhone 7 with iOS 12.3. The report on Friday was on v26.0.123, so it also affects that version.

For me, the app was working just fine (staying logged in) before that fresh install. If it does only affect fresh installs (or, newly logged-into accounts on existing installs), then that's a clue because it suggests we're reading the stored data fine, and the bug specifically affects writing changes to it.

This is a pretty showstopper symptom, so we'll certainly be making a priority of getting to the bottom of it.

dalanmiller · 2019-08-13T23:21:46Z

Confirm that we're observing iOS app version v26.1.124. We're also running the most recent version of the Zulip docker container available.

gnprice · 2019-08-13T23:45:02Z

Update from some attempts to debug:

The issue continues to reproduce 100% of the time with v26.1.124 (from TestFlight, but should be the same binary as in the App Store) on my physical iPhone 7.
In a debug build on a simulated iPhone X, the issue did reproduce a few times... but then it didn't always, and now I'm finding it hard to reproduce there at all.

That's slowed down the investigation -- it's much easier to make progress when looking at a debug build with debug logs, adding a few lines of ad-hoc logging, rerunning with that, etc. Also the uncertainty of what might be confounding variables causes me to be paranoid about resetting the state extra-clean between trials, making the iteration cycle longer.

Probably the next thing I'll try is a debug build on that physical iPhone 7, and see if that does any better.

Things I did learn from when it was still reproducing in debug builds:

The call to write to the storage (AsyncStorage.setItem, a React Native facility) is returning success.
But on re-opening the app, we're not finding that data in the storage.
I also tried a getItem after the setItem returned success, and it consistently found that the data was there. That suggests that we're successfully getting the data into the native data structures of AsyncStorage, but AsyncStorage in turn is failing somehow to actually flush those writes to persistent storage.

I'm not 100% sure that those last observations about getItem didn't come from a run where the issue didn't repro, though. (It was shortly after that that it mostly stopped reproing in my debug builds.) If I'd taken better notes I'd know, I guess.

gnprice · 2019-08-14T00:20:32Z

From a different angle:

26.0.123 went out to all users on 2019-08-06. We got the first report a bit under 72h later, and a series of further reports today, starting last night (California time) with this issue.
26.1.124 then went out this morning, about 7 hours after this issue was filed. @dalanmiller you don't happen to have signed up to use the beta version from TestFlight, do you? If not, then your original report makes a second case of seeing the issue on 26.0.123.
The previous release to reach all users (i.e., more than our beta users) was 25.6.120. That went out a month ago, on 2019-07-12, and given the severity of this issue it seems likely we'd have heard about it sooner if this issue were present there.

So the bug was very likely introduced between 25.6.120 and 26.0.123. I'm looking at the code that changed in that interval.

One change was we upgraded RN from v0.57.8 to v0.59.10. That sure sounds like a plausible candidate.

And in fact there were a few changes to AsyncStorage in that range!

Looking closer, the one that seems like it could plausibly affect whether data gets successfully stored is this one: facebook/react-native@27b4d21 ... And looking closely at that, it seems quite unlikely to be the cause of this issue -- in particular it appears to only affect the codepath for deleting a key from the storage entirely.

I don't feel I understand the entire subsystem at this point and so I wouldn't 100% rule that commit out as the cause; but I think not.
We made some changes in our own code a layer or two above AsyncStorage that manages persisting the app's relevant data and migrating it through changes in our data structures between versions. Notably there were several changes in our src/boot/store.js.

A bug in that code could certainly cause this kind of symptom... though I don't have a good theory for why the bug would only affect iOS, as that's platform-independent JS code.

I'll likely dig into that lead some more.

dalanmiller · 2019-08-14T00:48:31Z

26.1.124 then went out this morning, about 7 hours after this issue was filed. @dalanmiller <https://github.com/dalanmiller> you don't happen to have signed up to use the beta version from TestFlight, do you? If not, then your original report makes a second case of seeing the issue on 26.0.123.

I can confirm TestFlight is not installed on this device.

gnprice · 2019-08-14T19:52:33Z

Probably the next thing I'll try is a debug build on that physical iPhone 7, and see if that does any better.

Notes from first efforts on that:

I've tried three configurations:

a freshly-installed debug build, unmodified from master, in its default state
a freshly-installed debug build, unmodified from master, but with RN's "remote JS debugging" enabled
a freshly-installed debug build, with RN's "remote JS debugging" enabled, and with a console.log call added which logs a bit of relevant info to the console provided by that remote JS debugging

"Remote JS debugging" means the Chrome DevTools, in my desktop Chrome browser, gets attached; and IIUC the JS all actually runs inside Chrome instead of in the JS implementation (JSC) that RN builds into the app.

So far the pattern is that the issue reproduces 100% of the time in the first case (no "remote JS debugging"), and fails to repro -- i.e. the app works normally -- 100% of the time in the second and third.

I was pretty sure I'd seen the issue repro yesterday in the simulated iPhone X with the devtools attached, and even with some ad-hoc logging code added. But while I tried taking out the logging code and was then concerned to see the issue continue to not reproduce, I don't think I tried going back to not even attaching the debugger.

If I can't reproduce with the debugger attached that will certainly add an obstacle in debugging it; I'll have to find another way to get info out. OTOH it may also be a clue.

gnprice · 2019-08-14T19:55:54Z

Completing a small matrix, I just tried:

a freshly-installed debug build, with that same console.log call added, but in the default state without the devtools attached

This isn't a super helpful setup in itself, because there isn't an immediate way to get those logs. But the issue indeed reproduced, fitting the pattern that it's all about whether the devtools are attached.

ferman · 2019-08-14T21:11:25Z

My install that I reported is on a fresh install of iOS and of the app. I am on v26.0.123.

gnprice · 2019-08-16T00:47:59Z

New information from some debugging just now:

The issue reproduces at commit dfbdd97, where we went from RN v0.57.8 to RN v0.59.10.
It doesn't reproduce at that commit's parent, namely e8ea477.
More generally, it reproduces at every commit that I've specifically tested that's after that point (3c25ad1 aka 26.0.123, and 6219e5a which is the latest on master); and it doesn't reproduce at any commit I've specifically tested that's before that point (cab7346 aka 25.8.122).

So the issue was introduced by the upgrade to RN v0.59.

A key ingredient here was what I learned yesterday, that so long as I don't enable remote JS debugging, the issue reproduces 100% of the time.

We made some changes in our own code a layer or two above AsyncStorage that manages persisting the app's relevant data [...].
A bug in that code could certainly cause this kind of symptom... though I don't have a good theory for why the bug would only affect iOS, as that's platform-independent JS code.

In particular this finding appears to rule out those changes of ours as the cause. They happened before the RN upgrade, and before 25.8.122, at which point the issue is absent.

As for what changes within RN might have caused this: in addition to the unlikely-looking small change in the AsyncStorage implementation that I discussed above, one thing that occurs to me is that RN upstream made significant changes to their JS runtime environment. It's possible that a bug in that -- or a bug in our (or a dependency's) JS code, that the new environment triggers and the old one didn't -- might somehow cause this failure. That would fit particularly nicely with the puzzle that the issue is suppressed by remote JS debugging -- because that debugging involves running the JS in an entirely different environment, namely my desktop Chrome browser.

A likely next thing I'll pursue is to bisect the changes in RN itself, in hopes of pinpointing the RN commit that introduces the issue. Possibly tricky to do, because other dependencies interact with the RN version. As a variation, I may attempt simply reverting some plausibly-relevant RN changes, or alternatively backporting them directly onto v0.57.8, to see if they affect the issue.

gnprice · 2019-08-16T02:19:35Z

Next finding: the issue

reproduces at RN commit v0.58.6~314 aka 8427f64e0;
doesn't reproduce at RN commit v0.58.6~328 aka c95fdb050.

In both cases I'm starting from our dfbdd97, the RN upgrade; I've edited our package.json to refer to a version like facebook/react-native#c95fdb050, and run yarn; and then rebuild in Xcode and attempt the usual repro.

(There's a wrinkle where our AwayStatusSwitch component doesn't work for some reason on some of these versions; that component doesn't really interact with any of this, so I commented out where we use that component, and carried on.)

So, something in that range of 14 commits introduced the issue! Or the trigger for some issue in another layer.

Those 14 commits are exactly the changes I mentioned here:

one thing that occurs to me is that RN upstream made significant changes to their JS runtime environment. It's possible that a bug in that [...] might somehow cause this failure.

so that guess turned out to be a good one.

A bit more about these changes:

$ git log --oneline --reverse v0.58.6~328..v0.58.6~314
e337bcafb Add JSI API and JSIDynamic glue to OSS React Native
a23fb1a22 Add JSI implementation for JSC
749b18dbc Add JSI-based JSExecutor for the bridge
53fa8130f Switch RN from JSCJavaScriptExecutorFactory to JSCExecutorFactory
6796e62bb Remove JSC references
c49d3653e iOS changes to switch to JSI
00381920c Remove jschelpers and privatedata
7ffb40651 Remove RN files which use JSC
e8cbc4f89 iOS files removed for switch to JSI
6370b86c1 Get RNTester to compile and run
f85692cf8 Get RN Podspec xcodeproject to compile
78fcf7c55 Get RNTester xcodeproj to compile
3341adac4 Ensure HelloWorld xcodeproj template works
8427f64e0 Remove unused code using JSC

and

$ git diff --shortstat --dirstat v0.58.6~328..v0.58.6~314
 108 files changed, 5269 insertions(+), 6157 deletions(-)
  18.0% React/React.xcodeproj/
   4.2% React/
   5.2% ReactAndroid/src/main/jni/react/jni/
   3.5% ReactCommon/cxxreact/tests/
  14.3% ReactCommon/cxxreact/
  16.6% ReactCommon/jschelpers/
  27.2% ReactCommon/jsi/
   5.5% ReactCommon/jsiexecutor/jsireact/

Working on bisecting further. That's harder than it ought to be, because unfortunately these changes seem to have been made rather messily 😞 -- several of the commits cause RN itself to fail to build at all, failures that are fixed some time later in other commits.

gnprice · 2019-08-16T02:32:06Z

Further notes, then I'll pick this up tomorrow.

Here's those 14 commits, along with my results on them so far:

* GOOD e337bcafb Add JSI API and JSIDynamic glue to OSS React Native
*      a23fb1a22 Add JSI implementation for JSC
*      749b18dbc Add JSI-based JSExecutor for the bridge
* GOOD 53fa8130f Switch RN from JSCJavaScriptExecutorFactory to JSCExecutorFactory
*      6796e62bb Remove JSC references
* skip c49d3653e iOS changes to switch to JSI
* skip 00381920c Remove jschelpers and privatedata
* skip 7ffb40651 Remove RN files which use JSC
*      e8cbc4f89 iOS files removed for switch to JSI
* skip 6370b86c1 Get RNTester to compile and run
*      f85692cf8 Get RN Podspec xcodeproject to compile
*      78fcf7c55 Get RNTester xcodeproj to compile
*      3341adac4 Ensure HelloWorld xcodeproj template works
* BAD  8427f64e0 Remove unused code using JSC

Here

GOOD -> things work; issue doesn't repro!
BAD -> issue repros
skip -> RN doesn't even build 😞 so I've been unable to tell whether the issue is present

In particular the first 4 commits of the series are in the clear. That leaves this diff as where the issue (or, again, the change that triggers an issue elsewhere) may lie:

$ git diff --shortstat --dirstat v0.58.6~324..v0.58.6~314
 86 files changed, 966 insertions(+), 6145 deletions(-)
  27.6% React/React.xcodeproj/
   6.5% React/
   7.7% ReactAndroid/src/main/jni/react/jni/
   5.4% ReactCommon/cxxreact/tests/
  22.0% ReactCommon/cxxreact/
  25.5% ReactCommon/jschelpers/

gnprice · 2019-08-16T21:09:56Z

Because it's not clear how quickly we'll have a diagnosis or a forward-fix here, I'm now preparing a release 26.2.125, for iOS only, which will roll back to 25.8.122 -- it'll be identical except for the version number.

That's the last release we sent to beta before 26.0.123; the only change between them was the RN upgrade and some related commits, and in particular I determined above that this issue doesn't appear there.

I'll continue working to debug this to find a more targeted fix. We want to be on the newer RN version for a number of reasons, not least of them that RN v0.57 cannot make an app that meets Google Play's requirements (as announced in 2017 and in effect from this month; see #3323 ). But once the stopgap release 26.2.125 works its way through Apple's process, the issue should be resolved for everyone who's been seeing it.

gnprice · 2019-08-16T22:13:04Z

I'm now preparing a release 26.2.125, for iOS only, which will roll back to 25.8.122 -- it'll be identical except for the version number.

Done, and submitted to Apple. If all goes well within their process, then most likely this stopgap version will be available in the App Store by Monday.

gnprice · 2019-08-19T21:50:18Z

That stopgap fix release, 26.2.125, went out to the App Store on Saturday morning.

Meanwhile, I've located the cause of the bug! Fix incoming, with more explanation; then I'll make a new release, so that iOS users can again have all the unrelated improvements we've made since the RN upgrade.

For iOS the latest prod release was 26.2.125, not 26.1.124. So the login issue #3588 is already fixed, and there's no need to repeat the note on it.

This adds a new "scheme", which Apple's docs (in an old version -- somehow with Apple the highest-quality docs are always being abandoned for hipper designs with less-informative text) define like so: An Xcode scheme defines a collection of targets to build, a configuration to use when building, and a collection of tests to execute. You can have as many schemes as you want, but only one can be active at a time. [...] When you select an active scheme, you also select a run destination (that is, the architecture of the hardware for which the products are built). https://developer.apple.com/library/archive/featuredarticles/XcodeConcepts/Concept-Schemes.html The scheme is the dropdown by the top left of the main Xcode window, where we usually select "ZulipMobile". (I think the "run destination" is the dropdown just to its right, where we select a specific device or emulator to run on, or else "Generic iOS Device" when building an artifact to upload to the App Store.) The new "ZulipMobile release-mode" is just like "ZulipMobile", except that Cmd-r ("Run") will do a release build and run that on the target device, rather than a debug build. The "schemes" like the existing one and this new one live at a layer of indirection above the "configurations", of which we have two, named "Debug" and "Release". They contain a very small amount of configuration of their own, and mainly just a list of by-name references to configurations. The configurations themselves contain quite a bit more detail. Some helpful discussion: http://rhult.github.io/articles/testing-release-builds/ We're in "the simple case" in that post's terms: both our debug and release configurations already use development certs for signing. (For an actual release, I guess the artifact gets re-signed at upload time?) I originally did this in August, while debugging zulip#3588; didn't commit it, in part because I didn't feel I understood what it was doing. This version is slightly cleaned up and I've added this commit message, but it has the same substance as the config I left uncommitted then. Also I've read more and experimented more, and I think I grasp at least some of what this means now. :)

This adds a new "scheme", which Apple's docs (in an old version -- somehow with Apple the highest-quality docs are always being abandoned for hipper designs with less-informative text) define like so: An Xcode scheme defines a collection of targets to build, a configuration to use when building, and a collection of tests to execute. You can have as many schemes as you want, but only one can be active at a time. [...] When you select an active scheme, you also select a run destination (that is, the architecture of the hardware for which the products are built). https://developer.apple.com/library/archive/featuredarticles/XcodeConcepts/Concept-Schemes.html The scheme is the dropdown by the top left of the main Xcode window, where we usually select "ZulipMobile". (I think the "run destination" is the dropdown just to its right, where we select a specific device or emulator to run on, or else "Generic iOS Device" when building an artifact to upload to the App Store.) The new "ZulipMobile release-mode" is just like "ZulipMobile", except that Cmd-r ("Run") will do a release build and run that on the target device, rather than a debug build. The "schemes" like the existing one and this new one live at a layer of indirection above the "configurations", of which we have two, named "Debug" and "Release". They contain a very small amount of configuration of their own, and mainly just a list of by-name references to configurations. The configurations themselves contain quite a bit more detail. Some helpful discussion: https://rhult.github.io/articles/testing-release-builds/ We're in "the simple case" in that post's terms: both our debug and release configurations already use development certs for signing. (For an actual release, I guess the artifact gets re-signed at upload time?) I originally did this in August, while debugging zulip#3588; didn't commit it, in part because I didn't feel I understood what it was doing. This version is slightly cleaned up and I've added this commit message, but it has the same substance as the config I left uncommitted then. Also I've read more and experimented more, and I think I grasp at least some of what this means now. :)

This adds a new "scheme", which Apple's docs (in an old version -- somehow with Apple the highest-quality docs are always being abandoned for hipper designs with less-informative text) define like so: An Xcode scheme defines a collection of targets to build, a configuration to use when building, and a collection of tests to execute. You can have as many schemes as you want, but only one can be active at a time. [...] When you select an active scheme, you also select a run destination (that is, the architecture of the hardware for which the products are built). https://developer.apple.com/library/archive/featuredarticles/XcodeConcepts/Concept-Schemes.html The scheme is the dropdown by the top left of the main Xcode window, where we usually select "ZulipMobile". (I think the "run destination" is the dropdown just to its right, where we select a specific device or emulator to run on, or else "Generic iOS Device" when building an artifact to upload to the App Store.) The new "ZulipMobile release-mode" is just like "ZulipMobile", except that Cmd-r ("Run") will do a release build and run that on the target device, rather than a debug build. The "schemes" like the existing one and this new one live at a layer of indirection above the "configurations", of which we have two, named "Debug" and "Release". They contain a very small amount of configuration of their own, and mainly just a list of by-name references to configurations. The configurations themselves contain quite a bit more detail. Some helpful discussion: https://rhult.github.io/articles/testing-release-builds/ We're in "the simple case" in that post's terms: both our debug and release configurations already use development certs for signing. (For an actual release, I guess the artifact gets re-signed at upload time?) I originally did this in August, while debugging #3588; didn't commit it, in part because I didn't feel I understood what it was doing. This version is slightly cleaned up and I've added this commit message, but it has the same substance as the config I left uncommitted then. Also I've read more and experimented more, and I think I grasp at least some of what this means now. :)

gnprice added a-iOS a-onboarding Everything you would do when first joining a realm. P0 critical Highest priority labels Aug 13, 2019

gnprice closed this as completed in 5cf7f10 Aug 19, 2019

gnprice self-assigned this Aug 19, 2019

gnprice mentioned this issue Aug 24, 2019

Google OAuth flow is broken #3594

Closed

gnprice added a commit that referenced this issue Aug 27, 2019

changelog: Adjust baseline for recent user notes.

a26a168

For iOS the latest prod release was 26.2.125, not 26.1.124. So the login issue #3588 is already fixed, and there's no need to repeat the note on it.

gnprice mentioned this issue Jun 24, 2020

[android] Handle receiving shares from other apps #4124

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

On iOS Zulip will log out of realm/organization after force closing app and re opening. #3588

On iOS Zulip will log out of realm/organization after force closing app and re opening. #3588

dalanmiller commented Aug 13, 2019

ferman commented Aug 13, 2019

gnprice commented Aug 13, 2019

dalanmiller commented Aug 13, 2019 via email •

edited

Loading

gnprice commented Aug 13, 2019

gnprice commented Aug 14, 2019

dalanmiller commented Aug 14, 2019 via email •

edited

Loading

gnprice commented Aug 14, 2019

gnprice commented Aug 14, 2019

ferman commented Aug 14, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 19, 2019

On iOS Zulip will log out of realm/organization after force closing app and re opening. #3588

On iOS Zulip will log out of realm/organization after force closing app and re opening. #3588

Comments

dalanmiller commented Aug 13, 2019

ferman commented Aug 13, 2019

gnprice commented Aug 13, 2019

dalanmiller commented Aug 13, 2019 via email • edited Loading

gnprice commented Aug 13, 2019

gnprice commented Aug 14, 2019

dalanmiller commented Aug 14, 2019 via email • edited Loading

gnprice commented Aug 14, 2019

gnprice commented Aug 14, 2019

ferman commented Aug 14, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 16, 2019

gnprice commented Aug 19, 2019

dalanmiller commented Aug 13, 2019 via email •

edited

Loading

dalanmiller commented Aug 14, 2019 via email •

edited

Loading