Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On iOS Zulip will log out of realm/organization after force closing app and re opening. #3588

Closed
dalanmiller opened this issue Aug 13, 2019 · 15 comments
Assignees
Labels
a-iOS a-onboarding Everything you would do when first joining a realm. P0 critical Highest priority

Comments

@dalanmiller
Copy link

Currently using a self-hosted instance with both Android and iOS. Android runs fine at the moment but on iOS the app will require logging into the organization every time the app is closed and started or opened after a long period of time.

How can I try to debug this issue or provide more information on what is going on?

@ferman
Copy link

ferman commented Aug 13, 2019

I am experiencing the same issues. I am on a iPhone 6s on iOS 12.4.

I was just running Zulip on an android phone and it stays logged in perfectly fine.

@gnprice gnprice added a-iOS a-onboarding Everything you would do when first joining a realm. P0 critical Highest priority labels Aug 13, 2019
@gnprice
Copy link
Member

gnprice commented Aug 13, 2019

Thanks @dalanmiller and @ferman for these reports! One followup question for both of you:

  • Have you been seeing this on devices where the app has been installed for a while, and previously was staying logged in just fine?
  • Or have you seen it only on fresh installs that have never successfully stayed logged in?

We've also heard about this in a chat thread today, and we heard a report of it Friday in a support thread.

I reproduce this myself when I make a fresh install of v26.1.124, the latest, on an iPhone 7 with iOS 12.3. The report on Friday was on v26.0.123, so it also affects that version.

For me, the app was working just fine (staying logged in) before that fresh install. If it does only affect fresh installs (or, newly logged-into accounts on existing installs), then that's a clue because it suggests we're reading the stored data fine, and the bug specifically affects writing changes to it.

This is a pretty showstopper symptom, so we'll certainly be making a priority of getting to the bottom of it.

@dalanmiller
Copy link
Author

dalanmiller commented Aug 13, 2019 via email

@gnprice
Copy link
Member

gnprice commented Aug 13, 2019

Update from some attempts to debug:

  • The issue continues to reproduce 100% of the time with v26.1.124 (from TestFlight, but should be the same binary as in the App Store) on my physical iPhone 7.

  • In a debug build on a simulated iPhone X, the issue did reproduce a few times... but then it didn't always, and now I'm finding it hard to reproduce there at all.

    That's slowed down the investigation -- it's much easier to make progress when looking at a debug build with debug logs, adding a few lines of ad-hoc logging, rerunning with that, etc. Also the uncertainty of what might be confounding variables causes me to be paranoid about resetting the state extra-clean between trials, making the iteration cycle longer.

    Probably the next thing I'll try is a debug build on that physical iPhone 7, and see if that does any better.

Things I did learn from when it was still reproducing in debug builds:

  • The call to write to the storage (AsyncStorage.setItem, a React Native facility) is returning success.

  • But on re-opening the app, we're not finding that data in the storage.

  • I also tried a getItem after the setItem returned success, and it consistently found that the data was there. That suggests that we're successfully getting the data into the native data structures of AsyncStorage, but AsyncStorage in turn is failing somehow to actually flush those writes to persistent storage.

    I'm not 100% sure that those last observations about getItem didn't come from a run where the issue didn't repro, though. (It was shortly after that that it mostly stopped reproing in my debug builds.) If I'd taken better notes I'd know, I guess.

@gnprice
Copy link
Member

gnprice commented Aug 14, 2019

From a different angle:

  • 26.0.123 went out to all users on 2019-08-06. We got the first report a bit under 72h later, and a series of further reports today, starting last night (California time) with this issue.
  • 26.1.124 then went out this morning, about 7 hours after this issue was filed. @dalanmiller you don't happen to have signed up to use the beta version from TestFlight, do you? If not, then your original report makes a second case of seeing the issue on 26.0.123.
  • The previous release to reach all users (i.e., more than our beta users) was 25.6.120. That went out a month ago, on 2019-07-12, and given the severity of this issue it seems likely we'd have heard about it sooner if this issue were present there.

So the bug was very likely introduced between 25.6.120 and 26.0.123. I'm looking at the code that changed in that interval.

  • One change was we upgraded RN from v0.57.8 to v0.59.10. That sure sounds like a plausible candidate.

    And in fact there were a few changes to AsyncStorage in that range!

    Looking closer, the one that seems like it could plausibly affect whether data gets successfully stored is this one: facebook/react-native@27b4d21 ... And looking closely at that, it seems quite unlikely to be the cause of this issue -- in particular it appears to only affect the codepath for deleting a key from the storage entirely.

    I don't feel I understand the entire subsystem at this point and so I wouldn't 100% rule that commit out as the cause; but I think not.

  • We made some changes in our own code a layer or two above AsyncStorage that manages persisting the app's relevant data and migrating it through changes in our data structures between versions. Notably there were several changes in our src/boot/store.js.

    A bug in that code could certainly cause this kind of symptom... though I don't have a good theory for why the bug would only affect iOS, as that's platform-independent JS code.

    I'll likely dig into that lead some more.

@dalanmiller
Copy link
Author

dalanmiller commented Aug 14, 2019 via email

@gnprice
Copy link
Member

gnprice commented Aug 14, 2019

Probably the next thing I'll try is a debug build on that physical iPhone 7, and see if that does any better.

Notes from first efforts on that:

I've tried three configurations:

  • a freshly-installed debug build, unmodified from master, in its default state
  • a freshly-installed debug build, unmodified from master, but with RN's "remote JS debugging" enabled
  • a freshly-installed debug build, with RN's "remote JS debugging" enabled, and with a console.log call added which logs a bit of relevant info to the console provided by that remote JS debugging

"Remote JS debugging" means the Chrome DevTools, in my desktop Chrome browser, gets attached; and IIUC the JS all actually runs inside Chrome instead of in the JS implementation (JSC) that RN builds into the app.

So far the pattern is that the issue reproduces 100% of the time in the first case (no "remote JS debugging"), and fails to repro -- i.e. the app works normally -- 100% of the time in the second and third.

I was pretty sure I'd seen the issue repro yesterday in the simulated iPhone X with the devtools attached, and even with some ad-hoc logging code added. But while I tried taking out the logging code and was then concerned to see the issue continue to not reproduce, I don't think I tried going back to not even attaching the debugger.

If I can't reproduce with the debugger attached that will certainly add an obstacle in debugging it; I'll have to find another way to get info out. OTOH it may also be a clue.

@gnprice
Copy link
Member

gnprice commented Aug 14, 2019

Completing a small matrix, I just tried:

  • a freshly-installed debug build, with that same console.log call added, but in the default state without the devtools attached

This isn't a super helpful setup in itself, because there isn't an immediate way to get those logs. But the issue indeed reproduced, fitting the pattern that it's all about whether the devtools are attached.

@ferman
Copy link

ferman commented Aug 14, 2019

My install that I reported is on a fresh install of iOS and of the app. I am on v26.0.123.

@gnprice
Copy link
Member

gnprice commented Aug 16, 2019

New information from some debugging just now:

  • The issue reproduces at commit dfbdd97, where we went from RN v0.57.8 to RN v0.59.10.
  • It doesn't reproduce at that commit's parent, namely e8ea477.
  • More generally, it reproduces at every commit that I've specifically tested that's after that point (3c25ad1 aka 26.0.123, and 6219e5a which is the latest on master); and it doesn't reproduce at any commit I've specifically tested that's before that point (cab7346 aka 25.8.122).

So the issue was introduced by the upgrade to RN v0.59.

A key ingredient here was what I learned yesterday, that so long as I don't enable remote JS debugging, the issue reproduces 100% of the time.

  • We made some changes in our own code a layer or two above AsyncStorage that manages persisting the app's relevant data [...].
    A bug in that code could certainly cause this kind of symptom... though I don't have a good theory for why the bug would only affect iOS, as that's platform-independent JS code.

In particular this finding appears to rule out those changes of ours as the cause. They happened before the RN upgrade, and before 25.8.122, at which point the issue is absent.

As for what changes within RN might have caused this: in addition to the unlikely-looking small change in the AsyncStorage implementation that I discussed above, one thing that occurs to me is that RN upstream made significant changes to their JS runtime environment. It's possible that a bug in that -- or a bug in our (or a dependency's) JS code, that the new environment triggers and the old one didn't -- might somehow cause this failure. That would fit particularly nicely with the puzzle that the issue is suppressed by remote JS debugging -- because that debugging involves running the JS in an entirely different environment, namely my desktop Chrome browser.

A likely next thing I'll pursue is to bisect the changes in RN itself, in hopes of pinpointing the RN commit that introduces the issue. Possibly tricky to do, because other dependencies interact with the RN version. As a variation, I may attempt simply reverting some plausibly-relevant RN changes, or alternatively backporting them directly onto v0.57.8, to see if they affect the issue.

@gnprice
Copy link
Member

gnprice commented Aug 16, 2019

Next finding: the issue

  • reproduces at RN commit v0.58.6~314 aka 8427f64e0;
  • doesn't reproduce at RN commit v0.58.6~328 aka c95fdb050.

In both cases I'm starting from our dfbdd97, the RN upgrade; I've edited our package.json to refer to a version like facebook/react-native#c95fdb050, and run yarn; and then rebuild in Xcode and attempt the usual repro.

(There's a wrinkle where our AwayStatusSwitch component doesn't work for some reason on some of these versions; that component doesn't really interact with any of this, so I commented out where we use that component, and carried on.)

So, something in that range of 14 commits introduced the issue! Or the trigger for some issue in another layer.

Those 14 commits are exactly the changes I mentioned here:

one thing that occurs to me is that RN upstream made significant changes to their JS runtime environment. It's possible that a bug in that [...] might somehow cause this failure.

so that guess turned out to be a good one.

A bit more about these changes:

$ git log --oneline --reverse v0.58.6~328..v0.58.6~314
e337bcafb Add JSI API and JSIDynamic glue to OSS React Native
a23fb1a22 Add JSI implementation for JSC
749b18dbc Add JSI-based JSExecutor for the bridge
53fa8130f Switch RN from JSCJavaScriptExecutorFactory to JSCExecutorFactory
6796e62bb Remove JSC references
c49d3653e iOS changes to switch to JSI
00381920c Remove jschelpers and privatedata
7ffb40651 Remove RN files which use JSC
e8cbc4f89 iOS files removed for switch to JSI
6370b86c1 Get RNTester to compile and run
f85692cf8 Get RN Podspec xcodeproject to compile
78fcf7c55 Get RNTester xcodeproj to compile
3341adac4 Ensure HelloWorld xcodeproj template works
8427f64e0 Remove unused code using JSC

and

$ git diff --shortstat --dirstat v0.58.6~328..v0.58.6~314
 108 files changed, 5269 insertions(+), 6157 deletions(-)
  18.0% React/React.xcodeproj/
   4.2% React/
   5.2% ReactAndroid/src/main/jni/react/jni/
   3.5% ReactCommon/cxxreact/tests/
  14.3% ReactCommon/cxxreact/
  16.6% ReactCommon/jschelpers/
  27.2% ReactCommon/jsi/
   5.5% ReactCommon/jsiexecutor/jsireact/

Working on bisecting further. That's harder than it ought to be, because unfortunately these changes seem to have been made rather messily 😞 -- several of the commits cause RN itself to fail to build at all, failures that are fixed some time later in other commits.

@gnprice
Copy link
Member

gnprice commented Aug 16, 2019

Further notes, then I'll pick this up tomorrow.

Here's those 14 commits, along with my results on them so far:

* GOOD e337bcafb Add JSI API and JSIDynamic glue to OSS React Native
*      a23fb1a22 Add JSI implementation for JSC
*      749b18dbc Add JSI-based JSExecutor for the bridge
* GOOD 53fa8130f Switch RN from JSCJavaScriptExecutorFactory to JSCExecutorFactory
*      6796e62bb Remove JSC references
* skip c49d3653e iOS changes to switch to JSI
* skip 00381920c Remove jschelpers and privatedata
* skip 7ffb40651 Remove RN files which use JSC
*      e8cbc4f89 iOS files removed for switch to JSI
* skip 6370b86c1 Get RNTester to compile and run
*      f85692cf8 Get RN Podspec xcodeproject to compile
*      78fcf7c55 Get RNTester xcodeproj to compile
*      3341adac4 Ensure HelloWorld xcodeproj template works
* BAD  8427f64e0 Remove unused code using JSC

Here

  • GOOD -> things work; issue doesn't repro!
  • BAD -> issue repros
  • skip -> RN doesn't even build 😞 so I've been unable to tell whether the issue is present

In particular the first 4 commits of the series are in the clear. That leaves this diff as where the issue (or, again, the change that triggers an issue elsewhere) may lie:

$ git diff --shortstat --dirstat v0.58.6~324..v0.58.6~314
 86 files changed, 966 insertions(+), 6145 deletions(-)
  27.6% React/React.xcodeproj/
   6.5% React/
   7.7% ReactAndroid/src/main/jni/react/jni/
   5.4% ReactCommon/cxxreact/tests/
  22.0% ReactCommon/cxxreact/
  25.5% ReactCommon/jschelpers/

@gnprice
Copy link
Member

gnprice commented Aug 16, 2019

Because it's not clear how quickly we'll have a diagnosis or a forward-fix here, I'm now preparing a release 26.2.125, for iOS only, which will roll back to 25.8.122 -- it'll be identical except for the version number.

That's the last release we sent to beta before 26.0.123; the only change between them was the RN upgrade and some related commits, and in particular I determined above that this issue doesn't appear there.

I'll continue working to debug this to find a more targeted fix. We want to be on the newer RN version for a number of reasons, not least of them that RN v0.57 cannot make an app that meets Google Play's requirements (as announced in 2017 and in effect from this month; see #3323 ). But once the stopgap release 26.2.125 works its way through Apple's process, the issue should be resolved for everyone who's been seeing it.

@gnprice
Copy link
Member

gnprice commented Aug 16, 2019

I'm now preparing a release 26.2.125, for iOS only, which will roll back to 25.8.122 -- it'll be identical except for the version number.

Done, and submitted to Apple. If all goes well within their process, then most likely this stopgap version will be available in the App Store by Monday.

@gnprice
Copy link
Member

gnprice commented Aug 19, 2019

That stopgap fix release, 26.2.125, went out to the App Store on Saturday morning.

Meanwhile, I've located the cause of the bug! Fix incoming, with more explanation; then I'll make a new release, so that iOS users can again have all the unrelated improvements we've made since the RN upgrade.

@gnprice gnprice self-assigned this Aug 19, 2019
gnprice added a commit that referenced this issue Aug 27, 2019
For iOS the latest prod release was 26.2.125, not 26.1.124.
So the login issue #3588 is already fixed, and there's no
need to repeat the note on it.
gnprice added a commit to gnprice/zulip-mobile that referenced this issue Oct 11, 2019
This adds a new "scheme", which Apple's docs (in an old version --
somehow with Apple the highest-quality docs are always being abandoned
for hipper designs with less-informative text) define like so:

  An Xcode scheme defines a collection of targets to build, a
  configuration to use when building, and a collection of tests to
  execute.

  You can have as many schemes as you want, but only one can be active
  at a time. [...]  When you select an active scheme, you also select
  a run destination (that is, the architecture of the hardware for
  which the products are built).

  https://developer.apple.com/library/archive/featuredarticles/XcodeConcepts/Concept-Schemes.html

The scheme is the dropdown by the top left of the main Xcode window,
where we usually select "ZulipMobile".  (I think the "run destination"
is the dropdown just to its right, where we select a specific device
or emulator to run on, or else "Generic iOS Device" when building an
artifact to upload to the App Store.)

The new "ZulipMobile release-mode" is just like "ZulipMobile", except
that Cmd-r ("Run") will do a release build and run that on the target
device, rather than a debug build.

The "schemes" like the existing one and this new one live at a layer
of indirection above the "configurations", of which we have two, named
"Debug" and "Release".  They contain a very small amount of
configuration of their own, and mainly just a list of by-name
references to configurations.  The configurations themselves contain
quite a bit more detail.

Some helpful discussion:
  http://rhult.github.io/articles/testing-release-builds/

We're in "the simple case" in that post's terms: both our debug and
release configurations already use development certs for signing.
(For an actual release, I guess the artifact gets re-signed at upload
time?)

I originally did this in August, while debugging zulip#3588; didn't commit
it, in part because I didn't feel I understood what it was doing.
This version is slightly cleaned up and I've added this commit
message, but it has the same substance as the config I left
uncommitted then.  Also I've read more and experimented more,
and I think I grasp at least some of what this means now. :)
rk-for-zulip pushed a commit to gnprice/zulip-mobile that referenced this issue Oct 14, 2019
This adds a new "scheme", which Apple's docs (in an old version --
somehow with Apple the highest-quality docs are always being abandoned
for hipper designs with less-informative text) define like so:

  An Xcode scheme defines a collection of targets to build, a
  configuration to use when building, and a collection of tests to
  execute.

  You can have as many schemes as you want, but only one can be active
  at a time. [...]  When you select an active scheme, you also select
  a run destination (that is, the architecture of the hardware for
  which the products are built).

  https://developer.apple.com/library/archive/featuredarticles/XcodeConcepts/Concept-Schemes.html

The scheme is the dropdown by the top left of the main Xcode window,
where we usually select "ZulipMobile".  (I think the "run destination"
is the dropdown just to its right, where we select a specific device
or emulator to run on, or else "Generic iOS Device" when building an
artifact to upload to the App Store.)

The new "ZulipMobile release-mode" is just like "ZulipMobile", except
that Cmd-r ("Run") will do a release build and run that on the target
device, rather than a debug build.

The "schemes" like the existing one and this new one live at a layer
of indirection above the "configurations", of which we have two, named
"Debug" and "Release".  They contain a very small amount of
configuration of their own, and mainly just a list of by-name
references to configurations.  The configurations themselves contain
quite a bit more detail.

Some helpful discussion:
  https://rhult.github.io/articles/testing-release-builds/

We're in "the simple case" in that post's terms: both our debug and
release configurations already use development certs for signing.
(For an actual release, I guess the artifact gets re-signed at upload
time?)

I originally did this in August, while debugging zulip#3588; didn't commit
it, in part because I didn't feel I understood what it was doing.
This version is slightly cleaned up and I've added this commit
message, but it has the same substance as the config I left
uncommitted then.  Also I've read more and experimented more,
and I think I grasp at least some of what this means now. :)
rk-for-zulip pushed a commit that referenced this issue Oct 14, 2019
This adds a new "scheme", which Apple's docs (in an old version --
somehow with Apple the highest-quality docs are always being abandoned
for hipper designs with less-informative text) define like so:

  An Xcode scheme defines a collection of targets to build, a
  configuration to use when building, and a collection of tests to
  execute.

  You can have as many schemes as you want, but only one can be active
  at a time. [...]  When you select an active scheme, you also select
  a run destination (that is, the architecture of the hardware for
  which the products are built).

  https://developer.apple.com/library/archive/featuredarticles/XcodeConcepts/Concept-Schemes.html

The scheme is the dropdown by the top left of the main Xcode window,
where we usually select "ZulipMobile".  (I think the "run destination"
is the dropdown just to its right, where we select a specific device
or emulator to run on, or else "Generic iOS Device" when building an
artifact to upload to the App Store.)

The new "ZulipMobile release-mode" is just like "ZulipMobile", except
that Cmd-r ("Run") will do a release build and run that on the target
device, rather than a debug build.

The "schemes" like the existing one and this new one live at a layer
of indirection above the "configurations", of which we have two, named
"Debug" and "Release".  They contain a very small amount of
configuration of their own, and mainly just a list of by-name
references to configurations.  The configurations themselves contain
quite a bit more detail.

Some helpful discussion:
  https://rhult.github.io/articles/testing-release-builds/

We're in "the simple case" in that post's terms: both our debug and
release configurations already use development certs for signing.
(For an actual release, I guess the artifact gets re-signed at upload
time?)

I originally did this in August, while debugging #3588; didn't commit
it, in part because I didn't feel I understood what it was doing.
This version is slightly cleaned up and I've added this commit
message, but it has the same substance as the config I left
uncommitted then.  Also I've read more and experimented more,
and I think I grasp at least some of what this means now. :)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-iOS a-onboarding Everything you would do when first joining a realm. P0 critical Highest priority
Projects
None yet
Development

No branches or pull requests

3 participants