Closed Bug 1069412 Opened 10 years ago Closed 10 years ago

Email constantly asks me to reauthenticate

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.1+, b2g-v2.1 fixed, b2g-v2.2 fixed)

VERIFIED FIXED
2.1 S5 (26sep)
blocking-b2g 2.1+
Tracking Status
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: daleharvey, Assigned: jrburke)

References

Details

Attachments

(3 files)

Since switching to 2.1 aurora, this seems to happen multiple times a day
I wonder if this is related to the clock time getting weird sometimes. We need to refresh our auth tokens, and we have a way to do that if we think it will expire, but if the clock is skewed, we may not think to check for a new one.

I'll try reproducing locally to confirm though.
I got the latest 2.1 on a flame, set up an email account, did one sync. Then closed the email app, disconnected the device (periodic sync was not enabled), then let it sit for about an hour and a half. 

Then I reopened email, and I got the following logcat. Will investigate more, but wanted to persist this capture first:

I/GeckoDump( 3301): LOG: pushCard for type: message_list
I/Gecko   ( 3301): WLOG: Email knows that it is: online and previously was: online
I/Gecko   ( 3301): WLOG: deferring getConfig call until ready
I/Gecko   ( 3301): WLOG: main thread reports DB ready
I/Gecko   ( 3301): WLOG: issuing getConfig call to main thread
I/Gecko   ( 3301): WLOG: cronsync: ensureSync called
I/Gecko   ( 3301): 
I/Gecko   ( 3301): WLOG: Mail universe/bridge created and notified!
I/GeckoDump( 3301): LOG: cronsync-main: ensureSync called
I/GeckoDump( 3301): LOG: cronsync-main: success!
I/GeckoDump( 3301): LOG: cronsync-main: ensureSync completed
I/Gecko   ( 3301): WLOG: cronsync: received an syncEnsured via a message handler
I/Gecko   ( 3301): 
I/Gecko   ( 3301): WLOG: syncDateRange: 1410825600000 null
I/Gecko   ( 3301): WLOG: Skewed DB lookup. Start:  1410850800000 Tue, 16 Sep 2014 07:00:00 GMT End:  null null
I/Gecko   ( 3301): WLOG: [slog] oauth:credentials-ok
I/GeckoDump( 3301): LOG: htmlCache.save: 18876 in 10 segments
I/Gecko   ( 3301): WLOG: [slog] imap:protocol-error {"humanReadable":"Invalid credentials (Failure)"}
I/Gecko   ( 3301): WERR: [slog] imap:normalized-error {"error":{"code":"ALERT"},"errorName":"Error","errorMessage":"Invalid credentials (Failure)","errorStack":"i.prototype.exec/<@app://email.gaiamobile.org/js/ext/composite/configurator.js:1959:161\nr.prototype._processServerResponse@app://email.gaiamobile.org/js/ext/composite/configurator.js:1850:9\nr.prototype._processServerQueue@app://email.gaiamobile.org/js/ext/composite/configurator.js:1840:1\nr.prototype._addToServerQueue@app://email.gaiamobile.org/js/ext/composite/configurator.js:1819:9\nr.prototype._onData@app://email.gaiamobile.org/js/ext/composite/configurator.js:1811:20\nt/r</<@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:8168:1\no.catchSocketExceptions@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:7679:17\nt/r<@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:8167:40\no/a[e]@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:553:18\ne@app://email.gaiamobile.org/js/ext/worker-bootstrap.js:518:14\n","socketLevelError":null,"protocolLevel
I/Gecko   ( 3301): WERR: [slog] imap:connect-error {"error":"needs-oauth-reauth","connInfo":{"hostname":"imap.gmail.com","port":"993","crypto":"ssl"}}
I/Gecko   ( 3301): WLOG: [slog] mailslice:mutex-released {"folderId":"0/0","err":"aborted"}
I/GeckoDump( 3301): LOG: message_list complete: 15 items of 19 alleged known headers. canGrow: false
I/GeckoDump( 3301): LOG: Showing toast: {"text":"Unable to connect to server"}
I/GeckoDump( 3301): LOG: pushCard for type: setup_fix_oauth2
Meant to add: the clock was correct, as it read in the status bar.
Further logging suggests the expiry timeout was set for three more hours in the future when I logged it, but since the tokens are only good for an hour, it looks like maybe the clock when I set up the account was incorrectly. I did need to reset it correctly after a flash, but thought it was set correctly before beginning. But the indications still point to a clock Date.now() wackiness.

I added more logging and restarted the experiment, but we definitely need to do some work here, given this and newer developments around using mozbrowser or window.open for the oauth jump[1], I will use this ticket to track these changes:

1) Try to avoid the vagaries of clock wackiness on the flame. asuth suggested using performance.now() (wrapped in date.js so can be manually manipulated for tests) as it should always be strictly increasing. Update gelam to use a similar mechanism. This should help cushion us a little bit against weird date stuff (although the phone just needs a proper update for that).

2) Do one accessToken renewal try if we get an "Invalid credentials (Failure)" when doing the actual imap connection, just in case the date calculations are wrong. This would be enough to work around the issue reported here, but we should do the other fixes around the time to avoid unnecessary renewal requests.

3) Upgrade the oauth jump to use mozbrowser instead of a plain window.open() to properly clear the cookies as pointed out in the dev-webapi thread[1].

Since this issue was found on 2.1 and we want to make sure 2.1 is robust with the oauth change, asking for 2.1? flag. Even though we will be switching to mozbrowser for the oauth user jump, we should be able to avoid any new l10n strings.

[1] https://groups.google.com/d/topic/mozilla.dev.webapi/oqPPmSrt1Ws/discussion
Assignee: nobody → jrburke
Status: NEW → ASSIGNED
blocking-b2g: --- → 2.1?
Target Milestone: --- → 2.1 S5 (26sep)
triage: blocking 2.1 for major issue with new feature.
blocking-b2g: 2.1? → 2.1+
Attached file GELAM pull request
Starting review for the GELAM part of the change. The Gaia change would include the mozbrowser part of the change. I am asking for GELAM review now though since I am less confident about where I put things, in particular around the transient retry timestamp, and want to start getting that sorted out sooner rather than later.

The good news is that I do not expect to add more to the GELAM pull request as far a capabilities for this ticket. The only other capability is the mozbrowser change, but that is contained within Gaia. In other words, we could land this GELAM change on its own and it still fixes part of the problems.
Attachment #8494243 - Flags: review?(bugmail)
Attached file Gaia pull request
Gaia pull request. Includes GELAM changes. The gaia change is the use of mozbrowser. See pull request for description of changes.
Attachment #8494875 - Flags: review?(bugmail)
Comment on attachment 8494243 [details] [review]
GELAM pull request

r=asuth with criss-cross-reviewed patch I've pushed to your (rebased by me) branch that adds IMAP test coverage via a new fake (but still somewhat limited) oauth server from https://github.com/mozilla-b2g/mail-fakeservers/pull/22.  The only meaningful content change was to flip a conditional that was backwards (caught by the tests actually).

Note that I am about to finalize my gaia review with some on-device testing.  I doubt any further changes will be required, but if there are, I'll push another commit and comment in that review sign-off.
Attachment #8494243 - Flags: review?(bugmail) → review+
Comment on attachment 8494875 [details] [review]
Gaia pull request

r=asuth (with updated gelam changes), with one thing that should ideally be fixed:

- I noticed that the <section> containing the <iframe> was ending up with a scrollHeight of 493 while its clientHeight was only 489.  This can result in some wacky weirdness where if the HTML page needs to be scrolled (the auth page scrolls a little, although if you click on the "about google" link in the lower left you'll end up on a much longer page) which results in double-scrollbar weirdness.  Scroll the page down, then up; you'll see the iframe's tinier scrollbar then get overlaid by the section's iframe.

  I experimentally addressed this by making the section "display: flex; flex-direction: column;" and then the iframe: "flex 1 1;" after killing the child text node didn't fix things.  This fixed the scrollbar problem.  Although then I did notice that especially on the "about google" longer page, the tiling performance while scrolling was horrible.  I don't think it matters a ton, but it might be worth explicitly overriding the "will-change: scroll-position" that "scrollregion-below-header" is contributing.  Or maybe it won't work.

  I started to play with this a bit more but then the devtools freaked out, possibly because I side-loaded after "reset-gaia" betrayed me in various frustrating ways.  Which is to say, I leave this in your capable hands because the sub-optimal changed permissions workflow is interacting badly with the sub-optimal "reset-gaia" workflow sufficiently right now that my sanity demands I hand it off :)


Separate thought: I did some extensive validation that clearBrowserData works while trying to help :gaye out with calendar's weird oauth problem it's having, so I feel pretty confident about this change.  My current impression of calendar's unknown problem are that whatever is happening, it's unlikely to affect email.
Attachment #8494875 - Flags: review?(bugmail) → review+
Thanks for catching the gelam logic issue and the other cleanup and tests for gelam. I have pulled in those latest changes into the gaia app, rebased over latest gaia master, and then did this fix for the scrollbar/scrolling in the iframe:

+.card-setup-oauth2 .scrollregion-below-header {
+  overflow: hidden;
+  will-change: auto;
+}
+
 .sup-oauth2-browser {
   border: 0;
+  margin: 0;
+  padding: 0;
   width: 100%;
-  height: 100%
+  height: 100%;
+  background-color: white;
 }
 

Hiding the overflow in the scrollregion (the section element) removed the double scrollbars. For good measure I tried to also reset will-change on that element as it would not scroll itself.

I tried will-change: scroll-position on the iframe element (.sup-oauth2-browser), but did not seem to cure the checkerboarding. The background-color: white just helps call out that the iframe region is a different region visually for the user, and when there is checkerboarding in that iframe, at least it is white now, which I think is a better perceptual choice. The margin/padding additions were mostly to make be explicit that element does not want to take up extra space.

I think this is fine for this sort of authentication jump. This contained browser context is fairly constrained anyway. For the "about google" testing, I hit a few links in there accidentally and wanted a browser back button, and some loads took a while so I was not sure if the browser was doing work or if I missed the link too. So in summary, this in-app browser will not be great for anything more than just the basic authentication jump.

I pushed the rebased/squashed gaia pull request, and will wait for the try results, doing one last local test confirmation, then hopefully merge.
During testing, specifically, when explicitly revoking the auth token via google settings interface, the setup_fix_oauth2 dialog had two problems:

1) It did not prepopulate the google form with the username.
2) On successful completion, taps in the message_list no longer worked.

Problem 2 is a new one to this patch because we now use a card for the browser window instead of a window.open, and the root cause is similar to the one that was documented in setup_progress:

https://github.com/mozilla-b2g/gaia/blob/2834baf4c7e34fe6ef335f0469f6d0f593c5922b/apps/email/js/cards/setup_progress.js#L55

This patch shows the changes I did. I rebased/squashed and will wait for the fresh try run to finish before merging, and will ping in IRC to confirm this still within the r+.
(In reply to James Burke [:jrburke] from comment #10)
> I tried will-change: scroll-position on the iframe element
> (.sup-oauth2-browser), but did not seem to cure the checkerboarding. The
> background-color: white just helps call out that the iframe region is a
> different region visually for the user, and when there is checkerboarding in
> that iframe, at least it is white now, which I think is a better perceptual
> choice. The margin/padding additions were mostly to make be explicit that
> element does not want to take up extra space.

Whoops; I left out a key point in what I was saying there.  I was thinking that the presence of will-change at all might actually be part of the problem since it might provide conflicting layerization guidance to the graphics subsystem that could be making things slower.

I agree that it's not a huge or even medium-sized concern; there should be effectively no scrolling of the browser content ever, and people absolutely should not be navigating.  It's unfortunate that more things don't trigger the browser itself.

(In reply to James Burke [:jrburke] from comment #11)
> During testing, specifically, when explicitly revoking the auth token via
> google settings interface, the setup_fix_oauth2 dialog had two problems:

Ah, I had cut a review corner here and not tested this flow this time.  Thank you being thorough on this!

> This patch shows the changes I did. I rebased/squashed and will wait for the
> fresh try run to finish before merging, and will ping in IRC to confirm this
> still within the r+.

I affirm the changes look good, continued r=asuth, but agree that my r=asuth continued to hold in general.
(In reply to Andrew Sutherland [:asuth] from comment #12)
> Whoops; I left out a key point in what I was saying there.  I was thinking
> that the presence of will-change at all might actually be part of the
> problem since it might provide conflicting layerization guidance to the
> graphics subsystem that could be making things slower.

I had tried earlier today removing the will-change and just tried again now (by just removing all will-change references in the email css) and tried the google about page. It still has noticeable checkerboarding, on the order of what was there before removing those rules. Just mentioning it for completeness, was hoping for an easy boost.
Had trouble with try builds yesterday, unrelated to this change. Rebased and got a good try result now.

Merged in Gaia master:
https://github.com/mozilla-b2g/gaia/commit/713382a3953ce326943ef5799ba9733a28beadd1

from pull request:
https://github.com/mozilla-b2g/gaia/pull/24407
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment on attachment 8494875 [details] [review]
Gaia pull request

[Approval Request Comment]

[Bug caused by] (feature/regressing bug #):
Bug 1059100, implementing oauth2 for gmail.

[User impact] if declined:
The user will be annoyed by "reauthorize" dialogs in the app if their clock is skewed. Since we did not have a last ditch "see if renew works regardless of our local state", it should mean a general reduction in cases where the user needs to explicitly reauthorize.

This patch also moves to a mozbrowser for the window, so cookie clearing stronger, so a better overall story on that front.

[Testing completed]:
Tested on Flame device. Tried skewed clock scenario, and revoking a token, so that a user-prompted reauthentication was required. That last test checks the case where the last ditch renew request fails, and that it does not fall into a recursive loop.

The backend ext/ changes have new automated tests in the GELAM repo.

[Risk to taking this patch] (and alternatives if risky):
The biggest risk is the move to mozbrowser from a window.open for the user's gmail signin process. This approach is the same one that calendar uses though, so it has precedent, and is an overall better story for cookie data. It was also the suggested approach given on a dev-webapi thread.

The backend ext/ changes are a more robust solution, with a try/catch retry for the renew of the token, and it includes automated tests in the GELAM repo.

[String changes made]:
none
Attachment #8494875 - Flags: approval-gaia-v2.1?
I'm on the fence to approve this one for 2.1- the patch looks fairly risky. I guess there's no other way to workaround the issue?
Note that all the stuff under js/ext/ is back-end code covered by expanded tests in the back-end that includes a new fake OAuth server.

And a big arugment in favor of landing this is that as humans we're going to have a hard time dealing with a different oauth2 flow on 2.1 versus 2.2/trunk.  And the use of mozbrowser is a clear privacy/security win since we clear the cookies and I verified very thoroughly that they're actually getting cleared by the mozbrowser API.
Attachment #8494875 - Flags: approval-gaia-v2.1? → approval-gaia-v2.1+
[Environment]
Gaia-Rev        778ebac47554e1c4b7e9a952d73e850f58123914
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-aurora/rev/aa98eb8873a2
Build-ID        20141005160206
Version         34.0a2
Device-Name     flame
FW-Release      4.4.2
FW-Incremental  eng.cltbld.20140925.192608
FW-Date         Thu Sep 25 19:26:18 EDT 2014
Bootloader      L1TC10011800


[Result]
PASS
Status: RESOLVED → VERIFIED
QA Whiteboard: [COM=Gaia::E-Mail]
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → ARM
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: