Closed Bug 1188216 Opened 7 years ago Closed 7 years ago

[email/IMAP] Bisection logic should maintain open-ended endTS values.

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(b2g-v2.1 unaffected, b2g-v2.2 wontfix, b2g-master fixed)

RESOLVED FIXED
Tracking Status
b2g-v2.1 --- unaffected
b2g-v2.2 --- wontfix
b2g-master --- fixed

People

(Reporter: ajones, Assigned: asuth)

Details

Attachments

(2 files)

Often when I look at the email app on my Firefox OS phone it is missing recent emails that have arrived in the last few hours. For example I press sync and the most recent email it shows is 11 hours ago - an email that arrived last night. On desktop Thunderbird I see over 30 more recent emails that are missing from the phone.
Can you clarify:

1) What email domain and/or account type this is?  (See https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo for info on determining the account type we're using, although in general I can infer the likely account type given a domain.)

2) When you're pressing sync and saying there are missing messages, is it like this sequence:
a) hit refresh
b) wait for refresh to complete (the spinner stops spinning)
c) scroll to the top and notice there are still missing messages

or is it more like a,c?  Because of bug 1182358 the email app may fail to sync in the background and may only do things when you trigger manual sync.


And of course if you can provide a logcat a la https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo, that will prove invaluable if it's the a,b,c situation, because that implies there's an error that's happening.  If you do provide a logcat, feel free to email it to asuth@mozilla.com if you don't want to attach it to the bug.  (Please do double-check the contents if attaching it, the logging in the v2.2+ series may output some info that we didn't use to output.)
Flags: needinfo?(ajones)
(In reply to Andrew Sutherland [:asuth] from comment #1)
> 1) What email domain and/or account type this is?  (See
> https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo for info on determining
> the account type we're using, although in general I can infer the likely
> account type given a domain.)

Mozilla email address using IMAP+SMTP.

> 2) When you're pressing sync and saying there are missing messages, is it
> like this sequence:
> a) hit refresh
> b) wait for refresh to complete (the spinner stops spinning)
> c) scroll to the top and notice there are still missing messages

Yes. I press refresh. It spins but shows no new messages. Restarting the phone doesn't seem to help either. I'm wondering if it is related to my time zone GMT+12. This problem has been around a long time.
 
> or is it more like a,c?  Because of bug 1182358 the email app may fail to
> sync in the background and may only do things when you trigger manual sync.

I have not discovered any way to get the messages. They eventually show up a few hours later.

> And of course if you can provide a logcat a la
> https://wiki.mozilla.org/Gaia/Email/RequiredBugInfo, that will prove
> invaluable if it's the a,b,c situation, because that implies there's an
> error that's happening.  If you do provide a logcat, feel free to email it
> to asuth@mozilla.com if you don't want to attach it to the bug.  (Please do
> double-check the contents if attaching it, the logging in the v2.2+ series
> may output some info that we didn't use to output.)

I will get a logcat next time the issue occurs. I don't have a USB cable handy at the moment.
Flags: needinfo?(ajones)
It could be timezone related, especially if you're an inbox zero practitioner and you had zero messages in your inbox when you created the account.  A logcat would definitely shed more light.

(NB: The good news is the conversations refactor that should land in v2.5 overhauls the sync strategy to no longer depend be timezone aware at all and lots of other good enhancements.  But we do care about whether this is buggy for v2.2 users.)
(Clarification: In theory v2.2 is not timezone aware either, and v2.1 was the last version that compensated for timezones, but the queries we issue to the server do involve a time component, so it is a possibility.)
This bug has been around since forever. However I usually have the problem that I'm using my phone because I don't have my PC.
:k17e provided a log over email (thank you very much!) and it looks like the problem is that the bisection logic is firing and it's losing the open-ended part of the range which leads to the timezone becoming very relevant.

:k17e, are you on the foxfood branch or on nightly (or something that otherwise updates more frequently)?  I ask because if you're on foxfood I probably want to provide you with a zip for testing that you can sideload since the foxfood builds don't get updated too promptly.

== Log Analysis

We see the initial sync:
  syncDateRange: 1438560000000 null
Then we see bisection trigger:
  [syncDateRange] Should bisect? ***YES, BISECT!*** curDaysDelta = 2 serverUIDs.length = 84
Which results in the "old" chunk here:
  syncDateRange: 1438560000000 1438646400000
  [syncDateRange] Should bisect? no. curDaysDelta = 1 serverUIDs.length = 15
Followed by the new chunk:
  syncDateRange: 1438646400000 1438732800000
  [syncDateRange] Should bisect? no. curDaysDelta = 1 serverUIDs.length = 49

These date-stamps amount to:
$ (new Date(1438560000000)).toUTCString()
'Mon, 03 Aug 2015 00:00:00 GMT'
$ (new Date(1438646400000)).toUTCString()
'Tue, 04 Aug 2015 00:00:00 GMT'
$ (new Date(1438732800000)).toUTCString()
'Wed, 05 Aug 2015 00:00:00 GMT'

(Noting that the queries we issue to the server are timezone-less and the IMAP spec and implementations have a serious snafu about what this means as it relates to the query.)

== Problem Analysis

There's a few emergent things going on, I believe:
- The bisection logic starts from the oldest date and works its way forward using a day-step.  This is what results in us picking and using specific date.
- We stopped applying timezone compensation to our search query times as part of bug 886534 for complexity/brain-hurt reasons, not realizing that the bisection logic was losing the open-ended null constraint.
- The bisection logic steps until it reaches it reaches _curSyncDoNotGrowBoundary using TIME_DIR_AT_OR_BEYOND.  The good news is that we properly save the null magic value into _curSyncDoNotGrowBoundary.  The bad news is that we effectively experience a units problem here because TIME_DIR_AT_OR_BEYOND is aware of the magic null value and uses NOW() for comparison when it sees null.  (The units problem is that the UTC-quantized time, although technically in our future, has units of "however the heck the server interprets SINCE/BEFORE timestamps which leads to an ambiguity of basically up to a day".)  Prior to bug 886534, if our timezone probing logic worked and got things right, we would have the same units.  (Of course, it was also possible for us to make the problem worse.)

== Fix

The band-aid hack fix is to bump the bisection limit way up.  But that's not a good fix.

The real fix is to make ImapFolderSyncer.onSyncCompleted detect when it's reaching the end of the bisected range and restore use the "null" magic value as part of the query.  The enhancements in bug 886534 let us generally use dates as fuzzy sync buckets.  The bisection logic is in the wrong by trying to use the fuzzy date as a strict date.  Luckily this only matters at the now-horizon we have, and that's why we have the "null" magic value in the first place.

== Tests

Since this whole sync strategy is going away and the timezone issues of this are non-trivial to accurately model, my plan is to just make sure that the unit test coverage verifies that the bisection results in an open-ended constraint as appropriate, but not bring GMT+12 into the picture.  (We have gained support for specific timezone emulation along the way; the issue for me is fully characterizing what is going wrong here.)
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Flags: needinfo?(ajones)
Summary: Email sync is misses most recent emails → [email/IMAP] Bisection logic should maintain open-ended endTS values.
Attachment #8643392 - Flags: review?(m)
Attachment #8643392 - Flags: review?(m) → review+
landed on master:
https://github.com/mozilla-b2g/gaia/pull/31264
https://github.com/mozilla-b2g/gaia/commit/1de6a5e747679d8d158ff3dbe54a4bfb93215199

https://github.com/mozilla-b2g/gaia-email-libs-and-more/pull/394
https://github.com/mozilla-b2g/gaia-email-libs-and-more/commit/ab62cfbc9a456f124aae2c94cfdcf65d3d6b7c22

I also pushed the comment-fix https://github.com/mozilla-b2g/gaia-email-libs-and-more/commit/10f40a9635f890a852026a7fb3a2dda614f8dcd7 because I noticed during the gaia patch generation that we'd accidentally s/forward/foward/ in a comment.

I'd like to get this uplifted to v2.2, but we need time to bake and ideally to see that this fixes :k17e's problem first.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
(In reply to Andrew Sutherland [:asuth] from comment #9)
> I'd like to get this uplifted to v2.2, but we need time to bake and ideally
> to see that this fixes :k17e's problem first.

I'm using dogfood-latest so I should get an update soon. I'll let you know how it goes.
Works for me. Thanks.
Flags: needinfo?(ajones)
Comment on attachment 8643927 [details] [review]
[gaia] asutherland:email-bisect-endts > mozilla-b2g:master

[Approval Request Comment]
[Bug caused by]:
Bug 886534 (probably).

[User impact]:
IMAP users who receive more than 60 messages per 2-day interval in a folder may not see their new messages until the day after they arrive.  This is highly dependent on the interaction of the user's timezone and the server's interpretation of IMAP SINCE/BEFORE operators (possibly impacted by its understanding of the user's timezone, or its own timezone).  In at least this case the very popular gmail.com service used by a user in New Zealand (UTC+12) was experiencing the problem.  It would not be particularly surprising for this problem to impact other users in high UTC+ timezones.  (Unfortunately the analysis is non-trivial compared to the bug.)

[Testing completed]:
This has baked on the trunk for ~11 days, has an automated unit tests scoped to the fix, and has been verified by the impacted user in comment 10.

[Risk to taking this patch]:
Extremely low risk.  This is a very targeted fix designed to only alter the exact scenario with the bug would trigger.

[String changes made]:
No string changes.
Attachment #8643927 - Flags: review+
Attachment #8643927 - Flags: approval-gaia-v2.2?
Comment on attachment 8643927 [details] [review]
[gaia] asutherland:email-bisect-endts > mozilla-b2g:master

Not maintaining FxOS 2.2 anymore
Attachment #8643927 - Flags: approval-gaia-v2.2? → approval-gaia-v2.2-
You need to log in before you can comment on or make changes to this bug.