886534 - [email/IMAP] contend with INTERNALDATE time-zones for date-range-based resyncing (was: gelam github issue 12)

Reporter

Description

•

12 years ago

Porting from https://github.com/mozilla-b2g/gaia-email-libs-and-more/issues/12 so we can turn off github issues for GELAM: (first comment; I'm going to paste all of my comments as separate comments since it's more clear that my thoughts came at different times) When resynchronizing a given time range, we need to make sure that the dates we use to retrieve data from the database line up with the dates as they will be perceived by the IMAP server (or have enough slop and knowledge of what is slop). The key deal is that BEFORE and SINCE are specced in RFC 3501 as: BEFORE <date> Messages whose internal date (disregarding time and timezone) is earlier than the specified date. SINCE <date> Messages whose internal date (disregarding time and timezone) is within or later than the specified date. The changes from RFC 2060 section, item 71 provides us with the only real detail on this: 71) Clarify that date searches disregard the time and timezone of the INTERNALDATE or Date: header. In other words, "ON 13-APR-2000" means messages with an INTERNALDATE text which starts with "13-APR-2000", even if timezone differential from the local timezone is sufficient to move that INTERNALDATE into the previous or next day. Section 2.3.3 "Internal Date Message Attribute" also provides the definition: The internal date and time of the message on the server. This is not the date and time in the [RFC-2822] header, but rather a date and time which reflects when the message was received. In the case of messages delivered via [SMTP], this SHOULD be the date and time of final delivery of the message as defined by [SMTP]. In the case of messages delivered by the IMAP4rev1 COPY command, this SHOULD be the internal date and time of the source message. In the case of messages delivered by the IMAP4rev1 APPEND command, this SHOULD be the date and time as specified in the APPEND command description. All other cases are implementation defined. The string representation specification is awkward, but the good news is that the server is basically in charge of this string and should return it verbatim. (There is some potential disconnect between the IMAP server and the SMTP server, however. And if a server undergoes a timezone change, that could glitch.) The major potential points for errors are then: The conversion of our date-bounds into string values for the BEFORE/SINCE commands. Our database storage using internal date values that have had their time-zone compensated, but the server actually indexing on the pure string value. If the server normalizes things internally so the same time-zone offset is always used, this can be fine/easily adapted for, but if the time-zone offset can vary/is not normalized, this could result in problems where the 'real'/normalized ordering is different from the SINCE/BEFORE-purposes ordering. I think I am going to need to gather data on this, even if we need to deal with the worst-case scenario.

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 1

•

12 years ago

(ported from https://github.com/mozilla-b2g/gaia-email-libs-and-more/issues/12#issuecomment-9000244) DATA: On yahoo.com, this is clearly affecting search results: Using the search query: NOT DELETED SINCE 25-Sep-2012 BEFORE 27-Sep-2012 A message was returned where the server internaldate was reported over the wire as: 27-Sep-2012 06:32:43 +0000 From the Yahoo webmail UI's "View full header" interface, we have the following interesting header values that appear to be added by Yahoo and not from the upstream mailing list server: X-Apparently-To: sombreroboy@yahoo.com via 72.30.237.156; Wed, 26 Sep 2012 23:32:43 -0700 Received: from 127.0.0.1 (EHLO lists.mozilla.org) (63.245.216.66) by mta1324.mail.bf1.yahoo.com with SMTP; Wed, 26 Sep 2012 23:32:43 -0700 This is consistent with arrival of the message at my dreamhost account using dovecot, where there were a lot more Received headers added because of a bunch of extra smarthost routing. The topmost one looked like: Received: from madmax.dreamhost.com (caiajhbdcagg.dreamhost.com [208.97.132.66]) by homiemail-mx19.g.dreamhost.com (Postfix) with ESMTP id 0C29F25DD8A for <asutherland@asutherland.org>; Wed, 26 Sep 2012 23:32:43 -0700 (PDT)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 2

•

12 years ago

(ported from https://github.com/mozilla-b2g/gaia-email-libs-and-more/issues/12#issuecomment-9000386) SOLUTIONS / near-solutions / non-solutions: Promising: A: Adjust our database queries assuming a consistent time-zone skew on the server (which may not hold for APPENDed messages or if the server's time-zone ever changes). We will bug like we currently do when we do not line up, but other steps could attempt to correct / amend. We can detect the time-zone skew by checking the first Received header line of a message. B: Deal with bugged messages by attempting to normalize them by replacing them with a version we mess with the headers on. C: Keep the search ranges the same, but load headers from the adjacent days based on maximum time-zone offsets (-12 to +14 per wikipedia) in specific buckets so that if we see the UID of those adjacent messages, we can know we already know about the message and handle it appropriately. (We don't want to ignore it because it won't show up in the search range where we expect it to show up.) Deletion gets a little trickier; we address the changes in straightforward negative evidence by marking the header as having a wacky time offset (signed for the direction we found it in). When we do our sync for the range where it actually is (but is not supposed to be), we detect deletions by scanning the adjacent date buckets for messages with the wacky time offset flag with the correct sign; if the UID was not there, then we delete it. When we sync the range where it should be (but is not), when we consider deleting it, we see that it is marked with the wacky time offset flag and that is at the end of the range where the sign indicates this is reasonable. (If it was in the middle of the range, the sign does not matter and the negative evidence is conclusive.) Have some issues: D: Have all search queries to the server request an extra day's worth of padding on both sides and discard everything outside our actual search range. This has inherent inefficiency since we need extra steps to ask the server details, and it would just be faster to check our local db in most cases. I think 'A' is the simplest/easiest option, but more data would be required to make sure we don't also need 'B' or 'C'. 'B' is scary, but 'C' represents a non-trivial expense.

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 3

•

12 years ago

(ported from https://github.com/mozilla-b2g/gaia-email-libs-and-more/issues/12#issuecomment-9000467) Just realized that the servers will probably obey things like daylight savings time, so there will be (consistent) changes in that case which are potentially predictable based on knowing how time-zones move around. A variant of 'C' that limits the database slop range could be workable to still try and be efficient but avoid needing to be too aware of daylight savings time rules. Another near-solution: Maintain day-maps as index structures in the database/block map. The trick is that we really only are able to learn this when the message falls outside of the search range, since the returned internaldate value may discard TZ information.

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 4

•

12 years ago

Archaeopteryx just provided me with logs for a GMX IMAP account. Our calculated TZ offset was 7200000 aka +0200. This is consistent with a received line indicating +0200. In a specific message that got nuked, the received line had a timestamp of: Thu, 27 Jun 2013 01:20:08 +0200 But the actual INTERNALDATE reported by the server was, noting that the timestamp appears to actually derive from the IMAP import rather than the SMTP server's received line: 26-Jun-2013 23:20:09 +0000 The message was positively returned on a search query of NOT DELETED SINCE 26-Jun-2013 BEFORE 27-Jun-2013 db query for that was: start: 1372204800000, end: 1372291200000, skewedStart: 1372197600000, skewedEnd: 1372284000000 But not returned and inferred deleted on a query of NOT DELETED SINCE 27-Jun-2013 BEFORE 28-Jun-2013 db query for that was: start: 1372291200000, end: 1372377600000, skewedStart: 1372284000000, skewedEnd: 1372370400000 The 26th/27th query returned the following new INTERNALDATE values: 26-Jun-2013 22:44:32 +0000 26-Jun-2013 23:20:09 +0000 26-Jun-2013 23:35:38 +0000 26-Jun-2013 23:54:35 +0000 The 26th/27th query returned existing INTERNALDATE extremes of: 26-Jun-2013 06:18:03 +0000 26-Jun-2013 21:47:21 +0000 The 27th/28th query returned INTERNALDATE extremes of: 27-Jun-2013 00:06:14 +0000 27-Jun-2013 13:21:23 +0000 Hello stanza for imap.gmx.net: OK IMAP server ready H migmx112 96217 Capability line pre-login: CAPABILITY IMAP4rev1 CHILDREN ENABLE ID IDLE LIST-EXTENDED LIST-STATUS LITERAL+ MOVE NAMESPACE SASL-IR SORT THREAD=ORDEREDSUBJECT UIDPLUS UNSELECT WITHIN AUTH=LOGIN AUTH=PLAIN Capability line post-login: CAPABILITY IMAP4rev1 CHILDREN ENABLE ID IDLE LIST-EXTENDED LIST-STATUS LITERAL+ MOVE NAMESPACE SASL-IR SORT THREAD=ORDEREDSUBJECT UIDPLUS UNSELECT WITHIN

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 6

•

12 years ago

The just-duped-to-this-bug bug 900303 is a great example of this happening. The probability of the bug triggering is greatly increased by the large number of messages triggering the bisection logic. I want to work this for the next sprint since it is probably our most fundamental correctness bug.

Assignee: nobody → bugmail

Status: NEW → ASSIGNED

blocking-b2g: --- → koi?

James Lal [:lightsofapollo]

Updated

•

12 years ago

Assignee: bugmail → nobody

Dylan Oliver [:doliver]

Comment 7

•

12 years ago

Triage: This bug has been added to the v1.2 backlog but is not blocking release

blocking-b2g: koi? → -

Dylan Oliver [:doliver]

Comment 8

•

12 years ago

Andrew Sutherland deleted the linked story in Pivotal Tracker

Dylan Oliver [:doliver]

Updated

•

12 years ago

Target Milestone: --- → 1.2 FC (16sep)

Dylan Oliver [:doliver]

Comment 9

•

12 years ago

Tagging this bug to put it in the productivity backlog

tracking-b2g18: --- → 28+

Target Milestone: 1.2 FC (16sep) → ---

Dylan Oliver [:doliver]

Updated

•

12 years ago

blocking-b2g: - → backlog

Candice Serran (:cserran)

Updated

•

12 years ago

Whiteboard: [priority][p=13]

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 10

•

11 years ago

For your information, when checking for mails this morning with 10+ mail falling into the 2 hours hidden timeframe (UTC vs. local timezone offset), I can see the messages getting added to the message list while downloading, but they are being removed after the download finishes.

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 11

•

11 years ago

Yeah, that's the expected unfortunate interaction of this bug with our windowed sync implementation; it's the adjacent sync that thinks it owns those messages because of their timestamps causing the deletion. Can you clarify for the record what mail server software and/or domain/host you are using (if it's not just a server you're running)? It will be handy to know to accurately simulate the underlying problem for regression test purposes when this is fixed.

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 12

•

11 years ago

See comment 4: gmx.net. If you need more info/logs, just let me know.

Dylan Oliver [:doliver]

Comment 13

•

11 years ago

[priority] --> tracking-b2g:+ conversion

tracking-b2g: --- → +

Andrew Sutherland [:asuth] (he/him)

Reporter

Updated

•

11 years ago

Blocks: 958614

Marcus Cavanaugh [:mcav]

Assignee

Updated

•

11 years ago

Assignee: nobody → m

Marcus Cavanaugh [:mcav]

Assignee

Comment 14

•

11 years ago

(In reply to Andrew Sutherland [:asuth] from comment #2) Do you still think Option D (quoted below) is a poor solution? I thought you mentioned something about coming around to that idea in IRC, but my irccloud backlog is sketchy from that day. > D: Have all search queries to the server request an extra day's worth of > padding on both sides and discard everything outside our actual search > range. This has inherent inefficiency since we need extra steps to ask the > server details, and it would just be faster to check our local db in most > cases.

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 15

•

11 years ago

I'd still prefer to avoid option D except as a last-ditch failsafe, especially since we no longer pipeline IMAP requests since the change to email.js. I'm going to wax verbose for a sec on this since I've been thinking about this problem more for conversation support. === Problem restatement === The effective date used by SINCE/BEFORE is potentially inconsistent with the INTERNALDATE returned by the server. The spec for these was bad even before dealing with server bugs. The end result is that a server might implement SEARCH/BEFORE in several ways: 1) Using the INTERNALDATE with the server (or user if there is such a thing) timezone applied. I believe this is what dovecot does. 2) Using the INTERNALDATE in UTC+0. This appears to be the case for comment 4. 3) Truncating the string date representation of the "received" line, thereby ignoring the time and timezone, amounting to something close to the first case but with daylight savings time changes permanently applied so that the delta between UTC and apparent time varies over the duration of the year. (Because the server should only ever be in one timezone itself.) 4) Truncating the string date representation of the message's "date", thereby ignoring the time and timezone. This is the same as 3 if senders are in the same timezone as the server, but becomes a serious issue if they're not. 5) The message's "Date"'s date and time are used but the timezone is ignored. 6) Something somehow crazier than any of the above. Our current implementation is designed assuming case 1 with the potential for some wasteful churn when the timezone changes. The problem can result in messages disappearing or being duplicated. The bug is triggered at sync range boundaries and is more likely to occur with users with more messages. Our implementation could handle the 2nd case as well if our timezone detection logic used a SEARCH query instead of just assuming that the server's timezone is used or had a failsafe that noticed when things were going off the rails. The fundamental issue in our implementation is that the SEARCH/BEFORE ordering is (potentially) a different ordering than the INTERNALDATE ordering. === Tentative Conversations Plan === For conversations we no longer need a total ordering over the messages in a folder by INTERNALDATE or something else; conversations inherently add a layer of indirection. As such we can rely on the ordering of the SEARCH results itself. I have been hypothetically supposing that we would just store the date range we used for SEARCH and the list of UIDs that we got from that SEARCH. (Well, and some meta-data, but that's not important.) So for example, we might store: - Jan 1-8, UIDS: 5, 7, 9 - Jan 9-16, UIDS: 11, 13, 15 Except for daylight savings time changes, we would assume the same answers would be received every time we search on those exact ranges. We would additionally assume that if we did a search of Jan 1-16, we would receive the union of those sets. If we ended up wanting to search on a subset of one of those date ranges, our existing SEARCH results are not enough to answer things. And as noted above, INTERNALDATE may or may not be trustworthy with some additional legwork. But the solution is easy, if we want more granularity, we pick a partition point and run two searches. Ex Jan 1-4, Jan 5-8. Anything not in the union of those results is deleted. We could even just only run one search and conclude that anything not in the one set actually belongs to the other. (I would suggest we don't actually do that since that could lead to its own sadness.) === Translating the conversation "insight" to our existing Implementation === We could maintain a similar data-structure as we'd use for conversations, or we can annotate each message with the search range it was most recently found in, effectively inverting the map-ish thing from before. If we do that annotation, I think we can address this bug by having our database query grow the search on INTERNALDATE in 1 day in both directions. Then we apply a filtering pass over the found-in-search range. This can allow us to completely eliminate timezone compensation from the queries and such. The range would bloat storage slightly, but because of our block strategy, I would expect snappy to do quite well with the repeated data. Because of the duplicated UIDs screw up, we would probably need a true upgrade pass on this. During that pass we would assign a speculative range to messages based on their current timestamp and the timezone we'd already been using. It wouldn't be perfect, but it would be no worse than the sync prior to our fix. We would still run into some trouble in case 1 when the timezone changes. It would be possible to do some extra effort here to handle this, like keeping the messages that we filtered out around in a set for each side. If the UID is in those buckets, we can/should reuse it. === Revisiting "D" More Specifically === As noted at the top, I don't think we'd want to perform extra queries as a regular thing, but it's acceptable if it's a one-off situation. If you have specific thoughts on how to do this, please comment here!

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 16

•

11 years ago

Hm, just remembered that we do have our AccuracyRangeInfo structures stored in _accuracyRanges that are sorta similar to those range structures. The glitch with that is that FolderStorage.markSyncRange() likes to coalesce blocks with FolderStorage.markSyncedToDawnOfTime() nuking us down to one entirely. The actual use-case there is just knowing when to auto-refresh a time range, which is also where any/all complexity in that method lives. I think I need to ponder more about _accuracyRanges, but my gut instinct given our goals here would be to do the range-storing and post-filtering logic since the changes are more localized and breaking the sync ranges could break the auto-refresh logic. Gut-wise I think for conversations I would fold that info into the sync-range info for the dumb IMAP case. For CONDSTORE and QRESYNC capable servers I think we would just apply all changes to all known messages in a single go. (Well, CONDSTORE and deletion-detection is a different issue, that might be a secondary pass that uses the syncrange/UID list entry in a windowed fashion, but conceptually it'd be part of the atomic sync.)

Marcus Cavanaugh [:mcav]

Assignee

Comment 17

•

11 years ago

(In reply to Andrew Sutherland [:asuth] from comment #15) Most of this makes sense; I'm going to parrot some things back to make sure I understand. Problem Re-Restatement: When we ask the server for messages, it generally returns the right messages, but different servers might return messages several hours before and after the range. Because we only partially compensate for this (inferring the server timezone), messages sometimes fall into the cracks, causing us to delete or duplicate them when they don't match up with the messages we expect. The solution you recommend is to just stop worrying about _why_ each server returns things at the end of the ranges. Rather, we just accept that the server returns messages with its own internally-consistent rules. We store the list of messages returned for each search query, so that we have a mapping of { range -> uids }, as you say here: > - Jan 1-8, UIDS: 5, 7, 9 > - Jan 9-16, UIDS: 11, 13, 15 We can merge those groups together as-needed to answer "Jan 1-16"; to get smaller chunks, we'd have to run queries on partitions of those groups. -- Solution: Syncing involves us comparing the set of messages returned from a server range with the set of messages we have locally, and syncing the differences. So, we start storing each message's "search range" locally, so we know which bucket it will fall under. Then, syncing could be done as follows: 1. Expand our database search to grab some extra messages at the beginning and end of the search range, because we want to get all local messages whose "search range" matches what the server returns; and because we don't store a separate index of "search range -> message", expanding the database query by a day on each side would allow us to effectively find a superset of all messages matching the given search range. 2. Filter out the database messages we found whose search range doesn't line up with the current range. 3. Then, run the server query, compare, and sync as normal, annotating new messages with the range in which they were found. --- You wrote: > Because of the duplicated UIDs screw up, we would probably need a true > upgrade pass on this. What do you mean by "duplicated UIDs screw up"? It seems that we could lazily infer a default "expected sync range" when looking at messages which have not been annotated with one; I'm not sure why we'd need to do that eagerly. > We would still run into some trouble in case 1 when the timezone changes. > It would be possible to do some extra effort here to handle this, like > keeping the messages that we filtered out around in a set for each side. If > the UID is in those buckets, we can/should reuse it. In other words, we reserve these two sets of "set aside" messages to handle cases where the server returns messages we didn't think would show up in this sync range; rather than screwing up by treating them as new messages, we can just infer that the server decided to start bucketing them into a new bucket, and start treating the message as coming from the new sync bucket.

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 18

•

11 years ago

(In reply to Marcus Cavanaugh [:mcav] (MoCo SF) from comment #17) > Problem Re-Restatement: Exactly. > Solution: Precisely. > You wrote: > > > Because of the duplicated UIDs screw up, we would probably need a true > > upgrade pass on this. > > What do you mean by "duplicated UIDs screw up"? It seems that we could > lazily infer a default "expected sync range" when looking at messages which > have not been annotated with one; I'm not sure why we'd need to do that > eagerly. Some existing account databases will have duplicated messages in them and we need to deal with this. I was thinking it might be overkill to blow away the contents of the database, and it might be ugly to have to make the sync logic itself deal with this, leaving an upgrade as the best solution. I don't have a strong preference on the best way to deal with this, it's your call on this. (Including blowing away the database, although preferably just nuking on a per-folder basis so we can avoid blowing away the drafts folder). > > We would still run into some trouble in case 1 when the timezone changes. > > It would be possible to do some extra effort here to handle this, like > > keeping the messages that we filtered out around in a set for each side. If > > the UID is in those buckets, we can/should reuse it. > > In other words, we reserve these two sets of "set aside" messages to handle > cases where the server returns messages we didn't think would show up in > this sync range; rather than screwing up by treating them as new messages, > we can just infer that the server decided to start bucketing them into a new > bucket, and start treating the message as coming from the new sync bucket. Precoisely. The question is really just whether the complexity is worth it in terms of implementation and tests. It seems like it would just amount to a set-check in the cases where we're thinking the message is deleted, which isn't so bad. From a testing perspective, I not-so-long-ago made it so that the imapDaemon knows its tzOffsetMillis at startup, letting it control the timezone it thinks it is in. With a minor update to let this be updated on the fly, this could probably allow for testing this fairly easily (just need a 'slog' to indicate that we're triggering that code-path.)

Flags: needinfo?(bugmail)

Marcus Cavanaugh [:mcav]

Assignee

Comment 19

•

11 years ago

Positing another scenario to confirm we're on the same page: Say we do the following sync, and get some messages. SYNC [Jan 1, null) RETURNS { id: 1, date: Jan 4 } { id: 2, date: Jan 8 } { id: 3, date: Jan 10 } and then later, a sync starting farther in the future. SYNC [Jan 5, null) RETURNS { id: 2, date: Jan 8 } { id: 3, date: Jan 10 } Our database will pull all three messages (due to adding +/- a day): { id: 1, syncRange: [Jan 1, null], date: Jan 4 } { id: 2, syncRange: [Jan 1, null], date: Jan 8 } { id: 3, syncRange: [Jan 1, null], date: Jan 10 } However, we don't have enough info to differentiate between the following two cases: A) Message 1 was deleted, or B) Message 1 would have appeared in the bucket of [Jan 1, Jan 5) instead. So, we must: 1. do a SEARCH for [Jan 1, Jan 5) 2. Save the new narrower syncRange for each of those messages in the DB Let's say that Message 1 was in the first bucket, leaving us with the following info: { id: 1, syncRange: [Jan 1, Jan 5], date: Jan 4 } { id: 2, syncRange: [Jan 1, null], date: Jan 8 } { id: 3, syncRange: [Jan 1, null], date: Jan 10 } Now we can intersect those with [Jan 5, null), leaving us to correctly compare #2 and #3. -- But the situation could be more complex in a different scenario, if the syncRange of our messages vary: { id: 1, syncRange: [Jan 1, null], date: Jan 4 } { id: 2, syncRange: [Jan 3, null], date: Jan 8 } { id: 3, syncRange: [Jan 4, null], date: Jan 10 } In this case, when trying to sync [Jan 5, null], we'd have to do several searches, for the same rationale as above: [Jan 1, Jan 3) [Jan 3, Jan 4) [Jan 4, Jan 5) ... which is messy, though feasible. Does this match your thinking or has my logic gotten derailed?

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 20

•

11 years ago

An excellent and clear presentation here for discussion, thank you! I was responding inline, which was really helpful for my thoughts, but my response started to get confusing as I jumped around. == Assumptions A core simplifying assumption our sync engine already makes, and I think we should continue to run with: - We only care about whether a message is deleted if we would be showing it to the user and it could be a lie. == Example: Initial Sync Starting from the first scenario: (In reply to Marcus Cavanaugh [:mcav] (MoCo SF) from comment #19) > SYNC [Jan 1, null) RETURNS > { id: 1, date: Jan 4 } > { id: 2, date: Jan 8 } > { id: 3, date: Jan 10 } I'm assuming this is the result of an initial sync where we arbitrarily picked Jan 1st and then our sync heuristics decided that was "enough" messages. (If these were all the messages in the folder, our sync would have just done "SINCE 1-Jan-1990".) == Example: Refresh Sync > and then later, a sync starting farther in the future. > > SYNC [Jan 5, null) RETURNS > { id: 2, date: Jan 8 } > { id: 3, date: Jan 10 } If we've decided to give up on the Jan 1 sync range, then it probably happened because we're doing a refresh sync and our heuristic is "the UI wants N messages, so walk back N messages, take the timestamp of that message, and roundToPastUTCMidnight() to determine the time range." So this sync would actually be keyed off message 1, which we must have reported to the user. Although not covered in this example, there is inherently always going to be the risk of ambiguity for the search partition point we choose. But since all we care about are the messages we show the user, it makes sense for us to selfishly choose the search point so that there is no ambiguity for the message we chose. Because of our day slop thing, we'd choose [Jan 3, null) instead. No ambiguity for message 1. Win! But let's assume there was also a { id: 7, date: Jan 3 } because that can absolutely be a thing. It's in the ambiguity range and it didn't get reported. Instead of marking the message deleted, we'd mark its syncRange as [Jan 1, Jan 3) and don't deal with it right now. But we've provided enough information to the next sync that crosses it to know that it can infer deletion if it doesn't see the message in that sync range. == Example: Grow slice backwards based on that refresh So let's pretend that. The user scrolls/we prefetch/whatever, and now we fetch just that message from the database. It's worth noting that all the messages on the UTC day of Jan 3rd should have had their range updated because of that previous sync. They should *all* now have a range that's either [Jan 3, whatever) or [whatever, Jan 3). In fact, let's add some more { id: 7, syncRange: [Jan 1, Jan 3), date: Jan 3 } // didn't hear about, gets reported as Jan 2nd. { id: 8, syncRange: [Jan 1, Jan 3), date: Jan 3 } // didn't hear about, got deleted { id: 9, syncRange: [Jan 3, null), date: Jan 3 } // we did hear about this in SINCE Jan 3 // NB: I really want to do a "seven ate nine" thing, but it seems like that doesn't help. and we'll also have message 1 again in the db window because of +/- 1: { id: 1, syncRange: [Jan 3, null), date: Jan 4 } == Changes that weaken invariants but we use them differently and things still work? So, our current sync algorithm has the hypothetical behaviour that FolderStorage tells the (IMAP) FolderSyncer a UTC date-range like [Jan 3, null) and the FolderStorage will bring all of the messages in that part of the DB up to date. It is the FolderStorage that (based on headers we're reporting to the UI) decides the dates we're dealing with, tracks the accuracy ranges, subtracts the already up-to-date range off, and tells _lazySyncDataRange what that resulting date range. It then goes and does what it has to do. As I propose above, FolderStorage looked at the oldest header with a timestamp of Jan 4 and then it told FolderSyncer something, and together they somehow decided to do a server query of [Jan 3, null). (It's important to note that all of the timestamp juggling FolderStorage._growSlice does is just for the benefit of IMAP; no other sync protocol cares at all.) We want the logic in this grow case to pick a sync range of [whatever, Jan 3). If we assume it was FolderStorage that saw the header and grew the date range to Jan 3rd, this should "just work" with the existing code for this scenario, because it will tell FolderSyncer [whatever, Jan 3). This seems like it might work then? Message 7 will get reported as existing, so we'll be happy about that. Message 8 will not get reported, and because of the intersection, we should be able to know that we *must* infer deletion. Message 9 won't be reported, but it doesn't overlap anyways. Maybe we win? My brain definitely needs time to stew on this more, but I feel that with the revised date-range window heuristics and this logic, we avoid the need to perform disambiguating searches. (Since we just push any ambiguity out into time ranges we don't currently care about.) My apologies for this being somewhat brain-dumpy. I'll endeavour to come back and provide a more clear statement after the brain stew finishes. == Edge-cases that currently don't really matter? One of the complicated things that could happen *prior to our switch to the virtual list implementation* was that if we scrolled down far enough, we would shrink the slice so that it would no longer include the most recent messages. This could then result in us growing into the future and needing to do a refresh. I haven't analytically thought about this, but... this can also no longer happen with the virtual list. We never shrink the slice so it doesn't include "now". And we are *NOT* changing this for v2.2. The result is that if a user wants to see updates to anything they're looking at, one of two things needs to happen: 1) They hit refresh, and we trigger a refresh over the current slice which stretches from [oldest sync date so far, now) 2) Periodic background sync goes and opens a new slice, this extends back a finite amount and is like if the user had switched folders and back again, resulting in a smaller refresh sync. So, maybe we still win?

Flags: needinfo?(bugmail)

Marcus Cavanaugh [:mcav]

Assignee

Comment 21

•

10 years ago

Attached file Github Pull Request (GELAM) — Details

Deets in PR summary. GELAM tests pass locally when the segfault fairies are on my side. Tested on a Flame and things still work.

Attachment #8556566 - Flags: review?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 22

•

10 years ago

Comment on attachment 8556566 [details] [review] Github Pull Request (GELAM) Awesome cleanup/refactoring and tests! I'm really looking forward to the new test infrastructure! I've pushed some review fixes to your branch (in the mozilla-b2g org instead of in your repo?) that you need to criss-cross review. I think all of my fixes make sense, but please do sanity check me and also ideally do some on-device testing. And be wary of bug 1128285; I've been seeing it a lot now, but haven't been able to look into it yet. I probably won't be able to look into it until Tuesday because of downloads stuff, so don't worry about duplicating work there! A particularly good testing trick is to load more messages a bit and then scroll all the way down and do a refresh. In a sufficiently busy folder this will trigger a bisection sync. Which, uh, unfortunately also violates the bit where I flat-out said we only sync present to past... but bisection actually results in a past-to-present stepping. However, it also turns out even if we fixed this aspect of bisection, just general multi-session use could result in the same behaviour anyways. See more below. The 3 commits do: 1) Add some comments to the test to help me better understand it, and clean up some timezone warp issues in other IMAP cases. I had been deeply hand-waving with the timezone stuff originally on those tests. I think I've fixed them now since we're UTC consistent now and I made sure to have the IMAP fake-server also operate in the UTC timezone. Without these extra TZ changes some of the affected tests failed when I set my computer to Taipei time. 2) Avoid heisenberg cats/messages by extending the sync date range by an extra day so that there is no ambiguity about the messages we report to the UI. I also enhanced the test to enable refresh ranges and do appropriate time-warping so that the accuracy-range set logic will result in the 'grow' calls (effectively) use a moving window instead of just growing the window. 3) Don't union disjoint ranges and mistakenly infer deletion. More details in the commit message and comments on this. PS: I apologize for crufting up the new clean test a bit with the old th_main stuff :). Feel free to clean up the test. Commit-wise, for my sanity if we need to revisit this, it's probably best if we don't squash any of my GELAM commits. Do feel free to land if you r+ my stuff. Please re-request review if you make any control-flow/logic changes.

Attachment #8556566 - Flags: review?(m)

Attachment #8556566 - Flags: review?(bugmail)

Attachment #8556566 - Flags: review+

Marcus Cavanaugh [:mcav]

Assignee

Comment 23

•

10 years ago

Comment on attachment 8556566 [details] [review] Github Pull Request (GELAM) Thanks for the thorough review; the comments are going to be super helpful when we (intentionally or otherwise) forget about these horrors. And good catch on the non-contiguous ranges fix, too; I remember that possibility crossing my mind before, but I had erroneously convinced myself that it wouldn't happen. No post-review fixes on my end. Landed in GELAM (unsquashed to preserve history): https://github.com/mozilla-b2g/gaia-email-libs-and-more/commit/e359307af6e2727912fcee3079fcd0d84424c378

Attachment #8556566 - Flags: review?(m) → review+

GitHub Autolander Bot

Comment 24

•

10 years ago

Attached file [PullReq] mcav:land-internaldate to mozilla-b2g:master — Details

Marcus Cavanaugh [:mcav]

Assignee

Comment 25

•

10 years ago

Comment on attachment 8557991 [details] [review] [PullReq] mcav:land-internaldate to mozilla-b2g:master Carrying over r+ from GELAM attachment, to see if autolander will accept a self-signed review.

Attachment #8557991 - Flags: review+

Marcus Cavanaugh [:mcav]

Assignee

Updated

•

10 years ago

Keywords: checkin-needed

GitHub Autolander Bot

Updated

•

10 years ago

Keywords: checkin-needed

GitHub Autolander Bot

Comment 26

•

10 years ago

Pull request has landed in master: https://github.com/mozilla-b2g/gaia/commit/9130d070d61bb00490676200e0d54fa1d21f79fd

GitHub Autolander Bot

Updated

•

10 years ago

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 27

•

10 years ago

I think we should let this bake on trunk for a little bit, but then I think we absolutely want this up on v2.2. We may also want to see if we can get it on v2.1.

status-b2g-v2.0: --- → wontfix

status-b2g-v2.1: --- → affected

status-b2g-v2.2: --- → affected

status-b2g-master: --- → fixed

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 28

•

10 years ago

Comment on attachment 8557991 [details] [review] [PullReq] mcav:land-internaldate to mozilla-b2g:master [Approval Request Comment] [Bug caused by]: This bug has always existed. [User impact] if declined: Who is impacted: IMAP accounts where we failed to correctly predict the timezone of the server and/or where the server interprets the spec in a different way than dovecot and friends. What happens: - Messages may be temporarily or permanently hidden from the user, depending on complicated stuff. - Messages may be duplicated, depending on complicated stuff. The affected messages are messages where the server reports the message on a different day than we expect it to report the message. This happens to messages around the boundary of our sync interval after our timezone adjustment. This is more likely to occur for messages received around midnight, and for accounts with high message density. This is more likely to be observed by users who scroll further back into time. If they don't scroll to the bug boundary region, it's less to happen or (unsurprisingly) for the user to notice. [Testing completed]: Excellent tests, on device-testing, b2g-desktop testing. This has been baking on trunk for over 2 weeks because we are just that serious about quality. So serious that we haven't come up with a good joke to put here in all that time, because we're just busy being that serious. [Risk to taking this patch]: Given the baking and testing, this is considered low-risk. [String changes made]: No user-facing string changes. (Developer debugging has gotten a little more chatty.)

Attachment #8557991 - Flags: approval-gaia-v2.2?

bhavana bajaj [:bajaj]

Updated

•

10 years ago

Attachment #8557991 - Flags: approval-gaia-v2.2? → approval-gaia-v2.2+

Ryan VanderMeulen [:RyanVM]

Comment 29

•

10 years ago

v2.2: https://github.com/mozilla-b2g/gaia/commit/ea86e2a952fed5db3f8f1f56edc93f1c90d0aca6

status-b2g-v2.2: affected → fixed

Target Milestone: --- → 2.2 S5 (6feb)

Nobody; OK to take it and work on it

Updated

•

10 years ago

blocking-b2g: backlog → ---

Github Pull Request (GELAM) 10 years ago Marcus Cavanaugh [:mcav] 64 bytes, text/x-github-pull-request	asuth : review+ mcav : review+	Details \| Review
[PullReq] mcav:land-internaldate to mozilla-b2g:master 10 years ago GitHub Autolander Bot 46 bytes, text/x-github-pull-request	mcav : review+ bajaj : approval-gaia-v2.2+	Details \| Review