Closed Bug 972100 Opened 6 years ago Closed 6 years ago

Sync is making an additional info/collections request

Categories

(Firefox :: Sync, defect, P2)

All
macOS
defect

Tracking

()

VERIFIED FIXED
mozilla32
Tracking Status
firefox29 --- wontfix
firefox30 - affected
firefox31 - affected
firefox32 --- verified

People

(Reporter: rnewman, Assigned: markh)

References

Details

(Whiteboard: [qa!])

Attachments

(2 files)

Attached file Log
It *should* fetch i/c, then m/g, then crypto/keys, then each collection.

Instead, it's fetching i/c, m/g, i/c, c/k, i/c, m/g, i/c, i/c, m/g, then each collection.

Log attached.

Testing on 2014-02-10 Nightly.

Can someone else verify this? It'll have some perf impact and perhaps consistency problems if it's reproducible, and if it's not it's weird.
Whiteboard: [qa+]
(In reply to Richard Newman [:rnewman] from comment #0)
> Can someone else verify this?

Kinda :)

What I see is the following:

https://sync-1-us-east-1.sync.services.mozilla.com/1.5/42871/info/collections - sync started, verifyLogin does: this.resource(this.infoURL).get();
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/42871/storage/meta/global
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/42871/info/collections -> still in verifyLogin - this._remoteSetup()
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/42871/storage/crypto/keys
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/42871/info/collections - EngineSynchronizer.sync()
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/42871/storage/meta/global
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/42871/storage/clients?full=1

And best I can tell, that's actually the least the code can do.  Somewhat strangely, verifyLogin causes 2 such calls - one via an explicit fetch of this.infoURL, and the other via _remoteSetup() - I say "strangely" as _remoteSetup() takes an optional param which is an infoResponse, but verifyLogin() doesn't pass it even though it has that response - see the attached patch.

For the first sync in your log, I see:

https://sync-2-us-east-1.sync.services.mozilla.com/1.5/17300/info/collections - wrong UID - bug 973749 
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/17291/info/collections - worked - verifyLogin doing .get
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/17291/storage/meta/global
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/17291/info/collections - verifyLogin calling this._remoteSetup
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/17291/storage/crypto/keys
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/17291/info/collections - EngineSynchronizer.sync()
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/17291/storage/meta/global
// and here things also differ from mine: 
// 1392251257400	Sync.Service	INFO	Sync IDs differ. Local is OoPwx_ycqEJi, remote is V852tSwDKqmp
// The line above is causing us to do *another* verifyLogin, which as above, causes 2 /collections requests
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/17291/info/collections
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/17291/info/collections
// and then it looks like mine.
https://sync-1-us-east-1.sync.services.mozilla.com/1.5/17291/storage/meta/global

This attachment prevents verifyLogin causing 2 such requests - any thoughts on that?  Also, any insights into exactly what "Sync IDs differ." mean?  I *guess* it has to do with syncing via a different user, which your log implies did actually happen - in which case this sounds "normal"...
Attachment #8378094 - Flags: feedback?(rnewman)
Assignee: nobody → mhammond
Comment on attachment 8378094 [details] [diff] [review]
Pass the info response to _remoteSetup

Review of attachment 8378094 [details] [diff] [review]:
-----------------------------------------------------------------

This implies that a fat swath of code hasn't been exercised for a very long time.

Before this lands we need to make sure that this clause:

    // Checking modified time of the meta record.
    if (infoResponse &&

works. Other than that, you win the "putting the chalk cross in the right place" award!
Attachment #8378094 - Flags: feedback?(rnewman) → feedback+
Comment on attachment 8378094 [details] [diff] [review]
Pass the info response to _remoteSetup

Review of attachment 8378094 [details] [diff] [review]:
-----------------------------------------------------------------

> Before this lands we need to make sure that this clause:
> // Checking modified time of the meta record.
> if (infoResponse &&

This appears to have test coverage via test_service_sync_remoteSetup.js - although remoteSetup() isn't directly called with an info response via that test, the test arranges to take every branch in the block in question - ie, during that test it both enters and does not enter the block:

    if (infoResponse &&
        (infoResponse.obj.meta != this.metaModified) &&
        (!meta || !meta.isNew)) {

Inside that block it also takes both paths in the inner condition:

      if (!this.recordManager.response.success || !newMeta) {

Is there some other explicit testing necessary?  Requesting review on the off-chance these tests are already suitable :)
Attachment #8378094 - Flags: review?(rnewman)
(In reply to Mark Hammond [:markh] from comment #3)

> Is there some other explicit testing necessary?  Requesting review on the
> off-chance these tests are already suitable :)

Is TPS working yet? (Even if only for Sync 1.1.)

Scenarios that need testing:

* Routine syncing
* Sync, then upload a new meta/global remotely, then sync again
* Sync such that meta/global needs to be changed at the end of the sync (enable new engine?), then sync again.

Not easy to test the failure cases, alas.
Priority: -- → P1
Comment on attachment 8378094 [details] [diff] [review]
Pass the info response to _remoteSetup

Review of attachment 8378094 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going to r+ this, but we *must* have QA do manual testing here before uplift or merge. Even TPS won't adequately cover these cases.
Attachment #8378094 - Flags: review?(rnewman) → review+
Adding some QA to look at this:
General: edwong
Desktop: tracy
Android: AaronMT
markh, remember to run xpcshell-test before landing stuff ;P

Looking at that log:

21:10:52     INFO -  Sync.Service	DEBUG	Caching URLs under storage user base: http://localhost:54222/1.1/johndoe/
21:10:52     INFO -  Sync.Test.Server	DEBUG	SyncServer: Handling request: GET /1.1/johndoe/info/collections
21:10:52     INFO -  Sync.Resource	DEBUG	mesg: GET success 200 http://localhost:54222/1.1/johndoe/info/collections
21:10:52     INFO -  Sync.Resource	DEBUG	GET success 200 http://localhost:54222/1.1/johndoe/info/collections
21:10:52     INFO -  Sync.Service	DEBUG	Fetching global metadata record
21:10:52     INFO -  Sync.Test.Server	DEBUG	SyncServer: Handling request: GET /1.1/johndoe/storage/meta/global
21:10:52     INFO -  Sync.Resource	DEBUG	mesg: GET fail 401 http://localhost:54222/1.1/johndoe/storage/meta/global
21:10:52     INFO -  Sync.Resource	DEBUG	GET fail 401 http://localhost:54222/1.1/johndoe/storage/meta/global
21:10:52     INFO -  Sync.Service	DEBUG	Clearing cached meta record. metaModified is 1393909850.75, setting to undefined
21:10:52     INFO -  Sync.Test.Server	DEBUG	SyncServer: Handling request: GET /1.1/johndoe/storage/meta/global
21:10:52     INFO -  Sync.Resource	DEBUG	mesg: GET fail 401 http://localhost:54222/1.1/johndoe/storage/meta/global
21:10:52     INFO -  Sync.Resource	DEBUG	GET fail 401 http://localhost:54222/1.1/johndoe/storage/meta/global

This shouldn't occur. If we get a 401 during a sync, it's not just a failure, it's an auth failure, and we shouldn't just clear the cache and try again.

21:10:52     INFO -  Sync.Service	DEBUG	No meta/global record on the server. Creating one.
21:10:52     INFO -  Sync.Resource	DEBUG	PUT Length: 78

... and this is definitely not the right response to a 401!

If the FxA code is doing the same thing -- and it probably is -- then we'll be doing really nutty things in response to clock skew errors.

File a separate bug, Mark?
Is there an update here?
Priority: P1 → P2
I think we've determined it is only 1 additional request, but we might as well try and help services out by getting this fixed for 30.
Summary: Sync is making an insane number of info/collections and meta/global requests → Sync is making an additional info/collections request
hrmph - new try with the exact same patch worked: https://tbpl.mozilla.org/?tree=Try&rev=66469d959a44.

Henrik, are you able to run TPS with this patch and ensure it doesn't cause problems in that environment?
Flags: needinfo?(hskupin)
Attachment #8375205 - Attachment mime type: text/x-log → text/plain
Flags: needinfo?(hskupin)
I don't know all the details about this patch and the underlying sync code, but checking the trace output while running tps tests all is looking fine to me. I don't see any major breakage beside the usual intermittent failures we still have.
Thanks Henri - so given that and the green try...

https://hg.mozilla.org/integration/fx-team/rev/0a127bfc0c74
https://hg.mozilla.org/mozilla-central/rev/0a127bfc0c74
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Ok if this is still wanted for uplift all the way to Beta, let's make sure we have some QA on it first - Tracy, Aaron, edwong again as per comment 6
Flags: needinfo?(twalker)
Flags: needinfo?(edwong)
Flags: needinfo?(aaron.train)
(In reply to Lukas Blakk [:lsblakk] from comment #16)
> Ok if this is still wanted for uplift all the way to Beta, let's make sure
> we have some QA on it first - Tracy, Aaron, edwong again as per comment 6

TBH I think it would be fine if this only made it to 31 - its not a user-visible bug.
Aaron and Tracy, if ya'll could do some smoke testing on this... I'll do a sweep on Mac Fx Nightly as well.
Note: as per comment #4, the failure case is not easy to test.  

I ran the scenarios in comment #4.  No issues.Given that and comment #13, marking this verified.
Status: RESOLVED → VERIFIED
Flags: needinfo?(twalker)
User impact description in bug 972100 comment 17 suggests we don't need to track this. We can still uplift given an approval request.
Karl, can you take do some smoke testing on android nightly, verify this as fixed, then we can uplift this.
Flags: needinfo?(kthiessen)
Flags: needinfo?(edwong)
Flags: needinfo?(aaron.train)
I will smoke test this on Android Nightly today.
Flags: needinfo?(kthiessen)
Parts of this are really difficult to test.  However, with the same caveats that :tracy expressed in comment #19, I think this is ready to go.
per comment 19
Keywords: qawanted
Whiteboard: [qa+] → [qa!]
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.